Sanger Institute - Publications 2013

Number of papers published in 2013: 484

  • Bloomsbury report on mouse embryo phenotyping: recommendations from the IMPC workshop on embryonic lethal screening.

    Adams D, Baldock R, Bhattacharya S, Copp AJ, Dickinson M, Greene ND, Henkelman M, Justice M, Mohun T, Murray SA, Pauws E, Raess M, Rossant J, Weaver T and West D

    Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK.

    Identifying genes that are important for embryo development is a crucial first step towards understanding their many functions in driving the ordered growth, differentiation and organogenesis of embryos. It can also shed light on the origins of developmental disease and congenital abnormalities. Current international efforts to examine gene function in the mouse provide a unique opportunity to pinpoint genes that are involved in embryogenesis, owing to the emergence of embryonic lethal knockout mutants. Through internationally coordinated efforts, the International Knockout Mouse Consortium (IKMC) has generated a public resource of mouse knockout strains and, in April 2012, the International Mouse Phenotyping Consortium (IMPC), supported by the EU InfraCoMP programme, convened a workshop to discuss developing a phenotyping pipeline for the investigation of embryonic lethal knockout lines. This workshop brought together over 100 scientists, from 13 countries, who are working in the academic and commercial research sectors, including experts and opinion leaders in the fields of embryology, animal imaging, data capture, quality control and annotation, high-throughput mouse production, phenotyping, and reporter gene analysis. This article summarises the outcome of the workshop, including (1) the vital scientific importance of phenotyping embryonic lethal mouse strains for basic and translational research; (2) a common framework to harmonise international efforts within this context; (3) the types of phenotyping that are likely to be most appropriate for systematic use, with a focus on 3D embryo imaging; (4) the importance of centralising data in a standardised form to facilitate data mining; and (5) the development of online tools to allow open access to and dissemination of the phenotyping data.

    Funded by: NICHD NIH HHS: P30 HD024064

    Disease models & mechanisms 2013;6;3;571-9

  • Bacteriotherapy for the treatment of intestinal dysbiosis caused by Clostridium difficile infection.

    Adamu BO and Lawley TD

    Bacterial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Faecal microbiota transplantation (FMT) has been used for more than five decades to treat a variety of intestinal diseases associated with pathological imbalances within the resident microbiota, termed dysbiosis. FMT has been particularly effective for treating patients with recurrent Clostridium difficile infection who are left with few clinical options other than continued antibiotic therapy. Our increasing knowledge of the structure and function of the human intestinal microbiota and C. difficile pathogenesis has led to the understanding that FMT promotes intestinal ecological restoration and highlights the microbiota as a viable therapeutic target. However, the use of undefined faecal samples creates a barrier for widespread clinical use because of safety and aesthetic issues. An emerging concept of bacteriotherapy, the therapeutic use of a defined mixture of harmless, health-associated bacteria, holds promise for the treatment of patients with severe C. difficile infection, and possibly represents a paradigm shift for the treatment of diseases linked to intestinal dysbiosis.

    Funded by: Medical Research Council: 93614; Wellcome Trust: 098051

    Current opinion in microbiology 2013;16;5;596-601

  • Dynamic image-based modelling of kidney branching morphogenesis

    Adivarahan S, Menshykau D, Michos O and Iber D

    Lecture Notes in Computer Science 2013;8130 LNBI;106-19

  • Sequencing ancient calcified dental plaque shows changes in oral microbiota with dietary shifts of the Neolithic and Industrial revolutions.

    Adler CJ, Dobney K, Weyrich LS, Kaidonis J, Walker AW, Haak W, Bradshaw CJ, Townsend G, Sołtysiak A, Alt KW, Parkhill J and Cooper A

    Australian Centre for Ancient DNA, School of Earth and Environmental Sciences, The University of Adelaide, Adelaide, South Australia, Australia.

    The importance of commensal microbes for human health is increasingly recognized, yet the impacts of evolutionary changes in human diet and culture on commensal microbiota remain almost unknown. Two of the greatest dietary shifts in human evolution involved the adoption of carbohydrate-rich Neolithic (farming) diets (beginning ∼10,000 years before the present) and the more recent advent of industrially processed flour and sugar (in ∼1850). Here, we show that calcified dental plaque (dental calculus) on ancient teeth preserves a detailed genetic record throughout this period. Data from 34 early European skeletons indicate that the transition from hunter-gatherer to farming shifted the oral microbial community to a disease-associated configuration. The composition of oral microbiota remained unexpectedly constant between Neolithic and medieval times, after which (the now ubiquitous) cariogenic bacteria became dominant, apparently during the Industrial Revolution. Modern oral microbiotic ecosystems are markedly less diverse than historic populations, which might be contributing to chronic oral (and other) disease in postindustrial lifestyles.

    Funded by: Wellcome Trust: 076964, WT092799/Z/10/Z, WT098051

    Nature genetics 2013;45;4;450-5, 455e1

  • Partial sleep restriction activates immune response-related gene expression pathways: experimental and epidemiological studies in humans.

    Aho V, Ollila HM, Rantanen V, Kronholm E, Surakka I, van Leeuwen WM, Lehto M, Matikainen S, Ripatti S, Härmä M, Sallinen M, Salomaa V, Jauhiainen M, Alenius H, Paunio T and Porkka-Heiskanen T

    Department of Physiology, Institute of Biomedicine, University of Helsinki, Helsinki, Finland.

    Epidemiological studies have shown that short or insufficient sleep is associated with increased risk for metabolic diseases and mortality. To elucidate mechanisms behind this connection, we aimed to identify genes and pathways affected by experimentally induced, partial sleep restriction and to verify their connection to insufficient sleep at population level. The experimental design simulated sleep restriction during a working week: sleep of healthy men (N = 9) was restricted to 4 h/night for five nights. The control subjects (N = 4) spent 8 h/night in bed. Leukocyte RNA expression was analyzed at baseline, after sleep restriction, and after recovery using whole genome microarrays complemented with pathway and transcription factor analysis. Expression levels of the ten most up-regulated and ten most down-regulated transcripts were correlated with subjective assessment of insufficient sleep in a population cohort (N = 472). Experimental sleep restriction altered the expression of 117 genes. Eight of the 25 most up-regulated transcripts were related to immune function. Accordingly, fifteen of the 25 most up-regulated Gene Ontology pathways were also related to immune function, including those for B cell activation, interleukin 8 production, and NF-κB signaling (P<0.005). Of the ten most up-regulated genes, expression of STX16 correlated negatively with self-reported insufficient sleep in a population sample, while three other genes showed tendency for positive correlation. Of the ten most down-regulated genes, TBX21 and LGR6 correlated negatively and TGFBR3 positively with insufficient sleep. Partial sleep restriction affects the regulation of signaling pathways related to the immune system. Some of these changes appear to be long-lasting and may at least partly explain how prolonged sleep restriction can contribute to inflammation-associated pathological states, such as cardiometabolic diseases.

    PloS one 2013;8;10;e77184

  • CCL3L1 copy number, HIV load, and immune reconstitution in sub-Saharan Africans.

    Aklillu E, Odenthal-Hesse L, Bowdrey J, Habtewold A, Ngaimisi E, Yimer G, Amogne W, Mugusi S, Minzi O, Makonnen E, Janabi M, Mugusi F, Aderaye G, Hardwick R, Fu B, Viskaduraki M, Yang F and Hollox EJ

    Background: The role of copy number variation of the CCL3L1 gene, encoding MIP1alpha, in contributing to the host variation in susceptibility and response to HIV infection is controversial. Here we analyse a sub-Saharan African cohort from Tanzania and Ethiopia, two countries with a high prevalence of HIV-1 and a high co-morbidity of HIV with tuberculosis. Methods: We use a form of quantitative PCR called the paralogue ratio test to determine CCL3L1 gene copy number in 1134 individuals and validate our copy number typing using array comparative genomic hybridisation and fiber-FISH. Results: We find no significant association of CCL3L1 gene copy number with HIV load in antiretroviral-naive patients prior to initiation of combination highly active anti-retroviral therapy. However, we find a significant association of low CCL3L1 gene copy number with improved immune reconstitution following initiation of highly active anti-retroviral therapy (p = 0.012), replicating a previous study. Conclusions: Our work supports a role for CCL3L1 copy number in immune reconstitution following antiretroviral therapy in HIV, and suggests that the MIP1alpha-CCR5 axis might be targeted to aid immune reconstitution.

    BMC infectious diseases 2013;13;1;536

  • New insights into the genetic basis of TAR (thrombocytopenia-absent radii) syndrome.

    Albers CA, Newbury-Ecob R, Ouwehand WH and Ghevaert C

    Department of Haematology, University of Cambridge, UK; NHS Blood and Transplant, Cambridge, UK; Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. Electronic address: c.albers@gen.umcn.nl.

    Thrombocytopenia with absent radii (TAR) syndrome is a rare disorder combining specific skeletal abnormalities with a reduced platelet count. Rare proximal microdeletions of 1q21.1 are found in the majority of patients but are also found in unaffected parents. Recently it was shown that TAR syndrome is caused by the compound inheritance of a low-frequency noncoding SNP and a rare null allele in RBM8A, a gene encoding the exon-junction complex subunit member Y14 located in the deleted region. This finding provides new insight into the complex inheritance pattern and new clues to the molecular mechanisms underlying TAR syndrome. We discuss TAR syndrome in the context of abnormal phenotypes associated with proximal and distal 1q21.1 microdeletion and microduplications with incomplete penetrance and variable expressivity.

    Current opinion in genetics & development 2013

  • AHT-ChIP-seq: a completely automated robotic protocol for high-throughput chromatin immunoprecipitation.

    Aldridge S, Watt S, Quail MA, Rayner T, Lukk M, Bimson MF, Gaffney D and Odom DT

    University of Cambridge, Cancer Research UK - Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK. duncan.odom@cruk.cam.ac.uk.

    ChIP-seq is an established manually-performed method for identifying DNA-protein interactions genome-wide. Here, we describe a protocol for automated high-throughput (AHT) ChIP-seq. To demonstrate the quality of data obtained using AHT-ChIP-seq, we applied it to five proteins in mouse livers using a single 96-well plate, demonstrating an extremely high degree of qualitative and quantitative reproducibility among biological and technical replicates. We estimated the optimum and minimum recommended cell numbers required to perform AHT-ChIP-seq by running an additional plate using HepG2 and MCF7 cells. With this protocol, commercially available robotics can perform four hundred experiments in five days.

    Funded by: Cancer Research UK: A10185; European Research Council: 202218

    Genome biology 2013;14;11;R124

  • Specificity and heterogeneity of terahertz radiation effect on gene expression in mouse mesenchymal stem cells.

    Alexandrov BS, Phipps ML, Alexandrov LB, Booshehri LG, Erat A, Zabolotny J, Mielke CH, Chen HT, Rodriguez G, Rasmussen KØ, Martinez JS, Bishop AR and Usheva A

    Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; Harvard Medical School, Beth Israel Deaconess Medical Center, Department of Medicine, Boston, MA 02215, USA.

    We report that terahertz (THz) irradiation of mouse mesenchymal stem cells (mMSCs) with a single-frequency (SF) 2.52 THz laser or pulsed broadband (centered at 10 THz) source results in irradiation-specific heterogenic changes in gene expression. The THz effect depends on irradiation parameters such as the duration and type of THz source, and on the degree of stem cell differentiation. Our microarray survey and RT-PCR experiments demonstrate that prolonged broadband THz irradiation drives mMSCs toward differentiation, while 2-hour irradiation (regardless of THz sources) affects genes transcriptionally active in pluripotent stem cells. The strictly controlled experimental environment indicates minimal temperature changes, and the absence of any discernible response of heat shock and cellular stress genes implies a non-thermal response. Computer simulations of the core promoters of two pluripotency markers reveal association between gene upregulation and propensity for DNA breathing. We propose that THz radiation has potential for non-contact control of cellular gene expression.

    Scientific reports 2013;3;1184

  • Signatures of mutational processes in human cancer.

    Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, Bignell GR, Bolli N, Borg A, Børresen-Dale AL, Boyault S, Burkhardt B, Butler AP, Caldas C, Davies HR, Desmedt C, Eils R, Eyfjörd JE, Foekens JA, Greaves M, Hosoda F, Hutter B, Ilicic T, Imbeaud S, Imielinski M, Jäger N, Jones DT, Jones D, Knappskog S, Kool M, Lakhani SR, López-Otín C, Martin S, Munshi NC, Nakamura H, Northcott PA, Pajic M, Papaemmanuil E, Paradiso A, Pearson JV, Puente XS, Raine K, Ramakrishna M, Richardson AL, Richter J, Rosenstiel P, Schlesner M, Schumacher TN, Span PN, Teague JW, Totoki Y, Tutt AN, Valdés-Mas R, van Buuren MM, van 't Veer L, Vincent-Salomon A, Waddell N, Yates LR, Australian Pancreatic Cancer Genome Initiative, ICGC Breast Cancer Consortium, ICGC MMML-Seq Consortium, ICGC PedBrain, Zucman-Rossi J, Futreal PA, McDermott U, Lichter P, Meyerson M, Grimmond SM, Siebert R, Campo E, Shibata T, Pfister SM, Campbell PJ and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.

    Funded by: Wellcome Trust: 093867, 098051

    Nature 2013;500;7463;415-21

  • Deciphering signatures of mutational processes operative in human cancer.

    Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    The genome of a cancer cell carries somatic mutations that are the cumulative consequences of the DNA damage and repair processes operative during the cellular lineage between the fertilized egg and the cancer cell. Remarkably, these mutational processes are poorly characterized. Global sequencing initiatives are yielding catalogs of somatic mutations from thousands of cancers, thus providing the unique opportunity to decipher the signatures of mutational processes operative in human cancer. However, until now there have been no theoretical models describing the signatures of mutational processes operative in cancer genomes and no systematic computational approaches are available to decipher these mutational signatures. Here, by modeling mutational processes as a blind source separation problem, we introduce a computational framework that effectively addresses these questions. Our approach provides a basis for characterizing mutational signatures from cancer-derived somatic mutational catalogs, paving the way to insights into the pathogenetic mechanism underlying all cancers.

    Funded by: Wellcome Trust: 088340, 093867, 098051, WT088340MA

    Cell reports 2013;3;1;246-59

  • Modeling the association of space, time, and host species with variation of the HA, NA, and NS genes of H5N1 highly pathogenic avian influenza viruses isolated from birds in Romania in 2005-2007.

    Alkhamis M, Perez A, Batey N, Howard W, Baillie G, Watson S, Franz S, Focosi-Snyman R, Onita I, Cioranu R, Turcitu M, Kellam P, Brown IH and Breed AC

    Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, One Shields Avenue, University of California, Davis, CA 95616, USA. maalkamees@ucdavis.edu

    Molecular characterization studies of a diverse collection of avian influenza viruses (AIVs) have demonstrated that AIVs' greatest genetic variability lies in the HA, NA, and NS genes. The objective here was to quantify the association between geographical locations, periods of time, and host species and pairwise nucleotide variation in the HA, NA, and NS genes of 70 isolates of H5N1 highly pathogenic avian influenza virus (HPAIV) collected from October 2005 to December 2007 from birds in Romania. A mixed-binomial Bayesian regression model was used to quantify the probability of nucleotide variation between isolates and its association with space, time, and host species. As expected for the three target genes, a higher probability of nucleotide differences (odds ratios [ORs] > 1) was found between viruses sampled from places at greater geographical distances from each other, viruses sampled over greater periods of time, and viruses derived from different species. The modeling approach in the present study may be useful in further understanding the molecular epidemiology of H5N1 HPAI virus in bird populations. The methodology presented here will be useful in predicting the most likely genetic distance for any of the three gene segments of viruses that have not yet been isolated or sequenced based on space, time, and host species during the course of an epidemic.

    Funded by: Wellcome Trust

    Avian diseases 2013;57;3;612-21

  • The anatomy of successful computational biology software.

    Altschul S, Demchak B, Durbin R, Gentleman R, Krzywinski M, Li H, Nekrutenko A, Robinson J, Rasband W, Taylor J and Trapnell C

    National Center for Biotechnology Information, Bethesda, Maryland.

    Nature biotechnology 2013;31;10;894-7

  • Inappropriately low hepcidin levels in patients with myelodysplastic syndrome carrying a somatic mutation of SF3B1.

    Ambaglio I, Malcovati L, Papaemmanuil E, Laarakkers CM, Della Porta MG, Gallì A, Da Vià MC, Bono E, Ubezio M, Travaglino E, Albertini R, Campbell PJ, Swinkels DW and Cazzola M

    luca.malcovati@unipv.it.

    Somatic mutations of the RNA splicing machinery have been recently identified in myelodysplastic syndromes. In particular, a strong association has been found between SF3B1 mutation and refractory anemia with ring sideroblasts, a condition characterized by ineffective erythropoiesis and parenchymal iron overload. We studied the relationship between SF3B1 mutation, erythroid activity and hepcidin levels in myelodysplastic syndrome patients. Erythroid activity was evaluated through the proportion of marrow erythroblasts, soluble transferrin receptor and serum growth differentiation factor 15. Significant relationships were found between SF3B1 mutation and marrow erythroblasts (P=0.001), soluble transferrin receptor (P=0.003) and serum growth differentiation factor 15 (P=0.033). Serum hepcidin varied considerably, and multivariable analysis showed that the hepcidin to ferritin ratio, a measure of adequacy of hepcidin levels relative to body iron stores, was inversely related to the SF3B1 mutation (P=0.013). These observations suggest that patients with SF3B1 mutation have inappropriately low hepcidin levels, which may explain their propensity to parenchymal iron loading.

    Haematologica 2013;98;3;420-3

  • The African coelacanth genome provides insights into tetrapod evolution.

    Amemiya CT, Alföldi J, Lee AP, Fan S, Philippe H, Maccallum I, Braasch I, Manousaki T, Schneider I, Rohner N, Organ C, Chalopin D, Smith JJ, Robinson M, Dorrington RA, Gerdol M, Aken B, Biscotti MA, Barucca M, Baurain D, Berlin AM, Blatch GL, Buonocore F, Burmester T, Campbell MS, Canapa A, Cannon JP, Christoffels A, De Moro G, Edkins AL, Fan L, Fausto AM, Feiner N, Forconi M, Gamieldien J, Gnerre S, Gnirke A, Goldstone JV, Haerty W, Hahn ME, Hesse U, Hoffmann S, Johnson J, Karchner SI, Kuraku S, Lara M, Levin JZ, Litman GW, Mauceli E, Miyake T, Mueller MG, Nelson DR, Nitsche A, Olmo E, Ota T, Pallavicini A, Panji S, Picone B, Ponting CP, Prohaska SJ, Przybylski D, Saha NR, Ravi V, Ribeiro FJ, Sauka-Spengler T, Scapigliati G, Searle SM, Sharpe T, Simakov O, Stadler PF, Stegeman JJ, Sumiyama K, Tabbaa D, Tafer H, Turner-Maier J, van Heusden P, White S, Williams L, Yandell M, Brinkmann H, Volff JN, Tabin CJ, Shubin N, Schartl M, Jaffe DB, Postlethwait JH, Venkatesh B, Di Palma F, Lander ES, Meyer A and Lindblad-Toh K

    Molecular Genetics Program, Benaroya Research Institute, Seattle, Washington 98101, USA. camemiya@benaroyaresearch.org

    The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.

    Funded by: Medical Research Council: MC_U137761446; NCRR NIH HHS: R24 RR032670; NHGRI NIH HHS: R01 HG003474, U54 HG003067; NICHD NIH HHS: R37 HD032443; NIEHS NIH HHS: P42 ES007381, R01 ES006272; NIH HHS: R01 OD011116, R24 OD011199; Wellcome Trust: 095908

    Nature 2013;496;7445;311-6

  • Phosphoinositide 3-Kinase δ Gene Mutation Predisposes to Respiratory Infection and Airway Damage.

    Angulo I, Vadas O, Garçon F, Banham-Hall E, Plagnol V, Leahy TR, Baxendale H, Coulter T, Curtis J, Wu C, Blake-Palmer K, Perisic O, Smyth D, Maes M, Fiddler C, Juss J, Cilliers D, Markelj G, Chandra A, Farmer G, Kielkowska A, Clark J, Kracker S, Debré M, Picard C, Pellier I, Jabado N, Morris JA, Barcenas-Morales G, Fischer A, Stephens L, Hawkins P, Barrett JC, Abinun M, Clatworthy M, Durandy A, Doffinger R, Chilvers E, Cant AJ, Kumararatne D, Okkenhaug K, Williams RL, Condliffe A and Nejentsev S

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Genetic mutations cause primary immunodeficiencies (PIDs), which predispose to infections. Here we describe Activated PI3K-δ Syndrome (APDS), a PID associated with a dominant gain-of-function mutation in which lysine replaced glutamic acid at residue 1021 (E1021K) in the p110δ protein, the catalytic subunit of phosphoinositide 3-kinase δ (PI3Kδ), encoded by the PIK3CD gene. We found E1021K in 17 patients from seven unrelated families, but not among 3346 healthy subjects. APDS was characterized by recurrent respiratory infections, progressive airway damage, lymphopenia, increased circulating transitional B cells, increased immunoglobulin M and reduced immunoglobulin G2 levels in serum and impaired vaccine responses. The E1021K mutation enhanced membrane association and kinase activity of p110δ. Patient-derived lymphocytes had increased levels of phosphatidylinositol 3,4,5-trisphosphate and phosphorylated AKT protein and were prone to activation-induced cell death. Selective p110δ inhibitors IC87114 and GS-1101 reduced the activity of the mutant enzyme in vitro, which suggested a therapeutic approach for patients with APDS.

    Science (New York, N.Y.) 2013

  • The COMBREX Project: Design, Methodology, and Initial Results.

    Anton BP, Chang YC, Brown P, Choi HP, Faller LL, Guleria J, Hu Z, Klitgord N, Levy-Moonshine A, Maksad A, Mazumdar V, McGettrick M, Osmani L, Pokrzywa R, Rachlin J, Swaminathan R, Allen B, Housman G, Monahan C, Rochussen K, Tao K, Bhagwat AS, Brenner SE, Columbus L, de Crécy-Lagard V, Ferguson D, Fomenkov A, Gadda G, Morgan RD, Osterman AL, Rodionov DA, Rodionova IA, Rudd KE, Söll D, Spain J, Xu SY, Bateman A, Blumenthal RM, Bollinger JM, Chang WS, Ferrer M, Friedberg I, Galperin MY, Gobeill J, Haft D, Hunt J, Karp P, Klimke W, Krebs C, Macelis D, Madupu R, Martin MJ, Miller JH, O'Donovan C, Palsson B, Ruch P, Setterdahl A, Sutton G, Tate J, Yakunin A, Tchigvintsev D, Plata G, Hu J, Greiner R, Horn D, Sjölander K, Salzberg SL, Vitkup D, Letovsky S, Segrè D, Delisi C, Roberts RJ, Steffen M and Kasif S

    New England Biolabs, Ipswich, Massachusetts, United States of America.

    Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.

    PLoS biology 2013;11;8;e1001638

  • Genome-wide meta-analysis identifies new susceptibility loci for migraine.

    Anttila V, Winsvold BS, Gormley P, Kurth T, Bettella F, McMahon G, Kallela M, Malik R, de Vries B, Terwindt G, Medland SE, Todt U, McArdle WL, Quaye L, Koiranen M, Ikram MA, Lehtimäki T, Stam AH, Ligthart L, Wedenoja J, Dunham I, Neale BM, Palta P, Hamalainen E, Schürks M, Rose LM, Buring JE, Ridker PM, Steinberg S, Stefansson H, Jakobsson F, Lawlor DA, Evans DM, Ring SM, Färkkilä M, Artto V, Kaunisto MA, Freilinger T, Schoenen J, Frants RR, Pelzer N, Weller CM, Zielman R, Heath AC, Madden PA, Montgomery GW, Martin NG, Borck G, Göbel H, Heinze A, Heinze-Kuhn K, Williams FM, Hartikainen AL, Pouta A, van den Ende J, Uitterlinden AG, Hofman A, Amin N, Hottenga JJ, Vink JM, Heikkilä K, Alexander M, Muller-Myhsok B, Schreiber S, Meitinger T, Wichmann HE, Aromaa A, Eriksson JG, Traynor BJ, Trabzuni D, Rossin E, Lage K, Jacobs SB, Gibbs JR, Birney E, Kaprio J, Penninx BW, Boomsma DI, van Duijn C, Raitakari O, Jarvelin MR, Zwart JA, Cherkas L, Strachan DP, Kubisch C, Ferrari MD, van den Maagdenberg AM, Dichgans M, Wessman M, Smith GD, Stefansson K, Daly MJ, Nyholt DR, Chasman DI, Palotie A, North American Brain Expression Consortium, UK Brain Expression Consortium and International Headache Genetics Consortium

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. anttila@atgu.mgh.harvard.edu

    Migraine is the most common brain disorder, affecting approximately 14% of the adult population, but its molecular mechanisms are poorly understood. We report the results of a meta-analysis across 29 genome-wide association studies, including a total of 23,285 individuals with migraine (cases) and 95,425 population-matched controls. We identified 12 loci associated with migraine susceptibility (P < 5 × 10⁻⁸). Five loci are new: near AJAP1 at 1p36, near TSPAN2 at 1p13, within FHL5 at 6q16, within C7orf10 at 7p14 and near MMP16 at 8q21. Three of these loci were identified in disease subgroup analyses. Brain tissue expression quantitative trait locus analysis suggests potential functional candidate genes at four loci: APOA1BP, TBC1D7, FUT9, STAT6 and ATP5B.

    Funded by: Medical Research Council: G0802462, G9815508; NIA NIH HHS: Z01 AG000949-02; NIGMS NIH HHS: T32 GM007753; Wellcome Trust: 089062, 092731

    Nature genetics 2013;45;8;912-7

  • Association of cytokine and toll-like receptor gene polymorphisms with severe malaria in three regions of Cameroon.

    Apinjoh TO, Anchang-Kimbi JK, Njua-Yafi C, Mugri RN, Ngwai AN, Rockett KA, Mbunwe E, Besingi RN, Clark TG, Kwiatkowski DP, Achidi EA and MalariaGEN Consortium

    Department of Biochemistry and Molecular Biology, University of Buea, Buea, Cameroon.

    P. falciparum malaria is one of the most widespread and deadliest infectious diseases in children under five years in endemic areas. The disease has been a strong force for evolutionary selection in the human genome, and uncovering the critical human genetic factors that confer resistance to the disease would provide clues to the molecular basis of protective immunity that would be invaluable for vaccine development. We investigated the effect of single nucleotide polymorphisms (SNPs) on malaria pathology in a case-control study of 1862 individuals from two major ethnic groups in three regions with intense perennial P. falciparum transmission in Cameroon. Twenty-nine polymorphisms in cytokine and toll-like receptor (TLR) genes as well as the sickle cell trait (HbS) were assayed on the Sequenom iPLEX platform. Our results confirm the known protective effect of HbS against severe malaria and also reveal a protective effect of SNPs in interleukin-10 (IL10) against cerebral malaria and hyperpyrexia. Furthermore, IL17RE rs708567 GA and hHbS rs334 AT individuals were associated with protection from uncomplicated malaria and anaemia respectively in this study. Meanwhile, individuals with the hHbS rs334 TT, IL10 rs3024500 AA, and IL17RD rs6780995 GA genotypes were more susceptible to severe malarial anaemia, cerebral malaria, and hyperpyrexia respectively. Taken together, our results suggest that polymorphisms in some immune response genes may have important implications for the susceptibility to severe malaria in Cameroonians. Moreover, using uncomplicated malaria may allow us to identify novel pathways in the early development of the disease.

    PloS one 2013;8;11;e81071

  • Genome-wide, whole mount in situ analysis of transcriptional regulators in zebrafish embryos.

    Armant O, März M, Schmidt R, Ferg M, Diotel N, Ertzer R, Bryne JC, Yang L, Baader I, Reischl M, Legradi J, Mikut R, Stemple D, van Ijcken W, van der Sloot A, Lenhard B, Strähle U and Rastegar S

    Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Postfach 3640, 76021 Karlsruhe, Germany.

    Transcription is the primary step in the retrieval of genetic information. A substantial proportion of the protein repertoire of each organism consists of transcriptional regulators (TRs). It is believed that the differential expression and combinatorial action of these TRs is essential for vertebrate development and body homeostasis. We mined the zebrafish genome exhaustively for genes encoding TRs and determined their expression in the zebrafish embryo by sequencing to saturation and in situ hybridisation. At the evolutionarily conserved phylotypic stage, 75% of the 3302 TR genes encoded in the genome are already expressed. The number of expressed TR genes increases only marginally in subsequent stages and is maintained during adulthood, suggesting important roles of the TR genes in body homeostasis. Fewer than half of the TR genes (45%, n=1711 genes) are expressed in a tissue-restricted manner in the embryo. Transcripts of 207 genes were detected in a single tissue in the 24h embryo, potentially acting as regulators of specific processes. Other TR genes were expressed in multiple tissues. However, with the exception of certain territories in the nervous system, we did not find significant synexpression, suggesting that most tissue-restricted TRs act in a freely combinatorial fashion. Our data indicate that elaboration of body pattern and function from the phylotypic stage onward relies mostly on redeployment of TRs and post-transcriptional processes.

    Funded by: Medical Research Council: MC_UP_1102/1

    Developmental biology 2013;380;2;351-62

  • The general population cohort in rural south-western Uganda: a platform for communicable and non-communicable disease studies.

    Asiki G, Murphy G, Nakiyingi-Miiro J, Seeley J, Nsubuga RN, Karabarinde A, Waswa L, Biraro S, Kasamba I, Pomilla C, Maher D, Young EH, Kamali A, Sandhu MS and On behalf of the GPC team

    Medical Research Council/Uganda Virus Research Institute (MRC/UVRI), Uganda Research Unit on AIDS, Entebbe, Uganda, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK, Wellcome Trust Sanger Institute, Hinxton, UK, London School of Hygiene and Tropical Medicine, London, UK, School of International Development, University of East Anglia, Norwich, UK and Wellcome Trust, UK (formerly with MRC/UVRI Uganda Research Unit on AIDS, Entebbe, Uganda).

    The General Population Cohort (GPC) was set up in 1989 to examine trends in HIV prevalence and incidence, and their determinants in rural south-western Uganda. Recently, the research questions have included the epidemiology and genetics of communicable and non-communicable diseases (NCDs) to address the limited data on the burden and risk factors for NCDs in sub-Saharan Africa. The cohort comprises all residents (52% aged ≥13 years, men and women in equal proportions) within one-half of a rural sub-county, residing in scattered houses, and largely farmers of three major ethnic groups. Data collected through annual surveys include: mapping for spatial analysis and participant location; census for individual socio-demographic and household socioeconomic status assessment; and a medical survey for health, lifestyle and biophysical and blood measurements to ascertain disease outcomes and risk factors for selected participants. This cohort offers a rich platform to investigate the interplay between communicable diseases and NCDs. There is robust infrastructure for data management, sample processing and storage, and diverse expertise in epidemiology, social and basic sciences. For any data access enquiries you may contact the director, MRC/UVRI, Uganda Research Unit on AIDS by email to mrc@mrcuganda.org or the corresponding author.

    International journal of epidemiology 2013

  • Hospital outbreak of Middle East respiratory syndrome coronavirus.

    Assiri A, McGeer A, Perl TM, Price CS, Al Rabeeah AA, Cummings DA, Alabdullatif ZN, Assad M, Almulhim A, Makhdoom H, Madani H, Alhakeem R, Al-Tawfiq JA, Cotten M, Watson SJ, Kellam P, Zumla AI, Memish ZA and KSA MERS-CoV Investigation Team

    Global Center for Mass Gatherings Medicine, Ministry of Health, Riyadh, Saudi Arabia.

    Background: In September 2012, the World Health Organization reported the first cases of pneumonia caused by the novel Middle East respiratory syndrome coronavirus (MERS-CoV). We describe a cluster of health care-acquired MERS-CoV infections.

    Methods: Medical records were reviewed for clinical and demographic information and determination of potential contacts and exposures. Case patients and contacts were interviewed. The incubation period and serial interval (the time between the successive onset of symptoms in a chain of transmission) were estimated. Viral RNA was sequenced.

    Results: Between April 1 and May 23, 2013, a total of 23 cases of MERS-CoV infection were reported in the eastern province of Saudi Arabia. Symptoms included fever in 20 patients (87%), cough in 20 (87%), shortness of breath in 11 (48%), and gastrointestinal symptoms in 8 (35%); 20 patients (87%) presented with abnormal chest radiographs. As of June 12, a total of 15 patients (65%) had died, 6 (26%) had recovered, and 2 (9%) remained hospitalized. The median incubation period was 5.2 days (95% confidence interval [CI], 1.9 to 14.7), and the serial interval was 7.6 days (95% CI, 2.5 to 23.1). A total of 21 of the 23 cases were acquired by person-to-person transmission in hemodialysis units, intensive care units, or in-patient units in three different health care facilities. Sequencing data from four isolates revealed a single monophyletic clade. Among 217 household contacts and more than 200 health care worker contacts whom we identified, MERS-CoV infection developed in 5 family members (3 with laboratory-confirmed cases) and in 2 health care workers (both with laboratory-confirmed cases).

    Conclusions: Person-to-person transmission of MERS-CoV can occur in health care settings and may be associated with considerable morbidity. Surveillance and infection-control measures are critical to a global public health response.

    Funded by: NIGMS NIH HHS: U01 GM070708, U54 GM088491; Wellcome Trust: 093724

    The New England journal of medicine 2013;369;5;407-16

  • Genome Sequencing Reveals Loci under Artificial Selection that Underlie Disease Phenotypes in the Laboratory Rat.

    Atanur SS, Diaz AG, Maratou K, Sarkis A, Rotival M, Game L, Tschannen MR, Kaisaki PJ, Otto GW, John Ma MC, Keane TM, Hummel O, Saar K, Chen W, Guryev V, Gopalakrishnan K, Garrett MR, Joe B, Citterio L, Bianchi G, McBride M, Dominiczak A, Adams DJ, Serikawa T, Flicek P, Cuppen E, Hubner N, Petretto E, Gauguier D, Kwitek A, Jacob H and Aitman TJ

    Physiological Genomic and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK; National Heart and Lung Institute, Imperial College London, London W12 0NN, UK.

    Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models.

    Cell 2013

  • Effective Preparation of Plasmodium vivax Field Isolates for High-Throughput Whole Genome Sequencing.

    Auburn S, Marfurt J, Maslen G, Campino S, Ruano Rubio V, Manske M, Machunter B, Kenangalem E, Noviyanti R, Trianty L, Sebayang B, Wirjanata G, Sriprawat K, Alcock D, Macinnis B, Miotto O, Clark TG, Russell B, Anstey NM, Nosten F, Kwiatkowski DP and Price RN

    Global and Tropical Health Division, Menzies School of Health Research, Charles Darwin University, Darwin, Australia.

    Whole genome sequencing (WGS) of Plasmodium vivax is problematic due to the reliance on clinical isolates which are generally low in parasitaemia and sample volume. Furthermore, clinical isolates contain a significant contaminating background of host DNA which confounds efforts to map short read sequence of the target P. vivax DNA. Here, we discuss a methodology to significantly improve the success of P. vivax WGS on natural (non-adapted) patient isolates. Using 37 patient isolates from Indonesia, Thailand, and travellers, we assessed the application of CF11-based white blood cell filtration alone and in combination with short term ex vivo schizont maturation. Although CF11 filtration reduced human DNA contamination in 8 Indonesian isolates tested, additional short-term culture increased the P. vivax DNA yield from a median of 0.15 to 6.2 ng µl⁻¹ packed red blood cells (pRBCs) (p = 0.001) and reduced the human DNA percentage from a median of 33.9% to 6.22% (p = 0.008). Furthermore, post-CF11 and culture samples from Thailand gave a median P. vivax DNA yield of 2.34 ng µl⁻¹ pRBCs, and 2.65% human DNA. In 22 P. vivax patient isolates prepared with the 2-step method, we demonstrate high depth (median 654X coverage) and breadth (≥89%) of coverage on the Illumina GAII and HiSeq platforms. In contrast to the A+T-rich P. falciparum genome, negligible bias was observed in coverage depth between coding and non-coding regions of the P. vivax genome. This uniform coverage will greatly facilitate the detection of SNPs and copy number variants across the genome, enabling unbiased exploration of the natural diversity in P. vivax populations.

    PloS one 2013;8;1;e53160

  • Genomic triumph meets clinical reality.

    Ayub Q, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. cts@sanger.ac.uk.

    A report on the 'Genomic Disorders 2013: from 60 years of DNA to human genomes in the clinic' meeting, held at Homerton College, Cambridge, UK, April 10-12, 2013.

    Genome biology 2013;14;5;307

  • FOXP2 targets show evidence of positive selection in European populations.

    Ayub Q, Yngvadottir B, Chen Y, Xue Y, Hu M, Vernes SC, Fisher SE and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. qa1@sanger.ac.uk

    Forkhead box P2 (FOXP2) is a highly conserved transcription factor that has been implicated in human speech and language disorders and plays important roles in the plasticity of the developing brain. The pattern of nucleotide polymorphisms in FOXP2 in modern populations suggests that it has been the target of positive (Darwinian) selection during recent human evolution. In our study, we searched for evidence of selection that might have followed FOXP2 adaptations in modern humans. We examined whether or not putative FOXP2 targets identified by chromatin-immunoprecipitation genomic screening show evidence of positive selection. We developed an algorithm that, for any given gene list, systematically generates matched lists of control genes from the Ensembl database, collates summary statistics for three frequency-spectrum-based neutrality tests from the low-coverage resequencing data of the 1000 Genomes Project, and determines whether these statistics are significantly different between the given gene targets and the set of controls. Overall, there was strong evidence of selection of FOXP2 targets in Europeans, but not in the Han Chinese, Japanese, or Yoruba populations. Significant outliers included several genes linked to cellular movement, reproduction, development, and immune cell trafficking, and 13 of these constituted a significant network associated with cardiac arteriopathy. Strong signals of selection were observed for CNTNAP2 and RBFOX1, key neurally expressed genes that have been consistently identified as direct FOXP2 targets in multiple studies and that have themselves been associated with neurodevelopmental disorders involving language dysfunction.

    Funded by: Wellcome Trust: 098051

    American journal of human genetics 2013;92;5;696-706

  • Cooperativity of imprinted genes inactivated by acquired chromosome 20q deletions.

    Aziz A, Baxter EJ, Edwards C, Cheong CY, Ito M, Bench A, Kelley R, Silber Y, Beer PA, Chng K, Renfree MB, McEwen K, Gray D, Nangalia J, Mufti GJ, Hellstrom-Lindberg E, Kiladjian JJ, McMullin MF, Campbell PJ, Ferguson-Smith AC and Green AR

    Large regions of recurrent genomic loss are common in cancers; however, with a few well-characterized exceptions, how they contribute to tumor pathogenesis remains largely obscure. Here we identified primate-restricted imprinting of a gene cluster on chromosome 20 in the region commonly deleted in chronic myeloid malignancies. We showed that a single heterozygous 20q deletion consistently resulted in the complete loss of expression of the imprinted genes L3MBTL1 and SGK2, indicative of a pathogenetic role for loss of the active paternally inherited locus. Concomitant loss of both L3MBTL1 and SGK2 dysregulated erythropoiesis and megakaryopoiesis, 2 lineages commonly affected in chronic myeloid malignancies, with distinct consequences in each lineage. We demonstrated that L3MBTL1 and SGK2 collaborated in the transcriptional regulation of MYC by influencing different aspects of chromatin structure. L3MBTL1 is known to regulate nucleosomal compaction, and we here showed that SGK2 inactivated BRG1, a key ATP-dependent helicase within the SWI/SNF complex that regulates nucleosomal positioning. These results demonstrate a link between an imprinted gene cluster and malignancy, reveal a new pathogenetic mechanism associated with acquired regions of genomic loss, and underline the complex molecular and cellular consequences of "simple" cancer-associated chromosome deletions.

    The Journal of clinical investigation 2013;123;5;2169-82

  • Y-chromosome and mtDNA genetics reveal significant contrasts in affinities of modern Middle Eastern populations with European and African populations.

    Badro DA, Douaihy B, Haber M, Youhanna SC, Salloum A, Ghassibe-Sabbagh M, Johnsrud B, Khazen G, Matisoo-Smith E, Soria-Hernanz DF, Wells RS, Tyler-Smith C, Platt DE, Zalloua PA and Genographic Consortium

    The Lebanese American University, Chouran, Beirut, Lebanon.

    The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. Post-LGM expansions into the region and subsequent population movements created a striking genetic mosaic with distinct sex-based genetic differentiation. While prior studies have examined the mtDNA and Y-chromosome contrast in focal populations in the Middle East, none have undertaken a broad-spectrum survey including North and sub-Saharan Africa, Europe, and Middle Eastern populations. In this study 5,174 mtDNA and 4,658 Y-chromosome samples were investigated using PCA, MDS, mean-linkage clustering, AMOVA, and Fisher exact tests of F(ST)'s, R(ST)'s, and haplogroup frequencies. Geographic differentiation in affinities of Middle Eastern populations with Africa and Europe showed distinct contrasts between mtDNA and Y-chromosome data. Specifically, Lebanon's mtDNA shows a very strong association to Europe, while Yemen shows very strong affinity with Egypt and North and East Africa. Previous Y-chromosome results showed a Levantine coastal-inland contrast marked by J1 and J2, and a very strong North African component was evident throughout the Middle East. Neither of these patterns was observed in the mtDNA. While J2 has penetrated into Europe, the pattern of Y-chromosome diversity in Lebanon does not show the widespread affinities with Europe indicated by the mtDNA data. Lastly, while each population shows evidence of connections with expansions that now define the Middle East, Africa, and Europe, many of the populations in the Middle East show distinctive mtDNA and Y-haplogroup characteristics that indicate long-standing settlement with relatively little impact from and movement into other populations.

    PloS one 2013;8;1;e54616

  • Metagenomic study of the viruses of African straw-coloured fruit bats: detection of a chiropteran poxvirus and isolation of a novel adenovirus.

    Baker KS, Leggett RM, Bexfield NH, Alston M, Daly G, Todd S, Tachedjian M, Holmes CE, Crameri S, Wang LF, Heeney JL, Suu-Ire R, Kellam P, Cunningham AA, Wood JL, Caccamo M and Murcia PR

    University of Cambridge, Department of Veterinary Medicine, Madingley Rd, Cambridge, Cambridgeshire, CB3 0ES, United Kingdom. kf281@cam.ac.uk

    Viral emergence as a result of zoonotic transmission constitutes a continuous public health threat. Emerging viruses such as SARS coronavirus, hantaviruses and henipaviruses have wildlife reservoirs. Characterising the viruses of candidate reservoir species in geographical hot spots for viral emergence is a sensible approach to develop tools to predict, prevent, or contain emergence events. Here, we explore the viruses of Eidolon helvum, an Old World fruit bat species widely distributed in Africa that lives in close proximity to humans. We identified a great abundance and diversity of novel herpes and papillomaviruses, described the isolation of a novel adenovirus, and detected, for the first time, sequences of a chiropteran poxvirus closely related to Molluscum contagiosum. In sum, E. helvum displays a wide variety of mammalian viruses, some of them genetically similar to known human pathogens, highlighting the possibility of zoonotic transmission.

    Funded by: Medical Research Council: G0801822; Wellcome Trust

    Virology 2013;441;2;95-106

  • Fitness benefits in fluoroquinolone-resistant Salmonella Typhi in the absence of antimicrobial pressure.

    Baker S, Duy PT, Nga TV, Dung TT, Phat VV, Chau TT, Turner AK, Farrar J and Boni MF

    Oxford University Clinical Research Unit, Wellcome Trust Major Overseas Programme, Ho Chi Minh City, Vietnam.

    Fluoroquinolones (FQ) are the recommended antimicrobial treatment for typhoid, a severe systemic infection caused by the bacterium Salmonella enterica serovar Typhi. FQ-resistance mutations in S. Typhi have become common, hindering treatment and control efforts. Using in vitro competition experiments, we assayed the fitness of eleven isogenic S. Typhi strains with resistance mutations in the FQ target genes, gyrA and parC. In the absence of antimicrobial pressure, 6 out of 11 mutants carried a selective advantage over the antimicrobial-sensitive parent strain, indicating that FQ resistance in S. Typhi is not typically associated with fitness costs. Double-mutants exhibited higher than expected fitness as a result of synergistic epistasis, signifying that epistasis may be a critical factor in the evolution and molecular epidemiology of S. Typhi. Our findings have important implications for the management of drug-resistant S. Typhi, suggesting that FQ-resistant strains would be naturally maintained even if fluoroquinolone use were reduced. DOI: http://dx.doi.org/10.7554/eLife.01229.001.

    Funded by: Wellcome Trust: 100087

    eLife 2013;2;e01229

  • Genetic screens in mice for genome integrity maintenance and cancer predisposition.

    Balmus G and McIntyre RE

    The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK; Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Genome instability is a feature of nearly all cancers and can be exploited for therapy. In addition, a growing number of genome maintenance genes have been associated with developmental disorders. Efforts to understand the role of genome instability in these processes will be greatly facilitated by a more comprehensive understanding of their genetic network. We highlight recent genetic screens in model organisms that have assisted in the discovery of novel regulators of genome stability and focus on the contribution of mice as a model organism to understanding the role of genome instability during embryonic development, tumour formation and cancer therapy.

    Current opinion in genetics & development 2013;24C;1-7

  • Atypical Mitogen-Activated Protein Kinase Phosphatase Implicated in Regulating Transition from Pre-S-Phase Asexual Intraerythrocytic Development of Plasmodium falciparum.

    Balu B, Campbell C, Sedillo J, Maher S, Singh N, Thomas P, Zhang M, Pance A, Otto TD, Rayner JC and Adams JH

    Department of Global Health, College of Public Health, University of South Florida, Tampa, Florida, USA.

    Intraerythrocytic development of the human malaria parasite Plasmodium falciparum appears as a continuous flow through growth and proliferation. To develop a greater understanding of the critical regulatory events, we utilized piggyBac insertional mutagenesis to randomly disrupt genes. Screening a collection of piggyBac mutants for slow growth, we isolated the attenuated parasite C9, which carried a single insertion disrupting the open reading frame (ORF) of PF3D7_1305500. This gene encodes a protein structurally similar to a mitogen-activated protein kinase (MAPK) phosphatase, except for two notable characteristics that alter the signature motif of the dual-specificity phosphatase domain, suggesting that it may be a low-activity phosphatase or pseudophosphatase. C9 parasites demonstrated a significantly lower growth rate with delayed entry into the S/M phase of the cell cycle, which follows the stage of maximum PF3D7_1305500 expression in intact parasites. Genetic complementation with the full-length PF3D7_1305500 rescued the wild-type phenotype of C9, validating the importance of the putative protein phosphatase PF3D7_1305500 as a regulator of pre-S-phase cell cycle progression in P. falciparum.

    Eukaryotic cell 2013;12;9;1171-8

  • Imputation-based meta-analysis of severe malaria in three African populations.

    Band G, Le QS, Jostins L, Pirinen M, Kivinen K, Jallow M, Sisay-Joof F, Bojang K, Pinder M, Sirugo G, Conway DJ, Nyirongo V, Kachala D, Molyneux M, Taylor T, Ndila C, Peshu N, Marsh K, Williams TN, Alcock D, Andrews R, Edkins S, Gray E, Hubbart C, Jeffreys A, Rowlands K, Schuldt K, Clark TG, Small KS, Teo YY, Kwiatkowski DP, Rockett KA, Barrett JC, Spencer CC, Malaria Genomic Epidemiology Network and Malaria Genomic Epidemiological Network

    Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom.

    Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 087285, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 091758, 091758/Z/10/Z, 096527, 097364/Z/11/Z, WT077383/Z/05/Z, WT098051

    PLoS genetics 2013;9;5;e1003509

  • Approaches to querying bacterial genomes with transposon-insertion sequencing.

    Barquist L, Boinett CJ and Cain AK

    Wellcome Trust Sanger Institute; Hinxton, Cambridge, UK; EMBL-European Bioinformatics Institute; Hinxton, Cambridge, UK.

    In this review we discuss transposon-insertion sequencing, variously known in the literature as TraDIS, Tn-seq, INSeq and HITS. By monitoring a large library of single transposon-insertion mutants with high-throughput sequencing, these methods can rapidly identify genomic regions that contribute to organismal fitness under any condition assayable in the laboratory with exquisite resolution. We discuss the various protocols that have been developed and methods for analysis. We provide an overview of studies that have examined the reproducibility and accuracy of these methods, as well as studies showing the advantages offered by the high resolution and dynamic range of high-throughput sequencing over previous methods. We review a number of applications in the literature, from predicting genes essential for in vitro growth to directly assaying requirements for survival under infective conditions in vivo. We also highlight recent progress in assaying non-coding regions of the genome in addition to known coding sequences, including the combining of RNA-seq with high-throughput transposon mutagenesis.

    RNA biology 2013;10;7

  • A comparison of dense transposon insertion libraries in the Salmonella serovars Typhi and Typhimurium.

    Barquist L, Langridge GC, Turner DJ, Phan MD, Turner AK, Bateman A, Parkhill J, Wain J and Gardner PP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. lb14@sanger.ac.uk

    Salmonella Typhi and Typhimurium diverged only ∼50 000 years ago, yet have very different host ranges and pathogenicity. Despite the availability of multiple whole-genome sequences, the genetic differences that have driven these changes in phenotype are only beginning to be understood. In this study, we use transposon-directed insertion-site sequencing to probe differences in gene requirements for competitive growth in rich media between these two closely related serovars. We identify a conserved core of 281 genes that are required for growth in both serovars, 228 of which are essential in Escherichia coli. We are able to identify active prophage elements through the requirement for their repressors. We also find distinct differences in requirements for genes involved in cell surface structure biogenesis and iron utilization. Finally, we demonstrate that transposon-directed insertion-site sequencing is not only applicable to the protein-coding content of the cell but also has sufficient resolution to generate hypotheses regarding the functions of non-coding RNAs (ncRNAs) as well. We are able to assign probable functions to a number of cis-regulatory ncRNA elements, as well as to infer likely differences in trans-acting ncRNA regulatory networks.

    Funded by: Wellcome Trust: WT076964, WT079643, WT098051

    Nucleic acids research 2013;41;8;4549-64

  • Neurolysin

    Barrett AJ

    Handbook of Proteolytic Enzymes 2013;1;509-13

  • Cytosol Alanyl Aminopeptidase

    Barrett AJ

    Handbook of Proteolytic Enzymes 2013;1;431-4

  • Metridin

    Barrett AJ

    Handbook of Proteolytic Enzymes 2013;3;581;2624-5

  • Animal Legumain

    Barrett AJ and Chen JM

    Handbook of Proteolytic Enzymes 2013;2;518;2309-14

  • Introduction: Unsequenced Serine Peptidases

    Barrett AJ

    Handbook of Proteolytic Enzymes 2013;3;824;3737

  • Peptidyl-Dipeptidase B

    Barrett AJ

    Handbook of Proteolytic Enzymes 2013;2;392;1710-11

  • Identifying novel Plasmodium falciparum erythrocyte invasion receptors using systematic extracellular protein interaction screens.

    Bartholdson SJ, Crosnier C, Bustamante LY, Rayner JC and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH; Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA.

    The invasion of host erythrocytes by the parasite Plasmodium falciparum initiates the blood stage of infection responsible for the symptoms of malaria. Invasion involves extracellular protein interactions between host erythrocyte receptors and ligands on the merozoite, the invasive form of the parasite. Despite significant research effort, many merozoite surface ligands have no known erythrocyte binding partner, most likely due to the intractable biochemical nature of membrane-tethered receptor proteins and their interactions. The few receptor-ligand pairs that have been described have largely relied on sourcing erythrocytes from patients with rare blood groups, a serendipitous approach that is unsatisfactory for systematically identifying novel receptors. We have recently developed a scalable assay called AVEXIS (for AVidity-based EXtracellular Interaction Screen), designed to circumvent the technical difficulties associated with the identification of extracellular protein interactions, and applied it to identify erythrocyte receptors for orphan Plasmodium falciparum merozoite ligands. Using this approach, we have recently identified Basigin (CD147) and Semaphorin-7A (CD108) as receptors for RH5 and MTRAP, respectively. In this essay, we review techniques used to identify Plasmodium receptors and discuss how they could be applied in the future to identify novel receptors not only for Plasmodium parasites but also for other pathogens.

    Cellular microbiology 2013

  • Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations.

    Bashford-Rogers RJ, Palser AL, Huntly BJ, Rance R, Vassiliou GS, Follows GA and Kellam P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    The adaptive immune response selectively expands B- and T-cell clones following antigen recognition by B- and T-cell receptors (BCR and TCR), respectively. Next-generation sequencing is a powerful tool for dissecting the BCR and TCR populations at high resolution, but robust computational analyses are required to interpret such sequencing. Here, we develop a novel computational approach for BCR repertoire analysis using established next-generation sequencing methods coupled with network construction and population analysis. BCR sequences organize into networks based on sequence diversity, with differences in network connectivity clearly distinguishing between diverse repertoires of healthy individuals and clonally expanded repertoires from individuals with chronic lymphocytic leukemia (CLL) and other clonal blood disorders. Network population measures defined by the Gini Index and cluster sizes quantify the BCR clonality status and are robust to sampling and sequencing depths. BCR network analysis therefore allows the direct and quantifiable comparison of BCR repertoires between samples and intra-individual population changes between temporal or spatially separated samples and over the course of therapy.

    Funded by: Wellcome Trust: 079249, 095663, 100140

    Genome research 2013;23;11;1874-84

  • ISCB Computational Biology Wikipedia Competition.

    Bateman A, Kelso J, Mietchen D, Macintyre G, Di Domenico T, Abeel T, Logan DW, Radivojac P and Rost B

    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom.

    PLoS computational biology 2013;9;9;e1003242

  • Peripheral administration of prokineticin 2 potently reduces food intake and body weight in mice via the brainstem.

    Beale K, Gardiner J, Bewick G, Hostomska K, Patel N, Hussain S, Jayasena C, Ebling F, Jethwa P, Prosser H, Lattanzi R, Negri L, Ghatei M, Bloom S and Dhillo W

    Section of Investigative Medicine, Imperial College London, London, UK.

    BACKGROUND AND PURPOSE: Prokineticin 2 (PK2) has recently been shown to acutely reduce food intake in rodents. We aimed to determine the CNS sites and receptors that mediate the anorectic effects of peripherally administered PK2 and its chronic effects on glucose and energy homeostasis. EXPERIMENTAL APPROACH: We investigated neuronal activation following i.p. administration of PK2 using c-Fos-like immunoreactivity (CFL-IR). The anorectic effect of PK2 was examined in mice with targeted deletion of either prokineticin receptor 1 (PKR1) or prokineticin receptor 2 (PKR2), and in wild-type mice following administration of the PKR1 antagonist, PC1. The effect of i.p. PK2 administration on glucose homeostasis was investigated. Finally, the effect of long-term administration of PK2 on glucose and energy homeostasis in diet-induced obese (DIO) mice was determined. KEY RESULTS: I.p. PK2 administration significantly increased CFL-IR in the dorsal motor vagal nucleus of the brainstem. The anorectic effect of PK2 was maintained in mice lacking PKR2 but abolished in mice lacking PKR1 and in wild-type mice pre-treated with PC1. DIO mice treated chronically with PK2 had no changes in glucose levels but significantly reduced food intake and body weight compared to controls. CONCLUSIONS AND IMPLICATIONS: Together, our data suggest that the anorectic effects of peripherally administered PK2 are mediated via the brainstem and this effect requires PKR1 but not PKR2 signalling. Chronic administration of PK2 reduces food intake and body weight in a mouse model of human obesity, suggesting that PKR1-selective agonists have potential to be novel therapeutics for the treatment of obesity.

    British journal of pharmacology 2013;168;2;403-410

  • Deep Resequencing of GWAS Loci Identifies Rare Variants in CARD9, IL23R and RNF186 That Are Associated with Ulcerative Colitis.

    Beaudoin M, Goyette P, Boucher G, Lo KS, Rivas MA, Stevens C, Alikashani A, Ladouceur M, Ellinghaus D, Törkvist L, Goel G, Lagacé C, Annese V, Bitton A, Begun J, Brant SR, Bresso F, Cho JH, Duerr RH, Halfvarson J, McGovern DP, Radford-Smith G, Schreiber S, Schumm PL, Sharma Y, Silverberg MS, Weersma RK, Quebec IBD Genetics Consortium, NIDDK IBD Genetics Consortium, International IBD Genetics Consortium, D'Amato M, Vermeire S, Franke A, Lettre G, Xavier RJ, Daly MJ and Rioux JD

    Montreal Heart Institute, Research Center, Montreal, Quebec, Canada.

    Genome-wide association studies and follow-up meta-analyses in Crohn's disease (CD) and ulcerative colitis (UC) have recently identified 163 disease-associated loci that meet genome-wide significance for these two inflammatory bowel diseases (IBD). These discoveries have already had a tremendous impact on our understanding of the genetic architecture of these diseases and have directed functional studies that have revealed some of the biological functions that are important to IBD (e.g. autophagy). Nonetheless, these loci can only explain a small proportion of disease variance (∼14% in CD and 7.5% in UC), suggesting that not only are additional loci to be found but that the known loci may contain high effect rare risk variants that have gone undetected by GWAS. To test this, we have used a targeted sequencing approach in 200 UC cases and 150 healthy controls (HC), all of French Canadian descent, to study 55 genes in regions associated with UC. We performed follow-up genotyping of 42 rare non-synonymous variants in independent case-control cohorts (totaling 14,435 UC cases and 20,204 HC). Our results confirmed significant association to rare non-synonymous coding variants in both IL23R and CARD9, previously identified from sequencing of CD loci, as well as identified a novel association in RNF186. With the exception of CARD9 (OR = 0.39), the rare non-synonymous variants identified were of moderate effect (OR = 1.49 for RNF186 and OR = 0.79 for IL23R). RNF186 encodes a protein with a RING domain having predicted E3 ubiquitin-protein ligase activity and two transmembrane domains. Importantly, the disease-coding variant is located in the ubiquitin ligase domain. Finally, our results suggest that rare variants in genes identified by genome-wide association in UC are unlikely to contribute significantly to the overall variance for the disease. Rather, these are expected to help focus functional studies of the corresponding disease loci.

    PLoS genetics 2013;9;9;e1003723

  • Distinct H3F3A and H3F3B driver mutations define chondroblastoma and giant cell tumor of bone.

    Behjati S, Tarpey PS, Presneau N, Scheipl S, Pillay N, Van Loo P, Wedge DC, Cooke SL, Gundem G, Davies H, Nik-Zainal S, Martin S, McLaren S, Goody V, Goodie V, Robinson B, Butler A, Teague JW, Halai D, Khatri B, Myklebost O, Baumhoer D, Jundt G, Hamoudi R, Tirabosco R, Amary MF, Futreal PA, Stratton MR, Campbell PJ and Flanagan AM

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. [2] Department of Paediatrics, University of Cambridge, Cambridge, UK. [3].

    It is recognized that some mutated cancer genes contribute to the development of many cancer types, whereas others are cancer type specific. For genes that are mutated in multiple cancer classes, mutations are usually similar in the different affected cancer types. Here, however, we report exquisite tumor type specificity for different histone H3.3 driver alterations. In 73 of 77 cases of chondroblastoma (95%), we found p.Lys36Met alterations predominantly encoded in H3F3B, which is one of two genes for histone H3.3. In contrast, in 92% (49/53) of giant cell tumors of bone, we found histone H3.3 alterations exclusively in H3F3A, leading to p.Gly34Trp or, in one case, p.Gly34Leu alterations. The mutations were restricted to the stromal cell population and were not detected in osteoclasts or their precursors. In the context of previously reported H3F3A mutations encoding p.Lys27Met and p.Gly34Arg or p.Gly34Val alterations in childhood brain tumors, a remarkable picture of tumor type specificity for histone H3.3 driver alterations emerges, indicating that histone H3.3 residues, mutations and genes have distinct functions.

    Funded by: Cancer Research UK; Wellcome Trust: 077012/Z/05/Z, 098051, WT088340MA

    Nature genetics 2013;45;12;1479-82

  • Microbial genomes as cheat sheets.

    Bennett HM

    Nature reviews. Microbiology 2013;11;5;302

  • Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture.

    Berndt SI, Gustafsson S, Mägi R, Ganna A, Wheeler E, Feitosa MF, Justice AE, Monda KL, Croteau-Chonka DC, Day FR, Esko T, Fall T, Ferreira T, Gentilini D, Jackson AU, Luan J, Randall JC, Vedantam S, Willer CJ, Winkler TW, Wood AR, Workalemahu T, Hu YJ, Lee SH, Liang L, Lin DY, Min JL, Neale BM, Thorleifsson G, Yang J, Albrecht E, Amin N, Bragg-Gresham JL, Cadby G, den Heijer M, Eklund N, Fischer K, Goel A, Hottenga JJ, Huffman JE, Jarick I, Johansson Å, Johnson T, Kanoni S, Kleber ME, König IR, Kristiansson K, Kutalik Z, Lamina C, Lecoeur C, Li G, Mangino M, McArdle WL, Medina-Gomez C, Müller-Nurasyid M, Ngwa JS, Nolte IM, Paternoster L, Pechlivanis S, Perola M, Peters MJ, Preuss M, Rose LM, Shi J, Shungin D, Smith AV, Strawbridge RJ, Surakka I, Teumer A, Trip MD, Tyrer J, Van Vliet-Ostaptchouk JV, Vandenput L, Waite LL, Zhao JH, Absher D, Asselbergs FW, Atalay M, Attwood AP, Balmforth AJ, Basart H, Beilby J, Bonnycastle LL, Brambilla P, Bruinenberg M, Campbell H, Chasman DI, Chines PS, Collins FS, Connell JM, Cookson WO, de Faire U, de Vegt F, Dei M, Dimitriou M, Edkins S, Estrada K, Evans DM, Farrall M, Ferrario MM, Ferrières J, Franke L, Frau F, Gejman PV, Grallert H, Grönberg H, Gudnason V, Hall AS, Hall P, Hartikainen AL, Hayward C, Heard-Costa NL, Heath AC, Hebebrand J, Homuth G, Hu FB, Hunt SE, Hyppönen E, Iribarren C, Jacobs KB, Jansson JO, Jula A, Kähönen M, Kathiresan S, Kee F, Khaw KT, Kivimäki M, Koenig W, Kraja AT, Kumari M, Kuulasmaa K, Kuusisto J, Laitinen JH, Lakka TA, Langenberg C, Launer LJ, Lind L, Lindström J, Liu J, Liuzzi A, Lokki ML, Lorentzon M, Madden PA, Magnusson PK, Manunta P, Marek D, März W, Mateo Leach I, McKnight B, Medland SE, Mihailov E, Milani L, Montgomery GW, Mooser V, Mühleisen TW, Munroe PB, Musk AW, Narisu N, Navis G, Nicholson G, Nohr EA, Ong KK, Oostra BA, Palmer CN, Palotie A, Peden JF, Pedersen N, Peters A, Polasek O, Pouta A, Pramstaller PP, Prokopenko I, Pütter C, Radhakrishnan A, Raitakari O, Rendon A, Rivadeneira F, Rudan I, Saaristo TE, Sambrook JG, Sanders AR, Sanna S, Saramies J, Schipf S, Schreiber S, Schunkert H, Shin SY, Signorini S, Sinisalo J, Skrobek B, Soranzo N, Stančáková A, Stark K, Stephens JC, Stirrups K, Stolk RP, Stumvoll M, Swift AJ, Theodoraki EV, Thorand B, Tregouet DA, Tremoli E, Van der Klauw MM, van Meurs JB, Vermeulen SH, Viikari J, Virtamo J, Vitart V, Waeber G, Wang Z, Widén E, Wild SH, Willemsen G, Winkelmann BR, Witteman JC, Wolffenbuttel BH, Wong A, Wright AF, Zillikens MC, Amouyel P, Boehm BO, Boerwinkle E, Boomsma DI, Caulfield MJ, Chanock SJ, Cupples LA, Cusi D, Dedoussis GV, Erdmann J, Eriksson JG, Franks PW, Froguel P, Gieger C, Gyllensten U, Hamsten A, Harris TB, Hengstenberg C, Hicks AA, Hingorani A, Hinney A, Hofman A, Hovingh KG, Hveem K, Illig T, Jarvelin MR, Jöckel KH, Keinanen-Kiukaanniemi SM, Kiemeney LA, Kuh D, Laakso M, Lehtimäki T, Levinson DF, Martin NG, Metspalu A, Morris AD, Nieminen MS, Njølstad I, Ohlsson C, Oldehinkel AJ, Ouwehand WH, Palmer LJ, Penninx B, Power C, Province MA, Psaty BM, Qi L, Rauramaa R, Ridker PM, Ripatti S, Salomaa V, Samani NJ, Snieder H, Sørensen TI, Spector TD, Stefansson K, Tönjes A, Tuomilehto J, Uitterlinden AG, Uusitupa M, van der Harst P, Vollenweider P, Wallaschofski H, Wareham NJ, Watkins H, Wichmann HE, Wilson JF, Abecasis GR, Assimes TL, Barroso I, Boehnke M, Borecki IB, Deloukas P, Fox CS, Frayling T, Groop LC, Haritunian T, Heid IM, Hunter D, Kaplan RC, Karpe F, Moffatt MF, Mohlke KL, O'Connell JR, Pawitan Y, Schadt EE, Schlessinger D, Steinthorsdottir V, Strachan DP, Thorsteinsdottir U, van Duijn CM, Visscher PM, Di Blasio AM, Hirschhorn JN, Lindgren CM, Morris AP, Meyre D, Scherag A, McCarthy MI, Speliotes EK, North KE, Loos RJ and Ingelsson E

    US Department of Health and Human Services, Division of Cancer Epidemiology and Genetics, National Cancer Institute, US National Institutes of Health, Bethesda, Maryland, USA.

    Approaches exploiting trait distribution extremes may be used to identify loci associated with common traits, but it is unknown whether these loci are generalizable to the broader population. In a genome-wide search for loci associated with the upper versus the lower 5th percentiles of body mass index, height and waist-to-hip ratio, as well as clinical classes of obesity, including up to 263,407 individuals of European ancestry, we identified 4 new loci (IGFBP4, H6PD, RSRC1 and PPP2R2A) influencing height detected in the distribution tails and 7 new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3 and ZZZ3) for clinical classes of obesity. Further, we find a large overlap in genetic structure and the distribution of variants between traits based on extremes and the general population and little etiological heterogeneity between obesity subgroups.

    Funded by: British Heart Foundation: PG/11/63/29011; Cancer Research UK; Chief Scientist Office: CZB/4/710; Medical Research Council: G0600237, G0601261, G1000143, G9521010, MC_PC_U127561128, MC_U105260558, MC_U106179471, MC_U106179472, MC_U106188470, MC_U123092720; NHLBI NIH HHS: R01 HL105756; NIDDK NIH HHS: R01 DK072193, R01 DK075787; NIGMS NIH HHS: T32 GM074905; Wellcome Trust: 090532, 097117, 098017

    Nature genetics 2013;45;5;501-12

  • The evolutionary dynamics of influenza A virus adaptation to mammalian hosts.

    Bhatt S, Lam TT, Lycett SJ, Leigh Brown AJ, Bowden TA, Holmes EC, Guan Y, Wood JL, Brown IH, Kellam P, Combating Swine Influenza Consortium and Pybus OG

    Department of Zoology, University of Oxford, Oxford, UK.

    Few questions on infectious disease are more important than understanding how and why avian influenza A viruses successfully emerge in mammalian populations, yet little is known about the rate and nature of the virus' genetic adaptation in new hosts. Here, we measure, for the first time, the genomic rate of adaptive evolution of swine influenza viruses (SwIV) that originated in birds. By using a curated dataset of more than 24 000 human and swine influenza gene sequences, including 41 newly characterized genomes, we reconstructed the adaptive dynamics of three major SwIV lineages (Eurasian, EA; classical swine, CS; triple reassortant, TR). We found that, following the transfer of the EA lineage from birds to swine in the late 1970s, EA virus genes have undergone substantially faster adaptive evolution than those of the CS lineage, which had circulated among swine for decades. Further, the adaptation rates of the EA lineage antigenic haemagglutinin and neuraminidase genes were unexpectedly high and similar to those observed in human influenza A. We show that the successful establishment of avian influenza viruses in swine is associated with raised adaptive evolution across the entire genome for many years after zoonosis, reflecting the contribution of multiple mutations to the coordinated optimization of viral fitness in a new environment. These dynamics are replicated independently in the polymerase genes of the TR lineage, which established in swine following separate transmission from non-swine hosts.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: MC_G0902096; Wellcome Trust

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2013;368;1614;20120382

  • Genome-wide association study of intraocular pressure identifies the GLCCI1/ICA1 region as a glaucoma susceptibility locus.

    Blue Mountains Eye Study (BMES) and The Wellcome Trust Case Control Consortium 2 (WTCCC2)

    To discover quantitative trait loci for intraocular pressure, a major risk factor for glaucoma and the only modifiable one, we performed a genome-wide association study on a discovery cohort of 2175 individuals from Sydney, Australia. We found a novel association between intraocular pressure and a common variant at 7p21 near to GLCCI1 and ICA1. The findings in this region were confirmed through two UK replication cohorts totalling 4866 individuals (rs59072263, Pcombined = 1.10 × 10^-8). A copy of the G allele at this SNP is associated with an increase in mean IOP of 0.45 mmHg (95% CI = 0.30-0.61 mmHg). These results lend support to the implication of vesicle trafficking and glucocorticoid inducibility pathways in the determination of intraocular pressure and in the pathogenesis of primary open-angle glaucoma.

    Human molecular genetics 2013;22;22;4653-60

  • Uniparental markers in Italy reveal a sex-biased genetic structure and different historical strata.

    Boattini A, Martinez-Cruz B, Sarno S, Harmant C, Useli A, Sanz P, Yang-Yao D, Manry J, Ciani G, Luiselli D, Quintana-Murci L, Comas D, Pettener D and Genographic Consortium

    Laboratorio di Antropologia Molecolare, Dipartimento di Scienze Biologiche, Geologiche e Ambientali, Università di Bologna, Bologna, Italy.

    Located in the center of the Mediterranean landscape and with an extensive coastal line, the territory of what is today Italy has played an important role in the history of human settlements and movements of Southern Europe and the Mediterranean Basin. Populated since Paleolithic times, the complexity of human movements during the Neolithic, the Metal Ages and the most recent history of the two last millennia (involving the overlapping of different cultural and demic strata) has shaped the pattern of the modern Italian genetic structure. With the aim of disentangling this pattern and understanding which processes more importantly shaped the distribution of diversity, we have analyzed the uniparentally-inherited markers in ∼900 individuals from an extensive sampling across the Italian peninsula, Sardinia and Sicily. Spatial PCAs and DAPCs revealed a sex-biased pattern indicating different demographic histories for males and females. Besides the genetic outlier position of Sardinians, a North West-South East Y-chromosome structure is found in continental Italy. Such structure is in agreement with recent archeological syntheses indicating two independent and parallel processes of Neolithisation. In addition, date estimates pinpoint the importance of the cultural and demographic events during the late Neolithic and Metal Ages. On the other hand, mitochondrial diversity is distributed more homogeneously in agreement with older population events that might be related to the presence of an Italian Refugium during the last glacial period in Europe.

    PloS one 2013;8;5;e65441

  • Compression of FASTQ and SAM Format Sequencing Data.

    Bonfield JK and Mahoney MV

    Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference based compression (CRAM, Goby) and non-reference based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state of the art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/.

    PloS one 2013;8;3;e59190

  • A Single Multilocus Sequence Typing (MLST) Scheme for Seven Pathogenic Leptospira Species.

    Boonsilp S, Thaipadungpanit J, Amornchai P, Wuthiekanun V, Bailey MS, Holden MT, Zhang C, Jiang X, Koizumi N, Taylor K, Galloway R, Hoffmaster AR, Craig S, Smythe LD, Hartskeerl RA, Day NP, Chantratita N, Feil EJ, Aanensen DM, Spratt BG and Peacock SJ

    Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand ; Department of Microbiology and Immunology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand.

    Background: The available Leptospira multilocus sequence typing (MLST) scheme supported by an MLST website is limited to L. interrogans and L. kirschneri. Our aim was to broaden the utility of this scheme to incorporate a total of seven pathogenic species. We modified the existing scheme by replacing one of the seven MLST loci (fadD was changed to caiB), as the former gene did not appear to be present in some pathogenic species. Comparison of the original and modified schemes using data for L. interrogans and L. kirschneri demonstrated that the discriminatory power of the two schemes was not significantly different. The modified scheme was used to further characterize 325 isolates (L. alexanderi [n = 5], L. borgpetersenii [n = 34], L. interrogans [n = 222], L. kirschneri [n = 29], L. noguchii [n = 9], L. santarosai [n = 10], and L. weilii [n = 16]). Phylogenetic analysis using concatenated sequences of the 7 loci demonstrated that each species corresponded to a discrete clade, and that no strains were misclassified at the species level. Comparison between genotype and serovar was possible for 254 isolates. Of the 31 sequence types (STs) represented by at least two isolates, 18 STs included isolates assigned to two or three different serovars. Conversely, 14 serovars were identified that contained between 2 and 10 different STs. New observations were made on the global phylogeography of Leptospira spp., and the utility of MLST in making associations between human disease and specific maintenance hosts was demonstrated. Conclusion: The new MLST scheme, supported by an updated MLST website, allows the characterization and species assignment of isolates of the seven major pathogenic species associated with leptospirosis.

    PLoS neglected tropical diseases 2013;7;1;e1954

  • Oxidative bisulfite sequencing of 5-methylcytosine and 5-hydroxymethylcytosine.

    Booth MJ, Ost TW, Beraldi D, Bell NM, Branco MR, Reik W and Balasubramanian S

    Department of Chemistry, University of Cambridge, Cambridge, UK.

    To uncover the function of and interplay between the mammalian cytosine modifications 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), new techniques and advances in current technology are needed. To this end, we have developed oxidative bisulfite sequencing (oxBS-seq), which can quantitatively locate 5mC and 5hmC marks at single-base resolution in genomic DNA. In bisulfite sequencing (BS-seq), both 5mC and 5hmC are read as cytosines and thus cannot be discriminated; however, in oxBS-seq, specific oxidation of 5hmC to 5-formylcytosine (5fC) and conversion of the newly formed 5fC to uracil (under bisulfite conditions) means that 5hmC can be discriminated from 5mC. A positive readout of actual 5mC is gained from a single oxBS-seq run, and 5hmC levels are inferred by comparison with a BS-seq run. Here we describe an optimized second-generation protocol that can be completed in 2 d.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    Nature protocols 2013;8;10;1841-51
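
    The comparison described in the abstract above (5mC read directly from oxBS-seq, 5hmC inferred against a BS-seq run) can be illustrated with a minimal sketch. It assumes that the inference is a simple per-site subtraction of cytosine-call fractions; the function name, its inputs and the clamping of negative differences to zero are illustrative assumptions, not part of the published protocol.

```python
def estimate_5hmc(bs_fraction, oxbs_fraction):
    """Return hypothetical (5mC, 5hmC) level estimates for one cytosine site.

    bs_fraction:   fraction of reads called C in BS-seq (5mC + 5hmC).
    oxbs_fraction: fraction of reads called C in oxBS-seq (5mC only).
    """
    for value in (bs_fraction, oxbs_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    # oxBS-seq gives a direct readout of 5mC.
    five_mc = oxbs_fraction
    # Assumption: 5hmC is the BS-seq signal not explained by 5mC.
    # Sampling noise can make the difference negative, so clamp at zero.
    five_hmc = max(0.0, bs_fraction - oxbs_fraction)
    return five_mc, five_hmc


if __name__ == "__main__":
    print(estimate_5hmc(0.75, 0.5))
```

    A real analysis would also need read counts and a statistical test per site; the sketch only shows the arithmetic of the comparison.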

  • Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya.

    Borrmann S, Straimer J, Mwai L, Abdi A, Rippert A, Okombo J, Muriithi S, Sasi P, Kortok MM, Lowe B, Campino S, Assefa S, Auburn S, Manske M, Maslen G, Peshu N, Kwiatkowski DP, Marsh K, Nzila A and Clark TG

    [1] KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya [2] Institute of Microbiology, Magdeburg University School of Medicine, Germany.

    Early identification of causal genetic variants underlying antimalarial drug resistance could provide robust epidemiological tools for timely public health interventions. Using a novel natural genetics strategy for mapping novel candidate genes we analyzed >75,000 high quality single nucleotide polymorphisms selected from high-resolution whole-genome sequencing data in 27 isolates of Plasmodium falciparum. We identified genetic variants associated with susceptibility to dihydroartemisinin that implicate one region on chromosome 13, a candidate gene on chromosome 1 (PFA0220w, a UBP1 ortholog) and others (PFB0560w, PFB0630c, PFF0445w) with putative roles in protein homeostasis and stress response. There was a strong signal for positive selection on PFA0220w, but not the other candidate loci. Our results demonstrate the power of full-genome sequencing-based association studies for uncovering candidate genes that determine parasite sensitivity to artemisinins. Our study provides a unique reference for the interpretation of results from resistant infections.

    Scientific reports 2013;3;3318

  • A variant in LDLR is associated with abdominal aortic aneurysm.

    Bradley DT, Hughes AE, Badger SA, Jones GT, Harrison SC, Wright BJ, Bumpstead S, Baas AF, Grétarsdóttir S, Burnand K, Child AH, Clough RE, Cockerill G, Hafez H, Scott DJ, Ariëns RA, Johnson A, Sohrabi S, Smith A, Thompson MM, van Bockxmeer FM, Waltham M, Matthíasson SE, Thorleifsson G, Thorsteinsdottir U, Blankensteijn JD, Teijink JA, Wijmenga C, de Graaf J, Kiemeney LA, Wild JB, Edkins S, Gwilliam R, Hunt SE, Potter S, Lindholt JS, Golledge J, Norman PE, van Rij A, Powell JT, Eriksson P, Stefánsson K, Thompson JR, Humphries SE, Sayers RD, Deloukas P, Samani NJ and Bown MJ

    Background: Abdominal aortic aneurysm (AAA) is a common cardiovascular disease among older people and demonstrates significant heritability. In contrast to similar complex diseases, relatively few genetic associations with AAA have been confirmed. We reanalyzed our genome-wide study and carried through to replication suggestive discovery associations at a lower level of significance. A genome-wide association study was conducted using 1830 cases from the United Kingdom, New Zealand, and Australia with infrarenal aorta diameter ≥30 mm or ruptured AAA and 5435 unscreened controls from the 1958 Birth Cohort and National Blood Service cohort from the Wellcome Trust Case Control Consortium. Eight suggestive associations with P<1×10^-4 were carried through to in silico replication in 1292 AAA cases and 30,503 controls. One single-nucleotide polymorphism associated with P<0.05 after Bonferroni correction in the in silico study underwent further replication (706 AAA cases and 1063 controls from the United Kingdom, 507 AAA cases and 199 controls from Denmark, and 885 AAA cases and 1000 controls from New Zealand). Low-density lipoprotein receptor (LDLR) rs6511720 A was significantly associated overall and in 3 of 5 individual replication studies. The full study showed an association that reached genome-wide significance (odds ratio, 0.76; 95% confidence interval, 0.70-0.83; P=2.08×10^-10). Conclusions: LDLR rs6511720 is associated with AAA. This finding is consistent with established effects of this variant on coronary artery disease. Shared causal pathways with other cardiovascular diseases may present novel opportunities for preventative and therapeutic strategies for AAA.

    Funded by: Wellcome Trust: 076113, 084695, 085475

    Circulation. Cardiovascular genetics 2013;6;5;498-504

  • Platelet Genomics.

    Bray PF, Jones CI, Soranzo N and Ouwehand WH

    Platelets 2013

  • A new method for high-resolution imaging of Ku foci to decipher mechanisms of DNA double-strand break repair.

    Britton S, Coates J and Jackson SP

    The Wellcome Trust and Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, England, UK.

    DNA double-strand breaks (DSBs) are the most toxic of all genomic insults, and pathways dealing with their signaling and repair are crucial to prevent cancer and for immune system development. Despite intense investigations, our knowledge of these pathways has been technically limited by our inability to detect the main repair factors at DSBs in cells. In this paper, we present an original method that involves a combination of ribonuclease- and detergent-based preextraction with high-resolution microscopy. This method allows direct visualization of previously hidden repair complexes, including the main DSB sensor Ku, at virtually any type of DSB, including those induced by anticancer agents. We demonstrate its broad range of applications by coupling it to laser microirradiation, super-resolution microscopy, and single-molecule counting to investigate the spatial organization and composition of repair factories. Furthermore, we use our method to monitor DNA repair and identify mechanisms of repair pathway choice, and we show its utility in defining cellular sensitivities and resistance mechanisms to anticancer agents.

    Funded by: Cancer Research UK: A11224, C6/A11224, C6946/A14492; European Research Council: 268536; Wellcome Trust: WT092096

    The Journal of cell biology 2013;202;3;579-95

  • Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans.

    Brotherton P, Haak W, Templeton J, Brandt G, Soubrier J, Jane Adler C, Richards SM, Sarkissian CD, Ganslmeier R, Friederich S, Dresely V, van Oven M, Kenyon R, Van der Hoek MB, Korlach J, Luong K, Ho SY, Quintana-Murci L, Behar DM, Meller H, Alt KW, Cooper A, Genographic Consortium, Adhikarla S, Ganesh Prasad AK, Pitchappan R, Varatharajan Santhakumari A, Balanovska E, Balanovsky O, Bertranpetit J, Comas D, Martínez-Cruz B, Melé M, Clarke AC, Matisoo-Smith EA, Dulik MC, Gaieski JB, Owings AC, Schurr TG, Vilar MG, Hobbs A, Soodyall H, Javed A, Parida L, Platt DE, Royyuru AK, Jin L, Li S, Kaplan ME, Merchant NC, John Mitchell R, Renfrew C, Lacerda DR, Santos FR, Soria Hernanz DF, Spencer Wells R, Swamikrishnan P, Tyler-Smith C, Paulo Vieira P and Ziegle JS

    The Australian Centre for Ancient DNA, School of Earth and Environmental Sciences, University of Adelaide, Adelaide, South Australia 5005, Australia. p.m.brotherton@hud.ac.uk

    Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this 'real-time' genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.

    Nature communications 2013;4;1764

  • Translating the human microbiome.

    Brown J, de Vos WM, DiStefano PS, Doré J, Huttenhower C, Knight R, Lawley TD, Raes J and Turnbaugh P

    GlaxoSmithKline, Collegeville, Pennsylvania, USA. james.r.brown@gsk.com

    Nature biotechnology 2013;31;4;304-8

  • Culture-free club.

    Bryant JM

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2013

  • Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: a retrospective cohort study.

    Bryant JM, Grogono DM, Greaves D, Foweraker J, Roddick I, Inns T, Reacher M, Haworth CS, Curran MD, Harris SR, Peacock SJ, Parkhill J and Floto RA

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Background: Increasing numbers of individuals with cystic fibrosis are becoming infected with the multidrug-resistant non-tuberculous mycobacterium (NTM) Mycobacterium abscessus, which causes progressive lung damage and is extremely challenging to treat. How this organism is acquired is not currently known, but there is growing concern that person-to-person transmission could occur. We aimed to define the mechanisms of acquisition of M abscessus in individuals with cystic fibrosis.

    Method: Whole genome sequencing and antimicrobial susceptibility testing were done on 168 consecutive isolates of M abscessus from 31 patients attending an adult cystic fibrosis centre in the UK between 2007 and 2011. In parallel, we undertook detailed environmental testing for NTM and defined potential opportunities for transmission between patients both in and out of hospital using epidemiological data and social network analysis.

    Findings: Phylogenetic analysis revealed two clustered outbreaks of near-identical isolates of the M abscessus subspecies massiliense (from 11 patients), differing by less than ten base pairs. This variation represents less diversity than that seen within isolates from a single individual, strongly indicating between-patient transmission. All patients within these clusters had numerous opportunities for within-hospital transmission from other individuals, while comprehensive environmental sampling, initiated during the outbreak, failed to detect any potential point source of NTM infection. The clusters of M abscessus subspecies massiliense showed evidence of transmission of mutations acquired during infection of an individual to other patients. Thus, isolates with constitutive resistance to amikacin and clarithromycin were isolated from several individuals never previously exposed to long-term macrolides or aminoglycosides, further indicating cross-infection.

    Interpretation: Whole genome sequencing has revealed frequent transmission of multidrug resistant NTM between patients with cystic fibrosis despite conventional cross-infection measures. Although the exact transmission route is yet to be established, our epidemiological analysis suggests that it could be indirect.

    Funding: The Wellcome Trust, Papworth Hospital, NIHR Cambridge Biomedical Research Centre, UK Health Protection Agency, Medical Research Council, and the UKCRC Translational Infection Research Initiative.

    Funded by: Medical Research Council: G1000803; Wellcome Trust: 084953, 098051, 100140

    Lancet 2013;381;9877;1551-60
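
    The idea of grouping near-identical isolates can be sketched in a few lines. This is an illustrative assumption rather than the authors' method (the study used phylogenetic analysis): isolates are linked by single-linkage clustering whenever their pairwise SNP distance is below a threshold, with the default of 10 borrowed from the "less than ten base pairs" reported above. The function name, isolate labels and distances are hypothetical.

```python
def cluster_isolates(names, distances, threshold=10):
    """Single-linkage clusters of isolates.

    names:     list of isolate labels.
    distances: dict mapping (label_a, label_b) to a pairwise SNP distance.
    threshold: two isolates are linked when their distance is below this.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), distance in distances.items():
        if distance < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical distances, for illustration only.
    example = {
        ("P1", "P2"): 3, ("P2", "P3"): 8, ("P1", "P3"): 11,
        ("P1", "P4"): 250, ("P2", "P4"): 248, ("P3", "P4"): 252,
    }
    print(cluster_isolates(["P1", "P2", "P3", "P4"], example))
```

    Single linkage joins P1 and P3 through P2 even though their direct distance exceeds the threshold, which is one reason a phylogeny is the more informative tool for real outbreak data.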

  • Transmission of M abscessus in patients with cystic fibrosis - Authors' reply.

    Bryant JM, Grogono DM, Parkhill J and Floto RA

    Lancet 2013;382;9891;504

  • Whole-genome sequencing to establish relapse or re-infection with Mycobacterium tuberculosis: a retrospective observational study.

    Bryant JM, Harris SR, Parkhill J, Dawson R, Diacon AH, van Helden P, Pym A, Mahayiddin AA, Chuchottaworn C, Sanne IM, Louw C, Boeree MJ, Hoelscher M, McHugh TD, Bateson AL, Hunt RD, Mwaigwisya S, Wright L, Gillespie SH and Bentley SD

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Background: Recurrence of tuberculosis after treatment makes management difficult and is a key factor for determining treatment efficacy. Two processes can cause recurrence: relapse of the primary infection or re-infection with an exogenous strain. Although re-infection can and does occur, its importance to tuberculosis epidemiology and its biological basis are still debated. We used whole-genome sequencing, which is more accurate than conventional typing used to date, to assess the frequency of recurrence and to gain insight into the biological basis of re-infection. Methods: We assessed patients from the REMoxTB trial, a randomised controlled trial of tuberculosis treatment that enrolled previously untreated participants with Mycobacterium tuberculosis infection from Malaysia, South Africa, and Thailand. We did whole-genome sequencing and mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) typing of pairs of isolates taken by sputum sampling: one from before treatment and another from either the end of failed treatment at 17 weeks or later or from a recurrent infection. We compared the number and location of SNPs between isolates collected at baseline and recurrence. Findings: We assessed 47 pairs of isolates. Whole-genome sequencing identified 33 cases with little genetic distance (0-6 SNPs) between strains, deemed relapses, and three cases for which the genetic distance ranged from 1306 to 1419 SNPs, deemed re-infections. Six cases of relapse and six cases of mixed infection were classified differently by whole-genome sequencing and MIRU-VNTR. We detected five single positive isolates (positive culture followed by at least two negative cultures) without clinical evidence of disease. Interpretation: Whole-genome sequencing enables the differentiation of relapse and re-infection cases with greater resolution than do genotyping methods used at present, such as MIRU-VNTR, and provides insights into the biology of recurrence. The additional clarity provided by whole-genome sequencing might have a role in defining endpoints for clinical trials. Funding: Wellcome Trust, European Union, Medical Research Council, Global Alliance for TB Drug Development, European and Developing Country Clinical Trials Partnership.

    The lancet. Respiratory medicine 2013;1;10;786-92

  • Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data.

    Bryant JM, Schürch AC, van Deutekom H, Harris SR, de Beer JL, de Jager V, Kremer K, van Hijum SA, Siezen RJ, Borgdorff M, Bentley SD, Parkhill J and van Soolingen D

    BACKGROUND: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate knowledge of the rate of change in the genome over relevant timescales is required. METHODS: We attempted to estimate a molecular clock by sequencing 199 isolates from epidemiologically linked tuberculosis cases, collected in the Netherlands spanning almost 16 years. RESULTS: Multiple analyses support an average mutation rate of ~0.3 SNPs per genome per year. However, all analyses revealed a very high degree of variation around this mean, making the confirmation of links proposed by epidemiology, and inference of novel links, difficult. Despite this, in some cases, the phylogenetic context of other strains provided evidence supporting the confident exclusion of previously inferred epidemiological links. CONCLUSIONS: This in-depth analysis of the molecular clock revealed that it is slow and variable over short time scales, which limits its usefulness in transmission studies. However, the superior resolution of whole genome sequencing can provide the phylogenetic context to allow the confident exclusion of possible transmission events previously inferred via traditional DNA fingerprinting techniques and epidemiological cluster investigation. Despite the slow generation of variation even at the whole genome level we conclude that the investigation of tuberculosis transmission will benefit greatly from routine whole genome sequencing.

    BMC infectious diseases 2013;13;1;110

  • Discovery by the Epistasis Project of an epistatic interaction between the GSTM3 gene and the HHEX/IDE/KIF11 locus in the risk of Alzheimer's disease.

    Bullock JM, Medway C, Cortina-Borja M, Turton JC, Prince JA, Ibrahim-Verbaas CA, Schuur M, Breteler MM, van Duijn CM, Kehoe PG, Barber R, Coto E, Alvarez V, Deloukas P, Hammond N, Combarros O, Mateo I, Warden DR, Lehmann MG, Belbin O, Brown K, Wilcock GK, Heun R, Kölsch H, Smith AD, Lehmann DJ and Morgan K

    Human Genetics, School of Molecular Medical Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, UK.

    Despite recent discoveries in the genetics of sporadic Alzheimer's disease, there remains substantial "hidden heritability." It is thought that some of this missing heritability may be because of gene-gene, i.e., epistatic, interactions. We examined potential epistasis between 110 candidate polymorphisms in 1757 cases of Alzheimer's disease and 6294 control subjects of the Epistasis Project, divided between a discovery and a replication dataset. We found an epistatic interaction, between rs7483 in GSTM3 and rs1111875 in the HHEX/IDE/KIF11 gene cluster, with a closely similar, significant result in both datasets. The synergy factor (SF) in the combined dataset was 1.79, 95% confidence interval [CI], 1.35-2.36; p = 0.00004. Consistent interaction was also found in 7 out of the 8 additional subsets that we examined post hoc: i.e., it was shown in both North Europe and North Spain, in both men and women, in both those with and without the ε4 allele of apolipoprotein E, and in people older than 75 years (SF, 2.27; 95% CI, 1.60-3.20; p < 0.00001), but not in those younger than 75 years (SF, 1.06; 95% CI, 0.59-1.91; p = 0.84). The association with Alzheimer's disease was purely epistatic with neither polymorphism showing an independent effect: odds ratio, 1.0; p ≥ 0.7. Indeed, each factor was associated with protection in the absence of the other factor, but with risk in its presence. In conclusion, this epistatic interaction showed a high degree of consistency when stratifying by sex, the ε4 allele of apolipoprotein E genotype, and geographic region.

    Funded by: Department of Health; Medical Research Council: G0400546

    Neurobiology of aging 2013;34;4;1309.e1-7

  • Headbobber: a combined morphogenetic and cochleosaccular mouse model to study 10qter deletions in human deafness.

    Buniello A, Hardisty-Hughes RE, Pass JC, Bober E, Smith RJ and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom ; Wolfson Centre for Age-Related Diseases, King's College London, London, United Kingdom.

    The recessive mouse mutant headbobber () displays the characteristic behavioural traits associated with vestibular defects including headbobbing, circling and deafness. This mutation was caused by the insertion of a transgene into distal chromosome 7 affecting expression of native genes. We show that the inner ear of mutants lacks semicircular canals and cristae, and the saccule and utricle are fused together in a single utriculosaccular sac. Moreover, we detect severe abnormalities of the cochlear sensory hair cells, the stria vascularis looks severely disorganised, Reissner's membrane is collapsed and no endocochlear potential is detected. Myo7a and Kcnj10 expression analysis show a lack of the melanocyte-like intermediate cells in stria vascularis, which can explain the absence of endocochlear potential. We use Trp2 as a marker of melanoblasts migrating from the neural crest at E12.5 and show that they do not interdigitate into the developing strial epithelium, associated with abnormal persistence of the basal lamina in the cochlea. We perform array CGH, deep sequencing as well as an extensive expression analysis of candidate genes in the headbobber region of and littermate controls, and conclude that the headbobber phenotype is caused by: 1) effect of a 648 kb deletion on distal Chr7, resulting in the loss of three protein coding genes (, and ) with expression in the inner ear but unknown function; and 2) indirect, long range effect of the deletion on the expression of neighboring genes on Chr7, associated with downregulation of and homeobox transcription factors. Interestingly, deletions of the orthologous region in humans, affecting the same genes, have been reported in nineteen patients with common features including sensorineural hearing loss and vestibular problems. Therefore, we propose that headbobber is a useful model to gain insight into the mechanisms underlying deafness in human 10qter deletion syndrome.

    PloS one 2013;8;2;e56274

  • Missense mutations in β-1,3-N-acetylglucosaminyltransferase 1 (B3GNT1) cause Walker-Warburg syndrome.

    Buysse K, Riemersma M, Powell G, van Reeuwijk J, Chitayat D, Roscioli T, Kamsteeg EJ, van den Elzen C, van Beusekom E, Blaser S, Babul-Hirji R, Halliday W, Wright GJ, Stemple DL, Lin YY, Lefeber DJ and van Bokhoven H

    The authors wish it to be known that, in their opinion, the first five authors should be regarded as joint First Authors.

    Several known or putative glycosyltransferases are required for the synthesis of laminin-binding glycans on alpha-dystroglycan (αDG), including POMT1, POMT2, POMGnT1, LARGE, Fukutin, FKRP, ISPD and GTDC2. Mutations in these glycosyltransferase genes result in defective αDG glycosylation and reduced ligand binding by αDG causing a clinically heterogeneous group of congenital muscular dystrophies, commonly referred to as dystroglycanopathies. The most severe clinical form, Walker-Warburg syndrome (WWS), is characterized by congenital muscular dystrophy and severe neurological and ophthalmological defects. Here, we report two homozygous missense mutations in the β-1,3-N-acetylglucosaminyltransferase 1 (B3GNT1) gene in a family affected with WWS. Functional studies confirmed the pathogenicity of the mutations. First, expression of wild-type but not mutant B3GNT1 in human prostate cancer (PC3) cells led to increased levels of αDG glycosylation. Second, morpholino knockdown of the zebrafish b3gnt1 orthologue caused characteristic muscular defects and reduced αDG glycosylation. These functional studies identify an important role of B3GNT1 in the synthesis of the uncharacterized laminin-binding glycan of αDG and implicate B3GNT1 as a novel causative gene for WWS.

    Human molecular genetics 2013;22;9;1746-54

  • A CRISPR view of genome sequences.

    Cain AK and Boinett CJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. microbes@sanger.ac.uk.

    This month's Genome Watch explores recent applications of the CRISPR immune system for bacterial phylogenetic analysis and genome editing.

    Nature reviews. Microbiology 2013

  • Large-scale association analysis identifies new risk loci for coronary artery disease.

    CARDIoGRAMplusC4D Consortium, Deloukas P, Kanoni S, Willenborg C, Farrall M, Assimes TL, Thompson JR, Ingelsson E, Saleheen D, Erdmann J, Goldstein BA, Stirrups K, König IR, Cazier JB, Johansson A, Hall AS, Lee JY, Willer CJ, Chambers JC, Esko T, Folkersen L, Goel A, Grundberg E, Havulinna AS, Ho WK, Hopewell JC, Eriksson N, Kleber ME, Kristiansson K, Lundmark P, Lyytikäinen LP, Rafelt S, Shungin D, Strawbridge RJ, Thorleifsson G, Tikkanen E, Van Zuydam N, Voight BF, Waite LL, Zhang W, Ziegler A, Absher D, Altshuler D, Balmforth AJ, Barroso I, Braund PS, Burgdorf C, Claudi-Boehm S, Cox D, Dimitriou M, Do R, DIAGRAM Consortium, CARDIOGENICS Consortium, Doney AS, El Mokhtari N, Eriksson P, Fischer K, Fontanillas P, Franco-Cereceda A, Gigante B, Groop L, Gustafsson S, Hager J, Hallmans G, Han BG, Hunt SE, Kang HM, Illig T, Kessler T, Knowles JW, Kolovou G, Kuusisto J, Langenberg C, Langford C, Leander K, Lokki ML, Lundmark A, McCarthy MI, Meisinger C, Melander O, Mihailov E, Maouche S, Morris AD, Müller-Nurasyid M, MuTHER Consortium, Nikus K, Peden JF, Rayner NW, Rasheed A, Rosinger S, Rubin D, Rumpf MP, Schäfer A, Sivananthan M, Song C, Stewart AF, Tan ST, Thorgeirsson G, van der Schoot CE, Wagner PJ, Wellcome Trust Case Control Consortium, Wells GA, Wild PS, Yang TP, Amouyel P, Arveiler D, Basart H, Boehnke M, Boerwinkle E, Brambilla P, Cambien F, Cupples AL, de Faire U, Dehghan A, Diemert P, Epstein SE, Evans A, Ferrario MM, Ferrières J, Gauguier D, Go AS, Goodall AH, Gudnason V, Hazen SL, Holm H, Iribarren C, Jang Y, Kähönen M, Kee F, Kim HS, Klopp N, Koenig W, Kratzer W, Kuulasmaa K, Laakso M, Laaksonen R, Lee JY, Lind L, Ouwehand WH, Parish S, Park JE, Pedersen NL, Peters A, Quertermous T, Rader DJ, Salomaa V, Schadt E, Shah SH, Sinisalo J, Stark K, Stefansson K, Trégouët DA, Virtamo J, Wallentin L, Wareham N, Zimmermann ME, Nieminen MS, Hengstenberg C, Sandhu MS, Pastinen T, Syvänen AC, Hovingh GK, Dedoussis G, Franks PW, Lehtimäki T, Metspalu A, Zalloua PA, Siegbahn A, Schreiber S, Ripatti S, Blankenberg SS, Perola M, Clarke R, Boehm BO, O'Donnell C, Reilly MP, März W, Collins R, Kathiresan S, Hamsten A, Kooner JS, Thorsteinsdottir U, Danesh J, Palmer CN, Roberts R, Watkins H, Schunkert H and Samani NJ

    Coronary artery disease (CAD) is the commonest cause of death. Here, we report an association analysis in 63,746 CAD cases and 130,681 controls identifying 15 loci reaching genome-wide significance, taking the number of susceptibility loci for CAD to 46, and a further 104 independent variants (r² < 0.2) strongly associated with CAD at a 5% false discovery rate (FDR). Together, these variants explain approximately 10.6% of CAD heritability. Of the 46 genome-wide significant lead SNPs, 12 show a significant association with a lipid trait, and 5 show a significant association with blood pressure, but none is significantly associated with diabetes. Network analysis with 233 candidate genes (loci at 10% FDR) generated 5 interaction networks comprising 85% of these putative genes involved in CAD. The four most significant pathways mapping to these networks are linked to lipid metabolism and inflammation, underscoring the causal role of these activities in the genetic etiology of CAD. Our study provides insights into the genetic basis of CAD and identifies key biological pathways.

    Funded by: British Heart Foundation: PG/08/094/26019, RG/08/014/24067; Medical Research Council: G0801566; NHLBI NIH HHS: K24 HL107643, R00 HL094535, R01 HL111694; NIDDK NIH HHS: R01 DK062370

    Nature genetics 2013;45;1;25-33

  • Pitpnm1 is expressed in hair cells during development but is not required for hearing.

    Carlisle FA, Pearson S, Steel KP and Lewis MA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, Cambs, CB10 1SA, United Kingdom.

    Deafness is a genetically complex disorder with many contributing genes still unknown. Here we describe the expression of Pitpnm1 in the inner ear. It is expressed in the inner hair cells of the organ of Corti from late embryonic stages until adulthood, and transiently in the outer hair cells during early postnatal stages. Despite this specific expression, Pitpnm1 null mice showed no hearing defects, possibly due to redundancy with the paralogous genes Pitpnm2 and Pitpnm3.

    Neuroscience 2013

  • Adaptive Changes of the Insig1/SREBP1/SCD1 Set Point Help Adipose Tissue to Cope With Increased Storage Demands of Obesity.

    Carobbio S, Hagen RM, Lelliott CJ, Slawik M, Medina-Gomez G, Tan CY, Sicard A, Atherton HJ, Barbarroja N, Bjursell M, Bohlooly-Y M, Virtue S, Tuthill A, Lefai E, Laville M, Wu T, Considine RV, Vidal H, Langin D, Oresic M, Tinahones FJ, Fernandez-Real JM, Griffin JL, Sethi JK, López M and Vidal-Puig A

    University of Cambridge, Metabolic Research Laboratories, Institute of Metabolic Science Addenbrooke's Treatment Centre, Addenbrooke's Hospital, Cambridge, U.K.

    The epidemic of obesity imposes unprecedented challenges on human adipose tissue (WAT) storage capacity that may benefit from adaptive mechanisms to maintain adipocyte functionality. Here, we demonstrate that changes in the regulatory feedback set point control of Insig1/SREBP1 represent an adaptive response that preserves WAT lipid homeostasis in obese and insulin-resistant states. In our experiments, we show that Insig1 mRNA expression decreases in WAT from mice with obesity-associated insulin resistance and from morbidly obese humans and in in vitro models of adipocyte insulin resistance. Insig1 downregulation is part of an adaptive response that promotes the maintenance of SREBP1 maturation and facilitates lipogenesis and availability of appropriate levels of fatty acid unsaturation, partially compensating the antilipogenic effect associated with insulin resistance. We describe for the first time the existence of this adaptive mechanism in WAT, which involves Insig1/SREBP1 and preserves the degree of lipid unsaturation under conditions of obesity-induced insulin resistance. These adaptive mechanisms contribute to maintain lipid desaturation through preferential SCD1 regulation and facilitate fat storage in WAT, despite on-going metabolic stress.

    Diabetes 2013;62;11;3697-708

  • Adipogenesis: new insights into brown adipose tissue differentiation.

    Carobbio S, Rosen B and Vidal-Puig A

    S Carobbio, Wellcome Trust Genome Campus, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Confirmation of the presence of functional brown adipose tissue (BAT) in humans has renewed the interest in investigating the potential therapeutic use of this tissue. The finding that its activity positively correlates with decreased BMI, fat content and augmented energy expenditure suggests that increasing BAT mass/activity or browning of WAT could be a strategy to prevent or treat obesity and its associated morbidities. The challenge now is to find a safe and efficient way to develop this idea. Whereas BAT has been widely studied in murine models both in vivo and in vitro, there is an urgent need for human cellular models to investigate BAT physiology and functionality from a molecular point of view. In our review, we focus on the latest insights surrounding BAT development and activation in rodents and humans. Then, we discuss how the availability of murine models has been essential to identify BAT progenitors and trace their lineage. Finally, we address how this information can be exploited to develop human cellular models for BAT differentiation/activation. In this context, human embryonic (hES) and induced pluripotent cells (hIPS)-based cellular models represent a resource of great potential value, as they can provide a virtually inexhaustible supply of starting material for functional genetic studies, -omics based analysis and validation of therapeutic approaches. Moreover, these cells can be easily genetically engineered, opening the possibility of generating patient-specific cellular models, allowing the investigation of the impact of different genetic backgrounds on BAT differentiation in both pathological and physiological states.

    Journal of molecular endocrinology 2013

  • Mutations in GDP-Mannose Pyrophosphorylase B Cause Congenital and Limb-Girdle Muscular Dystrophies Associated with Hypoglycosylation of α-Dystroglycan.

    Carss KJ, Stevens E, Foley AR, Cirak S, Riemersma M, Torelli S, Hoischen A, Willer T, van Scherpenzeel M, Moore SA, Messina S, Bertini E, Bönnemann CG, Abdenur JE, Grosmann CM, Kesari A, Punetha J, Quinlivan R, Waddell LB, Young HK, Wraige E, Yau S, Brodd L, Feng L, Sewry C, Macarthur DG, North KN, Hoffman E, Stemple DL, Hurles ME, van Bokhoven H, Campbell KP, Lefeber DJ, UK10K Consortium, Lin YY and Muntoni F

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Congenital muscular dystrophies with hypoglycosylation of α-dystroglycan (α-DG) are a heterogeneous group of disorders often associated with brain and eye defects in addition to muscular dystrophy. Causative variants in 14 genes thought to be involved in the glycosylation of α-DG have been identified thus far. Allelic mutations in these genes might also cause milder limb-girdle muscular dystrophy phenotypes. Using a combination of exome and Sanger sequencing in eight unrelated individuals, we present evidence that mutations in guanosine diphosphate mannose (GDP-mannose) pyrophosphorylase B (GMPPB) can result in muscular dystrophy variants with hypoglycosylated α-DG. GMPPB catalyzes the formation of GDP-mannose from GTP and mannose-1-phosphate. GDP-mannose is required for O-mannosylation of proteins, including α-DG, and it is the substrate of cytosolic mannosyltransferases. We found reduced α-DG glycosylation in the muscle biopsies of affected individuals and in available fibroblasts. Overexpression of wild-type GMPPB in fibroblasts from an affected individual partially restored glycosylation of α-DG. Whereas wild-type GMPPB localized to the cytoplasm, five of the identified missense mutations caused formation of aggregates in the cytoplasm or near membrane protrusions. Additionally, knockdown of the GMPPB ortholog in zebrafish caused structural muscle defects with decreased motility, eye abnormalities, and reduced glycosylation of α-DG. Together, these data indicate that GMPPB mutations are responsible for congenital and limb-girdle muscular dystrophies with hypoglycosylation of α-DG.

    American journal of human genetics 2013

  • Use of Vitek 2 Antimicrobial Susceptibility Profile To Identify mecC in Methicillin-Resistant Staphylococcus aureus.

    Cartwright EJ, Paterson GK, Raven KE, Harrison EM, Gouliouris T, Kearns A, Pichon B, Edwards G, Skov RL, Larsen AR, Holmes MA, Parkhill J, Peacock SJ and Török ME

    Department of Medicine, University of Cambridge, Cambridge, United Kingdom.

    The emergence of mecC methicillin-resistant Staphylococcus aureus (MRSA) poses a diagnostic challenge for clinical microbiology laboratories. Using the Vitek 2 system, we tested a panel of 896 Staphylococcus aureus isolates and found that an oxacillin-sensitive/cefoxitin-resistant profile had a sensitivity of 88.7% and a specificity of 99.5% for the identification of mecC MRSA isolates. The presence of the mecC gene, determined by bacterial whole-genome sequencing, was used as the gold standard. This profile could provide a zero-cost screening method for identification of mecC-positive MRSA strains.

    Journal of clinical microbiology 2013;51;8;2732-4
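
    The sensitivity and specificity figures quoted in the abstract above are standard ratios taken from a 2x2 table of test result against the gold standard (here, presence of mecC by whole-genome sequencing). The sketch below shows the arithmetic only. The function names and the counts are hypothetical and are not taken from the paper, which reports percentages rather than the underlying table.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of gold-standard positives that the screening profile flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of gold-standard negatives that the screening profile clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only to illustrate the calculation.
tp, fn = 90, 10    # mecC-positive isolates: flagged / missed by the profile
tn, fp = 795, 5    # mecC-negative isolates: cleared / wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```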

  • BamView: visualizing and interpretation of next-generation sequencing read alignments.

    Carver T, Harris SR, Otto TD, Berriman M, Parkhill J and McQuillan JA

    Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. artemis@sanger.ac.uk.

    So-called next-generation sequencing (NGS) has provided the ability to sequence on a massive scale at low cost, enabling biologists to perform powerful experiments and gain insight into biological processes. BamView has been developed to visualize and analyse sequence reads from NGS platforms, which have been aligned to a reference sequence. It is a desktop application for browsing the aligned or mapped reads [Ruffalo, M, LaFramboise, T, Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 2011;27:2790-6] at different levels of magnification, from nucleotide level, where the base qualities can be seen, to genome or chromosome level where overall coverage is shown. To enable in-depth investigation of NGS data, various views are provided that can be configured to highlight interesting aspects of the data. Multiple read alignment files can be overlaid to compare results from different experiments, and filters can be applied to facilitate the interpretation of the aligned reads. As well as being a standalone application, it can be used as an integrated part of the Artemis genome browser. BamView allows the user to study NGS data in the context of the sequence and annotation of the reference genome. Single nucleotide polymorphism (SNP) density and candidate SNP sites can be highlighted and investigated, and read-pair information can be used to discover large structural insertions and deletions. The application will also perform simple analyses of the read mapping, including reporting the read counts and reads per kilobase per million mapped reads (RPKM) for genes selected by the user. Availability: BamView and Artemis are freely available software. These can be downloaded from their home pages: http://bamview.sourceforge.net/; http://www.sanger.ac.uk/resources/software/artemis/. Requirements: Java 1.6 or higher.

    Briefings in bioinformatics 2013;14;2;203-12
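
    The RPKM measure named in the BamView abstract above normalises a gene's read count by both gene length and sequencing depth. A minimal sketch of the standard formula follows. The function name and the example numbers are illustrative assumptions, and this is not BamView's own code (BamView is a Java application).

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene in a library
# with 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

    Dividing by length and depth makes values comparable across genes of different sizes and across libraries of different sizes, which is why the abstract reports RPKM alongside raw read counts.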

  • Phosphoproteomics data classify hematological cancer cell lines according to tumor type and sensitivity to kinase inhibitors.

    Casado P, Alcolea MP, Iorio F, Rodríguez-Prados JC, Vanhaesebroeck B, Saez-Rodriguez J, Joel S and Cutillas PR

    Analytical Signalling Group, Centre for Cell Signalling, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1B 6BQ, UK. pedro.cutillas@imperial.ac.uk.

    BACKGROUND: Tumor classification based on their predicted responses to kinase inhibitors is a major goal for advancing targeted personalized therapies. Here, we used a phosphoproteomic approach to investigate biological heterogeneity across hematological cancer cell lines including acute myeloid leukemia, lymphoma, and multiple myeloma. RESULTS: Mass spectrometry was used to quantify 2,000 phosphorylation sites across three acute myeloid leukemia, three lymphoma, and three multiple myeloma cell lines in six biological replicates. The intensities of the phosphorylation sites grouped these cancer cell lines according to their tumor type. In addition, a phosphoproteomic analysis of seven acute myeloid leukemia cell lines revealed a battery of phosphorylation sites whose combined intensities correlated with the growth-inhibitory responses to three kinase inhibitors with remarkable correlation coefficients and fold changes (> 100 between the most resistant and sensitive cells). Modeling based on regression analysis indicated that a subset of phosphorylation sites could be used to predict response to the tested drugs. Quantitative analysis of phosphorylation motifs indicated that resistant and sensitive cells differed in their patterns of kinase activities, but, interestingly, phosphorylations correlating with responses were not on members of the pathway being targeted; instead, these mainly were on parallel kinase pathways. CONCLUSION: This study reveals that the information on kinase activation encoded in phosphoproteomics data correlates remarkably well with the phenotypic responses of cancer cells to compounds that target kinase signaling and could be useful for the identification of novel markers of resistance or sensitivity to drugs that target the signaling network.

    Genome biology 2013;14;4;R37

  • Persistence of HIV-1 Transmitted Drug Resistance Mutations.

    Castro H, Pillay D, Cane P, Asboe D, Cambiano V, Phillips A, Dunn DT and UK Collaborative Group on HIV Drug Resistance

    Medical Research Council Clinical Trials Unit, London, United Kingdom.

    There are few data on the persistence of individual human immunodeficiency virus type 1 (HIV-1) transmitted drug resistance (TDR) mutations in the absence of selective drug pressure. We studied 313 patients in whom TDR mutations were detected at their first resistance test and who had a subsequent test performed while ART-naive. The rate at which mutations became undetectable was estimated using exponential regression accounting for interval censoring. Most thymidine analogue mutations (TAMs) and T215 revertants (but not T215F/Y) were found to be highly stable, with NNRTI and PI mutations being relatively less persistent. Our estimates are important for informing HIV transmission models.

    The Journal of infectious diseases 2013;208;9;1459-63

  • Comprehensive assignment of roles for Salmonella typhimurium genes in intestinal colonization of food-producing animals.

    Chaudhuri RR, Morgan E, Peters SE, Pleasance SJ, Hudson DL, Davies HM, Wang J, van Diemen PM, Buckley AM, Bowen AJ, Pullinger GD, Turner DJ, Langridge GC, Turner AK, Parkhill J, Charles IG, Maskell DJ and Stevens MP

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    Chickens, pigs, and cattle are key reservoirs of Salmonella enterica, a foodborne pathogen of worldwide importance. Though a decade has elapsed since publication of the first Salmonella genome, thousands of genes remain of hypothetical or unknown function, and the basis of colonization of reservoir hosts is ill-defined. Moreover, previous surveys of the role of Salmonella genes in vivo have focused on systemic virulence in murine typhoid models, and the genetic basis of intestinal persistence and thus zoonotic transmission has received little study. We therefore screened pools of random insertion mutants of S. enterica serovar Typhimurium in chickens, pigs, and cattle by transposon-directed insertion-site sequencing (TraDIS). The identity and relative fitness in each host of 7,702 mutants were simultaneously assigned by massively parallel sequencing of transposon-flanking regions. Phenotypes were assigned to 2,715 different genes, providing a phenotype-genotype map of unprecedented resolution. The data are self-consistent in that multiple independent mutations in a given gene or pathway were observed to exert a similar fitness cost. Phenotypes were further validated by screening defined null mutants in chickens. Our data indicate that a core set of genes is required for infection of all three host species, and smaller sets of genes may mediate persistence in specific hosts. By assigning roles to thousands of Salmonella genes in key reservoir hosts, our data facilitate systems approaches to understand pathogenesis and the rational design of novel cross-protective vaccines and inhibitors. Moreover, by simultaneously assigning the genotype and phenotype of over 90% of mutants screened in complex pools, our data establish TraDIS as a powerful tool to apply rich functional annotation to microbial genomes with minimal animal use.

    PLoS genetics 2013;9;4;e1003456

  • Mcph1-Deficient Mice Reveal a Role for MCPH1 in Otitis Media.

    Chen J, Ingham N, Clare S, Raisen C, Vancollie VE, Ismail O, McIntyre RE, Tsang SH, Mahajan VB, Dougan G, Adams DJ, White JK and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Otitis media is a common reason for hearing loss, especially in children. Otitis media is a multifactorial disease and environmental factors, anatomic dysmorphology and genetic predisposition can all contribute to its pathogenesis. However, the reasons for the variable susceptibility to otitis media are elusive. MCPH1 mutations cause primary microcephaly in humans. So far, no hearing impairment has been reported either in the MCPH1 patients or mouse models with Mcph1 deficiency. In this study, Mcph1-deficient (Mcph1(tm1a/tm1a)) mice were produced using embryonic stem cells with a targeted mutation by the Sanger Institute's Mouse Genetics Project. Auditory brainstem response measurements revealed that Mcph1(tm1a/tm1a) mice had mild to moderate hearing impairment with around 70% penetrance. We found otitis media with effusion in the hearing-impaired Mcph1(tm1a/tm1a) mice by anatomic and histological examinations. Expression of Mcph1 in the epithelial cells of middle ear cavities supported its involvement in the development of otitis media. Other defects of Mcph1(tm1a/tm1a) mice included small skull sizes, increased micronuclei in red blood cells, increased B cells and ocular abnormalities. These findings not only recapitulated the defects found in other Mcph1-deficient mice or MCPH1 patients, but also revealed an unexpected phenotype, otitis media with hearing impairment, which suggests Mcph1 is a new gene underlying genetic predisposition to otitis media.

    PloS one 2013;8;3;e58156

  • Proteomic comparison of historic and recently emerged hypervirulent Clostridium difficile strains.

    Chen JW, Scaria J, Mao C, Sobral B, Zhang S, Lawley T and Chang YF

    Department of Population Medicine and Diagnostic Sciences, Cornell University, Ithaca, New York 14853, United States.

    Clostridium difficile in recent years has undergone rapid evolution and has emerged as a serious human pathogen. Proteomic approaches can improve the understanding of the diversity of this important pathogen, especially in comparing the adaptive ability of different C. difficile strains. In this study, TMT labeling and nanoLC-MS/MS driven proteomics were used to investigate the responses of four C. difficile strains to nutrient shift and osmotic shock. We detected 126 and 67 differentially expressed proteins in at least one strain under nutrition shift and osmotic shock, respectively. During nutrient shift, several components of the phosphotransferase system (PTS) were found to be differentially expressed, which indicated that the carbon catabolite repression (CCR) was relieved to allow the expression of enzymes and transporters responsible for the utilization of alternate carbon sources. Some classical osmotic shock associated proteins, such as GroEL, RecA, CspG, and CspF, and other stress proteins such as PurG and SerA were detected during osmotic shock. Furthermore, the recently emerged strains were found to contain a more robust gene network in response to both stress conditions. This work represents the first comparative proteomic analysis of historic and recently emerged hypervirulent C. difficile strains, complementing the previously published proteomics studies utilizing only one reference strain.

    Journal of proteome research 2013;12;3;1151-61

  • Hierarchical and spatially explicit clustering of DNA sequences with BAPS software.

    Cheng L, Connor TR, Sirén J, Aanensen DM and Corander J

    Department of Mathematics and Statistics, University of Helsinki, 00014, Finland; Cardiff School of Biosciences, Cardiff University, Cardiff, CF10 3AX, UK; Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, London, W2 1PG, UK; Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Phylogeographical analyses have become commonplace for a myriad of organisms with the advent of cheap DNA sequencing technologies. Bayesian model-based clustering is a powerful tool for detecting important patterns in such data and can be used to decipher even quite subtle signals of systematic differences in molecular variation. Here we introduce two upgrades to the Bayesian Analysis of Population Structure (BAPS) software, which enable (1) spatially explicit modeling of variation in DNA sequences, and (2) hierarchical clustering of DNA sequence data to reveal nested genetic population structures. We provide a direct interface to map the results from spatial clustering with Google Maps using the portal http://www.spatialepidemiology.net/ and illustrate this approach using sequence data from Borrelia burgdorferi. The usefulness of hierarchical clustering is demonstrated through an analysis of the metapopulation structure within a bacterial population experiencing a high level of local horizontal gene transfer. The tools that are introduced are freely available at http://www.helsinki.fi/bsg/software/BAPS/.

    Molecular biology and evolution 2013

  • Your gut microbiota are what you eat.

    Chewapreecha C

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2013;12;1;8

  • Calreticulin gene exon 9 frameshift mutations in patients with thrombocytosis.

    Chi J, Nicolaou KA, Nicolaidou V, Koumas L, Mitsidou A, Pierides C, Manoloukos M, Barbouti K, Melanthiou F, Prokopiou C, Vassiliou GS and Costeas P

    The Center for the Study of Haematological Malignancies, Nicosia, Cyprus.

    Leukemia 2013

  • ISPD gene mutations are a common cause of congenital and limb-girdle muscular dystrophies.

    Cirak S, Foley AR, Herrmann R, Willer T, Yau S, Stevens E, Torelli S, Brodd L, Kamynina A, Vondracek P, Roper H, Longman C, Korinthenberg R, Marrosu G, Nürnberg P, UK10K Consortium, Michele DE, Plagnol V, Hurles M, Moore SA, Sewry CA, Campbell KP, Voit T and Muntoni F

    Dubowitz Neuromuscular Centre, UCL Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK. f.muntoni@ucl.ac.uk.

    Dystroglycanopathies are a clinically and genetically diverse group of recessively inherited conditions ranging from the most severe of the congenital muscular dystrophies, Walker-Warburg syndrome, to mild forms of adult-onset limb-girdle muscular dystrophy. Their hallmark is a reduction in the functional glycosylation of α-dystroglycan, which can be detected in muscle biopsies. An important part of this glycosylation is a unique O-mannosylation, essential for the interaction of α-dystroglycan with extracellular matrix proteins such as laminin-α2. Mutations in eight genes coding for proteins in the glycosylation pathway are responsible for ∼50% of dystroglycanopathy cases. Despite multiple efforts using traditional positional cloning, the causative genes for unsolved dystroglycanopathy cases have escaped discovery for several years. In a recent collaborative study, we discovered that loss-of-function recessive mutations in a novel gene, called isoprenoid synthase domain containing (ISPD), are a relatively common cause of Walker-Warburg syndrome. In this article, we report the involvement of the ISPD gene in milder dystroglycanopathy phenotypes ranging from congenital muscular dystrophy to limb-girdle muscular dystrophy and identified allelic ISPD variants in nine cases belonging to seven families. In two ambulant cases, there was evidence of structural brain involvement, whereas in seven, the clinical manifestation was restricted to a dystrophic skeletal muscle phenotype. Although the function of ISPD in mammals is not yet known, mutations in this gene clearly lead to a reduction in the functional glycosylation of α-dystroglycan, which not only causes the severe Walker-Warburg syndrome but is also a common cause of the milder forms of dystroglycanopathy.

    Brain : a journal of neurology 2013;136;Pt 1;269-81

  • Elucidating emergence and transmission of multidrug-resistant tuberculosis in treatment experienced patients by whole genome sequencing.

    Clark TG, Mallard K, Coll F, Preston M, Assefa S, Harris D, Ogwang S, Mumbowa F, Kirenga B, O'Sullivan DM, Okwera A, Eisenach KD, Joloba M, Bentley SD, Ellner JJ, Parkhill J, Jones-López EC and McNerney R

    Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom; Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, United Kingdom.

    Background: Understanding the emergence and spread of multidrug-resistant tuberculosis (MDR-TB) is crucial for its control. MDR-TB in previously treated patients is generally attributed to the selection of drug resistant mutants during inadequate therapy rather than transmission of a resistant strain. Traditional genotyping methods are not sufficient to distinguish strains in populations with a high burden of tuberculosis and it has previously been difficult to assess the degree of transmission in these settings. We have used whole genome analysis to investigate M. tuberculosis strains isolated from treatment experienced patients with MDR-TB in Uganda over a period of four years. We used high throughput genome sequencing technology to investigate small polymorphisms and large deletions in 51 Mycobacterium tuberculosis samples from 41 treatment-experienced TB patients attending a TB referral and treatment clinic in Kampala. This was a convenience sample representing 69% of MDR-TB cases identified over the four year period. Low polymorphism was observed in longitudinal samples from individual patients (2-15 SNPs). Clusters of samples with less than 50 SNPs variation were examined. Three clusters comprising a total of 8 patients were found with almost identical genetic profiles, including mutations predictive for resistance to rifampicin and isoniazid, suggesting transmission of MDR-TB. Two patients with previous drug susceptible disease were found to have acquired MDR strains, one of which shared its genotype with an isolate from another patient in the cohort. Conclusions: Whole genome sequence analysis identified MDR-TB strains that were shared by more than one patient. The transmission of multidrug-resistant disease in this cohort of retreatment patients emphasises the importance of early detection and need for infection control. Consideration should be given to rapid testing for drug resistance in patients undergoing treatment to monitor the emergence of resistance and permit early intervention to avoid onward transmission.

    PloS one 2013;8;12;e83012

  • Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling.

    Clarke M, Lohan AJ, Liu B, Lagkouvardos I, Roy S, Zafar N, Bertelli C, Schilde C, Kianianmomeni A, Bürglin TR, Frech C, Turcotte B, Kopec KO, Synnott JM, Choo C, Paponov I, Finkler A, Heng Tan CS, Hutchins AP, Weinmeier T, Rattei T, Chu JS, Gimenez G, Irimia M, Rigden DJ, Fitzpatrick DA, Lorenzo-Morales J, Bateman A, Chiu CH, Tang P, Hegemann P, Fromm H, Raoult D, Greub G, Miranda-Saavedra D, Chen N, Nash P, Ginger ML, Horn M, Schaap P, Caler L and Loftus BJ

    Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland. brendan.loftus@ucd.ie.

    BACKGROUND: The Amoebozoa constitute one of the primary divisions of eukaryotes, encompassing taxa of both biomedical and evolutionary importance, yet its genomic diversity remains largely unsampled. Here we present an analysis of a whole genome assembly of Acanthamoeba castellanii (Ac), the first representative from a solitary free-living amoebozoan. RESULTS: Ac encodes 15,455 compact intron-rich genes, a significant number of which are predicted to have arisen through inter-kingdom lateral gene transfer (LGT). A majority of the LGT candidates have undergone a substantial degree of intronization and Ac appears to have incorporated them into established transcriptional programs. Ac manifests a complex signaling and cell communication repertoire, including a complete tyrosine kinase signaling toolkit and a comparable diversity of predicted extracellular receptors to that found in the facultatively multicellular dictyostelids. An important environmental host of a diverse range of bacteria and viruses, Ac utilizes a diverse repertoire of predicted pattern recognition receptors, many with predicted orthologous functions in the innate immune systems of higher organisms. CONCLUSIONS: Our analysis highlights the important role of LGT in the biology of Ac and in the diversification of microbial eukaryotes. The early evolution of a key signaling facility implicated in the evolution of metazoan multicellularity strongly argues for its emergence early in the Unikont lineage. Overall, the availability of an Ac genome should aid in deciphering the biology of the Amoebozoa and facilitate functional genomic studies in this important model organism and environmental host.

    Genome biology 2013;14;2;R11

  • Identification of seven loci affecting mean telomere length and their association with disease.

    Codd V, Nelson CP, Albrecht E, Mangino M, Deelen J, Buxton JL, Hottenga JJ, Fischer K, Esko T, Surakka I, Broer L, Nyholt DR, Mateo Leach I, Salo P, Hägg S, Matthews MK, Palmen J, Norata GD, O'Reilly PF, Saleheen D, Amin N, Balmforth AJ, Beekman M, de Boer RA, Böhringer S, Braund PS, Burton PR, de Craen AJ, Denniff M, Dong Y, Douroudis K, Dubinina E, Eriksson JG, Garlaschelli K, Guo D, Hartikainen AL, Henders AK, Houwing-Duistermaat JJ, Kananen L, Karssen LC, Kettunen J, Klopp N, Lagou V, van Leeuwen EM, Madden PA, Mägi R, Magnusson PK, Männistö S, McCarthy MI, Medland SE, Mihailov E, Montgomery GW, Oostra BA, Palotie A, Peters A, Pollard H, Pouta A, Prokopenko I, Ripatti S, Salomaa V, Suchiman HE, Valdes AM, Verweij N, Viñuela A, Wang X, Wichmann HE, Widen E, Willemsen G, Wright MJ, Xia K, Xiao X, van Veldhuisen DJ, Catapano AL, Tobin MD, Hall AS, Blakemore AI, van Gilst WH, Zhu H, Consortium C, Erdmann J, Reilly MP, Kathiresan S, Schunkert H, Talmud PJ, Pedersen NL, Perola M, Ouwehand W, Kaprio J, Martin NG, van Duijn CM, Hovatta I, Gieger C, Metspalu A, Boomsma DI, Jarvelin MR, Slagboom PE, Thompson JR, Spector TD, van der Harst P and Samani NJ

    Department of Cardiovascular Sciences, University of Leicester, Leicester, UK.

    Interindividual variation in mean leukocyte telomere length (LTL) is associated with cancer and several age-associated diseases. We report here a genome-wide meta-analysis of 37,684 individuals with replication of selected variants in an additional 10,739 individuals. We identified seven loci, including five new loci, associated with mean LTL (P < 5 × 10^-8). Five of the loci contain candidate genes (TERC, TERT, NAF1, OBFC1 and RTEL1) that are known to be involved in telomere biology. Lead SNPs at two loci (TERC and TERT) associate with several cancers and other diseases, including idiopathic pulmonary fibrosis. Moreover, a genetic risk score analysis combining lead variants at all 7 loci in 22,233 coronary artery disease cases and 64,762 controls showed an association of the alleles associated with shorter LTL with increased risk of coronary artery disease (21% (95% confidence interval, 5-35%) per standard deviation in LTL, P = 0.014). Our findings support a causal role of telomere-length variation in some age-related diseases.

    Funded by: British Heart Foundation: RG/08/014/24067; Medical Research Council: G0902313; NIDA NIH HHS: R56 DA012854

    Nature genetics 2013;45;4;422-7, 427e1-2

  • Real-time genomic epidemiological evaluation of human campylobacter isolates by use of whole-genome multilocus sequence typing.

    Cody AJ, McCarthy ND, Jansen van Rensburg M, Isinkaye T, Bentley SD, Parkhill J, Dingle KE, Bowler IC, Jolley KA and Maiden MC

    Department of Zoology, University of Oxford, Oxford, United Kingdom.

    Sequence-based typing is essential for understanding the epidemiology of Campylobacter infections, a major worldwide cause of bacterial gastroenteritis. We demonstrate the practical and rapid exploitation of whole-genome sequencing to provide routine definitive characterization of Campylobacter jejuni and Campylobacter coli for clinical and public health purposes. Short-read data from 384 Campylobacter clinical isolates collected over 4 months in Oxford, United Kingdom, were assembled de novo. Contigs were deposited at the pubMLST.org/campylobacter website and automatically annotated for 1,667 loci. Typing and phylogenetic information was extracted and comparative analyses were performed for various subsets of loci, up to the level of the whole genome, using the Genome Comparator and Neighbor-net algorithms. The assembled sequences (for 379 isolates) were diverse and resembled collections from previous studies of human campylobacteriosis. Small subsets of very closely related isolates originated mainly from repeated sampling from the same patients and, in one case, likely laboratory contamination. Much of the within-patient variation occurred in phase-variable genes. Clinically and epidemiologically informative data can be extracted from whole-genome sequence data in real time with straightforward, publicly available tools. These analyses are highly scalable, are transparent, do not require closely related genome reference sequences, and provide improved resolution (i) among Campylobacter clonal complexes and (ii) between very closely related isolates. Additionally, these analyses rapidly differentiated unrelated isolates, allowing the detection of single-strain clusters. The approach is widely applicable to analyses of human bacterial pathogens in real time in clinical laboratories, with little specialist training required.

    Journal of clinical microbiology 2013;51;8;2526-34

  • A genetic study of Wilson's disease in the United Kingdom.

    Coffey AJ, Durkie M, Hague S, McLay K, Emmerson J, Lo C, Klaffke S, Joyce CJ, Dhawan A, Hadzic N, Mieli-Vergani G, Kirk R, Elizabeth Allen K, Nicholl D, Wong S, Griffiths W, Smithson S, Giffin N, Taha A, Connolly S, Gillett GT, Tanner S, Bonham J, Sharrack B, Palotie A, Rattray M, Dalton A and Bandmann O

    Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Previous studies have failed to identify mutations in the Wilson's disease gene ATP7B in a significant number of clinically diagnosed cases. This has led to concerns about genetic heterogeneity for this condition but also suggested the presence of unusual mutational mechanisms. We now present our findings in 181 patients from the United Kingdom with clinically and biochemically confirmed Wilson's disease. A total of 116 different ATP7B mutations were detected, 32 of which are novel. The overall mutation detection frequency was 98%. The likelihood of mutations in genes other than ATP7B causing a Wilson's disease phenotype is therefore very low. We report the first cases with Wilson's disease due to segmental uniparental isodisomy as well as three patients with three ATP7B mutations and three families with Wilson's disease in two consecutive generations. We determined the genetic prevalence of Wilson's disease in the United Kingdom by sequencing the entire coding region and adjacent splice sites of ATP7B in 1000 control subjects. The frequency of all single nucleotide variants with in silico evidence of pathogenicity (Class 1 variant) was 0.056 or 0.040 if only those single nucleotide variants that had previously been reported as mutations in patients with Wilson's disease were included in the analysis (Class 2 variant). The frequency of heterozygote, putative or definite disease-associated ATP7B mutations was therefore considerably higher than the previously reported occurrence of 1:90 (or 0.011) for heterozygote ATP7B mutation carriers in the general population (P < 2.2 × 10^-16 for Class 1 variants or P < 5 × 10^-11 for Class 2 variants only). Subsequent exclusion of four Class 2 variants without additional in silico evidence of pathogenicity led to a further reduction of the mutation frequency to 0.024. Using this most conservative approach, the calculated frequency of individuals predicted to carry two mutant pathogenic ATP7B alleles is 1:7026 and thus still considerably higher than the typically reported prevalence of Wilson's disease of 1:30 000 (P = 0.00093). Our study provides strong evidence for monogenic inheritance of Wilson's disease. It also has major implications for ATP7B analysis in clinical practice, namely the need to consider unusual genetic mechanisms such as uniparental disomy or the possible presence of three ATP7B mutations. The marked discrepancy between the genetic prevalence and the number of clinically diagnosed cases of Wilson's disease may be due to both reduced penetrance of ATP7B mutations and failure to diagnose patients with this eminently treatable disorder.

    Brain : a journal of neurology 2013
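
    The prevalence arithmetic in the Wilson's disease abstract above can be illustrated with a minimal sketch. It is not the authors' calculation. It assumes Hardy-Weinberg equilibrium and a rare allele, so that the allele frequency is about half the carrier frequency; the function names are hypothetical. Under those assumptions the rounded carrier frequency of 0.024 gives roughly 1 in 6,900, close to the reported 1:7026 (which presumably rests on unrounded inputs), and the older 1:90 carrier figure gives about 1 in 32,400, in line with the typically reported prevalence of 1:30 000.

```python
def homozygote_frequency(carrier_frequency: float) -> float:
    """Approximate frequency of individuals carrying two mutant alleles.

    Assumption (not stated in the abstract): Hardy-Weinberg equilibrium
    and a rare allele, so allele frequency q is about carrier_frequency / 2
    and the two-mutant-allele frequency is q squared.
    """
    q = carrier_frequency / 2.0
    return q * q


def one_in(frequency: float) -> int:
    """Express a frequency as the N in '1 in N', rounded to an integer."""
    return round(1.0 / frequency)


if __name__ == "__main__":
    # Carrier frequencies quoted in the abstract.
    examples = [
        ("previously reported carriers (1:90)", 1 / 90),
        ("most conservative estimate (0.024)", 0.024),
    ]
    for label, carriers in examples:
        n = one_in(homozygote_frequency(carriers))
        print(f"{label}: about 1 in {n}")
```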

  • Two Pfam protein families characterized by a crystal structure of protein lpg2210 from Legionella pneumophila.

    Coggill P, Eberhardt RY, Finn RD, Chang Y, Jaroszewski L, Godzik A, Das D, Xu Q, Axelrod HL, Aravind L, Murzin AG and Bateman A

    Background: Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology. Results: We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria. Conclusions: Based on our analyses we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity with peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.

    BMC bioinformatics 2013;14;1;265

  • Toward knowledge support for analysis and interpretation of complex traits.

    Collier N, Oellrich A and Groza T

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. collier@ebi.ac.uk.

    The systematic description of complex traits, from the organism to the cellular level, is important for hypothesis generation about underlying disease mechanisms. We discuss how intelligent algorithms might provide support, leading to faster throughput.

    Genome biology 2013;14;9;214

  • Learning to Recognize Phenotype Candidates in the Auto-Immune Literature Using SVM Re-Ranking.

    Collier N, Tran MV, Le HQ, Ha QT, Oellrich A and Rebholz-Schuhmann D

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom; National Institute of Informatics, Tokyo, Japan.

    The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for bio-medical text mining. Any progress will support knowledge discovery and linkage to other resources. However, because of their wide variation, a number of challenges still remain in terms of their identification and semantic normalisation before they can be fully exploited for research purposes. This paper presents novel techniques for identifying potential complex phenotype mentions by exploiting a hybrid model based on machine learning, rules and dictionary matching. A systematic study is made of how to combine sequence labels from these modules as well as the merits of various ontological resources. We evaluated our approach on a subset of Medline abstracts cited by the Online Mendelian Inheritance in Man database related to auto-immune diseases. Using partial matching the best micro-averaged F-score for phenotypes and five other entity classes was 79.9%. A best performance of 75.3% was achieved for phenotype candidates using all semantic resources. We observed the advantage of using SVM-based learn-to-rank for sequence label combination over maximum entropy and a priority list approach. The results indicate that the identification of simple entity types such as chemicals and genes is robustly supported by single semantic resources, whereas phenotypes require combinations. Altogether we conclude that our approach coped well with the compositional structure of phenotypes in the auto-immune domain.

    PloS one 2013;8;10;e72965
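
    The micro-averaged F-score quoted in the abstract above pools true positives, false positives and false negatives across all entity classes before computing precision and recall. A minimal sketch of that pooling follows. It is a generic illustration, not the authors' evaluation code; the function name and the counts in the usage example are invented for illustration and are not taken from the paper.

```python
from typing import Dict, Tuple

# Per-class counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def micro_f_score(per_class: Dict[str, Counts]) -> float:
    """Micro-averaged F1: pool the counts over classes, then compute F1."""
    tp = sum(counts[0] for counts in per_class.values())
    fp = sum(counts[1] for counts in per_class.values())
    fn = sum(counts[2] for counts in per_class.values())
    if tp == 0:
        # No correct predictions at all: define the score as zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example = {
        "phenotype": (60, 20, 20),
        "gene": (90, 10, 10),
    }
    print(f"micro-averaged F-score: {micro_f_score(example):.3f}")
```

    Because the counts are pooled, frequent classes weigh more heavily than rare ones, which is one reason a separate per-class figure (such as the 75.3% for phenotype candidates) is reported alongside the pooled score.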

  • Genomic and proteomic dissection of the ubiquitous plant pathogen, Armillaria mellea: toward a new infection model system.

    Collins C, Keane TM, Turner DJ, O'Keeffe G, Fitzpatrick DA and Doyle S

    Department of Biology, National University of Ireland Maynooth, Maynooth, Co Kildare, Ireland.

    Armillaria mellea is a major plant pathogen. Yet, no large-scale "-omics" data are available to enable new studies, and limited experimental models are available to investigate basidiomycete pathogenicity. Here we reveal that the A. mellea genome comprises 58.35 Mb, contains 14473 gene models, of average length 1575 bp (4.72 introns/gene). Tandem mass spectrometry identified 921 mycelial (n = 629 unique) and secreted (n = 183 unique) proteins. Almost 100 mycelial proteins were either species-specific or previously unidentified at the protein level. A number of proteins (n = 111) was detected in both mycelia and culture supernatant extracts. Signal sequence occurrence was 4-fold greater for secreted (50.2%) compared to mycelial (12%) proteins. Analyses revealed a rich reservoir of carbohydrate degrading enzymes, laccases, and lignin peroxidases in the A. mellea proteome, reminiscent of both basidiomycete and ascomycete glycodegradative arsenals. We discovered that A. mellea exhibits a specific killing effect against Candida albicans during coculture. Proteomic investigation of this interaction revealed the unique expression of defensive and potentially offensive A. mellea proteins (n = 30). Overall, our data reveal new insights into the origin of basidiomycete virulence and we present a new model system for further studies aimed at deciphering fungal pathogenic mechanisms.

    Journal of proteome research 2013;12;6;2552-70

  • Small effective population size and genetic homogeneity in the Val Borbera isolate.

    Colonna V, Pistis G, Bomba L, Mona S, Matullo G, Boano R, Sala C, Viganò F, Torroni A, Achilli A, Hooshiar Kashani B, Malerba G, Gambaro G, Soranzo N and Toniolo D

    Institute of Genetics and Biophysics 'A. Buzzati-Traverso', National Research Council (CNR), Naples, Italy. vincenza.colonna@igb.cnr.it

    Population isolates are a valuable resource for medical genetics because of their reduced genetic, phenotypic and environmental heterogeneity. Further, extended linkage disequilibrium (LD) allows accurate haplotyping and imputation. In this study, we use nuclear and mitochondrial DNA data to determine to what extent the geographically isolated population of the Val Borbera valley also presents features of genetic isolation. We performed a comparative analysis of population structure and estimated effective population size exploiting LD data. We also evaluated haplotype sharing through the analysis of segments of autozygosity. Our findings reveal that the valley has features characteristic of a genetic isolate, including reduced genetic heterogeneity and reduced effective population size. We show that this population has been subject to prolonged genetic drift and thus we expect many variants that are rare in the general population to reach significant frequency values in the valley, making this population suitable for the identification of rare variants underlying complex traits.

    European journal of human genetics : EJHG 2013;21;1;89-94

  • Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans.

    Comas I, Coscolla M, Luo T, Borrell S, Holt KE, Kato-Maeda M, Parkhill J, Malla B, Berg S, Thwaites G, Yeboah-Manu D, Bothamley G, Mei J, Wei L, Bentley S, Harris SR, Niemann S, Diel R, Aseffa A, Gao Q, Young D and Gagneux S

    [1] Genomics and Health Unit, Centre for Public Health Research (CSISP-FISABIO), Valencia, Spain. [2] CIBER (Centros de Investigación Biomédica en Red) in Epidemiology and Public Health, Barcelona, Spain.

    Tuberculosis caused 20% of all human deaths in the Western world between the seventeenth and nineteenth centuries and remains a cause of high mortality in developing countries. In analogy to other crowd diseases, the origin of human tuberculosis has been associated with the Neolithic Demographic Transition, but recent studies point to a much earlier origin. We analyzed the whole genomes of 259 M. tuberculosis complex (MTBC) strains and used this data set to characterize global diversity and to reconstruct the evolutionary history of this pathogen. Coalescent analyses indicate that MTBC emerged about 70,000 years ago, accompanied migrations of anatomically modern humans out of Africa and expanded as a consequence of increases in human population density during the Neolithic period. This long coevolutionary history is consistent with MTBC displaying characteristics indicative of adaptation to both low and high host densities.

    Funded by: Medical Research Council: MC_U117581288, MC_U117588500, U.1175.02.002.00015.01, U117581288; NIAID NIH HHS: AI090928 AND, R01 AI090928; PHS HHS: HHSN266200700022C; Wellcome Trust: 098051

    Nature genetics 2013;45;10;1176-82

  • Epigenetic regulation of COL15A1 in smooth muscle cell replicative aging and atherosclerosis.

    Connelly JJ, Cherepanova OA, Doss JF, Karaoli T, Lillard TS, Markunas CA, Nelson S, Wang T, Ellis PD, Langford CF, Haynes C, Seo DM, Goldschmidt-Clermont PJ, Shah SH, Kraus WE, Hauser ER and Gregory SG

    Department of Medicine and Division of Cardiovascular Medicine and.

    Smooth muscle cell (SMC) proliferation is a hallmark of vascular injury and disease. Global hypomethylation occurs during SMC proliferation in culture and in vivo during neointimal formation. Regardless of the programmed or stochastic nature of hypomethylation, identifying these changes is important in understanding vascular disease, as maintenance of a cell's epigenetic profile is essential for maintaining cellular phenotype. Global hypomethylation of proliferating aortic SMCs and concomitant decrease of DNMT1 expression were identified in culture during passage. An epigenome screen identified regions of the genome that were hypomethylated during proliferation and a region containing Collagen, type XV, alpha 1 (COL15A1) was selected by 'genomic convergence' for characterization. COL15A1 transcript and protein levels increased with passage-dependent decreases in DNA methylation and the transcript was sensitive to treatment with 5-Aza-2'-deoxycytidine, suggesting DNA methylation-mediated gene expression. Phenotypically, knockdown of COL15A1 increased SMC migration and decreased proliferation and Col15a1 expression was induced in an atherosclerotic lesion and localized to the atherosclerotic cap. A sequence variant in COL15A1 that is significantly associated with atherosclerosis (rs4142986, P = 0.017, OR = 1.434) was methylated and methylation of the risk allele correlated with decreased gene expression and increased atherosclerosis in human aorta. In summary, hypomethylation of COL15A1 occurs during SMC proliferation and the consequent increased gene expression may impact SMC phenotype and atherosclerosis formation. Hypomethylated genes, such as COL15A1, provide evidence for concomitant epigenetic regulation and genetic susceptibility, and define a class of causal targets that sit at the intersection of genetic and epigenetic predisposition in the etiology of complex disease.

    Human molecular genetics 2013;22;25;5107-20

  • Detailed molecular characterisation of acute myeloid leukaemia with a normal karyotype using targeted DNA capture.

    Conte N, Varela I, Grove C, Manes N, Yusa K, Moreno T, Segonds-Pichon A, Bench A, Gudgin E, Herman B, Bolli N, Ellis P, Haddad D, Costeas P, Rad R, Scott M, Huntly B, Bradley A and Vassiliou GS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Advances in sequencing technologies are giving unprecedented insights into the spectrum of somatic mutations underlying acute myeloid leukaemia with a normal karyotype (AML-NK). It is clear that the prognosis of individual patients is strongly influenced by the combination of mutations in their leukaemia and that many leukaemias are composed of multiple subclones, with differential susceptibilities to treatment. Here, we describe a method, employing targeted capture coupled with next-generation sequencing and tailored bioinformatic analysis, for the simultaneous study of 24 genes recurrently mutated in AML-NK. Mutational analysis was performed using open source software and an in-house script (Mutation Identification and Analysis Software), which identified dominant clone mutations with 100% specificity. In each of seven cases of AML-NK studied, we identified and verified mutations in 2-4 genes in the main leukaemic clone. Additionally, high sequencing depth enabled us to identify putative subclonal mutations and detect leukaemia-specific mutations in DNA from remission marrow. Finally, we used normalised read depths to detect copy number changes and identified and subsequently verified a tandem duplication of exons 2-9 of MLL and at least one deletion involving PTEN. This methodology reliably detects sequence and copy number mutations, and can thus greatly facilitate the classification, clinical research, diagnosis and management of AML-NK.

    Funded by: Wellcome Trust: 079249, 095663, 100140

    Leukemia 2013;27;9;1820-5

  • Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease.

    Cooper DN, Krawczak M, Polychronakos C, Tyler-Smith C and Kehrer-Sawatzki H

    Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK, cooperDN@cardiff.ac.uk.

    Some individuals with a particular disease-causing mutation or genotype fail to express most if not all features of the disease in question, a phenomenon that is known as 'reduced (or incomplete) penetrance'. Reduced penetrance is not uncommon; indeed, there are many known examples of 'disease-causing mutations' that fail to cause disease in at least a proportion of the individuals who carry them. Reduced penetrance may therefore explain not only why genetic diseases are occasionally transmitted through unaffected parents, but also why healthy individuals can harbour quite large numbers of potentially disadvantageous variants in their genomes without suffering any obvious ill effects. Reduced penetrance can be a function of the specific mutation(s) involved or of allele dosage. It may also result from differential allelic expression, copy number variation or the modulating influence of additional genetic variants in cis or in trans. The penetrance of some pathogenic genotypes is known to be age- and/or sex-dependent. Variable penetrance may also reflect the action of unlinked modifier genes, epigenetic changes or environmental factors. At least in some cases, complete penetrance appears to require the presence of one or more genetic variants at other loci. In this review, we summarize the evidence for reduced penetrance being a widespread phenomenon in human genetics and explore some of the molecular mechanisms that may help to explain this enigmatic characteristic of human inherited disease.

    Funded by: Wellcome Trust: 098051

    Human genetics 2013;132;10;1077-130

  • Novel Mycobacterium tuberculosis complex isolate from a wild chimpanzee.

    Coscolla M, Lewin A, Metzger S, Maetz-Rennsing K, Calvignac-Spencer S, Nitsche A, Dabrowski PW, Radonic A, Niemann S, Parkhill J, Couacy-Hymann E, Feldman J, Comas I, Boesch C, Gagneux S and Leendertz FH

    Swiss Tropical and Public Health Institute, Basel, Switzerland.

    Tuberculosis (TB) is caused by gram-positive bacteria known as the Mycobacterium tuberculosis complex (MTBC). MTBC include several human-associated lineages and several variants adapted to domestic and, more rarely, wild animal species. We report an M. tuberculosis strain isolated from a wild chimpanzee in Côte d'Ivoire that was shown by comparative genomic and phylogenomic analyses to belong to a new lineage of MTBC, closer to the human-associated lineage 6 (also known as M. africanum West Africa 2) than to the other classical animal-associated MTBC strains. These results show that the general view of the genetic diversity of MTBC is limited and support the possibility that other MTBC variants exist, particularly in wild mammals in Africa. Exploring this diversity is crucial to the understanding of the biology and evolutionary history of this widespread infectious disease.

    Funded by: NIAID NIH HHS: AI090928; PHS HHS: HHSN266200700022C; Wellcome Trust

    Emerging infectious diseases 2013;19;6;969-76

  • Full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus.

    Cotten M, Lam TT, Watson SJ, Palser AL, Petrova V, Grant P, Pybus OG, Rambaut A, Guan Y, Pillay D, Kellam P and Nastouli E

    Wellcome Trust Sanger Institute, Hinxton, UK.

    A novel betacoronavirus associated with lethal respiratory and renal complications was recently identified in patients from several countries in the Middle East. We report the deep genome sequencing of the virus directly from a patient's sputum sample. Our high-throughput sequencing yielded a substantial depth of genome sequence assembly and showed the minority viral variants in the specimen. Detailed phylogenetic analysis of the virus genome (England/Qatar/2012) revealed its close relationship to European bat coronaviruses circulating among the bat species of the Vespertilionidae family. Molecular clock analysis showed that the 2 human infections of this betacoronavirus in June 2012 (EMC/2012) and September 2012 (England/Qatar/2012) share a common virus ancestor most likely considerably before early 2012, suggesting the human diversity is the result of multiple zoonotic events.

    Funded by: Medical Research Council: MR/K006584/1; Wellcome Trust: 093724, 095831

    Emerging infectious diseases 2013;19;5;736-42B

  • Genome-wide association and longitudinal analyses reveal genetic loci linking pubertal height growth, pubertal timing and childhood adiposity.

    Cousminer DL, Berry DJ, Timpson NJ, Ang W, Thiering E, Byrne EM, Taal HR, Huikari V, Bradfield JP, Kerkhof M, Groen-Blokhuis MM, Kreiner-Møller E, Marinelli M, Holst C, Leinonen JT, Perry JR, Surakka I, Pietiläinen O, Kettunen J, Anttila V, Kaakinen M, Sovio U, Pouta A, Das S, Lagou V, Power C, Prokopenko I, Evans DM, Kemp JP, St Pourcain B, Ring S, Palotie A, Kajantie E, Osmond C, Lehtimäki T, Viikari JS, Kähönen M, Warrington NM, Lye SJ, Palmer LJ, Tiesler CM, Flexeder C, Montgomery GW, Medland SE, Hofman A, Hakonarson H, Guxens M, Bartels M, Salomaa V, ReproGen Consortium, Murabito JM, Kaprio J, Sørensen TI, Ballester F, Bisgaard H, Boomsma DI, Koppelman GH, Grant SF, Jaddoe VW, Martin NG, Heinrich J, Pennell CE, Raitakari OT, Eriksson JG, Smith GD, Hyppönen E, Järvelin MR, McCarthy MI, Ripatti S, Widén E and Early Growth Genetics (EGG) Consortium

    A full list of members is provided in the Supplementary Material.

    The pubertal height growth spurt is a distinctive feature of childhood growth reflecting both the central onset of puberty and local growth factors. Although little is known about the underlying genetics, growth variability during puberty correlates with adult risks for hormone-dependent cancer and adverse cardiometabolic health. The only gene so far associated with pubertal height growth, LIN28B, pleiotropically influences childhood growth, puberty and cancer progression, pointing to shared underlying mechanisms. To discover genetic loci influencing pubertal height and growth and to place them in context of overall growth and maturation, we performed genome-wide association meta-analyses in 18 737 European samples utilizing longitudinally collected height measurements. We found significant associations (P < 1.67 × 10⁻⁸) at 10 loci, including LIN28B. Five loci associated with pubertal timing, all impacting multiple aspects of growth. In particular, a novel variant correlated with expression of MAPK3, and associated both with increased prepubertal growth and earlier menarche. Another variant near ADCY3-POMC associated with increased body mass index, reduced pubertal growth and earlier puberty. Whereas epidemiological correlations suggest that early puberty marks a pathway from rapid prepubertal growth to reduced final height and adult obesity, our study shows that individual loci associating with pubertal growth have variable longitudinal growth patterns that may differ from epidemiological observations. Overall, this study uncovers part of the complex genetic architecture linking pubertal height growth, the timing of puberty and childhood obesity and provides new information to pinpoint processes linking these traits.

    Human molecular genetics 2013;22;13;2735-47

  • Large scale variation in DNA copy number in chicken breeds.

    Crooijmans RP, Fife MS, Fitzgerald TW, Strickland S, Cheng HH, Kaiser P, Redon R and Groenen MA

    Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, Wageningen 6700 AH, The Netherlands. richard.crooijmans@wur.nl.

    Background: Detecting genetic variation is a critical step in elucidating the molecular mechanisms underlying phenotypic diversity. Until recently, such detection has mostly focused on single nucleotide polymorphisms (SNPs) because of the ease in screening complete genomes. Another type of variant, copy number variation (CNV), is emerging as a significant contributor to phenotypic variation in many species. Here we describe a genome-wide CNV study using array comparative genomic hybridization (aCGH) in a wide variety of chicken breeds. Results: We identified 3,154 CNVs, grouped into 1,556 CNV regions (CNVRs). Thirty percent of the CNVs were detected in at least 2 individuals. The average size of the CNVs detected was 46.3 kb with the largest CNV, located on GGAZ, being 4.3 Mb. Approximately 75% of the CNVs are copy number losses relative to the Red Jungle Fowl reference genome. The genome coverage of CNVRs in this study is 60 Mb, which represents almost 5.4% of the chicken genome. In particular, large gene families such as the keratin gene family and the MHC show extensive CNV. Conclusions: A relatively large group of the CNVs are line-specific, several of which were previously shown to be related to the causative mutation for a number of phenotypic variants. The chance that inter-specific CNVs fall into CNVRs detected in chicken is related to the evolutionary distance between the species. Our results provide a valuable resource for the study of genetic and phenotypic variation in this phenotypically diverse species.

    BMC genomics 2013;14;398

  • Identification of Null Alleles and Deletions from SNP Genotypes for an Intercross Between Domestic and Wild Chickens.

    Crooks L, Carlborg O, Marklund S and Johansson AM

    Wellcome Trust Sanger Institute.

    We analyzed genotypes from ~10K SNPs in two families of an F2 intercross between Red Junglefowl and White Leghorn chickens. Possible null alleles were found by patterns of incompatible and missing genotypes. We estimated that 2.6% of SNPs had null alleles compared to 2.3% with genotyping errors and that 40% of SNPs where a parent and offspring were genotyped as different homozygotes had null alleles. Putative deletions were identified by null alleles at adjacent markers. We found two candidate deletions that were supported by fluorescence intensity data from a 60K SNP chip. One of the candidate deletions was from the Red Junglefowl and one was present in both the Red Junglefowl and White Leghorn. Both candidate deletions spanned protein-coding regions and were close to a previously detected QTL affecting body weight in this population. This study demonstrates that the ~50K SNP genotyping arrays now available for several agricultural species can be used to identify null alleles and deletions in data from large families. We suggest that our approach could be a useful complement to linkage analysis in experimental crosses.

    G3 (Bethesda, Md.) 2013
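
    The abstract above describes finding possible null alleles from patterns of incompatible genotypes, such as a parent and offspring called as different homozygotes. A minimal sketch of that one check is given below, assuming two-character genotype strings and hypothetical individual and SNP names; it is not the authors' implementation, and it does not attempt their handling of missing genotypes or their error-rate estimation.

```python
def opposite_homozygotes(parent, offspring):
    """Return True when parent and offspring are different homozygotes.

    Genotypes are two-character strings such as "AA", "AB" or "BB";
    None marks a missing call and is never counted as incompatible here.
    """
    if parent is None or offspring is None:
        return False
    parent_hom = parent[0] == parent[1]
    offspring_hom = offspring[0] == offspring[1]
    return parent_hom and offspring_hom and parent != offspring


def candidate_null_snps(genotypes, pairs):
    """List SNPs with at least one parent-offspring incompatibility.

    genotypes: {snp: {individual: genotype or None}}
    pairs: list of (parent_id, offspring_id) tuples
    """
    flagged = []
    for snp, calls in genotypes.items():
        if any(opposite_homozygotes(calls.get(p), calls.get(o)) for p, o in pairs):
            flagged.append(snp)
    return flagged


# Hypothetical calls for one parent-offspring pair at three SNPs.
genotypes = {
    "snp1": {"sire": "AA", "chick": "BB"},  # incompatible: candidate null allele
    "snp2": {"sire": "AA", "chick": "AB"},  # compatible
    "snp3": {"sire": None, "chick": "BB"},  # missing parent call, not flagged
}
flagged = candidate_null_snps(genotypes, [("sire", "chick")])
```

    A parent carrying one real allele and one null allele is typically called homozygous, so passing on the null allele can make the offspring appear homozygous for the other parent's allele. As the abstract notes, the same pattern can also arise from genotyping error, so flagged SNPs are candidates rather than confirmed null alleles.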

  • A library of functional recombinant cell surface and secreted Plasmodium falciparum merozoite proteins.

    Crosnier C, Wanaguru M, McDade B, Osier FH, Marsh K, Rayner JC and Wright GJ

    Wellcome Trust Sanger Institute, United Kingdom.

    Malaria, an infectious disease caused by parasites of the Plasmodium genus, is one of the world's major public health concerns causing up to a million deaths annually, mostly due to P. falciparum infections. All of the clinical symptoms are associated with the obligatory blood stage of infection, when a form of the parasite called the merozoite recognises and invades host erythrocytes. During erythrocyte invasion, merozoites are directly exposed to the host humoral immune system making the blood stage a conceptually attractive therapeutic target. Progress in the functional and molecular characterisation of P. falciparum merozoite proteins, however, has been hampered by the technical challenges associated with expressing these proteins in a biochemically active recombinant form. This challenge is particularly acute for extracellular proteins, which are the likely targets of host antibody responses, because they contain structurally-critical posttranslational modifications that are not added by some recombinant expression systems. Here, we report the development of a method that uses a mammalian expression system to compile a protein resource containing the entire ectodomains of 42 P. falciparum merozoite secreted and cell surface proteins, many of which have not previously been characterised. Importantly, we are able to recapitulate known biochemical activities by showing that recombinant MSP1-MSP7 and P12-P41 directly interact, and that both recombinant EBA175 and EBA140 can bind human erythrocytes in a sialic acid-dependent manner. Finally, we use sera from malaria-exposed immune adults to profile the relative immunoreactivity of the proteins and show that the majority of the antigens contain conformational (heat-labile) epitopes. We envisage that this resource of recombinant proteins will make a valuable contribution towards a molecular understanding of the blood stage of P. falciparum infections and facilitate the comparative screening of antigens as blood-stage vaccine candidates.

    Molecular & cellular proteomics : MCP 2013

  • Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

    Cross-Disorder Group of the Psychiatric Genomics Consortium, Lee SH, Ripke S, Neale BM, Faraone SV, Purcell SM, Perlis RH, Mowry BJ, Thapar A, Goddard ME, Witte JS, Absher D, Agartz I, Akil H, Amin F, Andreassen OA, Anjorin A, Anney R, Anttila V, Arking DE, Asherson P, Azevedo MH, Backlund L, Badner JA, Bailey AJ, Banaschewski T, Barchas JD, Barnes MR, Barrett TB, Bass N, Battaglia A, Bauer M, Bayés M, Bellivier F, Bergen SE, Berrettini W, Betancur C, Bettecken T, Biederman J, Binder EB, Black DW, Blackwood DH, Bloss CS, Boehnke M, Boomsma DI, Breen G, Breuer R, Bruggeman R, Cormican P, Buccola NG, Buitelaar JK, Bunney WE, Buxbaum JD, Byerley WF, Byrne EM, Caesar S, Cahn W, Cantor RM, Casas M, Chakravarti A, Chambert K, Choudhury K, Cichon S, Cloninger CR, Collier DA, Cook EH, Coon H, Cormand B, Corvin A, Coryell WH, Craig DW, Craig IW, Crosbie J, Cuccaro ML, Curtis D, Czamara D, Datta S, Dawson G, Day R, De Geus EJ, Degenhardt F, Djurovic S, Donohoe GJ, Doyle AE, Duan J, Dudbridge F, Duketis E, Ebstein RP, Edenberg HJ, Elia J, Ennis S, Etain B, Fanous A, Farmer AE, Ferrier IN, Flickinger M, Fombonne E, Foroud T, Frank J, Franke B, Fraser C, Freedman R, Freimer NB, Freitag CM, Friedl M, Frisén L, Gallagher L, Gejman PV, Georgieva L, Gershon ES, Geschwind DH, Giegling I, Gill M, Gordon SD, Gordon-Smith K, Green EK, Greenwood TA, Grice DE, Gross M, Grozeva D, Guan W, Gurling H, De Haan L, Haines JL, Hakonarson H, Hallmayer J, Hamilton SP, Hamshere ML, Hansen TF, Hartmann AM, Hautzinger M, Heath AC, Henders AK, Herms S, Hickie IB, Hipolito M, Hoefels S, Holmans PA, Holsboer F, Hoogendijk WJ, Hottenga JJ, Hultman CM, Hus V, Ingason A, Ising M, Jamain S, Jones EG, Jones I, Jones L, Tzeng JY, Kähler AK, Kahn RS, Kandaswamy R, Keller MC, Kennedy JL, Kenny E, Kent L, Kim Y, Kirov GK, Klauck SM, Klei L, Knowles JA, Kohli MA, Koller DL, Konte B, Korszun A, Krabbendam L, Krasucki R, Kuntsi J, Kwan P, Landén M, Långström N, Lathrop M, Lawrence J, Lawson WB, Leboyer M, Ledbetter DH, Lee PH, Lencz T, Lesch KP, Levinson DF, Lewis CM, Li J, Lichtenstein P, Lieberman JA, Lin DY, Linszen DH, Liu C, Lohoff FW, Loo SK, Lord C, Lowe JK, Lucae S, Macintyre DJ, Madden PA, Maestrini E, Magnusson PK, Mahon PB, Maier W, Malhotra AK, Mane SM, Martin CL, Martin NG, Mattheisen M, Matthews K, Mattingsdal M, McCarroll SA, McGhee KA, McGough JJ, McGrath PJ, McGuffin P, McInnis MG, McIntosh A, McKinney R, McLean AW, McMahon FJ, McMahon WM, McQuillin A, Medeiros H, Medland SE, Meier S, Melle I, Meng F, Meyer J, Middeldorp CM, Middleton L, Milanova V, Miranda A, Monaco AP, Montgomery GW, Moran JL, Moreno-De-Luca D, Morken G, Morris DW, Morrow EM, Moskvina V, Muglia P, Mühleisen TW, Muir WJ, Müller-Myhsok B, Murtha M, Myers RM, Myin-Germeys I, Neale MC, Nelson SF, Nievergelt CM, Nikolov I, Nimgaonkar V, Nolen WA, Nöthen MM, Nurnberger JI, Nwulia EA, Nyholt DR, O'Dushlaine C, Oades RD, Olincy A, Oliveira G, Olsen L, Ophoff RA, Osby U, Owen MJ, Palotie A, Parr JR, Paterson AD, Pato CN, Pato MT, Penninx BW, Pergadia ML, Pericak-Vance MA, Pickard BS, Pimm J, Piven J, Posthuma D, Potash JB, Poustka F, Propping P, Puri V, Quested DJ, Quinn EM, Ramos-Quiroga JA, Rasmussen HB, Raychaudhuri S, Rehnström K, Reif A, Ribasés M, Rice JP, Rietschel M, Roeder K, Roeyers H, Rossin L, Rothenberger A, Rouleau G, Ruderfer D, Rujescu D, Sanders AR, Sanders SJ, Santangelo SL, Sergeant JA, Schachar R, Schalling M, Schatzberg AF, Scheftner WA, Schellenberg GD, Scherer SW, Schork NJ, Schulze TG, Schumacher J, Schwarz M, Scolnick E, Scott LJ, Shi J, Shilling PD, Shyn SI, Silverman JM, Slager SL, Smalley SL, Smit JH, Smith EN, Sonuga-Barke EJ, St Clair D, State M, Steffens M, Steinhausen HC, Strauss JS, Strohmaier J, Stroup TS, Sutcliffe JS, Szatmari P, Szelinger S, Thirumalai S, Thompson RC, Todorov AA, Tozzi F, Treutlein J, Uhr M, van den Oord EJ, Van Grootheest G, Van Os J, Vicente AM, Vieland VJ, Vincent JB, Visscher PM, Walsh CA, Wassink TH, Watson SJ, Weissman MM, Werge T, Wienker TF, Wijsman EM, Willemsen G, Williams N, Willsey AJ, Witt SH, Xu W, Young AH, Yu TW, Zammit S, Zandi PP, Zhang P, Zitman FG, Zöllner S, International Inflammatory Bowel Disease Genetics Consortium (IIBDGC), Devlin B, Kelsoe JR, Sklar P, Daly MJ, O'Donovan MC, Craddock N, Sullivan PF, Smoller JW, Kendler KS and Wray NR

    The University of Queensland, Queensland Brain Institute, Brisbane, Queensland, Australia.

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.

    Nature genetics 2013;45;9;984-94

  • Population genomics of post-vaccine changes in pneumococcal epidemiology.

    Croucher NJ, Finkelstein JA, Pelton SI, Mitchell PK, Lee GM, Parkhill J, Bentley SD, Hanage WP and Lipsitch M

    Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA.

    Whole-genome sequencing of 616 asymptomatically carried Streptococcus pneumoniae isolates was used to study the impact of the 7-valent pneumococcal conjugate vaccine. Comparison of closely related isolates showed the role of transformation in facilitating capsule switching to non-vaccine serotypes and the emergence of drug resistance. However, such recombination was found to occur at significantly different rates across the species, and the evolution of the population was primarily driven by changes in the frequency of distinct genotypes extant before the introduction of the vaccine. These alterations resulted in little overall effect on accessory genome composition at the population level, contrasting with the decrease in pneumococcal disease rates after the vaccine's introduction.

    Funded by: NIAID NIH HHS: R01 AI066304, R01AI066304; Wellcome Trust: 098051

    Nature genetics 2013;45;6;656-63

  • Bacterial genomes in epidemiology--present and future.

    Croucher NJ, Harris SR, Grad YH and Hanage WP

    Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA.

    Sequence data are well established in the reconstruction of the phylogenetic and demographic scenarios that have given rise to outbreaks of viral pathogens. The application of similar methods to bacteria has been hindered in the main by the lack of high-resolution nucleotide sequence data from quality samples. Developing and already available genomic methods have greatly increased the amount of data that can be used to characterize an isolate and its relationship to others. However, differences in sequencing platforms and data analysis mean that these enhanced data come with a cost in terms of portability: results from one laboratory may not be directly comparable with those from another. Moreover, genomic data for many bacteria bear the mark of a history including extensive recombination, which has the potential to greatly confound phylogenetic and coalescent analyses. Here, we discuss the exacting requirements of genomic epidemiology, and means by which the distorting signal of recombination can be minimized to permit the leverage of growing datasets of genomic data from bacterial pathogens.

    Funded by: NIAID NIH HHS: T32 AI007061; NIGMS NIH HHS: GM088558-01, U54 GM088558; Wellcome Trust: 098051

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2013;368;1614;20120202

  • Dominant Role of Nucleotide Substitution in the Diversification of Serotype 3 Pneumococci over Decades and during a Single Infection.

    Croucher NJ, Mitchell AM, Gould KA, Inverarity D, Barquist L, Feltwell T, Fookes MC, Harris SR, Dordel J, Salter SJ, Browall S, Zemlickova H, Parkhill J, Normark S, Henriques-Normark B, Hinds J, Mitchell TJ and Bentley SD

    Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America.

    Streptococcus pneumoniae of serotype 3 possess a mucoid capsule and cause disease associated with high mortality rates relative to other pneumococci. Phylogenetic analysis of a complete reference genome and 81 draft sequences from clonal complex 180, the predominant serotype 3 clone in much of the world, found most sampled isolates belonged to a clade affected by few diversifying recombinations. However, other isolates indicate significant genetic variation has accumulated over the clonal complex's entire history. Two closely related genomes, one from the blood and another from the cerebrospinal fluid, were obtained from a patient with meningitis. The pair differed in their behaviour in a mouse model of disease and in their susceptibility to antimicrobials, with at least some of these changes attributable to a mutation that up-regulated the patAB efflux pump. This indicates clinically important phenotypic variation can accumulate rapidly through small alterations to the genotype.

    PLoS genetics 2013;9;10;e1003868

  • High-resolution mapping of complex traits with a four-parent advanced intercross yeast population.

    Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJ, Mustonen V, Ibstedt S, Warringer J, Louis EJ, Durbin R and Liti G

    Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, NG7 2UH, United Kingdom.

    A large fraction of human complex trait heritability is due to a high number of variants with small marginal effects and their interactions with genotype and environment. Such alleles are more easily studied in model organisms, where environment, genetic makeup, and allele frequencies can be controlled. Here, we examine the effect of natural genetic variation on heritable traits in a very large pool of baker's yeast from a multiparent 12th generation intercross. We selected four representative founder strains to produce the Saccharomyces Genome Resequencing Project (SGRP)-4X mapping population and sequenced 192 segregants to generate an accurate genetic map. Using these individuals, we mapped 25 loci linked to growth traits under heat stress, arsenite, and paraquat, the majority of which were best explained by a diverging phenotype caused by a single allele in one condition. By sequencing pooled DNA from millions of segregants grown under heat stress, we further identified 34 and 39 regions selected in haploid and diploid pools, respectively, with most of the selection against a single allele. While the most parsimonious model for the majority of loci mapped using either approach was the effect of an allele private to one founder, we could validate examples of pleiotropic effects and complex allelic series at a locus. SGRP-4X is a deeply characterized resource that provides a framework for powerful and high-resolution genetic analysis of yeast phenotypes and serves as a test bed for testing avenues to attack human complex traits.

    Funded by: Wellcome Trust: 098051, WT077192/Z/05/Z

    Genetics 2013;195;3;1141-55

  • SMIM1 underlies the Vel blood group and influences red blood cell traits.

    Cvejic A, Haer-Wigman L, Stephens JC, Kostadima M, Smethurst PA, Frontini M, van den Akker E, Bertone P, Bielczyk-Maczyńska E, Farrow S, Fehrmann RS, Gray A, de Haas M, Haver VG, Jordan G, Karjalainen J, Kerstens HH, Kiddle G, Lloyd-Jones H, Needs M, Poole J, Soussan AA, Rendon A, Rieneck K, Sambrook JG, Schepers H, Silljé HH, Sipos B, Swinkels D, Tamuri AU, Verweij N, Watkins NA, Westra HJ, Stemple D, Franke L, Soranzo N, Stunnenberg HG, Goldman N, van der Harst P, van der Schoot CE, Ouwehand WH and Albers CA

    Department of Haematology, University of Cambridge, Cambridge, UK. as889@cam.ac.uk

    The blood group Vel was discovered 60 years ago, but the underlying gene is unknown. Individuals negative for the Vel antigen are rare and are required for the safe transfusion of patients with antibodies to Vel. To identify the responsible gene, we sequenced the exomes of five individuals negative for the Vel antigen and found that four were homozygous and one was heterozygous for a low-frequency 17-nucleotide frameshift deletion in the gene encoding the 78-amino-acid transmembrane protein SMIM1. A follow-up study showing that 59 of 64 Vel-negative individuals were homozygous for the same deletion and expression of the Vel antigen on SMIM1-transfected cells confirm SMIM1 as the gene underlying the Vel blood group. An expression quantitative trait locus (eQTL), the common SNP rs1175550, contributes to variable expression of the Vel antigen (P = 0.003) and influences the mean hemoglobin concentration of red blood cells (RBCs; P = 8.6 × 10⁻¹⁵). In vivo, zebrafish with smim1 knockdown showed a mild reduction in the number of RBCs, identifying SMIM1 as a new regulator of RBC formation. Our findings are of immediate relevance, as the homozygous presence of the deletion allows the unequivocal identification of Vel-negative blood donors.

    Funded by: British Heart Foundation: RG/09/12/28096; Cancer Research UK: C45041/A14953; Wellcome Trust: 082597/Z/07/Z, 084183/Z/07/Z

    Nature genetics 2013;45;5;542-5

  • Horizontally acquired glycosyltransferase operons drive salmonellae lipopolysaccharide diversity.

    Davies MR, Broadbent SE, Harris SR, Thomson NR and van der Woude MW

    Centre for Immunology and Infection, Hull York Medical School and the Department of Biology, University of York, York, United Kingdom.

    The immunodominant lipopolysaccharide is a key antigenic factor for Gram-negative pathogens such as salmonellae where it plays key roles in host adaptation, virulence, immune evasion, and persistence. Variation in the lipopolysaccharide is also the major differentiating factor that is used to classify Salmonella into over 2600 serovars as part of the Kaufmann-White scheme. While lipopolysaccharide diversity is generally associated with sequence variation in the lipopolysaccharide biosynthesis operon, extraneous genetic factors such as those encoded by the glucosyltransferase (gtr) operons provide further structural heterogeneity by adding additional sugars onto the O-antigen component of the lipopolysaccharide. Here we identify and examine the O-antigen modifying glucosyltransferase genes from the genomes of Salmonella enterica and Salmonella bongori serovars. We show that Salmonella generally carries between 1 and 4 gtr operons that we have classified into 10 families on the basis of gtrC sequence with apparent O-antigen modification detected for five of these families. The gtr operons localize to bacteriophage-associated genomic regions and exhibit a dynamic evolutionary history driven by recombination and gene shuffling events leading to new gene combinations. Furthermore, evidence of Dam- and OxyR-dependent phase variation of gtr gene expression was identified within eight gtr families. Thus, as O-antigen modification generates significant intra- and inter-strain phenotypic diversity, gtr-mediated modification is fundamental in assessing Salmonella strain variability. This will inform appropriate vaccine and diagnostic approaches, in addition to contributing to our understanding of host-pathogen interactions.

    PLoS genetics 2013;9;6;e1003568

  • Structural and functional annotation of the porcine immunome.

    Dawson HD, Loveland JE, Pascal G, Gilbert JG, Uenishi H, Mann KM, Sang Y, Zhang J, Carvalho-Silva D, Hunt T, Hardy M, Hu Z, Zhao SH, Anselmo A, Shinkai H, Chen C, Badaoui B, Berman D, Amid C, Kay M, Lloyd D, Snow C, Morozumi T, Cheng RP, Bystrom M, Kapetanovic R, Schwartz JC, Kataria R, Astley M, Fritz E, Steward C, Thomas M, Wilming L, Toki D, Archibald AL, Bed'Hom B, Beraldi D, Huang TH, Ait-Ali T, Blecha F, Botti S, Freeman TC, Giuffra E, Hume DA, Lunney JK, Murtaugh MP, Reecy JM, Harrow JL, Rogel-Gaillard C and Tuggle CK

    USDA-ARS, Beltsville Human Nutrition Research Center, Diet, Genomics, Immunology Laboratory, Beltsville, MD 20705, USA.

    Background: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.

    Results: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome.

    Conclusions: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig's adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E010520/1, BB/E010520/2, BB/G004013/1, BB/I025328/1, EC FP6; NCRR NIH HHS: P20-RR017686; NIAID NIH HHS: T32 AI083196, T32 AI83196; Wellcome Trust: 098051

    BMC genomics 2013;14;332

  • Prelamin A causes progeria through cell-extrinsic mechanisms and prevents cancer invasion.

    de la Rosa J, Freije JM, Cabanillas R, Osorio FG, Fraga MF, Fernández-García MS, Rad R, Fanjul V, Ugalde AP, Liang Q, Prosser HM, Bradley A, Cadiñanos J and López-Otín C

    Instituto de Medicina Oncológica y Molecular de Asturias IMOMA, 33193 Oviedo, Spain.

    Defining the relationship between ageing and cancer is a crucial but challenging task. Mice deficient in Zmpste24, a metalloproteinase mutated in human progeria and involved in nuclear prelamin A maturation, recapitulate multiple features of ageing. However, their short lifespan and serious cell-intrinsic and cell-extrinsic alterations restrict the application and interpretation of carcinogenesis protocols. Here we present Zmpste24 mosaic mice that lack these limitations. Zmpste24 mosaic mice develop normally and keep similar proportions of Zmpste24-deficient (prelamin A-accumulating) and Zmpste24-proficient (mature lamin A-containing) cells throughout life, revealing that cell-extrinsic mechanisms are preeminent for progeria development. Moreover, prelamin A accumulation does not impair tumour initiation and growth, but it decreases the incidence of infiltrating oral carcinomas. Accordingly, silencing of ZMPSTE24 reduces human cancer cell invasiveness. Our results support the potential of cell-based and systemic therapies for progeria and highlight ZMPSTE24 as a new anticancer target.

    Funded by: Wellcome Trust: 079643

    Nature communications 2013;4;2268

  • Mutational genomics for cancer pathway discovery

    De Ridder J, Kool J, Uren AG, Bot J, De Jong J, Rust AG, Berns A, Van Lohuizen M, Adams DJ, Wessels L and Reinders M

    Lecture Notes in Computer Science 2013;7986;35-46

  • Identification of heart rate-associated loci and their effects on cardiac conduction and rhythm disorders.

    den Hoed M, Eijgelsheim M, Esko T, Brundel BJ, Peal DS, Evans DM, Nolte IM, Segrè AV, Holm H, Handsaker RE, Westra HJ, Johnson T, Isaacs A, Yang J, Lundby A, Zhao JH, Kim YJ, Go MJ, Almgren P, Bochud M, Boucher G, Cornelis MC, Gudbjartsson D, Hadley D, van der Harst P, Hayward C, den Heijer M, Igl W, Jackson AU, Kutalik Z, Luan J, Kemp JP, Kristiansson K, Ladenvall C, Lorentzon M, Montasser ME, Njajou OT, O'Reilly PF, Padmanabhan S, St Pourcain B, Rankinen T, Salo P, Tanaka T, Timpson NJ, Vitart V, Waite L, Wheeler W, Zhang W, Draisma HH, Feitosa MF, Kerr KF, Lind PA, Mihailov E, Onland-Moret NC, Song C, Weedon MN, Xie W, Yengo L, Absher D, Albert CM, Alonso A, Arking DE, de Bakker PI, Balkau B, Barlassina C, Benaglio P, Bis JC, Bouatia-Naji N, Brage S, Chanock SJ, Chines PS, Chung M, Darbar D, Dina C, Dörr M, Elliott P, Felix SB, Fischer K, Fuchsberger C, de Geus EJ, Goyette P, Gudnason V, Harris TB, Hartikainen AL, Havulinna AS, Heckbert SR, Hicks AA, Hofman A, Holewijn S, Hoogstra-Berends F, Hottenga JJ, Jensen MK, Johansson A, Junttila J, Kääb S, Kanon B, Ketkar S, Khaw KT, Knowles JW, Kooner AS, Kors JA, Kumari M, Milani L, Laiho P, Lakatta EG, Langenberg C, Leusink M, Liu Y, Luben RN, Lunetta KL, Lynch SN, Markus MR, Marques-Vidal P, Mateo Leach I, McArdle WL, McCarroll SA, Medland SE, Miller KA, Montgomery GW, Morrison AC, Müller-Nurasyid M, Navarro P, Nelis M, O'Connell JR, O'Donnell CJ, Ong KK, Newman AB, Peters A, Polasek O, Pouta A, Pramstaller PP, Psaty BM, Rao DC, Ring SM, Rossin EJ, Rudan D, Sanna S, Scott RA, Sehmi JS, Sharp S, Shin JT, Singleton AB, Smith AV, Soranzo N, Spector TD, Stewart C, Stringham HM, Tarasov KV, Uitterlinden AG, Vandenput L, Hwang SJ, Whitfield JB, Wijmenga C, Wild SH, Willemsen G, Wilson JF, Witteman JC, Wong A, Wong Q, Jamshidi Y, Zitting P, Boer JM, Boomsma DI, Borecki IB, van Duijn CM, Ekelund U, Forouhi NG, Froguel P, Hingorani A, Ingelsson E, Kivimaki M, Kronmal RA, Kuh D, Lind L, Martin NG, Oostra BA, Pedersen NL, Quertermous T, Rotter JI, van der Schouw YT, Verschuren WM, Walker M, Albanes D, Arnar DO, Assimes TL, Bandinelli S, Boehnke M, de Boer RA, Bouchard C, Caulfield WL, Chambers JC, Curhan G, Cusi D, Eriksson J, Ferrucci L, van Gilst WH, Glorioso N, de Graaf J, Groop L, Gyllensten U, Hsueh WC, Hu FB, Huikuri HV, Hunter DJ, Iribarren C, Isomaa B, Jarvelin MR, Jula A, Kähönen M, Kiemeney LA, van der Klauw MM, Kooner JS, Kraft P, Iacoviello L, Lehtimäki T, Lokki ML, Mitchell BD, Navis G, Nieminen MS, Ohlsson C, Poulter NR, Qi L, Raitakari OT, Rimm EB, Rioux JD, Rizzi F, Rudan I, Salomaa V, Sever PS, Shields DC, Shuldiner AR, Sinisalo J, Stanton AV, Stolk RP, Strachan DP, Tardif JC, Thorsteinsdottir U, Tuomilehto J, van Veldhuisen DJ, Virtamo J, Viikari J, Vollenweider P, Waeber G, Widen E, Cho YS, Olsen JV, Visscher PM, Willer C, Franke L, Global BPgen Consortium, CARDIoGRAM Consortium, Erdmann J, Thompson JR, PR GWAS Consortium, Pfeufer A, QRS GWAS Consortium, Sotoodehnia N, QT-IGC Consortium, Newton-Cheh C, CHARGE-AF Consortium, Ellinor PT, Stricker BH, Metspalu A, Perola M, Beckmann JS, Smith GD, Stefansson K, Wareham NJ, Munroe PB, Sibon OC, Milan DJ, Snieder H, Samani NJ and Loos RJ

    Medical Research Council MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.

    Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate-increasing and heart rate-decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.

    Funded by: British Heart Foundation: PG/11/63/29011, PG/12/38/29615; Chief Scientist Office: CZB/4/710; Medical Research Council: G0600705, G0801056, G1000143, G9815508, MC_PC_U127561128, MC_U106179471, MC_U106179472, MC_U106179473, MC_U106188470, MC_U123092720, MC_U127592696, MC_UP_A100_1003; NCATS NIH HHS: UL1 TR000124; NHLBI NIH HHS: R00 HL094535, R01 HL090620, R01 HL092217, R01 HL105756, R01 HL111314, U19 HL065962; NIDDK NIH HHS: P30 DK063491, P30 DK072488; NIGMS NIH HHS: T32 GM007753; Wellcome Trust: 092731

    Nature genetics 2013;45;6;621-31

  • Association of HIV and ART with cardiometabolic traits in sub-Saharan Africa: a systematic review and meta-analysis.

    Dillon DG, Gurdasani D, Riha J, Ekoru K, Asiki G, Mayanja BN, Levitt NS, Crowther NJ, Nyirenda M, Njelekela M, Ramaiya K, Nyan O, Adewole OO, Anastos K, Azzoni L, Boom WH, Compostella C, Dave JA, Dawood H, Erikstrup C, Fourie CM, Friis H, Kruger A, Idoko JA, Longenecker CT, Mbondi S, Mukaya JE, Mutimura E, Ndhlovu CE, Praygod G, Pefura Yone EW, Pujades-Rodriguez M, Range N, Sani MU, Schutte AE, Sliwa K, Tien PC, Vorster EH, Walsh C, Zinyama R, Mashili F, Sobngwi E, Adebamowo C, Kamali A, Seeley J, Young EH, Smeeth L, Motala AA, Kaleebu P, Sandhu MS and African Partnership for Chronic Disease Research (APCDR)

    Department of Public Health and Primary Care, Institute of Public Health, University of Cambridge, Cambridge, UK, Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, MRC/UVRI Uganda Research Unit on AIDS, Entebbe, Uganda, Division of Diabetic Medicine and Endocrinology, Department of Medicine, University of Cape Town, Cape Town, South Africa; Chronic Diseases Initiative in Africa, Department of Chemical Pathology, National Health Laboratory Service, University of the Witwatersrand Medical School, Johannesburg, South Africa, Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi, Department of Physiology, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania, Department of Medicine, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania, Royal Victoria Teaching Hospital, School of Medicine, University of The Gambia, Banjul, The Gambia, Department of Medicine, Obafemi Awolowo University, Ile Ife, Nigeria, Women's Equity in Access to Care &Treatment, Kigali, Rwanda, HIV-1 Immunopathogenesis Laboratory, Wistar Institute, Philadelphia, PA, Tuberculosis Research Unit, Department of Medicine, Case Western Reserve University, Cleveland, OH, Department of Medical and Surgical Sciences, University of Padua, Padua, Italy, Division of Diabetic Medicine and Endocrinology, Department of Medicine, University of Cape Town, Cape Town, South Africa, Infectious Diseases Unit, Department of Medicine, Grey's Hospital, Pietermaritzburg, South Africa, Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark, HART (Hypertension in Africa Research Team), North-West University, Potchefstroom, South Africa, Department of Nutrition, Exercise and Sports, Faculty of Science, University of Copenhagen, Copenhagen, Denmark, Africa Unit for Transdisciplinary Health Research (AUTHeR), North-West University, Potchefstroom, South Africa, Department of Medicine, Jos University 
Teachin

    Background: Sub-Saharan Africa (SSA) has the highest burden of HIV in the world and a rising prevalence of cardiometabolic disease; however, the interrelationship between HIV, antiretroviral therapy (ART) and cardiometabolic traits is not well described in SSA populations. Methods: We conducted a systematic review and meta-analysis through MEDLINE and EMBASE (up to January 2012), as well as direct author contact. Eligible studies provided summary or individual-level data on one or more of the following traits in HIV+ and HIV-, or ART+ and ART- subgroups in SSA: body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TGs) and fasting blood glucose (FBG) or glycated hemoglobin (HbA1c). Information was synthesized under a random-effects model and the primary outcomes were the standardized mean differences (SMD) of the specified traits between subgroups of participants. Results: Data were obtained from 49 published and 3 unpublished studies which reported on 29 755 individuals. HIV infection was associated with higher TGs [SMD, 0.26; 95% confidence interval (CI), 0.08 to 0.44] and lower HDL (SMD, -0.59; 95% CI, -0.86 to -0.31), BMI (SMD, -0.32; 95% CI, -0.45 to -0.18), SBP (SMD, -0.40; 95% CI, -0.55 to -0.25) and DBP (SMD, -0.34; 95% CI, -0.51 to -0.17). Among HIV+ individuals, ART use was associated with higher LDL (SMD, 0.43; 95% CI, 0.14 to 0.72) and HDL (SMD, 0.39; 95% CI, 0.11 to 0.66), and lower HbA1c (SMD, -0.34; 95% CI, -0.62 to -0.06). Fully adjusted estimates from analyses of individual participant data were consistent with meta-analysis of summary estimates for most traits. Conclusions: Broadly consistent with results from populations of European descent, these results suggest differences in cardiometabolic traits between HIV-infected and uninfected individuals in SSA, which might be modified by ART use. In a region with the highest burden of HIV, it will be important to clarify these findings to reliably assess the need for monitoring and managing cardiometabolic risk in HIV-infected populations in SSA.

    International journal of epidemiology 2013;42;6;1754-71

  • Multi-allelic Phenotyping - A systematic approach for the simultaneous analysis of multiple induced mutations.

    Dooley CM, Scahill C, Fényes F, Kettleborough RN, Stemple DL and Busch-Nentwich EM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The zebrafish mutation project (ZMP) aims to generate a loss of function allele for every protein-coding gene, but importantly to also characterise the phenotypes of these alleles during the first five days of development. Such a large-scale screen requires a systematic approach both to identifying phenotypes, and also to linking those phenotypes to specific mutations. This phenotyping pipeline simultaneously assesses the consequences of multiple alleles in a two-step process. First, mutations that do not produce a visible phenotype during the first five days of development are identified, while a second round of phenotyping focuses on detailed analysis of those alleles that are suspected to cause a phenotype. Allele-specific PCR single nucleotide polymorphism (SNP) assays are used to genotype F2 parents and individual F3 fry for mutations known to be present in the F1 founder. With this method specific phenotypes can be linked to induced mutations. In addition a method is described for cryopreserving sperm samples of mutagenised males and their subsequent use for in vitro fertilisation to generate F2 families for phenotyping. Ultimately this approach will lead to the functional annotation of the zebrafish genome, which will deepen our understanding of gene function in development and disease.

    Methods (San Diego, Calif.) 2013

  • Back to the future!

    Dordel J and Reuter S

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2013;11;9;600

  • Histone deacetylase 1 and 2 are essential for normal T-cell development and genomic stability in mice.

    Dovey OM, Foster CT, Conte N, Edwards SA, Edwards JM, Singh R, Vassiliou G, Bradley A and Cowley SM

    Department of Biochemistry, University of Leicester, Leicester, UK.

    Histone deacetylase 1 and 2 (HDAC1/2) regulate chromatin structure as the catalytic core of the Sin3A, NuRD and CoREST co-repressor complexes. To better understand the key pathways regulated by HDAC1/2 in the adaptive immune system and inform their exploitation as drug targets, we have generated mice with a T-cell-specific deletion. Loss of either HDAC1 or HDAC2 alone has little effect, while dual inactivation results in a 5-fold reduction in thymocyte cellularity, accompanied by developmental arrest at the double-negative to double-positive transition. Transcriptome analysis revealed 892 misregulated genes in Hdac1/2 knock-out thymocytes, including down-regulation of LAT, Themis and Itk, key components of the T-cell receptor (TCR) signaling pathway. Down-regulation of these genes suggests a model in which HDAC1/2 deficiency results in defective propagation of TCR signaling, thus blocking development. Furthermore, mice with reduced HDAC1/2 activity (Hdac1 deleted and a single Hdac2 allele) develop a lethal pathology by 3 months of age, caused by neoplastic transformation of immature T cells in the thymus. Tumor cells become aneuploid, express increased levels of c-Myc and show elevated levels of the DNA damage marker γH2AX. These data demonstrate a crucial role for HDAC1/2 in T-cell development and the maintenance of genomic stability.

    Funded by: Medical Research Council: G0600135, MR/J009202/1; Wellcome Trust: 079643, 095663

    Blood 2013;121;8;1335-44

  • The presence of methylation quantitative trait Loci indicates a direct genetic influence on the level of DNA methylation in adipose tissue.

    Drong AW, Nicholson G, Hedman AK, Meduri E, Grundberg E, Small KS, Shin SY, Bell JT, Karpe F, Soranzo N, Spector TD, McCarthy MI, Deloukas P, Rantalainen M, Lindgren CM and MolPAGE Consortia

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Genetic variants that associate with DNA methylation at CpG sites (methylation quantitative trait loci, meQTLs) offer a potential biological mechanism of action for disease associated SNPs. We investigated whether meQTLs exist in abdominal subcutaneous adipose tissue (SAT) and if CpG methylation associates with metabolic syndrome (MetSyn) phenotypes. We profiled 27,718 genomic regions in abdominal SAT samples of 38 unrelated individuals using differential methylation hybridization (DMH) together with genotypes at 5,227,243 SNPs and expression of 17,209 mRNA transcripts. Validation and replication of significant meQTLs was pursued in an independent cohort of 181 female twins. We find that, at 5% false discovery rate, methylation levels of 149 DMH regions associate with at least one SNP in a ±500 kilobase cis-region in our primary study. We sought to validate 19 of these in the replication study and find that five of these significantly associate with the corresponding meQTL SNPs from the primary study. We find that none of the 149 meQTL top SNPs is a significant expression quantitative trait locus in our expression data, but we observed association between expression levels of two mRNA transcripts and cis-methylation status. Our results indicate that DNA CpG methylation in abdominal SAT is partly under genetic control. This study provides a starting point for future investigations of DNA methylation in adipose tissue.

    PloS one 2013;8;2;e55923

  • Sequencing and Functional Annotation of Avian Pathogenic Escherichia coli Serogroup O78 Strains Reveal the Evolution of E. coli Lineages Pathogenic for Poultry via Distinct Mechanisms.

    Dziva F, Hauser H, Connor TR, van Diemen PM, Prescott G, Langridge GC, Eckert S, Chaudhuri RR, Ewers C, Mellata M, Mukhopadhyay S, Curtiss R, Dougan G, Wieler LH, Thomson NR, Pickard DJ and Stevens MP

    Enteric Bacterial Pathogens Laboratory, Institute for Animal Health, Compton, Berkshire, United Kingdom.

    Avian pathogenic Escherichia coli (APEC) causes respiratory and systemic disease in poultry. Sequencing of a multilocus sequence type 95 (ST95) serogroup O1 strain previously indicated that APEC resembles E. coli causing extraintestinal human diseases. We sequenced the genomes of two strains of another dominant APEC lineage (ST23 serogroup O78 strains χ7122 and IMT2125) and compared them to each other and to the reannotated APEC O1 sequence. For comparison, we also sequenced a human enterotoxigenic E. coli (ETEC) strain of the same ST23 serogroup O78 lineage. Phylogenetic analysis indicated that the APEC O78 strains were more closely related to human ST23 ETEC than to APEC O1, indicating that separation of pathotypes on the basis of their extraintestinal or diarrheagenic nature is not supported by their phylogeny. The accessory genome of APEC ST23 strains exhibited limited conservation of APEC O1 genomic islands and a distinct repertoire of virulence-associated loci. In light of this diversity, we surveyed the phenotype of 2,185 signature-tagged transposon mutants of χ7122 following intra-air sac inoculation of turkeys. This procedure identified novel APEC ST23 genes that play strain- and tissue-specific roles during infection. For example, genes mediating group 4 capsule synthesis were required for the virulence of χ7122 and were conserved in IMT2125 but absent from APEC O1. Our data reveal the genetic diversity of E. coli strains adapted to cause the same avian disease and indicate that the core genome of the ST23 lineage serves as a chassis for the evolution of E. coli strains adapted to cause avian or human disease via acquisition of distinct virulence genes.

    Infection and immunity 2013;81;3;838-49

  • The SHOCT Domain: A Widespread Domain Under-Represented in Model Organisms.

    Eberhardt RY, Bartholdson SJ, Punta M and Bateman A

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom ; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    We have identified a new protein domain, which we have named the SHOCT domain (Short C-terminal domain). This domain is widespread in bacteria, with over a thousand examples, but we found that it is missing from the most commonly studied model organisms, despite being present in closely related species. Its predominantly C-terminal location, co-occurrence with numerous other domains and short size are reminiscent of the Gram-positive anchor motif; however, it is present in a much wider range of species. We suggest several hypotheses about the function of SHOCT, including oligomerisation and nucleic acid binding. Our initial experiments do not support its role as an oligomerisation domain.

    PloS one 2013;8;2;e57848

  • Filling out the structural map of the NTF2-like superfamily.

    Eberhardt RY, Chang Y, Bateman A, Murzin AG, Axelrod HL, Hwang WC and Aravind L

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. re3@sanger.ac.uk.

    Background: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found. Results: Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of representatives of each of these domains, BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C] and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ], have been solved by X-ray crystallography. All three domains are similar in structure and all belong to the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis enables us to present a hypothesis concerning their role. Conclusions: Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. This may regulate the activities of domains with which they are combined in the same polypeptide or via operonic linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).

    BMC bioinformatics 2013;14;327

  • The GenoChip: a new tool for genetic anthropology.

    Elhaik E, Greenspan E, Staats S, Krahn T, Tyler-Smith C, Xue Y, Tofanelli S, Francalacci P, Cucca F, Pagani L, Jin L, Li H, Schurr TG, Greenspan B, Spencer Wells R and Genographic Consortium

    Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, USA.

    The Genographic Project is an international effort aimed at charting human migratory history. The project is nonprofit and nonmedical, and, through its Legacy Fund, supports locally led efforts to preserve indigenous and traditional cultures. Although the first phase of the project was focused on uniparentally inherited markers on the Y-chromosome and mitochondrial DNA (mtDNA), the current phase focuses on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genome-wide single-nucleotide polymorphism (SNP) genotyping, they were designed for medical genetic studies and contain medically related markers that are inappropriate for global population genetic studies. GenoChip, the Genographic Project's new genotyping array, was designed to resolve these issues and enable higher resolution research into outstanding questions in genetic anthropology. The GenoChip includes ancestry informative markers obtained for over 450 human populations, an ancient human (Saqqaq), and two archaic hominins (Neanderthal and Denisovan) and was designed to identify all known Y-chromosome and mtDNA haplogroups. The chip was carefully vetted to avoid inclusion of medically relevant markers. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays. Although all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The chip performances are illustrated in a principal component analysis for 14 worldwide populations. In summary, the GenoChip is a dedicated genotyping platform for genetic anthropology. With an unprecedented number of approximately 12,000 Y-chromosomal and approximately 3,300 mtDNA SNPs and over 130,000 autosomal and X-chromosomal SNPs without any known health, medical, or phenotypic relevance, the GenoChip is a useful tool for genetic anthropology and population genetics.

    Funded by: NIMH NIH HHS: T32 MH014592; Wellcome Trust: 098051

    Genome biology and evolution 2013;5;5;1021-31

  • The 5q31 region in two African populations as a facet of natural selection by infectious diseases.

    Elhassan AA, Hussein AA, Mohamed HS, Rockett K, Kwiatkowski D, Elhassan AM and Ibrahim ME

    Unit of Disease and Diversity, Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan. abjil04@yahoo.com

    Extreme natural selection can lead either to rapid fixation or to extinction of alleles, depending on population structure and size. It may also manifest as an excess of heterozygosity, with the locus concerned displaying such drastic features of allele change. We suspect that the 5q31 region of chromosome 5 mirrors such a situation of extreme natural selection, particularly because the region encompasses type 2 cytokine genes known to be associated with a number of infectious and non-infectious diseases. We typed two sets of single nucleotide polymorphisms (SNPs) in two populations: an initial limited set of only 4 SNPs within the genes IL-4, IL-13, IL-5 and IL-9 in 108 unrelated individuals, and a replication set of 14 SNPs in 924 individuals from the same populations, regardless of relatedness. The results suggest that the 5q31 region is under intense selective pressure, as indicated by marked heterozygosity independent of linkage disequilibrium (LD); differences in heterozygosity, allele and haplotype frequencies between generations; and departure from Hardy-Weinberg expectations (DHWE). The study area is endemic for several infectious diseases including malaria and visceral leishmaniasis (VL). Malaria caused by Plasmodium falciparum, however, occurs mostly with mild clinical symptoms in all ages, which makes it unlikely to account for these indices. The strong selection signals seem to emanate from recent outbreaks of VL, which affected both populations to varying extents.

    Genetika 2013;49;2;279-88

  • Evaluation of the genetic overlap between osteoarthritis with body mass index and height using genome-wide association scan data.

    Elliott KS, Chapman K, Day-Williams A, Panoutsopoulou K, Southam L, Lindgren CM, GIANT consortium, Arden N, Aslam N, Birrell F, Carluke I, Carr A, Deloukas P, Doherty M, Loughlin J, McCaskie A, Ollier WE, Rai A, Ralston S, Reed MR, Spector TD, Valdes AM, Wallis GA, Wilkinson M, arcOGEN consortium and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Objectives: Obesity as measured by body mass index (BMI) is one of the major risk factors for osteoarthritis. In addition, genetic overlap has been reported between osteoarthritis and normal adult height variation. We investigated whether this relationship is due to a shared genetic aetiology on a genome-wide scale. Methods: We compared genetic association summary statistics (effect size, p value) for BMI and height from the GIANT consortium genome-wide association study (GWAS) with genetic association summary statistics from the arcOGEN consortium osteoarthritis GWAS. Significance was evaluated by permutation. Replication of osteoarthritis association of the highlighted signals was investigated in an independent dataset. Phenotypic information of height and BMI was accounted for in a separate analysis using osteoarthritis-free controls. Results: We found significant overlap between osteoarthritis and height (p=3.3×10(-5) for signals with p≤0.05) when the GIANT and arcOGEN GWAS were compared. For signals with p≤0.001 we found 17 shared signals between osteoarthritis and height and four between osteoarthritis and BMI. However, only one of the height or BMI signals that had shown evidence of association with osteoarthritis in the arcOGEN GWAS was also associated with osteoarthritis in the independent dataset: rs12149832, within the FTO gene (combined p=2.3×10(-5)). As expected, this signal was attenuated when we adjusted for BMI. Conclusions: We found a significant excess of shared signals between both osteoarthritis and height and osteoarthritis and BMI, suggestive of a common genetic aetiology. However, only one signal showed association with osteoarthritis when followed up in a new dataset.

    Annals of the rheumatic diseases 2013;72;6;935-41

  • Systematic evaluation of spliced alignment programs for RNA-seq data.

    Engström PG, Steijger T, Sipos B, Grant GR, Kahles A, RGASP Consortium, Rätsch G, Goldman N, Hubbard TJ, Harrow J, Guigó R and Bertone P

    European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.

    High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.

    Funded by: NHGRI NIH HHS: R01 HG006272, U41 HG007234, U54 HG004555, U54 HG004557, U54HG004555, U54HG004557; Wellcome Trust: 098051, WT09805

    Nature methods 2013;10;12;1185-91

  • A meta-analysis of genome-wide association studies identifies novel variants associated with osteoarthritis of the hip.

    Evangelou E, Kerkhof HJ, Styrkarsdottir U, Ntzani EE, Bos SD, Esko T, Evans DS, Metrustry S, Panoutsopoulou K, Ramos YF, Thorleifsson G, Tsilidis KK, arcOGEN Consortium, Arden N, Aslam N, Bellamy N, Birrell F, Blanco FJ, Carr A, Chapman K, Day-Williams AG, Deloukas P, Doherty M, Engström G, Helgadottir HT, Hofman A, Ingvarsson T, Jonsson H, Keis A, Keurentjes JC, Kloppenburg M, Lind PA, McCaskie A, Martin NG, Milani L, Montgomery GW, Nelissen RG, Nevitt MC, Nilsson PM, Ollier WE, Parimi N, Rai A, Ralston SH, Reed MR, Riancho JA, Rivadeneira F, Rodriguez-Fontenla C, Southam L, Thorsteinsdottir U, Tsezou A, Wallis GA, Wilkinson JM, Gonzalez A, Lane NE, Lohmander LS, Loughlin J, Metspalu A, Uitterlinden AG, Jonsdottir I, Stefansson K, Slagboom PE, Zeggini E, Meulenbelt I, Ioannidis JP, Spector TD, van Meurs JB and Valdes AM

    Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina, Greece.

    Objectives: Osteoarthritis (OA) is the most common form of arthritis with a clear genetic component. To identify novel loci associated with hip OA we performed a meta-analysis of genome-wide association studies (GWAS) on European subjects.

    Methods: We performed a two-stage meta-analysis on more than 78 000 participants. In stage 1, we synthesised data from eight GWAS, whereas data from 10 centres were used for 'in silico' or 'de novo' replication. Besides the main analysis, a sex-stratified analysis was performed to detect possible sex-specific signals. Meta-analysis was performed using inverse-variance fixed effects models. A random effects approach was also used.

    Results: We accumulated 11 277 cases of radiographic and symptomatic hip OA. We prioritised eight single nucleotide polymorphisms (SNPs) for follow-up in the discovery stage (4349 OA cases); five from the combined analysis, two male specific and one female specific. One locus, at 20q13, represented by rs6094710 (minor allele frequency (MAF) 4%) near the NCOA3 (nuclear receptor coactivator 3) gene, reached genome-wide significance level with p=7.9×10(-9) and OR=1.28 (95% CI 1.18 to 1.39) in the combined analysis of discovery (p=5.6×10(-8)) and follow-up studies (p=7.3×10(-4)). We showed that this gene is expressed in articular cartilage and its expression was significantly reduced in OA-affected cartilage. Moreover, two loci remained suggestively associated; rs5009270 at 7q31 (MAF 30%, p=9.9×10(-7), OR=1.10) and rs3757837 at 7p13 (MAF 6%, p=2.2×10(-6), OR=1.27 in male specific analysis).

    Conclusions: Novel genetic loci for hip OA were found in this meta-analysis of GWAS.

    Annals of the rheumatic diseases 2013
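
    The inverse-variance fixed effects pooling named in the Methods above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `fixed_effect_meta` and the per-study numbers are hypothetical, and the inputs are assumed to be per-study log odds ratios with their standard errors.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates using inverse-variance weights.

    betas: per-study log odds ratios (assumed input format)
    ses:   matching standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal approximation
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p

# Hypothetical study-level estimates, not taken from the paper
betas = [0.25, 0.20, 0.30]
ses = [0.08, 0.10, 0.12]
beta, se, z, p = fixed_effect_meta(betas, ses)
print(f"pooled OR = {math.exp(beta):.2f}, z = {z:.2f}, p = {p:.2e}")
```

    Each study is weighted by the reciprocal of its variance, so larger and more precise studies dominate the pooled estimate. A random effects model, which the abstract also mentions, would add a between-study variance term to each weight.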

  • The DOT1L rs12982744 polymorphism is associated with osteoarthritis of the hip with genome-wide statistical significance in males.

    Evangelou E, Valdes AM, Castano-Betancourt MC, Doherty M, Doherty S, Esko T, Ingvarsson T, Ioannidis JP, Kloppenburg M, Metspalu A, Ntzani EE, Panoutsopoulou K, Slagboom PE, Southam L, Spector TD, Styrkarsdottir U, Stefansson K, Uitterlinden AG, Wheeler M, Zeggini E, Meulenbelt I, van Meurs JB, arcOGEN consortium and the TREAT-OA consortium

    Department of Hygiene and Epidemiology, University of Ioannina Medical School, University Campus, Ioannina, Greece.

    Annals of the rheumatic diseases 2013

  • Defining the range of pathogens susceptible to Ifitm3 restriction using a knockout mouse model.

    Everitt AR, Clare S, McDonald JU, Kane L, Harcourt K, Ahras M, Lall A, Hale C, Rodgers A, Young DB, Haque A, Billker O, Tregoning JS, Dougan G and Kellam P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    The interferon-inducible transmembrane (IFITM) family of proteins has been shown to restrict a broad range of viruses in vitro and in vivo by halting progress through the late endosomal pathway. Further, single nucleotide polymorphisms (SNPs) in its sequence have been linked with risk of developing severe influenza virus infections in humans. The number of viruses restricted by this host protein has continued to grow since it was first demonstrated as playing an antiviral role; all of which enter cells via the endosomal pathway. We therefore sought to test the limits of antimicrobial restriction by Ifitm3 using a knockout mouse model. We showed that Ifitm3 does not impact on the restriction or pathogenesis of bacterial (Salmonella typhimurium, Citrobacter rodentium, Mycobacterium tuberculosis) or protozoan (Plasmodium berghei) pathogens, despite in vitro evidence. However, Ifitm3 is capable of restricting respiratory syncytial virus (RSV) in vivo either through directly restricting RSV cell infection, or by exerting a previously uncharacterised function controlling disease pathogenesis. This represents the first demonstration of a virus that enters directly through the plasma membrane, without the need for the endosomal pathway, being restricted by the IFITM family; therefore further defining the role of these antiviral proteins.

    PloS one 2013;8;11;e80723

  • The Role of Adiposity in Cardiometabolic Traits: A Mendelian Randomization Analysis.

    Fall T, Hägg S, Mägi R, Ploner A, Fischer K, Horikoshi M, Sarin AP, Thorleifsson G, Ladenvall C, Kals M, Kuningas M, Draisma HH, Ried JS, van Zuydam NR, Huikari V, Mangino M, Sonestedt E, Benyamin B, Nelson CP, Rivera NV, Kristiansson K, Shen HY, Havulinna AS, Dehghan A, Donnelly LA, Kaakinen M, Nuotio ML, Robertson N, de Bruijn RF, Ikram MA, Amin N, Balmforth AJ, Braund PS, Doney AS, Döring A, Elliott P, Esko T, Franco OH, Gretarsdottir S, Hartikainen AL, Heikkilä K, Herzig KH, Holm H, Hottenga JJ, Hyppönen E, Illig T, Isaacs A, Isomaa B, Karssen LC, Kettunen J, Koenig W, Kuulasmaa K, Laatikainen T, Laitinen J, Lindgren C, Lyssenko V, Läärä E, Rayner NW, Männistö S, Pouta A, Rathmann W, Rivadeneira F, Ruokonen A, Savolainen MJ, Sijbrands EJ, Small KS, Smit JH, Steinthorsdottir V, Syvänen AC, Taanila A, Tobin MD, Uitterlinden AG, Willems SM, Willemsen G, Witteman J, Perola M, Evans A, Ferrières J, Virtamo J, Kee F, Tregouet DA, Arveiler D, Amouyel P, Ferrario MM, Brambilla P, Hall AS, Heath AC, Madden PA, Martin NG, Montgomery GW, Whitfield JB, Jula A, Knekt P, Oostra B, van Duijn CM, Penninx BW, Davey Smith G, Kaprio J, Samani NJ, Gieger C, Peters A, Wichmann HE, Boomsma DI, de Geus EJ, Tuomi T, Power C, Hammond CJ, Spector TD, Lind L, Orho-Melander M, Palmer CN, Morris AD, Groop L, Järvelin MR, Salomaa V, Vartiainen E, Hofman A, Ripatti S, Metspalu A, Thorsteinsdottir U, Stefansson K, Pedersen NL, McCarthy MI, Ingelsson E, Prokopenko I and for the European Network for Genetic and Genomic Epidemiology (ENGAGE) consortium

    Molecular Epidemiology and Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden ; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

    Background: The association between adiposity and cardiometabolic traits is well known from epidemiological studies. Whilst the causal relationship is clear for some of these traits, for others it is not. We aimed to determine whether adiposity is causally related to various cardiometabolic traits using the Mendelian randomization approach. We used the adiposity-associated variant rs9939609 at the FTO locus as an instrumental variable (IV) for body mass index (BMI) in a Mendelian randomization design. Thirty-six population-based studies of individuals of European descent contributed to the analyses. Age- and sex-adjusted regression models were fitted to test for association between (i) rs9939609 and BMI (n = 198,502), (ii) rs9939609 and 24 traits, and (iii) BMI and 24 traits. The causal effect of BMI on the outcome measures was quantified by IV estimators. The estimators were compared to the BMI-trait associations derived from the same individuals. In the IV analysis, we demonstrated novel evidence for a causal relationship between adiposity and incident heart failure (hazard ratio, 1.19 per BMI-unit increase; 95% CI, 1.03-1.39) and replicated earlier reports of a causal association with type 2 diabetes, metabolic syndrome, dyslipidemia, and hypertension (odds ratio for IV estimator, 1.1-1.4; all p<0.05). For quantitative traits, our results provide novel evidence for a causal effect of adiposity on the liver enzymes alanine aminotransferase and gamma-glutamyl transferase and confirm previous reports of a causal effect of adiposity on systolic and diastolic blood pressure, fasting insulin, 2-h post-load glucose from the oral glucose tolerance test, C-reactive protein, triglycerides, and high-density lipoprotein cholesterol levels (all p<0.05). The estimated causal effects were in agreement with traditional observational measures in all instances except for type 2 diabetes, where the causal estimate was larger than the observational estimate (p = 0.001). 

    Conclusions: We provide novel evidence for a causal relationship between adiposity and heart failure as well as between adiposity and increased liver enzymes.

    PLoS medicine 2013;10;6;e1001474
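
    The instrumental variable logic described above, with one variant as the instrument for BMI, can be illustrated with a simple ratio (Wald) estimator. This is a sketch under stated assumptions: the abstract does not say which IV estimator was used, the function name `wald_ratio` is hypothetical, and the numbers are invented for illustration.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Ratio (Wald) instrumental-variable estimate from summary statistics.

    beta_gx: effect of the variant on the exposure (e.g. BMI units per allele)
    beta_gy: effect of the variant on the outcome
    se_gy:   standard error of beta_gy
    """
    estimate = beta_gy / beta_gx
    # First-order approximation that ignores uncertainty in beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se

# Hypothetical per-allele effects, not taken from the paper
estimate, se = wald_ratio(beta_gx=0.36, beta_gy=0.054, se_gy=0.018)
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"causal effect per BMI unit: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

    Dividing the variant-outcome effect by the variant-exposure effect rescales the genetic association to a per-unit change in the exposure, which is what allows comparison with the observational BMI-trait estimate.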

  • ImmunoChip Study Implicates Antigen Presentation to T Cells in Narcolepsy.

    Faraco J, Lin L, Kornum BR, Kenny EE, Trynka G, Einen M, Rico TJ, Lichtner P, Dauvilliers Y, Arnulf I, Lecendreux M, Javidi S, Geisler P, Mayer G, Pizza F, Poli F, Plazzi G, Overeem S, Lammers GJ, Kemlink D, Sonka K, Nevsimalova S, Rouleau G, Desautels A, Montplaisir J, Frauscher B, Ehrmann L, Högl B, Jennum P, Bourgin P, Peraita-Adrados R, Iranzo A, Bassetti C, Chen WM, Concannon P, Thompson SD, Damotte V, Fontaine B, Breban M, Gieger C, Klopp N, Deloukas P, Wijmenga C, Hallmayer J, Onengut-Gumuscu S, Rich SS, Winkelmann J and Mignot E

    Center for Sleep Sciences and Medicine, Stanford University, Palo Alto, California, United States of America.

    Recent advances in the identification of susceptibility genes and environmental exposures provide broad support for a post-infectious autoimmune basis for narcolepsy/hypocretin (orexin) deficiency. We genotyped loci associated with other autoimmune and inflammatory diseases in 1,886 individuals with hypocretin-deficient narcolepsy and 10,421 controls, all of European ancestry, using a custom genotyping array (ImmunoChip). Three loci located outside the Human Leukocyte Antigen (HLA) region on chromosome 6 were significantly associated with disease risk. In addition to a strong signal in the T cell receptor alpha (TRA@), variants in two additional narcolepsy loci, Cathepsin H (CTSH) and Tumor necrosis factor (ligand) superfamily member 4 (TNFSF4, also called OX40L), attained genome-wide significance. These findings underline the importance of antigen presentation by HLA Class II to T cells in the pathophysiology of this autoimmune disease.

    PLoS genetics 2013;9;2;e1003270

  • A method for selectively enriching microbial DNA from contaminating vertebrate host DNA.

    Feehery GR, Yigit E, Oyola SO, Langhorst BW, Schmidt VT, Stewart FJ, Dimalanta ET, Amaral-Zettler LA, Davis T, Quail MA and Pradhan S

    New England Biolabs Inc., Ipswich, Massachusetts, United States of America.

    DNA samples derived from vertebrate skin, bodily cavities and body fluids contain both host and microbial DNA; the latter often present as a minor component. Consequently, DNA sequencing of a microbiome sample frequently yields reads originating from the microbe(s) of interest, but with a vast excess of host genome-derived reads. In this study, we used a methyl-CpG binding domain (MBD) to separate methylated host DNA from microbial DNA based on differences in CpG methylation density. MBD fused to the Fc region of a human antibody (MBD-Fc) binds strongly to protein A paramagnetic beads, forming an effective one-step enrichment complex that was used to remove human or fish host DNA from bacterial and protistan DNA for subsequent sequencing and analysis. We report enrichment of DNA samples from human saliva, human blood, a mock malaria-infected blood sample and a black molly fish. When reads were mapped to reference genomes, sequence reads aligning to host genomes decreased 50-fold, while bacterial and Plasmodium DNA sequence reads increased 8-11.5-fold. The Shannon-Wiener diversity index was calculated for 149 bacterial species in saliva before and after enrichment. Unenriched saliva had an index of 4.72, while the enriched sample had an index of 4.80. The similarity of these indices demonstrates that bacterial species diversity and relative phylotype abundance remain conserved in enriched samples. Enrichment using the MBD-Fc method holds promise for targeted microbiome sequence analysis across a broad range of sample types.

    PloS one 2013;8;10;e76096
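
    The Shannon-Wiener diversity index reported above is computed from relative species abundances. The sketch below is illustrative only: the function name `shannon_index` and the read counts are hypothetical, and the natural logarithm is assumed because the abstract does not state the base.

```python
import math

def shannon_index(counts, base=math.e):
    """Shannon-Wiener index H = -sum(p_i * log(p_i)) over observed species."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)

# Hypothetical read counts per species before and after enrichment
before = [500, 300, 150, 50]
after = [480, 310, 160, 50]
print(f"H before = {shannon_index(before):.2f}, H after = {shannon_index(after):.2f}")
```

    With natural logarithms, the largest possible value for 149 equally abundant species is ln 149, approximately 5.00, so similar indices before and after enrichment indicate that relative abundances were largely preserved.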

  • Reprogramming by cell fusion: boosted by Tets.

    Ficz G and Reik W

    Epigenetics Programme, The Babraham Institute, Cambridge CB22 3AT, UK.

    Pluripotent cells, when fused with somatic cells, have the dominant ability to reprogram the somatic genome. Work by Piccolo et al. (2013) shows that the Tet1 and Tet2 hydroxylases are important for DNA methylation reprogramming of pluripotency genes and parental imprints.

    Molecular cell 2013;49;6;1017-8

  • FGF Signaling Inhibition in ESCs Drives Rapid Genome-wide Demethylation to the Epigenetic Ground State of Pluripotency.

    Ficz G, Hore TA, Santos F, Lee HJ, Dean W, Arand J, Krueger F, Oxley D, Paul YL, Walter J, Cook SJ, Andrews S, Branco MR and Reik W

    Epigenetics Programme, The Babraham Institute, Cambridge, CB22 3AT, UK. Electronic address: gabriella.ficz@babraham.ac.uk.

    Genome-wide erasure of DNA methylation takes place in primordial germ cells (PGCs) and early embryos and is linked with pluripotency. Inhibition of Erk1/2 and Gsk3β signaling in mouse embryonic stem cells (ESCs) by small-molecule inhibitors (called 2i) has recently been shown to induce hypomethylation. We show by whole-genome bisulphite sequencing that 2i induces rapid and genome-wide demethylation on a scale and pattern similar to that in migratory PGCs and early embryos. Major satellites, intracisternal A particles (IAPs), and imprinted genes remain relatively resistant to erasure. Demethylation involves oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), impaired maintenance of 5mC and 5hmC, and repression of the de novo methyltransferases (Dnmt3a and Dnmt3b) and Dnmt3L. We identify a Prdm14- and Nanog-binding cis-acting regulatory region in Dnmt3b that is highly responsive to signaling. These insights provide a framework for understanding how signaling pathways regulate reprogramming to an epigenetic ground state of pluripotency.

    Cell stem cell 2013

  • Global analysis of the sporulation pathway of Clostridium difficile.

    Fimlaid KA, Bond JP, Schutz KC, Putnam EE, Leung JM, Lawley TD and Shen A

    Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, USA.

    The Gram-positive, spore-forming pathogen Clostridium difficile is the leading definable cause of healthcare-associated diarrhea worldwide. C. difficile infections are difficult to treat because of their frequent recurrence, which can cause life-threatening complications such as pseudomembranous colitis. The spores of C. difficile are responsible for these high rates of recurrence, since they are the major transmissive form of the organism and resistant to antibiotics and many disinfectants. Despite the importance of spores to the pathogenesis of C. difficile, little is known about their composition or formation. Based on studies in Bacillus subtilis and other Clostridium spp., the sigma factors σ(F), σ(E), σ(G), and σ(K) are predicted to control the transcription of genes required for sporulation, although their specific functions vary depending on the organism. In order to determine the roles of σ(F), σ(E), σ(G), and σ(K) in regulating C. difficile sporulation, we generated loss-of-function mutations in genes encoding these sporulation sigma factors and performed RNA-Sequencing to identify specific sigma factor-dependent genes. This analysis identified 224 genes whose expression was collectively activated by sporulation sigma factors: 183 were σ(F)-dependent, 169 were σ(E)-dependent, 34 were σ(G)-dependent, and 31 were σ(K)-dependent. In contrast with B. subtilis, C. difficile σ(E) was dispensable for σ(G) activation, σ(G) was dispensable for σ(K) activation, and σ(F) was required for post-translationally activating σ(G). Collectively, these results provide the first genome-wide transcriptional analysis of genes induced by specific sporulation sigma factors in the Clostridia and highlight that diverse mechanisms regulate sporulation sigma factor activity in the Firmicutes.

    Funded by: NCRR NIH HHS: P20RR021905; NIGMS NIH HHS: P30 GM103498, R00GM092934

    PLoS genetics 2013;9;8;e1003660

  • EMu: probabilistic inference of mutational processes and their localization in the cancer genome.

    Fischer A, Illingworth CJ, Campbell PJ and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, Cambridge, UK. vm5@sanger.ac.uk.

    The spectrum of mutations discovered in cancer genomes can be explained by the activity of a few elementary mutational processes. We present a novel probabilistic method, EMu, to infer the mutational signatures of these processes from a collection of sequenced tumors. EMu naturally incorporates the tumor-specific opportunity for different mutation types according to sequence composition. Applying EMu to breast cancer data, we derive detailed maps of the activity of each process, both genome-wide and within specific local regions of the genome. Our work provides new opportunities to study the mutational processes underlying cancer development. EMu is available at http://www.sanger.ac.uk/resources/software/emu/.

    Genome biology 2013;14;4;R39

  • Ensembl 2013.

    Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, García-Girón C, Gordon L, Hourlier T, Hunt S, Juettemann T, Kähäri AK, Keenan S, Komorowska M, Kulesha E, Longden I, Maurel T, McLaren WM, Muffato M, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ritchie GR, Ruffier M, Schuster M, Sheppard D, Sobral D, Taylor K, Thormann A, Trevanion S, White S, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Harrow J, Herrero J, Hubbard TJ, Johnson N, Kinsella R, Parker A, Spudich G, Yates A, Zadissa A and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. flicek@ebi.ac.uk

    The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I025506/1; NHGRI NIH HHS: U01HG004695, U41HG006104, U54HG004563; Wellcome Trust: 095908, WT062023, WT079643

    Nucleic acids research 2013;41;Database issue;D48-55

  • Spindle checkpoint deficiency is tolerated by murine epidermal cells but not hair follicle stem cells.

    Foijer F, Ditommaso T, Donati G, Hautaviita K, Xie SZ, Heath E, Smyth I, Watt FM, Sorger PK and Bradley A

    Mouse Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    The spindle assembly checkpoint (SAC) ensures correct chromosome segregation during mitosis by preventing aneuploidy, an event that is detrimental to the fitness and survival of normal cells but oncogenic in tumor cells. Deletion of SAC genes is incompatible with early mouse development, and RNAi-mediated depletion of SAC components in cultured cells results in rapid death. Here we describe the use of a conditional KO of mouse Mad2, an essential component of the SAC signaling cascade, as a means to selectively induce chromosome instability and aneuploidy in the epidermis of the skin. We observe that SAC inactivation is tolerated by interfollicular epidermal cells but results in depletion of hair follicle bulge stem cells. Eventually, a histologically normal epidermis develops within ∼1 month after birth, albeit without any hair. Mad2-deficient cells in this epidermis exhibited abnormal transcription of metabolic genes, consistent with an aneuploid cell state. Hair follicle bulge stem cells were completely absent, despite the continued presence of rudimentary hair follicles. These data demonstrate that different cell lineages within a single tissue respond differently to chromosome instability: some proliferating cell lineages can survive, but stem cells are highly sensitive.

    Proceedings of the National Academy of Sciences of the United States of America 2013

  • Genome Sequence of Klebsiella pneumoniae Ecl8, a Reference Strain for Targeted Genetic Manipulation.

    Fookes M, Yu J, De Majumdar S, Thomson N and Schneiders T

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    We report the genome sequence of Klebsiella pneumoniae subsp. pneumoniae Ecl8, a spontaneous streptomycin-resistant mutant of strain ECL4, derived from NCIB 418. K. pneumoniae Ecl8 has been shown to be genetically tractable for targeted gene deletion strategies and so provides a platform for in-depth analyses of this species.

    Genome announcements 2013;1;1

  • Global Analysis of Apicomplexan Protein S-Acyl Transferases Reveals an Enzyme Essential for Invasion.

    Frénal K, Tay CL, Mueller C, Bushell ES, Jia Y, Graindorge A, Billker O, Rayner JC and Soldati-Favre D

    Department of Microbiology and Molecular Medicine, CMU, University of Geneva, Rue Michel-Servet 1, CH-1211, Geneva 4, Switzerland.

    The advent of techniques to study palmitoylation on a whole proteome scale has revealed that it is an important reversible modification that plays a role in regulating multiple biological processes. Palmitoylation can control the affinity of a protein for lipid membranes, which allows it to impact protein trafficking, stability, folding, signalling and interactions. The publication of the palmitome of the schizont stage of Plasmodium falciparum implicated a role for palmitoylation in host cell invasion, protein export and organelle biogenesis. However, nothing is known so far about the repertoire of protein S-acyl transferases (PATs) that catalyse this modification in Apicomplexa. We undertook a comprehensive analysis of the repertoire of Asp-His-His-Cys cysteine-rich domain (DHHC-CRD) PAT family in Toxoplasma gondii and Plasmodium berghei by assessing their localization and essentiality. Unlike functional redundancies reported in other eukaryotes, some apicomplexan-specific DHHCs are essential for parasite growth, and several are targeted to organelles unique to this phylum. Of particular interest is DHHC7, which localizes to rhoptry organelles in all parasites tested, including the major human pathogen P. falciparum. TgDHHC7 interferes with the localization of the rhoptry palmitoylated protein TgARO and affects the apical positioning of the rhoptry organelles. This PAT has a major impact on T. gondii host cell invasion, but not on the parasite's ability to egress.

    Traffic (Copenhagen, Denmark) 2013

  • Clonal Expansion Analysis of Transposon Insertions by High-Throughput Sequencing Identifies Candidate Cancer Genes in a PiggyBac Mutagenesis Screen.

    Friedel RH, Friedel CC, Bonfert T, Shi R, Rad R and Soriano P

    Department of Neuroscience, Department of Developmental and Regenerative Biology, Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America.

    Somatic transposon mutagenesis in mice is an efficient strategy to investigate the genetic mechanisms of tumorigenesis. The identification of tumor driving transposon insertions traditionally requires the generation of large tumor cohorts to obtain information about common insertion sites. Tumor driving insertions are also characterized by their clonal expansion in tumor tissue, a phenomenon that is facilitated by the slow and evolving transformation process of transposon mutagenesis. We describe here an improved approach for the detection of tumor driving insertions that assesses the clonal expansion of insertions by quantifying the relative proportion of sequence reads obtained in individual tumors. To this end, we have developed a protocol for insertion site sequencing that utilizes acoustic shearing of tumor DNA and Illumina sequencing. We analyzed various solid tumors generated by PiggyBac mutagenesis and for each tumor >10(6) reads corresponding to >10(4) insertion sites were obtained. In each tumor, 9 to 25 insertions stood out by their enriched sequence read frequencies when compared to frequencies obtained from tail DNA controls. These enriched insertions are potential clonally expanded tumor driving insertions, and thus identify candidate cancer genes. The candidate cancer genes of our study comprised many established cancer genes, but also novel candidate genes such as Mastermind-like1 (Mamld1) and Diacylglycerolkinase delta (Dgkd). We show that clonal expansion analysis by high-throughput sequencing is a robust approach for the identification of candidate cancer genes in insertional mutagenesis screens on the level of individual tumors.

    PloS one 2013;8;8;e72338

  • A CpG Mutational Hotspot in a ONECUT Binding Site Accounts for the Prevalent Variant of Hemophilia B Leyden.

    Funnell AP, Wilson MD, Ballester B, Mak KS, Burdach J, Magan N, Pearson RC, Lemaigre FP, Stowell KM, Odom DT, Flicek P and Crossley M

    School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington NSW 2052, Australia.

    Hemophilia B, or the "royal disease," arises from mutations in coagulation factor IX (F9). Mutations within the F9 promoter are associated with a remarkable hemophilia B subtype, termed hemophilia B Leyden, in which symptoms ameliorate after puberty. Mutations at the -5/-6 site (nucleotides -5 and -6 relative to the transcription start site, designated +1) account for the majority of Leyden cases and have been postulated to disrupt the binding of a transcriptional activator, the identity of which has remained elusive for more than 20 years. Here, we show that ONECUT transcription factors (ONECUT1 and ONECUT2) bind to the -5/-6 site. The various hemophilia B Leyden mutations that have been reported in this site inhibit ONECUT binding to varying degrees, which correlate well with their associated clinical severities. In addition, expression of F9 is crucially dependent on ONECUT factors in vivo, and as such, mice deficient in ONECUT1, ONECUT2, or both exhibit depleted levels of F9. Taken together, our findings establish ONECUT transcription factors as the missing hemophilia B Leyden regulators that operate through the -5/-6 site.

    American journal of human genetics 2013;92;3;460-7

  • Global properties and functional complexity of human gene regulatory variation.

    Gaffney DJ

    Wellcome Trust Sanger Institute, Cambridge, United Kingdom. dg13@sanger.ac.uk

    Identification and functional interpretation of gene regulatory variants is a major focus of modern genomics. The application of genetic mapping to molecular and cellular traits has enabled the detection of regulatory variation on genome-wide scales and revealed an enormous diversity of regulatory architecture in humans and other species. In this review I summarise the insights gained and questions raised by a decade of genetic mapping of gene expression variation. I discuss recent extensions of this approach using alternative molecular phenotypes that have revealed some of the biological mechanisms that drive gene expression variation between individuals. Finally, I highlight outstanding problems and future directions for development.

    Funded by: Wellcome Trust: 098051

    PLoS genetics 2013;9;5;e1003501

  • An elephantine viral problem.

    Gall A and Palser A

    This month's Genome Watch highlights how deep sequencing was used to generate the first full genomes of herpesviruses associated with a fatal disease in elephants.

    Nature reviews. Microbiology 2013;11;8;512

  • Restriction of V3 region sequence divergence in the HIV-1 envelope gene during antiretroviral treatment in a cohort of recent seroconverters.

    Gall A, Kaye S, Hué S, Bonsall D, Rance R, Baillie GJ, Fidler SJ, Weber JN, McClure MO, Kellam P and SPARTAC Trial Investigators

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Background: Dynamic changes in Human Immunodeficiency Virus 1 (HIV-1) sequence diversity and divergence are associated with immune control during primary infection and progression to AIDS. Consensus sequencing or single genome amplification sequencing of the HIV-1 envelope (env) gene, in particular the variable (V) regions, is used as a marker for HIV-1 genome diversity, but population diversity is only minimally, or semi-quantitatively sampled using these methods.

    Results: Here we use second generation deep sequencing to determine inter- and intra-patient sequence heterogeneity and to quantify minor variants in a cohort of individuals either receiving or not receiving antiretroviral treatment following seroconversion; the SPARTAC trial. We show, through a cross-sectional study of sequence diversity of the env V3 in 30 antiretroviral-naïve patients during primary infection, that considerable population structure diversity exists, with some individuals exhibiting highly constrained plasma virus diversity. Diversity was independent of clinical markers (viral load, time from seroconversion, CD4 cell count) of infection. Serial sampling over 60 weeks of non-treated individuals that define three initially different diversity profiles showed that complex patterns of continuing HIV-1 sequence diversification and divergence could be readily detected. Evidence for minor sequence turnover, emergence of new variants and re-emergence of archived variants could be inferred from this analysis. Analysis of viral divergence over the same time period in patients who received short (12 weeks, ART12) or long course antiretroviral therapy (48 weeks, ART48) and a non-treated control group revealed that ART48 successfully suppressed viral divergence while ART12 did not have a significant effect.

    Conclusions: Deep sequencing is a sensitive and reliable method for investigating the diversity of the env V3 as an important component of HIV-1 genome diversity. Detailed insights into the complex early intra-patient dynamics of env V3 diversity and divergence were explored in antiretroviral-naïve recent seroconverters. Long course antiretroviral therapy, initiated soon after seroconversion and administered for 48 weeks, restricts HIV-1 divergence significantly. The effect of ART12 and ART48 on clinical markers of HIV infection and progression is currently being investigated in the SPARTAC trial.

    Funded by: NIAID NIH HHS: R01 AI046995; Wellcome Trust

    Retrovirology 2013;10;8

  • Reprogramming to Pluripotency Using Designer TALE Transcription Factors Targeting Enhancers.

    Gao X, Yang J, Tsang JC, Ooi J, Wu D and Liu P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    The modular DNA recognition code of the transcription-activator-like effectors (TALEs) from the plant-pathogenic bacterial genus Xanthomonas provides a powerful genetic tool to create designer transcription factors (dTFs) targeting specific DNA sequences for manipulating gene expression. Previous studies have suggested critical roles of enhancers in gene regulation and reprogramming. Here, we report that a dTF activator targeting the distal enhancer of the Pou5f1 (Oct4) locus induces epigenetic changes, reactivates its expression, and substitutes for exogenous OCT4 in reprogramming mouse embryonic fibroblast cells (MEFs) to induced pluripotent stem cells (iPSCs). Similarly, a dTF activator targeting a Nanog enhancer activates Nanog expression and reprograms epiblast stem cells (EpiSCs) to iPSCs. Conversely, dTF repressors targeting the same genetic elements inhibit expression of these loci, and effectively block reprogramming. This study indicates that dTFs targeting specific enhancers can be used to study other biological processes such as transdifferentiation or directed differentiation of stem cells.

    Stem cell reports 2013;1;2;183-97

  • Genome-wide haplotype analysis of cis expression quantitative trait loci in monocytes.

    Garnier S, Truong V, Brocheton J, Zeller T, Rovital M, Wild PS, Ziegler A, Cardiogenics Consortium, Munzel T, Tiret L, Blankenberg S, Deloukas P, Erdmann J, Hengstenberg C, Samani NJ, Schunkert H, Ouwehand WH, Goodall AH, Cambien F and Trégouët DA

    INSERM, UMR_S 937, Pierre and Marie Curie University (UPMC, Paris 6), Paris, France ; ICAN Institute for Cardiometabolism and Nutrition, Pierre and Marie Curie University (UPMC, Paris 6), Paris, France.

    In order to assess whether gene expression variability could be influenced by several SNPs acting in cis, either through additive or more complex haplotype effects, a systematic genome-wide search for cis haplotype expression quantitative trait loci (eQTL) was conducted in a sample of 758 individuals, part of the Cardiogenics Transcriptomic Study, for which genome-wide monocyte expression and GWAS data were available. 19,805 RNA probes were assessed for cis haplotypic regulation through investigation of ∼2.1×10(9) haplotypic combinations. 2,650 probes demonstrated haplotypic p-values >10(4)-fold smaller than the best single SNP p-value. Replication of significant haplotype effects was tested for 412 probes for which SNPs (or proxies) that defined the detected haplotypes were available in the Gutenberg Health Study composed of 1,374 individuals. At the Bonferroni correction level of 1.2×10(-4) (∼0.05/412), 193 haplotypic signals replicated. 1000G imputation was then conducted, and 105 haplotypic signals still remained more informative than imputed SNPs. In-depth analysis of these 105 cis eQTL revealed that at 76 loci genetic associations were compatible with additive effects of several SNPs, while for the 29 remaining regions data could be compatible with a more complex haplotypic pattern. As 24 of the 105 cis eQTL have previously been reported to be disease-associated loci, this work highlights the need for conducting haplotype-based and 1000G imputed cis eQTL analysis before commencing functional studies at disease-associated loci.

    PLoS genetics 2013;9;1;e1003240
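
    The replication threshold quoted in the abstract above (∼0.05/412) is a plain Bonferroni division. As a minimal sketch of that arithmetic only, and not the authors' analysis code, the hypothetical helper `bonferroni_threshold` below takes the family-wise level 0.05 and the 412 tested probes from the abstract; the function name and its structure are assumptions made for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction.

    Hypothetical helper for illustration: divides the family-wise
    error rate by the number of independent tests.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Values taken from the abstract: alpha = 0.05, 412 probes tested for replication.
threshold = bonferroni_threshold(0.05, 412)
print(f"{threshold:.1e}")  # rounds to the 1.2e-04 level quoted in the abstract
```

    A haplotypic signal would count as replicated under this rule when its replication p-value falls below `threshold`.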

  • Sleeping Beauty mutagenesis in a mouse medulloblastoma model defines networks that discriminate between human molecular subgroups.

    Genovesi LA, Ng CG, Davis MJ, Remke M, Taylor MD, Adams DJ, Rust AG, Ward JM, Ban KH, Jenkins NA, Copeland NG and Wainwright BJ

    Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4072, Australia.

    The Sleeping Beauty (SB) transposon mutagenesis screen is a powerful tool to facilitate the discovery of cancer genes that drive tumorigenesis in mouse models. In this study, we sought to identify genes that functionally cooperate with sonic hedgehog signaling to initiate medulloblastoma (MB), a tumor of the cerebellum. By combining SB mutagenesis with Patched1 heterozygous mice (Ptch1(lacZ/+)), we observed an increased frequency of MB and decreased tumor-free survival compared with Ptch1(lacZ/+) controls. From an analysis of 85 tumors, we identified 77 common insertion sites that map to 56 genes potentially driving increased tumorigenesis. The common insertion site genes identified in the mutagenesis screen were mapped to human orthologs, which were used to select probes and corresponding expression data from an independent set of previously described human MB samples, and surprisingly were capable of accurately clustering known molecular subgroups of MB, thereby defining common regulatory networks underlying all forms of MB irrespective of subgroup. We performed a network analysis to discover the likely mechanisms of action of subnetworks and used an in vivo model to confirm a role for a highly ranked candidate gene, Nfia, in promoting MB formation. Our analysis implicates candidate cancer genes in the deregulation of apoptosis and translational elongation, and reveals a strong signature of transcriptional regulation that will have broad impact on expression programs in MB. These networks provide functional insights into the complex biology of human MB and identify potential avenues for intervention common to all clinical subgroups.

    Proceedings of the National Academy of Sciences of the United States of America 2013;110;46;E4325-34

  • Gene expression changes with age in skin, adipose tissue, blood and brain.

    Glass D, Viñuela A, Davies MN, Ramasamy A, Parts L, Knowles D, Brown AA, Hedman AK, Small KS, Buil A, Grundberg E, Nica AC, Meglio P, Nestle FO, Ryten M, the UK Brain Expression consortium, the MuTHER consortium, Durbin R, McCarthy MI, Deloukas P, Dermitzakis ET, Weale ME, Bataille V and Spector TD

    Department of Twin Research and Genetic Epidemiology, King's College London, St Thomas' Campus, Westminster Bridge Road, London SE1 7EH, UK. bataille@doctors.org.uk.

    Background: Previous studies have demonstrated that gene expression levels change with age. These changes are hypothesized to influence the aging rate of an individual. We analyzed gene expression changes with age in abdominal skin, subcutaneous adipose tissue and lymphoblastoid cell lines in 856 female twins in the age range of 39-85 years. Additionally, we investigated genotypic variants involved in genotype-by-age interactions to understand how the genomic regulation of gene expression alters with age. Results: Using a linear mixed model, differential expression with age was identified in 1,672 genes in skin and 188 genes in adipose tissue. Only two genes expressed in lymphoblastoid cell lines showed significant changes with age. Genes significantly regulated by age were compared with expression profiles in 10 brain regions from 100 postmortem brains aged 16 to 83 years. We identified only one age-related gene common to the three tissues. There were 12 genes that showed differential expression with age in both skin and brain tissue and three common to adipose and brain tissues. Conclusions: Skin showed the most age-related gene expression changes of all the tissues investigated, with many of the genes being previously implicated in fatty acid metabolism, mitochondrial activity, cancer and splicing. A significant proportion of age-related changes in gene expression appear to be tissue-specific with only a few genes sharing an age effect in expression across tissues. More research is needed to improve our understanding of the genetic influences on aging and the relationship with age-related diseases.

    Genome biology 2013;14;7;R75

  • Discovery and refinement of loci associated with lipid levels.

    Global Lipids Genetics Consortium, Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, Kanoni S, Ganna A, Chen J, Buchkovich ML, Mora S, Beckmann JS, Bragg-Gresham JL, Chang HY, Demirkan A, Den Hertog HM, Do R, Donnelly LA, Ehret GB, Esko T, Feitosa MF, Ferreira T, Fischer K, Fontanillas P, Fraser RM, Freitag DF, Gurdasani D, Heikkilä K, Hyppönen E, Isaacs A, Jackson AU, Johansson A, Johnson T, Kaakinen M, Kettunen J, Kleber ME, Li X, Luan J, Lyytikäinen LP, Magnusson PK, Mangino M, Mihailov E, Montasser ME, Müller-Nurasyid M, Nolte IM, O'Connell JR, Palmer CD, Perola M, Petersen AK, Sanna S, Saxena R, Service SK, Shah S, Shungin D, Sidore C, Song C, Strawbridge RJ, Surakka I, Tanaka T, Teslovich TM, Thorleifsson G, Van den Herik EG, Voight BF, Volcik KA, Waite LL, Wong A, Wu Y, Zhang W, Absher D, Asiki G, Barroso I, Been LF, Bolton JL, Bonnycastle LL, Brambilla P, Burnett MS, Cesana G, Dimitriou M, Doney AS, Döring A, Elliott P, Epstein SE, Eyjolfsson GI, Gigante B, Goodarzi MO, Grallert H, Gravito ML, Groves CJ, Hallmans G, Hartikainen AL, Hayward C, Hernandez D, Hicks AA, Holm H, Hung YJ, Illig T, Jones MR, Kaleebu P, Kastelein JJ, Khaw KT, Kim E, Klopp N, Komulainen P, Kumari M, Langenberg C, Lehtimäki T, Lin SY, Lindström J, Loos RJ, Mach F, McArdle WL, Meisinger C, Mitchell BD, Müller G, Nagaraja R, Narisu N, Nieminen TV, Nsubuga RN, Olafsson I, Ong KK, Palotie A, Papamarkou T, Pomilla C, Pouta A, Rader DJ, Reilly MP, Ridker PM, Rivadeneira F, Rudan I, Ruokonen A, Samani N, Scharnagl H, Seeley J, Silander K, Stancáková A, Stirrups K, Swift AJ, Tiret L, Uitterlinden AG, van Pelt LJ, Vedantam S, Wainwright N, Wijmenga C, Wild SH, Willemsen G, Wilsgaard T, Wilson JF, Young EH, Zhao JH, Adair LS, Arveiler D, Assimes TL, Bandinelli S, Bennett F, Bochud M, Boehm BO, Boomsma DI, Borecki IB, Bornstein SR, Bovet P, Burnier M, Campbell H, Chakravarti A, Chambers JC, Chen YD, Collins FS, Cooper RS, Danesh J, Dedoussis G, de Faire U, Feranil AB, Ferrières J, Ferrucci L, Freimer NB, Gieger C, Groop LC, Gudnason V, Gyllensten U, Hamsten A, Harris TB, Hingorani A, Hirschhorn JN, Hofman A, Hovingh GK, Hsiung CA, Humphries SE, Hunt SC, Hveem K, Iribarren C, Järvelin MR, Jula A, Kähönen M, Kaprio J, Kesäniemi A, Kivimaki M, Kooner JS, Koudstaal PJ, Krauss RM, Kuh D, Kuusisto J, Kyvik KO, Laakso M, Lakka TA, Lind L, Lindgren CM, Martin NG, März W, McCarthy MI, McKenzie CA, Meneton P, Metspalu A, Moilanen L, Morris AD, Munroe PB, Njølstad I, Pedersen NL, Power C, Pramstaller PP, Price JF, Psaty BM, Quertermous T, Rauramaa R, Saleheen D, Salomaa V, Sanghera DK, Saramies J, Schwarz PE, Sheu WH, Shuldiner AR, Siegbahn A, Spector TD, Stefansson K, Strachan DP, Tayo BO, Tremoli E, Tuomilehto J, Uusitupa M, van Duijn CM, Vollenweider P, Wallentin L, Wareham NJ, Whitfield JB, Wolffenbuttel BH, Ordovas JM, Boerwinkle E, Palmer CN, Thorsteinsdottir U, Chasman DI, Rotter JI, Franks PW, Ripatti S, Cupples LA, Sandhu MS, Rich SS, Boehnke M, Deloukas P, Kathiresan S, Mohlke KL, Ingelsson E and Abecasis GR

    [1] Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, Michigan, USA. [2] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA. [3] Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA. [4] Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA. [5] [6].

    Levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides and total cholesterol are heritable, modifiable risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,577 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10(-8), including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index. Our results demonstrate the value of using genetic data from individuals of diverse ancestry and provide insights into the biological mechanisms regulating blood lipids to guide future genetic, biological and therapeutic research.

    Funded by: British Heart Foundation: PG/08/094/26019, RG/08/008/25291, RG/08/014/24067; Chief Scientist Office: CZB/4/672, CZB/4/710; Medical Research Council: G0801566, G0901213, G1000143, MC_U106179471, MC_U106179472, MC_U106188470, MC_U123092720, MC_U950080926; NCATS NIH HHS: UL1 TR000124; NHLBI NIH HHS: R00 HL094535, R01 HL105756, R01 HL109946, U01 HL069757; NIDDK NIH HHS: P30 DK063491, P30 DK072488, R01 DK072193; Wellcome Trust: 090532

    Nature genetics 2013;45;11;1274-83

  • Clonal analyses reveal associations of JAK2V617F homozygosity with hematologic features, age and gender in polycythemia vera and essential thrombocythemia.

    Godfrey AL, Chen E, Pagano F, Silber Y, Campbell PJ and Green AR

    arg1000@cam.ac.uk.

    Subclones homozygous for JAK2V617F are more common and larger in patients with polycythemia vera compared to essential thrombocythemia, but their role in determining phenotype remains unclear. We genotyped 4564 erythroid colonies from 59 patients with polycythemia vera or essential thrombocythemia to investigate whether the proportion of JAK2V617F-homozygous precursors, compared to heterozygous precursors, is associated with clinical or demographic features. In polycythemia vera, a higher proportion of homozygous-mutant precursors was associated with more extreme blood counts at diagnosis, consistent with a causal role for homozygosity in polycythemia vera pathogenesis. Larger numbers of homozygous-mutant colonies were associated with older age, and with male gender in polycythemia vera but female gender in essential thrombocythemia. These results suggest that age promotes development or expansion of homozygous-mutant clones and that gender modulates the phenotypic consequences of JAK2V617F homozygosity, thus providing a potential explanation for the long-standing observations of a preponderance of men with polycythemia vera but of women with essential thrombocythemia.

    Haematologica 2013;98;5;718-21

  • Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene.

    Gonzàlez-Porta M, Frankish A, Rung J, Harrow J and Brazma A

    European Molecular Biology Laboratory - European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom. brazma@ebi.ac.uk.

    Background: RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene. Results: Here we show that, in a given condition, most protein coding genes have one major transcript expressed at significantly higher level than others, that in human tissues the major transcripts contribute almost 85 percent to the total mRNA from protein coding loci, and that often the same major transcript is expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted to be translated utilizing proteomic data. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein coding genes for which the major transcript does not code a protein. Conclusions: Overall, our findings suggest that the transcriptome from protein coding loci is dominated by one transcript per gene and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of the detected changes in variation studies.

    Genome biology 2013;14;7;R70

  • Mutations in C10orf11, Encoding a Melanocyte-Differentiation Gene, Cause Autosomal-Recessive Albinism.

    Grønskov K, Dooley CM, Ostergaard E, Kelsh RN, Hansen L, Levesque MP, Vilhelmsen K, Møllgård K, Stemple DL and Rosenberg T

    Applied Human Molecular Genetics, Kennedy Center, Copenhagen University Hospital, Rigshospitalet, DK-2100 Copenhagen, Denmark; Department of Cellular and Molecular Medicine, University of Copenhagen, DK-2200 Copenhagen, Denmark. Electronic address: karen.groenskov@regionh.dk.

    Autosomal-recessive albinism is a hypopigmentation disorder with a broad phenotypic range. A substantial fraction of individuals with albinism remain genetically unresolved, and it has been hypothesized that more genes are to be identified. By using homozygosity mapping of an inbred Faroese family, we identified a 3.5 Mb homozygous region (10q22.2-q22.3) on chromosome 10. The region contains five protein-coding genes, and sequencing of one of these, C10orf11, revealed a nonsense mutation that segregated with the disease and showed a recessive inheritance pattern. Investigation of additional albinism-affected individuals from the Faroe Islands revealed that five out of eight unrelated affected persons had the nonsense mutation in C10orf11. Screening of a cohort of autosomal-recessive-albinism-affected individuals residing in Denmark showed a homozygous 1 bp duplication in C10orf11 in an individual originating from Lithuania. Immunohistochemistry showed localization of C10orf11 in melanoblasts and melanocytes in human fetal tissue, but no localization was seen in retinal pigment epithelial cells. Knockdown of the zebrafish (Danio rerio) homolog with the use of morpholinos resulted in substantially decreased pigmentation and a reduction of the apparent number of pigmented melanocytes. The morphant phenotype was rescued by wild-type C10orf11, but not by mutant C10orf11. In conclusion, we have identified a melanocyte-differentiation gene, C10orf11, which when mutated causes autosomal-recessive albinism in humans.

    American journal of human genetics 2013

  • Examination of the relationship between variation at 17q21 and childhood wheeze phenotypes.

    Granell R, Henderson AJ, Timpson N, St Pourcain B, Kemp JP, Ring SM, Ho K, Montgomery SB, Dermitzakis ET, Evans DM and Sterne JA

    School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom. raquel.granell@bristol.ac.uk

    Background: Genome-wide association studies have identified associations of genetic variants at 17q21 near ORMDL3 with childhood asthma.

    Objectives: We sought to determine whether associations in this region are specific to particular asthma phenotypes and specific to ORMDL3.

    Methods: We examined associations between 244 independent single nucleotide polymorphisms (SNPs) plus 13 previously identified asthma-related SNPs in the region between 34 and 36 Mb on chromosome 17 and early wheezing phenotypes, doctor-diagnosed asthma and atopy at 7½ years, and bronchial hyperresponsiveness and lung function at 8½ years in 7045 children from the Avon Longitudinal Study of Parents and Children birth cohort study. With this, cis expression quantitative trait loci signals for the same SNPs were assessed in 875 samples across genes in the same region.

    Results: The strongest evidence for phenotypic association was seen for persistent wheezing (rs8076131 near ORMDL3: relative risk ratio [RRR], 1.60 [95% CI, 1.40-1.84], P = 1.4 × 10(-11); rs2305480 near GSDML: RRR, 1.60 [95% CI, 1.39-1.83], P = 1.5 × 10(-11); and rs9303277 near IKZF3: RRR, 1.57 [95% CI, 1.37-1.79], P = 4.4 × 10(-11)). Similar but less precisely estimated effects were seen for intermediate-onset wheeze, but there was little evidence of associations with other wheezing phenotypes. There was some evidence of associations with bronchial hyperresponsiveness. SNPs across the whole region show strong evidence of association with differential levels of expression at GSDML, IKZF3, and MED24, as well as ORMDL3.

    Conclusions: Associations of SNPs in the 17q21 locus are specific to asthma and specific wheezing phenotypes and are not explained by associations with intermediate phenotypes, such as atopy or lung function.

    Funded by: Medical Research Council: G0401540, G9815508; Wellcome Trust: 092731, WT083431MA

    The Journal of allergy and clinical immunology 2013;131;3;685-94

  • Replication of bipolar disorder susceptibility alleles and identification of two novel genome-wide significant associations in a new bipolar disorder case-control sample.

    Green EK, Hamshere M, Forty L, Gordon-Smith K, Fraser C, Russell E, Grozeva D, Kirov G, Holmans P, Moran JL, Purcell S, Sklar P, Owen MJ, O'Donovan MC, Jones L, WTCCC, Jones IR and Craddock N

    Department of Psychological Medicine, MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Heath Park, Cardiff, UK.

    We have conducted a genotyping study using a custom Illumina Infinium HD genotyping array, the ImmunoChip, in a new UK sample of 1218 bipolar disorder (BD) cases and 2913 controls that have not been used in any studies previously reported independently or in meta-analyses. The ImmunoChip was designed before the publication of the Psychiatric Genome-Wide Association Study Consortium Bipolar Disorder Working Group (PGC-BD) meta-analysis data. As such 3106 single-nucleotide polymorphisms (SNPs) with a P-value <1 × 10(-3) from the BD meta-analysis by Ferreira et al. were genotyped. We report support for two of the three most strongly associated chromosomal regions in the Ferreira study, CACNA1C (rs1006737, P=4.09 × 10(-4)) and 15q14 (rs2172835, P=0.043) but not ANK3 (rs10994336, P=0.912). We have combined our ImmunoChip data (569 quasi-independent SNPs from the 3016 SNPs genotyped) with the recently published PGC-BD meta-analysis data, using either the PGC-BD combined discovery and replication data where available or just the discovery data where the SNP was not typed in a replication sample in PGC-BD. Our data provide support for two regions, at ODZ4 and CACNA1C, with prior evidence for genome-wide significant (GWS) association in PGC-BD meta-analysis. In addition, the combined analysis shows two novel GWS associations. First, rs7296288 (P=8.97 × 10(-9), odds ratio (OR)=0.9), an intergenic polymorphism on chromosome 12 located between RHEBL1 and DHH. Second, rs3818253 (P=3.88 × 10(-8), OR=1.16), an intronic SNP on chromosome 20q11.2 in the gene TRPC4AP, which lies in a high linkage disequilibrium region along with the genes GSS and MYH7B.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/0, 076113/C/04/Z

    Molecular psychiatry 2013;18;12;1302-7

  • Massively parallel sequencing reveals the complex structure of an irradiated human chromosome on a mouse background in the Tc1 model of Down syndrome.

    Gribble SM, Wiseman FK, Clayton S, Prigmore E, Langley E, Yang F, Maguire S, Fu B, Rajan D, Sheppard O, Scott C, Hauser H, Stephens PJ, Stebbings LA, Ng BL, Fitzgerald T, Quail MA, Banerjee R, Rothkamm K, Tybulewicz VL, Fisher EM and Carter NP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Down syndrome (DS) is caused by trisomy of chromosome 21 (Hsa21) and presents a complex phenotype that arises from abnormal dosage of genes on this chromosome. However, the individual dosage-sensitive genes underlying each phenotype remain largely unknown. To help dissect genotype-phenotype correlations in this complex syndrome, the first fully transchromosomic mouse model, the Tc1 mouse, which carries a copy of human chromosome 21, was produced in 2005. The Tc1 strain is trisomic for the majority of genes that cause phenotypes associated with DS, and this freely available mouse strain has become widely used to study DS, the effects of gene dosage abnormalities, and the effect on the basic biology of cells when a mouse carries a freely segregating human chromosome. Tc1 mice were created by a process that included irradiation microcell-mediated chromosome transfer of Hsa21 into recipient mouse embryonic stem cells. Here, the combination of next generation sequencing, array-CGH and fluorescence in situ hybridization technologies has enabled us to identify unsuspected rearrangements of Hsa21 in this mouse model, revealing one deletion, six duplications and more than 25 de novo structural rearrangements. Our study is not only essential for informing functional studies of the Tc1 mouse but also (1) presents for the first time a detailed sequence analysis of the effects of gamma radiation on an entire human chromosome, which gives some mechanistic insight into the effects of radiation damage on DNA, and (2) overcomes specific technical difficulties of assaying a human chromosome on a mouse background where highly conserved sequences may confound the analysis. Sequence data generated in this study are deposited in the ENA database, Study Accession number: ERP000439.

    PloS one 2013;8;4;e60482

  • Reduced burden of very large and rare CNVs in bipolar affective disorder.

    Grozeva D, Kirov G, Conrad DF, Barnes CP, Hurles M, Owen MJ, O'Donovan MC and Craddock N

    MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff.

    Objectives: Large, rare chromosomal copy number variants (CNVs) have been shown to increase the risk for schizophrenia and other neuropsychiatric disorders including autism, attention-deficit hyperactivity disorder, learning difficulties, and epilepsy. Their role in bipolar disorder (BD) is less clear. There are no reports of an increase in large, rare CNVs in BD in general, but some have reported an increase in early-onset cases. We previously found that the rate of such CNVs in individuals with BD was not increased, even in early-onset cases. Our aim here was to examine the rate of large rare CNVs in BD in comparison with a new large independent reference sample from the same country.

    Methods: We studied the CNVs in a case-control sample consisting of 1,650 BD cases (reported previously) and 10,259 reference individuals without a known psychiatric disorder who took part in the original Wellcome Trust Case Control Consortium (WTCCC) study. The 10,259 reference individuals were affected with six non-psychiatric disorders (coronary artery disease, types 1 and 2 diabetes, hypertension, Crohn's disease, and rheumatoid arthritis). Affymetrix 500K array genotyping data were used to call the CNVs.

    Results: The rate of CNVs > 100 kb was not statistically different between cases and controls. The rate of very large (defined as > 1 Mb) and rare (< 1%) CNVs was significantly lower in patients with BD compared with the reference group. CNV loci associated with schizophrenia were not enriched in BD and, in fact, cases of BD had the lowest number of such CNVs compared with any of the WTCCC cohorts; this finding held even for the early-onset BD cases.

    Conclusions: Schizophrenia and BD differ with respect to CNV burden and association with specific CNVs. Our findings support the hypothesis that BD is etiologically distinct from schizophrenia with respect to large, rare CNVs and the accompanying associated neurodevelopmental abnormalities.

    Funded by: Wellcome Trust: 076113, 085475

    Bipolar disorders 2013;15;8;893-8

  • Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements.

    Grundberg E, Meduri E, Sandling JK, Hedman AK, Keildson S, Buil A, Busche S, Yuan W, Nisbet J, Sekowska M, Wilk A, Barrett A, Small KS, Ge B, Caron M, Shin SY, Multiple Tissue Human Expression Resource Consortium, Lathrop M, Dermitzakis ET, McCarthy MI, Spector TD, Bell JT and Deloukas P

    Wellcome Trust Sanger Institute, CB10 1SA Hinxton, UK; Department of Twin Research and Genetic Epidemiology, King's College London, SE1 7EH London, UK. Electronic address: elin.grundberg@mcgill.ca.

    Epigenetic modifications such as DNA methylation play a key role in gene regulation and disease susceptibility. However, little is known about the genome-wide frequency, localization, and function of methylation variation and how it is regulated by genetic and environmental factors. We utilized the Multiple Tissue Human Expression Resource (MuTHER) and generated Illumina 450K adipose methylome data from 648 twins. We found that individual CpGs had low variance and that variability was suppressed in promoters. We noted that DNA methylation variation was highly heritable (h(2)median = 0.34) and that shared environmental effects correlated with metabolic phenotype-associated CpGs. Analysis of methylation quantitative-trait loci (metQTL) revealed that 28% of CpGs were associated with nearby SNPs, and when overlapping them with adipose expression quantitative-trait loci (eQTL) from the same individuals, we found that 6% of the loci played a role in regulating both gene expression and DNA methylation. These associations were bidirectional, but there were pronounced negative associations for promoter CpGs. Integration of metQTL with adipose reference epigenomes and disease associations revealed significant enrichment of metQTL overlapping metabolic-trait or disease loci in enhancers (the strongest effects were for high-density lipoprotein cholesterol and body mass index [BMI]). We followed up with the BMI SNP rs713586, a cg01884057 metQTL that overlaps an enhancer upstream of ADCY3, and used bisulphite sequencing to refine this region. Our results showed widespread population invariability yet sequence dependence on adipose DNA methylation but that incorporating maps of regulatory elements aids in linking CpG variation to gene regulation and disease risk in a tissue-dependent manner.

    Funded by: Canadian Institutes of Health Research: EP1-120608; Wellcome Trust: 081917/Z/07/Z, 083270/Z/07/Z, 090532, 098051, 100140

    American journal of human genetics 2013;93;5;876-90

  • Gene-centric meta-analyses of 108 912 individuals confirm known body mass index loci and reveal three novel signals.

    Guo Y, Lanktree MB, Taylor KC, Hakonarson H, Lange LA, Keating BJ and The IBC 50K SNP array BMI Consortium

    List of authors is given in the Full Author List Section of Appendix.

    Recent genetic association studies have made progress in uncovering components of the genetic architecture of the body mass index (BMI). We used the ITMAT-Broad-Candidate Gene Association Resource (CARe) (IBC) array comprising up to 49 320 single nucleotide polymorphisms (SNPs) across ∼2100 metabolic and cardiovascular-related loci to genotype up to 108 912 individuals of European ancestry (EA), African-Americans, Hispanics and East Asians, from 46 studies, to provide additional insight into SNPs underpinning BMI. We used a five-phase study design: Phase I focused on meta-analysis of EA studies providing individual level genotype data; Phase II performed a replication of cohorts providing summary level EA data; Phase III meta-analyzed results from the first two phases; associated SNPs from Phase III were used for replication in Phase IV; finally in Phase V, a multi-ethnic meta-analysis of all samples from four ethnicities was performed. At an array-wide significance (P < 2.40E-06), we identify novel BMI associations in loci translocase of outer mitochondrial membrane 40 homolog (yeast) - apolipoprotein E - apolipoprotein C-I (TOMM40-APOE-APOC1) (rs2075650, P = 2.95E-10), sterol regulatory element binding transcription factor 2 (SREBF2, rs5996074, P = 9.43E-07) and neurotrophic tyrosine kinase, receptor, type 2 [NTRK2, a brain-derived neurotrophic factor (BDNF) receptor gene, rs1211166, P = 1.04E-06] in the Phase IV meta-analysis. Of 10 loci with previous evidence for BMI association represented on the IBC array, eight were replicated, with the remaining two showing nominal significance. Conditional analyses revealed two independent BMI-associated signals in BDNF and melanocortin 4 receptor (MC4R) regions. Of the 11 array-wide significant SNPs, three are associated with gene expression levels in both primary B-cells and monocytes; with rs4788099 in SH2B adaptor protein 1 (SH2B1) notably being associated with the expression of multiple genes in cis. These multi-ethnic meta-analyses expand our knowledge of BMI genetics.

    Funded by: NIA NIH HHS: R37 AG011099

    Human molecular genetics 2013;22;1;184-201

  • Genome-wide diversity in the Levant reveals recent structuring by culture.

    Haber M, Gauguier D, Youhanna S, Patterson N, Moorjani P, Botigué LR, Platt DE, Matisoo-Smith E, Soria-Hernanz DF, Wells RS, Bertranpetit J, Tyler-Smith C, Comas D and Zalloua PA

    Institut de Biologia Evolutiva (CSIC-UPF), Departament de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, Barcelona, Spain.

    The Levant is a region in the Near East with an impressive record of continuous human existence and major cultural developments since the Paleolithic period. Genetic and archeological studies present solid evidence placing the Middle East and the Arabian Peninsula as the first stepping-stone outside Africa. There is, however, little understanding of demographic changes in the Middle East, particularly the Levant, after the first Out-of-Africa expansion and how the Levantine peoples relate genetically to each other and to their neighbors. In this study we analyze more than 500,000 genome-wide SNPs in 1,341 new samples from the Levant and compare them to samples from 48 populations worldwide. Our results show recent genetic stratifications in the Levant are driven by the religious affiliations of the populations within the region. Cultural changes within the last two millennia appear to have facilitated/maintained admixture between culturally similar populations from the Levant, Arabian Peninsula, and Africa. The same cultural changes seem to have resulted in genetic isolation of other groups by limiting admixture with culturally different neighboring populations. Consequently, Levant populations today fall into two main groups: one sharing more genetic characteristics with modern-day Europeans and Central Asians, and the other with closer genetic affinities to other Middle Easterners and Africans. Finally, we identify a putative Levantine ancestral component that diverged from other Middle Easterners ∼23,700-15,500 years ago during the last glacial period, and diverged from Europeans ∼15,900-9,100 years ago between the last glacial warming and the start of the Neolithic.

    Funded by: PEPFAR: 098051; Wellcome Trust

    PLoS genetics 2013;9;2;e1003316

  • The Role of Salt Bridges, Charge Density, and Subunit Flexibility in Determining Disassembly Routes of Protein Complexes.

    Hall Z, Hernández H, Marsh JA, Teichmann SA and Robinson CV

    Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford OX1 3QZ, UK.

    Mass spectrometry can be used to characterize multiprotein complexes, defining their subunit stoichiometry and composition following solution disruption and collision-induced dissociation (CID). While CID of protein complexes in the gas phase typically results in the dissociation of unfolded subunits, a second atypical route is possible wherein compact subunits or subcomplexes are ejected without unfolding. Because tertiary structure and subunit interactions may be retained, this is the preferred route for structural investigations. How can we influence which pathway is adopted? By studying properties of a series of homomeric and heteromeric protein complexes and varying their overall charge in solution, we found that low subunit flexibility, higher charge densities, fewer salt bridges, and smaller interfaces are likely to be involved in promoting dissociation routes without unfolding. Manipulating the charge on a protein complex therefore enables us to direct dissociation through structurally informative pathways that mimic those followed in solution.

    Structure (London, England : 1993) 2013

  • Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents.

    Halligan DL, Kousathanas A, Ness RW, Harr B, Eöry L, Keane TM, Adams DJ and Keightley PD

    Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom.

    The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified on the basis of an excess of nucleotide substitutions between species compared to the neutral expectation or from effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of selective forces acting in proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ~100 Kb, on average), implying that there is a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than substitutions in CNEs. We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution.

    Funded by: Wellcome Trust

    PLoS genetics 2013;9;12;e1003995

  • Fine mapping of type 1 diabetes regions Idd9.1 and Idd9.2 reveals genetic complexity.

    Hamilton-Williams EE, Rainbow DB, Cheung J, Christensen M, Lyons PA, Peterson LB, Steward CA, Sherman LA and Wicker LS

    Department of Immunology and Microbial Sciences, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.

    Nonobese diabetic (NOD) mice congenic for C57BL/10 (B10)-derived genes in the Idd9 region of chromosome 4 are highly protected from type 1 diabetes (T1D). Idd9 has been divided into three protective subregions (Idd9.1, 9.2, and 9.3), each of which partially prevents disease. In this study we have fine-mapped the Idd9.1 and Idd9.2 regions, revealing further genetic complexity with at least two additional subregions contributing to protection from T1D. Using the NOD sequence from bacterial artificial chromosome clones of the Idd9.1 and Idd9.2 regions as well as whole-genome sequence data recently made available, sequence polymorphisms within the regions highlight a high degree of polymorphism between the NOD and B10 strains in the Idd9 regions. Among numerous candidate genes are several with immunological importance. The Idd9.1 region has been separated into Idd9.1 and Idd9.4, with Lck remaining a candidate gene within Idd9.1. One of the Idd9.2 regions contains the candidate genes Masp2 (encoding mannan-binding lectin serine peptidase 2) and Mtor (encoding mammalian target of rapamycin). From mRNA expression analyses, we have also identified several other differentially expressed candidate genes within the Idd9.1 and Idd9.2 regions. These findings highlight that multiple, relatively small genetic effects combine and interact to produce significant changes in immune tolerance and diabetes onset.

    Funded by: NIAID NIH HHS: AI 070351, AI 15416, U19AI050864-07; Wellcome Trust: 091157, 100140

    Mammalian genome : official journal of the International Mammalian Genome Society 2013;24;9-10;358-75

  • Mutations in B4GALNT1 (GM2 synthase) underlie a new disorder of ganglioside biosynthesis.

    Harlalka GV, Lehman A, Chioza B, Baple EL, Maroofian R, Cross H, Sreekantan-Nair A, Priestman DA, Al-Turki S, McEntagart ME, Proukakis C, Royle L, Kozak RP, Bastaki L, Patton M, Wagner K, Coblentz R, Price J, Mezei M, Schlade-Bartusiak K, Platt FM, Hurles ME and Crosby AH

    Institute of Biomedical and Clinical Science, University of Exeter Medical School, St. Luke's Campus, Heavitree Road, EX1 2LU, Exeter, Devon, UK.

    Glycosphingolipids are ubiquitous constituents of eukaryotic plasma membranes, and their sialylated derivatives, gangliosides, are the major class of glycoconjugates expressed by neurons. Deficiencies in their catabolic pathways give rise to a large and well-studied group of inherited disorders, the lysosomal storage diseases. Although many glycosphingolipid catabolic defects have been defined, only one proven inherited disease arising from a defect in ganglioside biosynthesis is known. This disease, because of defects in the first step of ganglioside biosynthesis (GM3 synthase), results in a severe epileptic disorder found at high frequency amongst the Old Order Amish. Here we investigated an unusual neurodegenerative phenotype, most commonly classified as a complex form of hereditary spastic paraplegia, present in families from Kuwait, Italy and the Old Order Amish. Our genetic studies identified mutations in B4GALNT1 (GM2 synthase), encoding the enzyme that catalyzes the second step in complex ganglioside biosynthesis, as the cause of this neurodegenerative phenotype. Biochemical profiling of glycosphingolipid biosynthesis confirmed a lack of GM2 in affected subjects in association with a predictable increase in levels of its precursor, GM3, a finding that will greatly facilitate diagnosis of this condition. With the description of two neurological human diseases involving defects in two sequentially acting enzymes in ganglioside biosynthesis, there is the real possibility that a previously unidentified family of ganglioside deficiency diseases exists. The study of patients and animal models of these disorders will pave the way for a greater understanding of the role gangliosides play in neuronal structure and function and provide insights into the development of effective treatment therapies.

    Brain : a journal of neurology 2013;136;Pt 12;3618-24

  • Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study.

    Harris SR, Cartwright EJ, Török ME, Holden MT, Brown NM, Ogilvy-Stuart AL, Ellington MJ, Quail MA, Bentley SD, Parkhill J and Peacock SJ

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Background: The emergence of meticillin-resistant Staphylococcus aureus (MRSA) that can persist in the community and replace existing hospital-adapted lineages of MRSA means that it is necessary to understand transmission dynamics in terms of hospitals and the community as one entity. We assessed the use of whole-genome sequencing to enhance detection of MRSA transmission between these settings.

    Methods: We studied a putative MRSA outbreak on a special care baby unit (SCBU) at a National Health Service Foundation Trust in Cambridge, UK. We used whole-genome sequencing to validate and expand findings from an infection-control team who assessed the outbreak through conventional analysis of epidemiological data and antibiogram profiles. We sequenced isolates from all colonised patients in the SCBU, and sequenced MRSA isolates from patients in the hospital or community with the same antibiotic susceptibility profile as the outbreak strain.

    Findings: The hospital infection-control team identified 12 infants colonised with MRSA in a 6 month period in 2011, who were suspected of being linked, but a persistent outbreak could not be confirmed with conventional methods. With whole-genome sequencing, we identified 26 related cases of MRSA carriage, and showed transmission occurred within the SCBU, between mothers on a postnatal ward, and in the community. The outbreak MRSA type was a new sequence type (ST) 2371, which is closely related to ST22, but contains genes encoding Panton-Valentine leucocidin. Whole-genome sequencing data were used to propose and confirm that MRSA carriage by a staff member had allowed the outbreak to persist during periods without known infection on the SCBU and after a deep clean.

    Interpretation: Whole-genome sequencing holds great promise for rapid, accurate, and comprehensive identification of bacterial transmission pathways in hospital and community settings, with concomitant reductions in infections, morbidity, and costs.

    Funding: UK Clinical Research Collaboration Translational Infection Research Initiative, Wellcome Trust, Health Protection Agency, and the National Institute for Health Research Cambridge Biomedical Research Centre.

    Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office; Department of Health; Medical Research Council: G1000803; Wellcome Trust: 098051

    The Lancet infectious diseases 2013;13;2;130-6

  • Read and assembly metrics inconsequential for clinical utility of whole-genome sequencing in mapping outbreaks.

    Harris SR, Török ME, Cartwright EJ, Quail MA, Peacock SJ and Parkhill J

    Funded by: Medical Research Council: G1000803

    Nature biotechnology 2013;31;7;592-4

  • Diagnostic pathway for the investigation of thrombocytosis.

    Harrison CN, Butt N, Campbell P, Conneally E, Drummond M, Green AR, Murrin R, Radia DH, Reilly JT and McMullin MF

    Department of Haematology, Guy's and St Thomas' Hospitals NHS Foundation Trust, London, UK.

    British journal of haematology 2013

  • Whole genome sequencing identifies zoonotic transmission of MRSA isolates with the novel mecA homologue mecC.

    Harrison EM, Paterson GK, Holden MT, Larsen J, Stegger M, Larsen AR, Petersen A, Skov RL, Christensen JM, Bak Zeuthen A, Heltberg O, Harris SR, Zadoks RN, Parkhill J, Peacock SJ and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Several methicillin-resistant Staphylococcus aureus (MRSA) lineages that carry a novel mecA homologue (mecC) have recently been described in livestock and humans. In Denmark, two independent human cases of mecC-MRSA infection have been linked to a livestock reservoir. We investigated the molecular epidemiology of the associated MRSA isolates using whole genome sequencing (WGS). Single nucleotide polymorphisms (SNP) were defined and compared to a reference genome to place the isolates into a phylogenetic context. Phylogenetic analysis revealed two distinct farm-specific clusters comprising isolates from the human case and their own livestock, whereas human and animal isolates from the same farm only differed by a small number of SNPs, which supports the likelihood of zoonotic transmission. Further analyses identified a number of genes and mutations that may be associated with host interaction and virulence. This study demonstrates that mecC-MRSA ST130 isolates are capable of transmission between animals and humans, and underscores the potential of WGS in epidemiological investigations and source tracking of bacterial infections. →See accompanying article http://dx.doi.org/10.1002/emmm.201302622.

    EMBO molecular medicine 2013;5;4;509-15

  • A Staphylococcus xylosus Isolate with a New mecC Allotype.

    Harrison EM, Paterson GK, Holden MT, Morgan FJ, Larsen AR, Petersen A, Leroy S, De Vliegher S, Perreten V, Fox LK, Lam TJ, Sampimon OC, Zadoks RN, Peacock SJ, Parkhill J and Holmes MA

    University of Cambridge, Department of Veterinary Medicine, Cambridge, United Kingdom.

    Recently, a novel variant of mecA known as mecC (mecA(LGA251)) was identified in Staphylococcus aureus isolates from both humans and animals. In this study, we identified a Staphylococcus xylosus isolate that harbors a new allotype of the mecC gene, mecC1. Whole-genome sequencing revealed that mecC1 forms part of a class E mec complex (mecI-mecR1-mecC1-blaZ) located at the orfX locus as part of a likely staphylococcal cassette chromosome mec element (SCCmec) remnant, which also contains a number of other genes present on the type XI SCCmec.

    Antimicrobial agents and chemotherapy 2013;57;3;1524-8

  • Description and Nomenclature of Neisseria meningitidis Capsule Locus.

    Harrison OB, Claus H, Jiang Y, Bennett JS, Bratcher HB, Jolley KA, Corton C, Care R, Poolman JT, Zollinger WD, Frasch CE, Stephens DS, Feavers I, Frosch M, Parkhill J, Vogel U, Quail MA, Bentley SD and Maiden MC

    Pathogenic Neisseria meningitidis isolates contain a polysaccharide capsule that is the main virulence determinant for this bacterium. Thirteen capsular polysaccharides have been described, and nuclear magnetic resonance spectroscopy has enabled determination of the structure of capsular polysaccharides responsible for serogroup specificity. Molecular mechanisms involved in N. meningitidis capsule biosynthesis have also been identified, and genes involved in this process and in cell surface translocation are clustered at a single chromosomal locus termed cps. The use of multiple names for some of the genes involved in capsule synthesis, combined with the need for rapid diagnosis of serogroups commonly associated with invasive meningococcal disease, prompted a requirement for a consistent approach to the nomenclature of capsule genes. In this report, a comprehensive description of all N. meningitidis serogroups is provided, along with a proposed nomenclature, which was presented at the 2012 XVIIIth International Pathogenic Neisseria Conference.

    Emerging infectious diseases 2013;19;4;566-73

  • VS-5584, a Novel and Highly Selective PI3K/mTOR Kinase Inhibitor for the Treatment of Cancer.

    Hart S, Novotny-Diermayr V, Goh KC, Williams M, Tan YC, Ong LC, Cheong A, Ng BK, Amalini C, Madan B, Nagaraj H, Jayaraman R, Pasha KM, Ethirajulu K, Chng WJ, Mustafa N, Goh BC, Benes C, McDermott U, Garnett M, Dymock B and Wood JM

    Corresponding Author: Stefan Hart, S*BIO Pte Ltd., 1 Science Park Road, #05-09 The Capricorn, Singapore 117528, Singapore. stefan.sbio@gmail.com.

    Dysregulation of the PI3K/mTOR pathway, either through amplifications, deletions, or as a direct result of mutations, has been closely linked to the development and progression of a wide range of cancers. Moreover, this pathway activation is a poor prognostic marker for many tumor types and confers resistance to various cancer therapies. Here, we describe VS-5584, a novel, low-molecular weight compound with equivalent potent activity against mTOR (IC(50) = 37 nmol/L) and all class I phosphoinositide 3-kinase (PI3K) isoforms (IC(50): PI3Kα = 16 nmol/L; PI3Kβ = 68 nmol/L; PI3Kγ = 25 nmol/L; PI3Kδ = 42 nmol/L), without relevant activity on 400 lipid and protein kinases. VS-5584 shows robust modulation of cellular PI3K/mTOR pathways, inhibiting phosphorylation of substrates downstream of PI3K and mTORC1/2. A large human cancer cell line panel screen (436 lines) revealed broad antiproliferative sensitivity and that cells harboring mutations in PIK3CA are generally more sensitive toward VS-5584 treatment. VS-5584 exhibits favorable pharmacokinetic properties after oral dosing in mice and is well tolerated. VS-5584 induces long-lasting and dose-dependent inhibition of PI3K/mTOR signaling in tumor tissue, leading to tumor growth inhibition in various rapalog-sensitive and -resistant human xenograft models. Furthermore, VS-5584 is synergistic with an EGF receptor inhibitor in a gastric tumor model. The unique selectivity profile and favorable pharmacologic and pharmaceutical properties of VS-5584 and its efficacy in a wide range of human tumor models support further investigations of VS-5584 in clinical trials.

    Molecular cancer therapeutics 2013;12;2;151-61

  • Identification of the zebrafish maternal and paternal transcriptomes.

    Harvey SA, Sealy I, Kettleborough R, Fenyes F, White R, Stemple D and Smith JC

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Transcription is an essential component of basic cellular and developmental processes. However, early embryonic development occurs in the absence of transcription and instead relies upon maternal mRNAs and proteins deposited in the egg during oocyte maturation. Although the early zebrafish embryo is competent to transcribe exogenous DNA, factors present in the embryo maintain genomic DNA in a state that is incompatible with transcription. The cell cycles of the early embryo titrate out these factors, leading to zygotic transcription initiation, presumably in response to a change in genomic DNA chromatin structure to a state that supports transcription. To understand the molecular mechanisms controlling this maternal to zygotic transition, it is important to distinguish between the maternal and zygotic transcriptomes during this period. Here we use exome sequencing and RNA-seq to achieve such discrimination and in doing so have identified the first zygotic genes to be expressed in the embryo. Our work revealed different profiles of maternal mRNA post-transcriptional regulation prior to zygotic transcription initiation. Finally, we demonstrate that maternal mRNAs are required for different modes of zygotic transcription initiation, which is not simply dependent on the titration of factors that maintain genomic DNA in a transcriptionally incompetent state.

    Development (Cambridge, England) 2013;140;13;2703-10

  • A blood pressure genetic risk score is a significant predictor of incident cardiovascular events in 32 669 individuals.

    Havulinna AS, Kettunen J, Ukkola O, Osmond C, Eriksson JG, Kesäniemi YA, Jula A, Peltonen L, Kontula K, Salomaa V and Newton-Cheh C

    Center for Human Genetic Research, Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge St, CPZN 5.242, Boston, MA 02114.

    Recent genome-wide association studies have identified genetic variants associated with blood pressure (BP). We investigated whether genetic risk scores (GRSs) constructed of these variants would predict incident cardiovascular disease (CVD) events. We genotyped 32 common single nucleotide polymorphisms in several Finnish cohorts, with up to 32 669 individuals after exclusion of prevalent CVD cases. The median follow-up was 9.8 years, during which 2295 incident CVD events occurred. We created GRSs separately for systolic BP and diastolic BP by multiplying the risk allele count of each single nucleotide polymorphism by the effect size estimated in published genome-wide association studies. We performed Cox regression analyses with and without adjustment for clinical factors, including BP at baseline in each cohort. The results were combined by inverse variance-weighted fixed-effects meta-analysis. The GRSs were strongly associated with systolic BP and diastolic BP, and baseline hypertension (all P<10(-62)). Hazard ratios comparing the highest quintiles of systolic BP and diastolic BP GRSs with the lowest quintiles after adjustment for age, age squared, and sex were 1.25 (1.07-1.46; P=0.006) and 1.23 (1.05-1.43; P=0.01), respectively, for incident coronary heart disease; 1.24 (1.01-1.53; P=0.04) and 1.35 (1.09-1.66; P=0.005), respectively, for incident stroke; and 1.23 (1.08-1.40; P=2×10(-6)) and 1.26 (1.11-1.44; P=5×10(-4)), respectively, for composite CVD. In conclusion, BP findings from genome-wide association studies are strongly replicated. GRSs comprising bona fide BP-single nucleotide polymorphisms predicted CVD risk, consistent with a lifelong effect on BP of these variants collectively.

    Funded by: NHLBI NIH HHS: R01 HL098283

    Hypertension 2013;61;5;987-94

  • Mcl-1 and FBW7 control a dominant survival pathway underlying HDAC and Bcl-2 inhibitor synergy in squamous cell carcinoma.

    He L, Torres-Lockhart K, Forster N, Ramakrishnan S, Greninger P, Garnett MJ, McDermott U, Rothenberg SM, Benes CH and Ellisen LW

    Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, Massachusetts 02114, USA.

    Effective targeted therapeutics for squamous cell carcinoma (SCC) are lacking. Here, we uncover Mcl-1 as a dominant and tissue-specific survival factor in SCC, providing a roadmap for a new therapeutic approach. Treatment with the histone deacetylase (HDAC) inhibitor vorinostat regulates Bcl-2 family member expression to disable the Mcl-1 axis and thereby induce apoptosis in SCC cells. Although Mcl-1 dominance renders SCC cells resistant to the BH3-mimetic ABT-737, vorinostat primes them for sensitivity to ABT-737 by shuttling Bim from Mcl-1 to Bcl-2/Bcl-xl, resulting in dramatic synergy for this combination and sustained tumor regression in vivo. Moreover, somatic FBW7 mutation in SCC is associated with stabilized Mcl-1 and high Bim levels, resulting in a poor response to standard chemotherapy but a robust response to HDAC inhibitors and enhanced synergy with the combination vorinostat/ABT-737. Collectively, our findings provide a biochemical rationale and predictive markers for the application of this therapeutic combination in SCC.

    Funded by: NCI NIH HHS: BC093523; NIDCR NIH HHS: NIH KO8 DE-020139, R01 DE015945; Wellcome Trust: 086357

    Cancer discovery 2013;3;3;324-37

  • Emergence and global spread of epidemic healthcare-associated Clostridium difficile.

    He M, Miyajima F, Roberts P, Ellison L, Pickard DJ, Martin MJ, Connor TR, Harris SR, Fairley D, Bamford KB, D'Arc S, Brazier J, Brown D, Coia JE, Douce G, Gerding D, Kim HJ, Koh TH, Kato H, Senoh M, Louie T, Michell S, Butt E, Peacock SJ, Brown NM, Riley T, Songer G, Wilcox M, Pirmohamed M, Kuijper E, Hawkey P, Wren BW, Dougan G, Parkhill J and Lawley TD

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Epidemic C. difficile (027/BI/NAP1) has rapidly emerged in the past decade as the leading cause of antibiotic-associated diarrhea worldwide. However, the key events in evolutionary history leading to its emergence and the subsequent patterns of global spread remain unknown. Here, we define the global population structure of C. difficile 027/BI/NAP1 using whole-genome sequencing and phylogenetic analysis. We show that two distinct epidemic lineages, FQR1 and FQR2, not one as previously thought, emerged in North America within a relatively short period after acquiring the same fluoroquinolone resistance-conferring mutation and a highly related conjugative transposon. The two epidemic lineages showed distinct patterns of global spread, and the FQR2 lineage spread more widely, leading to healthcare-associated outbreaks in the UK, continental Europe and Australia. Our analysis identifies key genetic changes linked to the rapid transcontinental dissemination of epidemic C. difficile 027/BI/NAP1 and highlights the routes by which it spreads through the global healthcare system.

    Funded by: Medical Research Council: 93614, G0901743(93614); Wellcome Trust: 086418, 093869, 098051

    Nature genetics 2013;45;1;109-13

  • A genome-wide association study of depressive symptoms.

    Hek K, Demirkan A, Lahti J, Terracciano A, Teumer A, Cornelis MC, Amin N, Bakshis E, Baumert J, Ding J, Liu Y, Marciante K, Meirelles O, Nalls MA, Sun YV, Vogelzangs N, Yu L, Bandinelli S, Benjamin EJ, Bennett DA, Boomsma D, Cannas A, Coker LH, de Geus E, De Jager PL, Diez-Roux AV, Purcell S, Hu FB, Rimm EB, Hunter DJ, Jensen MK, Curhan G, Rice K, Penman AD, Rotter JI, Sotoodehnia N, Emeny R, Eriksson JG, Evans DA, Ferrucci L, Fornage M, Gudnason V, Hofman A, Illig T, Kardia S, Kelly-Hayes M, Koenen K, Kraft P, Kuningas M, Massaro JM, Melzer D, Mulas A, Mulder CL, Murray A, Oostra BA, Palotie A, Penninx B, Petersmann A, Pilling LC, Psaty B, Rawal R, Reiman EM, Schulz A, Shulman JM, Singleton AB, Smith AV, Sutin AR, Uitterlinden AG, Völzke H, Widen E, Yaffe K, Zonderman AB, Cucca F, Harris T, Ladwig KH, Llewellyn DJ, Räikkönen K, Tanaka T, van Duijn CM, Grabe HJ, Launer LJ, Lunetta KL, Mosley TH, Newman AB, Tiemeier H and Murabito J

    Research Centre O3, Department of Psychiatry, Erasmus MC, Rotterdam, The Netherlands; Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands.

    Background: Depression is a heritable trait that exists on a continuum of varying severity and duration. Yet, the search for genetic variants associated with depression has had few successes. We exploit the entire continuum of depression to find common variants for depressive symptoms.

    Methods: In this genome-wide association study, we combined the results of 17 population-based studies assessing depressive symptoms with the Center for Epidemiological Studies Depression Scale. Replication of the independent top hits (p<1×10(-5)) was performed in five studies assessing depressive symptoms with other instruments. In addition, we performed a combined meta-analysis of all 22 discovery and replication studies.

    Results: The discovery sample comprised 34,549 individuals (mean age of 66.5) and no loci reached genome-wide significance (lowest p = 1.05×10(-7)). Seven independent single nucleotide polymorphisms were considered for replication. In the replication set (n = 16,709), we found suggestive association of one single nucleotide polymorphism with depressive symptoms (rs161645, 5q21, p = 9.19×10(-3)). This 5q21 region reached genome-wide significance (p = 4.78×10(-8)) in the overall meta-analysis combining discovery and replication studies (n = 51,258).

    Conclusions: The results suggest that only a large sample comprising more than 50,000 subjects may be sufficiently powered to detect genes for depressive symptoms.

    Funded by: NCI NIH HHS: 5UO1CA098233, CA49449, CA50385, CA65725, CA67262, CA87969; NCRR NIH HHS: UL1-RR-024156, UL1RR025005, UL1RR033176; NHGRI NIH HHS: U01-HG004402; NHLBI NIH HHS: HL075366, HL080295, HL087652, HL105756, N01 HC-15103, N01 HC-55222, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-65226, N01-HC-75150, N01-HC-85079, N01-HC-85086, N01-HC-85239, N01-HC-95159, N01-HC-95169, N02-HL-6-4278, R01 HL101161, R01-HL087641, R01-HL093029, R01-HL70825; NIA NIH HHS: 1R01AG032098-01A1, AG-023629, AG-027058, AG-15928, AG-20098, AG-916413, K08 AG034290, K08AG34290, N01-AG-1-2109, N01-AG-12100, N01AG62101, N01AG62103, N01AG62106, N01-AG-821336, P30AG10161, R01 AG015819, R01-AG29451, R01AG15819, R01AG17917, R01AG30146, ZIA AG000183-22, ZIA AG000183-23, ZIA AG000196-03, ZIA AG000196-04, ZIA AG000197-03, ZIA AG000197-04; NIDDK NIH HHS: DK063491; NIMH NIH HHS: R01 MH086498; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164 13; PHS HHS: HHSN268200625226C, HHSN268200782096C; Wellcome Trust: WT098051

    Biological psychiatry 2013;73;7;667-78

  • Aberrant 3' oligoadenylation of spliceosomal U6 small nuclear RNA in poikiloderma with neutropenia.

    Hilcenko C, Simpson PJ, Finch AJ, Bowler FR, Churcher MJ, Jin L, Packman LC, Shlien A, Campbell P, Kirwan M, Dokal I and Warren AJ

    MRC Laboratory of Molecular Biology, Cambridge, UK.

    The recessive disorder poikiloderma with neutropenia (PN) is caused by mutations in the C16orf57 gene that encodes the highly conserved USB1 protein. Here, we present the 1.1 Å resolution crystal structure of human USB1, defining it as a member of the LigT-like superfamily of 2H phosphoesterases. We show that human USB1 is a distributive 3'-5' exoribonuclease that posttranscriptionally removes uridine and adenosine nucleosides from the 3' end of spliceosomal U6 small nuclear RNA (snRNA), directly catalyzing terminal 2', 3' cyclic phosphate formation. USB1 measures the appropriate length of the U6 oligo(U) tail by reading the position of a key adenine nucleotide (A102) and pausing 5 uridine residues downstream. We show that the 3' ends of U6 snRNA in PN patient lymphoblasts are elongated and unexpectedly carry nontemplated 3' oligo(A) tails that are characteristic of nuclear RNA surveillance targets. Thus, our study reveals a novel quality control pathway in which posttranscriptional 3'-end processing by USB1 protects U6 snRNA from targeting and destruction by the nuclear exosome. Our data implicate aberrant oligoadenylation of U6 snRNA in the pathogenesis of the leukemia predisposition disorder PN.

    Funded by: Medical Research Council: G0800784, U105161083; Wellcome Trust: 079249

    Blood 2013;121;6;1028-38

  • Dense genotyping of immune-related disease regions identifies 14 new susceptibility loci for juvenile idiopathic arthritis.

    Hinks A, Cobb J, Marion MC, Prahalad S, Sudman M, Bowes J, Martin P, Comeau ME, Sajuthi S, Andrews R, Brown M, Chen WM, Concannon P, Deloukas P, Edkins S, Eyre S, Gaffney PM, Guthery SL, Guthridge JM, Hunt SE, James JA, Keddache M, Moser KL, Nigrovic PA, Onengut-Gumuscu S, Onslow ML, Rosé CD, Rich SS, Steel KJ, Wakeland EK, Wallace CA, Wedderburn LR, Woo P, Boston Children's JIA Registry, British Society of Paediatric and Adolescent Rheumatology (BSPAR) Study Group, Childhood Arthritis Prospective Study (CAPS), Childhood Arthritis Response to Medication Study (CHARMS), German Society for Pediatric Rheumatology (GKJR), JIA Gene Expression Study, NIAMS JIA Genetic Registry, TREAT Study, United Kingdom Juvenile Idiopathic Arthritis Genetics Consortium (UKJIAGC), Bohnsack JF, Haas JP, Glass DN, Langefeld CD, Thomson W and Thompson SD

    [1] Arthritis Research UK Epidemiology Unit, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK. [2] National Institute for Health Research Manchester Musculoskeletal Biomedical Research Unit, Central Manchester University Hospitals National Health Service Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK.

    We used the Immunochip array to analyze 2,816 individuals with juvenile idiopathic arthritis (JIA), comprising the most common subtypes (oligoarticular and rheumatoid factor-negative polyarticular JIA), and 13,056 controls. We confirmed association of 3 known JIA risk loci (the human leukocyte antigen (HLA) region, PTPN22 and PTPN2) and identified 14 loci reaching genome-wide significance (P < 5 × 10(-8)) for the first time. Eleven additional new regions showed suggestive evidence of association with JIA (P < 1 × 10(-6)). Dense mapping of loci along with bioinformatics analysis refined the associations to one gene in each of eight regions, highlighting crucial pathways, including the interleukin (IL)-2 pathway, in JIA disease pathogenesis. The entire Immunochip content, the HLA region and the top 27 loci (P < 1 × 10(-6)) explain an estimated 18, 13 and 6% of the risk of JIA, respectively. In summary, this is the largest collection of JIA cases investigated so far and provides new insight into the genetic basis of this childhood autoimmune disease.

    Funded by: NIDDK NIH HHS: U01 DK062418

    Nature genetics 2013;45;6;664-9

  • JAK2V617F leads to intrinsic changes in platelet formation and reactivity in a knock-in mouse model of essential thrombocythemia.

    Hobbs CM, Manning H, Bennett C, Vasquez L, Severin S, Brain L, Mazharian A, Guerrero JA, Li J, Soranzo N, Green AR, Watson SP and Ghevaert C

    National Health Service Blood and Transplant, Cambridge, United Kingdom.

    The principal morbidity and mortality in patients with essential thrombocythemia (ET) and polycythemia rubra vera (PV) stems from thrombotic events. Most patients with ET/PV harbor a JAK2V617F mutation, but its role in the thrombotic diathesis remains obscure. Platelet function studies in patients are difficult to interpret because of interindividual heterogeneity, reflecting variations in the proportion of platelets derived from the malignant clone, differences in the presence of additional mutations, and the effects of medical treatments. To circumvent these issues, we have studied a JAK2V617F knock-in mouse model of ET in which all megakaryocytes and platelets express JAK2V617F at a physiological level, equivalent to that present in human ET patients. We show that, in addition to increased differentiation, JAK2V617F-positive megakaryocytes display greater migratory ability and proplatelet formation. We demonstrate in a range of assays that platelet reactivity to agonists is enhanced, with a concomitant increase in platelet aggregation in vitro and a reduced duration of bleeding in vivo. These data suggest that JAK2V617F leads to intrinsic changes in both megakaryocyte and platelet biology beyond an increase in cell number. In support of this hypothesis, we identify multiple differentially expressed genes in JAK2V617F megakaryocytes that may underlie the observed biological differences.

    Blood 2013;122;23;3787-97

  • A genomic portrait of the emergence, evolution, and global spread of a methicillin-resistant Staphylococcus aureus pandemic.

    Holden MT, Hsu LY, Kurt K, Weinert LA, Mather AE, Harris SR, Strommenger B, Layer F, Witte W, de Lencastre H, Skov R, Westh H, Zemlicková H, Coombs G, Kearns AM, Hill RL, Edgeworth J, Gould I, Gant V, Cooke J, Edwards GF, McAdam PR, Templeton KE, McCann A, Zhou Z, Castillo-Ramírez S, Feil EJ, Hudson LO, Enright MC, Balloux F, Aanensen DM, Spratt BG, Fitzgerald JR, Parkhill J, Achtman M, Bentley SD and Nübel U

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    The widespread use of antibiotics in association with high-density clinical care has driven the emergence of drug-resistant bacteria that are adapted to thrive in hospitalized patients. Of particular concern are globally disseminated methicillin-resistant Staphylococcus aureus (MRSA) clones that cause outbreaks and epidemics associated with health care. The most rapidly spreading and tenacious health-care-associated clone in Europe currently is EMRSA-15, which was first detected in the UK in the early 1990s and subsequently spread throughout Europe and beyond. Using phylogenomic methods to analyze the genome sequences for 193 S. aureus isolates, we were able to show that the current pandemic population of EMRSA-15 descends from a health-care-associated MRSA epidemic that spread throughout England in the 1980s, which had itself previously emerged from a primarily community-associated methicillin-sensitive population. The emergence of fluoroquinolone resistance in this EMRSA-15 subclone in the English Midlands during the mid-1980s appears to have played a key role in triggering pandemic spread, and occurred shortly after the first clinical trials of this drug. Genome-based coalescence analysis estimated that the population of this subclone over the last 20 yr has grown four times faster than its progenitor. Using comparative genomic analysis we identified the molecular genetic basis of 99.8% of the antimicrobial resistance phenotypes of the isolates, highlighting the potential of pathogen genome sequencing as a diagnostic tool. We document the genetic changes associated with adaptation to the hospital environment and with increasing drug resistance over time, and how MRSA evolution likely has been influenced by country-specific drug use regimens.

    Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office: CZB/4/717; Medical Research Council: G0800777, MR/K001744/1; PHS HHS: 2 RO1I457838-12; Wellcome Trust: 089472, 098051

    Genome research 2013;23;4;653-64

  • Tracking the establishment of local endemic populations of an emergent enteric pathogen.

    Holt KE, Thieu Nga TV, Thanh DP, Vinh H, Kim DW, Vu Tra MP, Campbell JI, Hoang NV, Vinh NT, Minh PV, Thuy CT, Nga TT, Thompson C, Dung TT, Nhu NT, Vinh PV, Tuyet PT, Phuc HL, Lien NT, Phu BD, Ai NT, Tien NM, Dong N, Parry CM, Hien TT, Farrar JJ, Parkhill J, Dougan G, Thomson NR and Baker S

    Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, VIC 3010, Australia.

    Shigella sonnei is a human-adapted pathogen that is emerging globally as the dominant agent of bacterial dysentery. To investigate local establishment, we sequenced the genomes of 263 Vietnamese S. sonnei isolated over 15 y. Our data show that S. sonnei was introduced into Vietnam in the 1980s and has undergone localized clonal expansion, punctuated by genomic fixation events through periodic selective sweeps. We uncover geographical spread, spatially restricted frontier populations, and convergent evolution through local gene pool sampling. This work provides a unique, high-resolution insight into the microevolution of a pioneering human pathogen during its establishment in a new host population.

    Funded by: Wellcome Trust: 093724, 098051, 100087

    Proceedings of the National Academy of Sciences of the United States of America 2013;110;43;17522-7

  • Arginine Catabolic Mobile Element in Methicillin-Resistant Staphylococcus aureus (MRSA) Clonal Group ST239-MRSA-III Isolates in Singapore: Implications for PCR-Based Screening Tests.

    Hon PY, Chan KS, Holden MT, Harris SR, Tan TY, Zu YB, Krishnan P, Oon LL, Koh TH and Hsu LY

    Department of Medicine, National University Health System, Singapore, Singapore.

    Antimicrobial agents and chemotherapy 2013;57;3;1563-4

  • WikiGWA: an open platform for collecting and using genome-wide association results.

    Huang J, Liu EY, Welch R, Willer C, Hindorff LA and Li Y

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    The number of discovered genetic variants from genome-wide association (GWA) studies (GWAS) has been growing rapidly. Centralized efforts such as the National Human Genome Research Institute's GWAS catalog provide regular updates and a convenient interface for quick lookup. However, the catalog entries are manually curated and rely on data from published articles. Other tools such as SNPedia (http://www.snpedia.com) collect published results regarding functional consequences of genetic variations. Here, we propose an approach that allows individual investigators to share their GWA results through an open platform. Unlike the GWAS catalog or SNPedia, wikiGWA collects first-hand GWAS results, and on a much larger scale. Investigators are able not only to post a much larger amount of results, but also to post results from unpublished studies, which could alleviate publication bias and facilitate identification of weak signals. Our interface allows for flexible and fast queries, and the query results are formatted to work seamlessly with the LocusZoom program for visualization and annotation. We here describe wikiGWA, made publicly available at http://www.wikiGWA.org.

    Funded by: NHGRI NIH HHS: R01 HG006292, R01 HG006703

    European journal of human genetics : EJHG 2013;21;4;471-3

  • The duck genome and transcriptome provide insight into an avian influenza virus reservoir species.

    Huang Y, Li Y, Burt DW, Chen H, Zhang Y, Qian W, Kim H, Gan S, Zhao Y, Li J, Yi K, Feng H, Zhu P, Li B, Liu Q, Fairley S, Magor KE, Du Z, Hu X, Goodman L, Tafer H, Vignal A, Lee T, Kim KW, Sheng Z, An Y, Searle S, Herrero J, Groenen MA, Crooijmans RP, Faraut T, Cai Q, Webster RG, Aldridge JR, Warren WC, Bartschat S, Kehr S, Marz M, Stadler PF, Smith J, Kraus RH, Zhao Y, Ren L, Fei J, Morisson M, Kaiser P, Griffin DK, Rao M, Pitel F, Wang J and Li N

    State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China.

    The duck (Anas platyrhynchos) is one of the principal natural hosts of influenza A viruses. We present the duck genome sequence and perform deep transcriptome analyses to investigate immune-related genes. Our data indicate that the duck possesses a contractive immune gene repertoire, as in chicken and zebra finch, and this repertoire has been shaped through lineage-specific duplications. We identify genes that are responsive to influenza A viruses using the lung transcriptomes of control ducks and ones that were infected with either a highly pathogenic (A/duck/Hubei/49/05) or a weakly pathogenic (A/goose/Hubei/65/05) H5N1 virus. Further, we show how the duck's defense mechanisms against influenza infection have been optimized through the diversification of its β-defensin and butyrophilin-like repertoires. These analyses, in combination with the genomic and transcriptomic data, provide a resource for characterizing the interaction between host and influenza viruses.

    Funded by: Wellcome Trust: 095908

    Nature genetics 2013;45;7;776-83

  • Olfaction and olfactory-mediated behaviour in psychiatric disease models.

    Huckins LM, Logan DW and Sánchez-Andrade G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Rats and mice are the most widely used species for modelling psychiatric disease. Assessment of these rodent models typically involves the analysis of aberrant behaviour with behavioural interactions often being manipulated to generate the model. Rodents rely heavily on their excellent sense of smell and almost all their social interactions have a strong olfactory component. Therefore, experimental paradigms that exploit these olfactory-mediated behaviours are among the most robust available and are highly prevalent in psychiatric disease research. These include tests of aggression and maternal instinct, foraging, olfactory memory and habituation and the establishment of social hierarchies. An appreciation of the way that rodents regulate these behaviours in an ethological context can assist experimenters to generate better data from their models and to avoid common pitfalls. We describe some of the more commonly used behavioural paradigms from a rodent olfactory perspective and discuss their application in existing models of psychiatric disease. We introduce the four olfactory subsystems that integrate to mediate the behavioural responses and the types of sensory cue that promote them and discuss their control and practical implementation to improve experimental outcomes. In addition, because smell is critical for normal behaviour in rodents and yet olfactory dysfunction is often associated with neuropsychiatric disease, we introduce some tests for olfactory function that can be applied to rodent models of psychiatric disorders as part of behavioural analysis.

    Cell and tissue research 2013;354;1;69-80

  • Novel Loci Associated with Increased Risk of Sudden Cardiac Death in the Context of Coronary Artery Disease.

    Huertas-Vazquez A, Nelson CP, Guo X, Reinier K, Uy-Evanado A, Teodorescu C, Ayala J, Jerger K, Chugh H, WTCCC, Braund PS, Deloukas P, Hall AS, Balmforth AJ, Jones M, Taylor KD, Pulit SL, Newton-Cheh C, Gunson K, Jui J, Rotter JI, Albert CM, Samani NJ and Chugh SS

    The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States of America.

    Background: Recent genome-wide association studies (GWAS) have identified novel loci associated with sudden cardiac death (SCD). Despite this progress, identified DNA variants account for a relatively small portion of overall SCD risk, suggesting that additional loci contributing to SCD susceptibility await discovery. The objective of this study was to identify novel DNA variation associated with SCD in the context of coronary artery disease (CAD). Using the MetaboChip custom array we conducted a case-control association analysis of 119,117 SNPs in 948 SCD cases (with underlying CAD) from the Oregon Sudden Unexpected Death Study (Oregon-SUDS) and 3,050 controls with CAD from the Wellcome Trust Case-Control Consortium (WTCCC). Two newly identified loci were significantly associated with increased risk of SCD after correction for multiple comparisons at: rs6730157 in the RAB3GAP1 gene on chromosome 2 (P = 4.93×10(-12), OR = 1.60) and rs2077316 in the ZNF365 gene on chromosome 10 (P = 3.64×10(-8), OR = 2.41). Conclusions: Our findings suggest that RAB3GAP1 and ZNF365 are relevant candidate genes for SCD and will contribute to the mechanistic understanding of SCD susceptibility.

    PloS one 2013;8;4;e59905

  • Negligible impact of rare autoimmune-locus coding-region variants on missing heritability.

    Hunt KA, Mistry V, Bockett NA, Ahmad T, Ban M, Barker JN, Barrett JC, Blackburn H, Brand O, Burren O, Capon F, Compston A, Gough SC, Jostins L, Kong Y, Lee JC, Lek M, Macarthur DG, Mansfield JC, Mathew CG, Mein CA, Mirza M, Nutland S, Onengut-Gumuscu S, Papouli E, Parkes M, Rich SS, Sawcer S, Satsangi J, Simmonds MJ, Trembath RC, Walker NM, Wozniak E, Todd JA, Simpson MA, Plagnol V and van Heel DA

    Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK.

    Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.

    Nature 2013

  • REAPR: a universal tool for genome assembly evaluation.

    Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M and Otto TD

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk.

    Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.

    Genome biology 2013;14;5;R47

  • LUD, a new protein domain associated with lactate utilization.

    Hwang WC, Bakolitsa C, Punta M, Coggill PC, Bateman A, Axelrod HL, Rawlings ND, Sedova M, Peterson SN, Eberhardt RY, Aravind L, Pascual J and Godzik A

    Joint Center for Structural Genomics, La Jolla, CA 92037, USA. wchwang@sanfordburnham.org.

    Background: A novel highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence, structure, and recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family. Results: JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: LutC protein, encoded by ORF DR_1909, of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome. Conclusions: We propose a model for the substrate and cofactor binding and regulation in LUD domain. The significance of LUD-containing proteins in the human gut microbiome, and the implication of lactate metabolism in the radiation-resistance of Deinococcus radiodurans are discussed.

    BMC bioinformatics 2013;14;341

  • Astroglial IFITM3 mediates neuronal impairments following neonatal immune challenge in mice.

    Ibi D, Nagai T, Nakajima A, Mizoguchi H, Kawase T, Tsuboi D, Kano SI, Sato Y, Hayakawa M, Lange UC, Adams DJ, Surani MA, Satoh T, Sawa A, Kaibuchi K, Nabeshima T and Yamada K

    Department of Neuropsychopharmacology and Hospital Pharmacy, Nagoya University Graduate School of Medicine, Nagoya, Japan; Department of Chemical Pharmacology, Graduate School of Pharmaceutical Sciences, Meijo University, Nagoya, Japan.

    Interferon-induced transmembrane protein 3 (IFITM3) plays a crucial role in the antiviral responses of Type I interferons (IFNs). The role of IFITM3 in the central nervous system (CNS) is, however, largely unknown, despite the fact that its expression is increased in the brains of patients with neurologic and neuropsychiatric diseases. Here, we show the role of IFITM3 in long-lasting neuronal impairments in mice following polyriboinosinic-polyribocytidylic acid (polyI:C, a synthetic double-stranded RNA)-induced immune challenge during the early stages of development. We found that the induction of IFITM3 expression in the brain of mice treated with polyI:C was observed only in astrocytes. Cultured astrocytes were activated by polyI:C treatment, leading to an increase in the mRNA levels of inflammatory cytokines as well as Ifitm3. When cultured neurons were treated with the conditioned medium of polyI:C-treated astrocytes (polyI:C-ACM), neurite development was impaired. These polyI:C-ACM-induced neurodevelopmental abnormalities were alleviated by ifitm3-/- astrocyte-conditioned medium. Furthermore, decreases of MAP2 expression, spine density, and dendrite complexity in the frontal cortex as well as memory impairment were evident in polyI:C-treated wild-type mice, but such neuronal impairments were not observed in ifitm3-/- mice. We also found that IFITM3 proteins were localized to the early endosomes of astrocytes following polyI:C treatment and reduced endocytic activity. These findings suggest that the induction of IFITM3 expression in astrocytes by the activation of the innate immune system during the early stages of development has non-cell autonomous effects that affect subsequent neurodevelopment, leading to neuropathological impairments and brain dysfunction, by impairing endocytosis in astrocytes.

    Glia 2013

  • The role of high-throughput technologies in clinical cancer genomics.

    Idris SF, Ahmad SS, Scott MA, Vassiliou GS and Hadfield J

    Department of Hematology/Oncology, Cambridge University NHS Hospitals Foundation Trust, Cambridge, CB2 0QQ, UK.

    Cancer is a genetic disease driven by both heritable and somatic alterations in DNA, which underpin not only oncogenesis but also progression and eventual metastasis. The major impetus for elucidating the nature and function of somatic mutations in cancer genomes is the potential for the development of effective targeted anticancer therapies. Over the last decade, high-throughput technologies have allowed us unprecedented access to a host of cancer genomes, leading to an influx of new information about their pathobiology. The challenge now is to integrate such emerging information into clinical practice to achieve tangible benefits for cancer patients. This review examines the roles array-based comparative genomic hybridization and next-generation sequencing are playing in furthering our understanding of both hematological and solid-organ tumors. Furthermore, the authors discuss the current challenges in translating the role of these technologies from bench to bedside.

    Funded by: Wellcome Trust: 095663

    Expert review of molecular diagnostics 2013;13;2;167-81

  • Inferring genome-wide recombination landscapes from advanced intercross lines: application to yeast crosses.

    Illingworth CJ, Parts L, Bergström A, Liti G and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Accurate estimates of recombination rates are of great importance for understanding evolution. In an experimental genetic cross, recombination breaks apart and rejoins genetic material, such that the genomes of the resulting isolates are comprised of distinct blocks of differing parental origin. We here describe a method exploiting this fact to infer genome-wide recombination profiles from sequenced isolates from an advanced intercross line (AIL). We verified the accuracy of the method against simulated data. Next, we sequenced 192 isolates from a twelve-generation cross between West African and North American yeast Saccharomyces cerevisiae strains and inferred the underlying recombination landscape at a fine genomic resolution (mean segregating site distance 0.22 kb). Comparison was made with landscapes inferred for a similar cross between four yeast strains, and with a previous single-generation, intra-strain cross (Mancera et al., Nature 2008). Moderate congruence was identified between landscapes (correlation 0.58-0.77 at 5 kb resolution), albeit with variance between mean genome-wide recombination rates. The multiple generations of mating undergone in the AILs gave more precise inference of recombination rates than could be achieved from a single-generation cross, in particular in identifying recombination cold-spots. The recombination landscapes we describe have particular utility; both AILs are part of a resource to study complex yeast traits (see e.g. Parts et al., Genome Res 2011). Our results will enable future applications of this resource to take better account of local linkage structure heterogeneities. Our method has general applicability to other crossing experiments, including a variety of experimental designs.

    PloS one 2013;8;5;e62266

  • Computational approaches to identify functional genetic variants in cancer genomes.

    International Cancer Genome Consortium Mutation Pathways and Consequences Subgroup of the Bioinformatics Analyses Working Group, Gonzalez-Perez A, Mustonen V, Reva B, Ritchie GR, Creixell P, Karchin R, Vazquez M, Fink JL, Kassahn KS, Pearson JV, Bader GD, Boutros PC, Muthuswamy L, Ouellette BF, Reimand J, Linding R, Shibata T, Valencia A, Butler A, Dronov S, Flicek P, Shannon NB, Carter H, Ding L, Sander C, Stuart JM, Stein LD and Lopez-Bigas N

    [1] Research Unit on Biomedical Informatics, University Pompeu Fabra, Barcelona, Spain. [2].

    The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor but only a minority of these drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.

    Nature methods 2013;10;8;723-9

  • Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis.

    International Multiple Sclerosis Genetics Consortium (IMSGC), Beecham AH, Patsopoulos NA, Xifara DK, Davis MF, Kemppinen A, Cotsapas C, Shah TS, Spencer C, Booth D, Goris A, Oturai A, Saarela J, Fontaine B, Hemmer B, Martin C, Zipp F, D'Alfonso S, Martinelli-Boneschi F, Taylor B, Harbo HF, Kockum I, Hillert J, Olsson T, Ban M, Oksenberg JR, Hintzen R, Barcellos LF, Wellcome Trust Case Control Consortium 2 (WTCCC2), International IBD Genetics Consortium (IIBDGC), Agliardi C, Alfredsson L, Alizadeh M, Anderson C, Andrews R, Søndergaard HB, Baker A, Band G, Baranzini SE, Barizzone N, Barrett J, Bellenguez C, Bergamaschi L, Bernardinelli L, Berthele A, Biberacher V, Binder TM, Blackburn H, Bomfim IL, Brambilla P, Broadley S, Brochet B, Brundin L, Buck D, Butzkueven H, Caillier SJ, Camu W, Carpentier W, Cavalla P, Celius EG, Coman I, Comi G, Corrado L, Cosemans L, Cournu-Rebeix I, Cree BA, Cusi D, Damotte V, Defer G, Delgado SR, Deloukas P, di Sapio A, Dilthey AT, Donnelly P, Dubois B, Duddy M, Edkins S, Elovaara I, Esposito F, Evangelou N, Fiddes B, Field J, Franke A, Freeman C, Frohlich IY, Galimberti D, Gieger C, Gourraud PA, Graetz C, Graham A, Grummel V, Guaschino C, Hadjixenofontos A, Hakonarson H, Halfpenny C, Hall G, Hall P, Hamsten A, Harley J, Harrower T, Hawkins C, Hellenthal G, Hillier C, Hobart J, Hoshi M, Hunt SE, Jagodic M, Jelčić I, Jochim A, Kendall B, Kermode A, Kilpatrick T, Koivisto K, Konidari I, Korn T, Kronsbein H, Langford C, Larsson M, Lathrop M, Lebrun-Frenay C, Lechner-Scott J, Lee MH, Leone MA, Leppä V, Liberatore G, Lie BA, Lill CM, Lindén M, Link J, Luessi F, Lycke J, Macciardi F, Männistö S, Manrique CP, Martin R, Martinelli V, Mason D, Mazibrada G, McCabe C, Mero IL, Mescheriakova J, Moutsianas L, Myhr KM, Nagels G, Nicholas R, Nilsson P, Piehl F, Pirinen M, Price SE, Quach H, Reunanen M, Robberecht W, Robertson NP, Rodegher M, Rog D, Salvetti M, Schnetz-Boutaud NC, Sellebjerg F, Selter RC, Schaefer C, Shaunak S, Shen L, Shields S, Siffrin V, Slee M, Sorensen PS, Sorosina M, Sospedra M, Spurkland A, Strange A, Sundqvist E, Thijs V, Thorpe J, Ticca A, Tienari P, van Duijn C, Visser EM, Vucic S, Westerlind H, Wiley JS, Wilkins A, Wilson JF, Winkelmann J, Zajicek J, Zindler E, Haines JL, Pericak-Vance MA, Ivinson AJ, Stewart G, Hafler D, Hauser SL, Compston A, McVean G, De Jager P, Sawcer SJ and McCauley JL

    [1] John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, Florida, USA. [2].

    Using the ImmunoChip custom genotyping array, we analyzed 14,498 subjects with multiple sclerosis and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (P < 1.0 × 10(-4)). In a replication phase, we combined these data with previous genome-wide association study (GWAS) data from an independent 14,802 subjects with multiple sclerosis and 26,703 healthy controls. In these 80,094 individuals of European ancestry, we identified 48 new susceptibility variants (P < 5.0 × 10(-8)), 3 of which we found after conditioning on previously identified variants. Thus, there are now 110 established multiple sclerosis risk variants at 103 discrete loci outside of the major histocompatibility complex. With high-resolution Bayesian fine mapping, we identified five regions where one variant accounted for more than 50% of the posterior probability of association. This study enhances the catalog of multiple sclerosis risk variants and illustrates the value of fine mapping in the resolution of GWAS signals.

    Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: G0000934, G0700061; Multiple Sclerosis Society: 862, 894, 898, 955; NCATS NIH HHS: UL1 TR000142; NCI NIH HHS: R01 CA104021; NCRR NIH HHS: UL1 RR024975; NIAID NIH HHS: R01 AI076544; NIEHS NIH HHS: R01 ES017080; NIGMS NIH HHS: RC2 GM093080; NINDS NIH HHS: R01 NS026799, R01 NS032830, R01 NS049477, R01 NS049510, R01 NS067305, RC2 NS070340; Wellcome Trust: 068545, 085475DONNELLY, 085475PELTONEN, 090532, 095552, 098051

    Nature genetics 2013;45;11;1353-60

  • Network based elucidation of drug response: from modulators to targets.

    Iorio F, Saez-Rodriguez J and di Bernardo D

    Telethon Institute of Genetics and Medicine, Naples, Italy. dibernardo@tigem.it.

    Network-based drug discovery aims at harnessing the power of networks to investigate the mechanism of action of existing drugs, or new molecules, in order to identify innovative therapeutic treatments. In this review, we describe some of the most recent advances in the field of network pharmacology, starting with approaches relying on computational models of transcriptional networks, then moving to protein and signaling network models and concluding with "drug networks". These networks are derived from different sources of experimental data, or literature-based analysis, and provide a complementary view of drug mode of action. Molecular and drug networks are powerful integrated computational and experimental approaches that will likely speed up and improve the drug discovery process, once fully integrated into the academic and industrial drug discovery pipeline.

    BMC systems biology 2013;7;139

  • A Cell-surface Phylome for African Trypanosomes.

    Jackson AP, Allison HC, Barry JD, Field MC, Hertz-Fowler C and Berriman M

    Pathogen Genomics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, England, United Kingdom; Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, Liverpool, England, United Kingdom.

    The cell surface of Trypanosoma brucei, like many protistan blood parasites, is crucial for mediating host-parasite interactions and is instrumental to the initiation, maintenance and severity of infection. Previous comparisons with the related trypanosomatid parasites T. cruzi and Leishmania major suggest that the cell-surface proteome of T. brucei is largely taxon-specific. Here we compare genes predicted to encode cell surface proteins of T. brucei with those from two related African trypanosomes, T. congolense and T. vivax. We created a cell surface phylome (CSP) by estimating phylogenies for 79 gene families with putative surface functions to understand the more recent evolution of African trypanosome surface architecture. Our findings demonstrate that the transferrin receptor genes essential for bloodstream survival in T. brucei are conserved in T. congolense but absent from T. vivax and include an expanded gene family of insect stage-specific surface glycoproteins that includes many currently uncharacterized genes. We also identify species-specific features and innovations and confirm that these include most expression site-associated genes (ESAGs) in T. brucei, which are absent from T. congolense and T. vivax. The CSP presents the first global picture of the origins and dynamics of cell surface architecture in African trypanosomes, representing the principal differences in genomic repertoire between African trypanosome species and provides a basis from which to explore the developmental and pathological differences in surface architectures. All data can be accessed at: http://www.genedb.org/Page/trypanosoma_surface_phylome.

    PLoS neglected tropical diseases 2013;7;3;e2121

  • iAnn: An Event Sharing Platform for the Life Sciences.

    Jimenez RC, Albar JP, Bhak J, Blatter MC, Blicher T, Brazas MD, Brooksbank C, Budd A, De Las Rivas J, Dreyer J, van Driel MA, Dunn MJ, Fernandes PL, van Gelder CW, Hermjakob H, Ioannidis V, Judge DP, Kahlem P, Korpelainen E, Kraus HJ, Loveland J, Mayer C, McDowall J, Moran F, Mulder N, Nyronen T, Rother K, Salazar GA, Schneider MV, Schneider R, Via A, Villaveces JM, Yu P, Attwood TK and Corpas M

    European Bioinformatics Institute, Hinxton, UK, National Center for Biotechnology-CSIC, Madrid, Spain, Theragen BioInstitute, South Korea, SIB Swiss Institute of Bioinformatics, Genève, Switzerland, NNF Center for Protein Research, Copenhagen, Denmark, Ontario Institute for Cancer Research, Toronto, Canada, European Molecular Biology Laboratory, Heidelberg, Germany, Cancer Research Center (IBMCC-CSIC), Salamanca, Spain, Netherlands Bioinformatics Centre, Nijmegen, Netherlands, University College Dublin, Dublin, Ireland, Instituto Gulbenkian de Ciência, Oeiras, Portugal, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland, University of Cambridge, Cambridge, UK, CSC-Scientific Computing Ltd., Espoo, Finland, WILEY-VCH Verlag, Weinheim, Germany, Wellcome Trust Sanger Institute, Hinxton, UK, Universidad Complutense, Madrid, Spain, Dept. Clinical Laboratory Sciences, IIDMM, University of Cape Town, South Africa, Academis Training, Berlin, Germany, The Genome Analysis Centre, Norwich, UK, Luxembourg Center for Systems Biomedicine, University of Luxembourg, Luxembourg, Sapienza University, Rome, Italy, Max Planck Institute for Biology of Ageing, Cologne, Germany, Itico, Fulbourn, Cambridge, UK, The University of Manchester, Manchester, UK.

    SUMMARY: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via Web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely, relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualisation of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. AVAILABILITY: http://iann.pro/iannviewer CONTACT: manuel.corpas@tgac.ac.uk.

    Bioinformatics (Oxford, England) 2013

  • The CD225 domain of IFITM3 is required for both IFITM protein association and inhibition of influenza A virus and dengue virus replication.

    John SP, Chin CR, Perreira JM, Feeley EM, Aker AM, Savidis G, Smith SE, Elia AE, Everitt AR, Vora M, Pertel T, Elledge SJ, Kellam P and Brass AL

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, Massachusetts, USA.

    The interferon-induced transmembrane protein 3 (IFITM3) gene is an interferon-stimulated gene that inhibits the replication of multiple pathogenic viruses in vitro and in vivo. IFITM3 is a member of a large protein superfamily, whose members share a functionally undefined area of high amino acid conservation, the CD225 domain. We performed mutational analyses of IFITM3 and identified multiple residues within the CD225 domain, consisting of the first intramembrane domain (intramembrane domain 1 [IM1]) and a conserved intracellular loop (CIL), that are required for restriction of both influenza A virus (IAV) and dengue virus (DENV) infection in vitro. Two phenylalanines within IM1 (F75 and F78) also mediate a physical association between IFITM proteins, and the loss of this interaction decreases IFITM3-mediated restriction. By extension, similar IM1-mediated associations may contribute to the functions of additional members of the CD225 domain family. IFITM3's distal N-terminal domain is also needed for full antiviral activity, including a tyrosine (Y20), whose alteration results in mislocalization of a portion of IFITM3 to the cell periphery and surface. Comparative analyses demonstrate that similar molecular determinants are needed for IFITM3's restriction of both IAV and DENV. However, a portion of the CIL including Y99 and R87 is preferentially needed for inhibition of the orthomyxovirus. Several IFITM3 proteins engineered with rare single-nucleotide polymorphisms demonstrated reduced expression or mislocalization, and these events were associated with enhanced viral replication in vitro, suggesting that possessing such alleles may impact an individual's risk for viral infection. On the basis of this and other data, we propose a model for IFITM3-mediated restriction.

    Funded by: Howard Hughes Medical Institute; NIAID NIH HHS: 1R01AI091786, R01 AI091786; Wellcome Trust

    Journal of virology 2013;87;14;7837-52

  • Presynaptic maturation in auditory hair cells requires a critical period of sensory-independent spiking activity.

    Johnson SL, Kuhn S, Franz C, Ingham N, Furness DN, Knipper M, Steel KP, Adelman JP, Holley MC and Marcotti W

    Department of Biomedical Science, University of Sheffield, Sheffield S10 2TN, United Kingdom.

    The development of neural circuits relies on spontaneous electrical activity that occurs during immature stages of development. In the developing mammalian auditory system, spontaneous calcium action potentials are generated by inner hair cells (IHCs), which form the primary sensory synapse. It remains unknown whether this electrical activity is required for the functional maturation of the auditory system. We found that sensory-independent electrical activity controls synaptic maturation in IHCs. We used a mouse model in which the potassium channel SK2 is normally overexpressed, but can be modulated in vivo using doxycycline. SK2 overexpression affected the frequency and duration of spontaneous action potentials, which prevented the development of the Ca(2+)-sensitivity of vesicle fusion at IHC ribbon synapses, without affecting their morphology or general cell development. By manipulating the in vivo expression of SK2 channels, we identified the "critical period" during which spiking activity influences IHC synaptic maturation. Here we provide direct evidence that IHC development depends upon a specific temporal pattern of calcium spikes before sound-driven neuronal activity.

    Proceedings of the National Academy of Sciences of the United States of America 2013;110;21;8720-5

  • Open science and community norms: Data retention and publication moratoria policies in genomics projects

    Joly Y, Dove ES, Kennedy KL, Bobrow M, Ouellette BFF, Dyke SOM, Kato K and Knoppers BM

    Medical Law International 2013;12;2;92-120

  • A sequence variant associated with sortilin-1 (SORT1) on 1p13.3 is independently associated with abdominal aortic aneurysm.

    Jones GT, Bown MJ, Gretarsdottir S, Romaine SP, Helgadottir A, Yu G, Tromp G, Norman PE, Jin C, Baas AF, Blankensteijn JD, Kullo IJ, Phillips LV, Williams MJ, Topless R, Merriman TR, Vasudevan TM, Lewis DR, Blair RD, Hill AA, Sayers RD, Powell JT, Deloukas P, Thorleifsson G, Matthiasson SE, Thorsteinsdottir U, Golledge J, Ariëns RA, Johnson A, Sohrabi S, Scott DJ, Carey DJ, Erdman R, Elmore JR, Kuivaniemi H, Samani NJ, Stefansson K and van Rij AM

    Abdominal aortic aneurysm (AAA) is a common human disease with a high estimated heritability (0.7); however, only a small number of associated genetic loci have been reported to date. In contrast, over 100 loci have now been reproducibly associated with either blood lipid profile and/or coronary artery disease (CAD) (both risk factors for AAA) in large-scale meta-analyses. This study employed a staged design to investigate whether the loci for these two phenotypes are also associated with AAA. Validated CAD and dyslipidaemia loci underwent screening using the Otago AAA genome-wide association data set. Putative associations underwent staged secondary validation in 10 additional cohorts. A novel association between the SORT1 (1p13.3) locus and AAA was identified. The rs599839 G allele, which has been previously associated with both dyslipidaemia and CAD, reached genome-wide significance in 11 combined independent cohorts (meta-analysis with 7048 AAA cases and 75 976 controls: G allele OR 0.81, 95% CI 0.76-0.85, P = 7.2 × 10(-14)). Modelling for confounding interactions of concurrent dyslipidaemia, heart disease and other risk factors suggested that this marker is an independent predictor of AAA susceptibility. In conclusion, a genetic marker associated with cardiovascular risk factors, and in particular concurrent vascular disease, appeared to independently contribute to susceptibility for AAA. Given the potential genetic overlap between risk factor and disease phenotypes, the use of well-characterized case-control cohorts allowing for modelling of cardiovascular disease risk confounders will be an important component in the future discovery of genetic markers for conditions such as AAA.

    Funded by: NHLBI NIH HHS: R01 HL064310

    Human molecular genetics 2013;22;14;2941-7

  • Using genetic prediction from known complex disease loci to guide the design of next-generation sequencing experiments.

    Jostins L, Levine AP and Barrett JC

    Medical Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    A central focus of complex disease genetics after genome-wide association studies (GWAS) is to identify low frequency and rare risk variants, which may account for an important fraction of disease heritability unexplained by GWAS. A profusion of studies using next-generation sequencing are seeking such risk alleles. We describe how already-known complex trait loci (largely from GWAS) can be used to guide the design of these new studies by selecting cases, controls, or families who are most likely to harbor undiscovered risk alleles. We show that genetic risk prediction can select unrelated cases from large cohorts who are enriched for unknown risk factors, or multiply-affected families that are more likely to harbor high-penetrance risk alleles. We derive the frequency of an undiscovered risk allele in selected cases and controls, and show how this relates to the variance explained by the risk score, the disease prevalence and the population frequency of the risk allele. We also describe a new method for informing the design of sequencing studies using genetic risk prediction in large partially-genotyped families using an extension of the Inside-Outside algorithm for inference on trees. We explore several study design scenarios using both simulated and real data, and show that in many cases genetic risk prediction can provide significant increases in power to detect low-frequency and rare risk alleles. The same approach can also be used to aid discovery of non-genetic risk factors, suggesting possible future utility of genetic risk prediction in conventional epidemiology. Software implementing the methods in this paper is available in the R package Mangrove.

    PloS one 2013;8;10;e76328

  • Near in place linear time minimum redundancy coding

    Kärkkäinen J and Tischler G

    Data Compression Conference Proceedings 2013;411-20

  • Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research.

    Köhler S, Doelken SC, Ruef BJ, Bauer S, Washington N, Westerfield M, Gkoutos G, Schofield P, Smedley D, Lewis SE, Robinson PN and Mungall CJ

    Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13352, Germany.

    Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

    F1000Research 2013;2

  • Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis.

    Köser CU, Bryant JM, Becq J, Török ME, Ellington MJ, Marti-Renom MA, Carmichael AJ, Parkhill J, Smith GP and Peacock SJ

    Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office: G1000803; Department of Health; Medical Research Council: G1000803; Wellcome Trust: WT098051

    The New England journal of medicine 2013;369;3;290-2

  • Consequences of whiB7 (Rv3197A) Mutations in Beijing Genotype Isolates of the Mycobacterium tuberculosis Complex.

    Köser CU, Bryant JM, Parkhill J and Peacock SJ

    Public Health England, Cambridge, United Kingdom.

    Antimicrobial agents and chemotherapy 2013;57;7;3461

  • Genome-wide association analyses identify 18 new loci associated with serum urate concentrations.

    Köttgen A, Albrecht E, Teumer A, Vitart V, Krumsiek J, Hundertmark C, Pistis G, Ruggiero D, O'Seaghdha CM, Haller T, Yang Q, Tanaka T, Johnson AD, Kutalik Z, Smith AV, Shi J, Struchalin M, Middelberg RP, Brown MJ, Gaffo AL, Pirastu N, Li G, Hayward C, Zemunik T, Huffman J, Yengo L, Zhao JH, Demirkan A, Feitosa MF, Liu X, Malerba G, Lopez LM, van der Harst P, Li X, Kleber ME, Hicks AA, Nolte IM, Johansson A, Murgia F, Wild SH, Bakker SJ, Peden JF, Dehghan A, Steri M, Tenesa A, Lagou V, Salo P, Mangino M, Rose LM, Lehtimäki T, Woodward OM, Okada Y, Tin A, Müller C, Oldmeadow C, Putku M, Czamara D, Kraft P, Frogheri L, Thun GA, Grotevendt A, Gislason GK, Harris TB, Launer LJ, McArdle P, Shuldiner AR, Boerwinkle E, Coresh J, Schmidt H, Schallert M, Martin NG, Montgomery GW, Kubo M, Nakamura Y, Tanaka T, Munroe PB, Samani NJ, Jacobs DR, Liu K, D'Adamo P, Ulivi S, Rotter JI, Psaty BM, Vollenweider P, Waeber G, Campbell S, Devuyst O, Navarro P, Kolcic I, Hastie N, Balkau B, Froguel P, Esko T, Salumets A, Khaw KT, Langenberg C, Wareham NJ, Isaacs A, Kraja A, Zhang Q, Wild PS, Scott RJ, Holliday EG, Org E, Viigimaa M, Bandinelli S, Metter JE, Lupo A, Trabetti E, Sorice R, Döring A, Lattka E, Strauch K, Theis F, Waldenberger M, Wichmann HE, Davies G, Gow AJ, Bruinenberg M, LifeLines Cohort Study, Stolk RP, Kooner JS, Zhang W, Winkelmann BR, Boehm BO, Lucae S, Penninx BW, Smit JH, Curhan G, Mudgal P, Plenge RM, Portas L, Persico I, Kirin M, Wilson JF, Mateo Leach I, van Gilst WH, Goel A, Ongen H, Hofman A, Rivadeneira F, Uitterlinden AG, Imboden M, von Eckardstein A, Cucca F, Nagaraja R, Piras MG, Nauck M, Schurmann C, Budde K, Ernst F, Farrington SM, Theodoratou E, Prokopenko I, Stumvoll M, Jula A, Perola M, Salomaa V, Shin SY, Spector TD, Sala C, Ridker PM, Kähönen M, Viikari J, Hengstenberg C, Nelson CP, CARDIoGRAM Consortium, DIAGRAM Consortium, ICBP Consortium, MAGIC Consortium, Meschia JF, Nalls MA, Sharma P, Singleton AB, Kamatani N, Zeller T, Burnier M, Attia J, Laan M, Klopp N, Hillege HL, Kloiber S, Choi H, Pirastu M, Tore S, Probst-Hensch NM, Völzke H, Gudnason V, Parsa A, Schmidt R, Whitfield JB, Fornage M, Gasparini P, Siscovick DS, Polašek O, Campbell H, Rudan I, Bouatia-Naji N, Metspalu A, Loos RJ, van Duijn CM, Borecki IB, Ferrucci L, Gambaro G, Deary IJ, Wolffenbuttel BH, Chambers JC, März W, Pramstaller PP, Snieder H, Gyllensten U, Wright AF, Navis G, Watkins H, Witteman JC, Sanna S, Schipf S, Dunlop MG, Tönjes A, Ripatti S, Soranzo N, Toniolo D, Chasman DI, Raitakari O, Kao WH, Ciullo M, Fox CS, Caulfield M, Bochud M and Gieger C

    Renal Division, Freiburg University Hospital, Freiburg, Germany. anna.koettgen@uniklinik-freiburg.de

    Elevated serum urate concentrations can cause gout, a prevalent and painful inflammatory arthritis. By combining data from >140,000 individuals of European ancestry within the Global Urate Genetics Consortium (GUGC), we identified and replicated 28 genome-wide significant loci in association with serum urate concentrations (18 new regions in or near TRIM46, INHBB, SFMBT1, TMEM171, VEGFA, BAZ1B, PRKAG2, STC1, HNF4G, A1CF, ATXN2, UBE2Q2, IGF1R, NFAT5, MAF, HLF, ACVR1B-ACVRL1 and B3GNT4). Associations for many of the loci were of similar magnitude in individuals of non-European ancestry. We further characterized these loci for associations with gout, transcript expression and the fractional excretion of urate. Network analyses implicate the inhibins-activins signaling pathways and glucose metabolism in systemic urate control. New candidate genes for serum urate concentration highlight the importance of metabolic control of urate production and excretion, which may have implications for the treatment and prevention of gout.

    Funded by: Cancer Research UK: 12076; Chief Scientist Office: CZB/4/710; Medical Research Council: G0600237, G0700704, G1000143, G9521010, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U127527198, MC_U127592696; NCATS NIH HHS: UL1 TR000124; NCI NIH HHS: P01 CA087969, R01 CA047988; NCRR NIH HHS: K12 RR023250, M01 RR016500, UL1 RR025005; NHGRI NIH HHS: U01 HG004402, U01 HG004424, U01 HG004446, U01 HG004729; NHLBI NIH HHS: HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, HHSN268201200036C, N01 HC025195, N01 HC045134, N01 HC095170, N01 HC095171, N01 HC095172, N01HC05187, N01HC45204, N01HC45205, N01HC48047, N01HC48048, N01HC48049, N01HC48050, N01HC55222, N01HC75150, N01HC85079, N01HC85086, N01HC95095, N02HL64278, R01 HL043851, R01 HL059367, R01 HL080295, R01 HL084099, R01 HL086694, R01 HL087641, R01 HL087652, R01 HL088119, R01 HL105756, T32 HL007024, U01 HL069757, U01 HL072515, U01 HL084756; NIA NIH HHS: N01AG12100, N01AG12109, R01 AG015928, R01 AG018728, R01 AG020098, R01 AG023629, R01 AG027058, Z01 AG000954-06; NIAAA NIH HHS: K05 AA017688, P50 AA011998, R01 AA007535, R01 AA013320, R01 AA013321, R01 AA013326, R01 AA014041; NIAMS NIH HHS: P60 AR047785, R01 AR056291, R21 AR056042; NIDA NIH HHS: R01 DA012854; NIDDK NIH HHS: P30 DK063491, P30 DK072488, P60 DK079637; NIGMS NIH HHS: U01 GM074518; NIMH NIH HHS: R01 MH066206

    Nature genetics 2013;45;2;145-54

  • KAT5 tyrosine phosphorylation couples chromatin sensing to ATM signalling.

    Kaidi A and Jackson SP

    The Gurdon Institute and Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK.

    The detection of DNA lesions within chromatin represents a critical step in cellular responses to DNA damage. However, the regulatory mechanisms that couple chromatin sensing to DNA-damage signalling in mammalian cells are not well understood. Here we show that tyrosine phosphorylation of the protein acetyltransferase KAT5 (also known as TIP60) increases after DNA damage in a manner that promotes KAT5 binding to the histone mark H3K9me3. This triggers KAT5-mediated acetylation of the ATM kinase, promoting DNA-damage-checkpoint activation and cell survival. We also establish that chromatin alterations can themselves enhance KAT5 tyrosine phosphorylation and ATM-dependent signalling, and identify the proto-oncogene c-Abl as a mediator of this modification. These findings define KAT5 tyrosine phosphorylation as a key event in the sensing of genomic and chromatin perturbations, and highlight a key role for c-Abl in such processes.

    Funded by: Cancer Research UK: C6/A11224; Wellcome Trust: WT092096

    Nature 2013;498;7452;70-4

  • Unusual features in organisation of capsular polysaccharide-related genes of C. jejuni strain X.

    Karlyshev AV, Quail MA, Parkhill J and Wren BW

    School of Life Sciences, Kingston University, Faculty of Science, Engineering and Computing, Penrhyn Road, Kingston-upon Thames, KT1 2EE, UK. a.karlyshev@kingston.ac.uk

    PCR probing of the genome of Campylobacter jejuni strain X using conserved capsular polysaccharide (CPS)-related genes allowed elucidation of a complete sequence of the respective gene cluster (cps). This is the largest known Campylobacter cps cluster (38 kb excluding flanking kps regions), which includes a number of genes not detected in other Campylobacter strains. Sequence analysis suggests genetic rearrangements both within and outside the cps gene cluster, a mechanism which may be responsible for mosaic organisation of sugar transferase-related genes leading to structural variability of the capsular polysaccharide (CPS).

    Gene 2013;522;1;37-45

  • Human melioidosis, Malawi, 2011.

    Katangwe T, Purcell J, Bar-Zeev N, Denis B, Montgomery J, Alaerts M, Heyderman RS, Dance DA, Kennedy N, Feasey N and Moxon CA

    A case of human melioidosis caused by a novel sequence type of Burkholderia pseudomallei occurred in a child in Malawi, southern Africa. A literature review showed that human cases reported from the continent have been increasing.

    Emerging infectious diseases 2013;19;6;981-4

  • Activation of the B cell antigen receptor triggers reactivation of latent Kaposi's sarcoma-associated herpesvirus in B cells.

    Kati S, Tsao EH, Günther T, Weidner-Glunde M, Rothämel T, Grundhoff A, Kellam P and Schulz TF

    Institute of Virology, Hanover Medical School, Hanover, Germany.

    Kaposi's sarcoma-associated herpesvirus (KSHV) is an oncogenic herpesvirus and the cause of Kaposi's sarcoma, primary effusion lymphoma (PEL) and multicentric Castleman's disease. Latently infected B cells are the main reservoir of this virus in vivo, but the nature of the stimuli that lead to its reactivation in B cells is only partially understood. We established stable BJAB cell lines harboring latent KSHV by cell-free infection with recombinant virus carrying a puromycin resistance marker. Our latently infected B cell lines, termed BrK.219, can be reactivated by triggering the B cell receptor (BCR) with antibodies to surface IgM, a stimulus imitating antigen recognition. Using this B cell model system we studied the mechanisms that mediate the reactivation of KSHV in B cells following the stimulation of the BCR and could identify phosphatidylinositol 3-kinase (PI3K) and X-box binding protein 1 (XBP-1) as proteins that play an important role in the BCR-mediated reactivation of latent KSHV.

    Journal of virology 2013;87;14;8004-16

  • RetroSeq: transposable element discovery from next-generation sequencing data.

    Keane TM, Wong K and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. tk2@sanger.ac.uk

    A significant proportion of eukaryote genomes consist of transposable element (TE)-derived sequence. These elements are known to have the capacity to modulate gene function and genome evolution. We have developed RetroSeq for detecting non-reference TE insertions from Illumina paired-end whole-genome sequencing data. We evaluate RetroSeq on a human trio from the 1000 Genomes Project, showing that it produces highly accurate TE calls.

    Availability: RetroSeq is open-source and available from https://github.com/tk2/RetroSeq.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    Bioinformatics (Oxford, England) 2013;29;3;389-90

  • Different patterns of Epstein-Barr virus latency in endemic Burkitt lymphoma (BL) lead to distinct variants within the BL-associated gene expression signature.

    Kelly GL, Stylianou J, Rasaiyaah J, Wei W, Thomas W, Croom-Carter D, Kohler C, Spang R, Woodman C, Kellam P, Rickinson AB and Bell AI

    School of Cancer Sciences, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom.

    Epstein-Barr virus (EBV) is present in all cases of endemic Burkitt lymphoma (BL) but in few European/North American sporadic BLs. Gene expression arrays of sporadic tumors have defined a consensus BL profile within which tumors are classifiable as "molecular BL" (mBL). Where endemic BLs fall relative to this profile remains unclear, since they not only carry EBV but also display one of two different forms of virus latency. Here, we use early-passage BL cell lines from different tumors, and BL subclones from a single tumor, to compare EBV-negative cells with EBV-positive cells displaying either classical latency I EBV infection (where EBNA1 is the only EBV antigen expressed from the wild-type EBV genome) or Wp-restricted latency (where an EBNA2 gene-deleted virus genome broadens antigen expression to include the EBNA3A, -3B, and -3C proteins and BHRF1). Expression arrays show that both types of endemic BL fall within the mBL classification. However, while EBV-negative and latency I BLs show overlapping profiles, Wp-restricted BLs form a distinct subgroup, characterized by a detectable downregulation of the germinal center (GC)-associated marker Bcl6 and upregulation of genes marking early plasmacytoid differentiation, notably IRF4 and BLIMP1. Importantly, these same changes can be induced in EBV-negative or latency I BL cells by infection with an EBNA2-knockout virus. Thus, we infer that the distinct gene profile of Wp-restricted BLs does not reflect differences in the identity of the tumor progenitor cell per se but differences imposed on a common progenitor by broadened EBV gene expression.

    Funded by: Cancer Research UK: C910/A8829

    Journal of virology 2013;87;5;2882-94

  • A systematic genome-wide analysis of zebrafish protein-coding gene function.

    Kettleborough RN, Busch-Nentwich EM, Harvey SA, Dooley CM, de Bruijn E, van Eeden F, Sealy I, White RJ, Herd C, Nijman IJ, Fényes F, Mehroke S, Scahill C, Gibbons R, Wali N, Carruthers S, Hall A, Yen J, Cuppen E and Stemple DL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms, typically mice, have been essential for understanding the activities of many orthologues of these disease-associated genes. Although gene-targeting approaches and phenotype analysis have led to a detailed understanding of nearly 6,000 protein-coding genes, this number falls considerably short of the more than 22,000 mouse protein-coding genes. Similarly, in zebrafish genetics, one-by-one gene studies using positional cloning, insertional mutagenesis, antisense morpholino oligonucleotides, targeted re-sequencing, and zinc finger and TAL endonucleases have made substantial contributions to our understanding of the biological activity of vertebrate genes, but again the number of genes studied falls well short of the more than 26,000 zebrafish protein-coding genes. Importantly, for both mice and zebrafish, none of these strategies are particularly suited to the rapid generation of knockouts in thousands of genes and the assessment of their biological activity. Here we describe an active project that aims to identify and phenotype the disruptive mutations in every zebrafish protein-coding gene, using a well-annotated zebrafish reference genome sequence, high-throughput sequencing and efficient chemical mutagenesis. So far we have identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes. We have developed a multi-allelic phenotyping scheme to efficiently assess the effects of each allele during embryogenesis and have analysed the phenotypic consequences of over 1,000 alleles. All mutant alleles and data are available to the community and our phenotyping scheme is adaptable to phenotypic analysis beyond embryogenesis.

    Funded by: Medical Research Council: G0777791; NHGRI NIH HHS: 5R01HG00481; Wellcome Trust: 098051

    Nature 2013;496;7446;494-7

  • A genome-wide analysis of populations from European Russia reveals a new pole of genetic diversity in northern Europe.

    Khrunin AV, Khokhrin DV, Filippova IN, Esko T, Nelis M, Bebyakova NA, Bolotova NL, Klovins J, Nikitina-Zake L, Rehnström K, Ripatti S, Schreiber S, Franke A, Macek M, Krulišová V, Lubinski J, Metspalu A and Limborska SA

    Department of Molecular Bases of Human Genetics, Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia.

    Several studies have examined the fine-scale structure of human genetic variation in Europe. However, the European sets analyzed represent mainly northern, western, central, and southern Europe. Here, we report an analysis of approximately 166,000 single nucleotide polymorphisms in populations from eastern (northeastern) Europe: four Russian populations from European Russia, and three populations from the northernmost Finno-Ugric ethnicities (Veps and two contrast groups of Komi people). These were compared with several reference European samples, including Finns, Estonians, Latvians, Poles, Czechs, Germans, and Italians. The results demonstrated genetic heterogeneity among the populations living in the region studied. Russians from the central part of European Russia (Tver, Murom, and Kursk) exhibited similarities with populations from central-eastern Europe, and were distant from the Russian sample from northern Russia (Mezen district, Archangelsk region). Komi samples, especially Izhemski Komi, were significantly different from all other populations studied. These can be considered a second pole of genetic diversity in northern Europe (in addition to the pole occupied by Finns), as they had a distinct ancestry component. Russians from Mezen and the Finnic-speaking Veps were positioned between the two poles, but differed from each other in the proportions of Komi and Finnic ancestries. In general, our data provide a more complete genetic map of Europe, accounting for the diversity in its most eastern (northeastern) populations.

    PloS one 2013;8;3;e58552

  • Integrative annotation of variants from 1092 humans: application to cancer genomics.

    Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüs ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GR, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F, 1000 Genomes Project Consortium, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C and Gerstein M

    Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.

    Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.

    Funded by: NCATS NIH HHS: UL1 TR000142, UL1 TR000457; NCI NIH HHS: CA167824, R01 CA166661, R01CA152057, U01 CA111275; NCRR NIH HHS: G12 RR003050; NHGRI NIH HHS: HG005718, HG007000, R01 HG002898, R01HG4719, U01 HG005718, U01HG6513, U41 HG007000, U54 HG003273; NIGMS NIH HHS: GM104424; NIMHD NIH HHS: G12 MD007579; Wellcome Trust: 085532, 090532, 095908, 098051, WT085532, WT095908

    Science (New York, N.Y.) 2013;342;6154;1235587

  • Genome and Transcriptome Adaptation Accompanying Emergence of the Definitive Type 2 Host-Restricted Salmonella enterica Serovar Typhimurium Pathovar.

    Kingsley RA, Kay S, Connor T, Barquist L, Sait L, Holt KE, Sivaraman K, Wileman T, Goulding D, Clare S, Hale C, Seshasayee A, Harris S, Thomson NR, Gardner P, Rabsch W, Wigley P, Humphrey T, Parkhill J and Dougan G

    The Wellcome Trust Sanger Institute, the Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    ABSTRACT Salmonella enterica serovar Typhimurium definitive type 2 (DT2) is host restricted to Columba livia (rock or feral pigeon) but is also closely related to S. Typhimurium isolates that circulate in livestock and cause a zoonosis characterized by gastroenteritis in humans. DT2 isolates formed a distinct phylogenetic cluster within S. Typhimurium based on whole-genome-sequence polymorphisms. Comparative genome analysis of DT2 94-213 and S. Typhimurium SL1344, DT104, and D23580 identified few differences in gene content with the exception of variations within prophages. However, DT2 94-213 harbored 22 pseudogenes that were intact in other closely related S. Typhimurium strains. We report a novel in silico approach to identify single amino acid substitutions in proteins that have a high probability of a functional impact. One polymorphism identified using this method, a single-residue deletion in the Tar protein, abrogated chemotaxis to aspartate in vitro. DT2 94-213 also exhibited an altered transcriptional profile in response to culture at 42°C compared to that of SL1344. Such differentially regulated genes included a number involved in flagellum biosynthesis and motility. IMPORTANCE Whereas Salmonella enterica serovar Typhimurium can infect a wide range of animal species, some variants within this serovar exhibit a more limited host range and altered disease potential. Phylogenetic analysis based on whole-genome sequences can identify lineages associated with specific virulence traits, including host adaptation. This study represents one of the first to link pathogen-specific genetic signatures, including coding capacity, genome degradation, and transcriptional responses to host adaptation within a Salmonella serovar. We performed comparative genome analysis of reference and pigeon-adapted definitive type 2 (DT2) S. Typhimurium isolates alongside phenotypic and transcriptome analyses, to identify genetic signatures linked to host adaptation within the DT2 lineage.

    mBio 2013;4;5

  • Analysis of Tumor Heterogeneity and Cancer Gene Networks Using Deep Sequencing of MMTV-Induced Mouse Mammary Tumors.

    Klijn C, Koudijs MJ, Kool J, Ten Hoeve J, Boer M, de Moes J, Akhtar W, van Miltenburg M, Vendel-Zwaagstra A, Reinders MJ, Adams DJ, van Lohuizen M, Hilkens J, Wessels LF and Jonkers J

    Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands ; Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    Cancer develops through a multistep process in which normal cells progress to malignant tumors via the evolution of their genomes as a result of the acquisition of mutations in cancer driver genes. The number, identity and mode of action of cancer driver genes, and how they contribute to tumor evolution, are largely unknown. This study deployed the Mouse Mammary Tumor Virus (MMTV) as an insertional mutagen to find both the driver genes and the networks in which they function. Using deep insertion site sequencing we identified around 31000 retroviral integration sites in 604 MMTV-induced mammary tumors from mice with mammary gland-specific deletion of Trp53, Pten heterozygous knockout mice, or wildtype strains. We identified 18 known common integration sites (CISs) and 12 previously unknown CISs marking new candidate cancer genes. Members of the Wnt, Fgf, Fgfr, Rspo and Pdgfr gene families were commonly mutated in a mutually exclusive fashion. The sequence data we generated also yielded information on the clonality of insertions in individual tumors, allowing us to develop a data-driven model of MMTV-induced tumor development. Insertional mutations near Wnt and Fgf genes mark the earliest "initiating" events in MMTV-induced tumorigenesis, whereas Fgfr genes are targeted later during tumor progression. Our data show that insertional mutagenesis can be used to discover the mutational networks, the timing of mutations, and the genes that initiate and drive tumor evolution.

    PloS one 2013;8;5;e62113

  • Current application and future perspectives of molecular typing methods to study Clostridium difficile infections.

    Knetsch CW, Lawley TD, Hensgens MP, Corver J, Wilcox MW and Kuijper EJ

    Section Experimental Microbiology, Department of Medical Microbiology, Center of Infectious Diseases, Leiden University Medical Center, Leiden, Netherlands.

    Molecular typing is an essential tool to monitor Clostridium difficile infections and outbreaks within healthcare facilities. Molecular typing also plays a key role in defining the regional and global changes in circulating C. difficile types. The patterns of C. difficile types circulating within Europe (and globally) remain poorly understood, although international efforts are under way to understand the spatial and temporal patterns of C. difficile types. A complete picture is essential to properly investigate type-specific risk factors for C. difficile infections (CDI) and track long-range transmission. Currently, conventional agarose gel-based polymerase chain reaction (PCR) ribotyping is the most common typing method used in Europe to type C. difficile. Although this method has proved to be useful to study epidemiology at the local, national and European levels, efforts are being made to replace it with capillary electrophoresis PCR ribotyping to increase pattern recognition, reproducibility and interpretation. However, this method lacks sufficient discriminatory power to study outbreaks and therefore multilocus variable-number tandem repeat analysis (MLVA) has been developed to study transmission between humans, animals and food. Sequence-based methods are increasingly being used for C. difficile fingerprinting/typing because of their ability to discriminate between highly related strains, the ease of data interpretation and transferability of data. The first studies using whole-genome single nucleotide polymorphism typing of healthcare-associated C. difficile within a clinically relevant timeframe are very promising and, although limited to select facilities because of complex data interpretation and high costs, these approaches will likely become commonly used over the coming years.

    Euro surveillance : bulletin Européen sur les maladies transmissibles = European communicable disease bulletin 2013;18;4;20381

  • Tracking chromosome evolution in southern African gerbils using flow-sorted chromosome paints.

    Knight LI, Ng BL, Cheng W, Fu B, Yang F and Rambau RV

    Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Stellenbosch, South Africa.

    Desmodillus and Gerbilliscus (formerly Tatera) comprise a monophyletic group of gerbils (subfamily Gerbillinae) which last shared an ancestor approximately 8 million years ago; diploid chromosome number variation among the species ranges from 2n = 36 to 2n = 50. In an attempt to shed more light on chromosome evolution and speciation in these rodents, we compared the karyotypes of 7 species, representing 3 genera, based on homology data revealed by chromosome painting with probes derived from flow-sorted chromosomes of the hairy-footed gerbil, Gerbillurus paeba (2n = 36). The fluorescent in situ hybridization data revealed remarkable genome conservation: these species share a high proportion of conserved chromosomes, and differences are due to 10 Robertsonian (Rb) rearrangements (3 autapomorphies, 3 synapomorphies and 4 hemiplasies/homoplasies). Our data suggest that chromosome evolution in Desmodillus occurred at a rate of ~1.25 rearrangements per million years (Myr), and that the rate among Gerbilliscus over a time period spanning 8 Myr is also ~1.25 rearrangements/Myr. The recently diverged Gerbillurus (G. tytonis and G. paeba) share an identical karyotype, while Gerbilliscus kempi, G. afra and G. leucogaster differ by 6 Rb rearrangements (a rate of ~1 rearrangement/Myr). Thus, our data suggest a very slow rate of chromosomal evolution in Southern African gerbils.

    Funded by: Wellcome Trust: WT098051

    Cytogenetic and genome research 2013;139;4;267-75

  • A P3G generic access agreement for population genomic studies.

    Knoppers BM, Chisholm RL, Kaye J, Cox D, Thorogood A, Burton P, Brookes AJ, Fortier I, Goodwin P, Harris JR, Hveem K, Kent A, Little J, Riegman PH, Ripatti S and Stolk RP

    Centre of Genomics and Policy, McGill University, Montreal, Quebec, Canada.

    Nature biotechnology 2013;31;5;384-5

  • Host responses to melioidosis and tuberculosis are both dominated by interferon-mediated signaling.

    Koh GC, Schreiber MF, Bautista R, Maude RR, Dunachie S, Limmathurotsakul D, Day NP, Dougan G and Peacock SJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom ; Department of Medicine, University of Cambridge, Cambridge, United Kingdom ; Mahidol-Oxford Tropical Medicine Research Unit, Mahidol University, Bangkok, Thailand ; Department of Infection and Tropical Diseases, Birmingham Heartlands Hospital, Birmingham, United Kingdom.

    Melioidosis (Burkholderia pseudomallei infection) is a common cause of community-acquired sepsis in Northeast Thailand and northern Australia. B. pseudomallei is a soil saprophyte endemic to Southeast Asia and northern Australia. The clinical presentation of melioidosis may mimic tuberculosis (both cause chronic suppurative lesions unresponsive to conventional antibiotics and both commonly affect the lungs). The two diseases have overlapping risk profiles (e.g., diabetes, corticosteroid use), and both B. pseudomallei and Mycobacterium tuberculosis are intracellular pathogens. There are however important differences: the majority of melioidosis cases are acute, not chronic, and present with severe sepsis and a mortality rate that approaches 50% despite appropriate antimicrobial therapy. By contrast, tuberculosis is characteristically a chronic illness with mortality <2% with appropriate antimicrobial chemotherapy. We examined the gene expression profiles of total peripheral leukocytes in two cohorts of patients, one with acute melioidosis (30 patients and 30 controls) and another with tuberculosis (20 patients and 24 controls). Interferon-mediated responses dominate the host response to both infections, and both type 1 and type 2 interferon responses are important. An 86-gene signature previously thought to be specific for tuberculosis is also found in melioidosis. We conclude that the host responses to melioidosis and to tuberculosis are similar: both are dominated by interferon-signalling pathways and this similarity means gene expression signatures from whole blood do not distinguish between these two diseases.

    PloS one 2013;8;1;e54961

  • Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library.

    Koike-Yusa H, Li Y, Tan EP, Velasco-Herrera MD and Yusa K

    [1] Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. [2].

    Identification of genes influencing a phenotype of interest is frequently achieved through genetic screening by RNA interference (RNAi) or knockouts. However, RNAi may only achieve partial depletion of gene activity, and knockout-based screens are difficult in diploid mammalian cells. Here we took advantage of the efficiency and high throughput of genome editing based on type II, clustered, regularly interspaced, short palindromic repeats (CRISPR)-CRISPR-associated (Cas) systems to introduce genome-wide targeted mutations in mouse embryonic stem cells (ESCs). We designed 87,897 guide RNAs (gRNAs) targeting 19,150 mouse protein-coding genes and used a lentiviral vector to express these gRNAs in ESCs that constitutively express Cas9. Screening the resulting ESC mutant libraries for resistance to either Clostridium septicum alpha-toxin or 6-thioguanine identified 27 known and 4 previously unknown genes implicated in these phenotypes. Our results demonstrate the potential for efficient loss-of-function screening using the CRISPR-Cas9 system.

    Nature biotechnology 2013

  • Chromatin Accessibility Data Sets Show Bias Due to Sequence Specificity of the DNase I Enzyme.

    Koohy H, Down TA and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Background: DNase I is an enzyme which cuts duplex DNA at a rate that depends strongly upon its chromatin environment. In combination with high-throughput sequencing (HTS) technology, it can be used to infer genome-wide landscapes of open chromatin regions. Using this technology, systematic identification of hundreds of thousands of DNase I hypersensitive sites (DHS) per cell type has been possible, and this in turn has helped to precisely delineate genomic regulatory compartments. However, to date there has been relatively little investigation into possible biases affecting this data. Results: We report a significant degree of sequence preference spanning sites cut by DNase I in a number of published data sets. The two major protocols in current use each show a different pattern, but for a given protocol the pattern of sequence specificity seems to be quite consistent. The patterns are substantially different from biases seen in other types of HTS data sets, and in some cases the most constrained position lies outside the sequenced fragment, implying that this constraint must relate to the digestion process rather than events occurring during library preparation or sequencing. Conclusions: DNase I is a sequence-specific enzyme, with a specificity that may depend on experimental conditions. This sequence specificity is not taken into account by existing pipelines for identifying open chromatin regions. Care must be taken when interpreting DNase I results, especially when looking at the precise locations of the reads. Future studies may be able to improve the sensitivity and precision of chromatin state measurement by compensating for sequence bias.

    PloS one 2013;8;7;e69853

  • Criteria for inference of chromothripsis in cancer genomes.

    Korbel JO and Campbell PJ

    Genome Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany. Electronic address: jan.korbel@embl.de.

    Chromothripsis scars the genome when localized chromosome shattering and repair occurs in a one-off catastrophe. Outcomes of this process are detectable as massive DNA rearrangements affecting one or a few chromosomes. Although recent findings suggest a crucial role of chromothripsis in cancer development, the reproducible inference of this process remains challenging, requiring that cataclysmic one-off rearrangements be distinguished from localized lesions that occur progressively. We describe conceptual criteria for the inference of chromothripsis, based on ruling out the alternative hypothesis that stepwise rearrangements occurred. Robust means of inference may facilitate in-depth studies on the impact of, and the mechanisms underlying, chromothripsis.

    Cell 2013;152;6;1226-36

  • Piliation of Invasive Streptococcus pneumoniae Isolates in the Era before Pneumococcal Conjugate Vaccine Introduction in Malawi.

    Kulohoma BW, Gray K, Kamng'ona A, Cornick J, Bentley SD, Heyderman RS and Everett DB

    The Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi.

    The pneumococcal pilus has been shown to be an important determinant of adhesion and virulence in mouse models of colonization, pneumonia, and bacteremia. A pilus is capable of inducing protective immunity, supporting its inclusion in next-generation pneumococcal protein vaccine formulations. Whether this vaccine target is common among pneumococci in sub-Saharan Africa is uncertain. To define the prevalence and genetic diversity of type I and II pili among invasive pneumococci in Malawi prior to the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13) into routine childhood immunization, we examined 188 Streptococcus pneumoniae isolates collected between 2002 and 2008 (17% serotype 1). In this region of high disease burden, we found a low frequency of invasive piliated pneumococci (14%) and pilus gene sequence diversity similar to that seen previously in multiple global pneumococcal lineages. All common serotypes with pilus were covered by PCV13 and so we predict that pilus prevalence will be reduced in the Malawian pneumococcal population after PCV13 introduction.

    Clinical and vaccine immunology : CVI 2013;20;11;1729-35

  • The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery.

    Laing R, Kikuchi T, Martinelli A, Tsai IJ, Beech RN, Redman E, Holroyd N, Bartley DJ, Beasley H, Britton C, Curran D, Devaney E, Gilabert A, Hunt M, Jackson F, Johnston SL, Kryukov I, Li K, Morrison AA, Reid AJ, Sargison N, Saunders GI, Wasmuth JD, Wolstenholme A, Berriman M, Gilleard JS and Cotton JA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. jsgillea@ucalgary.ca.

    Background: The small ruminant parasite Haemonchus contortus is the most widely used parasitic nematode in drug discovery, vaccine development and anthelmintic resistance research. Its remarkable propensity to develop resistance threatens the viability of the sheep industry in many regions of the world and provides a cautionary example of the effect of mass drug administration to control parasitic nematodes. Its phylogenetic position makes it particularly well placed for comparison with the free-living nematode Caenorhabditis elegans and the most economically important parasites of livestock and humans.

    Results: Here we report the detailed analysis of a draft genome assembly and extensive transcriptomic dataset for H. contortus. This represents the first genome to be published for a strongylid nematode and the most extensive transcriptomic dataset for any parasitic nematode reported to date. We show a general pattern of conservation of genome structure and gene content between H. contortus and C. elegans, but also a dramatic expansion of important parasite gene families. We identify genes involved in parasite-specific pathways such as blood feeding, neurological function, and drug metabolism. In particular, we describe complete gene repertoires for known drug target families, providing the most comprehensive understanding yet of the action of several important anthelmintics. Also, we identify a set of genes enriched in the parasitic stages of the lifecycle and the parasite gut that provide a rich source of vaccine and drug target candidates.

    Conclusions: The H. contortus genome and transcriptome provide an essential platform for postgenomic research in this and other important strongylid parasites.

    Genome biology 2013;14;8;R88

  • Etoposide Induces Nuclear Re-Localisation of AID.

    Lambert LJ, Walker S, Feltham J, Lee HJ, Reik W and Houseley J

    Epigenetics Programme, The Babraham Institute, Cambridge, United Kingdom.

    During B cell activation, the DNA lesions that initiate somatic hypermutation and class switch recombination are introduced by activation-induced cytidine deaminase (AID). AID is a highly mutagenic protein that is maintained in the cytoplasm at steady state; however, AID is shuttled across the nuclear membrane and the protein transiently present in the nucleus appears sufficient for targeted alteration of immunoglobulin loci. AID has been implicated in epigenetic reprogramming in primordial germ cells and cell fusions and in induced pluripotent stem cells (iPS cells); however, AID expression in non-B cells is very low. We hypothesised that epigenetic reprogramming would require a pathway that instigates prolonged nuclear residence of AID. Here we show that AID is completely re-localised to the nucleus during drug withdrawal following etoposide treatment, in the period in which double strand breaks (DSBs) are repaired. Re-localisation occurs 2-6 hours after etoposide treatment, and AID remains in the nucleus for 10 or more hours, during which time cells remain live and motile. Re-localisation is cell-cycle dependent and is only observed in G2. Analysis of DSB dynamics shows that AID is re-localised in response to etoposide treatment; however, re-localisation occurs substantially after DSB formation and the levels of re-localisation do not correlate with γH2AX levels. We conclude that DSB formation initiates a slow-acting pathway which allows stable long-term nuclear localisation of AID, and that such a pathway may enable AID-induced DNA demethylation during epigenetic reprogramming.

    PloS one 2013;8;12;e82110

  • Cerebral organoids model human brain development and microcephaly.

    Lancaster MA, Renner M, Martin CA, Wenzel D, Bicknell LS, Hurles ME, Homfray T, Penninger JM, Jackson AP and Knoblich JA

    Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna 1030, Austria.

    The complexity of the human brain has made it difficult to study many brain disorders in model organisms, highlighting the need for an in vitro model of human brain development. Here we have developed a human pluripotent stem cell-derived three-dimensional organoid culture system, termed cerebral organoids, that develop various discrete, although interdependent, brain regions. These include a cerebral cortex containing progenitor populations that organize and produce mature cortical neuron subtypes. Furthermore, cerebral organoids are shown to recapitulate features of human cortical development, namely characteristic progenitor zone organization with abundant outer radial glial stem cells. Finally, we use RNA interference and patient-specific induced pluripotent stem cells to model microcephaly, a disorder that has been difficult to recapitulate in mice. We demonstrate premature neuronal differentiation in patient organoids, a defect that could help to explain the disease phenotype. Together, these data show that three-dimensional organoids can recapitulate development and disease even in this most complex human tissue.

    Nature 2013

  • Expression of recombinant ITGA2 and CD109 for the detection of human platelet antigen (HPA)-5 and -15 alloantibodies.

    Lane-Serff H, Sun Y, Metcalfe P and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    British journal of haematology 2013

  • Intestinal colonization resistance.

    Lawley TD and Walker AW

    Bacterial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, UK. tl2@sanger.ac.uk

    Dense, complex microbial communities, collectively termed the microbiota, occupy a diverse array of niches along the length of the mammalian intestinal tract. During health and in the absence of antibiotic exposure the microbiota can effectively inhibit colonization and overgrowth by invading microbes such as pathogens. This phenomenon is called 'colonization resistance' and is associated with a stable and diverse microbiota in tandem with a controlled lack of inflammation, and involves specific interactions between the mucosal immune system and the microbiota. Here we overview the microbial ecology of the healthy mammalian intestinal tract and highlight the microbe-microbe and microbe-host interactions that promote colonization resistance. Emerging themes highlight immunological (T helper type 17/regulatory T-cell balance), microbiota (diverse and abundant) and metabolic (short-chain fatty acid) signatures of intestinal health and colonization resistance. Intestinal pathogens use specific virulence factors or exploit antibiotic use to subvert colonization resistance for their own benefit by triggering inflammation to disrupt the harmony of the intestinal ecosystem. A holistic view that incorporates immunological and microbiological facets of the intestinal ecosystem should facilitate the development of immunomodulatory and microbe-modulatory therapies that promote intestinal homeostasis and colonization resistance.

    Funded by: Medical Research Council: 93614; Wellcome Trust: 076964, 098051

    Immunology 2013;138;1;1-11

  • Murine models to study Clostridium difficile infection and transmission.

    Lawley TD and Young VB

    Bacterial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Clostridium difficile is the leading cause of antibiotic-associated diarrhea in healthcare facilities worldwide. C. difficile infections are difficult to treat because of the high rate of disease recurrence after antibiotic therapy, leaving few treatment options for patients. C. difficile is also difficult to contain within a healthcare setting due to a highly transmissible, resistant spore form that challenges standard infection control measures. The recent development of murine infection models to study the interactions between C. difficile, the host and the microbiota is providing novel insight into the mechanisms of pathogenesis and transmission that should guide the development of therapies and intervention measures.

    Funded by: Medical Research Council: G0901743; NIAID NIH HHS: AI090871, U19 AI090871; Wellcome Trust: 076964, 098051

    Anaerobe 2013;24;94-7

  • Richness of human gut microbiome correlates with metabolic markers.

    Le Chatelier E, Nielsen T, Qin J, Prifti E, Hildebrand F, Falony G, Almeida M, Arumugam M, Batto JM, Kennedy S, Leonard P, Li J, Burgdorf K, Grarup N, Jørgensen T, Brandslund I, Nielsen HB, Juncker AS, Bertalan M, Levenez F, Pons N, Rasmussen S, Sunagawa S, Tap J, Tims S, Zoetendal EG, Brunak S, Clément K, Doré J, Kleerebezem M, Kristiansen K, Renault P, Sicheritz-Ponten T, de Vos WM, Zucker JD, Raes J, Hansen T, MetaHIT consortium, Bork P, Wang J, Ehrlich SD and Pedersen O

    INRA, Institut National de la Recherche Agronomique, US1367 Metagenopolis, 78350 Jouy en Josas, France.

    We are facing a global metabolic health crisis provoked by an obesity epidemic. Here we report the human gut microbial composition in a population sample of 123 non-obese and 169 obese Danish individuals. We find two groups of individuals that differ by the number of gut microbial genes and thus gut bacterial richness. They contain known and previously unknown bacterial species at different proportions; individuals with a low bacterial richness (23% of the population) are characterized by more marked overall adiposity, insulin resistance and dyslipidaemia and a more pronounced inflammatory phenotype when compared with high bacterial richness individuals. The obese individuals among the lower bacterial richness group also gain more weight over time. Only a few bacterial species are sufficient to distinguish between individuals with high and low bacterial richness, and even between lean and obese participants. Our classifications based on variation in the gut microbiome identify subsets of individuals in the general white adult population who may be at increased risk of progressing to adiposity-associated co-morbidities.

    Nature 2013;500;7464;541-6

  • Human SNP Links Differential Outcomes in Inflammatory and Infectious Disease to a FOXO3-Regulated Pathway.

    Lee JC, Espéli M, Anderson CA, Linterman MA, Pocock JM, Williams NJ, Roberts R, Viatte S, Fu B, Peshu N, Hien TT, Phu NH, Wesley E, Edwards C, Ahmad T, Mansfield JC, Gearry R, Dunstan S, Williams TN, Barton A, Vinuesa CG, UK IBD Genetics Consortium, Parkes M, Lyons PA and Smith KG

    Cambridge Institute for Medical Research, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0XY, UK; Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    The clinical course and eventual outcome, or prognosis, of complex diseases vary enormously between affected individuals. This variability critically determines the impact a disease has on a patient's life but is very poorly understood. Here, we exploit existing genome-wide association study data to gain insight into the role of genetics in prognosis. We identify a noncoding polymorphism in FOXO3A (rs12212067: T > G) at which the minor (G) allele, despite not being associated with disease susceptibility, is associated with a milder course of Crohn's disease and rheumatoid arthritis and with increased risk of severe malaria. Minor allele carriage is shown to limit inflammatory responses in monocytes via a FOXO3-driven pathway, which through TGFβ1 reduces production of proinflammatory cytokines, including TNFα, and increases production of anti-inflammatory cytokines, including IL-10. Thus, we uncover a shared genetic contribution to prognosis in distinct diseases that operates via a FOXO3-driven pathway modulating inflammatory responses.

    Cell 2013

  • Common variants in the HLA-DRB1-HLA-DQA1 HLA class II region are associated with susceptibility to visceral leishmaniasis.

    LeishGEN Consortium, Wellcome Trust Case Control Consortium 2, Fakiola M, Strange A, Cordell HJ, Miller EN, Pirinen M, Su Z, Mishra A, Mehrotra S, Monteiro GR, Band G, Bellenguez C, Dronov S, Edkins S, Freeman C, Giannoulatou E, Gray E, Hunt SE, Lacerda HG, Langford C, Pearson R, Pontes NN, Rai M, Singh SP, Smith L, Sousa O, Vukcevic D, Bramon E, Brown MA, Casas JP, Corvin A, Duncanson A, Jankowski J, Markus HS, Mathew CG, Palmer CN, Plomin R, Rautanen A, Sawcer SJ, Trembath RC, Viswanathan AC, Wood NW, Wilson ME, Deloukas P, Peltonen L, Christiansen F, Witt C, Jeronimo SM, Sundar S, Spencer CC, Blackwell JM and Donnelly P

    [1] Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK. [2].

    To identify susceptibility loci for visceral leishmaniasis, we undertook genome-wide association studies in two populations: 989 cases and 1,089 controls from India and 357 cases in 308 Brazilian families (1,970 individuals). The HLA-DRB1-HLA-DQA1 locus was the only region to show strong evidence of association in both populations. Replication at this region was undertaken in a second Indian population comprising 941 cases and 990 controls, and combined analysis across the three cohorts for rs9271858 at this locus showed P(combined) = 2.76 × 10^-17 and odds ratio (OR) = 1.41, 95% confidence interval (CI) = 1.30-1.52. A conditional analysis provided evidence for multiple associations within the HLA-DRB1-HLA-DQA1 region, and a model in which risk differed between three groups of haplotypes better explained the signal and was significant in the Indian discovery and replication cohorts. In conclusion, the HLA-DRB1-HLA-DQA1 HLA class II region contributes to visceral leishmaniasis susceptibility in India and Brazil, suggesting shared genetic risk factors for visceral leishmaniasis that cross the epidemiological divides of geography and parasite species.

    Nature genetics 2013;45;2;208-13

  • Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.

    Lemieux JE, Kyes SA, Otto TD, Feller AI, Eastman RT, Pinches RA, Berriman M, Su XZ and Newbold CI

    Weatherall Institute of Molecular Medicine, Headington, Oxford, OX3 9DS, UK; National Institute of Allergy and Infectious Disease, NIH, Rockville, MD, 20892, USA.

    Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 Kb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites.

    Funded by: Wellcome Trust: 082130, 082130/Z/07/Z, 098051

    Molecular microbiology 2013;90;3;519-37

  • The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability.

    Li JW, Bolser D, Manske M, Giorgi FM, Vyahhi N, Usadel B, Clavijo BJ, Chan TF, Wong N, Zerbino D and Schneider MV

    School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR. Tel.: +852-39431302; marcowanger@gmail.com.

    Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be reached by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in quantity of sequence data. The current resources for NGS informatics are extremely fragmented. Finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of these satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing from community spirit and with the use of the Wikipedia framework, we have initiated a collaborative NGS resource: The NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.

    Funded by: Biotechnology and Biological Sciences Research Council

    Briefings in bioinformatics 2013;14;5;548-55

  • The piggyBac transposon displays local and distant reintegration preferences and can cause mutations at non-canonical integration sites.

    Li MA, Pettitt SJ, Eckert S, Ning Z, Rice S, Cadiñanos J, Yusa K, Conte N and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.

    The DNA transposon piggyBac is widely used as a tool in mammalian experimental systems for transgenesis, mutagenesis and genome engineering. We have characterised genome-wide insertion site preferences of piggyBac by sequencing a large set of integration sites arising from transposition from two separate genomic loci and a plasmid donor in mouse embryonic stem cells. We found that piggyBac preferentially integrates locally to the excision site when mobilised from a chromosomal location, and identified other non-local regions of the genome with elevated insertion frequencies. PiggyBac insertions were associated with expressed genes and markers of open chromatin structure, and were excluded from heterochromatin. At the nucleotide level, piggyBac prefers to insert into TA-rich regions within a broader GC-rich context. We also found that piggyBac can insert into sites other than its known TTAA insertion site at low frequency (2%). Such insertions introduce mismatches that are repaired with signatures of host cell mismatch repair pathways. Transposons could be mobilised from plasmids with the observed mismatches, indicating that piggyBac could generate point mutations in the genome.

    Molecular and cellular biology 2013

  • Non dominant-negative KCNJ2 gene mutations leading to Andersen-Tawil syndrome with an isolated cardiac phenotype.

    Limberg MM, Zumhagen S, Netter MF, Coffey AJ, Grace A, Rogers J, Böckelmann D, Rinné S, Stallmeyer B, Decher N and Schulze-Bahr E

    Institut für Physiologie und Pathophysiologie, Vegetative Physiologie, Philipps-University Marburg, Deutschhausstraße 1-2, 35037 Marburg, Germany.

    Andersen-Tawil syndrome (ATS) is characterized by dysmorphic features, periodic paralyses and abnormal ventricular repolarization. After genotyping a large set of patients with congenital long-QT syndrome, we identified two novel, heterozygous KCNJ2 mutations (p.N318S, p.W322C) located in the C-terminus of the Kir2.1 subunit. These mutations have a different localization than classical ATS mutations which are mostly located at a potential interaction face with the slide helix or at the interface between the C-termini. Mutation carriers were without the key features of ATS, causing an isolated cardiac phenotype. While the N318S mutants regularly reached the plasma membrane, W322C mutants primarily resided in late endosomes. Co-expression of N318S or W322C with wild-type Kir2.1 reduced current amplitudes only by 20-25%. This mild loss-of-function for the heteromeric channels resulted from defective channel trafficking (W322C) or gating (N318S). Strikingly, and in contrast to the majority of ATS mutations, neither mutant caused a dominant-negative suppression of wild-type Kir2.1, Kir2.2 and Kir2.3 currents. Thus, a mild reduction of native Kir2.x currents by non dominant-negative mutants may cause ATS with an isolated cardiac phenotype.

    Funded by: Wellcome Trust: 098051

    Basic research in cardiology 2013;108;3;353

  • Amphotericin B increases influenza A virus infection by preventing IFITM3-mediated restriction.

    Lin TY, Chin CR, Everitt AR, Clare S, Perreira JM, Savidis G, Aker AM, John SP, Sarlah D, Carreira EM, Elledge SJ, Kellam P and Brass AL

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, MA 01655, USA.

    The IFITMs inhibit influenza A virus (IAV) replication in vitro and in vivo. Here, we establish that the antimycotic heptaen, amphotericin B (AmphoB), prevents IFITM3-mediated restriction of IAV, thereby increasing viral replication. Consistent with its neutralization of IFITM3, a clinical preparation of AmphoB, AmBisome, reduces the majority of interferon's protective effect against IAV in vitro. Mechanistic studies reveal that IFITM1 decreases host-membrane fluidity, suggesting both a possible mechanism for IFITM-mediated restriction and its negation by AmphoB. Notably, we reveal that mice treated with AmBisome succumbed to a normally mild IAV infection, similar to animals deficient in Ifitm3. Therefore, patients receiving antifungal therapy with clinical preparations of AmphoB may be functionally immunocompromised and thus more vulnerable to influenza, as well as other IFITM3-restricted viral infections.

    Funded by: NIAID NIH HHS: 1R01AI091786; Wellcome Trust

    Cell reports 2013;5;4;895-908

  • A combination of improved differential and global RNA-seq reveals pervasive transcription initiation and events in all stages of the life-cycle of functional RNAs in Propionibacterium acnes, a major contributor to wide-spread human disease.

    Lin YF, A DR, Guan S, Mamanova L and McDowall KJ

    Background: Sequencing of the genome of Propionibacterium acnes produced a catalogue of genes many of which enable this organism to colonise skin and survive exposure to the elements. Despite this platform, there was little understanding of the gene regulation that gives rise to an organism that has a major impact on human health and wellbeing and causes infections beyond the skin. To address this situation, we have undertaken a genome-wide study of gene regulation using a combination of improved differential and global RNA-sequencing and an analytical approach that takes into account the inherent noise within the data. Results: We have produced nucleotide-resolution transcriptome maps that identify and differentiate sites of transcription initiation from sites of stable RNA processing and mRNA cleavage. Moreover, analysis of these maps provides strong evidence for 'pervasive' transcription and shows that contrary to initial indications it is not biased towards the production of antisense RNAs. In addition, the maps reveal an extensive array of riboswitches, leaderless mRNAs and small non-protein-coding RNAs alongside vegetative promoters and post-transcriptional events, which includes unusual tRNA processing. The identification of such features will inform models of complex gene regulation, as illustrated here for ribonucleotide reductases and a potential quorum-sensing, two-component system. Conclusions: The approach described here, which is transferable to any bacterial species, has produced a step increase in whole-cell knowledge of gene regulation in P. acnes. Continued expansion of our maps to include transcription associated with different growth conditions and genetic backgrounds will provide a new platform from which to computationally model the gene expression that determines the physiology of P. acnes and its role in human disease.

    BMC genomics 2013;14;1;620

  • The future role of genetic screening to detect newborns at risk of childhood-onset hearing loss.

    Linden Phillips L, Bitner-Glindzicz M, Lench N, Steel KP, Langford C, Dawson SJ, Davis A, Simpson S and Packer C

    National Institute for Health Research (NIHR) Horizon Scanning Centre, School of Health and Population Sciences, University of Birmingham, Birmingham, UK.

    Objective: To explore the future potential of genetic screening to detect newborns at risk of childhood-onset hearing loss. Design: An expert-led discussion of current and future developments in genetic technology and the knowledge base of genetic hearing loss to determine the viability of genetic screening and the implications for screening policy. Results and Discussion: Despite increasing pressure to adopt genetic technologies, a major barrier for genetic screening in hearing loss is the uncertain clinical significance of the identified mutations and their interactions. Only when a reliable estimate of the future risk of hearing loss can be made at a reasonable cost, will genetic screening become viable. Given the speed of technological advancement this may be within the next 10 years. Decision-makers should start to consider how genetic screening could augment current screening programmes as well as the associated data processing and storage requirements. Conclusion: In the interim, we suggest that decision makers consider the benefits of (1) genetically testing all newborns and children with hearing loss, to determine aetiology and to increase knowledge of the genetic causes of hearing loss, and (2) consider screening pregnant women for the m.1555A>G mutation to reduce the risk of aminoglycoside antibiotic-associated hearing loss.

    International journal of audiology 2013;52;2;124-33

  • Survey of Culture, GoldenGate Assay, Universal Biosensor Assay, and 16S rRNA Gene Sequencing as Alternative Methods of Bacterial Pathogen Detection.

    Lindsay B, Pop M, Antonio M, Walker AW, Mai V, Ahmed D, Oundo J, Tamboura B, Panchalingam S, Levine MM, Kotloff K, Li S, Magder LS, Paulson JN, Liu B, Ikumapayi U, Ebruke C, Dione M, Adeyemi M, Rance R, Stares MD, Ukhanova M, Barnes B, Lewis I, Ahmed F, Alam MT, Amin R, Siddiqui S, Ochieng JB, Ouma E, Juma J, Mailu E, Omore R, O'Reilly CE, Hannis J, Manalili S, Deleon J, Yasuda I, Blyn L, Ranken R, Li F, Housley R, Ecker DJ, Hossain MA, Breiman RF, Morris JG, McDaniel TK, Parkhill J, Saha D, Sampath R, Stine OC and Nataro JP

    University of Maryland, School of Medicine, Baltimore, Maryland, USA.

    Cultivation-based assays combined with PCR or enzyme-linked immunosorbent assay (ELISA)-based methods for finding virulence factors are standard methods for detecting bacterial pathogens in stools; however, with emerging molecular technologies, new methods have become available. The aim of this study was to compare four distinct detection technologies for the identification of pathogens in stools from children under 5 years of age in The Gambia, Mali, Kenya, and Bangladesh. The children were identified, using currently accepted clinical protocols, as either controls or cases with moderate to severe diarrhea. A total of 3,610 stool samples were tested by established clinical culture techniques: 3,179 DNA samples by the Universal Biosensor assay (Ibis Biosciences, Inc.), 1,466 DNA samples by the GoldenGate assay (Illumina), and 1,006 DNA samples by sequencing of 16S rRNA genes. Each method detected different proportions of samples testing positive for each of seven enteric pathogens, enteroaggregative Escherichia coli (EAEC), enterotoxigenic E. coli (ETEC), enteropathogenic E. coli (EPEC), Shigella spp., Campylobacter jejuni, Salmonella enterica, and Aeromonas spp. The comparisons among detection methods included the frequency of positive stool samples and kappa values for making pairwise comparisons. Overall, the standard culture methods detected Shigella spp., EPEC, ETEC, and EAEC in smaller proportions of the samples than either of the methods based on detection of the virulence genes from DNA in whole stools. The GoldenGate method revealed the greatest agreement with the other methods. The agreement among methods was higher in cases than in controls. The new molecular technologies have a high potential for highly sensitive identification of bacterial diarrheal pathogens.

    Journal of clinical microbiology 2013;51;10;3263-9

  • Dense genotyping of immune-related disease regions identifies nine new risk loci for primary sclerosing cholangitis.

    Liu JZ, Hov JR, Folseraas T, Ellinghaus E, Rushbrook SM, Doncheva NT, Andreassen OA, Weersma RK, Weismüller TJ, Eksteen B, Invernizzi P, Hirschfield GM, Gotthardt DN, Pares A, Ellinghaus D, Shah T, Juran BD, Milkiewicz P, Rust C, Schramm C, Müller T, Srivastava B, Dalekos G, Nöthen MM, Herms S, Winkelmann J, Mitrovic M, Braun F, Ponsioen CY, Croucher PJ, Sterneck M, Teufel A, Mason AL, Saarela J, Leppa V, Dorfman R, Alvaro D, Floreani A, Onengut-Gumuscu S, Rich SS, Thompson WK, Schork AJ, Næss S, Thomsen I, Mayr G, König IR, Hveem K, Cleynen I, Gutierrez-Achury J, Ricaño-Ponce I, van Heel D, Björnsson E, Sandford RN, Durie PR, Melum E, Vatn MH, Silverberg MS, Duerr RH, Padyukov L, Brand S, Sans M, Annese V, Achkar JP, Boberg KM, Marschall HU, Chazouillères O, Bowlus CL, Wijmenga C, Schrumpf E, Vermeire S, Albrecht M, UK-PSCSC Consortium, International IBD Genetics Consortium, Rioux JD, Alexander G, Bergquist A, Cho J, Schreiber S, Manns MP, Färkkilä M, Dale AM, Chapman RW, Lazaridis KN, International PSC Study Group, Franke A, Anderson CA and Karlsen TH

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Primary sclerosing cholangitis (PSC) is a severe liver disease of unknown etiology leading to fibrotic destruction of the bile ducts and ultimately to the need for liver transplantation. We compared 3,789 PSC cases of European ancestry to 25,079 population controls across 130,422 SNPs genotyped using the Immunochip. We identified 12 genome-wide significant associations outside the human leukocyte antigen (HLA) complex, 9 of which were new, increasing the number of known PSC risk loci to 16. Despite comorbidity with inflammatory bowel disease (IBD) in 72% of the cases, 6 of the 12 loci showed significantly stronger association with PSC than with IBD, suggesting overlapping yet distinct genetic architectures for these two diseases. We incorporated association statistics from 7 diseases clinically occurring with PSC in the analysis and found suggestive evidence for 33 additional pleiotropic PSC risk loci. Together with network analyses, these findings add to the genetic risk map of PSC and expand on the relationship between PSC and other immune-mediated diseases.

    Funded by: Medical Research Council: G0601816; NCATS NIH HHS: UL1 TR000005; NIDDK NIH HHS: U01 DK062432; Wellcome Trust: 091745, 098051

    Nature genetics 2013;45;6;670-5

  • Fine Mapping of the Pond Snail Left-Right Asymmetry (Chirality) Locus Using RAD-Seq and Fibre-FISH.

    Liu MM, Davey JW, Banerjee R, Han J, Yang F, Aboobaker A, Blaxter ML and Davison A

    School of Biology, University of Nottingham, University Park, Nottingham, United Kingdom; Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom.

    The left-right asymmetry of snails, including the direction of shell coiling, is determined by the delayed effect of a maternal gene on the chiral twist that takes place during early embryonic cell divisions. Yet, despite being a well-established classical problem, the identity of the gene and the means by which left-right asymmetry is established in snails remain unknown. We here demonstrate the power of new genomic approaches for identification of the chirality gene, "D". First, heterozygous (Dd) pond snails Lymnaea stagnalis were self-fertilised or backcrossed, and the genotype of more than six thousand offspring inferred, either dextral (DD/Dd) or sinistral (dd). Then, twenty of the offspring were used for Restriction-site-Associated DNA Sequencing (RAD-Seq) to identify anonymous molecular markers that are linked to the chirality locus. A local genetic map was constructed by genotyping three flanking markers in over three thousand snails. The three markers lie either side of the chirality locus, with one very tightly linked (<0.1 cM). Finally, bacterial artificial chromosomes (BACs) were isolated that contained the three loci. Fluorescent in situ hybridization (FISH) of pachytene cells showed that the three BACs tightly cluster on the same bivalent chromosome. Fibre-FISH identified a region of greater than ∼0.4 Mb between two BAC clone markers that must contain D. This work therefore establishes the resources for molecular identification of the chirality gene and the variation that underpins sinistral and dextral coiling. More generally, the results also show that combining genomic technologies, such as RAD-Seq and high resolution FISH, is a robust approach for mapping key loci in non-model systems.

    PloS one 2013;8;8;e71067

  • Detecting and Characterizing Genomic Signatures of Positive Selection in Global Populations.

    Liu X, Ong RT, Pillai EN, Elzein AM, Small KS, Clark TG, Kwiatkowski DP and Teo YY

    NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore 117456, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore 117597, Singapore.

    Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains unanswered except for the fact that there exist a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome whether a selection signal shared across different populations is the consequence of a single mutation process followed subsequently by gene flow between populations or of convergent evolution due to the occurrence of multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event.

    American journal of human genetics 2013

  • Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates.

    Long HK, Sims D, Heger A, Blackledge NP, Kutter C, Wright ML, Grützner F, Odom DT, Patient R, Ponting CP and Klose RJ

    Department of Biochemistry, University of Oxford, Oxford, United Kingdom; Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.

    Two-thirds of gene promoters in mammals are associated with regions of non-methylated DNA, called CpG islands (CGIs), which counteract the repressive effects of DNA methylation on chromatin. In cold-blooded vertebrates, computational CGI predictions often reside away from gene promoters, suggesting a major divergence in gene promoter architecture across vertebrates. By experimentally identifying non-methylated DNA in the genomes of seven diverse vertebrates, we instead reveal that non-methylated islands (NMIs) of DNA are a central feature of vertebrate gene promoters. Furthermore, NMIs are present at orthologous genes across vast evolutionary distances, revealing a surprising level of conservation in this epigenetic feature. By profiling NMIs in different tissues and developmental stages we uncover a unifying set of features that are central to the function of NMIs in vertebrates. Together these findings demonstrate an ancient logic for NMI usage at gene promoters and reveal an unprecedented level of epigenetic conservation across vertebrate evolution. DOI:http://dx.doi.org/10.7554/eLife.00348.001.

    eLife 2013;2;e00348

  • Human Spermatogenic Failure Purges Deleterious Mutation Load from the Autosomes and Both Sex Chromosomes, including the Gene DMRT1.

    Lopes AM, Aston KI, Thompson E, Carvalho F, Gonçalves J, Huang N, Matthiesen R, Noordam MJ, Quintela I, Ramu A, Seabra C, Wilfert AB, Dai J, Downie JM, Fernandes S, Guo X, Sha J, Amorim A, Barros A, Carracedo A, Hu Z, Hurles ME, Moskovtsev S, Ober C, Paduch DA, Schiffman JD, Schlegel PN, Sousa M, Carrell DT and Conrad DF

    Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.

    Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genomewide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man's risk of disease by 10% (OR 1.10 [1.04-1.16], p<2×10(-3)), rare X-linked CNVs by 29% (OR 1.29 [1.11-1.50], p<1×10(-3)), and rare Y-linked duplications by 88% (OR 1.88 [1.13-3.13], p<0.03). By contrasting the properties of our case-specific CNVs with those of CNV callsets from cases of autism, schizophrenia, bipolar disorder, and intellectual disability, we propose that the CNV burden in spermatogenic impairment is distinct from the burden of large, dominant mutations described for neurodevelopmental disorders. We identified two patients with deletions of DMRT1, a gene on chromosome 9p24.3 orthologous to the putative sex determination locus of the avian ZW chromosome system. In an independent sample of Han Chinese men, we identified 3 more DMRT1 deletions in 979 cases of idiopathic azoospermia and none in 1,734 controls, and found none in an additional 4,519 controls from public databases. The combined results indicate that DMRT1 loss-of-function mutations are a risk factor and potential genetic cause of human spermatogenic failure (frequency of 0.38% in 1,306 cases and 0% in 7,754 controls, p = 6.2×10(-5)). Our study identifies other recurrent CNVs as potential causes of idiopathic azoospermia and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.

    PLoS genetics 2013;9;3;e1003349

  • Genome-wide association study on detailed profiles of smoking behavior and nicotine dependence in a twin sample.

    Loukola A, Wedenoja J, Keskitalo-Vuokko K, Broms U, Korhonen T, Ripatti S, Sarin AP, Pitkäniemi J, He L, Häppölä A, Heikkilä K, Chou YL, Pergadia ML, Heath AC, Montgomery GW, Martin NG, Madden PA and Kaprio J

    Department of Public Health, Hjelt Institute, University of Helsinki, Helsinki, Finland.

    Smoking is a major risk factor for several somatic diseases and is also emerging as a causal factor for neuropsychiatric disorders. Genome-wide association (GWA) and candidate gene studies for smoking behavior and nicotine dependence (ND) have disclosed too few predisposing variants to account for the high estimated heritability. Previous large-scale GWA studies have had very limited phenotypic definitions of relevance to smoking-related behavior, which has likely impeded the discovery of genetic effects. We performed GWA analyses on 1114 adult twins ascertained for ever smoking from the population-based Finnish Twin Cohort study. The availability of 17 smoking-related phenotypes allowed us to comprehensively portray the dimensions of smoking behavior, clustered into the domains of smoking initiation, amount smoked and ND. Our results highlight a locus on 16p12.3, with several single-nucleotide polymorphisms (SNPs) in the vicinity of CLEC19A showing association (P<1 × 10(-6)) with smoking quantity. Interestingly, CLEC19A is located close to a previously reported attention-deficit hyperactivity disorder (ADHD) linkage locus and an evident link between ADHD and smoking has been established. Intriguing preliminary association (P<1 × 10(-5)) was detected between DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, 4th edition) ND diagnosis and several SNPs in ERBB4, coding for a Neuregulin receptor, on 2q33. The association between ERBB4 and DSM-IV ND diagnosis was replicated in an independent Australian sample. Recently, a significant increase in ErbB4 and Neuregulin 3 (Nrg3) expression was revealed following chronic nicotine exposure and withdrawal in mice and an association between NRG3 SNPs and smoking cessation success was detected in a clinical trial. ERBB4 has previously been associated with schizophrenia; further, it is located within an established schizophrenia linkage locus and within a linkage locus for a smoker phenotype identified in this sample. In conclusion, we disclose novel tentative evidence for the involvement of ERBB4 in ND, suggesting the involvement of the Neuregulin/ErbB signalling pathway in addictions and providing a plausible link between the high co-morbidity of schizophrenia and ND. Molecular Psychiatry advance online publication, 11 June 2013; doi:10.1038/mp.2013.72.

    Funded by: NIAAA NIH HHS: K05 AA017688, P60 AA011998, R01 AA013320; NIDA NIH HHS: R01 DA012854

    Molecular psychiatry 2013

  • Generation of a Tn5 transposon library in Haemophilus parasuis and analysis by transposon-directed insertion-site sequencing (TraDIS).

    Luan SL, Chaudhuri RR, Peters SE, Mayho M, Weinert LA, Crowther SA, Wang J, Langford PR, Rycroft A, Wren BW, Tucker AW, Maskell DJ and BRaDP1T Consortium

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK. sl470@cam.ac.uk

    Haemophilus parasuis is an important respiratory tract pathogen of swine and the etiological agent of Glässer's disease. The molecular pathogenesis of H. parasuis is not well studied, mainly due to the lack of efficient tools for genetic manipulation of this bacterium. In this study we describe a Tn5-based random mutagenesis method for use in H. parasuis. A novel chloramphenicol-resistant Tn5 transposome was electroporated into the virulent H. parasuis serovar 5 strain 29755. High transposition efficiency of Tn5, up to 10(4) transformants/μg of transposon DNA, was obtained by modification of the Tn5 DNA in the H. parasuis strain HS071 and establishment of optimal electrotransformation conditions, and a library of approximately 10,500 mutants was constructed. Analysis of the library using transposon-directed insertion-site sequencing (TraDIS) revealed that the insertion of Tn5 was evenly distributed throughout the genome. 10,001 individual mutants were identified, with 1561 genes being disrupted (69.4% of the genome). This newly-developed, efficient mutagenesis approach will be a powerful tool for genetic manipulation of H. parasuis in order to study its physiology and pathogenesis.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G003203/1, BB/G019177/1, BB/G019274/1, BB/G020744/1

    Veterinary microbiology 2013;166;3-4;558-66

  • The Distribution and 'In Vivo' Phase Variation Status of Haemoglobin Receptors in Invasive Meningococcal Serogroup B Disease: Genotypic and Phenotypic Analysis.

    Lucidarme J, Findlow J, Chan H, Feavers IM, Gray SJ, Kaczmarski EB, Parkhill J, Bai X, Borrow R and Bayliss CD

    Public Health England, Manchester, United Kingdom.

    Two haemoglobin-binding proteins, HmbR and HpuAB, contribute to iron acquisition by Neisseria meningitidis. These receptors are subject to high frequency, reversible switches in gene expression - phase variation (PV) - due to mutations in homopolymeric (poly-G) repeats present in the open reading frame. The distribution and PV state of these receptors was assessed for a representative collection of isolates from invasive meningococcal disease patients of England, Wales and Northern Ireland. Most of the major clonal complexes had only the HmbR receptor whilst the recently expanding ST-275-centred cluster of the ST-269 clonal complex had both receptors. At least one of the receptors was in an 'ON' configuration in 76.3% of the isolates, a finding that was largely consistent with phenotypic analyses. As PV status may change during isolation and culture of meningococci, a PCR-based protocol was utilised to confirm the expression status of the receptors within contemporaneously acquired clinical specimens (blood/cerebrospinal fluid) from the respective patients. The expression state was confirmed for all isolate/specimen pairs with <15 tract repeats indicating that the PV status of these receptors is stable during isolation. This study therefore establishes a protocol for determining in vivo PV status to aid in determining the contributions of phase variable genes to invasive meningococcal disease. Furthermore, the results of the study support a putative but non-essential role of the meningococcal haemoglobin receptors as virulence factors whilst further highlighting their vaccine candidacy.

    PloS one 2013;8;9;e76932

  • RocA Truncation Underpins Hyper-Encapsulation, Carriage Longevity and Transmissibility of Serotype M18 Group A Streptococci.

    Lynskey NN, Goulding D, Gierula M, Turner CE, Dougan G, Edwards RJ and Sriskandan S

    Faculty of Medicine, Imperial College London, Hammersmith Hospital, London, United Kingdom.

    Group A streptococcal isolates of serotype M18 are historically associated with epidemic waves of pharyngitis and the non-suppurative immune sequela rheumatic fever. The serotype is defined by a unique, highly encapsulated phenotype, yet the molecular basis for this unusual colony morphology is unknown. Here we identify a truncation in the regulatory protein RocA, unique to and conserved within our serotype M18 GAS collection, and demonstrate that it underlies the characteristic M18 capsule phenotype. Reciprocal allelic exchange mutagenesis of rocA between M18 GAS and M89 GAS demonstrated that truncation of RocA was both necessary and sufficient for hyper-encapsulation via up-regulation of both precursors required for hyaluronic acid synthesis. Although RocA was shown to positively enhance covR transcription, quantitative proteomics revealed RocA to be a metabolic regulator with activity beyond the CovR/S regulon. M18 GAS demonstrated a uniquely protuberant chain formation following culture on agar that was dependent on excess capsule and the RocA mutation. Correction of the M18 rocA mutation reduced GAS survival in human blood, and in vivo naso-pharyngeal carriage longevity in a murine model, with an associated drop in bacterial airborne transmission during infection. In summary, a naturally occurring truncation in a regulator explains the encapsulation phenotype, carriage longevity and transmissibility of M18 GAS, highlighting the close interrelation of metabolism, capsule and virulence.

    PLoS pathogens 2013;9;12;e1003842

  • Next-generation sequencing of disseminated tumor cells.

    Møller EK, Kumar P, Voet T, Peterson A, Van Loo P, Mathiesen RR, Fjelldal R, Grundstad J, Borgen E, Baumbusch LO, Naume B, Børresen-Dale AL, White KP, Nord S and Kristensen VN

    Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.

    Disseminated tumor cells (DTCs) detected in the bone marrow have been shown to be an independent prognostic factor for women with breast cancer. However, the mechanisms behind tumor cell dissemination are still unclear, and more detailed knowledge is needed to fully understand why some cells remain dormant and others metastasize. Sequencing of single cells has opened up the possibility of dissecting the genetic content of subclones of a primary tumor, as well as DTCs. Previous studies of genetic changes in DTCs have employed single-cell array comparative genomic hybridization, which provides information about larger aberrations. Next-generation sequencing now provides the possibility to discover new, smaller, and copy-neutral genetic changes. In this study, we performed whole-genome amplification followed by next-generation sequencing to analyze DTCs from two breast cancer patients. We compared copy-number profiles of the DTCs and the corresponding primary tumor generated from sequencing and SNP-comparative genomic hybridization (CGH) data, respectively. While one tumor revealed mostly whole-arm gains and losses, the other had more complex alterations, as well as subclonal amplification and deletions. Whole-arm gains or losses in the primary tumor were in general also observed in the corresponding DTC. Both primary tumors showed amplification of chromosome 1q and deletion of parts of chromosome 16q, which were recaptured in the corresponding DTCs. Interestingly, clear differences were also observed, indicating that the DTC underwent further evolution at the copy-number level. This study provides a proof-of-principle for sequencing of DTCs and correlation with primary copy-number profiles. The analyses allow insight into tumor cell dissemination and show ongoing copy-number evolution in DTCs compared to the primary tumors.

    Frontiers in oncology 2013;3;320

  • Genome-wide association study in a Chinese population identifies a susceptibility locus for type 2 diabetes at 7q32 near PAX4.

    Ma RC, Hu C, Tam CH, Zhang R, Kwan P, Leung TF, Thomas GN, Go MJ, Hara K, Sim X, Ho JS, Wang C, Li H, Lu L, Wang Y, Li JW, Wang Y, Lam VK, Wang J, Yu W, Kim YJ, Ng DP, Fujita H, Panoutsopoulou K, Day-Williams AG, Lee HM, Ng AC, Fang YJ, Kong AP, Jiang F, Ma X, Hou X, Tang S, Lu J, Yamauchi T, Tsui SK, Woo J, Leung PC, Zhang X, Tang NL, Sy HY, Liu J, Wong TY, Lee JY, Maeda S, Xu G, Cherny SS, Chan TF, Ng MC, Xiang K, Morris AP, DIAGRAM Consortium, Keildson S, The MuTHER Consortium, Hu R, Ji L, Lin X, Cho YS, Kadowaki T, Tai ES, Zeggini E, McCarthy MI, Hon KL, Baum L, Tomlinson B, So WY, Bao Y, Chan JC and Jia W

    Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR, People's Republic of China, rcwma@cuhk.edu.hk.

    AIMS/HYPOTHESIS: Most genetic variants identified for type 2 diabetes have been discovered in European populations. We performed genome-wide association studies (GWAS) in a Chinese population with the aim of identifying novel variants for type 2 diabetes in Asians. METHODS: We performed a meta-analysis of three GWAS comprising 684 patients with type 2 diabetes and 955 controls of Southern Han Chinese descent. We followed up the top signals in two independent Southern Han Chinese cohorts (totalling 10,383 cases and 6,974 controls), and performed in silico replication in multiple populations. RESULTS: We identified CDKN2A/B and four novel type 2 diabetes association signals with p < 1 × 10(-5) from the meta-analysis. Thirteen variants within these four loci were followed up in two independent Chinese cohorts, and rs10229583 at 7q32 was found to be associated with type 2 diabetes in a combined analysis of 11,067 cases and 7,929 controls (p meta = 2.6 × 10(-8); OR [95% CI] 1.18 [1.11, 1.25]). In silico replication revealed consistent associations across multiethnic groups, including five East Asian populations (p meta = 2.3 × 10(-10)) and a population of European descent (p = 8.6 × 10(-3)). The rs10229583 risk variant was associated with elevated fasting plasma glucose, impaired beta cell function in controls, and an earlier age at diagnosis for the cases. The novel variant lies within an islet-selective cluster of open regulatory elements. There was significant heterogeneity of effect between Han Chinese and individuals of European descent, Malaysians and Indians. CONCLUSIONS/INTERPRETATION: Our study identifies rs10229583 near PAX4 as a novel locus for type 2 diabetes in Chinese and other populations and provides new insights into the pathogenesis of type 2 diabetes.

    Diabetologia 2013

  • Human candidate polymorphisms in sympatric ethnic groups differing in malaria susceptibility in Mali.

    Maiga B, Dolo A, Touré O, Dara V, Tapily A, Campino S, Sepulveda N, Risley P, Silva N, Corran P, Rockett KA, Kwiatkowski D, MalariaGEN Consortium, Clark TG, Troye-Blomberg M and Doumbo OK

    Malaria Research and Training Center / Department of Epidemiology of Parasitic Diseases / Faculty of Medicine, Pharmacy and Odonto-Stomatology, BP 1805, Bamako, USTTB, Mali; Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.

    Malaria still remains a major public health problem in Mali, although disease susceptibility varies between ethnic groups, particularly between the Fulani and Dogon. These two sympatric groups share similar socio-cultural factors and malaria transmission rates, but Fulani individuals tend to show significantly higher spleen enlargement scores, lower parasite prevalence, and seem less affected by the disease than their Dogon neighbours. We have used genetic polymorphisms from malaria-associated genes to investigate associations with various malaria metrics between the Fulani and Dogon groups. Two cross-sectional surveys (transmission season 2006, dry season 2007) were performed. Healthy volunteers from both ethnic groups (n=939) were recruited in a rural setting. In each survey, clinical (spleen enlargement, axillary temperature, weight) and parasitological data (malaria parasite densities and species) were collected, as well as blood samples. One hundred and sixty-six SNPs were genotyped and 5 immunoassays (AMA1, CSP, MSP1, MSP2, total IgE) were performed on the DNA and serum samples respectively. The data confirm the reduced malaria susceptibility in the Fulani, with a higher level of the protective O-blood group, and increased circulating antibody levels to several malaria antigens (p<10(-15)). We identified SNP allele frequency differences between the 2 ethnic groups in CD36, IL4, RTN3 and ADCY9. Moreover, polymorphisms in FCER1A, RAD50, TNF, SLC22A4, and IL13 genes were correlated with antibody production (p-value<0.003). Further work is required to understand the mechanisms underpinning these genetic factors.

    PloS one 2013;8;10;e75675

  • A follow-up linkage study of Finnish pre-eclampsia families identifies a new fetal susceptibility locus on chromosome 18.

    Majander KK, Villa PM, Kivinen K, Kere J and Laivuori H

    [1] Department of Medical Genetics, Haartman Institute, University of Helsinki, Helsinki, Finland [2] Research Programs Unit, Women's Health, University of Helsinki, Helsinki, Finland.

    Pre-eclampsia is a common vascular disorder of pregnancy. It originates in the placenta and targets the maternal endothelium. According to epidemiological research, >50% of the liability to this disorder can be accounted for by genetic factors. Both maternal and fetal genes contribute to the risk, but especially the fetal genetic risk profile is still poorly understood. We have previously detected linkage signals in multiplex Finnish families on chromosomes 2p25, 4q32, and 9p13 using maternal phenotypes. We performed a linkage analysis using updated maternal phenotypes and an unprecedented linkage analysis using fetal phenotypes. Markers genotyped were available from 237 individuals in 15 Finnish families, including 72 affected mothers and 49 affected fetuses. The MERLIN software was used for sample and marker quality control and linkage analysis. The results were compared against the original ones obtained by using the GENEHUNTER 2.1 software. The previous identification of the maternal susceptibility locus to a genetic location at 21.70 cM near marker D2S168 on chromosome 2 was confirmed by using both maternal and fetal phenotypes (maternal non-parametric linkage (NPL) score 3.79, P=0.00008, LOD (logarithm (base 10) of odds)=2.20 and fetal NPL score 2.95, P=0.002, LOD=1.71). As a novel finding, we present a suggestive linkage to chromosome 18 at 86.80 cM near marker D18S64 (NPL score 2.51, P=0.006, LOD=1.20) using the fetal phenotype. We propose that chromosome 18 may harbor a new fetal susceptibility locus for pre-eclampsia. European Journal of Human Genetics advance online publication, 6 February 2013; doi:10.1038/ejhg.2013.6.

    European journal of human genetics : EJHG 2013

  • The agr locus regulates virulence and colonization genes in Clostridium difficile 027.

    Martin MJ, Clare S, Goulding D, Faulds-Pain A, Barquist L, Browne HP, Pettit L, Dougan G, Lawley TD and Wren BW

    Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, University of London, London, United Kingdom.

    The transcriptional regulator AgrA, a member of the LytTR family of proteins, plays a key role in controlling gene expression in some Gram-positive pathogens, including Staphylococcus aureus and Enterococcus faecalis. AgrA is encoded by the agrACDB global regulatory locus, and orthologues are found within the genome of most Clostridium difficile isolates, including the epidemic lineage 027/BI/NAP1. Comparative RNA sequencing of the wild type and otherwise isogenic agrA null mutant derivatives of C. difficile R20291 revealed a network of approximately 75 differentially regulated transcripts at late exponential growth phase, including many genes associated with flagellar assembly and function, such as the major structural subunit, FliC. Other differentially regulated genes include several involved in bis-(3'-5')-cyclic dimeric GMP (c-di-GMP) synthesis and toxin A expression. C. difficile 027 R20291 agrA mutant derivatives were poorly flagellated and exhibited reduced levels of colonization and relapses in the murine infection model. Thus, the agr locus likely plays a contributory role in the fitness and virulence potential of C. difficile strains in the 027/BI/NAP1 lineage.

    Funded by: Medical Research Council: 93614, G1000214; Wellcome Trust: 086418, 086418/Z, 098051

    Journal of bacteriology 2013;195;16;3672-81

  • Playing fast and loose with mutation.

    Mather AE and Harris SR

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2013;11;12;822

  • Distinguishable epidemics of multidrug-resistant Salmonella Typhimurium DT104 in different hosts.

    Mather AE, Reid SW, Maskell DJ, Parkhill J, Fookes MC, Harris SR, Brown DJ, Coia JE, Mulvey MR, Gilmour MW, Petrovska L, de Pinna E, Kuroda M, Akiba M, Izumiya H, Connor TR, Suchard MA, Lemey P, Mellor DJ, Haydon DT and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    The global epidemic of multidrug-resistant Salmonella Typhimurium DT104 provides an important example, both in terms of the agent and its resistance, of a widely disseminated zoonotic pathogen. Here, with an unprecedented national collection of isolates collected contemporaneously from humans and animals and including a sample of internationally derived isolates, we have used whole-genome sequencing to dissect the phylogenetic associations of the bacterium and its antimicrobial resistance genes through the course of an epidemic. Contrary to current tenets supporting a single homogeneous epidemic, we demonstrate that the bacterium and its resistance genes were largely maintained within animal and human populations separately and that there was limited transmission, in either direction. We also show considerable variation in the resistance profiles, in contrast to the largely stable bacterial core genome, which emphasizes the critical importance of integrated genotypic data sets in understanding the ecology of bacterial zoonoses and antimicrobial resistance.

    Funded by: European Research Council: 260864; NHGRI NIH HHS: HG006139, R01 HG006139; NIAID NIH HHS: AI107034, R01 AI107034; NIGMS NIH HHS: R01 GM086887; Wellcome Trust: 098051

    Science (New York, N.Y.) 2013;341;6153;1514-7

  • Mosaic copy number variation in human neurons.

    McConnell MJ, Lindberg MR, Brennand KJ, Piper JC, Voet T, Cowing-Zitron C, Shumilina S, Lasken RS, Vermeesch JR, Hall IM and Gage FH

    Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA.

    We used single-cell genomic approaches to map DNA copy number variation (CNV) in neurons obtained from human induced pluripotent stem cell (hiPSC) lines and postmortem human brains. We identified aneuploid neurons, as well as numerous subchromosomal CNVs in euploid neurons. Neurotypic hiPSC-derived neurons had larger CNVs than fibroblasts, and several large deletions were found in hiPSC-derived neurons but not in matched neural progenitor cells. Single-cell sequencing of endogenous human frontal cortex neurons revealed that 13 to 41% of neurons have at least one megabase-scale de novo CNV, that deletions are twice as common as duplications, and that a subset of neurons have highly aberrant genomes marked by multiple alterations. Our results show that mosaic CNV is abundant in human neurons.

    Funded by: NCCDPHP CDC HHS: DP20D006493-01; NICHD NIH HHS: N01-HD-9-011; NIMH NIH HHS: R01 MH095741; PHS HHS: HHSN2752009000011C

    Science (New York, N.Y.) 2013;342;6158;632-7

  • Retrospective analysis of whole genome sequencing compared to prospective typing data in further informing the epidemiological investigation of an outbreak of Shigella sonnei in the UK.

    McDonnell J, Dallman T, Atkin S, Turbitt DA, Connor TR, Grant KA, Thomson NR and Jenkins C

    North East and North Central London Health Protection Unit, Health Protection Agency, London, UK.

    The aim of this study was to retrospectively assess the value of whole genome sequencing (WGS) compared to conventional typing methods in the investigation and control of an outbreak of Shigella sonnei in the Orthodox Jewish (OJ) community in the UK. The genome sequence analysis showed that the strains implicated in the outbreak formed three phylogenetically distinct clusters. One cluster represented cases associated with recent exposure to a single strain, whereas the other two clusters represented related but distinct strains of S. sonnei circulating in the OJ community across the UK. The WGS data challenged the conclusions drawn during the initial outbreak investigation and allowed cases of dysentery to be implicated or ruled out of the outbreak that were previously misclassified. This study showed that the resolution achieved using WGS would have clearly defined the outbreak, thus facilitating the promotion of infection control measures within local schools and the dissemination of a stronger public health message to the community.

    Funded by: Wellcome Trust: 098051

    Epidemiology and infection 2013;141;12;2568-75

  • Association Study of Common Genetic Variants and HIV-1 Acquisition in 6,300 Infected Cases and 7,200 Controls.

    McLaren PJ, Coulonges C, Ripke S, van den Berg L, Buchbinder S, Carrington M, Cossarizza A, Dalmau J, Deeks SG, Delaneau O, De Luca A, Goedert JJ, Haas D, Herbeck JT, Kathiresan S, Kirk GD, Lambotte O, Luo M, Mallal S, van Manen D, Martinez-Picado J, Meyer L, Miro JM, Mullins JI, Obel N, O'Brien SJ, Pereyra F, Plummer FA, Poli G, Qi Y, Rucart P, Sandhu MS, Shea PR, Schuitemaker H, Theodorou I, Vannberg F, Veldink J, Walker BD, Weintrob A, Winkler CA, Wolinsky S, Telenti A, Goldstein DB, de Bakker PI, Zagury JF and Fellay J

    School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Institute of Microbiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.

    Multiple genome-wide association studies (GWAS) have been performed in HIV-1 infected individuals, identifying common genetic influences on viral control and disease course. Similarly, common genetic correlates of acquisition of HIV-1 after exposure have been interrogated using GWAS, although in generally small samples. Under the auspices of the International Collaboration for the Genomics of HIV, we have combined the genome-wide single nucleotide polymorphism (SNP) data collected by 25 cohorts, studies, or institutions on HIV-1 infected individuals and compared them to carefully matched population-level data sets (a list of all collaborators appears in Note S1 in Text S1). After imputation using the 1,000 Genomes Project reference panel, we tested approximately 8 million common DNA variants (SNPs and indels) for association with HIV-1 acquisition in 6,334 infected patients and 7,247 population samples of European ancestry. Initial association testing identified the SNP rs4418214, the C allele of which is known to tag the HLA-B*57:01 and B*27:05 alleles, as genome-wide significant (p = 3.6×10(-11)). However, restricting analysis to individuals with a known date of seroconversion suggested that this association was due to the frailty bias in studies of lethal diseases. Further analyses including testing recessive genetic models, testing for bulk effects of non-genome-wide significant variants, stratifying by sexual or parenteral transmission risk and testing previously reported associations showed no evidence for genetic influence on HIV-1 acquisition (with the exception of CCR5Δ32 homozygosity). Thus, these data suggest that genetic influences on HIV acquisition are either rare or have smaller effects than can be detected by this sample size.

    PLoS pathogens 2013;9;7;e1003515

  • The Evolutionary Path to Extraintestinal Pathogenic, Drug-Resistant Escherichia coli Is Marked by Drastic Reduction in Detectable Recombination within the Core Genome.

    McNally A, Cheng L, Harris SR and Corander J

    Pathogen Research Group, Nottingham Trent University, United Kingdom.

    Escherichia coli is a highly diverse group of pathogens ranging from commensal of the intestinal tract, through to intestinal pathogen, and extraintestinal pathogen. Here, we present data on the population diversity of E. coli, using Bayesian analysis to identify 13 distinct clusters within the population from multilocus sequence typing data, which map onto a whole-genome-derived phylogeny based on 62 genome sequences. Bayesian analysis of recombination within the core genome identified reduction in detectable core genome recombination as one moves from the commensals, through the intestinal pathogens down to the multidrug-resistant extraintestinal pathogenic clone E. coli ST131. Our data show that the emergence of a multidrug-resistant, extraintestinal pathogenic lineage of E. coli is marked by substantial reduction in detectable core genome recombination, resulting in a lineage which is phylogenetically distinct and sexually isolated in terms of core genome recombination.

    Genome biology and evolution 2013;5;4;699-710

  • Selecting antagonistic antibodies that control differentiation through inducible expression in embryonic stem cells.

    Melidoni AN, Dyson MR, Wormald S and McCafferty J

    Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, United Kingdom.

    Antibodies that modulate receptor function have great untapped potential in the control of stem cell differentiation. In contrast to many natural ligands, antibodies are stable, exquisitely specific, and are unaffected by the regulatory mechanisms that act on natural ligands. Here we describe an innovative system for identifying such antibodies by introducing and expressing antibody gene populations in ES cells. Following induced antibody expression and secretion, changes in differentiation outcomes of individual antibody-expressing ES clones are monitored using lineage-specific gene expression to identify clones that encode and express signal-modifying antibodies. This in-cell expression and reporting system was exemplified by generating blocking antibodies to FGF4 and its receptor FGFR1β, identified through delayed onset of ES cell differentiation. Functionality of the selected antibodies was confirmed by addition of exogenous antibodies to three different ES reporter cell lines, where retained expression of pluripotency markers Oct4, Nanog, and Rex1 was observed. This work demonstrates the potential for discovery and utility of functional antibodies in stem cell differentiation. This work is also unique in constituting an example of ES cells carrying an inducible antibody that causes a functional protein "knock-down" and allows temporal control of stable signaling components at the protein level.

    Proceedings of the National Academy of Sciences of the United States of America 2013;110;44;17802-7

  • Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties.

    Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ and Saez-Rodriguez J

    European Bioinformatics Institute, Wellcome Trust Genome Campus-Cambridge, Cambridge, United Kingdom.

    Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to a personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC50 values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC50 values in an 8-fold cross-validation and an independent blind test with coefficient of determination R(2) of 0.72 and 0.64 respectively. Furthermore, models were able to predict with comparable accuracy (R(2) of 0.61) IC50s of cell lines from a tissue not used in the training stage. Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC50 values rather than experimentally measuring them. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities as well as ultimately be useful for personalized medicine by linking the genomic traits of patients to drug sensitivity.

    PloS one 2013;8;4;e61318

  • Biomarkers for type 2 diabetes and impaired fasting glucose using a nontargeted metabolomics approach.

    Menni C, Fauman E, Erte I, Perry JR, Kastenmüller G, Shin SY, Petersen AK, Hyde C, Psatha M, Ward KJ, Yuan W, Milburn M, Palmer CN, Frayling TM, Trimmer J, Bell JT, Gieger C, Mohney RP, Brosnan MJ, Suhre K, Soranzo N and Spector TD

    Department of Twin Research and Genetic Epidemiology, King's College London, London, U.K.

    Using a nontargeted metabolomics approach of 447 fasting plasma metabolites, we searched for novel molecular markers that arise before and after hyperglycemia in a large population-based cohort of 2,204 females (115 type 2 diabetic [T2D] case subjects, 192 individuals with impaired fasting glucose [IFG], and 1,897 control subjects) from TwinsUK. Forty-two metabolites from three major fuel sources (carbohydrates, lipids, and proteins) were found to significantly correlate with T2D after adjusting for multiple testing; of these, 22 were previously reported as associated with T2D or insulin resistance. Fourteen metabolites were found to be associated with IFG. Among the metabolites identified, the branched-chain keto-acid metabolite 3-methyl-2-oxovalerate was the strongest predictive biomarker for IFG after glucose (odds ratio [OR] 1.65 [95% CI 1.39-1.95], P = 8.46 × 10(-9)) and was moderately heritable (h(2) = 0.20). The association was replicated in an independent population (n = 720, OR 1.68 [1.34-2.11], P = 6.52 × 10(-6)) and validated in 189 twins with urine metabolomics taken at the same time as plasma (OR 1.87 [1.27-2.75], P = 1 × 10(-3)). Results confirm an important role for catabolism of branched-chain amino acids in T2D and IFG. In conclusion, this T2D-IFG biomarker study has surveyed the broadest panel of nontargeted metabolites to date, revealing both novel and known associated metabolites and providing potential novel targets for clinical prediction and a deeper understanding of causal mechanisms.

    Funded by: Wellcome Trust: 092447/Z/10/Z, WT091310, WT098051

    Diabetes 2013;62;12;4270-6

  • Metabolomic markers reveal novel pathways of ageing and early development in human populations.

    Menni C, Kastenmüller G, Petersen AK, Bell JT, Psatha M, Tsai PC, Gieger C, Schulz H, Erte I, John S, Brosnan MJ, Wilson SG, Tsaprouni L, Lim EM, Stuckey B, Deloukas P, Mohney R, Suhre K, Spector TD and Valdes AM

    Department of Twin Research & Genetic Epidemiology, King's College London, London, UK, Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany, Institute of Genetic Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany, Institute of Epidemiology I, Helmholtz Zentrum München, Neuherberg, Germany, Pfizer Research Laboratories, Groton, CT, USA, Worldwide R&D, Pfizer Inc., Cambridge, MA, USA, School of Medicine and Pharmacology, University of Western Australia, Crawley, WA, Australia, Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Nedlands, WA, Australia, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Metabolon Inc., 617 Davis Drive, Durham, NC 27713, USA; Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Qatar Foundation, Doha, State of Qatar and Academic Rheumatology, University of Nottingham, Nottingham City Hospital, Nottingham, UK.

    Background: Human ageing is a complex, multifactorial process and early developmental factors affect health outcomes in old age.

    Methods: Metabolomic profiling on fasting blood was carried out in 6055 individuals from the UK. Stepwise regression was performed to identify a panel of independent metabolites which could be used as a surrogate for age. We also investigated the association with birthweight overall and within identical discordant twins and with genome-wide methylation levels.

    Results: We identified a panel of 22 metabolites which combined are strongly correlated with age (R(2) = 59%) and with age-related clinical traits independently of age. One particular metabolite, C-glycosyl tryptophan (C-glyTrp), correlated strongly with age (beta = 0.03, SE = 0.001, P = 7.0 × 10(-157)) and lung function (FEV1 beta = -0.04, SE = 0.008, P = 1.8 × 10(-8) adjusted for age and confounders) and was replicated in an independent population (n = 887). C-glyTrp was also associated with bone mineral density (beta = -0.01, SE = 0.002, P = 1.9 × 10(-6)) and birthweight (beta = -0.06, SE = 0.01, P = 2.5 × 10(-9)). The difference in C-glyTrp levels explained 9.4% of the variance in the difference in birthweight between monozygotic twins. An epigenome-wide association study in 172 individuals identified three CpG-sites, associated with levels of C-glyTrp (P < 2 × 10(-6)). We replicated one CpG site in the promoter of the WDR85 gene in an independent sample of 350 individuals (beta = -0.20, SE = 0.04, P = 2.9 × 10(-8)). WDR85 is a regulator of translation elongation factor 2, essential for protein synthesis in eukaryotes.

    Conclusions: Our data illustrate how metabolomic profiling linked with epigenetic studies can identify some key molecular mechanisms potentially determined in early development that produce long-term physiological changes influencing human health and ageing.

    International journal of epidemiology 2013

  • Quantifying single nucleotide variant detection sensitivity in exome sequencing.

    Meynert AM, Bicknell LS, Hurles ME, Jackson AP and Taylor MS

    MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK. alison.meynert@igmm.ed.ac.uk.

    Background: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study it is often essential to both detect polymorphisms and to understand where and with what probability real polymorphisms may have been missed. Results: Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single nucleotide resolution SNV sensitivity estimates which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give "power estimates" for an entire study. We estimate a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict 5-15% of heterozygous and 1-4% of homozygous SNVs in the targeted regions will be missed. Conclusions: Non-reference alleles in the heterozygote state have a high chance of being missed when commonly applied read coverage thresholds are used despite the widely held assumption that there is good polymorphism detection at these coverage levels. Such alleles are likely to be of functional importance in population based studies of rare diseases, somatic mutations in cancer and explaining the "missing heritability" of quantitative traits.

    BMC bioinformatics 2013;14;195
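
    The "naive expectation of sampling from a binomial" mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' calibrated model: it assumes error-free reads, an even 50/50 chance of sampling either allele at a heterozygous site, and a hypothetical calling rule (`min_reads`) that requires each allele to be seen in at least two reads. The function names are illustrative only.

```python
from math import comb


def het_detection_prob(depth: int, min_reads: int = 2, alt_frac: float = 0.5) -> float:
    """Binomial probability that BOTH alleles of a heterozygous site are
    each observed in at least `min_reads` reads at the given read depth.

    `min_reads` is a hypothetical calling threshold, not a value from the paper.
    """
    total = 0.0
    # k = number of reads carrying the non-reference allele
    for k in range(min_reads, depth - min_reads + 1):
        total += comb(depth, k) * alt_frac**k * (1 - alt_frac) ** (depth - k)
    return total


def min_depth_for(target: float, min_reads: int = 2) -> int:
    """Smallest read depth at which the naive detection probability reaches `target`."""
    depth = 2 * min_reads
    while het_detection_prob(depth, min_reads) < target:
        depth += 1
    return depth


if __name__ == "__main__":
    for d in (4, 8, 13, 20):
        print(d, round(het_detection_prob(d), 4))
    print("naive depth for 95%:", min_depth_for(0.95))
```

    Under these assumed settings the naive calculation reaches 95% at a lower depth than the 13X the authors report empirically, which is in line with their statement that measured sensitivity is worse than the binomial expectation. The exact figure depends entirely on the assumed threshold.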

  • Empirical research on the ethics of genomic research.

    Middleton A, Parker M, Wright CF, Bragin E, Hurles ME and DDD Study

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. am33@sanger.ac.uk

    Funded by: Wellcome Trust

    American journal of medical genetics. Part A 2013;161A;8;2099-101

  • Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia.

    Miotto O, Almagro-Garcia J, Manske M, Macinnis B, Campino S, Rockett KA, Amaratunga C, Lim P, Suon S, Sreng S, Anderson JM, Duong S, Nguon C, Chuor CM, Saunders D, Se Y, Lon C, Fukuda MM, Amenga-Etego L, Hodgson AV, Asoala V, Imwong M, Takala-Harrison S, Nosten F, Su XZ, Ringwald P, Ariey F, Dolecek C, Hien TT, Boni MF, Thai CQ, Amambua-Ngwa A, Conway DJ, Djimdé AA, Doumbo OK, Zongo I, Ouedraogo JB, Alcock D, Drury E, Auburn S, Koch O, Sanders M, Hubbart C, Maslen G, Ruano-Rubio V, Jyothi D, Miles A, O'Brien J, Gamble C, Oyola SO, Rayner JC, Newbold CI, Berriman M, Spencer CC, McVean G, Day NP, White NJ, Bethell D, Dondorp AM, Plowe CV, Fairhurst RM and Kwiatkowski DP

    Medical Research Council (MRC) Centre for Genomics and Global Health, University of Oxford, Oxford, UK.

    We describe an analysis of genome variation in 825 P. falciparum samples from Asia and Africa that identifies an unusual pattern of parasite population structure at the epicenter of artemisinin resistance in western Cambodia. Within this relatively small geographic area, we have discovered several distinct but apparently sympatric parasite subpopulations with extremely high levels of genetic differentiation. Of particular interest are three subpopulations, all associated with clinical resistance to artemisinin, which have skewed allele frequency spectra and high levels of haplotype homozygosity, indicative of founder effects and recent population expansion. We provide a catalog of SNPs that show high levels of differentiation in the artemisinin-resistant subpopulations, including codon variants in transporter proteins and DNA mismatch repair proteins. These data provide a population-level genetic framework for investigating the biological origins of artemisinin resistance and for defining molecular markers to assist in its elimination.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9, MC_U190081987; Wellcome Trust: 082370, 089275, 089276, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 098051, G0600718

    Nature genetics 2013;45;6;648-55

  • The challenge of increasing Pfam coverage of the human proteome.

    Mistry J, Coggill P, Eberhardt RY, Deiana A, Giansanti A, Finn RD, Bateman A and Punta M

    EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of proteins in the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, human residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for ∼38% of the human protein residues, there was no information in Pfam about conservation and evolutionary relationship with other protein regions. This uncovered portion of the human proteome was found to be distributed over almost 25 000 distinct protein regions. Comparison with proteins in the UniProtKB database suggested that the human regions that exhibited similarity to thousands of other sequences were often either divergent elements or N- or C-terminal extensions of existing families. Thirty-four per cent of regions, on the other hand, matched fewer than 100 sequences in UniProtKB. Most of these did not appear to share any relationship with existing Pfam-A families, suggesting that thousands of new families would need to be generated to cover them. Also, these latter regions were particularly rich in amino acid compositional bias such as the one associated with intrinsic disorder. This could represent a significant obstacle toward their inclusion into new Pfam families. Based on these observations, a major focus for increasing Pfam coverage of the human proteome will be to improve the definition of existing families. New families will also be built, prioritizing those that have been experimentally functionally characterized. Database URL: http://pfam.sanger.ac.uk/

    Database : the journal of biological databases and curation 2013;2013;bat023

  • Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions.

    Mistry J, Finn RD, Eddy SR, Bateman A and Punta M

    EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK and HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA 20147, USA.

    Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13 000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.

    Nucleic acids research 2013

  • Nuclear Wave1 is required for reprogramming transcription in oocytes and for normal development.

    Miyamoto K, Teperek M, Yusa K, Allen GE, Bradshaw CR and Gurdon JB

    Wellcome Trust/Cancer Research UK Gurdon Institute, The Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK. k.miyamoto@gurdon.cam.ac.uk

    Eggs and oocytes have a remarkable ability to induce transcription of sperm after normal fertilization and in somatic nuclei after somatic cell nuclear transfer. This ability of eggs and oocytes is essential for normal development. Nuclear actin and actin-binding proteins have been shown to contribute to transcription, although their mode of action is elusive. Here, we find that Xenopus Wave1, previously characterized as a protein involved in actin cytoskeleton organization, is present in the oocyte nucleus and is required for efficient transcriptional reprogramming. Moreover, Wave1 knockdown in embryos results in abnormal development and defective hox gene activation. Nuclear Wave1 binds by its WHD domain to active transcription components, and this binding contributes to the action of RNA polymerase II. We identify Wave1 as a maternal reprogramming factor that also has a necessary role in gene activation in development.

    Funded by: Medical Research Council: G1001690/1; Wellcome Trust: 101050/Z/13/Z, WT077187, WT089613

    Science (New York, N.Y.) 2013;341;6149;1002-5

  • Deciphering the Mechanisms of Developmental Disorders (DMDD): a new programme for phenotyping embryonic lethal mice.

    Mohun T, Adams DJ, Baldock R, Bhattacharya S, Copp AJ, Hemberger M, Houart C, Hurles ME, Robertson E, Smith JC, Weaver T and Weninger W

    MRC National Institute for Medical Research, London, NW7 1AA, UK.

    International efforts to test gene function in the mouse by the systematic knockout of each gene are creating many lines in which embryonic development is compromised. These homozygous lethal mutants represent a potential treasure trove for the biomedical community. Developmental biologists could exploit them in their studies of tissue differentiation and organogenesis; for clinical researchers they offer a powerful resource for investigating the origins of developmental diseases that affect newborns. Here, we outline a new programme of research in the UK aiming to kick-start research with embryonic lethal mouse lines. The 'Deciphering the Mechanisms of Developmental Disorders' (DMDD) programme has the ambitious goal of identifying all embryonic lethal knockout lines made in the UK over the next 5 years, and will use a combination of comprehensive imaging and transcriptomics to identify abnormalities in embryo structure and development. All data will be made freely available, enabling individual researchers to identify lines relevant to their research. The DMDD programme will coordinate its work with similar international efforts through the umbrella of the International Mouse Phenotyping Consortium [see accompanying Special Article (Adams et al., 2013)] and, together, these programmes will provide a novel database for embryonic development, linking gene identity with molecular profiles and morphology phenotypes.

    Disease models & mechanisms 2013;6;3;562-6

  • MiR-210 is induced by Oct-2, regulates B cells, and inhibits autoantibody production.

    Mok Y, Schwierzeck V, Thomas DC, Vigorito E, Rayner TF, Jarvis LB, Prosser HM, Bradley A, Withers DR, Mårtensson IL, Corcoran LM, Blenkiron C, Miska EA, Lyons PA and Smith KG

    Department of Medicine, Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge CB2 0XY, United Kingdom.

    MicroRNAs (MiRs) are small, noncoding RNAs that regulate gene expression posttranscriptionally. In this study, we show that MiR-210 is induced by Oct-2, a key transcriptional mediator of B cell activation. Germline deletion of MiR-210 results in the development of autoantibodies from 5 mo of age. Overexpression of MiR-210 in vivo resulted in cell autonomous expansion of the B1 lineage and impaired fitness of B2 cells. Mice overexpressing MiR-210 exhibited impaired class-switched Ab responses, a finding confirmed in wild-type B cells transfected with a MiR-210 mimic. In vitro studies demonstrated defects in cellular proliferation and cell cycle entry, which were consistent with the transcriptomic analysis demonstrating downregulation of genes involved in cellular proliferation and B cell activation. These findings indicate that Oct-2 induction of MiR-210 provides a novel inhibitory mechanism for the control of B cells and autoantibody production.

    Funded by: Biotechnology and Biological Sciences Research Council; Department of Health: 1693; Medical Research Council; Wellcome Trust: 06753AIA, 100140, WT098051

    Journal of immunology (Baltimore, Md. : 1950) 2013;191;6;3037-48

  • The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes.

    Montgomery SB, Goode DL, Kvikstad E, Albers CA, Zhang ZD, Mu XJ, Ananda G, Howie B, Karczewski KJ, Smith KS, Anaya V, Richardson R, Davis J, 1000 Genomes Project Consortium, Macarthur DG, Sidow A, Duret L, Gerstein M, Makova KD, Marchini J, McVean G and Lunter G

    Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211, Switzerland.

    Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.

    Genome research 2013;23;5;749-61

  • Implementing a successful data-management framework: the UK10K managed access model.

    Muddyman D, Smee C, Griffin H, Kaye J and the UK10K Project

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. dm11@sanger.ac.uk.

    This paper outlines the history behind open access principles and describes the development of a managed access data-sharing process for the UK10K Project, currently Britain's largest genomic sequencing consortium (2010 to 2013). Funded by the Wellcome Trust, the purpose of UK10K was two-fold: to investigate how low-frequency and rare genetic variants contribute to human disease, and to provide an enduring data resource for future research into human genetics. In this paper, we discuss the challenge of reconciling data-sharing principles with the practicalities of delivering a sequencing project of UK10K's scope and magnitude. We describe the development of a sustainable, easy-to-use managed access system that allowed rapid access to UK10K data, while protecting the interests of participants and data generators alike. Specifically, we focus in depth on the three key issues that emerge in the data pipeline: study recruitment, data release and data access.

    Genome medicine 2013;5;11;100

  • Functional transcriptomics in the post-ENCODE era.

    Mudge JM, Frankish A and Harrow J

    Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.

    Genome research 2013

  • Independent specialization of the human and mouse X chromosomes for the male germ line.

    Mueller JL, Skaletsky H, Brown LG, Zaghlul S, Rock S, Graves T, Auger K, Warren WC, Wilson RK and Page DC

    Whitehead Institute, Cambridge, Massachusetts, USA.

    We compared the human and mouse X chromosomes to systematically test Ohno's law, which states that the gene content of X chromosomes is conserved across placental mammals. First, we improved the accuracy of the human X-chromosome reference sequence through single-haplotype sequencing of ampliconic regions. The new sequence closed gaps in the reference sequence, corrected previously misassembled regions and identified new palindromic amplicons. Our subsequent analysis led us to conclude that the evolution of human and mouse X chromosomes was bimodal. In accord with Ohno's law, 94-95% of X-linked single-copy genes are shared by humans and mice; most are expressed in both sexes. Notably, most X-ampliconic genes are exceptions to Ohno's law: only 31% of human and 22% of mouse X-ampliconic genes had orthologs in the other species. X-ampliconic genes are expressed predominantly in testicular germ cells, and many were independently acquired since divergence from the common ancestor of humans and mice, specializing portions of their X chromosomes for sperm production.

    Nature genetics 2013

  • A powerful molecular synergy between mutant Nucleophosmin and Flt3-ITD drives acute myeloid leukemia in mice.

    Mupo A, Celani L, Dovey O, Cooper JL, Grove C, Rad R, Sportoletti P, Falini B, Bradley A and Vassiliou GS

    Funded by: Wellcome Trust: 095663

    Leukemia 2013;27;9;1917-20

  • Evolution of equine influenza virus in vaccinated horses.

    Murcia PR, Baillie GJ, Stack JC, Jervis C, Elton D, Mumford JA, Daly J, Kellam P, Grenfell BT, Holmes EC and Wood JL

    Medical Research Council-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

    Influenza A viruses are characterized by their ability to evade host immunity, even in vaccinated individuals. To determine how prior immunity shapes viral diversity in vivo, we studied the intra- and interhost evolution of equine influenza virus in vaccinated horses. Although the level and structure of genetic diversity were similar to those in naïve horses, intrahost bottlenecks may be more stringent in vaccinated animals, and mutations shared among horses often fall close to putative antigenic sites.

    Funded by: Medical Research Council: G0801822; NIGMS NIH HHS: R01 GM080533, R01 GM080533-06; Wellcome Trust

    Journal of virology 2013;87;8;4768-71

  • Sociodemographic distribution of non-communicable disease risk factors in rural Uganda: a cross-sectional study.

    Murphy GA, Asiki G, Ekoru K, Nsubuga RN, Nakiyingi-Miiro J, Young EH, Seeley J, Sandhu MS and Kamali A

    Department of Public Health & Primary Care, University of Cambridge, Cambridge, UK, Wellcome Trust Sanger Institute, Hinxton, UK, Medical Research Council/Uganda Virus Research Institute (MRC/UVRI), Uganda Research Unit on AIDS, Entebbe, Uganda, London School of Hygiene and Tropical Medicine, London, UK and School of International Development, University of East Anglia, Norwich, UK.

    Background: Non-communicable diseases (NCDs) are rapidly becoming leading causes of morbidity and mortality in low- and middle-income countries, including those in sub-Saharan Africa. In contrast to high-income countries, the sociodemographic distribution, including socioeconomic inequalities, of NCDs and their risk factors is unclear in sub-Saharan Africa, particularly among rural populations.

    Methods: We undertook a cross-sectional population-based survey of 7809 residents aged 13 years or older in the General Population Cohort in south-western rural Uganda. Information on behavioural, physiological and biochemical risk factors was obtained using standardized methods as recommended by the WHO STEPwise Approach to Surveillance. Socioeconomic status (SES) was determined by principal component analysis including household features, ownership, and occupation and education of the head of household.

    Results: SES was found to be associated with NCD risk factors in this rural population. Smoking, alcohol consumption (men only) and low high-density lipoprotein (HDL) cholesterol were more common among those of lower SES. For example, the prevalence of smoking decreased 4-fold from the lowest to the highest SES groups, from 22.0% to 5.7% for men and 2.2% to 0.4% for women, respectively. In contrast, overweight, raised blood pressure, raised HbA1c (women only) and raised cholesterol were more common among those of higher SES. For example, the prevalence of overweight increased 5-fold from 2.1% to 10.1% for men, and 2-fold from 12.0% to 23.4% for women, from the lowest to highest SES groups respectively. However, neither low physical activity nor fruit, vegetable or staples consumption was associated with SES. Furthermore, associations between NCD risk factors and SES were modified by age and sex.

    Conclusions: Within this rural population, NCD risk factors are common and vary both inversely and positively across the SES gradient. A better understanding of the determinants of the sociodemographic distribution of NCDs and their risk factors in rural sub-Saharan African populations will help identify populations at most risk of developing NCDs and help plan interventions to reduce their burden.

    Funded by: Medical Research Council: G0801566, G0901213, G0901213-92157, MC_U950080926

    International journal of epidemiology 2013;42;6;1740-53

  • Cardiometabolic risk in a rural Ugandan population.

    Murphy GA, Asiki G, Young EH, Seeley J, Nsubuga RN, Sandhu MS and Kamali A

    Corresponding author: Georgina A.V. Murphy, gm7@sanger.ac.uk.

    Diabetes care 2013;36;9;e143

  • Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2.

    Nangalia J, Massie CE, Baxter EJ, Nice FL, Gundem G, Wedge DC, Avezov E, Li J, Kollmann K, Kent DG, Aziz A, Godfrey AL, Hinton J, Martincorena I, Van Loo P, Jones AV, Guglielmelli P, Tarpey P, Harding HP, Fitzpatrick JD, Goudie CT, Ortmann CA, Loughran SJ, Raine K, Jones DR, Butler AP, Teague JW, O'Meara S, McLaren S, Bianchi M, Silber Y, Dimitropoulou D, Bloxham D, Mudie L, Maddison M, Robinson B, Keohane C, Maclean C, Hill K, Orchard K, Tauro S, Du MQ, Greaves M, Bowen D, Huntly BJ, Harrison CN, Cross NC, Ron D, Vannucchi AM, Papaemmanuil E, Campbell PJ and Green AR

    The authors' full names, degrees, and affiliations are listed in the Appendix.

    Background: Somatic mutations in the Janus kinase 2 gene (JAK2) occur in many myeloproliferative neoplasms, but the molecular pathogenesis of myeloproliferative neoplasms with nonmutated JAK2 is obscure, and the diagnosis of these neoplasms remains a challenge.

    Methods: We performed exome sequencing of samples obtained from 151 patients with myeloproliferative neoplasms. The mutation status of the gene encoding calreticulin (CALR) was assessed in an additional 1345 hematologic cancers, 1517 other cancers, and 550 controls. We established phylogenetic trees using hematopoietic colonies. We assessed calreticulin subcellular localization using immunofluorescence and flow cytometry.

    Results: Exome sequencing identified 1498 mutations in 151 patients, with medians of 6.5, 6.5, and 13.0 mutations per patient in samples of polycythemia vera, essential thrombocythemia, and myelofibrosis, respectively. Somatic CALR mutations were found in 70 to 84% of samples of myeloproliferative neoplasms with nonmutated JAK2, in 8% of myelodysplasia samples, in occasional samples of other myeloid cancers, and in none of the other cancers. A total of 148 CALR mutations were identified with 19 distinct variants. Mutations were located in exon 9 and generated a +1 base-pair frameshift, which would result in a mutant protein with a novel C-terminal. Mutant calreticulin was observed in the endoplasmic reticulum without increased cell-surface or Golgi accumulation. Patients with myeloproliferative neoplasms carrying CALR mutations presented with higher platelet counts and lower hemoglobin levels than patients with mutated JAK2. Mutation of CALR was detected in hematopoietic stem and progenitor cells. Clonal analyses showed CALR mutations in the earliest phylogenetic node, a finding consistent with its role as an initiating mutation in some patients.

    Conclusions: Somatic mutations in the endoplasmic reticulum chaperone CALR were found in a majority of patients with myeloproliferative neoplasms with nonmutated JAK2. (Funded by the Kay Kendall Leukaemia Fund and others.)

    Funded by: Canadian Institutes of Health Research; Cancer Research UK; Wellcome Trust: 079249, 084812, 093867, 100140

    The New England journal of medicine 2013;369;25;2391-405

  • Special focus: bioinformatics.

    Nawrocki EP and Burge SW

    Eddy Lab; HHMI Janelia Farm Research Campus; Ashburn, VA USA.

    RNA biology 2013;10;7;1160

  • Meta-analysis investigating associations between healthy diet and fasting glucose and insulin levels and modification by loci associated with glucose homeostasis in data from 15 cohorts.

    Nettleton JA, Hivert MF, Lemaitre RN, McKeown NM, Mozaffarian D, Tanaka T, Wojczynski MK, Hruby A, Djoussé L, Ngwa JS, Follis JL, Dimitriou M, Ganna A, Houston DK, Kanoni S, Mikkilä V, Manichaikul A, Ntalla I, Renström F, Sonestedt E, van Rooij FJ, Bandinelli S, de Koning L, Ericson U, Hassanali N, Kiefte-de Jong JC, Lohman KK, Raitakari O, Papoutsakis C, Sjogren P, Stirrups K, Ax E, Deloukas P, Groves CJ, Jacques PF, Johansson I, Liu Y, McCarthy MI, North K, Viikari J, Zillikens MC, Dupuis J, Hofman A, Kolovou G, Mukamal K, Prokopenko I, Rolandsson O, Seppälä I, Cupples LA, Hu FB, Kähönen M, Uitterlinden AG, Borecki IB, Ferrucci L, Jacobs DR, Kritchevsky SB, Orho-Melander M, Pankow JS, Lehtimäki T, Witteman JC, Ingelsson E, Siscovick DS, Dedoussis G, Meigs JB and Franks PW

    Whether loci that influence fasting glucose (FG) and fasting insulin (FI) levels, as identified by genome-wide association studies, modify associations of diet with FG or FI is unknown. We utilized data from 15 US and European cohort studies comprising 51,289 persons without diabetes to test whether genotype and diet interact to influence FG or FI concentration. We constructed a diet score using study-specific quartile rankings for intakes of whole grains, fish, fruits, vegetables, and nuts/seeds (favorable) and red/processed meats, sweets, sugared beverages, and fried potatoes (unfavorable). We used linear regression within studies, followed by inverse-variance-weighted meta-analysis, to quantify 1) associations of diet score with FG and FI levels and 2) interactions of diet score with 16 FG-associated loci and 2 FI-associated loci. Diet score (per unit increase) was inversely associated with FG (β = -0.004 mmol/L, 95% confidence interval: -0.005, -0.003) and FI (β = -0.008 ln-pmol/L, 95% confidence interval: -0.009, -0.007) levels after adjustment for demographic factors, lifestyle, and body mass index. Genotype variation at the studied loci did not modify these associations. Healthier diets were associated with lower FG and FI concentrations regardless of genotype at previously replicated FG- and FI-associated loci. Studies focusing on genomic regions that do not yield highly statistically significant associations from main-effect genome-wide association studies may be more fruitful in identifying diet-gene interactions.

    American journal of epidemiology 2013;177;2;103-15
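
    The pooling step named in the abstract above, inverse-variance-weighted meta-analysis of per-study regression estimates, can be sketched as follows. This is a minimal fixed-effect version; the abstract does not say whether a fixed- or random-effects model was used, so that choice is an assumption, as are the function name and the 1.96 normal quantile for the 95% interval. Any inputs passed to it here would be hypothetical, not data from the study.

```python
from math import sqrt


def inverse_variance_meta(betas, std_errors):
    """Fixed-effect inverse-variance-weighted pooling of per-study estimates.

    betas:      per-study effect estimates (e.g. regression coefficients)
    std_errors: their standard errors, in the same order
    Returns (pooled_beta, pooled_se, (ci_low, ci_high)).
    """
    if len(betas) != len(std_errors) or not betas:
        raise ValueError("betas and std_errors must be non-empty and equal length")
    # Each study is weighted by the inverse of its variance
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    # Approximate 95% confidence interval under a normal assumption
    ci = (pooled_beta - 1.96 * pooled_se, pooled_beta + 1.96 * pooled_se)
    return pooled_beta, pooled_se, ci
```

    More precise studies (smaller standard errors) pull the pooled estimate toward their own value, and the pooled standard error is always smaller than that of any single contributing study.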

  • The relative timing of mutations in a breast cancer genome.

    Newman S, Howarth KD, Greenman CD, Bignell GR, Tavaré S and Edwards PA

    Hutchison/MRC Research Centre and Department of Pathology, University of Cambridge, Cambridge, United Kingdom.

    Many tumors have highly rearranged genomes, but a major unknown is the relative importance and timing of genome rearrangements compared to sequence-level mutation. Chromosome instability might arise early, be a late event contributing little to cancer development, or happen as a single catastrophic event. Another unknown is which of the point mutations and rearrangements are selected. To address these questions we show, using the breast cancer cell line HCC1187 as a model, that we can reconstruct the likely history of a breast cancer genome. We assembled probably the most complete map to date of a cancer genome, by combining molecular cytogenetic analysis with sequence data. In particular, we assigned most sequence-level mutations to individual chromosomes by sequencing of flow sorted chromosomes. The parent of origin of each chromosome was assigned from SNP arrays. We were then able to classify most of the mutations as earlier or later according to whether they occurred before or after a landmark event in the evolution of the genome, endoreduplication (duplication of its entire genome). Genome rearrangements and sequence-level mutations were fairly evenly divided earlier and later, suggesting that genetic instability was relatively constant throughout the life of this tumor, and chromosome instability was not a late event. Mutations that caused chromosome instability would be in the earlier set. Strikingly, the great majority of inactivating mutations and in-frame gene fusions happened earlier. The non-random timing of some of the mutations may be evidence that they were selected.

    PloS one 2013;8;6;e64991

  • Comparative genomics in Chlamydomonas and Plasmodium identifies an ancient nuclear envelope protein family essential for sexual reproduction in protists, fungi, plants, and vertebrates.

    Ning J, Otto TD, Pfander C, Schwach F, Brochet M, Bushell E, Goulding D, Sanders M, Lefebvre PA, Pei J, Grishin NV, Vanderlaan G, Billker O and Snell WJ

    Department of Cell Biology, University of Texas Southwestern Medical School, Dallas, Texas 75390, USA.

    Fertilization is a crucial yet poorly characterized event in eukaryotes. Our previous discovery that the broadly conserved protein HAP2 (GCS1) functioned in gamete membrane fusion in the unicellular green alga Chlamydomonas and the malaria pathogen Plasmodium led us to exploit the rare biological phenomenon of isogamy in Chlamydomonas in a comparative transcriptomics strategy to uncover additional conserved sexual reproduction genes. All previously identified Chlamydomonas fertilization-essential genes fell into related clusters based on their expression patterns. Out of several conserved genes in a minus gamete cluster, we focused on Cre06.g280600, an ortholog of the fertilization-related Arabidopsis GEX1. Gene disruption, cell biological, and immunolocalization studies show that CrGEX1 functions in nuclear fusion in Chlamydomonas. Moreover, CrGEX1 and its Plasmodium ortholog, PBANKA_113980, are essential for production of viable meiotic progeny in both organisms and thus for mosquito transmission of malaria. Remarkably, we discovered that the genes are members of a large, previously unrecognized family whose first-characterized member, KAR5, is essential for nuclear fusion during yeast sexual reproduction. Our comparative transcriptomics approach provides a new resource for studying sexual development and demonstrates that exploiting the data can lead to the discovery of novel biology that is conserved across distant taxa.

    Genes & development 2013;27;10;1198-215

  • Binding of nucleoid-associated protein fis to DNA is regulated by DNA breathing dynamics.

    Nowak-Lovato K, Alexandrov LB, Banisadr A, Bauer AL, Bishop AR, Usheva A, Mu F, Hong-Geller E, Rasmussen KØ, Hlavacek WS and Alexandrov BS

    Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.

    Physicochemical properties of DNA, such as shape, affect protein-DNA recognition. However, the properties of DNA that are most relevant for predicting the binding sites of particular transcription factors (TFs) or classes of TFs have yet to be fully understood. Here, using a model that accurately captures the melting behavior and breathing dynamics (spontaneous local openings of the double helix) of double-stranded DNA, we simulated the dynamics of known binding sites of the TF and nucleoid-associated protein Fis in Escherichia coli. Our study involves simulations of breathing dynamics, analysis of large published in vitro and genomic datasets, and targeted experimental tests of our predictions. Our simulation results and available in vitro binding data indicate a strong correlation between DNA breathing dynamics and Fis binding. Indeed, we can define an average DNA breathing profile that is characteristic of Fis binding sites. This profile is significantly enriched among the identified in vivo E. coli Fis binding sites. To test our understanding of how Fis binding is influenced by DNA breathing dynamics, we designed base-pair substitutions, mismatch, and methylation modifications of DNA regions that are known to interact (or not interact) with Fis. The goal in each case was to make the local DNA breathing dynamics either closer to or farther from the breathing profile characteristic of a strong Fis binding site. For the modified DNA segments, we found that Fis-DNA binding, as assessed by gel-shift assay, changed in accordance with our expectations. We conclude that Fis binding is associated with DNA breathing dynamics, which in turn may be regulated by various nucleotide modifications.

    PLoS computational biology 2013;9;1;e1002881

  • Chlamydia trachomatis clinical isolates identified as tetracycline resistant do not exhibit resistance in vitro: whole-genome sequencing reveals a mutation in porB but no evidence for tetracycline resistance genes.

    O'Neill CE, Seth-Smith HM, Van Der Pol B, Harris SR, Thomson NR, Cutcliffe LT and Clarke IN

    Faculty of Medicine, CES Academic Unit, University of Southampton, Southampton General Hospital, Tremona Road, Southampton, UK. c.o'neill@soton.ac.uk

    Chlamydia trachomatis is the most common bacterial sexually transmitted infection worldwide and the leading cause of preventable blindness in developing countries. Tetracycline is commonly the drug of choice for treating C. trachomatis infections, but cases of antibiotic resistance in clinical isolates have previously been reported. Here, we used antibiotic resistance assays and whole-genome sequencing to interrogate the hypothesis that two clinical isolates (IU824 and IU888) have acquired mechanisms of antibiotic resistance. Immunofluorescence staining was used to identify C. trachomatis inclusions in cell cultures grown in the presence of tetracycline; however, only antibiotic-free control cultures yielded the strong fluorescence associated with the presence of chlamydial inclusions. Infectivity was lost upon passage of harvested cultures grown in the presence of tetracycline into antibiotic-free medium, so we conclude that these isolates were phenotypically sensitive to tetracycline. Comparisons of the genome and plasmid sequences for the two isolates with tetracycline-sensitive strains did not identify regions of low sequence identity that could accommodate horizontally acquired resistance genes, and the tetracycline binding region of the 16S rRNA gene was identical to that of the sensitive control strains. The porB gene of strain IU824, however, was found to contain a premature stop codon not previously identified, which is noteworthy but unlikely to be related to tetracycline resistance. In conclusion, we found no evidence of tetracycline resistance in the two strains investigated, and it seems most likely that the small, aberrant inclusions previously identified resulted from the high chlamydial load used in the original antibiotic resistance assays.

    Microbiology (Reading, England) 2013;159;Pt 4;748-56

  • Meta-analysis of genome-wide association studies identifies six new Loci for serum calcium concentrations.

    O'Seaghdha CM, Wu H, Yang Q, Kapur K, Guessous I, Zuber AM, Köttgen A, Stoudmann C, Teumer A, Kutalik Z, Mangino M, Dehghan A, Zhang W, Eiriksdottir G, Li G, Tanaka T, Portas L, Lopez LM, Hayward C, Lohman K, Matsuda K, Padmanabhan S, Firsov D, Sorice R, Ulivi S, Brockhaus AC, Kleber ME, Mahajan A, Ernst FD, Gudnason V, Launer LJ, Mace A, Boerwinckle E, Arking DE, Tanikawa C, Nakamura Y, Brown MJ, Gaspoz JM, Theler JM, Siscovick DS, Psaty BM, Bergmann S, Vollenweider P, Vitart V, Wright AF, Zemunik T, Boban M, Kolcic I, Navarro P, Brown EM, Estrada K, Ding J, Harris TB, Bandinelli S, Hernandez D, Singleton AB, Girotto G, Ruggiero D, d'Adamo AP, Robino A, Meitinger T, Meisinger C, Davies G, Starr JM, Chambers JC, Boehm BO, Winkelmann BR, Huang J, Murgia F, Wild SH, Campbell H, Morris AP, Franco OH, Hofman A, Uitterlinden AG, Rivadeneira F, Völker U, Hannemann A, Biffar R, Hoffmann W, Shin SY, Lescuyer P, Henry H, Schurmann C, SUNLIGHT consortium, GEFOS consortium, Munroe PB, Gasparini P, Pirastu N, Ciullo M, Gieger C, März W, Lind L, Spector TD, Smith AV, Rudan I, Wilson JF, Polasek O, Deary IJ, Pirastu M, Ferrucci L, Liu Y, Kestenbaum B, Kooner JS, Witteman JC, Nauck M, Kao WH, Wallaschofski H, Bonny O, Fox CS and Bochud M

    National Heart, Lung, and Blood Institute's Framingham Heart Study and Center for Population Studies, Framingham, Massachusetts, United States of America ; Renal Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America.

    Calcium is vital to the normal functioning of multiple organ systems and its serum concentration is tightly regulated. Apart from CASR, the genes associated with serum calcium are largely unknown. We conducted a genome-wide association meta-analysis of 39,400 individuals from 17 population-based cohorts and investigated the 14 most strongly associated loci in ≤21,679 additional individuals. Seven loci (six new regions) in association with serum calcium were identified and replicated. Rs1570669 near CYP24A1 (P = 9.1E-12), rs10491003 upstream of GATA3 (P = 4.8E-09) and rs7481584 in CARS (P = 1.2E-10) implicate regions involved in Mendelian calcemic disorders. Rs1550532 in DGKD (P = 8.2E-11), also associated with bone density, and rs7336933 near DGKH/KIAA0564 (P = 9.1E-10) are near genes that encode distinct isoforms of diacylglycerol kinase. Rs780094 is in GCKR. We characterized the expression of these genes in gut, kidney, and bone, and demonstrate modulation of gene expression in bone in response to dietary calcium in mice. Our results shed new light on the genetics of calcium homeostasis.

    PLoS genetics 2013;9;9;e1003796

  • Mutations in BICD2 Cause Dominant Congenital Spinal Muscular Atrophy and Hereditary Spastic Paraplegia.

    Oates EC, Rossor AM, Hafezparast M, Gonzalez M, Speziani F, Macarthur DG, Lek M, Cottenie E, Scoto M, Foley AR, Hurles M, Houlden H, Greensmith L, Auer-Grumbach M, Pieber TR, Strom TM, Schule R, Herrmann DN, Sowden JE, Acsadi G, Menezes MP, Clarke NF, Züchner S, UK10K, Muntoni F, North KN and Reilly MM

    Institute for Neuroscience and Muscle Research, Children's Hospital at Westmead, Westmead, Sydney, NSW 2145, Australia; Discipline of Paediatrics and Child Health, Faculty of Medicine, The University of Sydney, Sydney, NSW 2006, Australia.

    Dominant congenital spinal muscular atrophy (DCSMA) is a disorder of developing anterior horn cells and shows lower-limb predominance and clinical overlap with hereditary spastic paraplegia (HSP), a lower-limb-predominant disorder of corticospinal motor neurons. We have identified four mutations in bicaudal D homolog 2 (Drosophila) (BICD2) in six kindreds affected by DCSMA, DCSMA with upper motor neuron features, or HSP. BICD2 encodes BICD2, a key adaptor protein that interacts with the dynein-dynactin motor complex, which facilitates trafficking of cellular cargos that are critical to motor neuron development and maintenance. We demonstrate that mutations resulting in amino acid substitutions in two binding regions of BICD2 increase its binding affinity for the cytoplasmic dynein-dynactin complex, which might result in the perturbation of BICD2-dynein-dynactin-mediated trafficking, and impair neurite outgrowth. These findings provide insight into the mechanism underlying both the static and the slowly progressive clinical features and the motor neuron pathology that characterize BICD2-associated diseases, and underscore the importance of the dynein-dynactin transport pathway in the development and survival of both lower and upper motor neurons.

    American journal of human genetics 2013

  • Getting ready for the human phenome project: the 2012 forum of the human variome project.

    Oetting WS, Robinson PN, Greenblatt MS, Cotton RG, Beck T, Carey JC, Doelken SC, Girdea M, Groza T, Hamilton CM, Hamosh A, Kerner B, Macarthur JA, Maglott DR, Mons B, Rehm HL, Schofield PN, Searle BA, Smedley D, Smith CL, Bernstein IT, Zankl A and Zhao EY

    Department of Experimental and Clinical Pharmacology, College of Pharmacy, University of Minnesota, Minneapolis, Minnesota.

    A forum of the Human Variome Project (HVP) was held as a satellite to the 2012 Annual Meeting of the American Society of Human Genetics in San Francisco, California. The theme of this meeting was "Getting Ready for the Human Phenome Project." Understanding the genetic contribution to both rare single-gene "Mendelian" disorders and more complex common diseases will require integration of research efforts among many fields and better defined phenotypes. The HVP is dedicated to bringing together researchers and research populations throughout the world to provide the resources to investigate the impact of genetic variation on disease. To this end, there needs to be a greater sharing of phenotype and genotype data. For this to occur, many databases that currently exist will need to become interoperable to allow for the combining of cohorts with similar phenotypes to increase statistical power for studies attempting to identify novel disease genes or causative genetic variants. Improved systems and tools that enhance the collection of phenotype data from clinicians are urgently needed. This meeting begins the HVP's effort toward this important goal.

    Human mutation 2013;34;4;661-6

  • Cereal killers.

    Otto TD and Reid AJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2013;11;11;744

  • Autologous antibody capture to enrich immunogenic viruses for viral discovery.

    Oude Munnink BB, Jazaeri Farsani SM, Deijs M, Jonkers J, Verhoeven JT, Ieven M, Goossens H, de Jong MD, Berkhout B, Loens K, Kellam P, Bakker M, Canuti M, Cotten M and van der Hoek L

    Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands.

    Discovery of new viruses has been boosted by novel deep sequencing technologies. Currently, many viruses can be identified by sequencing without knowledge of the pathogenicity of the virus. However, attributing the presence of a virus in patient material to a disease in the patient can be a challenge. One approach to meet this challenge is identification of viral sequences based on enrichment by autologous patient antibody capture. This method facilitates identification of viruses that have provoked an immune response within the patient and may increase the sensitivity of the current virus discovery techniques. To demonstrate the utility of this method, virus discovery deep sequencing (VIDISCA-454) was performed on clinical samples from 19 patients: 13 with a known respiratory viral infection and 6 with a known gastrointestinal viral infection. Patient sera were collected from one to several months after the acute infection phase. Input and antibody capture material was sequenced and enrichment was assessed. In 18 of the 19 patients, viral reads from immunogenic viruses were enriched by antibody capture (ranging from 1.5x to 343x in respiratory material, and from 1.4x to 53x in stool). Enriched reads were also determined in an identity-independent manner by using a novel algorithm, Xcompare. In 16 of the 19 patients, 21% to 100% of the enriched reads were derived from infecting viruses. In conclusion, the technique provides a novel approach to specifically identify immunogenic viral sequences among the bulk of sequences which are usually encountered during virus discovery metagenomics.

    PloS one 2013;8;11;e78454

  • Efficient depletion of host DNA contamination in malaria clinical sequencing.

    Oyola SO, Gu Y, Manske M, Otto TD, O'Brien J, Alcock D, Macinnis B, Berriman M, Newbold CI, Kwiatkowski DP, Swerdlow HP and Quail MA

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom. Samuel.oyola@sanger.ac.uk

    The cost of whole-genome sequencing (WGS) is decreasing rapidly as next-generation sequencing technology continues to advance, and the prospect of making WGS available for public health applications is becoming a reality. So far, a number of studies have demonstrated the use of WGS as an epidemiological tool for typing and controlling outbreaks of microbial pathogens. Success of these applications is hugely dependent on efficient generation of clean genetic material that is free from host DNA contamination for rapid preparation of sequencing libraries. The presence of large amounts of host DNA severely affects the efficiency of characterizing pathogens using WGS and is therefore a serious impediment to clinical and epidemiological sequencing for health care and public health applications. We have developed a simple enzymatic treatment method that takes advantage of the methylation of human DNA to selectively deplete host contamination from clinical samples prior to sequencing. Using malaria clinical samples with over 80% human host DNA contamination, we show that the enzymatic treatment enriches Plasmodium falciparum DNA up to ∼9-fold and generates high-quality, nonbiased sequence reads covering >98% of 86,158 catalogued typeable single-nucleotide polymorphism loci.

    Funded by: Medical Research Council: G19/9; Wellcome Trust: 079355/Z/06/Z

    Journal of clinical microbiology 2013;51;3;745-51

  • Diversity among human non-typhoidal salmonellae isolates from Zimbabwe.

    Paglietti B, Falchi G, Mason P, Chitsatso O, Nair S, Gwanzura L, Uzzau S, Cappuccinelli P, Wain J and Rubino S

    Microbiologica, Dipartimento di Scienze Biomediche, Universita di Sassari, Sassari, Sardegna, Italy.

    Background: Non-typhoidal Salmonella infections are an important public health problem in sub-Saharan Africa, especially among children and HIV-seropositive patients in whom they may cause invasive disease.

    Methods: In order to better understand the epidemiology of Salmonella infections in southern Africa we typed, using serotyping, phage typing and multilocus sequence typing, 167 non-typhoidal Salmonella strains isolated from human clinical specimens during 1995-2000.

    Results: The most common serovars were Salmonella Typhimurium DT56/ST313, Salmonella Enteritidis PT4 and Salmonella Isangi ST216. Isolates of Salmonella Isangi showed a multidrug-resistant phenotype that was resistant to extended-spectrum cephalosporins. Twelve new sequence types and six new serotypes of Salmonella were identified.

    Conclusions: Given the diversity detected in the study it seems likely that many new variants of S. enterica are extant in Zimbabwe and by implication across sub-Saharan Africa. We have demonstrated the presence in Zimbabwe of a multidrug-resistant strain of the serovar Salmonella Isangi and demonstrated the diversity of Salmonella circulating in one sub-Saharan African country. Further studies on the characteristics of Salmonella Isangi isolates from Zimbabwe, including plasmid typing and genotyping, are essential if effective control of the spread of this potential pathogen in sub-Saharan Africa is to be achieved.

    Funded by: Medical Research Council: G0600805; Wellcome Trust

    Transactions of the Royal Society of Tropical Medicine and Hygiene 2013;107;8;487-92

  • Probing the brain of comorbidity.

    Palotie A, Kallela M and Anttila V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Inherited sleep disorders might provide insights for migraine pathophysiology (Brennan et al., this issue).

    Science translational medicine 2013;5;183;183fs15

  • Ectopic Expression of Activated Notch or SOX2 Reveals Similar and Unique Roles in the Development of the Sensory Cell Progenitors in the Mammalian Inner Ear.

    Pan W, Jin Y, Chen J, Rottier RJ, Steel KP and Kiernan AE

    Department of Ophthalmology and Department of Biomedical Genetics, University of Rochester, Rochester, New York 14642, Department of Pediatric Surgery, Erasmus MC-Sophia Children's Hospital, 3000 CA Rotterdam, the Netherlands, and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Hearing impairment or vestibular dysfunction in humans often results from a permanent loss of critical cell types in the sensory regions of the inner ear, including hair cells, supporting cells, or cochleovestibular neurons. These important cell types arise from a common sensory or neurosensory progenitor, although little is known about how these progenitors are specified. Studies have shown that Notch signaling and the transcription factor Sox2 are required for the development of these lineages. Previously we and others demonstrated that ectopic activation of Notch can direct nonsensory cells to adopt a sensory fate, indicating a role for Notch in early specification events. Here, we explore the relationship between Notch and SOX2 by ectopically activating these factors in nonsensory regions of the mouse cochlea, and demonstrate that, similar to Notch, SOX2 can specify sensory progenitors, consistent with a role downstream of Notch signaling. However, we also show that Notch has a unique role in promoting the proliferation of the sensory progenitors. We further demonstrate that Notch can only induce ectopic sensory regions within a certain time window of development, and that the ectopic hair cells display specialized stereocilia bundles similar to endogenous hair cells. These results demonstrate that Notch and SOX2 can both drive the sensory program in nonsensory cells, indicating these factors may be useful in cell replacement strategies in the inner ear.

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2013;33;41;16146-16157

  • Tailoring the models of transcription.

    Pance A

    The Wellcome Trust Sanger Institute, Genome Campus Hinxton, Cambridge CB10 1SA, UK. ap9@sanger.ac.uk.

    Molecular biology is a rapidly evolving field that has led to the development of increasingly sophisticated technologies to improve our capacity to study cellular processes in much finer detail. Transcription is the first step in protein expression and the major point of regulation of the components that determine the characteristics, fate and functions of cells. The study of transcriptional regulation has been greatly facilitated by the development of reporter genes and transcription factor expression vectors, which have become versatile tools for manipulating promoters, as well as transcription factors in order to examine their function. The understanding of promoter complexity and transcription factor structure offers an insight into the mechanisms of transcriptional control and their impact on cell behaviour. This review focuses on some of the many applications of molecular cut-and-paste tools for the manipulation of promoters and transcription factors leading to the understanding of crucial aspects of transcriptional regulation.

    International journal of molecular sciences 2013;14;4;7583-97

  • Advances in osteoarthritis genetics.

    Panoutsopoulou K and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Cambridgeshire, UK.

    Osteoarthritis (OA), the most common form of arthritis, is a highly debilitating disease of the joints and can lead to severe pain and disability. There is no cure for OA. Current treatments often fail to alleviate its symptoms leading to an increased demand for joint replacement surgery. Previous epidemiological and genetic research has established that OA is a multifactorial disease with both environmental and genetic components. Over the past 6 years, a candidate gene study and several genome-wide association scans (GWAS) in populations of Asian and European descent have collectively established 15 loci associated with knee or hip OA that have been replicated with genome-wide significance, shedding some light on the aetiogenesis of the disease. All OA associated variants to date are common in frequency and appear to confer moderate to small effect sizes. Some of the associated variants are found within or near genes with clear roles in OA pathogenesis, whereas others point to unsuspected, less characterised pathways. These studies have also provided further evidence in support of the existence of ethnic, sex, and joint specific effects in OA and have highlighted the importance of expanded and more homogeneous phenotype definitions in genetic studies of OA.

    Journal of medical genetics 2013

  • The effect of FTO variation on increased osteoarthritis risk is mediated through body mass index: a mendelian randomisation study.

    Panoutsopoulou K, Metrustry S, Doherty SA, Laslett LL, Maciewicz RA, Hart DJ, Zhang W, Muir KR, Wheeler M, Cooper C, Spector TD, Cicuttini FM, Jones G, Arden NK, Doherty M, Zeggini E, Valdes AM and arcOGEN Consortium

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Objective: Variation in the fat mass and obesity-associated (FTO) gene influences susceptibility to obesity. A variant in the FTO gene has been implicated in genetic risk to osteoarthritis (OA). We examined the role of the FTO polymorphism rs8044769 in risk of knee and hip OA in cases and controls incorporating body mass index (BMI) information.

    Methods: 5409 knee OA patients, 4355 hip OA patients and up to 5362 healthy controls from 7 independent cohorts from the UK and Australia were genotyped for rs8044769. The association of the FTO variant with OA was investigated in case/control analyses with and without BMI adjustment and in analyses matched for BMI category. A mendelian randomisation approach was employed using the FTO variant as the instrumental variable to evaluate the role of overweight on OA.

    Results: In the meta-analysis of all overweight (BMI≥25) samples versus normal-weight controls irrespective of OA status the association of rs8044769 with overweight is highly significant (OR[CIs] for allele G=1.14 [1.08 to 1.19], p=7.5×10⁻⁷). A significant association with knee OA is present in the analysis without BMI adjustment (OR[CIs]=1.08 [1.02 to 1.14], p=0.009) but the signal fully attenuates after BMI adjustment (OR[CIs]=0.99 [0.93 to 1.05], p=0.666). We observe no evidence for association in the BMI-matched meta-analyses. Using mendelian randomisation approaches we confirm the causal role of overweight on OA.

    Conclusions: Our data highlight the contribution of genetic risk to overweight in defining risk to OA but the association is exclusively mediated by the effect on BMI. This is consistent with what is known of the biology of the FTO gene and supports the causative role of high BMI in OA.

    Funded by: Medical Research Council: MC_UP_A620_1014

    Annals of the rheumatic diseases 2013

  • In search of low frequency and rare variants affecting complex traits.

    Panoutsopoulou K, Tachmazidou I and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK.

    The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common-frequency and rare variants. Targeted genotyping arrays and next generation sequencing technologies at the whole genome and whole exome scales are increasingly employed to access sequence variation across the full minor allele frequency spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically-relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate and in several cases alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.

    Human molecular genetics 2013

  • Genome-wide association study for osteoarthritis

    Panoutsopoulou K, Southam L and Zeggini E

    The Lancet 2013;381;9864;373

  • Clinical and biological implications of driver mutations in myelodysplastic syndromes.

    Papaemmanuil E, Gerstung M, Malcovati L, Tauro S, Gundem G, Van Loo P, Yoon CJ, Ellis P, Wedge DC, Pellagatti A, Shlien A, Groves MJ, Forbes SA, Raine K, Hinton J, Mudie LJ, McLaren S, Hardy C, Latimer C, Della Porta MG, O'Meara S, Ambaglio I, Galli A, Butler AP, Waldin G, Teague JW, Quek L, Sternberg A, Gambacorti-Passerini C, Cross NC, Green AR, Boultwood J, Vyas P, Hellstrom-Lindberg E, Bowen D, Cazzola M, Stratton MR and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Myelodysplastic syndromes (MDS) are a heterogeneous group of chronic hematological malignancies characterized by dysplasia, ineffective hematopoiesis and a variable risk of progression to acute myeloid leukemia. Sequencing of MDS genomes has identified mutations in genes implicated in RNA splicing, DNA modification, chromatin regulation and cell signaling. We sequenced 111 genes across 738 patients with MDS or closely related neoplasms (including CMML and MDS-MPN) to explore the role of acquired mutations in MDS biology and clinical phenotype. 78% of patients had one or more oncogenic mutations. We identify complex patterns of pairwise association between genes, indicative of epistatic interactions involving components of the spliceosome machinery and epigenetic modifiers. Coupled with inferences on subclonal mutations, these data suggest a hypothesis of genetic 'predestination', in which early driver mutations, typically affecting genes involved in RNA splicing, dictate future trajectories of disease evolution with distinct clinical phenotypes. Driver mutations had equivalent prognostic significance whether clonal or subclonal, and leukemia-free survival deteriorated steadily as numbers of driver mutations increased. Thus, analysis of oncogenic mutations in large, well-characterized cohorts of patients illustrates the interconnections between the cancer genome and disease biology, with considerable potential for clinical application.

    Blood 2013

  • A member of the Plasmodium falciparum PHIST family binds to the erythrocyte cytoskeleton component band 4.1.

    Parish LA, Mai DW, Jones ML, Kitson EL and Rayner JC

    William C Gorgas Center for Geographic Medicine, Division of Infectious Diseases, Department of Medicine, University of Alabama at Birmingham, 845 19th St, South, Birmingham, AL, 35294-2170, USA. julian.rayner@sanger.ac.uk.

    Background: Plasmodium falciparum parasites export more than 400 proteins into the cytosol of their host erythrocytes. These exported proteins catalyse the formation of knobs on the erythrocyte plasma membrane and an overall increase in erythrocyte rigidity, presumably by modulating the endogenous erythrocyte cytoskeleton. In uninfected erythrocytes, Band 4.1 (4.1R) plays a key role in regulating erythrocyte shape by interacting with multiple proteins through the three lobes of its cloverleaf-shaped N-terminal domain. In P. falciparum-infected erythrocytes, the C-lobe of 4.1R interacts with the P. falciparum protein mature parasite-infected erythrocyte surface antigen (MESA), but it is not currently known whether other P. falciparum proteins bind to other lobes of the 4.1R N-terminal domain. Methods: In order to identify novel 4.1R interacting proteins, a yeast two-hybrid screen was performed with a fragment of 4.1R containing both the N- and α-lobes. Positive interactions were confirmed and investigated using site-directed mutagenesis, and antibodies were raised against the interacting partner to characterise its expression and distribution in P. falciparum-infected erythrocytes. Results: Yeast two-hybrid screening identified a positive interaction between the 4.1R N- and α-lobes and PF3D7_0402000. PF3D7_0402000 is a member of a large family of exported proteins that share a domain of unknown function, the PHIST domain. Domain mapping and site-directed mutagenesis established that it is the PHIST domain of PF3D7_0402000 that interacts with 4.1R. Native PF3D7_0402000 is localized at the parasitophorous vacuole membrane (PVM), and colocalizes with a subpopulation of 4.1R. Discussion: The function of the majority of P. falciparum exported proteins, including most members of the PHIST family, is unknown, and in only a handful of cases has a direct interaction between P. falciparum-exported proteins and components of the erythrocyte cytoskeleton been established. The interaction between 4.1R and PF3D7_0402000, and localization of PF3D7_0402000 with a subpopulation of 4.1R at the PVM could indicate a role in modulating PVM structure. Further investigation into the mechanisms for 4.1R recruitment is needed. Conclusion: PF3D7_0402000 was identified as a new binding partner for the major erythrocyte cytoskeletal protein, 4.1R. This interaction is consistent with a growing body of literature that suggests the PHIST family members function by interacting directly with erythrocyte proteins.

    Malaria journal 2013;12;160

  • What has high-throughput sequencing ever done for us?

    Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    This month's Genome Watch looks back over the past 10 years and highlights how the incredible advances in sequencing technologies have transformed research into microbial genomes.

    Nature reviews. Microbiology 2013;11;10;664-5

  • Proteomic and Genetic Analyses Demonstrate that Plasmodium berghei Blood Stages Export a Large and Diverse Repertoire of Proteins.

    Pasini EM, Braks JA, Fonager J, Klop O, Aime E, Spaccapelo R, Otto TD, Berriman M, Hiss JA, Thomas AW, Mann M, Janse CJ, Kocken CH and Franke-Fayard B

    Biomedical Primate Research Centre, 2288 GJ Rijswijk, The Netherlands.

    Malaria parasites actively remodel the infected red blood cell (irbc) by exporting proteins into the host cell cytoplasm. The human parasite Plasmodium falciparum exports particularly large numbers of proteins, including proteins that establish a vesicular network allowing the trafficking of proteins onto the surface of irbcs that are responsible for tissue sequestration. Like P. falciparum, the rodent parasite P. berghei ANKA sequesters via irbc interactions with the host receptor CD36. We have applied proteomic, genomic, and reverse-genetic approaches to identify P. berghei proteins potentially involved in the transport of proteins to the irbc surface. A comparative proteomics analysis of P. berghei non-sequestering and sequestering parasites was used to determine changes in the irbc membrane associated with sequestration. Subsequent tagging experiments identified 13 proteins (Plasmodium export element (PEXEL)-positive as well as PEXEL-negative) that are exported into the irbc cytoplasm and have distinct localization patterns: a dispersed and/or patchy distribution, a punctate vesicle-like pattern in the cytoplasm, or a distinct location at the irbc membrane. Members of the PEXEL-negative BIR and PEXEL-positive Pb-fam-3 show a dispersed localization in the irbc cytoplasm, but not at the irbc surface. Two of the identified exported proteins are transported to the irbc membrane and were named erythrocyte membrane associated proteins. EMAP1 is a member of the PEXEL-negative Pb-fam-1 family, and EMAP2 is a PEXEL-positive protein encoded by a single copy gene; neither protein plays a direct role in sequestration. Our observations clearly indicate that P. berghei traffics a diverse range of proteins to different cellular locations via mechanisms that are analogous to those employed by P. falciparum. This information can be exploited to generate transgenic humanized rodent P. berghei parasites expressing chimeric P. berghei/P. falciparum proteins on the surface of rodent irbc, thereby opening new avenues for in vivo screening of adjunct therapies that block sequestration.

    Molecular & cellular proteomics : MCP 2013;12;2;426-48

  • Incidence and Characterisation of Methicillin-Resistant Staphylococcus aureus (MRSA) from Nasal Colonisation in Participants Attending a Cattle Veterinary Conference in the UK.

    Paterson GK, Harrison EM, Craven EF, Petersen A, Larsen AR, Ellington MJ, Török ME, Peacock SJ, Parkhill J, Zadoks RN and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, United Kingdom.

    We sought to determine the prevalence of nasal colonisation with methicillin-resistant Staphylococcus aureus among cattle veterinarians in the UK. There was particular interest in examining the frequency of colonisation with MRSA harbouring mecC, as strains with this mecA homologue were originally identified in bovine milk and may represent a zoonotic risk to those in contact with dairy livestock. Three hundred and seven delegates at the British Cattle Veterinarian Association (BCVA) Congress 2011 in Southport, UK were screened for nasal colonisation with MRSA. Isolates were characterised by whole genome sequencing and antimicrobial susceptibility testing. Eight out of three hundred and seven delegates (2.6%) were positive for nasal colonisation with MRSA. All strains were positive for mecA and none possessed mecC. The time since a delegate's last visit to a farm was significantly shorter in the MRSA-positive group than in MRSA-negative counterparts. BCVA delegates have an increased risk of MRSA colonisation compared to the general population but their frequency of colonisation is lower than that reported from other types of veterinarian conference, and from that seen in human healthcare workers. The results indicate that recent visitation to a farm is a risk factor for MRSA colonisation and that mecC-MRSA are rare among BCVA delegates (<1% based on sample size). Contact with livestock, including dairy cattle, may still be a risk factor for human colonisation with mecC-MRSA but occurs at a rate below the lower limit of detection available in this study.

    PloS one 2013;8;7;e68463

  • A sequence-based variation map of zebrafish.

    Patowary A, Purkanti R, Singh M, Chauhan R, Singh AR, Swarnkar M, Singh N, Pandey V, Torroja C, Clark MD, Kocher JP, Clark KJ, Stemple DL, Klee EW, Ekker SC, Scaria V and Sivasubbu S

    CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.

    Zebrafish (Danio rerio) is a popular vertebrate model organism largely deployed using outbred laboratory animals. The nonisogenic nature of the zebrafish as a model system offers the opportunity to understand natural variations and their effect in modulating phenotype. In an effort to better characterize the range of natural variation in this model system and to complement the zebrafish reference genome project, the whole genome sequence of a wild zebrafish at 39-fold genome coverage was determined. Comparative analysis with the zebrafish reference genome revealed approximately 5.2 million single nucleotide variations and over 1.6 million insertion-deletion variations. This dataset thus represents a new catalog of genetic variations in the zebrafish genome. Further analysis revealed selective enrichment for variations in genes involved in immune function and response to the environment, suggesting genome-level adaptations to environmental niches. We also show that human disease gene orthologs in the sequenced wild zebrafish genome show a lower ratio of nonsynonymous to synonymous single nucleotide variations.

    Zebrafish 2013;10;1;15-20

  • The cell-cycle state of stem cells determines cell fate propensity.

    Pauklin S and Vallier L

    Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine and Department of Surgery, University of Cambridge, Cambridge CB2 0SZ, UK. Electronic address: sp579@cam.ac.uk.

    Self-renewal and differentiation of stem cells are fundamentally associated with cell-cycle progression to enable tissue specification, organ homeostasis, and potentially tumorigenesis. However, technical challenges have impaired the study of the molecular interactions coordinating cell fate choice and cell-cycle progression. Here, we bypass these limitations by using the FUCCI reporter system in human pluripotent stem cells and show that their capacity for differentiation varies during the progression of their cell cycle. These mechanisms are governed by the cell-cycle regulators cyclin D1-3 that control differentiation signals such as the TGF-β-Smad2/3 pathway. Conversely, cell-cycle manipulation using a small molecule directs differentiation of hPSCs and provides an approach to generate cell types with a clinical interest. Our results demonstrate that cell fate decisions are tightly associated with the cell-cycle machinery and reveal insights into the mechanisms synchronizing differentiation and proliferation in developing tissues.

    Funded by: Medical Research Council: G0701448; Wellcome Trust: 079249

    Cell 2013;155;1;135-47

  • Maps of open chromatin highlight cell type-restricted patterns of regulatory sequence variation at hematological trait loci.

    Paul DS, Albers CA, Rendon A, Voss K, Stephens J, HaemGen Consortium, van der Harst P, Chambers JC, Soranzo N, Ouwehand WH and Deloukas P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. d.paul@ucl.ac.uk

    Nearly three-quarters of the 143 genetic signals associated with platelet and erythrocyte phenotypes identified by meta-analyses of genome-wide association (GWA) studies are located at non-protein-coding regions. Here, we assessed the role of candidate regulatory variants associated with cell type-restricted, closely related hematological quantitative traits in biologically relevant hematopoietic cell types. We used formaldehyde-assisted isolation of regulatory elements followed by next-generation sequencing (FAIRE-seq) to map regions of open chromatin in three primary human blood cells of the myeloid lineage. In the precursors of platelets and erythrocytes, as well as in monocytes, we found that open chromatin signatures reflect the corresponding hematopoietic lineages of the studied cell types and associate with the cell type-specific gene expression patterns. Dependent on their signal strength, open chromatin regions showed correlation with promoter and enhancer histone marks, distance to the transcription start site, and ontology classes of nearby genes. Cell type-restricted regions of open chromatin were enriched in sequence variants associated with hematological indices. The majority (63.6%) of such candidate functional variants at platelet quantitative trait loci (QTLs) coincided with binding sites of five transcription factors key in regulating megakaryopoiesis. We experimentally tested 13 candidate regulatory variants at 10 platelet QTLs and found that 10 (76.9%) affected protein binding, suggesting that this is a frequent mechanism by which regulatory variants influence quantitative trait levels. Our findings demonstrate that combining large-scale GWA data with open chromatin profiles of relevant cell types can be a powerful means of dissecting the genetic architecture of closely related quantitative traits.

    Funded by: British Heart Foundation: RG/09/12/28096; Wellcome Trust: 097117, 098051

    Genome research 2013;23;7;1130-41

  • Meander: visually exploring the structural variome using space-filling curves.

    Pavlopoulos GA, Kumar P, Sifrim A, Sakai R, Lin ML, Voet T, Moreau Y and Aerts J

    Department of Electrical Engineering (ESAT/SCD), University of Leuven, Kasteelpark Arenberg 10, Box 2446, 3001 Leuven, Belgium, iMinds Future Health Department, University of Leuven, Kasteelpark Arenberg 10, Box 2446, 3001 Leuven, Belgium, Division of Basic Sciences, University of Crete, Medical School, Heraklion, 71110 Crete, Greece, Laboratory of Reproductive Genomics, Department of Human Genetics, University of Leuven, Herestraat 49, 3000 Leuven, Belgium and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton - Cambridge, CB10 1SA, UK.

    The introduction of next generation sequencing methods in genome studies has made it possible to shift research from a gene-centric approach to a genome wide view. Although methods and tools to detect single nucleotide polymorphisms are becoming more mature, methods to identify and visualize structural variation (SV) are still in their infancy. Most genome browsers can only compare a given sequence to a reference genome; therefore, direct comparison of multiple individuals still remains a challenge. Therefore, the implementation of efficient approaches to explore and visualize SVs and directly compare two or more individuals is desirable. In this article, we present a visualization approach that uses space-filling Hilbert curves to explore SVs based on both read-depth and pair-end information. An interactive open-source Java application, called Meander, implements the proposed methodology, and its functionality is demonstrated using two cases. With Meander, users can explore variations at different levels of resolution and simultaneously compare up to four different individuals against a common reference. The application was developed using Java version 1.6 and Processing.org and can be run on any platform. It can be found at http://homes.esat.kuleuven.be/~bioiuser/meander.

    Nucleic acids research 2013

  • KSR2 mutations are associated with obesity, insulin resistance, and impaired cellular fuel oxidation.

    Pearce LR, Atanassova N, Banton MC, Bottomley B, van der Klaauw AA, Revelli JP, Hendricks A, Keogh JM, Henning E, Doree D, Jeter-Jones S, Garg S, Bochukova EG, Bounds R, Ashford S, Gayton E, Hindmarsh PC, Shield JP, Crowne E, Barford D, Wareham NJ, UK10K consortium, O'Rahilly S, Murphy MP, Powell DR, Barroso I and Farooqi IS

    Kinase suppressor of Ras 2 (KSR2) is an intracellular scaffolding protein involved in multiple signaling pathways. Targeted deletion of Ksr2 leads to obesity in mice, suggesting a role in energy homeostasis. We explored the role of KSR2 in humans by sequencing 2,101 individuals with severe early-onset obesity and 1,536 controls. We identified multiple rare variants in KSR2 that disrupt signaling through the Raf-MEK-ERK pathway and impair cellular fatty acid oxidation and glucose oxidation in transfected cells; effects that can be ameliorated by the commonly prescribed antidiabetic drug, metformin. Mutation carriers exhibit hyperphagia in childhood, low heart rate, reduced basal metabolic rate and severe insulin resistance. These data establish KSR2 as an important regulator of energy intake, energy expenditure, and substrate utilization in humans. Modulation of KSR2-mediated effects may represent a novel therapeutic strategy for obesity and type 2 diabetes.

    Funded by: Medical Research Council: MC_U106179471; Wellcome Trust: 077016/Z/05/Z, 096106/Z/11/Z, 098497, 098497/Z/12/Z, WT091310

    Cell 2013;155;4;765-77

  • Continent-wide panmixia of an African fruit bat facilitates transmission of potentially zoonotic viruses.

    Peel AJ, Sargan DR, Baker KS, Hayman DT, Barr JA, Crameri G, Suu-Ire R, Broder CC, Lembo T, Wang LF, Fooks AR, Rossiter SJ, Wood JL and Cunningham AA

    [1] Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK [2] Institute of Zoology, Zoological Society of London, Regent's Park, London NW1 4RY, UK.

    The straw-coloured fruit bat, Eidolon helvum, is Africa's most widely distributed and commonly hunted fruit bat, often living in close proximity to human populations. This species has been identified as a reservoir of potentially zoonotic viruses, but uncertainties remain regarding viral transmission dynamics and mechanisms of persistence. Here we combine genetic and serological analyses of populations across Africa, to determine the extent of epidemiological connectivity among E. helvum populations. Multiple markers reveal panmixia across the continental range, at a greater geographical scale than previously recorded for any other mammal, whereas populations on remote islands were genetically distinct. Multiple serological assays reveal antibodies to henipaviruses and Lagos bat virus in all locations, including small isolated island populations, indicating that factors other than population size and connectivity may be responsible for viral persistence. Our findings have potentially important public health implications, and highlight a need to avoid disturbances that may precipitate viral spillover.

    Nature communications 2013;4;2770

  • Genetic variants associated with warfarin dose in African-American individuals: a genome-wide association study.

    Perera MA, Cavallari LH, Limdi NA, Gamazon ER, Konkashbaev A, Daneshjou R, Pluzhnikov A, Crawford DC, Wang J, Liu N, Tatonetti N, Bourgeois S, Takahashi H, Bradford Y, Burkley BM, Desnick RJ, Halperin JL, Khalifa SI, Langaee TY, Lubitz SA, Nutescu EA, Oetjens M, Shahin MH, Patel SR, Sagreiya H, Tector M, Weck KE, Rieder MJ, Scott SA, Wu AH, Burmester JK, Wadelius M, Deloukas P, Wagner MJ, Mushiroda T, Kubo M, Roden DM, Cox NJ, Altman RB, Klein TE, Nakamura Y and Johnson JA

    Section of Genetic Medicine, Department of Medicine, University of Chicago, IL, USA.

    BACKGROUND: VKORC1 and CYP2C9 are important contributors to warfarin dose variability, but explain less variability for individuals of African descent than for those of European or Asian descent. We aimed to identify additional variants contributing to warfarin dose requirements in African Americans. METHODS: We did a genome-wide association study of discovery and replication cohorts. Samples from African-American adults (aged ≥18 years) who were taking a stable maintenance dose of warfarin were obtained at International Warfarin Pharmacogenetics Consortium (IWPC) sites and the University of Alabama at Birmingham (Birmingham, AL, USA). Patients enrolled at IWPC sites but who were not used for discovery made up the independent replication cohort. All participants were genotyped. We did a stepwise conditional analysis, conditioning first for VKORC1 -1639G→A, followed by the composite genotype of CYP2C9*2 and CYP2C9*3. We prespecified a genome-wide significance threshold of p<5×10(-8) in the discovery cohort and p<0·0038 in the replication cohort. FINDINGS: The discovery cohort contained 533 participants and the replication cohort 432 participants. After the prespecified conditioning in the discovery cohort, we identified an association between a novel single nucleotide polymorphism in the CYP2C cluster on chromosome 10 (rs12777823) and warfarin dose requirement that reached genome-wide significance (p=1·51×10(-8)). This association was confirmed in the replication cohort (p=5·04×10(-5)); analysis of the two cohorts together produced a p value of 4·5×10(-12). Individuals heterozygous for the rs12777823 A allele need a dose reduction of 6·92 mg/week and those homozygous 9·34 mg/week. Regression analysis showed that the inclusion of rs12777823 significantly improves warfarin dose variability explained by the IWPC dosing algorithm (21% relative improvement). INTERPRETATION: A novel CYP2C single nucleotide polymorphism exerts a clinically relevant effect on warfarin dose in African Americans, independent of CYP2C9*2 and CYP2C9*3. Incorporation of this variant into pharmacogenetic dosing algorithms could improve warfarin dose prediction in this population. FUNDING: National Institutes of Health, American Heart Association, Howard Hughes Medical Institute, Wisconsin Network for Health Research, and the Wellcome Trust.

    Lancet 2013

  • Computational proteomics pitfalls and challenges: HavanaBioinfo 2012 Workshop report.

    Perez-Riverol Y, Hermjakob H, Kohlbacher O, Martens L, Creasy D, Cox J, Leprevost F, Shan BP, Pérez-Nueno VI, Blazejczyk M, Punta M, Vierlinger K, Valiente PA, Leon K, Chinea G, Guirola O, Bringas R, Cabrera G, Guillen G, Padron G, Gonzalez LJ and Besada V

    Center for Genetic Engineering and Biotechnology, Ave 31 e/158 y 190, Cubanacán, Playa, Havana, Cuba; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. Electronic address: yasset.perez@biocomp.cigb.edu.cu.

    The workshop "Bioinformatics for Biotechnology Applications (HavanaBioinfo 2012)", held December 8-11, 2012 in Havana, aimed at exploring new bioinformatics tools and approaches for large-scale proteomics, genomics and chemoinformatics. Major conclusions of the workshop include the following: (i) development of new applications and bioinformatics tools for proteomic repository analysis is crucial; current proteomic repositories contain enough data (spectra/identifications) that can be used to increase the annotations in protein databases and to generate new tools for protein identification; (ii) spectral libraries, de novo sequencing and database search tools should be combined to increase the number of protein identifications; (iii) protein probabilities and FDR are not yet sufficiently mature; (iv) computational proteomics software needs to become more intuitive; and at the same time appropriate education and training should be provided to help in the efficient exchange of knowledge between mass spectrometrists and experimental biologists and bioinformaticians in order to increase their bioinformatics background, especially statistics knowledge.

    Journal of proteomics 2013

  • Automatic event detection within thrombus formation based on integer programming.

    Peter L, Pauly O, Jansen SBG, Smethurst PA, Ouwehand WH and Navab N

    Lecture Notes in Computer Science 2013;7766;215-24

  • Recombination-mediated genetic engineering of Plasmodium berghei DNA.

    Pfander C, Anar B, Brochet M, Rayner JC and Billker O

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    DNA of Plasmodium berghei is difficult to manipulate in Escherichia coli by conventional restriction and ligation methods due to its high content of adenine and thymine (AT) nucleotides. This limits our ability to clone large genes and to generate complex vectors for modifying the parasite genome. We here describe a protocol for using lambda Red recombinase to modify inserts of a P. berghei genomic DNA library constructed in a linear, low-copy, phage-derived vector. The method uses primer extensions of 50 bp, which provide sufficient homology for an antibiotic resistance marker to recombine efficiently with a P. berghei genomic DNA insert in E. coli. In a subsequent in vitro Gateway reaction the bacterial marker is replaced with a cassette for selection in P. berghei. The insert is then released and used for transfection. The basic techniques we describe here can be adapted to generate highly efficient vectors for gene deletion, tagging, targeted mutagenesis, or genetic complementation with larger genomic regions.

    Funded by: Medical Research Council: G0501670

    Methods in molecular biology (Clifton, N.J.) 2013;923;127-38

  • Identification of Salmonella enterica Serovar Typhi Genotypes by Use of Rapid Multiplex Ligation-Dependent Probe Amplification.

    Pham Thanh D, Tran Vu Thieu N, Tran Thuy C, Lodén M, Tuin K, Campbell JI, Van Minh Hoang N, Voong Vinh P, Farrar JJ, Holt KE, Dougan G and Baker S

    The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Salmonella enterica serovar Typhi, the causative agent of typhoid fever, is highly clonal and genetically conserved, making isolate subtyping difficult. We describe a standardized multiplex ligation-dependent probe amplification (MLPA) genotyping scheme targeting 11 key phylogenetic markers of the S. Typhi genome. The MLPA method demonstrated 90% concordance with single nucleotide polymorphism (SNP) typing, the gold standard for S. Typhi genotyping, and had the ability to identify isolates of the H58 haplotype, which is associated with resistance to multiple antimicrobials. Additionally, the assay permitted the detection of fluoroquinolone resistance-associated mutations in the DNA gyrase-encoding gene gyrA and the topoisomerase gene parC with a sensitivity of 100%. The MLPA methodology is simple and reliable, providing phylogenetically and phenotypically relevant genotyping information. This MLPA scheme offers a more-sensitive and interpretable alternative to the nonphylogenetic subgrouping methodologies that are currently used in reference and research laboratories in areas where typhoid is endemic.

    Journal of clinical microbiology 2013;51;9;2950-8

  • A genome-wide mutagenesis screen identifies multiple genes contributing to Vi capsular expression in Salmonella Typhi.

    Pickard D, Kingsley RA, Hale C, Turner K, Sivaraman K, Wetter M, Langridge G and Dougan G

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    A transposon-based, genome-wide mutagenesis screen exploiting the killing activity of a lytic ViII bacteriophage was used to identify Salmonella Typhi genes that contribute to Vi polysaccharide capsule expression. Genes enriched in the screen included those within the viaB locus (tviABCDE, vexABCDE) as well as oxyR, barA/sirA and yrfF, which have not previously been associated with Vi expression. The role of these genes in Vi expression was confirmed by constructing defined null mutant derivatives of S. Typhi and these were negative for Vi expression as determined by agglutination assays with Vi-specific sera or susceptibility to Vi-targeting bacteriophage. Transcriptome analysis confirmed a reduction in expression from the viaB locus in these S. Typhi mutant derivatives and defined regulatory networks associated with Vi expression.

    Journal of bacteriology 2013

  • Transcriptional Regulation of Culex pipiens Mosquitoes by Wolbachia Influences Cytoplasmic Incompatibility.

    Pinto SB, Stainton K, Harris S, Kambris Z, Sutton ER, Bonsall MB, Parkhill J and Sinkins SP

    Peter Medawar Building for Pathogen Research and Nuffield Department of Medicine (NDM), University of Oxford, Oxford, United Kingdom; Department of Zoology, University of Oxford, Oxford, United Kingdom.

    Cytoplasmic incompatibility (CI) induced by the endosymbiont Wolbachia pipientis causes complex patterns of crossing sterility between populations of the Culex pipiens group of mosquitoes. The molecular basis of the phenotype is yet to be defined. In order to investigate what host changes may underlie CI at the molecular level, we examined the transcription of a homolog of the Drosophila melanogaster gene grauzone that encodes a zinc finger protein and acts as a regulator of female meiosis, in which mutations can cause sterility. Upregulation was observed in Wolbachia-infected C. pipiens group individuals relative to Wolbachia-cured lines and the level of upregulation differed between lines that were reproductively incompatible. Knockdown analysis of this gene using RNAi showed an effect on hatch rates in a Wolbachia infected Culex molestus line. Furthermore, in later stages of development an effect on developmental progression in CI embryos occurs in bidirectionally incompatible crosses. The genome of a wPip Wolbachia strain variant from Culex molestus was sequenced and compared with the genome of a wPip variant with which it was incompatible. Three genes in inserted or deleted regions were newly identified in the C. molestus wPip genome, one of which is a transcriptional regulator labelled wtrM. When this gene was transfected into adult Culex mosquitoes, upregulation of the grauzone homolog was observed. These data suggest that Wolbachia-mediated regulation of host gene expression is a component of the mechanism of cytoplasmic incompatibility.

    PLoS pathogens 2013;9;10;e1003647

  • Genome Wide Association Analysis of a Founder Population Identified TAF3 as a Gene for MCHC in Humans.

    Pistis G, Okonkwo SU, Traglia M, Sala C, Shin SY, Masciullo C, Buetti I, Massacane R, Mangino M, Thein SL, Spector TD, Ganesh S, Pirastu N, Gasparini P, Soranzo N, Camaschella C, Hart D, Green MR and Toniolo D

    Division of Genetics and Cell Biology, San Raffaele Research Institute and Vita Salute University, Milano, Italy.

    The red blood cell related traits are highly heritable but their genetics are poorly defined. Only 5-10% of the total observed variance is explained by the genetic loci found to date, suggesting that additional loci should be searched using approaches alternative to large meta analysis. GWAS (Genome Wide Association Study) for red blood cell traits in a founder population cohort from Northern Italy identified a new locus for mean corpuscular hemoglobin concentration (MCHC) in the TAF3 gene. The association was replicated in two cohorts (rs1887582, P = 4.25E-09). TAF3 encodes a transcription cofactor that participates in core promoter recognition complex, and is involved in zebrafish and mouse erythropoiesis. We show here that TAF3 is required for transcription of the SPTA1 gene, encoding alpha spectrin, one of the proteins that link the plasma membrane to the actin cytoskeleton. Mutations in SPTA1 are responsible for hereditary spherocytosis, a monogenic disorder of MCHC, as well as for the normal MCHC level. Based on our results, we propose that TAF3 is required for normal erythropoiesis in human and that it might have a role in controlling the ratio between hemoglobin (Hb) and cell volume and in the dynamics of RBC maturation in healthy individuals. Finally, TAF3 represents a potential candidate or a modifier gene for disorders of red cell membrane.

    PloS one 2013;8;7;e69206

  • NDUFA4 Mutations Underlie Dysfunction of a Cytochrome c Oxidase Subunit Linked to Human Neurological Disease.

    Pitceathly RD, Rahman S, Wedatilake Y, Polke JM, Cirak S, Foley AR, Sailer A, Hurles ME, Stalker J, Hargreaves I, Woodward CE, Sweeney MG, Muntoni F, Houlden H, UK10K Consortium, Taanman JW and Hanna MG

    MRC Centre for Neuromuscular Diseases, UCL Institute of Neurology and National Hospital for Neurology and Neurosurgery, Queen Square, London WC1N 3BG, UK.

    The molecular basis of cytochrome c oxidase (COX, complex IV) deficiency remains genetically undetermined in many cases. Homozygosity mapping and whole-exome sequencing were performed in a consanguineous pedigree with isolated COX deficiency linked to a Leigh syndrome neurological phenotype. Unexpectedly, affected individuals harbored homozygous splice donor site mutations in NDUFA4, a gene previously assigned to encode a mitochondrial respiratory chain complex I (NADH:ubiquinone oxidoreductase) subunit. Western blot analysis of denaturing gels and immunocytochemistry revealed undetectable steady-state NDUFA4 protein levels, indicating that the mutation causes a loss-of-function effect in the homozygous state. Analysis of one- and two-dimensional blue-native polyacrylamide gels confirmed an interaction between NDUFA4 and the COX enzyme complex in control muscle, whereas the COX enzyme complex without NDUFA4 was detectable with no abnormal subassemblies in patient muscle. These observations support recent work in cell lines suggesting that NDUFA4 is an additional COX subunit and demonstrate that NDUFA4 mutations cause human disease. Our findings support reassignment of the NDUFA4 protein to complex IV and suggest that patients with unexplained COX deficiency should be screened for NDUFA4 mutations.

    Cell reports 2013

  • COX10 mutations resulting in complex multisystem mitochondrial disease that remains stable into adulthood.

    Pitceathly RD, Taanman JW, Rahman S, Meunier B, Sadowski M, Cirak S, Hargreaves I, Land JM, Nanji T, Polke JM, Woodward CE, Sweeney MG, Solanki S, Foley AR, Hurles ME, Stalker J, Blake J, Holton JL, Phadke R, Muntoni F, Reilly MM, Hanna MG and UK10K Consortium

    Medical Research Council Centre for Neuromuscular Diseases, University College London Institute of Neurology and National Hospital for Neurology and Neurosurgery, London, England.

    Importance: Isolated cytochrome-c oxidase (COX) deficiency is one of the most frequent respiratory chain defects seen in human mitochondrial disease. Typically, patients present with severe neonatal multisystem disease and have an early fatal outcome. We describe an adult patient with isolated COX deficiency associated with a relatively mild clinical phenotype comprising myopathy; demyelinating neuropathy; premature ovarian failure; short stature; hearing loss; pigmentary maculopathy; and renal tubular dysfunction.

    Observations: Whole-exome sequencing detected 1 known pathogenic and 1 novel COX10 mutation: c.1007A>T; p.Asp336Val, previously associated with fatal infantile COX deficiency, and c.1015C>T; p.Arg339Trp. Muscle COX holoenzyme and subassemblies were undetectable on immunoblots of blue-native gels, whereas denaturing gels and immunocytochemistry showed reduced core subunit MTCO1. Heme absorption spectra revealed low heme aa3 compatible with heme A:farnesyltransferase deficiency due to COX10 dysfunction. Both mutations demonstrated respiratory deficiency in yeast, confirming pathogenicity. A COX10 protein model was used to predict the structural consequences of the novel Arg339Trp and all previously reported substitutions.

    These findings establish that COX10 mutations cause adult mitochondrial disease. Nuclear modifiers, epigenetic phenomena, and/or environmental factors may influence the disease phenotype caused by reduced COX activity and contribute to the variable clinical severity related to COX10 dysfunction.

    Funded by: Department of Health; Medical Research Council: G0601943, G0800674, U117581331; NINDS NIH HHS: 1U54NS065712-01; Wellcome Trust: WT091310

    JAMA neurology 2013;70;12;1556-61

  • High-fat feeding rapidly induces obesity and lipid derangements in C57BL/6N mice.

    Podrini C, Cambridge EL, Lelliott CJ, Carragher DM, Estabel J, Gerdin AK, Karp NA, Scudamore CL, Sanger Mouse Genetics Project, Ramirez-Solis R and White JK

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    C57BL/6N (B6N) is becoming the standard background for genetic manipulation of the mouse genome. The B6N, whose genome is very closely related to the reference C57BL/6J genome, is versatile in a wide range of phenotyping and experimental settings and large repositories of B6N ES cells have been developed. Here, we present a series of studies showing the baseline characteristics of B6N fed a high-fat diet (HFD) for up to 12 weeks. We show that HFD-fed B6N mice exhibit increased weight gain, fat mass, and hypercholesterolemia compared to control diet-fed mice. In addition, HFD-fed B6N mice display a rapid onset of lipid accumulation in the liver with both macro- and microvacuolation, which became more severe with increasing duration of HFD. Our results suggest that the B6N mouse strain is a versatile background for studying diet-induced metabolic syndrome and may also represent a model for early nonalcoholic fatty liver disease.

    Mammalian genome : official journal of the International Mammalian Genome Society 2013

  • Genome-wide mutational signatures of aristolochic acid and its application as a screening tool.

    Poon SL, Pang ST, McPherson JR, Yu W, Huang KK, Guan P, Weng WH, Siew EY, Liu Y, Heng HL, Chong SC, Gan A, Tay ST, Lim WK, Cutcutache I, Huang D, Ler LD, Nairismägi ML, Lee MH, Chang YH, Yu KJ, Chan-On W, Li BK, Yuan YF, Qian CN, Ng KF, Wu CF, Hsu CL, Bunte RM, Stratton MR, Futreal PA, Sung WK, Chuang CK, Ong CK, Rozen SG, Tan P and Teh BT

    NCCS-VARI Translational Research Laboratory, Division of Medical Sciences, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610, Singapore.

    Aristolochic acid (AA), a natural product of Aristolochia plants found in herbal remedies and health supplements, is a group 1 carcinogen that can cause nephrotoxicity and upper urinary tract urothelial cell carcinoma (UTUC). Whole-genome and exome analysis of nine AA-associated UTUCs revealed a strikingly high somatic mutation rate (150 mutations/Mb), exceeding smoking-associated lung cancer (8 mutations/Mb) and ultraviolet radiation-associated melanoma (111 mutations/Mb). The AA-UTUC mutational signature was characterized by A:T to T:A transversions at the sequence motif A[C|T]AGG, located primarily on nontranscribed strands. AA-induced mutations were also significantly enriched at splice sites, suggesting a role for splice-site mutations in UTUC pathogenesis. RNA sequencing of AA-UTUC confirmed a general up-regulation of nonsense-mediated decay machinery components and aberrant splicing events associated with splice-site mutations. We observed a high frequency of somatic mutations in chromatin modifiers, particularly KDM6A, in AA-UTUC, demonstrated the sufficiency of AA to induce renal dysplasia in mice, and reproduced the AA mutational signature in experimentally treated human renal tubular cells. Finally, exploring other malignancies that were not known to be associated with AA, we screened 93 hepatocellular carcinoma genomes/exomes and identified AA-like mutational signatures in 11. Our study highlights an unusual genome-wide AA mutational signature and the potential use of mutation signatures as "molecular fingerprints" for interrogating high-throughput cancer genome data to infer previous carcinogen exposures.

    Science translational medicine 2013;5;197;197ra101

  • A meta-analysis of thyroid-related traits reveals novel loci and gender-specific differences in the regulation of thyroid function.

    Porcu E, Medici M, Pistis G, Volpato CB, Wilson SG, Cappola AR, Bos SD, Deelen J, den Heijer M, Freathy RM, Lahti J, Liu C, Lopez LM, Nolte IM, O'Connell JR, Tanaka T, Trompet S, Arnold A, Bandinelli S, Beekman M, Böhringer S, Brown SJ, Buckley BM, Camaschella C, de Craen AJ, Davies G, de Visser MC, Ford I, Forsen T, Frayling TM, Fugazzola L, Gögele M, Hattersley AT, Hermus AR, Hofman A, Houwing-Duistermaat JJ, Jensen RA, Kajantie E, Kloppenburg M, Lim EM, Masciullo C, Mariotti S, Minelli C, Mitchell BD, Nagaraja R, Netea-Maier RT, Palotie A, Persani L, Piras MG, Psaty BM, Räikkönen K, Richards JB, Rivadeneira F, Sala C, Sabra MM, Sattar N, Shields BM, Soranzo N, Starr JM, Stott DJ, Sweep FC, Usala G, van der Klauw MM, van Heemst D, van Mullem A, H Vermeulen S, Visser WE, Walsh JP, Westendorp RG, Widen E, Zhai G, Cucca F, Deary IJ, Eriksson JG, Ferrucci L, Fox CS, Jukema JW, Kiemeney LA, Pramstaller PP, Schlessinger D, Shuldiner AR, Slagboom EP, Uitterlinden AG, Vaidya B, Visser TJ, Wolffenbuttel BH, Meulenbelt I, Rotter JI, Spector TD, Hicks AA, Toniolo D, Sanna S, Peeters RP and Naitza S

    Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche, c/o Cittadella Universitaria di Monserrato, Monserrato, Cagliari, Italy; Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy.

    Thyroid hormone is essential for normal metabolism and development, and overt abnormalities in thyroid function lead to common endocrine disorders affecting approximately 10% of individuals over their life span. In addition, even mild alterations in thyroid function are associated with weight changes, atrial fibrillation, osteoporosis, and psychiatric disorders. To identify novel variants underlying thyroid function, we performed a large meta-analysis of genome-wide association studies for serum levels of the highly heritable thyroid function markers TSH and FT4, in up to 26,420 and 17,520 euthyroid subjects, respectively. Here we report 26 independent associations, including several novel loci for TSH (PDE10A, VEGFA, IGFBP5, NFIA, SOX9, PRDM11, FGF7, INSR, ABO, MIR1179, NRG1, MBIP, ITPK1, SASH1, GLIS3) and FT4 (LHX3, FOXE1, AADAT, NETO1/FBXO15, LPCAT2/CAPNS2). Notably, only limited overlap was detected between TSH and FT4 associated signals, in spite of the feedback regulation of their circulating levels by the hypothalamic-pituitary-thyroid axis. Five of the reported loci (PDE8B, PDE10A, MAF/LOC440389, NETO1/FBXO15, and LPCAT2/CAPNS2) show strong gender-specific differences, which offer clues for the known sexual dimorphism in thyroid function and related pathologies. Importantly, the TSH-associated loci contribute not only to variation within the normal range, but also to TSH values outside the reference range, suggesting that they may be involved in thyroid dysfunction. Overall, our findings explain, respectively, 5.64% and 2.30% of total TSH and FT4 trait variance, and they improve the current knowledge of the regulation of hypothalamic-pituitary-thyroid axis function and the consequences of genetic variation for hypo- or hyperthyroidism.

    PLoS genetics 2013;9;2;e1003266

  • Single-cell mutational profiling and clonal phylogeny in cancer.

    Potter NE, Ermini L, Papaemmanuil E, Cazzaniga G, Vijayaraghavan G, Titley I, Ford A, Campbell P, Kearney L and Greaves M

    The Institute of Cancer Research, London, SM2 5NG, United Kingdom.

    The development of cancer is a dynamic evolutionary process in which intraclonal, genetic diversity provides a substrate for clonal selection and a source of therapeutic escape. The complexity and topography of intraclonal genetic architectures have major implications for biopsy-based prognosis and for targeted therapy. High-depth, next-generation sequencing (NGS) efficiently captures the mutational load of individual tumors or biopsies. But, being a snapshot portrait of total DNA, it disguises the fundamental features of subclonal variegation of genetic lesions and of clonal phylogeny. Single-cell genetic profiling provides a potential resolution to this problem, but methods developed to date all have limitations. We present a novel solution to this challenge using leukemic cells with known mutational spectra as a tractable model. DNA from flow-sorted single cells is screened using multiplex targeted Q-PCR within a microfluidic platform allowing unbiased single-cell selection, high-throughput, and comprehensive analysis for all main varieties of genetic abnormalities: chimeric gene fusions, copy number alterations, and single-nucleotide variants. We show, in this proof-of-principle study, that the method has a low error rate and can provide detailed subclonal genetic architectures and phylogenies.

    Genome research 2013

  • Identification of Small Exonic CNV from Whole-Exome Sequence Data and Application to Autism Spectrum Disorder.

    Poultney CS, Goldberg AP, Drapeau E, Kou Y, Harony-Nicolas H, Kajiwara Y, De Rubeis S, Durand S, Stevens C, Rehnström K, Palotie A, Daly MJ, Ma'ayan A, Fromer M and Buxbaum JD

    Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

    Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1-30 kb, making use of the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1-30 kb CNV, 1-30 kb deletions, and 1-10 kb deletions in ASD. CNV in the 1-30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1-30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes.

    American journal of human genetics 2013;93;4;607-19

  • Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.

    Protasio AV, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosome infection begins with the penetration of cercariae through healthy unbroken host skin. This process leads to the transformation of the free-living larvae into obligate parasites called schistosomula. This irreversible transformation, which occurs in as little as two hours, involves casting the cercaria tail and complete remodelling of the surface membrane. At this stage, parasites are vulnerable to host immune attack and oxidative stress. Consequently, the mechanisms by which the parasite recognises and swiftly adapts to the human host are still the subject of many studies, especially in the context of development of intervention strategies against schistosomiasis infection. Because obtaining enough material from in vivo infections is not always feasible for such studies, the transformation process is often mimicked in the laboratory by application of shear pressure to a cercarial sample resulting in mechanically transformed (MT) schistosomula. These parasites share remarkable morphological and biochemical similarity to the naturally transformed counterparts and have been considered a good proxy for parasites undergoing natural infection. Relying on this equivalency, MT schistosomula have been used almost exclusively in high-throughput studies of gene expression, identification of drug targets and identification of effective drugs against schistosomes. However, the transcriptional equivalency between skin-transformed (ST) and MT schistosomula has never been proven. In our approach to compare these two types of schistosomula preparations and to explore differences in gene expression triggered by the presence of a skin barrier, we performed RNA-seq transcriptome profiling of ST and MT schistosomula at 24 hours post transformation. We report that these two very distinct schistosomula preparations differ only in the expression of 38 genes (out of ∼11,000), providing convincing evidence to resolve the long-standing skin vs. mechanical controversy.

    Funded by: Wellcome Trust: WT 083931/Z/07/Z, WT 098051

    PLoS neglected tropical diseases 2013;7;3;e2091

  • Targeting MYCN in Neuroblastoma by BET Bromodomain Inhibition.

    Puissant A, Frumm SM, Alexe G, Bassil CF, Qi J, Chanthery YH, Nekritz EA, Zeid R, Gustafson WC, Greninger P, Garnett MJ, McDermott U, Benes CH, Kung AL, Weiss WA, Bradner JE and Stegmaier K

    Departments of Pediatric Oncology and Medical Oncology, Dana-Farber Cancer Institute; Boston Children's Hospital; Department of Medicine, Harvard Medical School; Bioinformatics Graduate Program, Boston University, Boston; The Broad Institute of Harvard University and Massachusetts Institute of Technology, Cambridge; Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, Massachusetts; Department of Pediatrics, Helen Diller Family Comprehensive Cancer Center; Departments of Neurology and Neurosurgery, Brain Tumor Research Center, University of California, San Francisco, San Francisco, California; and Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Bromodomain inhibition comprises a promising therapeutic strategy in cancer, particularly for hematologic malignancies. To date, however, genomic biomarkers to direct clinical translation have been lacking. We conducted a cell-based screen of genetically defined cancer cell lines using a prototypical inhibitor of BET bromodomains. Integration of genetic features with chemosensitivity data revealed a robust correlation between MYCN amplification and sensitivity to bromodomain inhibition. We characterized the mechanistic and translational significance of this finding in neuroblastoma, a childhood cancer with frequent amplification of MYCN. Genome-wide expression analysis showed downregulation of the MYCN transcriptional program accompanied by suppression of MYCN transcription. Functionally, bromodomain-mediated inhibition of MYCN impaired growth and induced apoptosis in neuroblastoma. BRD4 knockdown phenocopied these effects, establishing BET bromodomains as transcriptional regulators of MYCN. BET inhibition conferred a significant survival advantage in 3 in vivo neuroblastoma models, providing a compelling rationale for developing BET bromodomain inhibitors in patients with neuroblastoma.

    Funded by: NCI NIH HHS: P01 CA081403, R01 CA102321

    Cancer discovery 2013

  • SpoIVA and SipL are Clostridium difficile spore morphogenetic proteins.

    Putnam EE, Nock AM, Lawley TD and Shen A

    Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, USA.

    Clostridium difficile is a major nosocomial pathogen whose infections are difficult to treat because of their frequent recurrence. The spores of C. difficile are responsible for these clinical features, as they resist common disinfectants and antibiotic treatment. Although spores are the major transmissive form of C. difficile, little is known about their composition or morphogenesis. Spore morphogenesis has been well characterized for Bacillus sp., but Bacillus sp. spore coat proteins are poorly conserved in Clostridium sp. Of the known spore morphogenetic proteins in Bacillus subtilis, SpoIVA is one of the most highly conserved in the Bacilli and the Clostridia. Using genetic analyses, we demonstrate that SpoIVA is required for proper spore morphogenesis in C. difficile. In particular, a spoIVA mutant exhibits defects in spore coat localization but not cortex formation. Our study also identifies SipL, a previously uncharacterized protein found in proteomic studies of C. difficile spores, as another critical spore morphogenetic protein, since a sipL mutant phenocopies a spoIVA mutant. Biochemical analyses and mutational analyses indicate that SpoIVA and SipL directly interact. This interaction depends on the Walker A ATP binding motif of SpoIVA and the LysM domain of SipL. Collectively, these results provide the first insights into spore morphogenesis in C. difficile.

    Funded by: NCRR NIH HHS: P20RR021905; NIGMS NIH HHS: R00 GM092934

    Journal of bacteriology 2013;195;6;1214-25

  • A genetic progression model of Braf(V600E)-induced intestinal tumorigenesis reveals targets for therapeutic intervention.

    Rad R, Cadiñanos J, Rad L, Varela I, Strong A, Kriegl L, Constantino-Casas F, Eser S, Hieber M, Seidler B, Price S, Fraga MF, Calvanese V, Hoffman G, Ponstingl H, Schneider G, Yusa K, Grove C, Schmid RM, Wang W, Vassiliou G, Kirchner T, McDermott U, Liu P, Saur D and Bradley A

    Department of Medicine II, Klinikum Rechts der Isar, Technische Universität München, 81675, München, Germany. roland.rad@lrz.tum.de

    We show that BRAF(V600E) initiates an alternative pathway to colorectal cancer (CRC), which progresses through a hyperplasia/adenoma/carcinoma sequence. This pathway underlies significant subsets of CRCs with distinctive pathomorphologic/genetic/epidemiologic/clinical characteristics. Genetic and functional analyses in mice revealed a series of stage-specific molecular alterations driving different phases of tumor evolution and uncovered mechanisms underlying this stage specificity. We further demonstrate dose-dependent effects of oncogenic signaling, with physiologic Braf(V600E) expression being sufficient for hyperplasia induction, but later stage intensified Mapk-signaling driving both tumor progression and activation of intrinsic tumor suppression. Such phenomena explain, for example, the inability of p53 to restrain tumor initiation as well as its importance in invasiveness control, and the late stage specificity of its somatic mutation. Finally, systematic drug screening revealed sensitivity of this CRC subtype to targeted therapeutics, including Mek or combinatorial PI3K/Braf inhibition.

    Funded by: Wellcome Trust: 095663

    Cancer cell 2013;24;1;15-29

  • Dnmt2-dependent methylomes lack defined DNA methylation patterns.

    Raddatz G, Guzzardo PM, Olova N, Fantappié MR, Rampp M, Schaefer M, Reik W, Hannon GJ and Lyko F

    Division of Epigenetics, DKFZ-ZMBH Alliance, German Cancer Research Center, 69120 Heidelberg, Germany.

    Several organisms have retained methyltransferase 2 (Dnmt2) as their only candidate DNA methyltransferase gene. However, information about Dnmt2-dependent methylation patterns has been limited to a few isolated loci and the results have been discussed controversially. In addition, recent studies have shown that Dnmt2 functions as a tRNA methyltransferase, which raised the possibility that Dnmt2-only genomes might be unmethylated. We have now used whole-genome bisulfite sequencing to analyze the methylomes of Dnmt2-only organisms at single-base resolution. Our results show that the genomes of Schistosoma mansoni and Drosophila melanogaster lack detectable DNA methylation patterns. Residual unconverted cytosine residues shared many attributes with bisulfite deamination artifacts and were observed at comparable levels in Dnmt2-deficient flies. Furthermore, genetically modified Dnmt2-only mouse embryonic stem cells lost the DNA methylation patterns found in wild-type cells. Our results thus uncover fundamental differences among animal methylomes and suggest that DNA methylation is dispensable for a considerable number of eukaryotic organisms.

    Proceedings of the National Academy of Sciences of the United States of America 2013;110;21;8627-31

  • Rare variants in single-minded 1 (SIM1) are associated with severe obesity.

    Ramachandrappa S, Raimondo A, Cali AM, Keogh JM, Henning E, Saeed S, Thompson A, Garg S, Bochukova EG, Brage S, Trowse V, Wheeler E, Sullivan AE, Dattani M, Clayton PE, Datta V, Datta V, Bruning JB, Wareham NJ, O'Rahilly S, Peet DJ, Barroso I, Whitelaw ML and Farooqi IS

    University of Cambridge Metabolic Research Laboratories and NIHR Cambridge Biomedical Research Centre, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Single-minded 1 (SIM1) is a basic helix-loop-helix transcription factor involved in the development and function of the paraventricular nucleus of the hypothalamus. Obesity has been reported in Sim1 haploinsufficient mice and in a patient with a balanced translocation disrupting SIM1. We sequenced the coding region of SIM1 in 2,100 patients with severe, early onset obesity and in 1,680 controls. Thirteen different heterozygous variants in SIM1 were identified in 28 unrelated severely obese patients. Nine of the 13 variants significantly reduced the ability of SIM1 to activate a SIM1-responsive reporter gene when studied in stably transfected cells coexpressing the heterodimeric partners of SIM1 (ARNT or ARNT2). SIM1 variants with reduced activity cosegregated with obesity in extended family studies with variable penetrance. We studied the phenotype of patients carrying variants that exhibited reduced activity in vitro. Variant carriers exhibited increased ad libitum food intake at a test meal, normal basal metabolic rate, and evidence of autonomic dysfunction. Eleven of the 13 probands had evidence of a neurobehavioral phenotype. The phenotypic similarities between patients with SIM1 deficiency and melanocortin 4 receptor (MC4R) deficiency suggest that some of the effects of SIM1 deficiency on energy homeostasis are mediated by altered melanocortin signaling.

    Funded by: Medical Research Council: G9824984, MC_U106179471, MC_U106179473, MC_U106188470; NHLBI NIH HHS: HL-102923, HL-102924, HL-102925, HL-102926, HL-103010; Wellcome Trust: 077016/Z/05/Z, 082390/Z/07/Z, 098497

    The Journal of clinical investigation 2013;123;7;3042-50

  • DeNovoGear: de novo indel and point mutation discovery and phasing.

    Ramu A, Noordam MJ, Schwartz RS, Wuster A, Hurles ME, Cartwright RA and Conrad DF

    Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA.

    We present DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis and fragment information to identify the parental origin of germ-line mutations. We used DeNovoGear on human whole-genome sequencing data to produce a set of predicted de novo insertion and/or deletion (indel) mutations with a 95% validation rate.

    Nature methods 2013

  • Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits.

    Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, Monda KL, Kilpeläinen TO, Esko T, Mägi R, Li S, Workalemahu T, Feitosa MF, Croteau-Chonka DC, Day FR, Fall T, Ferreira T, Gustafsson S, Locke AE, Mathieson I, Scherag A, Vedantam S, Wood AR, Liang L, Steinthorsdottir V, Thorleifsson G, Dermitzakis ET, Dimas AS, Karpe F, Min JL, Nicholson G, Clegg DJ, Person T, Krohn JP, Bauer S, Buechler C, Eisinger K, DIAGRAM Consortium, Bonnefond A, Froguel P, MAGIC Investigators, Hottenga JJ, Prokopenko I, Waite LL, Harris TB, Smith AV, Shuldiner AR, McArdle WL, Caulfield MJ, Munroe PB, Grönberg H, Chen YD, Li G, Beckmann JS, Johnson T, Thorsteinsdottir U, Teder-Laving M, Khaw KT, Wareham NJ, Zhao JH, Amin N, Oostra BA, Kraja AT, Province MA, Cupples LA, Heard-Costa NL, Kaprio J, Ripatti S, Surakka I, Collins FS, Saramies J, Tuomilehto J, Jula A, Salomaa V, Erdmann J, Hengstenberg C, Loley C, Schunkert H, Lamina C, Wichmann HE, Albrecht E, Gieger C, Hicks AA, Johansson A, Pramstaller PP, Kathiresan S, Speliotes EK, Penninx B, Hartikainen AL, Jarvelin MR, Gyllensten U, Boomsma DI, Campbell H, Wilson JF, Chanock SJ, Farrall M, Goel A, Medina-Gomez C, Rivadeneira F, Estrada K, Uitterlinden AG, Hofman A, Zillikens MC, den Heijer M, Kiemeney LA, Maschio A, Hall P, Tyrer J, Teumer A, Völzke H, Kovacs P, Tönjes A, Mangino M, Spector TD, Hayward C, Rudan I, Hall AS, Samani NJ, Attwood AP, Sambrook JG, Hung J, Palmer LJ, Lokki ML, Sinisalo J, Boucher G, Huikuri H, Lorentzon M, Ohlsson C, Eklund N, Eriksson JG, Barlassina C, Rivolta C, Nolte IM, Snieder H, Van der Klauw MM, Van Vliet-Ostaptchouk JV, Gejman PV, Shi J, Jacobs KB, Wang Z, Bakker SJ, Mateo Leach I, Navis G, van der Harst P, Martin NG, Medland SE, Montgomery GW, Yang J, Chasman DI, Ridker PM, Rose LM,