Sanger Institute - Publications 2016

Number of papers published in 2016: 555

  • Whole-Genome Sequencing for Routine Pathogen Surveillance in Public Health: a Population Snapshot of Invasive Staphylococcus aureus in Europe.

    Aanensen DM, Feil EJ, Holden MT, Dordel J, Yeats CA, Fedosejev A, Goater R, Castillo-Ramírez S, Corander J, Colijn C, Chlebowicz MA, Schouls L, Heck M, Pluister G, Ruimy R, Kahlmeter G, Åhman J, Matuschek E, Friedrich AW, Parkhill J, Bentley SD, Spratt BG, Grundmann H and European SRL Working Group

    Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom; The Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

    Unlabelled: The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial visualization of the data is made possible by a generic software tool designed for public health purposes that is available at the project URL. Our analysis demonstrates that high-risk clones can be identified on the basis of population level properties such as clonal relatedness, abundance, and spatial structuring and by inferring virulence and resistance properties on the basis of gene content. We also show that in silico predictions of antibiotic resistance profiles are at least as reliable as phenotypic testing. We argue that this work provides a comprehensive road map illustrating the three vital components for future molecular epidemiological surveillance: (i) large-scale structured surveys, (ii) WGS, and (iii) community-oriented database infrastructure and analysis tools.

    Importance: The spread of antibiotic-resistant bacteria is a public health emergency of global concern, threatening medical intervention at every level of health care delivery. Several recent studies have demonstrated the promise of routine whole-genome sequencing (WGS) of bacterial pathogens for epidemiological surveillance, outbreak detection, and infection control. However, as this technology becomes more widely adopted, the key challenges of generating representative national and international data sets and the development of bioinformatic tools to manage and interpret the data become increasingly pertinent. This study provides a road map for the integration of WGS data into routine pathogen surveillance. We emphasize the importance of large-scale routine surveys to provide the population context for more targeted or localized investigation and the development of open-access bioinformatic tools to provide the means to combine and compare independently generated data with publicly available data sets.

    Funded by: Medical Research Council: G1000803

    mBio 2016;7;3

  • Genomic prediction of coronary heart disease.

    Abraham G, Havulinna AS, Bhalala OG, Byars SG, De Livera AM, Yetukuri L, Tikkanen E, Perola M, Schunkert H, Sijbrands EJ, Palotie A, Samani NJ, Salomaa V, Ripatti S and Inouye M

    Centre for Systems Genomics, School of BioSciences, The University of Melbourne, Parkville, Victoria 3010, Australia; Department of Pathology, The University of Melbourne, Parkville, Victoria 3010, Australia.

    Aims: Genetics plays an important role in coronary heart disease (CHD) but the clinical utility of genomic risk scores (GRSs) relative to clinical risk scores, such as the Framingham Risk Score (FRS), is unclear. Our aim was to construct and externally validate a CHD GRS, in terms of lifetime CHD risk and relative to traditional clinical risk scores.

    Methods and results: We generated a GRS of 49 310 SNPs based on a CARDIoGRAMplusC4D Consortium meta-analysis of CHD, then independently tested it using five prospective population cohorts (three FINRISK cohorts, combined n = 12 676, 757 incident CHD events; two Framingham Heart Study cohorts (FHS), combined n = 3406, 587 incident CHD events). The GRS was associated with incident CHD (FINRISK HR = 1.74, 95% confidence interval (CI) 1.61-1.86 per S.D. of GRS; Framingham HR = 1.28, 95% CI 1.18-1.38), and was largely unchanged by adjustment for known risk factors, including family history. Integration of the GRS with the FRS or ACC/AHA13 scores improved the 10-year risk prediction (meta-analysis C-index: +1.5-1.6%, P < 0.001), particularly for individuals ≥60 years old (meta-analysis C-index: +4.6-5.1%, P < 0.001). Importantly, the GRS captured substantially different trajectories of absolute risk, with men in the top 20% of the GRS attaining 10% cumulative CHD risk 12-18 y earlier than those in the bottom 20%. High genomic risk was partially compensated for by low systolic blood pressure, low cholesterol level, and non-smoking.

    Conclusions: A GRS based on a large number of SNPs improves CHD risk prediction and encodes different trajectories of lifetime risk not captured by traditional clinical risk scores.

    European heart journal 2016
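
    The C-index quoted in the abstract above measures how often a risk score ranks the person who has the earlier event as the higher-risk one. A minimal sketch of Harrell's concordance index in Python follows. The function name and the toy follow-up data are illustrative assumptions; the abstract does not describe the study's own implementation, and this is not a reconstruction of it.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored follow-up data.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored subject cannot be the "earlier event" of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event had the higher score
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect = harrell_c_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
shuffled = harrell_c_index(toy_times, toy_events, [0.1, 0.7, 0.4, 0.9])
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a change of a few percentage points, as reported above, is a shift on that scale.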

  • αv Integrins combine with LC3 and atg5 to regulate Toll-like receptor signalling in B cells.

    Acharya M, Sokolovska A, Tam JM, Conway KL, Stefani C, Raso F, Mukhopadhyay S, Feliu M, Paul E, Savill J, Hynes RO, Xavier RJ, Vyas JM, Stuart LM and Lacy-Hulbert A

    Immunology Program, Benaroya Research Institute, 1201 Ninth Avenue, Seattle, Washington 98101, USA.

    Integrin signalling triggers cytoskeletal rearrangements, including endocytosis and exocytosis of integrins and other membrane proteins. In addition to recycling integrins, this trafficking can also regulate intracellular signalling pathways. Here we describe a role for αv integrins in regulating Toll-like receptor (TLR) signalling by modulating intracellular trafficking. We show that deletion of αv or β3 causes increased B-cell responses to TLR stimulation in vitro, and αv-conditional knockout mice have elevated antibody responses to TLR-ligand-associated antigens. αv regulates TLR signalling by promoting recruitment of the autophagy component LC3 (microtubule-associated proteins 1 light chain 3) to TLR-containing endosomes, which is essential for progression from NF-κB to IRF signalling, and ultimately for traffic to lysosomes where signalling is terminated. Disruption of LC3 recruitment leads to prolonged NF-κB signalling and increased B-cell proliferation and antibody production. This work identifies a previously unrecognized role for αv and the autophagy components LC3 and atg5 in regulating TLR signalling and B-cell immunity.

    Funded by: NIDDK NIH HHS: R01 DK093695

    Nature communications 2016;7;10917

  • G9a inhibition potentiates the anti-tumour activity of DNA double-strand break inducing agents by impairing DNA repair independent of p53 status.

    Agarwal P and Jackson SP

    The Wellcome Trust/Cancer Research UK Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge CB2 1QN, UK.

    Cancer cells often exhibit altered epigenetic signatures that can misregulate genes involved in processes such as transcription, proliferation, apoptosis and DNA repair. As regulation of chromatin structure is crucial for DNA repair processes, and both DNA repair and epigenetic controls are deregulated in many cancers, we speculated that simultaneously targeting both might provide new opportunities for cancer therapy. Here, we describe a focused screen that profiled small-molecule inhibitors targeting epigenetic regulators in combination with DNA double-strand break (DSB) inducing agents. We identify UNC0638, a catalytic inhibitor of histone lysine N-methyl-transferase G9a, as hypersensitising tumour cells to low doses of DSB-inducing agents without affecting the growth of the non-tumorigenic cells tested. Similar effects are also observed with another, structurally distinct, G9a inhibitor A-366. We also show that small-molecule inhibition of G9a or siRNA-mediated G9a depletion induces tumour cell death under low DNA damage conditions by impairing DSB repair in a p53 independent manner. Furthermore, we establish that G9a promotes DNA non-homologous end-joining in response to DSB-inducing genotoxic stress. This study thus highlights the potential for using G9a inhibitors as anti-cancer therapeutic agents in combination with DSB-inducing chemotherapeutic drugs such as etoposide.

    Cancer letters 2016;380;2;467-475

  • Sleeping Beauty screen reveals Pparg activation in metastatic prostate cancer.

    Ahmad I, Mui E, Galbraith L, Patel R, Tan EH, Salji M, Rust AG, Repiscak P, Hedley A, Markert E, Loveridge C, van der Weyden L, Edwards J, Sansom OJ, Adams DJ and Leung HY

    Cancer Research UK Beatson Institute, Bearsden, Glasgow G61 1BD, United Kingdom; Institute of Cancer Sciences, University of Glasgow, Glasgow G61 1QH, United Kingdom.

    Prostate cancer (CaP) is the most common adult male cancer in the developed world. The paucity of biomarkers to predict prostate tumor biology makes it important to identify key pathways that confer poor prognosis and guide potential targeted therapy. Using a murine forward mutagenesis screen in a Pten-null background, we identified peroxisome proliferator-activated receptor gamma (Pparg), encoding a ligand-activated transcription factor, as a promoter of metastatic CaP through activation of lipid signaling pathways, including up-regulation of lipid synthesis enzymes [fatty acid synthase (FASN), acetyl-CoA carboxylase (ACC), ATP citrate lyase (ACLY)]. Importantly, inhibition of PPARG suppressed tumor growth in vivo, with down-regulation of the lipid synthesis program. We show that elevated levels of PPARG strongly correlate with elevation of FASN in human CaP and that high levels of PPARG/FASN and PI3K/pAKT pathway activation confer a poor prognosis. These data suggest that CaP patients could be stratified in terms of PPARG/FASN and PTEN levels to identify patients with aggressive CaP who may respond favorably to PPARG/FASN inhibition.

    Proceedings of the National Academy of Sciences of the United States of America 2016;113;29;8290-5

  • Established BMI-associated genetic variants and their prospective associations with BMI and other cardiometabolic traits: The GLACIER Study.

    Ahmad S, Poveda A, Shungin D, Barroso I, Hallmans G, Renström F and Franks PW

    Department of Clinical Sciences, Genetic and Molecular Epidemiology Unit, Lund University Diabetes Center, Lund University, Malmö, Sweden.

    Background: Recent cross-sectional genome-wide scans have reported associations of 97 independent loci with body mass index (BMI). In 3541 middle-aged adult participants from the GLACIER Study, we tested whether these loci are associated with 10-year changes in BMI and other cardiometabolic traits (fasting and 2-hr glucose, triglycerides, total cholesterol, and systolic and diastolic blood pressures).

    Methods: A genetic risk score (GRS) was calculated by summing the BMI-associated effect alleles at each locus. Trait-specific cardiometabolic GRSs comprised only the loci that show nominal association (P⩽0.10) with the respective trait in the original cross-sectional study (Locke et al. Nature 2015). In longitudinal genetic association analyses, the second visit trait measure (assessed ~10 years after baseline) was used as the dependent variable and the models were adjusted for the baseline measure of the outcome trait, age, age², fasting time (for glucose and lipid traits), sex, follow-up time, and population substructure.

    Results: The BMI-specific GRS was associated with increased BMI at follow-up (β=0.014 kg/m² per allele per 10-year follow-up, s.e.=0.006, P=0.019) as were three loci (PARK2 rs13191362, P=0.005; C6orf106 rs205262, P=0.043; and C9orf93 rs4740619, P=0.005). Although not withstanding Bonferroni correction, a handful of SNPs were nominally associated with changes in blood pressure, glucose and lipid levels.

    Conclusion: Collectively, established BMI-associated loci convey modest but statistically significant time-dependent associations with long-term changes in BMI, suggesting a role for effect-modification by factors that change with time in this population.

    International Journal of Obesity accepted article preview online, 28 April 2016. doi:10.1038/ijo.2016.72.

    International journal of obesity (2005) 2016
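
    The Methods paragraph above describes an unweighted genetic risk score: the count of BMI-associated effect alleles summed over loci. A minimal sketch in Python follows. The three rsIDs are the ones named in the Results paragraph; the genotype counts and the function name are hypothetical and serve only to illustrate the arithmetic.

```python
def genetic_risk_score(effect_allele_counts):
    """Unweighted GRS: sum of effect-allele counts (0, 1 or 2) over loci."""
    for locus, count in effect_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{locus}: allele count must be 0, 1 or 2")
    return sum(effect_allele_counts.values())


# Hypothetical genotypes for one participant (not study data).
participant = {
    "rs13191362": 2,  # two copies of the effect allele
    "rs205262": 1,    # one copy
    "rs4740619": 0,   # no copies
}
score = genetic_risk_score(participant)
```

    With all 97 loci the score would range from 0 to 194, and the reported β is expressed per additional effect allele on this scale.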

  • Quantitation of next generation sequencing library preparation protocol efficiencies using droplet digital PCR assays - a systematic comparison of DNA library preparation kits for Illumina sequencing.

    Aigrain L, Gu Y and Quail MA

    Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambs, CB10 1SA, UK.

    Background: The emergence of next-generation sequencing (NGS) technologies in the past decade has allowed the democratization of DNA sequencing both in terms of price per sequenced base and ease of producing DNA libraries. When it comes to preparing DNA sequencing libraries for Illumina, the current market leader, a plethora of kits are available and it can be difficult for users to determine which kit is the most appropriate and efficient for their applications; the main concerns being not only cost but also minimal bias, yield and time efficiency.

    Results: We compared 9 commercially available library preparation kits in a systematic manner using the same DNA sample by probing the amount of DNA remaining after each protocol step using a new droplet digital PCR (ddPCR) assay. This method allows the precise quantification of fragments bearing either adaptors or P5/P7 sequences on both ends just after ligation or PCR enrichment. We also investigated the potential influence of DNA input and DNA fragment size on the final library preparation efficiency. The overall library preparation efficiencies show important variations between the different kits, with the ones combining several steps into a single one exhibiting final yields 4 to 7 times higher than the other kits. Detailed ddPCR data also reveal that the adaptor ligation yield itself varies by more than a factor of 10 between kits, certain ligation efficiencies being so low that they could impair the original library complexity and impoverish the sequencing results. When a PCR enrichment step is necessary, lower adaptor-ligated DNA inputs lead to greater amplification yields, hiding the latent disparity between kits.

    Conclusion: We describe a ddPCR assay that allows us to probe the efficiency of the most critical step in the library preparation, ligation, and to draw conclusions on which kits are more likely to preserve the sample heterogeneity and reduce the need for amplification.

    BMC genomics 2016;17;458

  • Ensembl 2017.

    Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, Billis K, Carvalho-Silva D, Cummins C, Clapham P, Gil L, Girón CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Juettemann T, Keenan S, Laird MR, Lavidas I, Maurel T, McLaren W, Moore B, Murphy DN, Nag R, Newman V, Nuhn M, Ong CK, Parker A, Patricio M, Riat HS, Sheppard D, Sparrow H, Taylor K, Thormann A, Vullo A, Walts B, Wilder SP, Zadissa A, Kostadima M, Martin FJ, Muffato M, Perry E, Ruffier M, Staines DM, Trevanion SJ, Cunningham F, Yates A, Zerbino DR and Flicek P

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Ensembl is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.

    Nucleic acids research 2016

  • FHF1 (FGF12) epileptic encephalopathy.

    Al-Mehmadi S, Splitt M, For DDD Study group, Ramesh V, DeBrosse S, Dessoffy K, Xia F, Yang Y, Rosenfeld JA, Cossette P, Michaud JL, Hamdan FF, Campeau PM, Minassian BA and For CENet Study group

    Program in Genetics and Genome Biology and Division of Neurology (S.A.-M., B.A.M.), Department of Paediatrics, The Hospital for Sick Children, and University of Toronto, Ontario, Canada; Institute of Genetic Medicine (M.S.), International Centre for Life, Pediatric Neurology (V.R.), Newcastle General Hospital, UK; Center for Human Genetics (S.D., K.D.), UH Case Medical Center, Cleveland, OH; Department of Molecular and Human Genetics (F.X., Y.Y., J.A.R.), Baylor College of Medicine, Houston, TX; Baylor Miraca Genetics Laboratories (F.X., Y.Y.), Houston, TX; The Deciphering Developmental Disorders (DDD) Study, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; Division of Neurology (P.C.), CHUM Notre-Dame, Hospital University of Montreal, Quebec, Canada; Department of Pediatrics (J.L.M., P.M.C.), Department of Neurosciences (J.L.M., P.M.C.), Université de Montréal, Québec, Canada; and CHU Sainte-Justine Research Center (J.L.M., F.A.H., P.M.C.), Montreal, Quebec, Canada.

    Voltage-gated sodium channels (Navs) are mainstays of neuronal function, and mutations in the genes encoding CNS Navs (Nav1.1 [SCN1A], Nav1.2 [SCN2A], Nav1.3 [SCN3A], and Nav1.6 [SCN8A]) are causes of some of the most common and severe genetic epilepsies and epileptic encephalopathies (EE).(1) Fibroblast-growth-factor homologous factors (FHFs) compose a family of 4 proteins that interact with the C-terminal tails of Navs to modulate the channels' fast, and long-term, inactivations.(2) FHF2 mutation is a rare cause of generalized epilepsy with febrile seizures plus (GEFS+).(3) Recently, a de novo FHF1 mutation (p.R52H) was reported in early-onset EE in 2 siblings.(4) We report 3 patients from unrelated families with the same FHF1 p.R52H mutation. The 5 cases together frame the FHF1 R52H EE from infancy to adulthood. As discussed below, this gain-of-function disease may be amenable to personalized therapy.

    Funded by: NINDS NIH HHS: U54 NS078059

    Neurology. Genetics 2016;2;6;e115

  • Mutational signatures associated with tobacco smoking in human cancer.

    Alexandrov LB, Ju YS, Haase K, Van Loo P, Martincorena I, Nik-Zainal S, Totoki Y, Fujimoto A, Nakagawa H, Shibata T, Campbell PJ, Vineis P, Phillips DH and Stratton MR

    Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, Los Alamos, NM 87545, USA.

    Tobacco smoking increases the risk of at least 17 classes of human cancer. We analyzed somatic mutations and DNA methylation in 5243 cancers of types for which tobacco smoking confers an elevated risk. Smoking is associated with increased mutation burdens of multiple distinct mutational signatures, which contribute to different extents in different cancers. One of these signatures, mainly found in cancers derived from tissues directly exposed to tobacco smoke, is attributable to misreplication of DNA damage caused by tobacco carcinogens. Others likely reflect indirect activation of DNA editing by APOBEC cytidine deaminases and of an endogenous clocklike mutational process. Smoking is associated with limited differences in methylation. The results are consistent with the proposition that smoking increases cancer risk by increasing the somatic mutation load, although direct evidence for this mechanism is lacking in some smoking-related cancer types.

    Funded by: Cancer Research UK; Department of Health; Wellcome Trust

    Science (New York, N.Y.) 2016;354;6312;618-622

  • Decreased Rate of Plasma Arginine Appearance in Murine Malaria May Explain Hypoargininemia in Children With Cerebral Malaria.

    Alkaitis MS, Wang H, Ikeda AK, Rowley CA, MacCormick IJ, Chertow JH, Billker O, Suffredini AF, Roberts DJ, Taylor TE, Seydel KB and Ackerman HC

    Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville.

    Background: Plasmodium infection depletes arginine, the substrate for nitric oxide synthesis, and impairs endothelium-dependent vasodilation. Increased conversion of arginine to ornithine by parasites or host arginase is a proposed mechanism of arginine depletion.

    Methods: We used high-performance liquid chromatography to measure plasma arginine, ornithine, and citrulline levels in Malawian children with cerebral malaria and in mice infected with Plasmodium berghei ANKA with or without the arginase gene. Heavy isotope-labeled tracers measured by quadrupole time-of-flight liquid chromatography-mass spectrometry were used to quantify the in vivo rate of appearance and interconversion of plasma arginine, ornithine, and citrulline in infected mice.

    Results: Children with cerebral malaria and P. berghei-infected mice demonstrated depletion of plasma arginine, ornithine, and citrulline. Knock out of Plasmodium arginase did not alter arginine depletion in infected mice. Metabolic tracer analysis demonstrated that plasma arginase flux was unchanged by P. berghei infection. Instead, infected mice exhibited decreased rates of plasma arginine, ornithine, and citrulline appearance and decreased conversion of plasma citrulline to arginine. Notably, plasma arginine use by nitric oxide synthase was decreased in infected mice.

    Conclusions: Simultaneous arginine and ornithine depletion in malaria parasite-infected children cannot be fully explained by plasma arginase activity. Our mouse model studies suggest that plasma arginine depletion is driven primarily by a decreased rate of appearance.

    The Journal of infectious diseases 2016;214;12;1840-1849

  • Ebola virus disease cluster — Northern Sierra Leone, January 2016

    Alpren C, Sloan M, Boegler KA, Martin DW, Ervin E, Washburn F, Rickert R, Singh T, Redd JT, Bangalie A, Bass M, Bennett SD, Boateng IA, Campbell D, Cassell C, Cotton M, Duffy N, Goodfellow I, Hersey S, Jackson EL, Jah U, Jimissa AS, Kamara AS, Kamara F, Kellam P, Levine R, Meredith L, Miller LA, Moody-Geissler S, Musoke R, Naidoo D, Ndyahikayo J, Njie G, Phan M, Rambaut A and Sesay F

    Morbidity and Mortality Weekly Report 2016;65;26;681-3

  • Dihydroartemisinin-piperaquine resistance in Plasmodium falciparum malaria in Cambodia: a multisite prospective cohort study.

    Amaratunga C, Lim P, Suon S, Sreng S, Mao S, Sopha C, Sam B, Dek D, Try V, Amato R, Blessborn D, Song L, Tullo GS, Fay MP, Anderson JM, Tarning J and Fairhurst RM

    Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, USA.

    Background: Artemisinin resistance in Plasmodium falciparum threatens to reduce the efficacy of artemisinin combination therapies (ACTs), thus compromising global efforts to eliminate malaria. Recent treatment failures with dihydroartemisinin-piperaquine, the current first-line ACT in Cambodia, suggest that piperaquine resistance may be emerging in this country. We explored the relation between artemisinin resistance and dihydroartemisinin-piperaquine failures, and sought to confirm the presence of piperaquine-resistant P falciparum infections in Cambodia.

    Methods: In this prospective cohort study, we enrolled patients aged 2-65 years with uncomplicated P falciparum malaria in three Cambodian provinces: Pursat, Preah Vihear, and Ratanakiri. Participants were given standard 3-day courses of dihydroartemisinin-piperaquine. Peripheral blood parasite densities were measured until parasites cleared and then weekly to 63 days. The primary outcome was recrudescent P falciparum parasitaemia within 63 days. We measured piperaquine plasma concentrations at baseline, 7 days, and day of recrudescence. We assessed phenotypic and genotypic markers of drug resistance in parasite isolates. The study is registered with ClinicalTrials.gov, number NCT01736319.

    Findings: Between Sept 4, 2012, and Dec 31, 2013, we enrolled 241 participants. In Pursat, where artemisinin resistance is entrenched, 37 (46%) of 81 patients had parasite recrudescence. In Preah Vihear, where artemisinin resistance is emerging, ten (16%) of 63 patients had recrudescence and in Ratanakiri, where artemisinin resistance is rare, one (2%) of 60 patients did. Patients with recrudescent P falciparum infections were more likely to have detectable piperaquine plasma concentrations at baseline compared with non-recrudescent patients, but did not differ significantly in age, initial parasite density, or piperaquine plasma concentrations at 7 days. Recrudescent parasites had a higher prevalence of kelch13 mutations, higher piperaquine 50% inhibitory concentration (IC50) values, and lower mefloquine IC50 values; none had multiple pfmdr1 copies, a genetic marker of mefloquine resistance.

    Interpretation: Dihydroartemisinin-piperaquine failures are caused by both artemisinin and piperaquine resistance, and commonly occur in places where dihydroartemisinin-piperaquine has been used in the private sector. In Cambodia, artesunate plus mefloquine may be a viable option to treat dihydroartemisinin-piperaquine failures, and a more effective first-line ACT in areas where dihydroartemisinin-piperaquine failures are common. The use of single low-dose primaquine to eliminate circulating gametocytes is needed in areas where artemisinin and ACT resistance is prevalent.

    Funding: National Institute of Allergy and Infectious Diseases.

    Funded by: Intramural NIH HHS: Z01 AI001000-01, Z01 AI001000-02; Wellcome Trust: 089275/Z/09/2

    The Lancet. Infectious diseases 2016;16;3;357-65
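
    The recrudescence percentages in the Findings paragraph above follow directly from the stated counts. The short Python check below reproduces that arithmetic; the counts are copied from the abstract, while the dictionary and function names are illustrative assumptions.

```python
# (patients with recrudescence, patients in the denominator) per province,
# as stated in the Findings paragraph.
RECRUDESCENCE = {
    "Pursat": (37, 81),
    "Preah Vihear": (10, 63),
    "Ratanakiri": (1, 60),
}


def recrudescence_percent(cases, total):
    """Return the recrudescence proportion as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * cases / total)


percentages = {
    province: recrudescence_percent(cases, total)
    for province, (cases, total) in RECRUDESCENCE.items()
}
```

    The rounded values match the 46%, 16% and 2% quoted in the abstract.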

  • Voices of biotech.

    Amit I, Baker D, Barker R, Berger B, Bertozzi C, Bhatia S, Biffi A, Demichelis F, Doudna J, Dowdy SF, Endy D, Helmstaedter M, Junca H, June C, Kamb S, Khvorova A, Kim DH, Kim JS, Krishnan Y, Lakadamyali M, Lappalainen T, Lewin S, Liao J, Loman N, Lundberg E, Lynd L, Martin C, Mellman I, Miyawaki A, Mummery C, Nelson K, Paz J, Peralta-Yahya P, Picotti P, Polyak K, Prather K, Qin J, Quake S, Regev A, Rogers JA, Shetty R, Sommer M, Stevens M, Stolovitzky G, Takahashi M, Tang F, Teichmann S, Torres-Padilla ME, Tripathi L, Vemula P, Verdine G, Vollmer F, Wang J, Ying JY, Zhang F and Zhang T

    Weizmann Institute of Science, Rehovot, Israel.

    Nature biotechnology 2016;34;3;270-5

  • The OncoArray Consortium: a Network for Understanding the Genetic Architecture of Common Cancers.

    Amos CI, Dennis J, Wang Z, Byun J, Schumacher FR, Gayther SA, Casey G, Hunter DJ, Sellers TA, Gruber SB, Dunning AM, Michailidou K, Fachal L, Doheny K, Spurdle AB, Li Y, Xiao X, Romm J, Pugh E, Coetzee GA, Hazelett DJ, Bojesen SE, Caga-Anan C, Haiman CA, Kamal A, Luccarini C, Tessier D, Vincent D, Bacot F, Van Den Berg DJ, Nelson S, Demetriades S, Goldgar DE, Couch FJ, Forman JL, Giles GG, Conti DV, Bickeböller H, Risch A, Waldenberger M, Brüske-Hohlfeld I, Hicks BD, Ling H, McGuffog L, Lee A, Kuchenbaecker K, Soucy P, Manz J, Cunningham JM, Butterbach K, Kote-Jarai Z, Kraft P, FitzGerald L, Lindstrom S, Adams M, McKay JD, Phelan CM, Benlloch S, Kelemen LE, Brennan P, Riggan M, O'Mara TA, Shen H, Shi YY, Thompson DJ, Goodman MT, Nielsen SF, Berchuck A, Laboissiere S, Schmit SL, Shelford T, Edlund CK, Taylor JA, Field JK, Park SK, Offit K, Thomassen M, Schmutzler R, Ottini L, Hung RJ, Marchini J, Amin Al Olama A, Peters U, Eeles RA, Seldin MF, Gillanders E, Seminara D, Antoniou AC, Pharoah PD, Chenevix-Trench G, Chanock SJ, Simard J and Easton DF

    Department of Biomedical Data Science, Dartmouth Geisel School of Medicine

    Background: Common cancers develop through a multistep process often including inherited susceptibility. Collaboration among multiple institutions, and funding from multiple sources, has allowed the development of an inexpensive genotyping microarray, the OncoArray. The array includes a genome-wide backbone, comprising 230,000 SNPs tagging most common genetic variants, together with dense mapping of known susceptibility regions, rare variants from sequencing experiments, pharmacogenetic markers and cancer related traits.

    Methods: The OncoArray can be genotyped using a novel technology developed by Illumina to facilitate efficient genotyping. The consortium developed standard approaches for selecting SNPs for study, for quality control of markers and for ancestry analysis. The array was genotyped at selected sites and with prespecified replicate samples to permit evaluation of genotyping accuracy among centers and by ethnic background.

    Results: The OncoArray consortium genotyped 447,705 samples. A total of 494,763 SNPs passed quality control steps with a sample success rate of 97% of the samples. Participating sites performed ancestry analysis using a common set of markers and a scoring algorithm based on principal components analysis.

    Conclusions: Results from these analyses will enable researchers to identify new susceptibility loci, perform fine mapping of new or known loci associated with either single or multiple cancers, assess the degree of overlap in cancer causation and pleiotropic effects of loci that have been identified for disease-specific risk, and jointly model genetic, environmental and lifestyle related exposures.

    Impact: Ongoing analyses will shed light on etiology and risk assessment for many types of cancer.

    Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology 2016

  • Chlamydia trachomatis from Australian Aboriginal people with trachoma are polyphyletic composed of multiple distinctive lineages.

    Andersson P, Harris SR, Seth Smith HM, Hadfield J, O'Neill C, Cutcliffe LT, Douglas FP, Asche LV, Mathews JD, Hutton SI, Sarovich DS, Tong SY, Clarke IN, Thomson NR and Giffard PM

    Global and Tropical Health Division, Menzies School of Health Research, Charles Darwin University, Darwin, Casuarina, Northern Territory 0811, Australia.

    Chlamydia trachomatis causes sexually transmitted infections and the blinding disease trachoma. Current data on C. trachomatis phylogeny show that there is only a single trachoma-causing clade, which is distinct from the lineages causing urogenital tract (UGT) and lymphogranuloma venereum diseases. Here we report the whole-genome sequences of ocular C. trachomatis isolates obtained from young children with clinical signs of trachoma in a trachoma endemic region of northern Australia. The isolates form two lineages that fall outside the classical trachoma lineage, instead being placed within UGT clades of the C. trachomatis phylogenetic tree. The Australian trachoma isolates appear to be recombinants with UGT C. trachomatis genome backbones, in which loci that encode immunodominant surface proteins (ompA and pmpEFGH) have been replaced by those characteristic of classical ocular isolates. This suggests that ocular tropism and association with trachoma are functionally associated with some sequence variants of ompA and pmpEFGH.

    Funded by: Wellcome Trust: 098051

    Nature communications 2016;7;10688

  • Notes on the implementation of FAM


    CEUR Workshop Proceedings 2016;1661;46-58

  • Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity.

    Angermueller C, Clark SJ, Lee HJ, Macaulay IC, Teng MJ, Hu TX, Krueger F, Smallwood S, Ponting CP, Voet T, Kelsey G, Stegle O and Reik W

    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.

    We report scM&T-seq, a method for parallel single-cell genome-wide methylome and transcriptome sequencing that allows for the discovery of associations between transcriptional and epigenetic variation. Profiling of 61 mouse embryonic stem cells confirmed known links between DNA methylation and transcription. Notably, the method revealed previously unrecognized associations between heterogeneously methylated distal regulatory elements and transcription of key pluripotency genes.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: MC_U137761446, MR/K011332/1; Wellcome Trust: 095645, 105031REIK, 105045

    Nature methods 2016;13;3;229-232

  • Deep learning for computational biology.

    Angermueller C, Pärnamaa T, Parts L and Stegle O

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK.

    Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background of what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists when and how to make the most use of this new technology.

    Molecular systems biology 2016;12;7;878

  • Phase variation of a Type IIG restriction-modification enzyme alters site-specific methylation patterns and gene expression in Campylobacter jejuni strain NCTC11168.

    Anjum A, Brathwaite KJ, Aidley J, Connerton PL, Cummings NJ, Parkhill J, Connerton I and Bayliss CD

    Department of Genetics, University of Leicester, Leicester LE1 7RH, UK.

    Phase-variable restriction-modification systems are a feature of a diverse range of bacterial species. Stochastic, reversible switches in expression of the methyltransferase produce variation in methylation of specific sequences. Phase-variable methylation by both Type I and Type III methyltransferases is associated with altered gene expression and phenotypic variation. One phase-variable gene of Campylobacter jejuni encodes a homologue of an unusual Type IIG restriction-modification system in which the endonuclease and methyltransferase are encoded by a single gene. Using both inhibition of restriction and PacBio-derived methylome analyses of mutants and phase-variants, the cj0031c allele in C. jejuni strain NCTC11168 was demonstrated to specifically methylate adenine in 5'CCCGA and 5'CCTGA sequences. Alterations in the levels of specific transcripts were detected using RNA-Seq in phase-variants and mutants of cj0031c but these changes did not correlate with observed differences in phenotypic behaviour. Alterations in restriction of phage growth were also associated with phase variation (PV) of cj0031c and correlated with presence of sites in the genomes of these phages. We conclude that PV of a Type IIG restriction-modification system causes changes in site-specific methylation patterns and gene expression patterns that may indirectly change adaptive traits.

    Nucleic acids research 2016;44;10;4581-94

  • Species Mash-up.

    Argimón S and Aanensen DM

    Centre for Genomic Pathogen Surveillance, The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2016;14;12;730

  • Rapid outbreak sequencing of Ebola virus in Sierra Leone identifies transmission chains linked to sporadic cases.

    Arias A, Watson SJ, Asogun D, Tobin EA, Lu J, Phan MVT, Jah U, Wadoum REG, Meredith L, Thorne L, Caddy S, Tarawalie A, Langat P, Dudas G, Faria NR, Dellicour S, Kamara A, Kargbo B, Kamara BO, Gevao S, Cooper D, Newport M, Horby P, Dunning J, Sahr F, Brooks T, Simpson AJH, Groppelli E, Liu G, Mulakken N, Rhodes K, Akpablie J, Yoti Z, Lamunu M, Vitto E, Otim P, Owilli C, Boateng I, Okoror L, Omomoh E, Oyakhilome J, Omiunu R, Yemisis I, Adomeh D, Ehikhiametalor S, Akhilomen P, Aire C, Kurth A, Cook N, Baumann J, Gabriel M, Wölfel R, Di Caro A, Carroll MW, Günther S, Redd J, Naidoo D, Pybus OG, Rambaut A, Kellam P, Goodfellow I and Cotten M

    Division of Virology, Department of Pathology, University of Cambridge, Cambridge, United Kingdom.

    To end the largest known outbreak of Ebola virus disease (EVD) in West Africa and to prevent new transmissions, rapid epidemiological tracing of cases and contacts was required. The ability to quickly identify unknown sources and chains of transmission is key to ending the EVD epidemic and of even greater importance in the context of recent reports of Ebola virus (EBOV) persistence in survivors. Phylogenetic analysis of complete EBOV genomes can provide important information on the source of any new infection. A local deep sequencing facility was established at the Mateneh Ebola Treatment Centre in central Sierra Leone. The facility included all wetlab and computational resources to rapidly process EBOV diagnostic samples into full genome sequences. We produced 554 EBOV genomes from EVD cases across Sierra Leone. These genomes provided a detailed description of EBOV evolution and facilitated phylogenetic tracking of new EVD cases. Importantly, we show that linked genomic and epidemiological data can not only support contact tracing but also identify unconventional transmission chains involving body fluids, including semen. Rapid EBOV genome sequencing, when linked to epidemiological information and a comprehensive database of virus sequences across the outbreak, provided a powerful tool for public health epidemic control efforts.

    Virus evolution 2016;2;1;vew016

  • Origin of modern syphilis and emergence of a pandemic Treponema pallidum cluster.

    Arora N, Schuenemann VJ, Jäger G, Peltzer A, Seitz A, Herbig A, Strouhal M, Grillová L, Sánchez-Busó L, Kühnert D, Bos KI, Davis LR, Mikalová L, Bruisten S, Komericki P, French P, Grant PR, Pando MA, Vaulet LG, Fermepin MR, Martinez A, Centurion Lara A, Giacani L, Norris SJ, Šmajs D, Bosshard PP, González-Candelas F, Nieselt K, Krause J and Bagheri HC

    Institute for Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland.

    The abrupt onslaught of the syphilis pandemic that started in the late fifteenth century established this devastating infectious disease as one of the most feared in human history(1). Surprisingly, despite the availability of effective antibiotic treatment since the mid-twentieth century, this bacterial infection, which is caused by Treponema pallidum subsp. pallidum (TPA), has been re-emerging globally in the last few decades with an estimated 10.6 million cases in 2008 (ref. 2). Although resistance to penicillin has not yet been identified, an increasing number of strains fail to respond to the second-line antibiotic azithromycin(3). Little is known about the genetic patterns in current infections or the evolutionary origins of the disease due to the low quantities of treponemal DNA in clinical samples and difficulties in cultivating the pathogen(4). Here, we used DNA capture and whole-genome sequencing to successfully interrogate genome-wide variation from syphilis patient specimens, combined with laboratory samples of TPA and two other subspecies. Phylogenetic comparisons based on the sequenced genomes indicate that the TPA strains examined share a common ancestor after the fifteenth century, within the early modern era. Moreover, most contemporary strains are azithromycin-resistant and are members of a globally dominant cluster, named here as SS14-Ω. The cluster diversified from a common ancestor in the mid-twentieth century subsequent to the discovery of antibiotics. Its recent phylogenetic divergence and global presence point to the emergence of a pandemic strain cluster.

    Nature microbiology 2016;2;16245

  • Trans-ethnic study design approaches for fine-mapping.

    Asimit JL, Hatzikotoulas K, McCarthy M, Morris AP and Zeggini E

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Studies that traverse ancestrally diverse populations may increase power to detect novel loci and improve fine-mapping resolution of causal variants by leveraging linkage disequilibrium differences between ethnic groups. The inclusion of African ancestry samples may yield further improvements because of low linkage disequilibrium and high genetic heterogeneity. We investigate the fine-mapping resolution of trans-ethnic fixed-effects meta-analysis for five type II diabetes loci, under various settings of ancestral composition (European, East Asian, African), allelic heterogeneity, and causal variant minor allele frequency. In particular, three settings of ancestral composition were compared: (1) single ancestry (European), (2) moderate ancestral diversity (European and East Asian), and (3) high ancestral diversity (European, East Asian, and African). Our simulations suggest that the European/Asian and European ancestry-only meta-analyses consistently attain similar fine-mapping resolution. The inclusion of African ancestry samples in the meta-analysis leads to a marked improvement in fine-mapping resolution.

    Funded by: Medical Research Council: MR/K021486/1; NIDDK NIH HHS: U01 DK085545; Wellcome Trust: 098017, 098051

    European journal of human genetics : EJHG 2016;24;9;1330-6

  • A two-stage inter-rater approach for enrichment testing of variants associated with multiple traits.

    Asimit JL, Payne F, Morris AP, Cordell HJ and Barroso I

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Shared genetic aetiology may explain the co-occurrence of diseases in individuals more often than expected by chance. On identifying associated variants shared between two traits, one objective is to determine whether such overlap may be explained by specific genomic characteristics (e.g., functional annotation). In clinical studies, inter-rater agreement approaches assess concordance among expert opinions on the presence/absence of a complex disease for each subject. We adapt a two-stage inter-rater agreement model to the genetic association setting to identify features predictive of overlap variants, while accounting for their marginal trait associations. The resulting corrected overlap and marginal enrichment test (COMET) also assesses enrichment at the individual trait level. Multiple categories may be tested simultaneously and the method is computationally efficient, not requiring permutations to assess significance. In an extensive simulation study, COMET identifies features predictive of enrichment with high power and has well-calibrated type I error. In contrast, testing for overlap with a single-trait enrichment test has inflated type I error. COMET is applied to three glycaemic traits using a set of functional annotation categories as predictors, followed by further analyses that focus on tissue-specific regulatory variants. The results support previous findings that regulatory variants in pancreatic islets are enriched for fasting glucose-associated variants, and give insight into differences/similarities between characteristics of variants associated with glycaemic traits. Also, despite regulatory variants in pancreatic islets being enriched for variants that are marginally associated with fasting glucose and fasting insulin, there is no enrichment of shared variants between the traits. European Journal of Human Genetics advance online publication, 21 December 2016; doi:10.1038/ejhg.2016.171.

    European journal of human genetics : EJHG 2016

  • The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease.

    Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA, Lambourne JJ, Sivapalaratnam S, Downes K, Kundu K, Bomba L, Berentsen K, Bradley JR, Daugherty LC, Delaneau O, Freson K, Garner SF, Grassi L, Guerrero J, Haimel M, Janssen-Megens EM, Kaan A, Kamat M, Kim B, Mandoli A, Marchini J, Martens JH, Meacham S, Megy K, O'Connell J, Petersen R, Sharifi N, Sheard SM, Staley JR, Tuna S, van der Ent M, Walter K, Wang SY, Wheeler E, Wilder SP, Iotchkova V, Moore C, Sambrook J, Stunnenberg HG, Di Angelantonio E, Kaptoge S, Kuijpers TW, Carrillo-de-Santa-Pau E, Juan D, Rico D, Valencia A, Chen L, Ge B, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yang Y, Guigo R, Beck S, Paul DS, Pastinen T, Bujold D, Bourque G, Frontini M, Danesh J, Roberts DJ, Ouwehand WH, Butterworth AS and Soranzo N

    Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Forvie Site, Robinson Way, Cambridge CB2 0SR, UK; MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge CB1 8RN, UK.

    Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.

    Funded by: British Heart Foundation: RG/09/012/28096; Department of Health: RP-PG-0310-1002, RP-PG-0310-1004; European Research Council: 268834; Medical Research Council

    Cell 2016;167;5;1415-1429.e19

  • A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes.

    Auburn S, Böhme U, Steinbiss S, Trimarsanto H, Hostetler J, Sanders M, Gao Q, Nosten F, Newbold CI, Berriman M, Price RN and Otto TD

    Global and Tropical Health Division, Menzies School of Health Research and Charles Darwin University, Darwin, Australia.

    Plasmodium vivax is now the predominant cause of malaria in the Asia-Pacific, South America and Horn of Africa. Laboratory studies of this species are constrained by the inability to maintain the parasite in continuous ex vivo culture, but genomic approaches provide an alternative and complementary avenue to investigate the parasite's biology and epidemiology. To date, molecular studies of P. vivax have relied on the Salvador-I reference genome sequence, derived from a monkey-adapted strain from South America. However, the Salvador-I reference remains highly fragmented with over 2500 unassembled scaffolds. Using high-depth Illumina sequence data, we assembled and annotated a new reference sequence, PvP01, sourced directly from a patient from Papua Indonesia. Draft assemblies of isolates from China (PvC01) and Thailand (PvT01) were also prepared for comparative purposes. The quality of the PvP01 assembly is improved greatly over Salvador-I, with fragmentation reduced to 226 scaffolds. Detailed manual curation has ensured highly comprehensive annotation, with functions attributed to 58% of core genes in PvP01 versus 38% in Salvador-I. The assemblies of PvP01, PvC01 and PvT01 are larger than that of Salvador-I (28-30 versus 27 Mb), owing to improved assembly of the subtelomeres. An extensive repertoire of over 1200 Plasmodium interspersed repeat (pir) genes was identified in PvP01 compared to 346 in Salvador-I, suggesting a vital role in parasite survival or development. The manually curated PvP01 reference and PvC01 and PvT01 draft assemblies are important new resources to study vivax malaria. PvP01 is maintained at GeneDB and ongoing curation will ensure continual improvements in assembly and annotation quality.

    Wellcome open research 2016;1;4

  • Genomic Analysis Reveals a Common Breakpoint in Amplifications of the Plasmodium vivax Multidrug Resistance 1 Locus in Thailand.

    Auburn S, Serre D, Pearson RD, Amato R, Sriprawat K, To S, Handayuni I, Suwanarusk R, Russell B, Drury E, Stalker J, Miotto O, Kwiatkowski DP, Nosten F and Price RN

    Global and Tropical Health Division, Menzies School of Health Research, Charles Darwin University, Australia.

    In regions of coendemicity for Plasmodium falciparum and Plasmodium vivax where mefloquine is used to treat P. falciparum infection, drug pressure mediated by increased copy numbers of the multidrug resistance 1 gene (pvmdr1) may select for mefloquine-resistant P. vivax. Surveillance is not undertaken routinely owing in part to methodological challenges in detection of gene amplification. Using genomic data on 88 P. vivax samples from western Thailand, we identified pvmdr1 amplification in 17 isolates, all exhibiting tandem copies of a 37.6-kilobase pair region with identical breakpoints. A novel breakpoint-specific polymerase chain reaction assay was designed to detect the amplification. The assay demonstrated high sensitivity, identifying amplifications in 13 additional, polyclonal infections. Application to 132 further samples identified the common breakpoint in all years tested (2003-2015), with a decline in prevalence after 2012 corresponding to local discontinuation of mefloquine regimens. Assessment of the structure of pvmdr1 amplification in other geographic regions will yield information about the population-specificity of the breakpoints and underlying amplification mechanisms.

    Funded by: NIAID NIH HHS: R01 AI103228; Wellcome Trust: 091625

    The Journal of infectious diseases 2016;214;8;1235-42

  • Whole-genome sequencing of multidrug-resistant Mycobacterium tuberculosis isolates from Myanmar.

    Aung HL, Tun T, Moradigaravand D, Köser CU, Nyunt WW, Aung ST, Lwin T, Thinn KK, Crump JA, Parkhill J, Peacock SJ, Cook GM and Hill PC

    Department of Microbiology and Immunology, Otago School of Medical Sciences, University of Otago, Dunedin, New Zealand; Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Auckland, New Zealand.

    Drug-resistant tuberculosis (TB) is a major health threat in Myanmar. An initial study was conducted to explore the potential utility of whole-genome sequencing (WGS) for the diagnosis and management of drug-resistant TB in Myanmar. Fourteen multidrug-resistant Mycobacterium tuberculosis isolates were sequenced. Known resistance genes for a total of nine antibiotics commonly used in the treatment of drug-susceptible and multidrug-resistant TB (MDR-TB) in Myanmar were interrogated through WGS. All 14 isolates were MDR-TB, consistent with the results of phenotypic drug susceptibility testing (DST), and the Beijing lineage predominated. Based on the results of WGS, 9 of the 14 isolates were potentially resistant to at least one of the drugs used in the standard MDR-TB regimen but for which phenotypic DST is not conducted in Myanmar. This study highlights a need for the introduction of second-line DST as part of routine TB diagnosis in Myanmar as well as new classes of TB drugs to construct effective regimens.

    Funded by: Wellcome Trust: 098600

    Journal of global antimicrobial resistance 2016;6;113-117

  • EuPathDB: the eukaryotic pathogen genomics database resource.

    Aurrecoechea C, Barreto A, Basenko EY, Brestelli J, Brunk BP, Cade S, Crouch K, Doherty R, Falke D, Fischer S, Gajria B, Harb OS, Heiges M, Hertz-Fowler C, Hu S, Iodice J, Kissinger JC, Lawrence C, Li W, Pinney DF, Pulman JA, Roos DS, Shanmugasundram A, Silva-Franco F, Steinbiss S, Stoeckert CJ, Spruill D, Wang H, Warrenfeltz S and Zheng J

    Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA.

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.

    Nucleic acids research 2016

  • An Integrated Genome-Wide Systems Genetics Screen for Breast Cancer Metastasis Susceptibility Genes.

    Bai L, Yang HH, Hu Y, Shukla A, Ha NH, Doran A, Faraji F, Goldberger N, Lee MP, Keane T and Hunter KW

    Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America.

    Metastasis remains the primary cause of patient morbidity and mortality in solid tumors and is due to the action of a large number of tumor-autonomous and non-autonomous factors. Here we report the results of a genome-wide integrated strategy to identify novel metastasis susceptibility candidate genes and molecular pathways in breast cancer metastasis. This analysis implicates a number of transcriptional regulators and suggests cell-mediated immunity is an important determinant. Moreover, the analysis identified novel or FDA-approved drugs as potentially useful for anti-metastatic therapy. Further explorations implementing this strategy may therefore provide a variety of information for clinical applications in the control and treatment of advanced neoplastic disease.

    Funded by: Intramural NIH HHS

    PLoS genetics 2016;12;4;e1005989

  • Travel- and Community-Based Transmission of Multidrug-Resistant Shigella sonnei Lineage among International Orthodox Jewish Communities.

    Baker KS, Dallman TJ, Behar A, Weill FX, Gouali M, Sobel J, Fookes M, Valinsky L, Gal-Mor O, Connor TR, Nissan I, Bertrand S, Parkhill J, Jenkins C, Cohen D and Thomson NR

    Shigellae are sensitive indicator species for studying trends in the international transmission of antimicrobial-resistant Enterobacteriaceae. Orthodox Jewish communities (OJCs) are a known risk group for shigellosis; Shigella sonnei is cyclically epidemic in OJCs in Israel, and sporadic outbreaks occur in OJCs elsewhere. We generated whole-genome sequences for 437 isolates of S. sonnei from OJCs and non-OJCs collected over 22 years in Europe (the United Kingdom, France, and Belgium), the United States, Canada, and Israel and analyzed these within a known global genomic context. Through phylogenetic and genomic analysis, we showed that strains from outbreaks in OJCs outside of Israel are distinct from strains in the general population and relate to a single multidrug-resistant sublineage of S. sonnei that prevails in Israel. Further Bayesian phylogenetic analysis showed that this strain emerged approximately 30 years ago, demonstrating the speed at which antimicrobial drug-resistant pathogens can spread widely through geographically dispersed, but internationally connected, communities.

    Emerging infectious diseases 2016;22;9;1545-53

  • Synthetic lethality between PAXX and XLF in mammalian development.

    Balmus G, Barros AC, Wijnhoven PW, Lescale C, Hasse HL, Boroviak K, le Sage C, Doe B, Speak AO, Galli A, Jacobsen M, Deriano L, Adams DJ, Blackford AN and Jackson SP

    Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, United Kingdom.

    PAXX was identified recently as a novel nonhomologous end-joining DNA repair factor in human cells. To characterize its physiological roles, we generated Paxx-deficient mice. Like Xlf(-/-) mice, Paxx(-/-) mice are viable, grow normally, and are fertile but show mild radiosensitivity. Strikingly, while Paxx loss is epistatic with Ku80, Lig4, and Atm deficiency, Paxx/Xlf double-knockout mice display embryonic lethality associated with genomic instability, cell death in the central nervous system, and an almost complete block in lymphogenesis, phenotypes that closely resemble those of Xrcc4(-/-) and Lig4(-/-) mice. Thus, combined loss of Paxx and Xlf is synthetic-lethal in mammals.

    Genes & development 2016;30;19;2152-2157

  • Dawning of the age of genomics for platelet granule disorders: improving insight, diagnosis and management.

    Bariana TK, Ouwehand WH, Guerrero JA, Gomez K and BRIDGE Bleeding, Thrombotic and Platelet Disorders and ThromboGenomics Consortia

    Katharine Dormandy Haemophilia Centre and Thrombosis Unit, Royal Free London NHS Foundation Trust, London, UK.

    Inherited disorders of platelet granules are clinically heterogeneous and their prevalence is underestimated because most patients do not undergo a complete diagnostic work-up. The lack of a genetic diagnosis limits the ability to tailor management, screen family members, aid with family planning, predict clinical progression and detect serious consequences, such as myelofibrosis, lung fibrosis and malignancy, in a timely manner. This is set to change with the introduction of high throughput sequencing (HTS) as a routine clinical diagnostic test. HTS diagnostic tests are now available, affordable and allow parallel screening of DNA samples for variants in all of the 80 known bleeding, thrombotic and platelet genes. Increased genetic diagnosis and curation of variants is, in turn, improving our understanding of the pathobiology and clinical course of inherited platelet disorders. Our understanding of the genetic causes of platelet granule disorders and the regulation of granule biogenesis is a work in progress and has been significantly enhanced by recent genomic discoveries from high-powered genome-wide association studies and genome sequencing projects. In the era of whole genome and epigenome sequencing, new strategies are required to integrate multiple sources of big data in the search for elusive, novel genes underlying granule disorders.

    British journal of haematology 2016

  • The TraDIS toolkit: sequencing and analysis for dense transposon mutant libraries.

    Barquist L, Mayho M, Cummins C, Cain AK, Boinett CJ, Page AJ, Langridge GC, Quail MA, Keane JA and Parkhill J

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK and Institute for Molecular Infection Biology, University of Würzburg, Würzburg D-97080, Germany.

    Transposon insertion sequencing is a high-throughput technique for assaying large libraries of otherwise isogenic transposon mutants providing insight into gene essentiality, gene function and genetic interactions. We previously developed the Transposon Directed Insertion Sequencing (TraDIS) protocol for this purpose, which utilizes shearing of genomic DNA followed by specific PCR amplification of transposon-containing fragments and Illumina sequencing. Here we describe an optimized high-yield library preparation and sequencing protocol for TraDIS experiments and a novel software pipeline for analysis of the resulting data. The Bio-Tradis analysis pipeline is implemented as an extensible Perl library which can either be used as is, or as a basis for the development of more advanced analysis tools. This article can serve as a general reference for the application of the TraDIS methodology.

    Availability and implementation: The optimized sequencing protocol is included as supplementary information. The Bio-Tradis analysis pipeline is available under a GPL license at


    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Medical Research Council: G1100100

    Bioinformatics (Oxford, England) 2016;32;7;1109-11

  • The need for an integrated approach for chronic disease research and care in Africa

    Barr AL, Young EH, Smeeth L, Newton R, Seeley J, Ripullone K, Hird TR, Thornton JRM, Nyirenda MJ, Kapiga S, Adebamowo CA, Amoah AG, Wareham N, Rotimi CN, Levitt NS, Ramaiya K, Hennig BJ, Mbanya JC, Tollman S, Motala AA, Kaleebu P and Sandhu MS

    Global Health, Epidemiology and Genomics 2016;1;e19

  • Genome-wide association studies of quantitative glycaemic traits

    Barroso I and Scott R

    The Genetics of Type 2 Diabetes and Related Traits: Biology, Physiology and Translation 2016;63-89

  • SeqTools: visual tools for manual analysis of sequence alignments.

    Barson G and Griffiths E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Background: Manual annotation is essential to create high-quality reference alignments and annotation. Annotators need to be able to view sequence alignments in detail. The SeqTools package provides three tools for viewing different types of sequence alignment: Blixem is a many-to-one browser of pairwise alignments, displaying multiple match sequences aligned against a single reference sequence; Dotter provides a graphical dot-plot view of a single pairwise alignment; and Belvu is a multiple sequence alignment viewer, editor, and phylogenetic tool. These tools were originally part of the AceDB genome database system but have been completely rewritten to make them generally available as a standalone package of greatly improved function.

    Findings: Blixem is used by annotators to give a detailed view of the evidence for particular gene models. Blixem displays the gene model positions and the match sequences aligned against the genomic reference sequence. Annotators use this for many reasons, including to check the quality of an alignment, to find missing/misaligned sequence and to identify splice sites and polyA sites and signals. Dotter is used to give a dot-plot representation of a particular pairwise alignment. This is used to identify sequence that is not represented (or is misrepresented) and to quickly compare annotated gene models with transcriptional and protein evidence that putatively supports them. Belvu is used to analyse conservation patterns in multiple sequence alignments and to perform a combination of manual and automatic processing of the alignment. High-quality reference alignments are essential if they are to be used as a starting point for further automatic alignment generation.

    Conclusions: While there are many different alignment tools available, the SeqTools package provides unique functionality that annotators have found to be essential for analysing sequence alignments as part of the manual annotation process.

    Funded by: NHGRI NIH HHS: 5U54HG00455-04; Wellcome Trust: 098051

    BMC research notes 2016;9;39
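
    As a rough illustration of the dot-plot idea behind a tool like Dotter, the Python sketch below marks every position pair at which a short window of two sequences matches exactly. It is not Dotter's actual algorithm, which the abstract does not describe; the function names `dot_plot` and `render`, the window size, and the exact-match rule are all assumptions made for illustration.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return the set of (i, j) pairs where the window starting at
    seq_a[i] exactly equals the window starting at seq_b[j].

    Hypothetical helper: exact matching over a fixed window is an
    assumption, not the scoring scheme used by Dotter.
    """
    # Index every window of seq_b by its content for fast lookup.
    positions = {}
    for j in range(len(seq_b) - window + 1):
        positions.setdefault(seq_b[j:j + window], []).append(j)

    # Look up each window of seq_a and record the matching coordinates.
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in positions.get(seq_a[i:i + window], []):
            dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the dot plot as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq_b))))
    return "\n".join(rows)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGTTCGT"
    print(render(a, b, dot_plot(a, b)))
```

    In such a plot, runs of matches along a diagonal indicate aligned stretches, and breaks or offsets in the diagonal point to mismatched, missing or rearranged sequence.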

  • The accessory genome of Shiga toxin-producing Escherichia coli (STEC) defines a persistent colonization type in cattle.

    Barth SA, Menge C, Eichhorn I, Semmler T, Wieler LH, Pickard D, Belka A, Berens C and Geue L

    Friedrich-Loeffler-Institut/Federal Research Institute for Animal Health, Institute of Molecular Pathogenesis, Naumburger Str. 96a, 07743 Jena, Germany.

    Shiga toxin-producing Escherichia coli (STEC) strains can colonize cattle for several months and may, thus, serve as a gene reservoir for the genesis of highly virulent zoonotic enterohemorrhagic E. coli (EHEC). Attempts to reduce the human risk for acquiring EHEC infections should include strategies to control such STEC strains persisting in cattle. We therefore aimed to identify genetic patterns associated with the STEC colonization type in the bovine host. We included 88 persistent (STEC(per), shedding ≥ 4 months) and 74 sporadically colonizing STEC (STEC(spo), shedding ≤ 2 months) isolates from cattle and 16 bovine STEC isolates with unknown colonization type. Genoserotype and MLST were determined and the isolates probed with a DNA microarray for virulence-associated genes (VAGs). All STEC(per) belonged to only four genoserotypes (O26:H11, O156:H25, O165:H25, O182:H25) which formed three genetic clusters (ST21/396/1705, ST300/688, ST119). In contrast, STEC(spo) were scattered among 28 genoserotypes and 30 MLST types with O157:H7 (ST11) and O6:H49 (ST1079) being the most prevalent. The microarray analysis identified 139 unique gene patterns that clustered with the genoserotypes and MLST types of the strains. While the STEC(per) isolates possessed heterogeneous phylogenetic backgrounds, the accessory genome clustered these isolates together, separating them from STEC(spo). Given the vast genetic heterogeneity of bovine STEC strains, defining genetic patterns distinguishing STEC(per) from STEC(spo) will facilitate the targeted design of new intervention strategies counteracting these zoonotic pathogens at farm level.

    Importance: Ruminants, especially cattle, are sources of food-borne infections in humans by Shiga toxin-producing Escherichia coli (STEC). Some STEC persist in cattle for longer periods of time, while others are detected only sporadically. Persisting strains can serve as gene reservoirs that supply E. coli with virulence factors, thereby generating new outbreak strains. Attempts to reduce the human risk for acquiring STEC infections should therefore include strategies to control such persisting STEC strains. By analyzing representative genes of their core and accessory genomes, we show that bovine STEC with a persistent colonization type emerged independently from sporadically colonizing isolates and evolved in parallel evolutionary branches. However, persistently colonizing strains share similar sets of accessory genes. Defining genetic patterns that distinguish persistent from sporadically colonizing STEC isolates will facilitate the targeted design of new intervention strategies to counteract these zoonotic pathogens at farm level.

    Applied and environmental microbiology 2016

  • Eye on the B-ALL: B-cell receptor repertoires reveal persistence of numerous B-lymphoblastic leukemia subclones from diagnosis to relapse.

    Bashford-Rogers RJ, Nicolaou KA, Bartram J, Goulden NJ, Loizou L, Koumas L, Chi J, Hubank M, Kellam P, Costeas PA and Vassiliou GS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The strongest predictor of relapse in B-cell acute lymphoblastic leukemia (B-ALL) is the level of persistence of tumor cells after initial therapy. The high mutation rate of the B-cell receptor (BCR) locus allows high-resolution tracking of the architecture, evolution and clonal dynamics of B-ALL. Using longitudinal BCR repertoire sequencing, we find that the BCR undergoes an unexpectedly high level of clonal diversification in B-ALL cells through both somatic hypermutation and secondary rearrangements, which can be used for tracking the subclonal composition of the disease and detecting minimal residual disease with unprecedented sensitivity. We go on to investigate clonal dynamics of B-ALL using BCR phylogenetic analyses of paired diagnosis-relapse samples and find that large numbers of small leukemic subclones present at diagnosis re-emerge at relapse alongside a dominant clone. Our findings suggest that in all informative relapsed patients, the survival of large numbers of clonogenic cells beyond initial chemotherapy is a surrogate for inherent partial chemoresistance or inadequate therapy, providing an increased opportunity for subsequent emergence of fully resistant clones. These results frame early cytoreduction as an important determinant of long-term outcome.

    Leukemia 2016;30;12;2312-2321

  • Dynamic variation of CD5 surface expression levels within individual chronic lymphocytic leukaemia clones.

    Bashford-Rogers RJ, Palser AL, Hodkinson C, Baxter J, Follows GA, Vassiliou GS and Kellam P

    Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Hills Road, Cambridge, CB2 0SP, UK.

    Chronic lymphocytic leukaemia (CLL) is characterised by the accumulation of clonally-derived mature CD5(high) B-cells; however, the cellular origin of CLL is still unknown. Patients with CLL also harbour variable numbers of CD5(low) B-cells, but the clonal relationship of these cells to the bulk disease is unknown and can have important implications for monitoring, treating and understanding the biology of CLL. Here we use B-cell receptors (BCRs) as molecular barcodes to first show, by single-cell BCR sequencing, that the great majority of CD5(low) B-cells in the blood of CLL patients are clonally related to CD5(high) CLL B-cells. We investigate whether CD5 state-switching is likely to occur continuously (common event) or as a rare event in CLL, by tracking somatic BCR mutations in bulk CLL B-cells and using them to reconstruct the phylogenetic relationships and evolutionary history of the CLL in each of four patients. Using statistical methods we show that there is no parsimonious route from a single or low number of CD5(low) switch events to the CD5(high) population, but rather large-scale and/or dynamic switching between these CD5 states is the most likely explanation. The overlapping BCR repertoires between CD5(high) and CD5(low) cells from CLL patient peripheral blood reveal that CLLs exist in a continuum of CD5 expression. The major proportion of CD5(low) B-cells in patients are leukemic, thus identifying CD5(low) B-cells as an important component of CLL, with implications for CLL pathogenesis, clinical monitoring and the development of anti-CD5-directed therapies.

    Experimental hematology 2016

  • Six-Year Incidence of Blindness and Visual Impairment in Kenya: The Nakuru Eye Disease Cohort Study.

    Bastawrous A, Mathenge W, Wing K, Rono H, Gichangi M, Weiss HA, Macleod D, Foster A, Burton MJ and Kuper H

    International Centre for Eye Health, Clinical Research Department, London School of Hygiene and Tropical Medicine, London, United Kingdom.

    Purpose: To describe the cumulative 6-year incidence of visual impairment (VI) and blindness in an adult Kenyan population. The Nakuru Posterior Segment Eye Disease Study is a population-based sample of 4414 participants aged ≥50 years, enrolled in 2007-2008. Of these, 2170 (50%) were reexamined in 2013-2014.

    Methods: The World Health Organization (WHO) and US definitions were used to calculate presenting visual acuity classifications based on logMAR visual acuity tests at baseline and follow-up. Detailed ophthalmic and anthropometric examinations as well as a questionnaire, which included past medical and ophthalmic history, were used to assess risk factors for study participation and vision loss. Cumulative incidence of VI and blindness, and factors associated with these outcomes, were estimated. Inverse probability weighting was used to adjust for nonparticipation.

    Results: Visual acuity measurements were available for 2164 (99.7%) participants. Using WHO definitions, the 6-year cumulative incidence of VI was 11.9% (95%CI [confidence interval]: 10.3-13.8%) and blindness was 1.51% (95%CI: 1.0-2.2%); using the US classification, the cumulative incidence of blindness was 2.70% (95%CI: 1.8-3.2%). Incidence of VI increased strongly with older age, and independently with being diabetic. There are an estimated 21 new cases of VI per year in people aged ≥50 years per 1000 people, of whom 3 are blind. Therefore in Kenya we estimate that there are 92,000 new cases of VI in people aged ≥50 years per year, of whom 11,600 are blind, out of a total population of approximately 4.3 million people aged 50 and above.

    Conclusions: The incidence of VI and blindness in this older Kenyan population was considerably higher than in comparable studies worldwide. A continued effort to strengthen the eye health system is necessary to support the growing unmet need in an aging and growing population.

    Investigative ophthalmology & visual science 2016;57;14;5974-5983
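
    The inverse probability weighting mentioned in the Methods can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the study's analysis: the function name `ipw_cumulative_incidence`, the record layout, and the participation probabilities are hypothetical, and the abstract does not say how those probabilities were modelled. Each re-examined participant is weighted by the inverse of their estimated probability of taking part in follow-up, so that people resembling those lost to follow-up count for more.

```python
def ipw_cumulative_incidence(records):
    """Weighted cumulative incidence among followed-up participants.

    records: iterable of (had_event, participation_probability) pairs,
    one per re-examined participant. The probabilities are assumed to
    come from a separate model of who returned for follow-up.
    """
    weighted_events = 0.0
    total_weight = 0.0
    for had_event, probability in records:
        if not 0.0 < probability <= 1.0:
            raise ValueError("participation probability must be in (0, 1]")
        weight = 1.0 / probability  # inverse probability weight
        weighted_events += weight * (1.0 if had_event else 0.0)
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no records supplied")
    # Normalising by the sum of weights keeps the estimate within [0, 1].
    return weighted_events / total_weight


if __name__ == "__main__":
    # Made-up data: one case from a group with only 50% follow-up,
    # two non-cases from a group with complete follow-up.
    sample = [(True, 0.5), (False, 1.0), (False, 1.0)]
    print(ipw_cumulative_incidence(sample))
```

    With these made-up records the unweighted incidence would be one in three, whereas the weighted estimate is higher because the single case stands in for an under-represented group.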

  • Circulation of multiple genotypes of H1N2 viruses in a swine farm in Italy over a two-month period.

    Beato MS, Tassoni L, Milani A, Salviato A, Di Martino G, Mion M, Bonfanti L, Monne I, Watson SJ and Fusaro A

    Istituto Zooprofilattico Sperimentale delle Venezie, Legnaro, PD, Italy.

    Starting in August 2012, repeated respiratory outbreaks caused by swine influenza A virus (swIAV) were registered over a whole year in a breeding farm in northeast Italy that supplied piglets for fattening. The virus initially characterized in the farm was a reassortant Eurasian avian-like H1N1 (H1avN1) genotype, containing a haemagglutinin segment derived from the pandemic H1N1 (A(H1N1)pdm09) lineage. To control infection, a vaccination program using vaccines against the A(H1N1)pdm09, human-like H1N2 (H1huN2), human-like H3N2 (H3N2), and H1avN1 viruses was implemented in sows in November 2013. Vaccine efficacy was assessed by sampling nasal swabs for two months in 35-75 day-old piglets born from vaccinated sows. Complete genome sequencing of eight swIAV-positive nasal swabs collected longitudinally from piglets after the implementation of the vaccination program was conducted to investigate the virus characteristics. Over the two-month period, two different genotypes involving multiple reassortment events were detected. The unexpected circulation of multiple reassortant genotypes in such a short time highlights the complexity of the genetic diversity of swIAV and the need for a better surveillance plan, based on the combination of clinical signs, epidemiological data and whole genome characterization.

    Veterinary microbiology 2016;195;25-29

  • Mutational signatures of ionizing radiation in second malignancies.

    Behjati S, Gundem G, Wedge DC, Roberts ND, Tarpey PS, Cooke SL, Van Loo P, Alexandrov LB, Ramakrishna M, Davies H, Nik-Zainal S, Hardy C, Latimer C, Raine KM, Stebbings L, Menzies A, Jones D, Shepherd R, Butler AP, Teague JW, Jorgensen M, Khatri B, Pillay N, Shlien A, Futreal PA, Badie C, ICGC Prostate Group, McDermott U, Bova GS, Richardson AL, Flanagan AM, Stratton MR and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA UK.

    Ionizing radiation is a potent carcinogen, inducing cancer through DNA damage. The signatures of mutations arising in human tissues following in vivo exposure to ionizing radiation have not been documented. Here, we searched for signatures of ionizing radiation in 12 radiation-associated second malignancies of different tumour types. Two signatures of somatic mutation characterize ionizing radiation exposure irrespective of tumour type. Compared with 319 radiation-naive tumours, radiation-associated tumours carry a median extra 201 deletions genome-wide, sized 1-100 base pairs often with microhomology at the junction. Unlike deletions of radiation-naive tumours, these show no variation in density across the genome or correlation with sequence context, replication timing or chromatin structure. Furthermore, we observe a significant increase in balanced inversions in radiation-associated tumours. Both small deletions and inversions generate driver mutations. Thus, ionizing radiation generates distinctive mutational signatures that explain its carcinogenic potential.

    Nature communications 2016;7;12605

  • FINEMAP: efficient variable selection using summary data from genome-wide association studies.

    Benner C, Spencer CC, Havulinna AS, Salomaa V, Ripatti S and Pirinen M

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland, Department of Public Health, University of Helsinki, Helsinki, Finland.

    Motivation: The goal of fine-mapping in genomic regions associated with complex diseases and traits is to identify causal variants that point to molecular mechanisms behind the associations. Recent fine-mapping methods using summary data from genome-wide association studies rely on exhaustive search through all possible causal configurations, which is computationally expensive.

    Results: We introduce FINEMAP, a software package to efficiently explore a set of the most important causal configurations of the region via a shotgun stochastic search algorithm. We show that FINEMAP produces accurate results in a fraction of processing time of existing approaches and is therefore a promising tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing projects.

    Availability and implementation: FINEMAP v1.0 is freely available for Mac OS X and Linux at

    Bioinformatics (Oxford, England) 2016;32;10;1493-501

  • Deep Roots for Aboriginal Australian Y Chromosomes.

    Bergström A, Nagle N, Chen Y, McCarthy S, Pollard MO, Ayub Q, Wilcox S, Wilcox L, van Oorschot RA, McAllister P, Williams L, Xue Y, Mitchell RJ and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Australia was one of the earliest regions outside Africa to be colonized by fully modern humans, with archaeological evidence for human presence by 47,000 years ago (47 kya) widely accepted [1, 2]. However, the extent of subsequent human entry before the European colonial age is less clear. The dingo reached Australia about 4 kya, indirectly implying human contact, which some have linked to changes in language and stone tool technology to suggest substantial cultural changes at the same time [3]. Genetic data of two kinds have been proposed to support gene flow from the Indian subcontinent to Australia at this time, as well: first, signs of South Asian admixture in Aboriginal Australian genomes have been reported on the basis of genome-wide SNP data [4]; and second, a Y chromosome lineage designated haplogroup C(∗), present in both India and Australia, was estimated to have a most recent common ancestor around 5 kya and to have entered Australia from India [5]. Here, we sequence 13 Aboriginal Australian Y chromosomes to re-investigate their divergence times from Y chromosomes in other continents, including a comparison of Aboriginal Australian and South Asian haplogroup C chromosomes. We find divergence times dating back to ∼50 kya, thus excluding the Y chromosome as providing evidence for recent gene flow from India into Australia.

    Funded by: Wellcome Trust: 098051

    Current biology : CB 2016;26;6;809-13

  • Optimized inducible shRNA and CRISPR/Cas9 platforms for in vitro studies of human development using hPSCs.

    Bertero A, Pawlowski M, Ortmann D, Snijders K, Yiangou L, Cardoso de Brito M, Brown S, Bernard WG, Cooper JD, Giacomelli E, Gambardella L, Hannan NR, Iyer D, Sampaziotis F, Serrano F, Zonneveld MC, Sinha S, Kotter M and Vallier L

    Wellcome Trust-MRC Stem Cell Institute, Anne McLaren Laboratory, University of Cambridge, Cambridge, CB2 0SZ, UK

    Inducible loss of gene function experiments are necessary to uncover mechanisms underlying development, physiology and disease. However, current methods are complex, lack robustness and do not work in multiple cell types. Here we address these limitations by developing single-step optimized inducible gene knockdown or knockout (sOPTiKD or sOPTiKO) platforms. These are based on genetic engineering of human genomic safe harbors combined with an improved tetracycline-inducible system and CRISPR/Cas9 technology. We exemplify the efficacy of these methods in human pluripotent stem cells (hPSCs), and show that generation of sOPTiKD/KO hPSCs is simple, rapid and allows tightly controlled individual or multiplexed gene knockdown or knockout in hPSCs and in a wide variety of differentiated cells. Finally, we illustrate the general applicability of this approach by investigating the function of transcription factors (OCT4 and T), cell cycle regulators (cyclin D family members) and epigenetic modifiers (DPY30). Overall, sOPTiKD and sOPTiKO provide a unique opportunity for functional analyses in multiple cell types relevant for the study of human development.

    Funded by: British Heart Foundation: FS/13/29/30024; European Research Council: 281335; Medical Research Council: MC_PC_12009, MR/L016761/1

    Development (Cambridge, England) 2016;143;23;4405-4418

  • A detailed clinical analysis of 13 patients with AUTS2 syndrome further delineates the phenotypic spectrum and underscores the behavioural phenotype.

    Beunders G, van de Kamp J, Vasudevan P, Morton J, Smets K, Kleefstra T, de Munnik SA, Schuurs-Hoeijmakers J, Ceulemans B, Zollino M, Hoffjan S, Wieczorek S, So J, Mercer L, Walker T, Velsher L, DDD study, Parker MJ, Magee AC, Elffers B, Kooy RF, Yntema HG, Meijers-Heijboer EJ and Sistermans EA

    Department of Clinical Genetics, VU University Medical Center Amsterdam, The Netherlands.

    Background: AUTS2 syndrome is an 'intellectual disability (ID) syndrome' caused by genomic rearrangements, deletions, intragenic duplications or mutations disrupting AUTS2. So far, 50 patients with AUTS2 syndrome have been described, but clinical data are limited and almost all cases involved young children.

    Methods: We present a detailed clinical description of 13 patients (including six adults) with AUTS2 syndrome who have a pathogenic mutation or deletion in AUTS2. All patients were systematically evaluated by the same clinical geneticist.

    Results: All patients have borderline to severe ID/developmental delay; 83-100% have microcephaly and feeding difficulties. Congenital malformations are rare, but mild heart defects, contractures and genital malformations do occur. There are no major health issues in the adults, the oldest of whom is now 59 years of age. Behaviour is marked by friendly, outgoing social interaction. Specific features of autism (like obsessive behaviour) are seen frequently (83%), but classical autism was not diagnosed in any. A mild clinical phenotype is associated with small in-frame 5' deletions, which are often inherited. Deletions and other mutations causing haploinsufficiency of the full-length AUTS2 transcript give a more severe phenotype and occur de novo.

    Conclusions: The 13 patients with AUTS2 syndrome with unique pathogenic deletions scattered around the AUTS2 locus confirm a phenotype-genotype correlation. Despite individual variations, AUTS2 syndrome emerges as a specific ID syndrome with microcephaly, feeding difficulties, dysmorphic features and a specific behavioural phenotype.

    Journal of medical genetics 2016;53;8;523-32

  • Sperm Meets Egg: The Genetics of Mammalian Fertilization.

    Bianchi E and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, United Kingdom.

    Fertilization is the culminating event of sexual reproduction, which involves the union of the sperm and egg to form a single, genetically distinct organism. Despite the fundamental role of fertilization, the basic mechanisms involved have remained poorly understood. However, these mechanisms must involve an ordered schedule of cellular recognition events between the sperm and egg to ensure successful fusion. In this article, we review recent progress in our molecular understanding of mammalian fertilization, highlighting the areas in which genetic approaches have been particularly informative and focusing especially on the roles of secreted and cell surface proteins, expressed in a sex-specific manner, that mediate sperm-egg interactions. We discuss how the sperm interacts with the female reproductive tract, zona pellucida, and the oolemma. Finally, we review recent progress made in elucidating the mechanisms that reduce polyspermy and ensure that eggs normally fuse with only a single sperm. Expected final online publication date for the Annual Review of Genetics Volume 50 is November 23, 2016.

    Annual review of genetics 2016

  • Interferon-driven alterations of the host's amino acid metabolism in the pathogenesis of typhoid fever.

    Blohmke CJ, Darton TC, Jones C, Suarez NM, Waddington CS, Angus B, Zhou L, Hill J, Clare S, Kane L, Mukhopadhyay S, Schreiber F, Duque-Correa MA, Wright JC, Roumeliotis TI, Yu L, Choudhary JS, Mejias A, Ramilo O, Shanyinde M, Sztein MB, Kingsley RA, Lockhart S, Levine MM, Lynn DJ, Dougan G and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, University of Oxford and the NIHR Oxford Biomedical Research Centre, Oxford OX3 7LE, England, UK

    Enteric fever, caused by Salmonella enterica serovar Typhi, is an important public health problem in resource-limited settings and, despite decades of research, human responses to the infection are poorly understood. In 41 healthy adults experimentally infected with wild-type S. Typhi, we detected significant cytokine responses within 12 h of bacterial ingestion. These early responses did not correlate with subsequent clinical disease outcomes and likely indicate initial host-pathogen interactions in the gut mucosa. In participants developing enteric fever after oral infection, marked transcriptional and cytokine responses during acute disease reflected dominant type I/II interferon signatures, which were significantly associated with bacteremia. Using a murine and macrophage infection model, we validated the pivotal role of this response in the expression of proteins of the host tryptophan metabolism during Salmonella infection. Corresponding alterations in tryptophan catabolites with immunomodulatory properties in serum of participants with typhoid fever confirmed the activity of this pathway, and implicate a central role of host tryptophan metabolism in the pathogenesis of typhoid fever.

    Funded by: Medical Research Council: MR/M02637X/1; NIAID NIH HHS: R01 AI036525, U01 AI082210, U19 AI057234, U19 AI082655, U19 AI089987, U19 AI109776

    The Journal of experimental medicine 2016;213;6;1061-77

  • Tissue-specific mutation accumulation in human adult stem cells during life.

    Blokzijl F, de Ligt J, Jager M, Sasselli V, Roerink S, Sasaki N, Huch M, Boymans S, Kuijk E, Prins P, Nijman IJ, Martincorena I, Mokry M, Wiegerinck CL, Middendorp S, Sato T, Schwank G, Nieuwenhuis EE, Verstegen MM, van der Laan LJ, de Jonge J, IJzermans JN, Vries RG, van de Wetering M, Stratton MR, Clevers H, Cuppen E and van Boxtel R

    Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands.

    The gradual accumulation of genetic mutations in human adult stem cells (ASCs) during life is associated with various age-related diseases, including cancer. Extreme variation in cancer risk across tissues was recently proposed to depend on the lifetime number of ASC divisions, owing to unavoidable random mutations that arise during DNA replication. However, the rates and patterns of mutations in normal ASCs remain unknown. Here we determine genome-wide mutation patterns in ASCs of the small intestine, colon and liver of human donors with ages ranging from 3 to 87 years by sequencing clonal organoid cultures derived from primary multipotent cells. Our results show that mutations accumulate steadily over time in all of the assessed tissue types, at a rate of approximately 40 novel mutations per year, despite the large variation in cancer incidence among these tissues. Liver ASCs, however, have different mutation spectra compared to those of the colon and small intestine. Mutational signature analysis reveals that this difference can be attributed to spontaneous deamination of methylated cytosine residues in the colon and small intestine, probably reflecting their high ASC division rate. In liver, a signature with an as-yet-unknown underlying mechanism is predominant. Mutation spectra of driver genes in cancer show high similarity to the tissue-specific ASC mutation spectra, suggesting that intrinsic mutational processes in ASCs can initiate tumorigenesis. Notably, the inter-individual variation in mutation rate and spectra are low, suggesting tissue-specific activity of common mutational processes throughout life.

    Funded by: Worldwide Cancer Research: 16-0193

    Nature 2016;538;7624;260-264

  • Characterization of Two Distinct Nucleosome Remodeling and Deacetylase (NuRD) Complex Assemblies in Embryonic Stem Cells.

    Bode D, Yu L, Tate P, Pardo M and Choudhary J

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Pluripotency and self-renewal, the defining properties of embryonic stem cells, are brought about by transcriptional programs involving an intricate network of transcription factors and chromatin remodeling complexes. The Nucleosome Remodeling and Deacetylase (NuRD) complex plays a crucial and dynamic role in the regulation of stemness and differentiation. Several NuRD-associated factors have been reported but how they are organized has not been investigated in detail. Here, we have combined affinity purification and blue native polyacrylamide gel electrophoresis followed by protein identification by mass spectrometry and protein correlation profiling to characterize the topology of the NuRD complex. Our data show that in mouse embryonic stem cells the NuRD complex is present as two distinct assemblies of differing topology with different binding partners. Cell cycle regulator Cdk2ap1 and transcription factor Sall4 associate only with the higher mass NuRD assembly. We further establish that only isoform Sall4a, and not Sall4b, associates with NuRD. By contrast, Suz12, a component of the PRC2 Polycomb repressor complex, associates with the lower mass entity. In addition, we identify and validate a novel NuRD-associated protein, Wdr5, a regulatory subunit of the MLL histone methyltransferase complex, which associates with both NuRD entities. Bioinformatic analyses of published target gene sets of these chromatin binding proteins are in agreement with these structural observations. In summary, this study provides an interesting insight into mechanistic aspects of NuRD function in stem cell biology. The relevance of our work has broader implications because of the ubiquitous nature of the NuRD complex. The strategy described here can be more broadly applicable to investigate the topology of the multiple complexes an individual protein can participate in.

    Funded by: Wellcome Trust: WT098051

    Molecular & cellular proteomics : MCP 2016;15;3;878-91

  • Mouse model of chromosome mosaicism reveals lineage-specific depletion of aneuploid cells and normal developmental potential.

    Bolton H, Graham SJ, Van der Aa N, Kumar P, Theunis K, Fernandez Gallardo E, Voet T and Zernicka-Goetz M

    Department of Physiology, Development and Neuroscience and Gurdon Institute, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK.

    Most human pre-implantation embryos are mosaics of euploid and aneuploid cells. To determine the fate of aneuploid cells and the developmental potential of mosaic embryos, here we generate a mouse model of chromosome mosaicism. By treating embryos with a spindle assembly checkpoint inhibitor during the four- to eight-cell division, we efficiently generate aneuploid cells, resulting in embryo death during peri-implantation development. Live-embryo imaging and single-cell tracking in chimeric embryos, containing aneuploid and euploid cells, reveal that the fate of aneuploid cells depends on lineage: aneuploid cells in the fetal lineage are eliminated by apoptosis, whereas those in the placental lineage show severe proliferative defects. Overall, the proportion of aneuploid cells is progressively depleted from the blastocyst stage onwards. Finally, we show that mosaic embryos have full developmental potential, provided they contain sufficient euploid cells, a finding of significance for the assessment of embryo vitality in the clinic.

    Funded by: Wellcome Trust

    Nature communications 2016;7;11165

  • Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency.

    Bone WP, Washington NL, Buske OJ, Adams DR, Davis J, Draper D, Flynn ED, Girdea M, Godfrey R, Golas G, Groden C, Jacobsen J, Köhler S, Lee EM, Links AE, Markello TC, Mungall CJ, Nehrebecky M, Robinson PN, Sincan M, Soldatos AG, Tifft CJ, Toro C, Trang H, Valkanas E, Vasilevsky N, Wahl C, Wolfe LA, Boerkoel CF, Brudno M, Haendel MA, Gahl WA and Smedley D

    Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA.

    Purpose: Medical diagnosis and molecular or biochemical confirmation typically rely on the knowledge of the clinician. Although this is very difficult in extremely rare diseases, we hypothesized that the recording of patient phenotypes in Human Phenotype Ontology (HPO) terms and computationally ranking putative disease-associated sequence variants improves diagnosis, particularly for patients with atypical clinical profiles.

    Methods: Using simulated exomes and the National Institutes of Health Undiagnosed Diseases Program (UDP) patient cohort and associated exome sequence, we tested our hypothesis using Exomiser. Exomiser ranks candidate variants based on patient phenotype similarity to (i) known disease-gene phenotypes, (ii) model organism phenotypes of candidate orthologs, and (iii) phenotypes of protein-protein association neighbors.

    Results: Benchmarking showed Exomiser ranked the causal variant as the top hit in 97% of known disease-gene associations and ranked the correct seeded variant in up to 87% when detectable disease-gene associations were unavailable. Using UDP data, Exomiser ranked the causative variant(s) within the top 10 variants for 11 previously diagnosed variants and achieved a diagnosis for 4 of 23 cases undiagnosed by clinical evaluation.

    Conclusion: Structured phenotyping of patients and computational analysis are effective adjuncts for diagnosing patients with genetic disorders.

    Funded by: NHGRI NIH HHS: HHSN268201300036C, U54 HG006370; NIH HHS: R24 OD011883

    Genetics in medicine : official journal of the American College of Medical Genetics 2016;18;6;608-17

  • Chromosome engineering in zygotes with CRISPR/Cas9.

    Boroviak K, Doe B, Banerjee R, Yang F and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, Cambridge, United Kingdom.

    Deletions, duplications, and inversions of large genomic regions covering several genes are an important class of disease-causing variants in humans. Modeling these structural variants in mice requires multistep processes in ES cells, which has limited their availability. Mutant mice containing small insertions, deletions, and single nucleotide polymorphisms can be reliably generated using CRISPR/Cas9 directly in mouse zygotes. Large structural variants can be generated using CRISPR/Cas9 in ES cells, but it has not been possible to generate these directly in zygotes. We now demonstrate the direct generation of deletions, duplications and inversions of up to one million base pairs by zygote injection.

    Funded by: NIH HHS: U42OD011174; Wellcome Trust: WT098051

    Genesis (New York, N.Y. : 2000) 2016;54;2;78-85

  • GFI1(36N) as a therapeutic and prognostic marker for myelodysplastic syndrome.

    Botezatu L, Michel LC, Makishima H, Schroeder T, Germing U, Haas R, van der Reijden B, Marneth AE, Bergevoet SM, Jansen JH, Przychodzen B, Wlodarski M, Niemeyer C, Platzbecker U, Ehninger G, Unnikrishnan A, Beck D, Pimanda J, Hellström-Lindberg E, Malcovati L, Boultwood J, Pellagatti A, Papaemmanuil E, Le Coutre P, Kaeda J, Opalka B, Möröy T, Dührsen U, Maciejewski J and Khandanpour C

    Department of Hematology, West German Cancer Center, University Hospital Essen, University Duisburg-Essen, Essen, Germany.

    Inherited gene variants play an important role in malignant diseases. The transcriptional repressor growth factor independence 1 (GFI1) regulates hematopoietic stem cell (HSC) self-renewal and differentiation. A single-nucleotide polymorphism of GFI1 (rs34631763) generates a protein with an asparagine (N) instead of a serine (S) at position 36 (GFI1(36N)) and has a prevalence of 3%-5% among Caucasians. Because GFI1 regulates myeloid development, we examined the role of GFI1(36N) on the course of MDS disease. To this end, we determined allele frequencies of GFI1(36N) in four independent MDS cohorts from the Netherlands and Belgium, Germany, the ICGC consortium, and the United States. The GFI1(36N) allele frequency in the 723 MDS patients genotyped ranged between 9% and 12%. GFI1(36N) was an independent adverse prognostic factor for overall survival, acute myeloid leukemia-free survival, and event-free survival in a univariate analysis. After adjustment for age, bone marrow blast percentage, IPSS score, mutational status, and cytogenetic findings, GFI1(36N) remained an independent adverse prognostic marker. GFI1(36S) homozygous patients exhibited a sustained response to treatment with hypomethylating agents, whereas GFI1(36N) patients had a poor sustained response to this therapy. Because allele status of GFI1(36N) is readily determined using basic molecular techniques, we propose inclusion of GFI1(36N) status in future prospective studies for MDS patients to better predict prognosis and guide therapeutic decisions.

    Experimental hematology 2016;44;7;590-595.e1

  • The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons.

    Braasch I, Gehrke AR, Smith JJ, Kawasaki K, Manousaki T, Pasquier J, Amores A, Desvignes T, Batzel P, Catchen J, Berlin AM, Campbell MS, Barrell D, Martin KJ, Mulley JF, Ravi V, Lee AP, Nakamura T, Chalopin D, Fan S, Wcisel D, Cañestro C, Sydes J, Beaudry FE, Sun Y, Hertel J, Beam MJ, Fasold M, Ishiyama M, Johnson J, Kehr S, Lara M, Letaw JH, Litman GW, Litman RT, Mikami M, Ota T, Saha NR, Williams L, Stadler PF, Wang H, Taylor JS, Fontenot Q, Ferrara A, Searle SM, Aken B, Yandell M, Schneider I, Yoder JA, Volff JN, Meyer A, Amemiya CT, Venkatesh B, Holland PW, Guiguen Y, Bobe J, Shubin NH, Di Palma F, Alföldi J, Lindblad-Toh K and Postlethwait JH

    Institute of Neuroscience, University of Oregon, Eugene, Oregon, USA.

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.

    Nature genetics 2016

  • Maternal DNA Methylation Regulates Early Trophoblast Development.

    Branco MR, King M, Perez-Garcia V, Bogutz AB, Caley M, Fineberg E, Lefebvre L, Cook SJ, Dean W, Hemberger M and Reik W

    Blizard Institute, Barts and The London School of Medicine and Dentistry, QMUL, London E1 2AT, UK.

    Critical roles for DNA methylation in embryonic development are well established, but less is known about its roles during trophoblast development, the extraembryonic lineage that gives rise to the placenta. We dissected the role of DNA methylation in trophoblast development by performing mRNA and DNA methylation profiling of Dnmt3a/3b mutants. We find that oocyte-derived methylation plays a major role in regulating trophoblast development but that imprinting of the key placental regulator Ascl2 is only partially responsible for these effects. We have identified several methylation-regulated genes associated with trophoblast differentiation that are involved in cell adhesion and migration, potentially affecting trophoblast invasion. Specifically, trophoblast-specific DNA methylation is linked to the silencing of Scml2, a Polycomb Repressive Complex 1 protein that drives loss of cell adhesion in methylation-deficient trophoblast. Our results reveal that maternal DNA methylation controls multiple differentiation-related and physiological processes in trophoblast via both imprinting-dependent and -independent mechanisms.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/000C0417; Canadian Institutes of Health Research: MOP-119357; Medical Research Council: MR/L00027X/1; Wellcome Trust: 095645, 101225, 101225/Z/13/Z

    Developmental cell 2016;36;2;152-63

  • eFORGE: A Tool for Identifying Cell Type-Specific Signal in Epigenomic Data.

    Breeze CE, Paul DS, van Dongen J, Butcher LM, Ambrose JC, Barrett JE, Lowe R, Rakyan VK, Iotchkova V, Frontini M, Downes K, Ouwehand WH, Laperle J, Jacques PÉ, Bourque G, Bergmann AK, Siebert R, Vellenga E, Saeed S, Matarese F, Martens JH, Stunnenberg HG, Teschendorff AE, Herrero J, Birney E, Dunham I and Beck S

    UCL Cancer Institute, University College London, London WC1E 6BT, UK.

    Epigenome-wide association studies (EWAS) provide an alternative approach for studying human disease through consideration of non-genetic variants such as altered DNA methylation. To advance the complex interpretation of EWAS, we developed eFORGE, a new standalone and web-based tool for the analysis and interpretation of EWAS data. eFORGE determines the cell type-specific regulatory component of a set of EWAS-identified differentially methylated positions. This is achieved by detecting enrichment of overlap with DNase I hypersensitive sites across 454 samples (tissues, primary cell types, and cell lines) from the ENCODE, Roadmap Epigenomics, and BLUEPRINT projects. Application of eFORGE to 20 publicly available EWAS datasets identified disease-relevant cell types for several common diseases, a stem cell-like signature in cancer, and demonstrated the ability to detect cell-composition effects for EWAS performed on heterogeneous tissues. Our approach bridges the gap between large-scale epigenomics data and EWAS-derived target selection to yield insight into disease etiology.

    Funded by: British Heart Foundation: BHF_RG/09/012/28096; Department of Health: DH_RP-PG-0310-1002

    Cell reports 2016;17;8;2137-2150

  • Efficient identification of CRISPR/Cas9-induced insertions/deletions by direct germline screening in zebrafish.

    Brocal I, White RJ, Dooley CM, Carruthers SN, Clark R, Hall A, Busch-Nentwich EM, Stemple DL and Kettleborough RN

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: The CRISPR/Cas9 system is a prokaryotic immune system that confers resistance to foreign genetic material and is a sort of 'adaptive immunity'. It has been adapted to enable high-throughput genome editing and has revolutionised the generation of targeted mutations.

    Results: We have developed a scalable analysis pipeline to identify CRISPR/Cas9-induced mutations in hundreds of samples using next generation sequencing (NGS) of amplicons. We have used this system to investigate the best way to screen mosaic zebrafish founder individuals for germline transmission of induced mutations. Screening sperm samples from potential founders provides much better information on germline transmission rates and, crucially, the sequence of the particular insertions/deletions (indels) that will be transmitted. This enables us to combine screening with archiving to create a library of cryopreserved samples carrying known mutations. It also allows us to design efficient genotyping assays, making identifying F1 carriers straightforward.

    Conclusions: The methods described will streamline the production of large numbers of knockout alleles in selected genes for phenotypic analysis, complementing existing efforts using random mutagenesis.

    Funded by: Wellcome Trust: WT098051

    BMC genomics 2016;17;259

  • Calcium signalling in malaria parasites.

    Brochet M and Billker O

    Faculty of Medicine, University of Geneva, 1 Rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

    Ca(2+) is a ubiquitous intracellular messenger in malaria parasites with important functions in asexual blood stages responsible for malaria symptoms, the preceding liver-stage infection and transmission through the mosquito. Intracellular messengers amplify signals by binding to effector molecules that trigger physiological changes. The characterisation of some Ca(2+) effector proteins has begun to provide insights into the vast range of biological processes controlled by Ca(2+) signalling in malaria parasites, including host cell egress and invasion, protein secretion, motility, and cell cycle regulation. Despite the importance of Ca(2+) signalling during the life cycle of malaria parasites, little is known about Ca(2+) homeostasis. Recent findings highlighted that upstream of stage-specific Ca(2+) effectors is a conserved interplay between second messengers to control critical intracellular Ca(2+) signals throughout the life cycle. The identification of the molecular mechanisms integrating stage-transcending mechanisms of Ca(2+) homeostasis in a network of stage-specific regulator and effector pathways now represents a major challenge for a meaningful understanding of Ca(2+) signalling in malaria parasites.

    Molecular microbiology 2016

  • Quantitative insertion-site sequencing (QIseq) for high throughput phenotyping of transposon mutants.

    Bronner IF, Otto TD, Zhang M, Udenze K, Wang C, Quail MA, Jiang RH, Adams JH and Rayner JC

    Malaria Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Genetic screening using random transposon insertions has been a powerful tool for uncovering biology in prokaryotes, where whole-genome saturating screens have been performed in multiple organisms. In eukaryotes, such screens have proven more problematic, in part because of the lack of a sensitive and robust system for identifying transposon insertion sites. We here describe quantitative insertion-site sequencing, or QIseq, which uses custom library preparation and Illumina sequencing technology and is able to identify insertion sites from both the 5' and 3' ends of the transposon, providing an inbuilt level of validation. The approach was developed using piggyBac mutants in the human malaria parasite Plasmodium falciparum but should be applicable to many other eukaryotic genomes. QIseq proved accurate, confirming known sites in >100 mutants, and sensitive, identifying and monitoring sites over a >10,000-fold dynamic range of sequence counts. Applying QIseq to uncloned parasites shortly after transfections revealed multiple insertions in mixed populations and suggests that >4000 independent mutants could be generated from relatively modest scales of transfection, providing a clear pathway to genome-scale screens in P. falciparum. QIseq was also used to monitor the growth of pools of previously cloned mutants and reproducibly differentiated between deleterious and neutral mutations in competitive growth. Among the mutants with fitness defects was a mutant with a piggyBac insertion immediately upstream of the kelch protein K13 gene associated with artemisinin resistance, implying mutants in this gene may have competitive fitness costs. QIseq has the potential to enable the scale-up of piggyBac-mediated genetics across multiple eukaryotic systems.

    Genome research 2016;26;7;980-9

  • Antibiotics, gut bugs and the young.

    Browne H

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Two recent studies have investigated the effects of antibiotic use on the intestinal microbiota of preterm infants and young children.

    Nature reviews. Microbiology 2016;14;6;336

  • Culturing of 'unculturable' human microbiota reveals novel taxa and extensive sporulation.

    Browne HP, Forster SC, Anonye BO, Kumar N, Neville BA, Stares MD, Goulding D and Lawley TD

    Host-Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Hinxton, UK.

    Our intestinal microbiota harbours a diverse bacterial community required for our health, sustenance and wellbeing. Intestinal colonization begins at birth and climaxes with the acquisition of two dominant groups of strict anaerobic bacteria belonging to the Firmicutes and Bacteroidetes phyla. Culture-independent, genomic approaches have transformed our understanding of the role of the human microbiome in health and many diseases. However, owing to the prevailing perception that our indigenous bacteria are largely recalcitrant to culture, many of their functions and phenotypes remain unknown. Here we describe a novel workflow based on targeted phenotypic culturing linked to large-scale whole-genome sequencing, phylogenetic analysis and computational modelling that demonstrates that a substantial proportion of the intestinal bacteria are culturable. Applying this approach to healthy individuals, we isolated 137 bacterial species from characterized and candidate novel families, genera and species that were archived as pure cultures. Whole-genome and metagenomic sequencing, combined with computational and phenotypic analysis, suggests that at least 50-60% of the bacterial genera from the intestinal microbiota of a healthy individual produce resilient spores, specialized for host-to-host transmission. Our approach unlocks the human intestinal microbiota for phenotypic analysis and reveals how a marked proportion of oxygen-sensitive intestinal bacteria can be transmitted between individuals, affecting microbiota heritability.

    Funded by: Medical Research Council: G1000214, PF451; Wellcome Trust: 098051

    Nature 2016;533;7604;543-546

  • A Biobank of Breast Cancer Explants with Preserved Intra-tumor Heterogeneity to Screen Anticancer Compounds.

    Bruna A, Rueda OM, Greenwood W, Batra AS, Callari M, Batra RN, Pogrebniak K, Sandoval J, Cassidy JW, Tufegdzic-Vidakovic A, Sammut SJ, Jones L, Provenzano E, Baird R, Eirew P, Hadfield J, Eldridge M, McLaren-Douglas A, Barthorpe A, Lightfoot H, O'Connor MJ, Gray J, Cortes J, Baselga J, Marangoni E, Welm AL, Aparicio S, Serra V, Garnett MJ and Caldas C

    Department of Oncology and Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, UK.

    The inter- and intra-tumor heterogeneity of breast cancer needs to be adequately captured in pre-clinical models. We have created a large collection of breast cancer patient-derived tumor xenografts (PDTXs), in which the morphological and molecular characteristics of the originating tumor are preserved through passaging in the mouse. An integrated platform combining in vivo maintenance of these PDTXs along with short-term cultures of PDTX-derived tumor cells (PDTCs) was optimized. Remarkably, the intra-tumor genomic clonal architecture present in the originating breast cancers was mostly preserved upon serial passaging in xenografts and in short-term cultured PDTCs. We assessed drug responses in PDTCs on a high-throughput platform and validated several ex vivo responses in vivo. The biobank represents a powerful resource for pre-clinical breast cancer pharmacogenomic studies, including identification of biomarkers of response or resistance.

    Funded by: Cancer Research UK: 9675; Marie Curie: 660060; NCI NIH HHS: P30 CA008748, R01 CA166422; Wellcome Trust

    Cell 2016;167;1;260-274.e22

  • Sugar-sweetened beverage consumption and genetic predisposition to obesity in 2 Swedish cohorts.

    Brunkwall L, Chen Y, Hindy G, Rukh G, Ericson U, Barroso I, Johansson I, Franks PW, Orho-Melander M and Renström F

    Diabetes and Cardiovascular Disease-Genetic Epidemiology and.

    Background: The consumption of sugar-sweetened beverages (SSBs), which has increased substantially during the last decades, has been associated with obesity and weight gain.

    Objective: Common genetic susceptibility to obesity has been shown to modify the association between SSB intake and obesity risk in 3 prospective cohorts from the United States. We aimed to replicate these findings in 2 large Swedish cohorts.

    Design: Data were available for 21,824 healthy participants from the Malmö Diet and Cancer study and 4902 healthy participants from the Gene-Lifestyle Interactions and Complex Traits Involved in Elevated Disease Risk Study. Self-reported SSB intake was categorized into 4 levels (seldom, low, medium, and high). Unweighted and weighted genetic risk scores (GRSs) were constructed based on 30 body mass index [(BMI) in kg/m(2)]-associated loci, and effect modification was assessed in linear regression equations by modeling the product and marginal effects of the GRS and SSB intake adjusted for age-, sex-, and cohort-specific covariates, with BMI as the outcome. In a secondary analysis, models were additionally adjusted for putative confounders (total energy intake, alcohol consumption, smoking status, and physical activity).

    Results: In an inverse variance-weighted fixed-effects meta-analysis, each SSB intake category increment was associated with a 0.18 higher BMI (SE = 0.02; P = 1.7 × 10(-20); n = 26,726). In the fully adjusted model, a nominally significant interaction between SSB intake category and the unweighted GRS was observed (P-interaction = 0.03). Among participants in the top and bottom quartiles of the GRS, each increment in SSB intake was associated with 0.24 (SE = 0.04; P = 2.9 × 10(-8); n = 6766) and 0.15 (SE = 0.04; P = 1.3 × 10(-4); n = 6835) higher BMIs, respectively.

    Conclusions: The interaction observed in the Swedish cohorts is similar in magnitude to the previous analysis in US cohorts and indicates that the relation of SSB intake and BMI is stronger in people genetically predisposed to obesity.

    The American journal of clinical nutrition 2016

  • Emergence and spread of a human-transmissible multidrug-resistant nontuberculous mycobacterium.

    Bryant JM, Grogono DM, Rodriguez-Rincon D, Everall I, Brown KP, Moreno P, Verma D, Hill E, Drijkoningen J, Gilligan P, Esther CR, Noone PG, Giddings O, Bell SC, Thomson R, Wainwright CE, Coulter C, Pandey S, Wood ME, Stockwell RE, Ramsay KA, Sherrard LJ, Kidd TJ, Jabbour N, Johnson GR, Knibbs LD, Morawska L, Sly PD, Jones A, Bilton D, Laurenson I, Ruddy M, Bourke S, Bowler IC, Chapman SJ, Clayton A, Cullen M, Daniels T, Dempsey O, Denton M, Desai M, Drew RJ, Edenborough F, Evans J, Folb J, Humphrey H, Isalska B, Jensen-Fangel S, Jönsson B, Jones AM, Katzenstein TL, Lillebaek T, MacGregor G, Mayell S, Millar M, Modha D, Nash EF, O'Brien C, O'Brien D, Ohri C, Pao CS, Peckham D, Perrin F, Perry A, Pressler T, Prtak L, Qvist T, Robb A, Rodgers H, Schaffer K, Shafi N, van Ingen J, Walshaw M, Watson D, West N, Whitehouse J, Haworth CS, Harris SR, Ordway D, Parkhill J and Floto RA

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Lung infections with Mycobacterium abscessus, a species of multidrug-resistant nontuberculous mycobacteria, are emerging as an important global threat to individuals with cystic fibrosis (CF), in whom M. abscessus accelerates inflammatory lung damage, leading to increased morbidity and mortality. Previously, M. abscessus was thought to be independently acquired by susceptible individuals from the environment. However, using whole-genome analysis of a global collection of clinical isolates, we show that the majority of M. abscessus infections are acquired through transmission, potentially via fomites and aerosols, of recently emerged dominant circulating clones that have spread globally. We demonstrate that these clones are associated with worse clinical outcomes, show increased virulence in cell-based and mouse infection models, and thus represent an urgent international infection challenge.

    Funded by: Medical Research Council: G1001712; Wellcome Trust

    Science (New York, N.Y.) 2016;354;6313;751-757

  • Mitochondrial Protein Lipoylation and the 2-Oxoglutarate Dehydrogenase Complex Controls HIF1α Stability in Aerobic Conditions.

    Burr SP, Costa AS, Grice GL, Timms RT, Lobb IT, Freisinger P, Dodd RB, Dougan G, Lehner PJ, Frezza C and Nathan JA

    Department of Medicine, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, UK.

    Hypoxia-inducible transcription factors (HIFs) control adaptation to low oxygen environments by activating genes involved in metabolism, angiogenesis, and redox homeostasis. The finding that HIFs are also regulated by small molecule metabolites highlights the need to understand the complexity of their cellular regulation. Here we use a forward genetic screen in near-haploid human cells to identify genes that stabilize HIFs under aerobic conditions. We identify two mitochondrial genes, oxoglutarate dehydrogenase (OGDH) and lipoic acid synthase (LIAS), which when mutated stabilize HIF1α in a non-hydroxylated form. Disruption of OGDH complex activity in OGDH or LIAS mutants promotes L-2-hydroxyglutarate formation, which inhibits the activity of the HIFα prolyl hydroxylases (PHDs) and TET 2-oxoglutarate dependent dioxygenases. We also find that PHD activity is decreased in patients with homozygous germline mutations in lipoic acid synthesis, leading to HIF1 activation. Thus, mutations affecting OGDHC activity may have broad implications for epigenetic regulation and tumorigenesis.

    Cell metabolism 2016;24;5;740-752

  • C13orf31 (FAMIN) is a central regulator of immunometabolic function.

    Cader MZ, Boroviak K, Zhang Q, Assadi G, Kempster SL, Sewell GW, Saveljeva S, Ashcroft JW, Clare S, Mukhopadhyay S, Brown KP, Tschurtschenthaler M, Raine T, Doe B, Chilvers ER, Griffin JL, Kaneider NC, Floto RA, D'Amato M, Bradley A, Wakelam MJ, Dougan G and Kaser A

    Division of Gastroenterology and Hepatology, Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK.

    Single-nucleotide variations in C13orf31 (LACC1) that encode p.C284R and p.I254V in a protein of unknown function (called 'FAMIN' here) are associated with increased risk for systemic juvenile idiopathic arthritis, leprosy and Crohn's disease. Here we set out to identify the biological mechanism affected by these coding variations. FAMIN formed a complex with fatty acid synthase (FASN) on peroxisomes and promoted flux through de novo lipogenesis to concomitantly drive high levels of fatty-acid oxidation (FAO) and glycolysis and, consequently, ATP regeneration. FAMIN-dependent FAO controlled inflammasome activation, mitochondrial and NADPH-oxidase-dependent production of reactive oxygen species (ROS), and the bactericidal activity of macrophages. As p.I254V and p.C284R resulted in diminished function and loss of function, respectively, FAMIN determined resilience to endotoxin shock. Thus, we have identified a central regulator of the metabolic function and bioenergetic state of macrophages that is under evolutionary selection and determines the risk of inflammatory and infectious disease.

    Nature immunology 2016

  • Comparative genome analysis and global phylogeny of the toxin variant Clostridium difficile PCR Ribotype 017 reveals the evolution of two independent sub-lineages.

    Cairns MD, Preston MD, Hall CL, Gerding DN, Hawkey PM, Kato H, Kim H, Kuijper EJ, Lawley TD, Pituch H, Reid S, Kullin B, Riley TV, Solomon K, Tsai PJ, Weese JS, Stabler RA and Wren BW

    Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK.

    The diarrhoeal pathogen Clostridium difficile consists of at least six distinct evolutionary lineages. The RT017 lineage is anomalous as strains only express toxin B, compared to strains from other lineages that produce toxins A and B and occasionally binary toxin. RT017 strains were initially reported in Asia but have now been reported worldwide. We used whole genome sequencing and phylogenetic analysis to investigate the patterns of global spread and population structure of 277 RT017 isolates from animal and human origins from six continents, isolated between 1990 and 2013. We reveal two distinct evenly split sub-lineages (SL1 and SL2) of C. difficile RT017 that contain multiple independent clonal expansions. All 24 animal isolates were contained within SL1 along with human isolates, suggesting potential transmission between animals and humans. Genetic analyses revealed an overrepresentation of antibiotic resistance genes. Phylogeographic analyses show a North American origin for RT017, as has been found for the recently emerged epidemic RT027 lineage. Despite only having one toxin, RT017 strains have evolved in parallel from at least two independent sources and can readily transmit between continents.

    Journal of clinical microbiology 2016

  • A CRISPR outlook for apicomplexans.

    Carrasquilla M and Owusu CK

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2016;14;11;668

  • Whole Genome Sequence Analysis of a Large Isoniazid-Resistant Tuberculosis Outbreak in London: A Retrospective Observational Study.

    Casali N, Broda A, Harris SR, Parkhill J, Brown T and Drobniewski F

    Department of Infectious Diseases and Immunity, Imperial College London, London, United Kingdom.

    Background: A large isoniazid-resistant tuberculosis outbreak centred on London, United Kingdom, has been ongoing since 1995. The aim of this study was to investigate the power and value of whole genome sequencing (WGS) to resolve the transmission network compared to current molecular strain typing approaches, including analysis of intra-host diversity within a specimen, across body sites, and over time, with identification of genetic factors underlying the epidemiological success of this cluster.

    Methods and findings: We sequenced 344 outbreak isolates from individual patients collected over 14 y (2 February 1998-22 June 2012). This demonstrated that 96 (27.9%) were indistinguishable, and only one differed from this major clone by more than five single nucleotide polymorphisms (SNPs). The maximum number of SNPs between any pair of isolates was nine SNPs, and the modal distance between isolates was two SNPs. WGS was able to reveal the direction of transmission of tuberculosis in 16 cases within the outbreak (4.7%), including within a multidrug-resistant cluster that carried a rare rpoB mutation associated with rifampicin resistance. Eleven longitudinal pairs of patient pulmonary isolates collected up to 48 mo apart differed from each other by between zero and four SNPs. Extrapulmonary dissemination resulted in acquisition of a SNP in two of five cases. WGS analysis of 27 individual colonies cultured from a single patient specimen revealed ten loci differed amongst them, with a maximum distance between any pair of six SNPs. A limitation of this study, as in previous studies, is that indels and SNPs in repetitive regions were not assessed due to the difficulty in reliably determining this variation.

    Conclusions: Our study suggests that (1) certain paradigms need to be revised, such as the 12 SNP distance as the gold standard upper threshold to identify plausible transmissions; (2) WGS technology is helpful to rule out the possibility of direct transmission when isolates are separated by a substantial number of SNPs; (3) the concept of a transmission chain or network may not be useful in institutional or household settings; (4) the practice of isolating single colonies prior to sequencing is likely to lead to an overestimation of the number of SNPs between cases resulting from direct transmission; and (5) despite appreciable genomic diversity within a host, transmission of tuberculosis rarely results in minority variants becoming dominant. Thus, whilst WGS provided some increased resolution over variable number tandem repeat (VNTR)-based clustering, it was insufficient for inferring transmission in the majority of cases.

    PLoS medicine 2016;13;10;e1002137

  • Novel Genetic Variants for Cartilage Thickness and Hip Osteoarthritis.

    Castaño-Betancourt MC, Evans DS, Ramos YF, Boer CG, Metrustry S, Liu Y, den Hollander W, van Rooij J, Kraus VB, Yau MS, Mitchell BD, Muir K, Hofman A, Doherty M, Doherty S, Zhang W, Kraaij R, Rivadeneira F, Barrett-Connor E, Maciewicz RA, Arden N, Nelissen RG, Kloppenburg M, Jordan JM, Nevitt MC, Slagboom EP, Hart DJ, Lafeber F, Styrkarsdottir U, Zeggini E, Evangelou E, Spector TD, Uitterlinden AG, Lane NE, Meulenbelt I, Valdes AM and van Meurs JB

    Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands.

    Osteoarthritis is one of the most frequent and disabling diseases of the elderly. Only a few genetic variants have been identified for osteoarthritis, which is partly due to large phenotype heterogeneity. To reduce heterogeneity, we here examined cartilage thickness, one of the structural components of joint health. We conducted a genome-wide association study of minimal joint space width (mJSW), a proxy for cartilage thickness, in a discovery set of 13,013 participants from five different cohorts and replication in 8,227 individuals from seven independent cohorts. We identified five genome-wide significant (GWS, P ≤ 5.0 × 10(-8)) SNPs annotated to four distinct loci. In addition, we found two additional loci that were significantly replicated, but results of combined meta-analysis fell just below the genome wide significance threshold. The four novel associated genetic loci were located in/near TGFA (rs2862851), PIK3R1 (rs10471753), SLBP/FGFR3 (rs2236995), and TREH/DDX6 (rs496547), while the other two (DOT1L and SUPT3H/RUNX2) were previously identified. A systematic prioritization for underlying causal genes was performed using diverse lines of evidence. Exome sequencing data (n = 2,050 individuals) indicated that there were no rare exonic variants that could explain the identified associations. In addition, TGFA, FGFR3 and PIK3R1 were differentially expressed in OA cartilage lesions versus non-lesioned cartilage in the same individuals. In conclusion, we identified four novel loci (TGFA, PIK3R1, FGFR3 and TREH) and confirmed two loci known to be associated with cartilage thickness. The identified associations were not caused by rare exonic variants. This is the first report linking TGFA to human OA, which may serve as a new target for future therapies.

    PLoS genetics 2016;12;10;e1006260

  • EphrinB1/EphB3b Coordinate Bidirectional Epithelial-Mesenchymal Interactions Controlling Liver Morphogenesis and Laterality.

    Cayuso J, Dzementsei A, Fischer JC, Karemore G, Caviglia S, Bartholdson J, Wright GJ and Ober EA

    Division of Developmental Biology, Mill Hill Laboratories, The Francis Crick Institute, London NW7 1AA, UK.

    Positioning organs in the body often requires the movement of multiple tissues, yet the molecular and cellular mechanisms coordinating such movements are largely unknown. Here, we show that bidirectional signaling between EphrinB1 and EphB3b coordinates the movements of the hepatic endoderm and adjacent lateral plate mesoderm (LPM), resulting in asymmetric positioning of the zebrafish liver. EphrinB1 in hepatoblasts regulates directional migration and mediates interactions with the LPM, where EphB3b controls polarity and movement of the LPM. EphB3b in the LPM concomitantly repels hepatoblasts to move leftward into the liver bud. Cellular protrusions controlled by Eph/Ephrin signaling mediate hepatoblast motility and long-distance cell-cell contacts with the LPM beyond immediate tissue interfaces. Mechanistically, intracellular EphrinB1 domains mediate EphB3b-independent hepatoblast extension formation, while EphB3b interactions cause their destabilization. We propose that bidirectional short- and long-distance cell interactions between epithelial and mesenchyme-like tissues coordinate liver bud formation and laterality via cell repulsion.

    Developmental cell 2016;39;3;316-328

  • Understanding pneumococcal serotype 1 biology through population genomic analysis.

    Chaguza C, Cornick JE, Harris SR, Andam CP, Bricio-Moreno L, Yang M, Yalcin F, Ousmane S, Govindpersad S, Senghore M, Ebruke C, Du Plessis M, Kiran AM, Pluschke G, Sigauque B, McGee L, Klugman KP, Turner P, Corander J, Parkhill J, Collard JM, Antonio M, von Gottberg A, Heyderman RS, French N, Kadioglu A, Hanage WP, Everett DB, Bentley SD and PAGe Consortium

    Department of Clinical Infection, Microbiology and Immunology, Institute of Infection and Global Health, University of Liverpool, Liverpool, L69 7BE, UK.

    Background: Pneumococcus kills over one million children annually and over 90% of these deaths occur in low-income countries, especially in Sub-Saharan Africa (SSA), where HIV exacerbates the disease burden. In SSA, serotype 1 pneumococci, particularly the endemic ST217 clone, cause the majority of the pneumococcal disease burden. To understand the evolution of the virulent ST217 clone, we analysed ST217 whole genomes from isolates sampled from African and Asian countries.

    Methods: We analysed 226 whole genome sequences from the ST217 lineage sampled from 9 African and 4 Asian countries. We constructed a whole genome alignment and used it for phylogenetic and coalescent analyses. We also screened the genomes to determine presence of antibiotic resistance conferring genes.

    Results: Population structure analysis grouped the ST217 isolates into five sequence clusters (SCs), which were highly associated with different geographical regions and showed limited intracontinental and intercontinental spread. The SCs showed lower than expected genomic sequence diversity, which suggested strong purifying selection and small population sizes caused by bottlenecks. Recombination rates varied between the SCs but were lower than in other successful clones such as PMEN1. African isolates showed higher prevalence of antibiotic resistance genes than Asian isolates. Interestingly, certain West African isolates harbored a defective chloramphenicol and tetracycline resistance-conferring element (Tn5253) with a deletion in the loci encoding the chloramphenicol resistance gene (cat pC194), which caused lower chloramphenicol than tetracycline resistance. Furthermore, certain genes that promote colonisation were absent in the isolates, which may contribute to serotype 1's rarity in carriage and consequently its lower recombination rates.

    Conclusions: The high phylogeographic diversity of the ST217 clone shows that this clone has been in circulation globally for a long time, which allowed its diversification and adaptation in different geographical regions. Such geographic adaptation reflects local variations in selection pressures in different locales. Further studies will be required to fully understand the biological mechanisms that make the ST217 clone highly invasive but unable to colonise the human nasopharynx for long durations, which results in lower recombination rates.

    BMC infectious diseases 2016;16;1;649

  • Dataset for a Dugesia japonica de novo transcriptome assembly, utilized for defining the voltage-gated like ion channel superfamily.

    Chan JD, Zhang D, Liu X, Zarowiecki MZ, Berriman M and Marchant JS

    Department of Pharmacology, University of Minnesota Medical School, MN 55455, USA.

    This data article provides a transcriptomic resource for the free-living planarian flatworm Dugesia japonica related to the research article entitled 'Utilizing the planarian voltage-gated ion channel transcriptome to resolve a role for a Ca(2+) channel in neuromuscular function and regeneration' (J.D. Chan, D. Zhang, X. Liu, M. Zarowiecki, M. Berriman, J.S. Marchant, 2016) [1]. Data provided in this submission comprise sequence information for the unfiltered de novo assembly, the filtered assembly and a curated analysis of voltage-gated like (VGL) ion channel sequences mined from this resource. Availability of these data should facilitate further adoption of this model by laboratories interested in studying the role of individual genes of interest in planarian physiology and regenerative biology.

    Data in brief 2016;9;1044-1047

  • Utilizing the planarian voltage-gated ion channel transcriptome to resolve a role for a Ca(2+) channel in neuromuscular function and regeneration.

    Chan JD, Zhang D, Liu X, Zarowiecki MZ, Berriman M and Marchant JS

    Department of Pharmacology, United Kingdom.

    The robust regenerative capacity of planarian flatworms depends on the orchestration of signaling events from early wounding responses through the stem cell enacted differentiative outcomes that restore appropriate tissue types. Acute signaling events in excitable cells play an important role in determining regenerative polarity, rationalized by the discovery that sub-epidermal muscle cells express critical patterning genes known to control regenerative outcomes. These data imply a dual conductive (neuromuscular signaling) and instructive (anterior-posterior patterning) role for Ca(2+) signaling in planarian regeneration. Here, to facilitate study of acute signaling events in the excitable cell niche, we provide a de novo transcriptome assembly from the planarian Dugesia japonica allowing characterization of the diverse ionotropic portfolio of this model organism. We demonstrate the utility of this resource by proceeding to characterize the individual role of each of the planarian voltage-operated Ca(2+) channels during regeneration, and demonstrate that knockdown of a specific voltage operated Ca(2+) channel (Cav1B) that impairs muscle function uniquely creates an environment permissive for anteriorization. Provision of the full transcriptomic dataset should facilitate further investigations of molecules within the planarian voltage-gated channel portfolio to explore the role of excitable cell physiology on regenerative outcomes. This article is part of a Special Issue entitled: ECS Meeting edited by Claus Heizmann, Joachim Krebs and Jacques Haiech.

    Biochimica et biophysica acta 2016

  • Chromosome organisation during ageing and senescence.

    Chandra T and Kirschner K

    Epigenetics Programme, The Babraham Institute, Cambridge CB22 3AT, UK; The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.

    Acute cellular stress caused by oncogene activation or high levels of DNA damage can engage a tumour suppressive response, which can lead to cellular senescence. Chronic cellular stress evoked by low levels of DNA damage or telomere erosion is involved in the ageing process. In oncogene induced senescence in fibroblasts, a dramatic rearrangement of heterochromatin into foci and accumulation of constitutive heterochromatin is well documented. In contrast, a loss of heterochromatin has been described in replicative senescence and premature ageing syndromes. The distinct nuclear phenotypes that accompany the stress response highlight the differences between acute and chronic stress models, and this review will address the differences and similarities between these models with a focus on chromosome organisation and heterochromatin.

    Current opinion in cell biology 2016;40;161-167

  • Phenotypic insights into ADCY5-associated disease.

    Chang FC, Westenberger A, Dale RC, Smith M, Pall HS, Perez-Dueñas B, Grattan-Smith P, Ouvrier RA, Mahant N, Hanna BC, Hunter M, Lawson JA, Max C, Sachdev R, Meyer E, Crimmins D, Pryor D, Morris JG, Münchau A, Grozeva D, Carss KJ, Raymond L, Kurian MA, Klein C and Fung VS

    Movement Disorders Unit, Department of Neurology, Westmead Hospital, Sydney, Australia.

    Background: Adenylyl cyclase 5 (ADCY5) mutations are associated with heterogeneous syndromes: familial dyskinesia and facial myokymia; paroxysmal chorea and dystonia; autosomal-dominant chorea and dystonia; and benign hereditary chorea. We provide detailed clinical data on 7 patients from six new kindreds with mutations in the ADCY5 gene, in order to expand and define the phenotypic spectrum of ADCY5 mutations.

    Methods: In 5 of the 7 patients, followed over a period of 9 to 32 years, ADCY5 was sequenced by Sanger sequencing. The other 2 unrelated patients participated in studies for undiagnosed pediatric hyperkinetic movement disorders and underwent whole-exome sequencing.

    Results: Five patients had the previously reported p.R418W ADCY5 mutation; we also identified two novel mutations at p.R418G and p.R418Q. All patients presented with motor milestone delay, infantile-onset action-induced generalized choreoathetosis, dystonia, or myoclonus, with episodic exacerbations during drowsiness being a characteristic feature. Axial hypotonia, impaired upward saccades, and intellectual disability were variable features. The p.R418G and p.R418Q mutation patients had a milder phenotype. Six of seven patients had mild functional gain with clonazepam or clobazam. One patient had bilateral globus pallidal DBS at the age of 33 with marked reduction in dyskinesia, which resulted in mild functional improvement.

    Conclusion: We further delineate the clinical features of ADCY5 gene mutations and illustrate its wide phenotypic expression. We describe mild improvement after treatment with clonazepam, clobazam, and bilateral pallidal DBS. ADCY5-associated dyskinesia may be under-recognized, and its diagnosis has important prognostic, genetic, and therapeutic implications.

    Movement disorders : official journal of the Movement Disorder Society 2016

  • Identifying the effect of patient sharing on between-hospital genetic differentiation of methicillin-resistant Staphylococcus aureus.

    Chang HH, Dordel J, Donker T, Worby CJ, Feil EJ, Hanage WP, Bentley SD, Huang SS and Lipsitch M

    Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

    Background: Methicillin-resistant Staphylococcus aureus (MRSA) is one of the most common healthcare-associated pathogens. To examine the role of inter-hospital patient sharing on MRSA transmission, a previous study collected 2,214 samples from 30 hospitals in Orange County, California and showed by spa typing that genetic differentiation decreased significantly with increased patient sharing. In the current study, we focused on the 986 samples with spa type t008 from the same population.

    Methods: We used genome sequencing to determine the effect of patient sharing on genetic differentiation between hospitals. Genetic differentiation was measured by between-hospital genetic diversity, FST, and the proportion of nearly identical isolates between hospitals.

    Results: Surprisingly, we found very similar genetic diversity within and between hospitals, and no significant association between patient sharing and genetic differentiation measured by FST. However, in contrast to FST, there was a significant association between patient sharing and the proportion of nearly identical isolates between hospitals. We propose that the proportion of nearly identical isolates is more powerful at determining transmission dynamics than traditional estimators of genetic differentiation (FST) when gene flow between populations is high, since it is more responsive to recent transmission events. Our hypothesis was supported by the results from coalescent simulations.

    Conclusions: Our results suggested that there was a high level of gene flow between hospitals facilitated by patient sharing, and that the proportion of nearly identical isolates is more sensitive to population structure than FST when gene flow is high.
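
    As a rough illustration of the second measure named in this abstract, the sketch below computes the proportion of between-hospital isolate pairs that are nearly identical. It is not the authors' implementation: the function names, the toy sequences, and the `max_snps` cutoff are all assumptions made for the example, and the abstract does not state the threshold actually used.

```python
from itertools import product


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def proportion_nearly_identical(hospital_a, hospital_b, max_snps=1):
    """Fraction of between-hospital isolate pairs differing by <= max_snps.

    max_snps is a hypothetical cutoff chosen for illustration only.
    """
    pairs = list(product(hospital_a, hospital_b))
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if snp_distance(a, b) <= max_snps)
    return close / len(pairs)


# Toy aligned sequences (hypothetical, not data from the study).
hospital_a = ["ACGTACGT", "ACGTACGA"]
hospital_b = ["ACGTACGT", "TCGAACGA", "ACGTTCGT"]

print(proportion_nearly_identical(hospital_a, hospital_b, max_snps=1))
```

    Because only pairs below the cutoff are counted, a statistic of this kind responds to a handful of recent transmission events even when average diversity within and between hospitals is similar, which is the contrast with FST that the abstract draws.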

    Funded by: NIGMS NIH HHS: U54 GM088558

    Genome medicine 2016;8;1;18

  • Extensive Proliferation of a Subset of Differentiated, Yet Plastic, Medial Vascular Smooth Muscle Cells Contribute to Neointimal Formation in Mouse Injury and Atherosclerosis Models.

    Chappell J, Harman JL, Narasimhan VM, Yu H, Foote K, Simons BD, Bennett MR and Jorgensen HF

    Cardiovascular Medicine, University of Cambridge.

    Rationale: Vascular smooth muscle cell (VSMC) accumulation is a hallmark of atherosclerosis and vascular injury. However, fundamental aspects of proliferation and the phenotypic changes within individual VSMCs that underlie vascular disease remain unresolved. In particular, it is not known if all VSMCs proliferate and display plasticity, or whether individual cells can switch to multiple phenotypes.

    Objective: To assess whether proliferation and plasticity in disease is a general characteristic of VSMCs or a feature of a subset of cells.

    Methods and results: Using multi-color lineage labeling, we demonstrate that VSMCs in injury-induced neointimal lesions and in atherosclerotic plaques are oligoclonal, derived from few expanding cells. Lineage tracing also revealed that the progeny of individual VSMCs contribute to both alpha smooth muscle actin (aSma)-positive fibrous cap and Mac-3-expressing macrophage-like plaque core cells. Co-staining for phenotypic markers further identified a double-positive aSma+ Mac3+ cell population, which is specific to VSMC-derived plaque cells. In contrast, VSMC-derived cells generating the neointima after vascular injury generally retained expression of VSMC markers and upregulation of Mac3 was less pronounced. Monochromatic regions in atherosclerotic plaques and injury-induced neointima did not contain VSMC-derived cells expressing a different fluorescent reporter protein, suggesting that proliferation-independent VSMC migration does not make a major contribution to VSMC accumulation in vascular disease.

    Conclusions: We demonstrate that extensive proliferation of a low proportion of highly plastic VSMCs results in the observed VSMC accumulation after injury and in atherosclerotic plaques. Therapeutic targeting of these hyper-proliferating VSMCs might effectively reduce vascular disease without affecting vascular integrity.

    Circulation research 2016

  • Whole-genome sequencing of a quarter-century melioidosis outbreak in temperate Australia uncovers a region of low-prevalence endemicity.

    Chapple SN, Sarovich DS, Holden MT, Peacock SJ, Buller N, Golledge C, Mayo M, Currie BJ and Price EP

    Melbourne Medical School, University of Melbourne, Melbourne, Victoria, Australia; Global and Tropical Health Division, Menzies School of Health Research, Darwin, Northern Territory, Australia.

    Melioidosis, caused by the highly recombinogenic bacterium Burkholderia pseudomallei, is a disease with high mortality. Tracing the origin of melioidosis outbreaks and understanding how the bacterium spreads and persists in the environment are essential to protecting public and veterinary health and reducing mortality associated with outbreaks. We used whole-genome sequencing to compare isolates from a historical quarter-century outbreak that occurred between 1966 and 1991 in the Avon Valley, Western Australia, a region far outside the known range of B. pseudomallei endemicity. All Avon Valley outbreak isolates shared the same multilocus sequence type (ST-284), which has not been identified outside this region. We found substantial genetic diversity among isolates based on a comparison of genome-wide variants, with no clear correlation between genotypes and temporal, geographical or source data. We observed little evidence of recombination in the outbreak strains, indicating that genetic diversity among these isolates has primarily accrued by mutation. Phylogenomic analysis demonstrated that the isolates confidently grouped within the Australian B. pseudomallei clade, thereby ruling out introduction from a melioidosis-endemic region outside Australia. Collectively, our results point to B. pseudomallei ST-284 being present in the Avon Valley for longer than previously recognized, with its persistence and genomic diversity suggesting long-term, low-prevalence endemicity in this temperate region. Our findings provide a concerning demonstration of the potential for environmental persistence of B. pseudomallei far outside the conventional endemic regions. An expected increase in extreme weather events may reactivate latent B. pseudomallei populations in this region.

    Microbial genomics 2016;2;7;e000067

  • Canalization of genetic and pharmacological perturbations in developing primary neuronal activity patterns.

    Charlesworth P, Morton A, Eglen SJ, Komiyama NH and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The function of the nervous system depends on the integrity of synapses and the patterning of electrical activity in brain circuits. The rapid advances in genome sequencing reveal a large number of mutations disrupting synaptic proteins, which potentially result in diseases known as synaptopathies. However, it is also evident that every normal individual carries hundreds of potentially damaging mutations. Although genetic studies in several organisms show that mutations can be masked during development by a process known as canalization, it is unknown if this occurs in the development of the electrical activity in the brain. Using longitudinal recordings of primary cultured neurons on multi-electrode arrays from mice carrying knockout mutations we report evidence of canalization in development of spontaneous activity patterns. Phenotypes in the activity patterns in young cultures from mice lacking the Gria1 subunit of the AMPA receptor were ameliorated as cultures matured. Similarly, the effects of chronic pharmacological NMDA receptor blockade diminished as cultures matured. Moreover, disturbances in activity patterns by simultaneous disruption of Gria1 and NMDA receptors were also canalized by three weeks in culture. Additional mutations and genetic variations also appeared to be canalized to varying degrees. These findings indicate that neuronal network canalization is a form of nervous system plasticity that provides resilience to developmental disruption. This article is part of the Special Issue entitled 'Synaptopathy--from Biology to Therapy'.

    Neuropharmacology 2016;100;47-55

  • Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells.

    Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yan Y, Kundu K, Ecker S, Datta A, Richardson D, Burden F, Mead D, Mann AL, Fernandez JM, Rowlston S, Wilder SP, Farrow S, Shao X, Lambourne JJ, Redensek A, Albers CA, Amstislavskiy V, Ashford S, Berentsen K, Bomba L, Bourque G, Bujold D, Busche S, Caron M, Chen SH, Cheung W, Delaneau O, Dermitzakis ET, Elding H, Colgiu I, Bagger FO, Flicek P, Habibi E, Iotchkova V, Janssen-Megens E, Kim B, Lehrach H, Lowy E, Mandoli A, Matarese F, Maurano MT, Morris JA, Pancaldi V, Pourfarzad F, Rehnstrom K, Rendon A, Risch T, Sharifi N, Simon MM, Sultan M, Valencia A, Walter K, Wang SY, Frontini M, Antonarakis SE, Clarke L, Yaspo ML, Beck S, Guigo R, Rico D, Martens JH, Ouwehand WH, Kuijpers TW, Paul DS, Stunnenberg HG, Stegle O, Downes K, Pastinen T and Soranzo N

    Department of Human Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK; Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK.

    Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14(+) monocytes, CD16(+) neutrophils, and naive CD4(+) T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.

    Funded by: British Heart Foundation: BHF_RG/09/012/28096; Department of Health: DH_RP-PG-0310-1002; Wellcome Trust

    Cell 2016;167;5;1398-1414.e24

  • Single-cell analysis at the threshold.

    Chen X, Love JC, Navin NE, Pachter L, Stubbington MJ, Svensson V, Sweedler JV and Teichmann SA

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Nature biotechnology 2016;34;11;1111-1118

  • Genome-Wide Association Analysis of Young-Onset Stroke Identifies a Locus on Chromosome 10q25 Near HABP2.

    Cheng YC, Stanne TM, Giese AK, Ho WK, Traylor M, Amouyel P, Holliday EG, Malik R, Xu H, Kittner SJ, Cole JW, O'Connell JR, Danesh J, Rasheed A, Zhao W, Engelter S, Grond-Ginsbach C, Kamatani Y, Lathrop M, Leys D, Thijs V, Metso TM, Tatlisumak T, Pezzini A, Parati EA, Norrving B, Bevan S, Rothwell PM, Sudlow C, Slowik A, Lindgren A, Walters MR, WTCCC-2 Consortium, Jannes J, Shen J, Crosslin D, Doheny K, Laurie CC, Kanse SM, Bis JC, Fornage M, Mosley TH, Hopewell JC, Strauch K, Müller-Nurasyid M, Gieger C, Waldenberger M, Peters A, Meisinger C, Ikram MA, Longstreth WT, Meschia JF, Seshadri S, Sharma P, Worrall B, Jern C, Levi C, Dichgans M, Boncoraglio GB, Markus HS, Debette S, Rolfs A, Saleheen D and Mitchell BD

    From the Veterans Affairs Maryland Health Care System, Baltimore, MD (Y.-C.C., S.J.K., J.W.C., B.D.M.); University of Maryland School of Medicine, Baltimore (Y.-C.C., H.X., S.J.K., J.W.C., J.R.O., B.D.M.); The University of Gothenburg, Gothenburg, Sweden (T.M.S., C.J.); University of Rostock, Rostock, Germany (A.-K.G., A. Rolfs); University of Nottingham Malaysia Campus, Selangor Darul Ehsa, Malaysia (W.K.H.); University of Cambridge, Cambridge, UK (M.T., J.D., S.B., H.S.M., S.D., D.S.); Institut Pasteur de Lille, F-59000 Lille, France (P.A.); University of Newcastle, Australia (E.G.H.); Ludwig-Maximilians-Universität, Munich, Germany (R.M., K.S., M.D.); Wellcome Trust Sanger Institute, Cambridge, UK (J.D.); Center for Non-Communicable Diseases, Karachi, Pakistan (A. Rasheed, D.S.); University of Pennsylvania (W.Z., D.S.); Basel University Hospital, Switzerland (S.E.); Heidelberg University Hospital, Germany (C.G.-G.); Centre d'Étude du Polymorphisme Humain, Paris, France (Y.K.); RIKEN Center for Integrative Medical Sciences, Yokohama, Japan (Y.K.); National Genotyping Center, Evry, France (M.L.); Genome Quebec, McGill University, Montreal, Canada (M.L.); Lille University Hospital, France (D.L., S.D.); KU Leuven - University of Leuven, Leuven, Belgium (V.T.); Vesalius Research Center, VIB, Leuven, Belgium (V.T.); University Hospitals Leuven, Leuven, Belgium (V.T.); Helsinki University Central Hospital, Helsinki, Finland (T.M.M., T.T.); Università degli Studi di Brescia, Brescia, Italy (A. Pezzini); Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy (E.A.P., G.B.B.); University of Lund, Sweden (B.N.); University of Oxford, John Radcliffe Hospital (P.M.R.); University of Edinburgh, Edinburgh, UK (C.S.); Jagiellonian University Medical College, Krakow, Poland (A.S.); Lund University, Lund, Sweden (A.L.); Skåne University Hospital, Lund, Sweden (A.L.); University of Glasgow, Glasgow, UK (M.R.W.); University of Adelaide, Australia (J.J.); Mount Sinai Hos

    Background and purpose: Although a genetic contribution to ischemic stroke is well recognized, only a handful of stroke loci have been identified by large-scale genetic association studies to date. Hypothesizing that genetic effects might be stronger for early- versus late-onset stroke, we conducted a 2-stage meta-analysis of genome-wide association studies, focusing on stroke cases with an age of onset <60 years.

    Methods: The discovery stage of our genome-wide association studies included 4505 cases and 21 968 controls of European, South-Asian, and African ancestry, drawn from 6 studies. In Stage 2, we selected the lead genetic variants at loci with association P<5×10(-6) and performed in silico association analyses in an independent sample of ≤1003 cases and 7745 controls.

    Results: One stroke susceptibility locus at 10q25 reached genome-wide significance in the combined analysis of all samples from the discovery and follow-up stages (rs11196288; odds ratio =1.41; P=9.5×10(-9)). The associated locus is in an intergenic region between TCF7L2 and HABP2. In a further analysis in an independent sample, we found that 2 single nucleotide polymorphisms in high linkage disequilibrium with rs11196288 were significantly associated with total plasma factor VII-activating protease levels, a product of HABP2.

    Conclusions: HABP2, which encodes an extracellular serine protease involved in coagulation, fibrinolysis, and inflammatory pathways, may be a genetic susceptibility locus for early-onset stroke.

    Stroke; a journal of cerebral circulation 2016;47;2;307-16

  • Pathogen hide-and-'seq'.

    Chewapreecha C, Bénard A and Reuter S

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2016;14;5;271

  • CRISPR-Cas9(D10A) nickase-based genotypic and phenotypic screening to enhance genome editing.

    Chiang TW, le Sage C, Larrieu D, Demir M and Jackson SP

    Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK.

    The RNA-guided Cas9 nuclease is being widely employed to engineer the genomes of various cells and organisms. Despite the efficient mutagenesis induced by Cas9, off-target effects have raised concerns over the system's specificity. Recently a "double-nicking" strategy using catalytic mutant Cas9(D10A) nickase has been developed to minimise off-target effects. Here, we describe a Cas9(D10A)-based screening approach that combines an All-in-One Cas9(D10A) nickase vector with fluorescence-activated cell sorting enrichment followed by high-throughput genotypic and phenotypic clonal screening strategies to generate isogenic knockouts and knock-ins highly efficiently, with minimal off-target effects. We validated this approach by targeting genes for the DNA-damage response (DDR) proteins MDC1, 53BP1, RIF1 and P53, plus the nuclear architecture proteins Lamin A/C, in three different human cell lines. We also efficiently obtained biallelic knock-in clones, using single-stranded oligodeoxynucleotides as homologous templates, for insertion of an EcoRI recognition site at the RIF1 locus and introduction of a point mutation at the histone H2AFX locus to abolish assembly of DDR factors at sites of DNA double-strand breaks. This versatile screening approach should facilitate research aimed at defining gene functions, modelling of cancers and other diseases underpinned by genetic factors, and exploring new therapeutic opportunities.

    Scientific reports 2016;6;24356

  • gEVAL - a web-based browser for evaluating genome assemblies.

    Chow W, Brugger K, Caccamo M, Sealy I, Torrance J and Howe K

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: For most research approaches, genome analyses are dependent on the existence of a high quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating the concordance. These analyses include: a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and offers strategies for improvement. Although gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation including from Ensembl or TrackHub sources. We provide gEVAL web sites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable to enable its use for any organism and assembly.

    Availability and implementation: Web Browser:, Plugin:


    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2016;32;16;2508-10

  • South Asia as a Reservoir for the Global Spread of Ciprofloxacin-Resistant Shigella sonnei: A Cross-Sectional Study.

    Chung The H, Rabaa MA, Pham Thanh D, De Lappe N, Cormican M, Valcanis M, Howden BP, Wangchuk S, Bodhidatta L, Mason CJ, Nguyen Thi Nguyen T, Vu Thuy D, Thompson CN, Phu Huong Lan N, Voong Vinh P, Ha Thanh T, Turner P, Sar P, Thwaites G, Thomson NR, Holt KE and Baker S

    The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Background: Antimicrobial resistance is a major issue in the Shigellae, particularly as a specific multidrug-resistant (MDR) lineage of Shigella sonnei (lineage III) is becoming globally dominant. Ciprofloxacin is a recommended treatment for Shigella infections. However, ciprofloxacin-resistant S. sonnei are being increasingly isolated in Asia and sporadically reported on other continents. We hypothesized that Asia is a primary hub for the recent international spread of ciprofloxacin-resistant S. sonnei.

    Methods and findings: We performed whole-genome sequencing on a collection of 60 contemporaneous ciprofloxacin-resistant S. sonnei isolated in four countries within Asia (Vietnam, n = 11; Bhutan, n = 12; Thailand, n = 1; Cambodia, n = 1) and two outside of Asia (Australia, n = 19; Ireland, n = 16). We reconstructed the recent evolutionary history of these organisms and combined these data with their geographical location of isolation. Placing these sequences into a global phylogeny, we found that all ciprofloxacin-resistant S. sonnei formed a single clade within a Central Asian expansion of lineage III. Furthermore, our data show that resistance to ciprofloxacin within S. sonnei may be globally attributed to a single clonal emergence event, encompassing sequential gyrA-S83L, parC-S80I, and gyrA-D87G mutations. Geographical data predict that South Asia is the likely primary source of these organisms, which are being regularly exported across Asia and intercontinentally into Australia, the United States and Europe. Our analysis was limited by the number of S. sonnei sequences available from diverse geographical areas and time periods, and we cannot discount the potential existence of other unsampled reservoir populations of antimicrobial-resistant S. sonnei.

    Conclusions: This study suggests that a single clone, which is widespread in South Asia, is likely driving the current intercontinental surge of ciprofloxacin-resistant S. sonnei and is capable of establishing endemic transmission in new locations. Despite being limited in geographical scope, our work has major implications for understanding the international transfer of antimicrobial-resistant pathogens, with S. sonnei acting as a tractable model for studying how antimicrobial-resistant Gram-negative bacteria spread globally.

    PLoS medicine 2016;13;8;e1002055

  • metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, Ripatti S and Pirinen M

    Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland.

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests.

    Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies.

    Availability and implementation: Code is available at

    Contacts: or matti.pirinen@helsinki.fi

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2016

  • Single-cell epigenomics: powerful new methods for understanding gene regulation and cell identity.

    Clark SJ, Lee HJ, Smallwood SA, Kelsey G and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge, CB22 3AT, UK.

    Emerging single-cell epigenomic methods are being developed with the exciting potential to transform our knowledge of gene regulation. Here we review available techniques and future possibilities, arguing that the full potential of single-cell epigenetic studies will be realized through parallel profiling of genomic, transcriptional, and epigenetic information.

    Genome biology 2016;17;1;72

  • Comparative genomics of carriage and disease isolates of Streptococcus pneumoniae serotype 22F reveals lineage specific divergence and niche adaptation.

    Cleary DW, Devine VT, Jefferies J, Webb JS, Bentley SD, Gladstone RA, Faust SN and Clarke SC

    1. Academic Unit of Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, UK 2. Institute for Life Sciences, University of Southampton, Southampton, UK.

    Streptococcus pneumoniae is a major cause of meningitis, sepsis and pneumonia worldwide. Pneumococcal conjugate vaccines (PCV) have been part of the UK's childhood immunisation programme since 2006 and have significantly reduced the incidence of disease due to vaccine efficacy in reducing carriage in the population. Here we isolated two clones of 22F (an emerging serotype of clinical concern, multilocus sequence types (MLST) 433 and 698) and conducted comparative genomic analysis on four isolates, paired by ST with one of each pair being derived from carriage and the other disease (sepsis). The most compelling observation was of non-synonymous mutations in pgdA, encoding peptidoglycan N-acetylglucosamine deacetylase A, which were found in the carriage isolates of both ST433 and 698. Deacetylation of pneumococcal peptidoglycan is known to enable resistance to lysozyme upon invasion. Whilst no other clear genotypic signatures related to disease or carriage could be determined, additional intriguing comparisons between the two STs were possible. These include the presence of an intact prophage, in addition to numerous additional phage insertions, within the carriage isolate of ST433. Contrasting gene repertoires related to virulence and colonisation, including bacteriocins, lantibiotics, and toxin-antitoxin systems, were also observed.

    Genome biology and evolution 2016

  • Cytomegalovirus-Specific IL-10-Producing CD4+ T Cells Are Governed by Type-I IFN-Induced IL-27 and Promote Virus Persistence.

    Clement M, Marsden M, Stacey MA, Abdul-Karim J, Gimeno Brias S, Costa Bento D, Scurr MJ, Ghazal P, Weaver CT, Carlesso G, Clare S, Jones SA, Godkin A, Jones GW and Humphreys IR

    Division of Infection & Immunity, Cardiff University, Cardiff, United Kingdom.

    CD4+ T cells support host defence against herpesviruses and other viral pathogens. We identified that CD4+ T cells from systemic and mucosal tissues of hosts infected with the β-herpesviridae human cytomegalovirus (HCMV) or murine cytomegalovirus (MCMV) express the regulatory cytokine interleukin (IL)-10. IL-10+CD4+ T cells co-expressed TH1-associated transcription factors and chemokine receptors. Mice lacking T cell-derived IL-10 elicited enhanced antiviral T cell responses and restricted MCMV persistence in salivary glands and secretion in saliva. Thus, IL-10+CD4+ T cells suppress antiviral immune responses against CMV. Expansion of this T-cell population in the periphery was promoted by IL-27 whereas mucosal IL-10+ T cell responses were ICOS-dependent. Infected Il27rα-deficient mice with reduced peripheral IL-10+CD4+ T cell accumulation displayed robust T cell responses and restricted MCMV persistence and shedding. Temporal inhibition experiments revealed that IL-27R signaling during initial infection was required for the suppression of T cell immunity and control of virus shedding during MCMV persistence. IL-27 production was promoted by type-I IFN, suggesting that β-herpesviridae exploit the immune-regulatory properties of this antiviral pathway to establish chronicity. Further, our data reveal that cytokine signaling events during initial infection profoundly influence virus chronicity.

    PLoS pathogens 2016;12;12;e1006050

  • Inherited determinants of Crohn's disease and ulcerative colitis phenotypes: a genetic association study.

    Cleynen I, Boucher G, Jostins L, Schumm LP, Zeissig S, Ahmad T, Andersen V, Andrews JM, Annese V, Brand S, Brant SR, Cho JH, Daly MJ, Dubinsky M, Duerr RH, Ferguson LR, Franke A, Gearry RB, Goyette P, Hakonarson H, Halfvarson J, Hov JR, Huang H, Kennedy NA, Kupcinskas L, Lawrance IC, Lee JC, Satsangi J, Schreiber S, Théâtre E, van der Meulen-de Jong AE, Weersma RK, Wilson DC, International Inflammatory Bowel Disease Genetics Consortium, Parkes M, Vermeire S, Rioux JD, Mansfield J, Silverberg MS, Radford-Smith G, McGovern DP, Barrett JC and Lees CW

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Department of Clinical and Experimental Medicine, TARGID, KU Leuven, Leuven, Belgium.

    Background: Crohn's disease and ulcerative colitis are the two major forms of inflammatory bowel disease; treatment strategies have historically been determined by this binary categorisation. Genetic studies have identified 163 susceptibility loci for inflammatory bowel disease, mostly shared between Crohn's disease and ulcerative colitis. We undertook the largest genotype association study, to date, in widely used clinical subphenotypes of inflammatory bowel disease with the goal of further understanding the biological relations between diseases.

    Methods: This study included patients from 49 centres in 16 countries in Europe, North America, and Australasia. We applied the Montreal classification system of inflammatory bowel disease subphenotypes to 34,819 patients (19,713 with Crohn's disease, 14,683 with ulcerative colitis) genotyped on the Immunochip array. We tested for genotype-phenotype associations across 156,154 genetic variants. We generated genetic risk scores by combining information from all known inflammatory bowel disease associations to summarise the total load of genetic risk for a particular phenotype. We used these risk scores to test the hypothesis that colonic Crohn's disease, ileal Crohn's disease, and ulcerative colitis are all genetically distinct from each other, and to attempt to identify patients with a mismatch between clinical diagnosis and genetic risk profile.

    Findings: After quality control, the primary analysis included 29,838 patients (16,902 with Crohn's disease, 12,597 with ulcerative colitis). Three loci (NOD2, MHC, and MST1 3p21) were associated with subphenotypes of inflammatory bowel disease, mainly disease location (essentially fixed over time; median follow-up of 10·5 years). Little or no genetic association with disease behaviour (which changed dramatically over time) remained after conditioning on disease location and age at onset. The genetic risk score representing all known risk alleles for inflammatory bowel disease showed strong association with disease subphenotype (p=1·65 × 10⁻⁷⁸), even after exclusion of NOD2, MHC, and 3p21 (p=9·23 × 10⁻¹⁸). Predictive models based on the genetic risk score strongly distinguished colonic from ileal Crohn's disease. Our genetic risk score could also identify a small number of patients with discrepant genetic risk profiles who were significantly more likely to have a revised diagnosis after follow-up (p=6·8 × 10⁻⁴).

    Interpretation: Our data support a continuum of disorders within inflammatory bowel disease, much better explained by three groups (ileal Crohn's disease, colonic Crohn's disease, and ulcerative colitis) than by Crohn's disease and ulcerative colitis as currently defined. Disease location is an intrinsic aspect of a patient's disease, in part genetically determined, and the major driver to changes in disease behaviour over time.

    Funding: International Inflammatory Bowel Disease Genetics Consortium members funding sources (see Acknowledgments for full list).

    Funded by: AHRQ HHS: HS021747, R01 HS021747; Chief Scientist Office: ETM/75; Medical Research Council: G0600329, G0800675; NCI NIH HHS: P30 CA016359, R01 CA141743; NIAID NIH HHS: AI067068, U01 AI067068; NIDCR NIH HHS: U54 DE023789, U54DE023789-01; NIDDK NIH HHS: DK062413, DK062420, DK062422, DK062423, DK062429, DK062429-S1, DK062431, DK062432, DK076984, DK084554, P01 DK046763, P01DK046763, P30 DK043351, P30 DK089502, R03 DK076984, R21 DK084554, U01 DK062413, U01 DK062418, U01 DK062420, U01 DK062422, U01 DK062423, U01 DK062429, U01 DK062431, U01 DK062432; Wellcome Trust: 083948/Z/07/Z, 085475/B/08/Z, 085475/Z/08/Z, 098051, 098759

    Lancet (London, England) 2016;387;10014;156-67

  • Common polygenic variation in coeliac disease and confirmation of ZNF335 and NIFA as disease susceptibility loci.

    Coleman C, Quinn EM, Ryan AW, Conroy J, Trimble V, Mahmud N, Kennedy N, Corvin AP, Morris DW, Donohoe G, O'Morain C, MacMathuna P, Byrnes V, Kiat C, Trynka G, Wijmenga C, Kelleher D, Ennis S, Anney RJ and McManus R

    Department of Medicine, Institute of Molecular Medicine, Trinity College Dublin, St. James's Hospital, Dublin, Ireland.

    Coeliac disease (CD) is a chronic immune-mediated disease triggered by the ingestion of gluten. It has an estimated prevalence of approximately 1% in European populations. Specific HLA-DQA1 and HLA-DQB1 alleles are established coeliac susceptibility genes and are required for the presentation of gliadin to the immune system resulting in damage to the intestinal mucosa. In the largest association analysis of CD to date, 39 non-HLA risk loci were identified, 13 of which were new, in a sample of 12 014 individuals with CD and 12 228 controls using the Immunochip genotyping platform. Including the HLA, this brings the total number of known CD loci to 40. We have replicated this study in an independent Irish CD case-control population of 425 CD and 453 controls using the Immunochip platform. Using a binomial sign test, we show that the direction of the effects of previously described risk alleles was highly correlated with that reported in the Irish population (P=2.2 × 10⁻¹⁶). Using the Polygene Risk Score (PRS) approach, we estimated that up to 35% of the genetic variance could be explained by loci present on the Immunochip (P=9 × 10⁻⁷⁵). When this is limited to non-HLA loci, we explain a maximum of 4.5% of the genetic variance (P=3.6 × 10⁻¹⁸). Finally, we performed a meta-analysis of our data with the previous reports, identifying two further loci harbouring the ZNF335 and NIFA genes which now exceed genome-wide significance, taking the total number of CD susceptibility loci to 42.

    European journal of human genetics : EJHG 2016;24;2;291-7
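
    The binomial sign test named in the abstract above can be illustrated with a minimal sketch. It asks whether the number of risk alleles whose effect direction agrees between two studies exceeds what a fair coin would give. The function name and the example counts are hypothetical and are not taken from the paper; this is an exact two-sided test under a null probability of 0.5, not a reconstruction of the authors' analysis.

```python
from math import comb


def sign_test_p_value(n_concordant: int, n_total: int) -> float:
    """Exact two-sided binomial sign test against a null probability of 0.5.

    n_concordant: number of loci whose effect direction matches (hypothetical input).
    n_total: number of loci compared.
    """
    if not 0 <= n_concordant <= n_total:
        raise ValueError("n_concordant must lie between 0 and n_total")
    # Use the more extreme of the two tails; the null distribution is symmetric.
    k = max(n_concordant, n_total - n_concordant)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    # Double the one-sided tail and cap at 1.
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    # Illustrative counts only: 10 of 10 loci agree in direction.
    print(sign_test_p_value(10, 10))
```

    With all loci concordant the p-value shrinks as 2 / 2^n, which is why a few dozen loci with consistent directions can already yield a very small p-value.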

  • Clonal analysis of stem cells in differentiation and disease.

    Colom B and Jones PH

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Tracking the fate of individual cells and their progeny by clonal analysis has redefined the concept of stem cells and their role in health and disease. The maintenance of cell turnover in adult tissues is achieved by the collective action of populations of stem cells with an equal likelihood of self-renewal or differentiation. Following injury stem cells exhibit striking plasticity, switching from homeostatic behavior in order to repair damaged tissues. The effects of disease states on stem cells are also being uncovered, with new insights into how somatic mutations trigger clonal expansion in early neoplasia.

    Current opinion in cell biology 2016;43;14-21

  • A survey of best practices for RNA-seq data analysis.

    Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X and Mortazavi A

    Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA.

    RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Genome biology 2016;17;13

  • The genome of Onchocerca volvulus, agent of river blindness.

    Cotton JA, Bennuru S, Grote A, Harsha B, Tracey A, Beech R, Doyle SR, Dunn M, Hotopp JC, Holroyd N, Kikuchi T, Lambert O, Mhashilkar A, Mutowo P, Nursimulu N, Ribeiro JM, Rogers MB, Stanley E, Swapna LS, Tsai IJ, Unnasch TR, Voronin D, Parkinson J, Nutman TB, Ghedin E, Berriman M and Lustigman S

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Human onchocerciasis is a serious neglected tropical disease caused by the filarial nematode Onchocerca volvulus that can lead to blindness and chronic disability. Control of the disease relies largely on mass administration of a single drug, and the development of new drugs and vaccines depends on a better knowledge of parasite biology. Here, we describe the chromosomes of O. volvulus and its Wolbachia endosymbiont. We provide the highest-quality sequence assembly for any parasitic nematode to date, giving a glimpse into the evolution of filarial parasite chromosomes and proteomes. This resource was used to investigate gene families with key functions that could be potentially exploited as targets for future drugs. Using metabolic reconstruction of the nematode and its endosymbiont, we identified enzymes that are likely to be essential for O. volvulus viability. In addition, we have generated a list of proteins that could be targeted by Federal-Drug-Agency-approved but repurposed drugs, providing starting points for anti-onchocerciasis drug development.

    Funded by: NIAID NIH HHS: R01 AI042328, R01 AI078314, U19 AI110820; NIH HHS: DP2 OD007372

    Nature microbiology 2016;2;16216

  • RLZAP: Relative Lempel-Ziv with adaptive pointers

    Cox,A.J., Farruggia,A., Gagie,T., Puglisi,S.J. and Siren,J.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2016;9954 LNCS;1-14

  • Whole genome resequencing of the human parasite Schistosoma mansoni reveals population history and effects of selection.

    Crellen T, Allan F, David S, Durrant C, Huckvale T, Holroyd N, Emery AM, Rollinson D, Aanensen DM, Berriman M, Webster JP and Cotton JA

    Department of Infectious Disease Epidemiology, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, United Kingdom.

    Schistosoma mansoni is a parasitic fluke that infects millions of people in the developing world. This study presents the first application of population genomics to S. mansoni based on high-coverage resequencing data from 10 global isolates and an isolate of the closely-related Schistosoma rodhaini, which infects rodents. Using population genetic tests, we document genes under directional and balancing selection in S. mansoni that may facilitate adaptation to the human host. Coalescence modeling dates the speciation of S. mansoni and S. rodhaini to 107.5-147.6 KYA, a period which overlaps with the earliest archaeological evidence for fishing in Africa. Our results indicate that S. mansoni originated in East Africa and experienced a decline in effective population size 20-90 KYA, before dispersing across the continent during the Holocene. In addition, we find strong evidence that S. mansoni migrated to the New World with the 16-19th Century Atlantic Slave Trade.

    Funded by: Medical Research Council; Wellcome Trust: 098051

    Scientific reports 2016;6;20954

  • Reduced efficacy of praziquantel against Schistosoma mansoni is associated with multiple-rounds of mass drug administration.

    Crellen T, Walker M, Lamberton PH, Kabatereine NB, Tukahebwa EM, Cotton JA and Webster JP

    Department of Infectious Disease Epidemiology and the London Centre for Neglected Tropical Disease Research, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, United Kingdom Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, United Kingdom Department of Pathology and Pathogen Biology, Royal Veterinary College, University of London, Hertfordshire, AL9 7TA, United Kingdom

    Background: Mass drug administration (MDA) with praziquantel is the cornerstone of schistosomiasis control in sub-Saharan Africa. The effectiveness of this strategy is dependent on the continued high efficacy of praziquantel; however, drug efficacy is rarely monitored using appropriate statistical approaches that can detect early signs of wane.

    Methods: We conducted a repeated cross-sectional study, examining children infected with Schistosoma mansoni from 6 schools in Uganda that had previously received between 1 and 9 rounds of MDA with praziquantel. We collected up to 12 S. mansoni egg counts from 414 children aged 6-12 before and 25-27 days after treatment with praziquantel. We estimated individual patient egg reduction rates (ERRs) using a statistical model to explore the influence of covariates, including the number of prior MDA rounds.

    Results: The average ERR among children within schools that had received 8 or 9 previous rounds of MDA (95% Bayesian credible interval (BCI) 88.23%, 93.64%) was statistically significantly lower than the average in schools that had received 5 (95% BCI 96.13%, 99.08%) or 1 (95% BCI 95.51%, 98.96%) round of MDA. We estimate that 5.11%, 4.55% and 16.42% of children from schools that had received 1, 5, and 8/9 rounds of MDA respectively had ERRs below the 90% threshold of optimal praziquantel efficacy set by the World Health Organization.

    Conclusions: The reduced efficacy of praziquantel in schools with a higher exposure to MDA may pose a threat to the effectiveness of schistosomiasis control programs. We call for the efficacy of anthelmintic drugs used in MDA to be closely monitored.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2016
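
    The egg reduction rate (ERR) discussed in the abstract above can be illustrated with simple arithmetic. The sketch below uses the plain ratio of mean egg counts after and before treatment; this is an assumption made for illustration and is not the Bayesian statistical model the authors describe. Function names and the example counts are hypothetical; only the 90% threshold comes from the abstract.

```python
from statistics import mean

# 90% threshold of optimal praziquantel efficacy cited in the abstract.
WHO_THRESHOLD = 0.90


def egg_reduction_rate(pre_counts, post_counts):
    """Naive ERR: 1 - (mean post-treatment count / mean pre-treatment count)."""
    pre = mean(pre_counts)
    if pre <= 0:
        raise ValueError("pre-treatment mean egg count must be positive")
    return 1.0 - mean(post_counts) / pre


def below_threshold(err, threshold=WHO_THRESHOLD):
    """True when the ERR falls short of the efficacy threshold."""
    return err < threshold


if __name__ == "__main__":
    # Hypothetical egg counts for one child, before and after treatment.
    err = egg_reduction_rate([120, 80, 100], [30, 10, 20])
    print(f"ERR = {err:.2%}, below threshold: {below_threshold(err)}")
```

    A model-based estimate, as used in the study, additionally accounts for repeated counts per child and for covariates such as the number of prior MDA rounds, which this ratio ignores.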

  • Binding of Plasmodium falciparum Merozoite Surface Proteins DBLMSP and DBLMSP2 to Human Immunoglobulin M Is Conserved among Broadly Diverged Sequence Variants.

    Crosnier C, Iqbal Z, Knuepfer E, Maciuca S, Perrin AJ, Kamuyu G, Goulding D, Bustamante LY, Miles A, Moore SC, Dougan G, Holder AA, Kwiatkowski DP, Rayner JC, Pleass RJ and Wright GJ

    From the Cell Surface Signalling Laboratory, the Malaria Programme, and.

    Diversity at pathogen genetic loci can be driven by host adaptive immune selection pressure and may reveal proteins important for parasite biology. Population-based genome sequencing of Plasmodium falciparum, the parasite responsible for the most severe form of malaria, has highlighted two related polymorphic genes called dblmsp and dblmsp2, which encode Duffy binding-like (DBL) domain-containing proteins located on the merozoite surface but whose function remains unknown. Using recombinant proteins and transgenic parasites, we show that DBLMSP and DBLMSP2 directly and avidly bind human IgM via their DBL domains. We used whole genome sequence data from over 400 African and Asian P. falciparum isolates to show that dblmsp and dblmsp2 exhibit extreme protein polymorphism in their DBL domain, with multiple variants of two major allelic classes present in every population tested. Despite this variability, the IgM binding function was retained across diverse sequence representatives. Although this interaction did not seem to have an effect on the ability of the parasite to invade red blood cells, binding of DBLMSP and DBLMSP2 to IgM inhibited the overall immunoreactivity of these proteins to IgG from patients who had been exposed to the parasite. This suggests that IgM binding might mask these proteins from the host humoral immune system.

    The Journal of biological chemistry 2016;291;27;14285-99

  • Horizontal DNA Transfer Mechanisms of Bacteria as Weapons of Intragenomic Conflict.

    Croucher NJ, Mostowy R, Wymant C, Turner P, Bentley SD and Fraser C

    Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom.

    Horizontal DNA transfer (HDT) is a pervasive mechanism of diversification in many microbial species, but its primary evolutionary role remains controversial. Much recent research has emphasised the adaptive benefit of acquiring novel DNA, but here we argue instead that intragenomic conflict provides a coherent framework for understanding the evolutionary origins of HDT. To test this hypothesis, we developed a mathematical model of a clonally descended bacterial population undergoing HDT through transmission of mobile genetic elements (MGEs) and genetic transformation. Including the known bias of transformation toward the acquisition of shorter alleles into the model suggested it could be an effective means of counteracting the spread of MGEs. Both constitutive and transient competence for transformation were found to provide an effective defence against parasitic MGEs; transient competence could also be effective at permitting the selective spread of MGEs conferring a benefit on their host bacterium. The coordination of transient competence with cell-cell killing, observed in multiple species, was found to result in synergistic blocking of MGE transmission through releasing genomic DNA for homologous recombination while simultaneously reducing horizontal MGE spread by lowering the local cell density. To evaluate the feasibility of the functions suggested by the modelling analysis, we analysed genomic data from longitudinal sampling of individuals carrying Streptococcus pneumoniae. This revealed the frequent within-host coexistence of clonally descended cells that differed in their MGE infection status, a necessary condition for the proposed mechanism to operate. Additionally, we found multiple examples of MGEs inhibiting transformation through integrative disruption of genes encoding the competence machinery across many species, providing evidence of an ongoing "arms race." Reduced rates of transformation have also been observed in cells infected by MGEs that reduce the concentration of extracellular DNA through secretion of DNases. Simulations predicted that either mechanism of limiting transformation would benefit individual MGEs, but also that this tactic's effectiveness was limited by competition with other MGEs coinfecting the same cell. A further observed behaviour we hypothesised to reduce elimination by transformation was MGE activation when cells become competent. Our model predicted that this response was effective at counteracting transformation independently of competing MGEs. Therefore, this framework is able to explain both common properties of MGEs, and the seemingly paradoxical bacterial behaviours of transformation and cell-cell killing within clonally related populations, as the consequences of intragenomic conflict between self-replicating chromosomes and parasitic MGEs. The antagonistic nature of the different mechanisms of HDT over short timescales means their contribution to bacterial evolution is likely to be substantially greater than previously appreciated.

    PLoS biology 2016;14;3;e1002394

  • Respiratory microbiota resistance and resilience to pulmonary exacerbation and subsequent antimicrobial intervention.

    Cuthbertson L, Rogers GB, Walker AW, Oliver A, Green LE, Daniels TW, Carroll MP, Parkhill J, Bruce KD and van der Gast CJ

    NERC Centre for Ecology & Hydrology, Wallingford, UK.

    Pulmonary symptoms in cystic fibrosis (CF) begin in early life with chronic lung infections and concomitant airway inflammation leading to progressive loss of lung function. Gradual pulmonary function decline is interspersed with periods of acute worsening of respiratory symptoms known as CF pulmonary exacerbations (CFPEs). Cumulatively, CFPEs are associated with more rapid disease progression. In this study multiple sputum samples were collected from adult CF patients over the course of CFPEs to better understand how changes in microbiota are associated with CFPE onset and management. Data were divided into five clinical periods: pre-CFPE baseline, CFPE, antibiotic treatment, recovery, and post-CFPE baseline. Samples were treated with propidium monoazide prior to DNA extraction, to remove the impact of bacterial cell death artefacts following antibiotic treatment, and then characterised by 16S rRNA gene-targeted high-throughput sequencing. Partitioning CF microbiota into core and rare groups revealed compositional resistance to CFPE and resilience to antibiotics interventions. Mixed effects modelling of core microbiota members revealed no significant negative impact on the relative abundance of Pseudomonas aeruginosa across the exacerbation cycle. Our findings have implications for current CFPE management strategies, supporting reassessment of existing antimicrobial treatment regimens, as antimicrobial resistance by pathogens and other members of the microbiota may be significant contributing factors.

    The ISME journal 2016;10;5;1081-91

  • Mechanisms of fate decision and lineage commitment during haematopoiesis.

    Cvejic A

    Department of Haematology, University of Cambridge, Cambridge, UK.

    Blood stem cells need to both perpetuate themselves (self-renew) and differentiate into all mature blood cells to maintain blood formation throughout life. However, it is unclear how the underlying gene regulatory network maintains this population of self-renewing and differentiating stem cells and how it accommodates the transition from a stem cell to a mature blood cell. Our current knowledge of transcriptomes of various blood cell types has mainly been advanced by population-level analysis. However, a population of seemingly homogenous blood cells may include many distinct cell types with substantially different transcriptomes and abilities to make diverse fate decisions. Therefore, understanding the cell-intrinsic differences between individual cells is necessary for a deeper understanding of the molecular basis of their behaviour. Here we review recent single-cell studies in the haematopoietic system and their contribution to our understanding of the mechanisms governing cell fate choices and lineage commitment.

    Immunology and cell biology 2016;94;3;230-5

  • Exome sequencing identifies rare variants in multiple genes in atrioventricular septal defect.

    D'Alessandro LC, Al Turki S, Manickaraj AK, Manase D, Mulder BJ, Bergin L, Rosenberg HC, Mondal T, Gordon E, Lougheed J, Smythe J, Devriendt K, Bhattacharya S, Watkins H, Bentham J, Bowdin S, Hurles ME and Mital S

    Division of Cardiology, Department of Pediatrics, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada.

    Purpose: The genetic etiology of atrioventricular septal defect (AVSD) is unknown in 40% of cases. Conventional sequencing and arrays have identified the etiology in only a minority of nonsyndromic individuals with AVSD.

    Methods: Whole-exome sequencing was performed in 81 unrelated probands with AVSD to identify potentially causal variants in a comprehensive set of 112 genes with strong biological relevance to AVSD.

    Results: A significant enrichment of rare and rare damaging variants was identified in the gene set, compared with controls (odds ratio (OR): 1.52; 95% confidence interval (CI): 1.35-1.71; P = 4.8 × 10⁻¹¹). The enrichment was specific to AVSD probands, compared with a cohort without AVSD with tetralogy of Fallot (OR: 2.25; 95% CI: 1.84-2.76; P = 2.2 × 10⁻¹⁶). Six genes (NIPBL, CHD7, CEP152, BMPR1a, ZFPM2, and MDM4) were enriched for rare variants in AVSD compared with controls, including three syndrome-associated genes (NIPBL, CHD7, and CEP152). The findings were confirmed in a replication cohort of 81 AVSD probands.

    Conclusion: Mutations in genes with strong biological relevance to AVSD, including syndrome-associated genes, can contribute to AVSD, even in those with isolated heart disease. The identification of a gene set associated with AVSD will facilitate targeted genetic screening in this cohort.

    Funded by: British Heart Foundation: CH/09/003/26631, RG/10/17/28553; Wellcome Trust: 090532, WT098051

    Genetics in medicine : official journal of the American College of Medical Genetics 2016;18;2;189-98

  • A multiple-phenotype imputation method for genetic studies.

    Dahl A, Iotchkova V, Baud A, Johansson Å, Gyllensten U, Soranzo N, Mott R, Kranis A and Marchini J

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Genetic association studies have yielded a wealth of biological discoveries. However, these studies have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of the data sets. Joint genotype-phenotype analyses of complex, high-dimensional data sets represent an important way to move beyond simple genome-wide association studies (GWAS) with great potential. The move to high-dimensional phenotypes will raise many new statistical problems. Here we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple-phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real data sets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.

    Nature genetics 2016;48;4;466-72

  • A Method for Checking Genomic Integrity in Cultured Cell Lines from SNP Genotyping Data.

    Danecek P, McCarthy SA, HipSci Consortium and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, United Kingdom.

    Genomic screening for chromosomal abnormalities is an important part of quality control when establishing and maintaining stem cell lines. We present a new method for sensitive detection of copy number alterations, aneuploidy, and contamination in cell lines using genome-wide SNP genotyping data. In contrast to other methods designed for identifying copy number variations in a single sample or in a sample composed of a mixture of normal and tumor cells, this new method is tailored for determining differences between cell lines and the starting material from which they were derived, which allows us to distinguish between normal and novel copy number variation. We implemented the method in the freely available BCFtools package and present results based on induced pluripotent stem cell lines obtained in the HipSci project.

    PloS one 2016;11;5;e0155014

  • Using a Human Challenge Model of Infection to Measure Vaccine Efficacy: A Randomised, Controlled Trial Comparing the Typhoid Vaccines M01ZH09 with Placebo and Ty21a.

    Darton TC, Jones C, Blohmke CJ, Waddington CS, Zhou L, Peters A, Haworth K, Sie R, Green CA, Jeppesen CA, Moore M, Thompson BA, John T, Kingsley RA, Yu LM, Voysey M, Hindle Z, Lockhart S, Sztein MB, Dougan G, Angus B, Levine MM and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, and the NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, United Kingdom.

    Background: Typhoid persists as a major cause of global morbidity. While several licensed vaccines to prevent typhoid are available, they are of only moderate efficacy and unsuitable for use in children less than two years of age. Development of new efficacious vaccines is complicated by the human host-restriction of Salmonella enterica serovar Typhi (S. Typhi) and lack of clear correlates of protection. In this study, we aimed to evaluate the protective efficacy of a single dose of the oral vaccine candidate, M01ZH09, in susceptible volunteers by direct typhoid challenge.

    Methods and findings: We performed a randomised, double-blind, placebo-controlled trial in healthy adult participants at a single centre in Oxford (UK). Participants were allocated to receive one dose of double-blinded M01ZH09 or placebo or 3-doses of open-label Ty21a. Twenty-eight days after vaccination, participants were challenged with 10^4 CFU S. Typhi Quailes strain. The efficacy of M01ZH09 compared with placebo (primary outcome) was assessed as the percentage of participants reaching pre-defined endpoints constituting typhoid diagnosis (fever and/or bacteraemia) during the 14 days after challenge. Ninety-nine participants were randomised to receive M01ZH09 (n = 33), placebo (n = 33) or 3-doses of Ty21a (n = 33). After challenge, typhoid was diagnosed in 18/31 (58.1% [95% CI 39.1 to 75.5]) M01ZH09, 20/30 (66.7% [47.2 to 87.2]) placebo, and 13/30 (43.3% [25.5 to 62.6]) Ty21a vaccine recipients. Vaccine efficacy (VE) for one dose of M01ZH09 was 13% [95% CI -29 to 41] and 35% [-5 to 60] for 3-doses of Ty21a. Retrospective multivariable analyses demonstrated that pre-existing anti-Vi antibody significantly reduced susceptibility to infection after challenge; a 1 log increase in anti-Vi IgG resulting in a 71% decrease in the hazard ratio of typhoid diagnosis ([95% CI 30 to 88%], p = 0.006) during the 14 day challenge period. Limitations to the study included the requirement to limit the challenge period prior to treatment to 2 weeks, the intensity of the study procedures and the high challenge dose used resulting in a stringent model.

    Conclusions: Despite successfully demonstrating the use of a human challenge study to directly evaluate vaccine efficacy, a single-dose M01ZH09 failed to demonstrate significant protection after challenge with virulent Salmonella Typhi in this model. Anti-Vi antibody detected prior to vaccination played a major role in outcome after challenge.

    Trial registration: ClinicalTrials.gov (NCT01405521) and EudraCT (number 2011-000381-35).

    PLoS neglected tropical diseases 2016;10;8;e0004926
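    The vaccine efficacy figures quoted in the abstract above can be checked from the reported case counts. The sketch below assumes the conventional definition, VE = 1 - (attack rate in vaccinees / attack rate in placebo recipients); the abstract does not state the formula, so this is an assumption, and the function names are illustrative only. The confidence intervals are not reproduced here.

```python
# Minimal sketch: point estimates of vaccine efficacy from the case counts
# reported in the abstract. The formula is the conventional one and is an
# assumption; the abstract itself does not spell it out.

def attack_rate(cases: int, challenged: int) -> float:
    """Proportion of challenged participants diagnosed with typhoid."""
    return cases / challenged


def vaccine_efficacy(vaccine_cases: int, vaccine_n: int,
                     placebo_cases: int, placebo_n: int) -> float:
    """VE = 1 - (attack rate in vaccinees / attack rate in placebo group)."""
    return 1.0 - (attack_rate(vaccine_cases, vaccine_n)
                  / attack_rate(placebo_cases, placebo_n))


# Counts from the abstract: 18/31 (M01ZH09), 20/30 (placebo), 13/30 (Ty21a).
ve_m01zh09 = vaccine_efficacy(18, 31, 20, 30)
ve_ty21a = vaccine_efficacy(13, 30, 20, 30)

print(f"M01ZH09 VE: {ve_m01zh09:.0%}")
print(f"Ty21a VE: {ve_ty21a:.0%}")
```

    Under that assumption the point estimates round to the 13% and 35% reported for M01ZH09 and Ty21a respectively.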

  • Multiple major disease-associated clones of Legionella pneumophila have emerged recently and independently.

    David S, Rusniok C, Mentasti M, Gomez-Valero L, Harris SR, Lechat P, Lees J, Ginevra C, Glaser P, Ma L, Bouchier C, Underwood A, Jarraud S, Harrison TG, Parkhill J and Buchrieser C

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA Cambridge, United Kingdom.

    Legionella pneumophila is an environmental bacterium and the leading cause of Legionnaires' disease. Just five sequence types (ST), from more than 2000 currently described, cause nearly half of disease cases in northwest Europe. Here, we report the sequence and analyses of 364 L. pneumophila genomes, including 337 from the five disease-associated STs and 27 representative of the species diversity. Phylogenetic analyses revealed that the five STs have independent origins within a highly diverse species. The number of de novo mutations is extremely low with maximum pairwise single-nucleotide polymorphisms (SNPs) ranging from 19 (ST47) to 127 (ST1), which suggests emergences within the last century. Isolates sampled geographically far apart differ by only a few SNPs, demonstrating rapid dissemination. These five STs have been recombining recently, leading to a shared pool of allelic variants potentially contributing to their increased disease propensity. The oldest clone, ST1, has spread globally; between 1940 and 2000, four new clones have emerged in Europe, which show long-distance, rapid dispersal. That a large proportion of clinical cases is caused by recently emerged and internationally dispersed clones, linked by convergent evolution, is surprising for an environmental bacterium traditionally considered to be an opportunistic pathogen. To simultaneously explain recent emergence, rapid spread and increased disease association, we hypothesize that these STs have adapted to new man-made environmental niches, which may be linked by human infection and transmission.

    Genome research 2016;26;11;1555-1564

  • Formin Is Associated with Left-Right Asymmetry in the Pond Snail and the Frog.

    Davison A, McDowell GS, Holden JM, Johnson HF, Koutsovoulos GD, Liu MM, Hulpiau P, Van Roy F, Wade CM, Banerjee R, Yang F, Chiba S, Davey JW, Jackson DJ, Levin M and Blaxter ML

    School of Life Sciences, University of Nottingham, Nottingham NG7 2RD, UK.

    While components of the pathway that establishes left-right asymmetry have been identified in diverse animals, from vertebrates to flies, it is striking that the genes involved in the first symmetry-breaking step remain wholly unknown in the most obviously chiral animals, the gastropod snails. Previously, research on snails was used to show that left-right signaling of Nodal, downstream of symmetry breaking, may be an ancestral feature of the Bilateria [1 and 2]. Here, we report that a disabling mutation in one copy of a tandemly duplicated, diaphanous-related formin is perfectly associated with symmetry breaking in the pond snail. This is supported by the observation that an anti-formin drug treatment converts dextral snail embryos to a sinistral phenocopy, and in frogs, drug inhibition or overexpression by microinjection of formin has a chirality-randomizing effect in early (pre-cilia) embryos. Contrary to expectations based on existing models [3, 4 and 5], we discovered asymmetric gene expression in 2- and 4-cell snail embryos, preceding morphological asymmetry. As the formin-actin filament has been shown to be part of an asymmetry-breaking switch in vitro [6 and 7], together these results are consistent with the view that animals with diverse body plans may derive their asymmetries from the same intracellular chiral elements [8].

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F018940/1, BB_BB/F021135/1, BB_BB/G00661X/1, F021135, G00661X; Medical Research Council: G0900740, MRC_MR/K001744/1; NCI NIH HHS: U54 CA143876, U54CA143876; Wellcome Trust: WT098051

    Current biology : CB 2016;26;5;654-60

  • Prognostic impact of p15 gene aberrations in acute leukemia.

    De Braekeleer M, Douet-Guilbert N and De Braekeleer E

    Laboratoire d'Histologie, Embryologie et Cytogénétique, Faculté de Médecine et des Sciences de la Santé, Université de Brest, Brest, France.

    The p15 gene (also known as CDKN2B, INK4B, p15(INK4B)), located in band 9p21, encodes a protein that induces a G1-phase cell cycle arrest through inhibition of CDK4/6 (cyclin-dependent kinase 4/6). It also plays an important role in the regulation of cellular commitment of hematopoietic progenitor cells and myeloid cell differentiation. p15 can be silenced by several mechanisms, including deletion and hypermethylation of its promoter. Homozygous p15 deletion is rare in acute myeloblastic leukemia (AML) and myelodysplastic syndromes (MDS) but frequent in acute lymphoblastic leukemia (ALL). In contrast, methylation of the p15 promoter is identified in some 50% of the patients with AML and MDS, but is less frequent in ALL. The analysis of the 28 studies available in the literature revealed conflicting results (unfavorable, favorable or no impact) that may be due, at least in part, to methodological and/or biological pitfalls. Among those are the heterogeneity of the methylation patterns of the p15 gene and the lack of a comprehensive analysis including transcriptional and translational inactivation, both of which have a major impact on its expression. Therefore, detection of the p15 mRNA expression (quantitative or not) may represent a more appropriate method to determine the prognostic impact of the p15 gene.

    Leukemia & lymphoma 2016;1-9

  • Chimpanzee genomic diversity reveals ancient admixture with bonobos.

    de Manuel M, Kuhlwilm M, Frandsen P, Sousa VC, Desai T, Prado-Martinez J, Hernandez-Rodriguez J, Dupanloup I, Lao O, Hallast P, Schmidt JM, Heredia-Genestar JM, Benazzo A, Barbujani G, Peter BM, Kuderna LF, Casals F, Angedakin S, Arandjelovic M, Boesch C, Kühl H, Vigilant L, Langergraber K, Novembre J, Gut M, Gut I, Navarro A, Carlsen F, Andrés AM, Siegismund HR, Scally A, Excoffier L, Tyler-Smith C, Castellano S, Xue Y, Hvilsom C and Marques-Bonet T

    Institut de Biologia Evolutiva (Consejo Superior de Investigaciones Científicas-Universitat Pompeu Fabra), Barcelona Biomedical Research Park, Doctor Aiguader 88, Barcelona, Catalonia 08003, Spain.

    Our closest living relatives, chimpanzees and bonobos, have a complex demographic history. We analyzed the high-coverage whole genomes of 75 wild-born chimpanzees and bonobos from 10 countries in Africa. We found that chimpanzee population substructure makes genetic information a good predictor of geographic origin at country and regional scales. Multiple lines of evidence suggest that gene flow occurred from bonobos into the ancestors of central and eastern chimpanzees between 200,000 and 550,000 years ago, probably with subsequent spread into Nigeria-Cameroon chimpanzees. Together with another, possibly more recent contact (after 200,000 years ago), bonobos contributed less than 1% to the central chimpanzee genomes. Admixture thus appears to have been widespread during hominid evolution.

    Science (New York, N.Y.) 2016;354;6311;477-481

  • A meta-analysis of 120 246 individuals identifies 18 new loci for fibrinogen concentration.

    de Vries PS, Chasman DI, Sabater-Lleal M, Chen MH, Huffman JE, Steri M, Tang W, Teumer A, Marioni RE, Grossmann V, Hottenga JJ, Trompet S, Müller-Nurasyid M, Zhao JH, Brody JA, Kleber ME, Guo X, Wang JJ, Auer PL, Attia JR, Yanek LR, Ahluwalia TS, Lahti J, Venturini C, Tanaka T, Bielak LF, Joshi PK, Rocanin-Arjo A, Kolcic I, Navarro P, Rose LM, Oldmeadow C, Riess H, Mazur J, Basu S, Goel A, Yang Q, Ghanbari M, Willemsen G, Rumley A, Fiorillo E, de Craen AJ, Grotevendt A, Scott R, Taylor KD, Delgado GE, Yao J, Kifley A, Kooperberg C, Qayyum R, Lopez LM, Berentzen TL, Räikkönen K, Mangino M, Bandinelli S, Peyser PA, Wild S, Trégouët DA, Wright AF, Marten J, Zemunik T, Morrison AC, Sennblad B, Tofler G, de Maat MP, de Geus EJ, Lowe GD, Zoledziewska M, Sattar N, Binder H, Völker U, Waldenberger M, Khaw KT, Mcknight B, Huang J, Jenny NS, Holliday EG, Qi L, Mcevoy MG, Becker DM, Starr JM, Sarin AP, Hysi PG, Hernandez DG, Jhun MA, Campbell H, Hamsten A, Rivadeneira F, Mcardle WL, Slagboom PE, Zeller T, Koenig W, Psaty BM, Haritunians T, Liu J, Palotie A, Uitterlinden AG, Stott DJ, Hofman A, Franco OH, Polasek O, Rudan I, Morange PE, Wilson JF, Kardia SL, Ferrucci L, Spector TD, Eriksson JG, Hansen T, Deary IJ, Becker LC, Scott RJ, Mitchell P, März W, Wareham NJ, Peters A, Greinacher A, Wild PS, Jukema JW, Boomsma DI, Hayward C, Cucca F, Tracy R, Watkins H, Reiner AP, Folsom AR, Ridker PM, O'Donnell CJ, Smith NL, Strachan DP and Dehghan A

    Department of Epidemiology.

    Genome-wide association studies have previously identified 23 genetic loci associated with circulating fibrinogen concentration. These studies used HapMap imputation and did not examine the X-chromosome. 1000 Genomes imputation provides better coverage of uncommon variants, and includes indels. We conducted a genome-wide association analysis of 34 studies imputed to the 1000 Genomes Project reference panel and including ∼120 000 participants of European ancestry (95 806 participants with data on the X-chromosome). Approximately 10.7 million single-nucleotide polymorphisms and 1.2 million indels were examined. We identified 41 genome-wide significant fibrinogen loci; of which, 18 were newly identified. There were no genome-wide significant signals on the X-chromosome. The lead variants of five significant loci were indels. We further identified six additional independent signals, including three rare variants, at two previously characterized loci: FGB and IRF1. Together the 41 loci explain 3% of the variance in plasma fibrinogen concentration.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: G1000143, G1001799, MC_PC_U127561128, MR/K026992/1; NCATS NIH HHS: UL1 TR000124; NHLBI NIH HHS: R01 HL059367; NIDDK NIH HHS: P30 DK063491

    Human molecular genetics 2016;25;2;358-70

  • CD4-Transgenic Zebrafish Reveal Tissue-Resident Th2- and Regulatory T Cell-like Populations and Diverse Mononuclear Phagocytes.

    Dee CT, Nagaraju RT, Athanasiadis EI, Gray C, Fernandez Del Ama L, Johnston SA, Secombes CJ, Cvejic A and Hurlstone AF

    Faculty of Life Sciences, The University of Manchester, Manchester M13 9PT, United Kingdom.

    CD4+ T cells are at the nexus of the innate and adaptive arms of the immune system. However, little is known about the evolutionary history of CD4+ T cells, and it is unclear whether their differentiation into specialized subsets is conserved in early vertebrates. In this study, we have created transgenic zebrafish with vibrantly labeled CD4+ cells allowing us to scrutinize the development and specialization of teleost CD4+ leukocytes in vivo. We provide further evidence that CD4+ macrophages have an ancient origin and had already emerged in bony fish. We demonstrate the utility of this zebrafish resource for interrogating the complex behavior of immune cells at cellular resolution by the imaging of intimate contacts between teleost CD4+ T cells and mononuclear phagocytes. Most importantly, we reveal the conserved subspecialization of teleost CD4+ T cells in vivo. We demonstrate that the ancient and specialized tissues of the gills contain a resident population of il-4/13b-expressing Th2-like cells, which do not coexpress il-4/13a. Additionally, we identify a contrasting population of regulatory T cell-like cells resident in the zebrafish gut mucosa, in marked similarity to that found in the intestine of mammals. Finally, we show that, as in mammals, zebrafish CD4+ T cells will infiltrate melanoma tumors and obtain a phenotype consistent with a type 2 immune microenvironment. We anticipate that this unique resource will prove invaluable for future investigation of T cell function in biomedical research, the development of vaccination and health management in aquaculture, and for further research into the evolution of adaptive immunity.

    Funded by: Cancer Research UK: A14953; European Research Council: 282059; Medical Research Council: MC_PC_12009, MR/J009156/1

    Journal of immunology (Baltimore, Md. : 1950) 2016;197;9;3520-3530

  • Discrete distributional differential expression (D3E)--a tool for gene expression analysis of single-cell RNA-seq data.

    Delmans M and Hemberg M

    Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK.

    Background: The advent of high throughput RNA-seq at the single-cell level has opened up new opportunities to elucidate the heterogeneity of gene expression. One of the most widespread applications of RNA-seq is to identify genes which are differentially expressed between two experimental conditions.

    Results: We present a discrete, distributional method for differential gene expression (D3E), a novel algorithm specifically designed for single-cell RNA-seq data. We use synthetic data to evaluate D3E, demonstrating that it can detect changes in expression, even when the mean level remains unchanged. Since D3E is based on an analytically tractable stochastic model, it provides additional biological insights by quantifying biologically meaningful properties, such as the average burst size and frequency. We use D3E to investigate experimental data, and with the help of the underlying model, we directly test hypotheses about the driving mechanism behind changes in gene expression.

    Conclusion: Evaluation using synthetic data shows that D3E performs better than other methods for identifying differentially expressed genes since it is designed to take full advantage of the information available from single-cell RNA-seq experiments. Moreover, the analytical model underlying D3E makes it possible to gain additional biological insights.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    BMC bioinformatics 2016;17;110

  • Somatic, positive and negative domains of the Center for Epidemiological Studies Depression (CES-D) scale: a meta-analysis of genome-wide association studies.

    Demirkan A, Lahti J, Direk N, Viktorin A, Lunetta KL, Terracciano A, Nalls MA, Tanaka T, Hek K, Fornage M, Wellmann J, Cornelis MC, Ollila HM, Yu L, Smith JA, Pilling LC, Isaacs A, Palotie A, Zhuang WV, Zonderman A, Faul JD, Sutin A, Meirelles O, Mulas A, Hofman A, Uitterlinden A, Rivadeneira F, Perola M, Zhao W, Salomaa V, Yaffe K, Luik AI, NABEC, Liu Y, Ding J, Lichtenstein P, Landén M, Widen E, Weir DR, Llewellyn DJ, Murray A, Kardia SL, Eriksson JG, Koenen K, Magnusson PK, Ferrucci L, Mosley TH, Cucca F, Oostra BA, Bennett DA, Paunio T, Berger K, Harris TB, Pedersen NL, Murabito JM, Tiemeier H, van Duijn CM and Räikkönen K

    Genetic Epidemiology Unit, Departments of Epidemiology and Clinical Genetics, Erasmus MC, Rotterdam, The Netherlands.

    Background: Major depressive disorder (MDD) is moderately heritable; however, genome-wide association studies (GWAS) for MDD, as well as for related continuous outcomes, have not shown consistent results. Attempts to elucidate the genetic basis of MDD may be hindered by heterogeneity in diagnosis. The Center for Epidemiological Studies Depression (CES-D) scale provides a widely used tool for measuring depressive symptoms clustered in four different domains, which can be combined into a total score but can also be analysed as separate symptom domains.

    Method: We performed a meta-analysis of GWAS of the CES-D symptom clusters. We recruited 12 cohorts with the 20- or 10-item CES-D scale (32 528 persons).

    Results: One single nucleotide polymorphism (SNP), rs713224, located near the brain-expressed melatonin receptor (MTNR1A) gene, was associated with the somatic complaints domain of depression symptoms, with borderline genome-wide significance (p_discovery = 3.82 × 10^-8). The SNP was analysed in an additional five cohorts comprising the replication sample (6813 persons). However, the association was not consistent among the replication sample (p_discovery+replication = 1.10 × 10^-6) with evidence of heterogeneity.

    Conclusions: Despite the effort to harmonize the phenotypes across cohorts and participants, our study is still underpowered to detect consistent association for depression, even by means of symptom classification. Moreover, the SNP-based heritability and co-heritability estimation results suggest that only a very minor part of the variation could be captured by GWAS, which explains the sparse findings.

    Psychological medicine 2016;1-11

  • Bacterial microbiota of the upper respiratory tract and childhood asthma.

    Depner M, Ege MJ, Cox MJ, Dwyer S, Walker AW, Birzele LT, Genuneit J, Horak E, Braun-Fahrländer C, Danielewicz H, Maier RM, Moffatt MF, Cookson WO, Heederik D, von Mutius E and Legatzki A

    Dr von Hauner Children's Hospital, LMU Munich, Munich, Germany.

    Background: Patients with asthma and healthy controls differ in bacterial colonization of the respiratory tract. The upper airways have been shown to reflect colonization of the lower airways, the actual site of inflammation in asthma, which is hardly accessible in population studies.

    Objective: We sought to characterize the bacterial communities at 2 sites of the upper respiratory tract obtained from children from a rural area and to relate these to asthma.

    Methods: The microbiota of 327 throat and 68 nasal samples from school-age farm and nonfarm children were analyzed by 454-pyrosequencing of the bacterial 16S ribosomal RNA gene.

    Results: Alterations in nasal microbiota but not of throat microbiota were associated with asthma. Children with asthma had lower α- and β-diversity of the nasal microbiota as compared with healthy control children. Furthermore, asthma presence was positively associated with a specific operational taxonomic unit from the genus Moraxella in children not exposed to farming, whereas in farm children Moraxella colonization was unrelated to asthma. In nonfarm children, Moraxella colonization explained the association between bacterial diversity and asthma to a large extent.

    Conclusions: Asthma was mainly associated with an altered nasal microbiota characterized by lower diversity and Moraxella abundance. Children living on farms might not be susceptible to the disadvantageous effect of Moraxella. Prospective studies may clarify whether Moraxella outgrowth is a cause or a consequence of loss in diversity.

    The Journal of allergy and clinical immunology 2016

  • Catalog of genetic progression of human cancers: breast cancer.

    Desmedt C, Yates L and Kulka J

    J.-C. Heuson Breast Cancer Translational Research Laboratory, Institut Jules Bordet, Université Libre de Bruxelles, Boulevard de Waterloo 121, 1000, Brussels, Belgium.

    With the rapid development of next-generation sequencing, deeper insights are being gained into the molecular evolution that underlies the development and clinical progression of breast cancer. It is apparent that during evolution, breast cancers acquire thousands of mutations including single base pair substitutions, insertions, deletions, copy number aberrations, and structural rearrangements. As a consequence, at the whole genome level, no two cancers are identical and few cancers even share the same complement of "driver" mutations. Indeed, two samples from the same cancer may also exhibit extensive differences due to constant remodeling of the genome over time. In this review, we summarize recent studies that extend our understanding of the genomic basis of cancer progression. Key biological insights include the following: subclonal diversification begins early in cancer evolution, being detectable even in in situ lesions; geographical stratification of subclonal structure is frequent in primary tumors and can include therapeutically targetable alterations; multiple distant metastases typically arise from a common metastatic ancestor following a "metastatic cascade" model; systemic therapy can unmask preexisting resistant subclones or influence further treatment sensitivity and disease progression. We conclude the review by describing novel approaches such as the analysis of circulating DNA and patient-derived xenografts that promise to further our understanding of the genomic changes occurring during cancer evolution and guide treatment decision making.

    Cancer metastasis reviews 2016;35;1;49-62

  • Genomic Characterization of Primary Invasive Lobular Breast Cancer.

    Desmedt C, Zoppoli G, Gundem G, Pruneri G, Larsimont D, Fornili M, Fumagalli D, Brown D, Rothé F, Vincent D, Kheddoumi N, Rouas G, Majjaj S, Brohée S, Van Loo P, Maisonneuve P, Salgado R, Van Brussel T, Lambrechts D, Bose R, Metzger O, Galant C, Bertucci F, Piccart-Gebhart M, Viale G, Biganzoli E, Campbell PJ and Sotiriou C

    Christine Desmedt, Gabriele Zoppoli, Denis Larsimont, Debora Fumagalli, David Brown, Françoise Rothé, Delphine Vincent, Naima Kheddoumi, Ghizlane Rouas, Samira Majjaj, Sylvain Brohée, Roberto Salgado, Martine Piccart-Gebhart, and Christos Sotiriou, Institut Jules Bordet; Christine Galant, Cliniques Universitaires Saint Luc, Brussels; Peter Van Loo, University of Leuven; Thomas Van Brussel and Diether Lambrechts, VIB Vesalius Research Center, Leuven, Belgium; Gabriele Zoppoli, University of Genoa and Istituto di Ricerca a Carattere Clinico-Scientifico San Martino-National Cancer Institute, Genoa; Giancarlo Pruneri, Patrick Maisonneuve, and Giuseppe Viale, European Institute of Oncology; Marco Fornili and Elia Biganzoli, University of Milan, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico Istituto Nazionale Tumori, Milan, Italy; Gunes Gundem and Peter J. Campbell, Wellcome Trust Sanger Institute, Cambridgeshire; Peter Van Loo, The Francis Crick Institute, London, United Kingdom; Ron Bose, Washington University School of Medicine, St Louis, MO; Otto Metzger, Dana-Farber Cancer Institute, Boston, MA; and François Bertucci, Institut Paoli-Calmettes, Marseille, France.

    Purpose: Invasive lobular breast cancer (ILBC) is the second most common histologic subtype after invasive ductal breast cancer (IDBC). Despite clinical and pathologic differences, ILBC is still treated as IDBC. We aimed to identify genomic alterations in ILBC with potential clinical implications.

    Methods: From an initial 630 ILBC primary tumors, we interrogated oncogenic substitutions and insertions and deletions of 360 cancer genes and genome-wide copy number aberrations in 413 and 170 ILBC samples, respectively, and correlated those findings with clinicopathologic and outcome features.

    Results: Besides the high mutation frequency of CDH1 in 65% of tumors, alterations in one of the three key genes of the phosphatidylinositol 3-kinase pathway, PIK3CA, PTEN, and AKT1, were present in more than one-half of the cases. HER2 and HER3 were mutated in 5.1% and 3.6% of the tumors, with most of these mutations having a proven role in activating the human epidermal growth factor receptor/ERBB pathway. Mutations in FOXA1 and ESR1 copy number gains were detected in 9% and 25% of the samples. All these alterations were more frequent in ILBC than in IDBC. The histologic diversity of ILBC was associated with specific alterations, such as enrichment for HER2 mutations in the mixed, nonclassic, and ESR1 gains in the solid subtype. Survival analyses revealed that chromosome 1q and 11p gains showed independent prognostic value in ILBC and that HER2 and AKT1 mutations were associated with increased risk of early relapse.

    Conclusion: This study demonstrates that we can now begin to individualize the treatment of ILBC, with HER2, HER3, and AKT1 mutations representing high-prevalence therapeutic targets and FOXA1 mutations and ESR1 gains deserving urgent dedicated clinical investigation, especially in the context of endocrine treatment.

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2016

  • Zygotes segregate entire parental genomes in distinct blastomere lineages causing cleavage-stage chimerism and mixoploidy.

    Destouni A, Zamani Esteki M, Catteeuw M, Tšuiko O, Dimitriadou E, Smits K, Kurg A, Salumets A, Van Soom A, Voet T and Vermeesch JR

    Laboratory of Cytogenetics and Genome Research, Center of Human Genetics, KU Leuven, Leuven, 3000, Belgium.

    Dramatic genome dynamics, such as chromosome instability, contribute to the remarkable genomic heterogeneity among the blastomeres comprising a single embryo during human preimplantation development. This heterogeneity, when compatible with life, manifests as constitutional mosaicism, chimerism, and mixoploidy in live-born individuals. Chimerism and mixoploidy are defined by the presence of cell lineages with different parental genomes or different ploidy states in a single individual, respectively. Our knowledge of their mechanistic origin results from indirect observations, often when the cell lineages have been subject to rigorous selective pressure during development. Here, we applied haplarithmisis to infer the haplotypes and the copy number of parental genomes in 116 single blastomeres comprising entire preimplantation bovine embryos (n = 23) following in vitro fertilization. We not only demonstrate that chromosome instability is conserved between bovine and human cleavage embryos, but we also discovered that zygotes can spontaneously segregate entire parental genomes into different cell lineages during the first post-zygotic cleavage division. Parental genome segregation was not exclusively triggered by abnormal fertilizations leading to triploid zygotes, but also normally fertilized zygotes can spontaneously segregate entire parental genomes into different cell lineages during cleavage of the zygote. We coin the term "heterogoneic division" to indicate the events leading to noncanonical zygotic cytokinesis, segregating the parental genomes into distinct cell lineages. Persistence of those cell lines during development is a likely cause of chimerism and mixoploidy in mammals.

    Genome research 2016;26;5;567-78

  • The role of folate transport in antifolate drug action in Trypanosoma brucei.

    Dewar S, Sienkiewicz N, Ong HB, Wall RJ, Horn D and Fairlamb AH

    University of Dundee, United Kingdom.

    The aim of this study was to identify and characterise mechanisms of resistance to antifolate drugs in African trypanosomes. Genome-wide RNAi library screens were undertaken in bloodstream form Trypanosoma brucei exposed to the antifolates methotrexate and raltitrexed. RNAi knockdown, in conjunction with drug susceptibility and folate transport studies, was used to validate the functions of the putative folate transporters. The transport kinetics of folate and methotrexate were further characterised in whole cells. RNA interference target sequencing (RIT-seq) experiments identified a tandem array of genes encoding a folate transporter family, TbFT1-3, as major contributors to antifolate drug uptake. RNAi knockdown of TbFT1-3 substantially reduced folate transport into trypanosomes and reduced the parasite's susceptibility to the classical antifolates methotrexate and raltitrexed. In contrast, knockdown of TbFT1-3 increased susceptibility to the non-classical antifolates pyrimethamine and nolatrexed. Both folate and methotrexate transport were inhibited by classical antifolates, but not by non-classical antifolates or biopterin. Thus, TbFT1-3 mediate the uptake of folate and classical antifolates in trypanosomes and TbFT1-3 loss-of-function is a mechanism of antifolate drug resistance.

    The Journal of biological chemistry 2016

  • Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intra-Host HIV Infection.

    Dialdestoro K, Sibbesen JA, Maretty L, Raghwani J, Gall A, Kellam P, Pybus OG, Hein J and Jenkins PA

    University of Oxford.

    Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput "deep" sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different timepoints during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intra-host viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this paper we develop a new method for inference using HIV deep sequencing data using an approach based on importance sampling of ancestral recombination graphs under a multi-locus coalescent model. The approach further extends recent progress in the approximation of so-called conditional sampling distributions, a quantity of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different timepoints and missing data without extra computational difficulty. We apply our method to a dataset of HIV-1, in which several hundred sequences were obtained from an infected individual at seven timepoints over two years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available.

    Genetics 2016

  • High-throughput discovery of novel developmental phenotypes.

    Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, Meehan TF, Weninger WJ, Westerberg H, Adissu H, Baker CN, Bower L, Brown JM, Caddle LB, Chiani F, Clary D, Cleak J, Daly MJ, Denegre JM, Doe B, Dolan ME, Edie SM, Fuchs H, Gailus-Durner V, Galli A, Gambadoro A, Gallegos J, Guo S, Horner NR, Hsu CW, Johnson SJ, Kalaga S, Keith LC, Lanoue L, Lawson TN, Lek M, Mark M, Marschall S, Mason J, McElwee ML, Newbigging S, Nutter LM, Peterson KA, Ramirez-Solis R, Rowland DJ, Ryder E, Samocha KE, Seavitt JR, Selloum M, Szoke-Kovacs Z, Tamura M, Trainor AG, Tudose I, Wakana S, Warren J, Wendling O, West DB, Wong L, Yoshiki A, International Mouse Phenotyping Consortium, Jackson Laboratory, Infrastructure Nationale PHENOMIN, Institut Clinique de la Souris (ICS), Charles River Laboratories, MRC Harwell, Toronto Centre for Phenogenomics, Wellcome Trust Sanger Institute, RIKEN BioResource Center, MacArthur DG, Tocchini-Valentini GP, Gao X, Flicek P, Bradley A, Skarnes WC, Justice MJ, Parkinson HE, Moore M, Wells S, Braun RE, Svenson KL, de Angelis MH, Herault Y, Mohun T, Mallon AM, Henkelman RM, Brown SD, Adams DJ, Lloyd KC, McKerlie C, Beaudet AL, Bućan M and Murray SA

    Department of Molecular Physiology and Biophysics, Houston, Texas 77030, USA.

    Approximately one-third of all mammalian genes are essential for life. Phenotypes resulting from knockouts of these genes in mice have provided tremendous insight into gene function and congenital disorders. As part of the International Mouse Phenotyping Consortium effort to generate and phenotypically characterize 5,000 knockout mouse lines, here we identify 410 lethal genes during the production of the first 1,751 unique gene knockouts. Using a standardized phenotyping platform that incorporates high-resolution 3D imaging, we identify phenotypes at multiple time points for previously uncharacterized genes and additional phenotypes for genes with previously reported mutant phenotypes. Unexpectedly, our analysis reveals that incomplete penetrance and variable expressivity are common even on a defined genetic background. In addition, we show that human disease genes are enriched for essential genes, thus providing a dataset that facilitates the prioritization and validation of mutations identified in clinical sequencing efforts.

    Funded by: Cancer Research UK: CRUK_13031; Medical Research Council: MRC_MC_U142684171, MRC_MC_U142684172; NCI NIH HHS: P30 CA034196, P30 CA093373; NEI NIH HHS: P30 EY002520; NHGRI NIH HHS: U54 HG006332, U54 HG006348, U54 HG006364, U54 HG006370, UM1 HG006370; NIDDK NIH HHS: U2C DK092993; NIH HHS: U42 OD011174, U42 OD011175, U42 OD011185, UM1 OD023221, UM1 OD023222; Wellcome Trust

    Nature 2016;537;7621;508-514

  • Perturbed hematopoietic stem and progenitor cell hierarchy in myelodysplastic syndromes patients with monosomy 7 as the sole cytogenetic abnormality.

    Dimitriou M, Woll PS, Mortera-Blanco T, Karimi M, Wedge DC, Doolittle H, Douagi I, Papaemmanuil E, Jacobsen SE and Hellström-Lindberg E

    Center for Hematology and Regenerative Medicine, Karolinska Institutet, Department of Medicine, Karolinska University Hospital Huddinge, Stockholm, Sweden.

    The stem and progenitor cell compartments in low- and intermediate-risk myelodysplastic syndromes (MDS) have recently been described, and shown to be highly conserved when compared to those in acute myeloid leukemia (AML). Much less is known about the characteristics of the hematopoietic hierarchy of subgroups of MDS with a high risk of transforming to AML. Immunophenotypic analysis of immature stem and progenitor cell compartments from patients with an isolated loss of the entire chromosome 7 (isolated -7), an independent high-risk genetic event in MDS, showed expansion and dominance of the malignant -7 clone in the granulocyte and macrophage progenitors (GMP), and other CD45RA+ progenitor compartments, and a significant reduction of the LIN-CD34+CD38low/-CD90+CD45RA- hematopoietic stem cell (HSC) compartment, highly reminiscent of what is typically seen in AML, and distinct from low-risk MDS. Established functional in vitro and in vivo stem cell assays showed a poor readout for -7 MDS patients irrespective of marrow blast counts. Moreover, while the -7 clone dominated at all stages of GM differentiation, the -7 clone had a competitive disadvantage in erythroid differentiation. In azacitidine-treated -7 MDS patients with a clinical response, the decreased clonal involvement in mononuclear bone marrow cells was not accompanied by a parallel reduced clonal involvement in the dominant CD45RA+ progenitor populations, suggesting a selective azacitidine-resistance of these distinct -7 progenitor compartments. Our data demonstrate, in a subgroup of high risk MDS with monosomy 7, that the perturbed stem and progenitor cell compartments resemble more that of AML than low-risk MDS.

    Oncotarget 2016

  • Pitfalls in genetic testing: the story of missed SCN1A mutations.

    Djémié T, Weckhuysen S, von Spiczak S, Carvill GL, Jaehn J, Anttonen AK, Brilstra E, Caglayan HS, de Kovel CG, Depienne C, Gaily E, Gennaro E, Giraldez BG, Gormley P, Guerrero-López R, Guerrini R, Hämäläinen E, Hartmann C, Hernandez-Hernandez L, Hjalgrim H, Koeleman BP, Leguern E, Lehesjoki AE, Lemke JR, Leu C, Marini C, McMahon JM, Mei D, Møller RS, Muhle H, Myers CT, Nava C, Serratosa JM, Sisodiya SM, Stephani U, Striano P, van Kempen MJ, Verbeek NE, Usluer S, Zara F, Palotie A, Mefford HC, Scheffer IE, De Jonghe P, Helbig I, Suls A and EuroEPINOMICS‐RES Dravet working group

    Neurogenetics group, Department of Molecular Genetics, VIB, Antwerp, Belgium; Laboratory of Neurogenetics, Institute Born-Bunge, University of Antwerp, Antwerp, Belgium.

    Background: Sanger sequencing, still the standard technique for genetic testing in most diagnostic laboratories and until recently widely used in research, is gradually being complemented by next-generation sequencing (NGS). No single mutation detection technique is however perfect in identifying all mutations. Therefore, we wondered to what extent inconsistencies between Sanger sequencing and NGS affect the molecular diagnosis of patients. Since mutations in SCN1A, the major gene implicated in epilepsy, are found in the majority of Dravet syndrome (DS) patients, we focused on missed SCN1A mutations.

    Methods: We sent out a survey to 16 genetic centers performing SCN1A testing.

    Results: We collected data on 28 mutations initially missed using Sanger sequencing. All patients were falsely reported as SCN1A mutation-negative, both due to technical limitations and human errors.

    Conclusion: We illustrate the pitfalls of Sanger sequencing and most importantly provide evidence that SCN1A mutations are an even more frequent cause of DS than already anticipated.

    Molecular genetics & genomic medicine 2016;4;4;457-64

  • Identification, Validation, and Application of Molecular Diagnostics for Insecticide Resistance in Malaria Vectors.

    Donnelly MJ, Isaacs AT and Weetman D

    Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK; Malaria Programme, Wellcome Trust Sanger Institute, Cambridge, UK.

    Insecticide resistance is a major obstacle to control of Anopheles malaria mosquitoes in sub-Saharan Africa and requires an improved understanding of the underlying mechanisms. Efforts to discover resistance genes and DNA markers have been dominated by candidate gene and quantitative trait locus studies of laboratory strains, but with greater availability of genome sequences a shift toward field-based agnostic discovery is anticipated. Mechanisms evolve continually to produce elevated resistance yielding multiplicative diagnostic markers, co-screening of which can give high predictive value. With a shift toward prospective analyses, identification and screening of resistance marker panels will boost monitoring and programmatic decision making.

    Trends in parasitology 2016;32;3;197-206

  • Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations.

    Doran AG, Wong K, Flint J, Adams DJ, Hunter KW and Keane TM

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.

    Background: The Mouse Genomes Project is an ongoing collaborative effort to sequence the genomes of the common laboratory mouse strains. In 2011, the initial analysis of sequence variation across 17 strains found 56.7 M unique single nucleotide polymorphisms (SNPs) and 8.8 M indels. We carry out deep sequencing of 13 additional inbred strains (BUB/BnJ, C57BL/10J, C57BR/cdJ, C58/J, DBA/1J, I/LnJ, KK/HiJ, MOLF/EiJ, NZB/B1NJ, NZW/LacJ, RF/J, SEA/GnJ and ST/bJ), cataloguing molecular variation within and across the strains. These strains include important models for immune response, leukaemia, age-related hearing loss and rheumatoid arthritis. We now have several examples of fully sequenced closely related strains that are divergent for several disease phenotypes.

    Results: Approximately 27.4 M unique SNPs and 5 M indels are identified across these strains compared to the C57BL/6J reference genome (GRCm38). The amount of variation found in the inbred laboratory mouse genome has increased to 71 M SNPs and 12 M indels. We investigate the genetic basis of highly penetrant cancer susceptibility in RF/J finding private novel missense mutations in DNA damage repair and highly cancer associated genes. We use two highly related strains (DBA/1J and DBA/2J) to investigate the genetic basis of collagen-induced arthritis susceptibility.

    Conclusions: This paper significantly expands the catalogue of fully sequenced laboratory mouse strains and now contains several examples of highly genetically similar strains with divergent phenotypes. We show how studying private missense mutations can lead to insights into the genetic mechanism for a highly penetrant phenotype.

    Funded by: Cancer Research UK: CRUK_13031; Medical Research Council: MRC_MR/L007428/1

    Genome biology 2016;17;1;167

  • DNAH11 Localization in the Proximal Region of Respiratory Cilia Defines Distinct Outer Dynein Arm Complexes.

    Dougherty GW, Loges NT, Klinkenbusch JA, Olbrich H, Pennekamp P, Menchen T, Raidt J, Wallmeier J, Werner C, Westermann C, Ruckert C, Mirra V, Hjeij R, Memari Y, Durbin R, Kolb-Kokocinski A, Praveen K, Kashef MA, Kashef S, Eghtedari F, Häffner K, Valmari P, Baktai G, Aviram M, Bentur L, Amirav I, Davis EE, Katsanis N, Brueckner M, Shaposhnykov A, Pigino G, Dworniczak B and Omran H

    Department of General Pediatrics and.

    Primary ciliary dyskinesia (PCD) is a recessively inherited disease that leads to chronic respiratory disorders owing to impaired mucociliary clearance. Conventional transmission electron microscopy (TEM) is a diagnostic standard to identify ultrastructural defects in respiratory cilia but is not useful in approximately 30% of PCD cases, which have normal ciliary ultrastructure. DNAH11 mutations are a common cause of PCD with normal ciliary ultrastructure and hyperkinetic ciliary beating, but its pathophysiology remains poorly understood. We therefore characterized DNAH11 in human respiratory cilia by immunofluorescence microscopy (IFM) in the context of PCD. We used whole-exome and targeted next-generation sequence analysis as well as Sanger sequencing to identify and confirm eight novel loss-of-function DNAH11 mutations. We designed and validated a monoclonal antibody specific to DNAH11 and performed high-resolution IFM of both control and PCD-affected human respiratory cells, as well as samples from green fluorescent protein (GFP)-left-right dynein mice, to determine the ciliary localization of DNAH11. IFM analysis demonstrated native DNAH11 localization in only the proximal region of wild-type human respiratory cilia and loss of DNAH11 in individuals with PCD with certain loss-of-function DNAH11 mutations. GFP-left-right dynein mice confirmed proximal DNAH11 localization in tracheal cilia. DNAH11 retained proximal localization in respiratory cilia of individuals with PCD with distinct ultrastructural defects, such as the absence of outer dynein arms (ODAs). TEM tomography detected a partial reduction of ODAs in DNAH11-deficient cilia. DNAH11 mutations result in a subtle ODA defect in only the proximal region of respiratory cilia, which is detectable by IFM and TEM tomography.

    Funded by: NCATS NIH HHS: UL1 TR001863; NHLBI NIH HHS: R01 HL093280; NIDDK NIH HHS: R01 DK072301

    American journal of respiratory cell and molecular biology 2016;55;2;213-24

  • RESEARCH ETHICS. Ethics review for international data-intensive research.

    Dove ES, Townend D, Meslin EM, Bobrow M, Littler K, Nicol D, de Vries J, Junker A, Garattini C, Bovenberg J, Shabani M, Lévesque E and Knoppers BM

    J. Kenyon Mason Institute for Medicine, Life Sciences and the Law, School of Law, University of Edinburgh, UK.

    Funded by: Wellcome Trust: 099313, 103360

    Science (New York, N.Y.) 2016;351;6280;1399-400

  • Identification of a germline F692L drug resistance variant in cis with Flt3-ITD in knock-in mice.

    Dovey OM, Chen B, Mupo A, Friedrich M, Grove CS, Cooper JL, Lee B, Varela I, Huang Y and Vassiliou GS

    The Wellcome Trust Sanger Institute.

    Haematologica 2016

  • Phylogenetic Analysis of Invasive Serotype 1 Pneumococcus in South Africa, 1989-2013.

    du Plessis M, Allam M, Tempia S, Wolter N, de Gouveia L, Mollendorf CV, Jolley KA, Mbelle N, Wadula J, Cornick JE, Everett DB, McGee L, Breiman RF, Gladstone RA, Bentley SD, Klugman KP and von Gottberg A

    Centre for Respiratory Diseases and Meningitis, National Institute for Communicable Diseases, National Health Laboratory Service, Johannesburg, South Africa; School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.

    Background: Serotype 1 is an important cause of invasive pneumococcal disease in South Africa and has declined following introduction of the 13-valent pneumococcal conjugate vaccine in 2011.

    Methods: We genetically characterized 912 invasive serotype 1 isolates from 1989-2013. Simpson's diversity index and recombination ratios were calculated. Factors associated with sequence types (ST) were assessed.

    Results: Clonal complex 217 represented 96% (872/912) of sampled isolates. Post PCV13, ST diversity increased in children <5 years (0.39 to 0.63, p=0.002) and individuals >14 years (0.35 to 0.54, p<0.001): ST-217 declined proportionately in children <5 years [153/203 (75%) vs. 21/37 (57%), p=0.027], and individuals >14 years [242/305 (79%) vs. 96/148 (65%), p=0.001], whereas ST-9067 increased [4/684 (0.6%) vs. 24/228 (11%), p<0.001]. Three sub-clades were identified within ST-217: ST-217C1 (353/382, 92%), ST-217C2 (15/382, 4%) and ST-217C3 (14/382, 4%). ST-217C2, ST-217C3 and single-locus variant (SLV) ST-8314 (20/912, 2%) were associated with non-susceptibility to chloramphenicol, tetracycline and co-trimoxazole. ST-8314 (20/912, 2%) was also associated with increased non-susceptibility to penicillin (p<0.001). ST-217C3 and newly reported ST-9067 had higher recombination ratios compared to ST-217C1 (4.344 vs. 0.091, p<0.001 and 0.086 vs. 0.013, p<0.001, respectively).

    Conclusions: Increases in genetic diversity were noted post PCV13, and lineages associated with antimicrobial non-susceptibility were identified.

    Journal of clinical microbiology 2016
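
    The Methods of the entry above mention Simpson's diversity index without giving a formula. As a minimal sketch, assuming the unbiased form often used in bacterial typing, D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)), where n_i is the number of isolates of sequence type i and N is the total number of isolates, the calculation might look like this. The function name and the example labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def simpson_diversity(sequence_types):
    """Simpson's index of diversity for a list of type labels.

    Assumed form (not confirmed by the abstract):
    D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1))
    """
    counts = Counter(sequence_types)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (total * (total - 1))


# Hypothetical sample: labels chosen only for illustration.
sample = ["ST-217"] * 3 + ["ST-9067"] * 1
print(round(simpson_diversity(sample), 3))
```

    Under this form, a value near 0 means one sequence type dominates and a value near 1 means most isolates carry distinct types, so the reported rise (for example 0.39 to 0.63 in children <5 years) corresponds to a more even mix of sequence types.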

  • Bacterial pathogenesis: Getting all tangled up.

    Du Toit A

    Nature reviews. Microbiology 2016

  • Wheat bran promotes enrichment within the human colonic microbiota of butyrate-producing bacteria that release ferulic acid.

    Duncan SH, Russell WR, Quartieri A, Rossi M, Parkhill J, Walker AW and Flint HJ

    Rowett Institute of Nutrition and Health, University of Aberdeen, Aberdeen, UK.

    Cereal fibres such as wheat bran are considered to offer human health benefits via their impact on the intestinal microbiota. We show here by 16S rRNA gene-based community analysis that providing amylase-pretreated wheat bran as the sole added energy source to human intestinal microbial communities in anaerobic fermentors leads to the selective and progressive enrichment of a small number of bacterial species. In particular, OTUs corresponding to uncultured Lachnospiraceae (Firmicutes) related to Eubacterium xylanophilum and Butyrivibrio spp. were strongly enriched (by five to 160 fold) over 48 h in four independent experiments performed with different faecal inocula, while nine other Firmicutes OTUs showed > 5-fold enrichment in at least one experiment. Ferulic acid was released from the wheat bran during degradation but was rapidly converted to phenylpropionic acid derivatives via hydrogenation, demethylation and dehydroxylation to give metabolites that are detected in human faecal samples. Pure culture work using bacterial isolates related to the enriched OTUs, including several butyrate-producers, demonstrated that the strains caused substrate weight loss and released ferulic acid, but with limited further conversion. We conclude that breakdown of wheat bran involves specialist primary degraders while the conversion of released ferulic acid is likely to involve a multi-species pathway.

    Environmental microbiology 2016;18;7;2214-25

  • Consent Codes: Upholding Standard Data Use Conditions.

    Dyke SO, Philippakis AA, Rambla De Argila J, Paltoo DN, Luetkemeier ES, Knoppers BM, Brookes AJ, Spalding JD, Thompson M, Roos M, Boycott KM, Brudno M, Hurles M, Rehm HL, Matern A, Fiume M and Sherry ST

    Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada.

    A systematic way of recording data use conditions that are based on consent permissions as found in the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).

    Funded by: Canadian Institutes of Health Research: EP1-120608, EP2-120609

    PLoS genetics 2016;12;1;e1005772

  • Alternative Splice Forms Influence Functions of Whirlin in Mechanosensory Hair Cell Stereocilia.

    Ebrahim S, Ingham NJ, Lewis MA, Rogers MJ, Cui R, Kachar B, Pass JC and Steel KP

    Wolfson Centre for Age-Related Diseases, King's College London, Guy's Campus, London SE1 1UL, UK.

    WHRN (DFNB31) mutations cause diverse hearing disorders: profound deafness (DFNB31) or variable hearing loss in Usher syndrome type II. The known role of WHRN in stereocilia elongation does not explain these different pathophysiologies. Using spontaneous and targeted Whrn mutants, we show that the major long (WHRN-L) and short (WHRN-S) isoforms of WHRN have distinct localizations within stereocilia and also across hair cell types. Lack of both isoforms causes abnormally short stereocilia and profound deafness and vestibular dysfunction. WHRN-S expression, however, is sufficient to maintain stereocilia bundle morphology and function in a subset of hair cells, resulting in some auditory response and no overt vestibular dysfunction. WHRN-S interacts with EPS8, and both are required at stereocilia tips for normal length regulation. WHRN-L localizes midway along the shorter stereocilia, at the level of inter-stereociliary links. We propose that differential isoform expression underlies the variable auditory and vestibular phenotypes associated with WHRN mutations.

    Cell reports 2016;15;5;935-43

  • MERVL/Zscan4 Network Activation Results in Transient Genome-wide DNA Demethylation of mESCs.

    Eckersley-Maslin MA, Svensson V, Krueger C, Stubbs TM, Giehr P, Krueger F, Miragaia RJ, Kyriakopoulos C, Berrens RV, Milagre I, Walter J, Teichmann SA and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, UK.

    Mouse embryonic stem cells are dynamic and heterogeneous. For example, rare cells cycle through a state characterized by decondensed chromatin and expression of transcripts, including the Zscan4 cluster and MERVL endogenous retrovirus, which are usually restricted to preimplantation embryos. Here, we further characterize the dynamics and consequences of this transient cell state. Single-cell transcriptomics identified the earliest upregulated transcripts as cells enter the MERVL/Zscan4 state. The MERVL/Zscan4 transcriptional network was also upregulated during induced pluripotent stem cell reprogramming. Genome-wide DNA methylation and chromatin analyses revealed global DNA hypomethylation accompanying increased chromatin accessibility. This transient DNA demethylation was driven by a loss of DNA methyltransferase proteins in the cells and occurred genome-wide. While methylation levels were restored once cells exit this state, genomic imprints remained hypomethylated, demonstrating a potential global and enduring influence of endogenous retroviral activation on the epigenome.

    Cell reports 2016;17;1;179-92

  • The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals.

    Ehret GB, Ferreira T, Chasman DI, Jackson AU, Schmidt EM, Johnson T, Thorleifsson G, Luan J, Donnelly LA, Kanoni S, Petersen AK, Pihur V, Strawbridge RJ, Shungin D, Hughes MF, Meirelles O, Kaakinen M, Bouatia-Naji N, Kristiansson K, Shah S, Kleber ME, Guo X, Lyytikäinen LP, Fava C, Eriksson N, Nolte IM, Magnusson PK, Salfati EL, Rallidis LS, Theusch E, Smith AJ, Folkersen L, Witkowska K, Pers TH, Joehanes R, Kim SK, Lataniotis L, Jansen R, Johnson AD, Warren H, Kim YJ, Zhao W, Wu Y, Tayo BO, Bochud M, CHARGE-EchoGen Consortium, CHARGE-HF Consortium, Wellcome Trust Case Control Consortium, Absher D, Adair LS, Amin N, Arking DE, Axelsson T, Baldassarre D, Balkau B, Bandinelli S, Barnes MR, Barroso I, Bevan S, Bis JC, Bjornsdottir G, Boehnke M, Boerwinkle E, Bonnycastle LL, Boomsma DI, Bornstein SR, Brown MJ, Burnier M, Cabrera CP, Chambers JC, Chang IS, Cheng CY, Chines PS, Chung RH, Collins FS, Connell JM, Döring A, Dallongeville J, Danesh J, de Faire U, Delgado G, Dominiczak AF, Doney AS, Drenos F, Edkins S, Eicher JD, Elosua R, Enroth S, Erdmann J, Eriksson P, Esko T, Evangelou E, Evans A, Fall T, Farrall M, Felix JF, Ferrières J, Ferrucci L, Fornage M, Forrester T, Franceschini N, Franco OH, Franco-Cereceda A, Fraser RM, Ganesh SK, Gao H, Gertow K, Gianfagna F, Gigante B, Giulianini F, Goel A, Goodall AH, Goodarzi MO, Gorski M, Gräßler J, Groves CJ, Gudnason V, Gyllensten U, Hallmans G, Hartikainen AL, Hassinen M, Havulinna AS, Hayward C, Hercberg S, Herzig KH, Hicks AA, Hingorani AD, Hirschhorn JN, Hofman A, Holmen J, Holmen OL, Hottenga JJ, Howard P, Hsiung CA, Hunt SC, Ikram MA, Illig T, Iribarren C, Jensen RA, Kähönen M, Kang HM, Kathiresan S, Keating BJ, Khaw KT, Kim YK, Kim E, Kivimaki M, Klopp N, Kolovou G, Komulainen P, Kooner JS, Kosova G, Krauss RM, Kuh D, Kutalik Z, Kuusisto J, Kvaløy K, Lakka TA, Lee NR, Lee IT, Lee WJ, Levy D, Li X, Liang KW, Lin H, Lin L, Lindström J, Lobbens S, Männistö S, Müller G, Müller-Nurasyid M, Mach F, Markus HS, Marouli E, McCarthy MI, McKenzie CA, Meneton P, Menni C, Metspalu A, Mijatovic V, Moilanen L, Montasser ME, Morris AD, Morrison AC, Mulas A, Nagaraja R, Narisu N, Nikus K, O'Donnell CJ, O'Reilly PF, Ong KK, Paccaud F, Palmer CD, Parsa A, Pedersen NL, Penninx BW, Perola M, Peters A, Poulter N, Pramstaller PP, Psaty BM, Quertermous T, Rao DC, Rasheed A, Rayner NW, Renström F, Rettig R, Rice KM, Roberts R, Rose LM, Rossouw J, Samani NJ, Sanna S, Saramies J, Schunkert H, Sebert S, Sheu WH, Shin YA, Sim X, Smit JH, Smith AV, Sosa MX, Spector TD, Stančáková A, Stanton AV, Stirrups KE, Stringham HM, Sundstrom J, Swift AJ, Syvänen AC, Tai ES, Tanaka T, Tarasov KV, Teumer A, Thorsteinsdottir U, Tobin MD, Tremoli E, Uitterlinden AG, Uusitupa M, Vaez A, Vaidya D, van Duijn CM, van Iperen EP, Vasan RS, Verwoert GC, Virtamo J, Vitart V, Voight BF, Vollenweider P, Wagner A, Wain LV, Wareham NJ, Watkins H, Weder AB, Westra HJ, Wilks R, Wilsgaard T, Wilson JF, Wong TY, Yang TP, Yao J, Yengo L, Zhang W, Zhao JH, Zhu X, Bovet P, Cooper RS, Mohlke KL, Saleheen D, Lee JY, Elliott P, Gierman HJ, Willer CJ, Franke L, Hovingh GK, Taylor KD, Dedoussis G, Sever P, Wong A, Lind L, Assimes TL, Njølstad I, Schwarz PE, Langenberg C, Snieder H, Caulfield MJ, Melander O, Laakso M, Saltevo J, Rauramaa R, Tuomilehto J, Ingelsson E, Lehtimäki T, Hveem K, Palmas W, März W, Kumari M, Salomaa V, Chen YD, Rotter JI, Froguel P, Jarvelin MR, Lakatta EG, Kuulasmaa K, Franks PW, Hamsten A, Wichmann HE, Palmer CN, Stefansson K, Ridker PM, Loos RJ, Chakravarti A, Deloukas P, Morris AP, Newton-Cheh C and Munroe PB

    Center for Complex Disease Genomics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.

    To dissect the genetic architecture of blood pressure and assess effects on target organ damage, we analyzed 128,272 SNPs from targeted and genome-wide arrays in 201,529 individuals of European ancestry, and genotypes from an additional 140,886 individuals were used for validation. We identified 66 blood pressure-associated loci, of which 17 were new; 15 harbored multiple distinct association signals. The 66 index SNPs were enriched for cis-regulatory elements, particularly in vascular endothelial cells, consistent with a primary role in blood pressure control through modulation of vascular tone across multiple tissues. The 66 index SNPs combined in a risk score showed comparable effects in 64,421 individuals of non-European descent. The 66-SNP blood pressure risk score was significantly associated with target organ damage in multiple tissues but with minor effects in the kidney. Our findings expand current knowledge of blood pressure-related pathways and highlight tissues beyond the classical renal system in blood pressure regulation.

    Nature genetics 2016;48;10;1171-84
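
    The abstract above refers to a 66-SNP blood pressure risk score but does not say how it was built. A common construction, assumed here purely for illustration, is a weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP) using per-SNP effect sizes as weights. The function name, the dosages and the weights below are hypothetical and are not values from the study.

```python
def genetic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages.

    dosages: copies of the risk allele per SNP (0, 1 or 2; imputed
             fractional values are also accepted).
    effect_sizes: per-SNP weights, e.g. mmHg per risk allele (assumed).
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have equal length")
    for d in dosages:
        if not 0 <= d <= 2:
            raise ValueError("each dosage must lie between 0 and 2")
    return sum(d * w for d, w in zip(dosages, effect_sizes))


# Hypothetical individual genotyped at three SNPs with made-up weights.
print(genetic_risk_score([0, 1, 2], [0.5, 0.25, 0.125]))
```

    An unweighted variant would simply count risk alleles; the weighted form is shown because it lets SNPs with larger estimated effects contribute more to the score.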

  • Community dynamics and the lower airway microbiota in stable chronic obstructive pulmonary disease, smokers and healthy non-smokers.

    Einarsson GG, Comer DM, McIlreavey L, Parkhill J, Ennis M, Tunney MM and Elborn JS

    Halo, Queen's University Belfast, Belfast, UK; Centre for Infection and Immunity, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK.

    Rationale: The role bacteria play in the progression of COPD has increasingly been highlighted in recent years. However, the microbial community complexity in the lower airways of patients with COPD is poorly characterised.

    Objectives: To compare the lower airway microbiota in patients with COPD, smokers and non-smokers.

    Methods: Bronchial wash samples from adults with COPD (n=18), smokers with no airways disease (n=8) and healthy individuals (n=11) were analysed by extended-culture and culture-independent Illumina MiSeq sequencing. We determined aerobic and anaerobic microbiota load and evaluated differences in bacteria associated with the three cohorts. Culture-independent analysis was used to determine differences in microbiota between comparison groups including taxonomic richness, diversity, relative abundance, 'core' microbiota and co-occurrence.

    Measurement and main results: Extended-culture showed no difference in total load of aerobic and anaerobic bacteria between the three cohorts. Culture-independent analysis revealed that the prevalence of members of Pseudomonas spp. was greater in the lower airways of patients with COPD; however, the majority of the sequence reads for this taxon were attributed to three patients. Furthermore, members of Bacteroidetes, such as Prevotella spp., were observed to be greater in the 'healthy' comparison groups. Community diversity (α and β) was significantly less in COPD compared with healthy groups. Co-occurrence of bacterial taxa and the observation of a putative 'core' community within the lower airways were also observed.

    Conclusions: Microbial community composition in the lower airways of patients with COPD is significantly different to that found in smokers and non-smokers, indicating that a component of the disease is associated with changes in microbiological status.

    Thorax 2016;71;9;795-803

  • Involvement of astrocyte and oligodendrocyte gene sets in migraine.

    Eising E, de Leeuw C, Min JL, Anttila V, Verheijen MH, Terwindt GM, Dichgans M, Freilinger T, Kubisch C, International Headache Genetics Consortium, Ferrari MD, Smit AB, de Vries B, Palotie A, van den Maagdenberg AM and Posthuma D

    Department of Human Genetics, Leiden University Medical Centre, The Netherlands.

    Background: Migraine is a common episodic brain disorder characterized by recurrent attacks of severe unilateral headache and additional neurological symptoms. Two main migraine types can be distinguished based on the presence of aura symptoms that can accompany the headache: migraine with aura and migraine without aura. Multiple genetic and environmental factors confer disease susceptibility. Recent genome-wide association studies (GWAS) indicate that migraine susceptibility genes are involved in various pathways, including neurotransmission, which have already been implicated in genetic studies of monogenic familial hemiplegic migraine, a subtype of migraine with aura.

    Methods: To further explore the genetic background of migraine, we performed a gene set analysis of migraine GWAS data of 4954 clinic-based patients with migraine, as well as 13,390 controls. Curated sets of synaptic genes and sets of genes predominantly expressed in three glial cell types (astrocytes, microglia and oligodendrocytes) were investigated.

    Discussion: Our results show that gene sets containing astrocyte- and oligodendrocyte-related genes are associated with migraine, which is especially true for gene sets involved in protein modification and signal transduction. Observed differences between migraine with aura and migraine without aura indicate that both migraine types, at least in part, seem to have a different genetic background.

    Cephalalgia : an international journal of headache 2016;36;7;640-7

  • The role of hepatocyte nuclear factor 1β in disease and development.

    El-Khairi R and Vallier L

    Wellcome Trust-Medical Research Council Stem Cell Institute, Anne McLaren Laboratory, Department of Surgery, University of Cambridge, Cambridge, UK.

    Heterozygous mutations in the gene that encodes the transcription factor hepatocyte nuclear factor 1β (HNF1B) result in a multi-system disorder. HNF1B was initially discovered as a monogenic diabetes gene; however, renal cysts are the most frequently detected feature. Other clinical features include pancreatic hypoplasia and exocrine insufficiency, genital tract malformations, abnormal liver function, cholestasis and early-onset gout. Heterozygous mutations and complete gene deletions in HNF1B each account for approximately 50% of all cases of HNF1B-associated disease and may show autosomal dominant inheritance or arise spontaneously. There is no clear genotype-phenotype correlation, indicating that haploinsufficiency is the main disease mechanism. Data from animal models suggest that HNF1B is essential for several stages of pancreas and liver development. However, mice with heterozygous mutations in HNF1B show no phenotype, in contrast to the phenotype seen in humans. This suggests that mouse models do not fully replicate the features of human disease and complementary studies in human systems are necessary to determine the molecular mechanisms underlying HNF1B-associated disease. This review discusses the role of HNF1B in human and murine pancreas and liver development, summarizes the disease phenotypes and identifies areas for future investigations in HNF1B-associated diabetes and liver disease.

    Diabetes, obesity & metabolism 2016;18 Suppl 1;23-32

  • Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci.

    Ellinghaus D, Jostins L, Spain SL, Cortes A, Bethune J, Han B, Park YR, Raychaudhuri S, Pouget JG, Hübenthal M, Folseraas T, Wang Y, Esko T, Metspalu A, Westra HJ, Franke L, Pers TH, Weersma RK, Collij V, D'Amato M, Halfvarson J, Jensen AB, Lieb W, Degenhardt F, Forstner AJ, Hofmann A, International IBD Genetics Consortium (IIBDGC), International Genetics of Ankylosing Spondylitis Consortium (IGAS), International PSC Study Group (IPSCSG), Genetic Analysis of Psoriasis Consortium (GAPC), Psoriasis Association Genetics Extension (PAGE), Schreiber S, Mrowietz U, Juran BD, Lazaridis KN, Brunak S, Dale AM, Trembath RC, Weidinger S, Weichenthal M, Ellinghaus E, Elder JT, Barker JN, Andreassen OA, McGovern DP, Karlsen TH, Barrett JC, Parkes M, Brown MA and Franke A

    Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany.

    We simultaneously investigated the genetic landscape of ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis and ulcerative colitis to investigate pleiotropy and the relationship between these clinically related diseases. Using high-density genotype data from more than 86,000 individuals of European ancestry, we identified 244 independent multidisease signals, including 27 new genome-wide significant susceptibility loci and 3 unreported shared risk loci. Complex pleiotropy was supported when contrasting multidisease signals with expression data sets from human, rat and mouse together with epigenetic and expressed enhancer profiles. The comorbidities among the five immune diseases were best explained by biological pleiotropy rather than heterogeneity (a subgroup of cases genetically identical to those with another disease, possibly owing to diagnostic misclassification, molecular subtypes or excessive comorbidity). In particular, the strong comorbidity between primary sclerosing cholangitis and inflammatory bowel disease is likely the result of a unique disease, which is genetically distinct from classical inflammatory bowel disease phenotypes.

    Nature genetics 2016

  • Beegle: from literature mining to disease-gene discovery.

    ElShal S, Tranchevent LC, Sifrim A, Ardeshirdavani A, Davis J and Moreau Y

    Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, Leuven 3001, Belgium; iMinds Future Health Department, KU Leuven, Leuven 3001, Belgium.

    Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at

    Nucleic acids research 2016;44;2;e18

  • Generation and Characterisation of a Pax8-CreERT2 Transgenic Line and a Slc22a6-CreERT2 Knock-In Line for Inducible and Specific Genetic Manipulation of Renal Tubular Epithelial Cells.

    Espana-Agusti J, Zou X, Wong K, Fu B, Yang F, Tuveson DA, Adams DJ and Matakidou A

    Department of Oncology, University of Cambridge, CRUK Cambridge Institute, Cambridge, United Kingdom.

    Genetically relevant mouse models need to recapitulate the hallmarks of human disease by permitting spatiotemporal gene targeting. This is especially important for replicating the biology of complex diseases like cancer, where genetic events occur in a sporadic fashion within developed somatic tissues. Though a number of renal tubule targeting mouse lines have been developed, their utility for the study of renal disease is limited by lack of inducibility and specificity. In this study we describe the generation and characterisation of two novel mouse lines directing CreERT2 expression to renal tubular epithelia. The Pax8-CreERT2 transgenic line uses the mouse Pax8 promoter to direct expression of CreERT2 to all renal tubular compartments (proximal and distal tubules as well as collecting ducts) whilst the Slc22a6-CreERT2 knock-in line utilises the endogenous mouse Slc22a6 locus to specifically target the epithelium of proximal renal tubules. Both lines show high organ and tissue specificity with no extrarenal activity detected. To establish the utility of these lines for the study of renal cancer biology, Pax8-CreERT2 and Slc22a6-CreERT2 mice were crossed to conditional Vhl knockout mice to induce long-term renal tubule specific Vhl deletion. These models exhibited renal specific activation of the hypoxia inducible factor pathway (a VHL target). Our results establish Pax8-CreERT2 and Slc22a6-CreERT2 mice as valuable tools for the investigation and modelling of complex renal biology and disease.

    PloS one 2016;11;2;e0148055

  • Genomic variations leading to alterations in cell morphology of Campylobacter spp.

    Esson D, Mather AE, Scanlan E, Gupta S, de Vries SP, Bailey D, Harris SR, McKinley TJ, Méric G, Berry SK, Mastroeni P, Sheppard SK, Christie G, Thomson NR, Parkhill J, Maskell DJ and Grant AJ

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, UK.

    Campylobacter jejuni, the most common cause of bacterial diarrhoeal disease, is normally helical. However, it can also adopt straight rod, elongated helical and coccoid forms. Studying how helical morphology is generated, and how it switches between its different forms, is an important objective for understanding this pathogen. Here, we aimed to determine the genetic factors involved in generating the helical shape of Campylobacter. A C. jejuni transposon (Tn) mutant library was screened for non-helical mutants with inconsistent results. Whole genome sequence variation and morphological trends within this Tn library, and in various C. jejuni wild type strains, were compared and correlated to detect genomic elements associated with helical and rod morphologies. All rod-shaped C. jejuni Tn mutants and all rod-shaped laboratory, clinical and environmental C. jejuni and Campylobacter coli contained genetic changes within the pgp1 or pgp2 genes, which encode peptidoglycan modifying enzymes. We therefore confirm the importance of Pgp1 and Pgp2 in the maintenance of helical shape and extend this to a wide range of C. jejuni and C. coli isolates. Genome sequence analysis revealed variation in the sequence and length of homopolymeric tracts found within these genes, providing a potential mechanism of phase variation of cell shape.

    Scientific reports 2016;6;38303

  • DNA Methylation Dynamics of Human Hematopoietic Stem Cell Differentiation.

    Farlik M, Halbritter F, Müller F, Choudry FA, Ebert P, Klughammer J, Farrow S, Santoro A, Ciaurro V, Mathur A, Uppal R, Stunnenberg HG, Ouwehand WH, Laurenti E, Lengauer T, Frontini M and Bock C

    CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria.

    Hematopoietic stem cells give rise to all blood cells in a differentiation process that involves widespread epigenome remodeling. Here we present genome-wide reference maps of the associated DNA methylation dynamics. We used a meta-epigenomic approach that combines DNA methylation profiles across many small pools of cells and performed single-cell methylome sequencing to assess cell-to-cell heterogeneity. The resulting dataset identified characteristic differences between HSCs derived from fetal liver, cord blood, bone marrow, and peripheral blood. We also observed lineage-specific DNA methylation between myeloid and lymphoid progenitors, characterized immature multi-lymphoid progenitors, and detected progressive DNA methylation differences in maturing megakaryocytes. We linked these patterns to gene expression, histone modifications, and chromatin accessibility, and we used machine learning to derive a model of human hematopoietic differentiation directly from DNA methylation data. Our results contribute to a better understanding of human hematopoietic stem cell differentiation and provide a framework for studying blood-linked diseases.

    Cell stem cell 2016

  • The mountainous Cretan dietary patterns and their relationship with cardiovascular risk factors: the Hellenic Isolated Cohorts MANOLIS study.

    Farmaki AE, Rayner NW, Matchan A, Spiliopoulou P, Gilly A, Kariakli V, Kiagiadaki C, Tsafantakis E, Zeggini E and Dedoussis G

    Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, 70 El Venizelou Avenue, 17671 Athens, Greece.

    Objective: We carried out de novo recruitment of a population-based cohort (MANOLIS study) and describe the specific population, which displays interesting characteristics in terms of diet and health in old age, through deep phenotyping.

    Design: Cross-sectional study where anthropometric, biochemical and clinical measurements were taken in addition to interview-based completion of an extensive questionnaire on health and lifestyle parameters. Dietary patterns were derived through principal component analysis based on a validated FFQ.

    Setting: Geographically isolated Mylopotamos villages on Mount Idi, Crete, Greece.

    Subjects: Adults (n 1553).

    Results: Mean age of the participants was 61·6 years and 55·8 % were women. Of the population, 82·7 % were overweight or obese with a significantly different prevalence between overweight men and women (43·4 v. 34·7 %, P=0·002). The majority (70·6 %) of participants were married, while a larger proportion of women were widowed than men (27·8 v. 3·5 %, P<0·001). Smoking was more prevalent in men (38·7 v. 8·2 %, P<0·001), as 88·8 % of women had never smoked. Four dietary patterns emerged as characteristic of the population; these were termed 'local', 'high fat and sugar', 'Greek café/tavern' and 'olive oil, fruits and vegetables'. Individuals more adherent to the local dietary pattern presented higher blood glucose (β=4·026, P<0·001). Similarly, individuals with higher compliance with the Greek café/tavern pattern had higher waist-to-hip ratio (β=0·012, P<0·001), blood pressure (β=1·015, P=0·005) and cholesterol (β=5·398, P<0·001).

    Conclusions: Profiling of the MANOLIS elderly population identifies unique unhealthy dietary patterns that are associated with cardiometabolic indices.

    Public health nutrition 2016;1-12

  • Complete Whole-Genome Sequence of Salmonella enterica subsp. enterica Serovar Java NCTC5706.

    Fazal MA, Alexander S, Burnett E, Deheer-Graham A, Oliver K, Holroyd N, Parkhill J and Russell JE

    Culture Collections, Public Health England, London, United Kingdom.

    Salmonellae are a significant cause of morbidity and mortality globally. Here, we report the first complete genome sequence for Salmonella enterica subsp. enterica serovar Java strain NCTC5706. This strain is of historical significance, having been isolated in the pre-antibiotic era and deposited into the National Collection of Type Cultures in 1939.

    Genome announcements 2016;4;6

  • Distinct Salmonella Enteritidis lineages associated with enterocolitis in high-income settings and invasive disease in low-income settings.

    Feasey NA, Hadfield J, Keddy KH, Dallman TJ, Jacobs J, Deng X, Wigley P, Barquist L, Langridge GC, Feltwell T, Harris SR, Mather AE, Fookes M, Aslett M, Msefula C, Kariuki S, Maclennan CA, Onsare RS, Weill FX, Le Hello S, Smith AM, McClelland M, Desai P, Parry CM, Cheesbrough J, French N, Campos J, Chabalgoity JA, Betancor L, Hopkins KL, Nair S, Humphrey TJ, Lunguya O, Cogan TA, Tapia MD, Sow SO, Tennant SM, Bornstein K, Levine MM, Lacharme-Lora L, Everett DB, Kingsley RA, Parkhill J, Heyderman RS, Dougan G, Gordon MA and Thomson NR

    Liverpool School of Tropical Medicine, Liverpool, UK.

    An epidemiological paradox surrounds Salmonella enterica serovar Enteritidis. In high-income settings, it has been responsible for an epidemic of poultry-associated, self-limiting enterocolitis, whereas in sub-Saharan Africa it is a major cause of invasive nontyphoidal Salmonella disease, associated with high case fatality. By whole-genome sequence analysis of 675 isolates of S. Enteritidis from 45 countries, we show the existence of a global epidemic clade and two new clades of S. Enteritidis that are geographically restricted to distinct regions of Africa. The African isolates display genomic degradation, a novel prophage repertoire, and an expanded multidrug resistance plasmid. S. Enteritidis is a further example of a Salmonella serotype that displays niche plasticity, with distinct clades that enable it to become a prominent cause of gastroenteritis in association with the industrial production of eggs and of multidrug-resistant, bloodstream-invasive infection in Africa.

    Funded by: NIAID NIH HHS: R01 AI099525; Wellcome Trust: 092152, 100891

    Nature genetics 2016;48;10;1211-7

  • Genome-wide association analysis identifies three new susceptibility loci for childhood body mass index.

    Felix JF, Bradfield JP, Monnereau C, van der Valk RJ, Stergiakouli E, Chesi A, Gaillard R, Feenstra B, Thiering E, Kreiner-Møller E, Mahajan A, Pitkänen N, Joro R, Cavadino A, Huikari V, Franks S, Groen-Blokhuis MM, Cousminer DL, Marsh JA, Lehtimäki T, Curtin JA, Vioque J, Ahluwalia TS, Myhre R, Price TS, Vilor-Tejedor N, Yengo L, Grarup N, Ntalla I, Ang W, Atalay M, Bisgaard H, Blakemore AI, Bonnefond A, Carstensen L, Bone Mineral Density in Childhood Study (BMDCS), Early Genetics and Lifecourse Epidemiology (EAGLE) consortium, Eriksson J, Flexeder C, Franke L, Geller F, Geserick M, Hartikainen AL, Haworth CM, Hirschhorn JN, Hofman A, Holm JC, Horikoshi M, Hottenga JJ, Huang J, Kadarmideen HN, Kähönen M, Kiess W, Lakka HM, Lakka TA, Lewin AM, Liang L, Lyytikäinen LP, Ma B, Magnus P, McCormack SE, McMahon G, Mentch FD, Middeldorp CM, Murray CS, Pahkala K, Pers TH, Pfäffle R, Postma DS, Power C, Simpson A, Sengpiel V, Tiesler CM, Torrent M, Uitterlinden AG, van Meurs JB, Vinding R, Waage J, Wardle J, Zeggini E, Zemel BS, Dedoussis GV, Pedersen O, Froguel P, Sunyer J, Plomin R, Jacobsson B, Hansen T, Gonzalez JR, Custovic A, Raitakari OT, Pennell CE, Widén E, Boomsma DI, Koppelman GH, Sebert S, Järvelin MR, Hyppönen E, McCarthy MI, Lindi V, Harri N, Körner A, Bønnelykke K, Heinrich J, Melbye M, Rivadeneira F, Hakonarson H, Ring SM, Smith GD, Sørensen TI, Timpson NJ, Grant SF, Jaddoe VW, Early Growth Genetics (EGG) Consortium and Bone Mineral Density in Childhood Study BMDCS

    The Generation R Study Group, Department of Pediatrics, Department of Epidemiology,

    A large number of genetic loci are associated with adult body mass index. However, the genetics of childhood body mass index are largely unknown. We performed a meta-analysis of genome-wide association studies of childhood body mass index, using sex- and age-adjusted standard deviation scores. We included 35 668 children from 20 studies in the discovery phase and 11 873 children from 13 studies in the replication phase. In total, 15 loci reached genome-wide significance (P-value < 5 × 10⁻⁸) in the joint discovery and replication analysis, of which 12 are previously identified loci in or close to ADCY3, GNPDA2, TMEM18, SEC16B, FAIM2, FTO, TFAP2B, TNNI3K, MC4R, GPR61, LMX1B and OLFM4 associated with adult body mass index or childhood obesity. We identified three novel loci: rs13253111 near ELP3, rs8092503 near RAB27B and rs13387838 near ADAM23. Per additional risk allele, body mass index increased 0.04 Standard Deviation Score (SDS) [Standard Error (SE) 0.007], 0.05 SDS (SE 0.008) and 0.14 SDS (SE 0.025), for rs13253111, rs8092503 and rs13387838, respectively. A genetic risk score combining all 15 SNPs showed that each additional average risk allele was associated with a 0.073 SDS (SE 0.011, P-value = 3.12 × 10⁻¹⁰) increase in childhood body mass index in a population of 1955 children. This risk score explained 2% of the variance in childhood body mass index. This study highlights the shared genetic background between childhood and adult body mass index and adds three novel loci. These loci likely represent age-related differences in strength of the associations with body mass index.

    Funded by: Wellcome Trust: 098381

    Human molecular genetics 2016;25;2;389-403

  • A whole-genome sequence and transcriptome perspective on HER2-positive breast cancers.

    Ferrari A, Vincent-Salomon A, Pivot X, Sertier AS, Thomas E, Tonon L, Boyault S, Mulugeta E, Treilleux I, MacGrogan G, Arnould L, Kielbassa J, Le Texier V, Blanché H, Deleuze JF, Jacquemier J, Mathieu MC, Penault-Llorca F, Bibeau F, Mariani O, Mannina C, Pierga JY, Trédan O, Bachelot T, Bonnefoi H, Romieu G, Fumoleau P, Delaloge S, Rios M, Ferrero JM, Tarpin C, Bouteille C, Calvo F, Gut IG, Gut M, Martin S, Nik-Zainal S, Stratton MR, Pauporté I, Saintigny P, Birnbaum D, Viari A and Thomas G

    Synergie Lyon Cancer, Plateforme de bioinformatique 'Gilles Thomas' Centre Léon Bérard, 28 rue Laënnec, 69008 Lyon, France.

    HER2-positive breast cancer has long proven to be a clinically distinct class of breast cancers for which several targeted therapies are now available. However, resistance to the treatment associated with specific gene expressions or mutations has been observed, revealing the underlying diversity of these cancers. Therefore, understanding the full extent of the HER2-positive disease heterogeneity still remains challenging. Here we carry out an in-depth genomic characterization of 64 HER2-positive breast tumour genomes that exhibit four subgroups, based on the expression data, with distinctive genomic features in terms of somatic mutations, copy-number changes or structural variations. The results suggest that, despite being clinically defined by a specific gene amplification, HER2-positive tumours melt into the whole luminal-basal breast cancer spectrum rather than standing apart. The results also lead to a refined ERBB2 amplicon of 106 kb and show that several cases of amplifications are compatible with a breakage-fusion-bridge mechanism.

    Nature communications 2016;7;12222

  • The Pfam protein families database: towards a more sustainable future.

    Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J and Bateman A

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

    In the last two years the Pfam database has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/L024136/1; Howard Hughes Medical Institute; Wellcome Trust: 108433/Z/15/Z

    Nucleic acids research 2016;44;D1;D279-85

  • The diversity of Klebsiella pneumoniae surface polysaccharides.

    Follador R, Heinz E, Wyres KL, Ellington MJ, Kowarik M, Holt KE and Thomson NR

    LimmaTech Biologics AG, Schlieren, Switzerland.

    Klebsiella pneumoniae is considered an urgent health concern due to the emergence of multi-drug-resistant strains for which vaccination offers a potential remedy. Vaccines based on surface polysaccharides are highly promising but need to address the high diversity of surface-exposed polysaccharides, synthesized as O-antigens (lipopolysaccharide, LPS) and K-antigens (capsule polysaccharide, CPS), present in K. pneumoniae. We present a comprehensive and clinically relevant study of the diversity of O- and K-antigen biosynthesis gene clusters across a global collection of over 500 K. pneumoniae whole-genome sequences and the seroepidemiology of human isolates from different infection types. Our study defines the genetic diversity of O- and K-antigen biosynthesis cluster sequences across this collection, identifying sequences for known serotypes as well as identifying novel LPS and CPS gene clusters found in circulating contemporary isolates. Serotypes O1, O2 and O3 were most prevalent in our sample set, accounting for approximately 80 % of all infections. In contrast, K serotypes showed an order of magnitude higher diversity and differ among infection types. In addition we investigated a potential association of O or K serotypes with phylogenetic lineage, infection type and the presence of known virulence genes. K1 and K2 serotypes, which are associated with hypervirulent K. pneumoniae, were associated with a higher abundance of virulence genes and more diverse O serotypes compared to other common K serotypes.

    Microbial genomics 2016;2;8;e000073

  • COSMIC: High-Resolution Cancer Genetics Using the Catalogue of Somatic Mutations in Cancer.

    Forbes SA, Beare D, Bindal N, Bamford S, Ward S, Cole CG, Jia M, Kok C, Boutselakis H, De T, Sondka Z, Ponting L, Stefancsik R, Harsha B, Tate J, Dawson E, Thompson S, Jubb H and Campbell PJ

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    COSMIC is an expert-curated database of somatic mutations in human cancer. Broad and comprehensive in scope, recent releases in 2016 describe over 4 million coding mutations across all human cancer disease types. Mutations are annotated across the entire genome, but expert curation is focused on over 400 key cancer genes. Now encompassing the majority of molecular mutation mechanisms in oncogenetics, COSMIC additionally describes 10 million non-coding mutations, 1 million copy-number aberrations, 9 million gene-expression variants, and almost 8 million differentially methylated CpGs. This information combines a consistent interpretation of the data from the major cancer genome consortia and cancer genome literature with exhaustive hand curation of over 22,000 gene-specific literature publications. This unit describes the graphical Web site in detail; alternative protocols overview other ways the entire database can be accessed, analyzed, and downloaded. © 2016 by John Wiley & Sons, Inc.

    Current protocols in human genetics 2016;91;10.11.1-10.11.37

  • HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    Forster SC, Browne HP, Kumar N, Hunt M, Denise H, Mitchell A, Finn RD and Lawley TD

    Host Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK; Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton 3168, Australia; Department of Molecular and Translational Sciences, Monash University, Clayton 3800, Australia.

    The Human Pan-Microbe Communities (HPMC) database provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M011755/1; Medical Research Council: 1091097; Wellcome Trust: 098051

    Nucleic acids research 2016;44;D1;D604-9

  • Resistance of Transmitted Founder HIV-1 to IFITM-Mediated Restriction.

    Foster TL, Wilson H, Iyer SS, Coss K, Doores K, Smith S, Kellam P, Finzi A, Borrow P, Hahn BH and Neil SJ

    Department of Infectious Diseases, King's College London Faculty of Life Sciences and Medicine, Guy's Hospital, London SE1 9RT, UK.

    Interferon-induced transmembrane proteins (IFITMs) restrict the entry of diverse enveloped viruses through incompletely understood mechanisms. While IFITMs are reported to inhibit HIV-1, their in vivo relevance is unclear. We show that IFITM sensitivity of HIV-1 strains is determined by the co-receptor usage of the viral envelope glycoproteins as well as IFITM subcellular localization within the target cell. Importantly, we find that transmitted founder HIV-1, which establishes de novo infections, is uniquely resistant to the antiviral activity of IFITMs. However, viral sensitivity to IFITMs, particularly IFITM2 and IFITM3, increases over the first 6 months of infection, primarily as a result of neutralizing antibody escape mutations. Additionally, the ability to evade IFITM restriction contributes to the different interferon sensitivities of transmitted founder and chronic viruses. Together, these data indicate that IFITMs constitute an important barrier to HIV-1 transmission and that escape from adaptive immune responses exposes the virus to antiviral restriction.

    Cell host & microbe 2016

  • Variant Exported Blood-Stage Proteins Encoded by Plasmodium Multigene Families Are Expressed in Liver Stages Where They Are Exported into the Parasitophorous Vacuole.

    Fougère A, Jackson AP, Paraskevi Bechtsi D, Braks JA, Annoura T, Fonager J, Spaccapelo R, Ramesar J, Chevalley-Maurel S, Klop O, van der Laan AM, Tanke HJ, Kocken CH, Pasini EM, Khan SM, Böhme U, van Ooij C, Otto TD, Janse CJ and Franke-Fayard B

    Leiden Malaria Research Group, Parasitology, Center of Infectious Diseases, Leiden University Medical Center (LUMC), Leiden, The Netherlands.

    Many variant proteins encoded by Plasmodium-specific multigene families are exported into red blood cells (RBC). P. falciparum-specific variant proteins encoded by the var, stevor and rifin multigene families are exported onto the surface of infected red blood cells (iRBC) and mediate interactions between iRBC and host cells resulting in tissue sequestration and rosetting. However, the precise function of most other Plasmodium multigene families encoding exported proteins is unknown. To understand the role of RBC-exported proteins of rodent malaria parasites (RMP) we analysed the expression and cellular location by fluorescent-tagging of members of the pir, fam-a and fam-b multigene families. Furthermore, we performed phylogenetic analyses of the fam-a and fam-b multigene families, which indicate that both families have a history of functional differentiation unique to RMP. We demonstrate for all three families that expression of family members in iRBC is not mutually exclusive. Most tagged proteins were transported into the iRBC cytoplasm but not onto the iRBC plasma membrane, indicating that they are unlikely to play a direct role in iRBC-host cell interactions. Unexpectedly, most family members are also expressed during the liver stage, where they are transported into the parasitophorous vacuole. This suggests that these protein families promote parasite development in both the liver and blood, either by supporting parasite development within hepatocytes and erythrocytes and/or by manipulating the host immune response. Indeed, in the case of Fam-A, which have a steroidogenic acute regulatory-related lipid transfer (START) domain, we found that several family members can transfer phosphatidylcholine in vitro. These observations indicate that these proteins may transport (host) phosphatidylcholine for membrane synthesis. This is the first demonstration of a biological function of any exported variant protein family of rodent malaria parasites.

    PLoS pathogens 2016;12;11;e1005917

  • An Antibody Screen of a Plasmodium vivax Antigen Library Identifies Novel Merozoite Proteins Associated with Clinical Protection.

    França CT, Hostetler JB, Sharma S, White MT, Lin E, Kiniboro B, Waltmann A, Darcy AW, Li Wai Suen CS, Siba P, King CL, Rayner JC, Fairhurst RM and Mueller I

    Population Health and Immunity Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.

    Background: Elimination of Plasmodium vivax malaria would be greatly facilitated by the development of an effective vaccine. A comprehensive and systematic characterization of antibodies to P. vivax antigens in exposed populations is useful in guiding rational vaccine design.

    Methodology/principal findings: In this study, we investigated antibodies to a large library of P. vivax entire ectodomain merozoite proteins in 2 Asia-Pacific populations, analysing the relationship of antibody levels with markers of current and cumulative malaria exposure, and socioeconomic and clinical indicators. 29 antigenic targets of natural immunity were identified. Of these, 12 highly-immunogenic proteins were strongly associated with age and thus cumulative lifetime exposure in Solomon Islanders (P<0.001-0.027). A subset of 6 proteins, selected on the basis of immunogenicity and expression levels, were used to examine antibody levels in plasma samples from a population of young Papua New Guinean children with well-characterized individual differences in exposure. This analysis identified a strong association between reduced risk of clinical disease and antibody levels to P12, P41, and a novel hypothetical protein that has not previously been studied, PVX_081550 (IRR 0.46-0.74; P<0.001-0.041).

    Conclusion/significance: These data emphasize the benefits of an unbiased screening approach in identifying novel vaccine candidate antigens. Functional studies are now required to establish whether PVX_081550 is a key component of the naturally-acquired protective immune response, a biomarker of immune status, or both.

    Funded by: Medical Research Council: MR/J002283/1, MR/L012170/1; NIAID NIH HHS: U19 AI089686

    PLoS neglected tropical diseases 2016;10;5;e0004639

  • Drug Sensitivity Assays of Human Cancer Organoid Cultures.

    Francies HE, Barthorpe A, McLaren-Douglas A, Barendt WJ and Garnett MJ

    Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK.

    Drug sensitivity testing utilizing preclinical disease models such as cancer cell lines is an important and widely used tool for drug development. Importantly, when combined with molecular data such as gene copy number variation or somatic coding mutations, associations between drug sensitivity and molecular data can be used to develop markers to guide patient therapies. The use of organoids as a preclinical cancer model has become possible following recent work demonstrating that organoid cultures can be derived from patient tumors with a high rate of success. A genetic analysis of colon cancer organoids found that these models encompassed the majority of the somatic variants present within the tumor from which it was derived, and capture much of the genetic diversity of colon cancer observed in patients. Importantly, the systematic sensitivity testing of organoid cultures to anticancer drugs identified clinical gene-drug interactions, suggestive of their potential as preclinical models for testing anticancer drug sensitivity. In this chapter, we describe how to perform medium/high-throughput drug sensitivity screens using 3D organoid cell cultures.

    Methods in molecular biology (Clifton, N.J.) 2016

  • A single dividing cell population with imbalanced fate drives oesophageal tumour growth.

    Frede J, Greulich P, Nagy T, Simons BD and Jones PH

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Understanding the cellular mechanisms of tumour growth is key for designing rational anticancer treatment. Here we used genetic lineage tracing to quantify cell behaviour during neoplastic transformation in a model of oesophageal carcinogenesis. We found that cell behaviour was convergent across premalignant tumours, which contained a single proliferating cell population. The rate of cell division was not significantly different in the lesions and the surrounding epithelium. However, dividing tumour cells had a uniform, small bias in cell fate so that, on average, slightly more dividing than non-dividing daughter cells were generated at each round of cell division. In invasive cancers induced by Kras(G12D) expression, dividing cell fate became more strongly biased towards producing dividing over non-dividing cells in a subset of clones. These observations argue that agents that restore the balance of cell fate may prove effective in checking tumour growth, whereas those targeting cycling cells may show little selectivity.

    Funded by: Medical Research Council: MC_PC_12009

    Nature cell biology 2016;18;9;967-78

  • Rapid phenotyping of knockout mice to identify genetic determinants of bone strength.

    Freudenthal B, Logan J, Sanger Institute Mouse Pipelines T, Croucher PI, Williams GR and Bassett JH

    B Freudenthal, Medicine, Imperial College London, London, United Kingdom of Great Britain and Northern Ireland.

    The genetic determinants of osteoporosis remain poorly understood and there is a large unmet need for new treatments in our aging society. Thus, new approaches for gene discovery in skeletal disease are required to complement the current genome wide association studies in human populations. The International Knockout Mouse Consortium (IKMC) and International Mouse Phenotyping Consortium (IMPC) provide such an opportunity. The IKMC is generating knockout mice representing each of the known protein-coding genes in C57BL/6 mice and, as part of the IMPC initiative, the Origins of Bone and Cartilage Disease project is identifying mutants with significant outlier skeletal phenotypes. This initiative will add value to data from large human cohorts and provide a new understanding of bone and cartilage pathophysiology, ultimately leading to the identification of novel drug targets for the treatment of skeletal disease.

    The Journal of endocrinology 2016

  • Tyrosine kinase 2 is not limiting human antiviral type III interferon responses.

    Fuchs S, Kaiser-Labusch P, Bank J, Ammann S, Kolb-Kokocinski A, Edelbusch C, Omran H and Ehl S

    Center for Chronic Immunodeficiency (CCI), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Germany.

    Tyrosine kinase 2 (TYK2) associates with interferon (IFN) alpha receptor, IL-10 receptor (IL-10R) beta and other cytokine receptor subunits for signal transduction, in response to various cytokines including type I and type III interferons, IL-6, IL-10, IL-12 and IL-23. Data on TYK2 dependence on cytokine responses and in vivo consequences of TYK2 deficiency are inconsistent. We investigated a TYK2 deficient patient, presenting with eczema, skin abscesses, respiratory infections and IgE levels >1000 U/ml, without viral or mycobacterial infections and a corresponding cellular model to analyze the role of TYK2 in type III IFN mediated responses and NK-cell function. We established a novel simple diagnostic monocyte assay to show that the mutation completely abolishes the IFN-α mediated antiviral response. It also partly reduces IL-10 but not IL-6 mediated signaling associated with reduced IL-10Rβ expression. However, we found almost normal type III IFN signaling associated with minimal impairment of virus control in a TYK2 deficient human cell line. Contrary to observations in TYK2 deficient mice, NK-cell phenotype and function, including IL-12/IL-18 mediated responses, were normal in the patient. Thus, preserved type III IFN responses and normal NK-cell function may contribute to antiviral protection in TYK2 deficiency leading to a surprisingly mild human phenotype.

    European journal of immunology 2016

  • RUNX1 mutations in acute myeloid leukemia are associated with distinct clinico-pathologic and genetic features.

    Gaidzik VI, Teleanu V, Papaemmanuil E, Weber D, Paschka P, Hahn J, Wallrabenstein T, Kolbinger B, Köhne CH, Horst HA, Brossart P, Held G, Kündgen A, Ringhoffer M, Götze K, Rummel M, Gerstung M, Campbell P, Kraus JM, Kestler HA, Thol F, Heuser M, Schlegelberger B, Ganser A, Bullinger L, Schlenk RF, Döhner K and Döhner H

    Universitätsklinikum Ulm, Ulm, Germany.

    We evaluated the frequency, genetic architecture, clinico-pathologic features, and prognostic impact of RUNX1 mutations in 2439 adult patients with newly diagnosed acute myeloid leukemia (AML). RUNX1 mutations were found in 245 of 2439 (10%) patients; were almost mutually exclusive of AML with recurrent genetic abnormalities; and they co-occurred with a complex pattern of gene mutations, frequently involving mutations in epigenetic modifiers (ASXL1, IDH2, KMT2A, EZH2), components of the spliceosome complex (SRSF2, SF3B1), and STAG2, PHF6, BCOR. RUNX1 mutations were associated with older age (16-59 years: 8.5%; >60 years: 15.1%), male gender, more immature morphology, and secondary AML evolving from myelodysplastic syndrome. In univariable analyses, RUNX1 mutations were associated with inferior event-free (EFS, P<0.0001), relapse-free (RFS, P=0.0007), and overall survival (OS, P<0.0001) in all patients, remaining significant when age was considered. In multivariable analysis, RUNX1 mutations predicted for inferior EFS (P=0.01). The effect of co-mutation varied by partner gene, where patients with the secondary genotypes RUNX1(mut)/ASXL1(mut) (OS, P=0.004), RUNX1(mut)/SRSF2(mut) (OS, P=0.007), and RUNX1(mut)/PHF6(mut) (OS, P=0.03) did significantly worse, whereas patients with the genotype RUNX1(mut)/IDH2(mut) (OS, P=0.04) had a better outcome. In conclusion, RUNX1-mutated AML are associated with a complex mutation cluster and are correlated with distinct clinico-pathologic features, and inferior prognosis. Leukemia accepted article preview online, 03 May 2016. doi:10.1038/leu.2016.126.

    Leukemia 2016

  • tRNA fragments: novel players in intergenerational inheritance.

    Gapp K and Miska EA

    The Gurdon Institute and Department of Genetics, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

    Non-genetic inheritance is an evocative topic; in the past few years, the debate around potential inheritance of life-time experiences independent of social factors in mammals has become highly prominent due to increasing evidence for phenotypes in the offspring after paternal environmental exposures. Strikingly, two independent studies published in Science newly implicate a special class of RNA, transfer RNA fragments, in the intergenerational effects of paternal dietary intervention.

    Cell research 2016

  • MPRAnator: a web-based tool for the design of Massively Parallel Reporter Assay experiments.

    Georgakopoulos-Soares I, Jain N, Gray JM and Hemberg M

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

    Motivation: With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as Massively Parallel Reporter Assays (MPRAs) and similar methods remains challenging.

    Results: We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the investigation of the rules that govern TF occupancy. MPRA SNP design can be used to systematically examine the functional effects of single or combinations of SNPs at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs.

    Availability: MPRAnator tool set is implemented in Python, Perl and Javascript and is freely available at: and The source code is available on under the MIT license. The REST API allows programmatic access to MPRAnator using simple URLs.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2016

  • Interleukin-13 Activates Distinct Cellular Pathways Leading to Ductular Reaction, Steatosis, and Fibrosis.

    Gieseck RL, Ramalingam TR, Hart KM, Vannella KM, Cantu DA, Lu WY, Ferreira-González S, Forbes SJ, Vallier L and Wynn TA

    Immunopathogenesis Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20852, USA; Wellcome Trust-Medical Research Council Stem Cell Institute, Anne McLaren Laboratory, Department of Surgery, University of Cambridge, Cambridge CB2 0SZ, UK.

    Fibroproliferative diseases are driven by dysregulated tissue repair responses and are a major cause of morbidity and mortality because they affect nearly every organ system. Type 2 cytokine responses are critically involved in tissue repair; however, the mechanisms that regulate beneficial regeneration versus pathological fibrosis are not well understood. Here, we have shown that the type 2 effector cytokine interleukin-13 simultaneously, yet independently, directed hepatic fibrosis and the compensatory proliferation of hepatocytes and biliary cells in progressive models of liver disease induced by interleukin-13 overexpression or after infection with Schistosoma mansoni. Using transgenic mice with interleukin-13 signaling genetically disrupted in hepatocytes, cholangiocytes, or resident tissue fibroblasts, we have revealed direct and distinct roles for interleukin-13 in fibrosis, steatosis, cholestasis, and ductular reaction. Together, these studies show that these mechanisms are simultaneously controlled but distinctly regulated by interleukin-13 signaling. Thus, it may be possible to promote interleukin-13-dependent hepatobiliary expansion without generating pathological fibrosis.

    Funded by: Intramural NIH HHS: Z01 AI000829-11, Z01 AI001019-01

    Immunity 2016;45;1;145-58

  • Cytokine profiles during invasive nontyphoidal Salmonella disease predict outcome in African children.

    Gilchrist JJ, Heath JN, Msefula CL, Gondwe EN, Naranbhai V, Mandala W, MacLennan JM, Molyneux EM, Graham SM, Drayson MT, Molyneux ME and MacLennan CA

    Wellcome Trust Centre for Human Genetics, University of Oxford, UK; Department of Paediatrics, University of Oxford, UK.

    Nontyphoidal Salmonellae are a leading cause of sepsis in African children. Cytokine responses are central to the pathophysiology of sepsis and predict sepsis outcome in other settings. In this study we investigated cytokine responses to invasive nontyphoidal Salmonella (iNTS) disease in Malawian children. We determined serum concentrations of 48 cytokines with multiplexed immunoassays in Malawian children during acute iNTS disease (n = 111) and in convalescence (n = 77). Principal components analysis and logistic regression were used to identify cytokine signatures of acute iNTS disease. We further investigated whether these responses are altered by HIV co-infection or severe malnutrition, and whether cytokine responses predict inpatient mortality. Cytokine changes in acute iNTS disease were associated with two distinct cytokine signatures. The first is characterized by increased concentrations of mediators known to be associated with macrophage function, and the second by raised pro- and anti-inflammatory cytokines typical of responses reported in sepsis secondary to diverse pathogens. These cytokine responses were largely unaltered by either severe malnutrition or HIV co-infection. Children with fatal disease had a distinctive cytokine profile, characterized by raised mediators known to be associated with neutrophil function. In conclusion, cytokine responses to acute iNTS infection in Malawian children are reflective of both the cytokine storm typical of sepsis secondary to diverse pathogens, and the intra-macrophage replicative niche of NTS. The cytokine profile predictive of fatal disease supports a key role of neutrophils in the pathogenesis of NTS sepsis.

    Clinical and vaccine immunology : CVI 2016

  • Very low depth sequencing in a founder population identifies a cardioprotective APOC3 signal missed by genome-wide imputation.

    Gilly A, Ritchie GR, Southam L, Farmaki AE, Tsafantakis E, Dedoussis G and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Cohort-wide very low depth whole genome sequencing (WGS) can comprehensively capture low frequency sequence variation for the cost of a dense genome-wide genotyping array. Here we analyse 1x sequence data across the APOC3 gene in a founder population from the island of Crete in Greece (n=1239) and find significant evidence for association with blood triglyceride levels with the previously reported R19X cardioprotective null mutation (β=-1.09,σ=0.163, p=8.2x10(-11)) and a second loss of function mutation, rs138326449 (β=-1.17,σ=0.188, p=1.14x10(-9)). The signal cannot be recapitulated by imputing genome-wide genotype data on a large reference panel of 5122 individuals including 249 with 4x WGS data from the same population. Gene-level meta-analysis with other studies reporting burden signals at APOC3 provides robust evidence for a replicable cardioprotective rare variant aggregation (p=3.2x10(-31), n=13,480).

    Human molecular genetics 2016

  • New insights into sex chromosome evolution in anole lizards (Reptilia, Dactyloidae).

    Giovannotti M, Trifonov VA, Paoletti A, Kichigin IG, O'Brien PC, Kasai F, Giovagnoli G, Ng BL, Ruggeri P, Cerioni PN, Splendiani A, Pereira JC, Olmo E, Rens W, Caputo Barucchi V and Ferguson-Smith MA

    Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, via Brecce Bianche, 60131, Ancona, Italy.

    Anoles are a clade of iguanian lizards that underwent an extensive radiation between 125 and 65 million years ago. Their karyotypes show wide variation in diploid number spanning from 26 (Anolis evermanni) to 44 (A. insolitus). This chromosomal variation involves their sex chromosomes, ranging from simple systems (XX/XY), with heterochromosomes represented by either micro- or macrochromosomes, to multiple systems (X1X1X2X2/X1X2Y). Here, for the first time, the homology relationships of sex chromosomes have been investigated in nine anole lizards at the whole chromosome level. Cross-species chromosome painting using sex chromosome paints from A. carolinensis, Ctenonotus pogus and Norops sagrei and gene mapping of X-linked genes demonstrated that the anole ancestral sex chromosome system constituted by microchromosomes is retained in all the species with the ancestral karyotype (2n = 36, 12 macro- and 24 microchromosomes). On the contrary, species with a derived karyotype, namely those belonging to genera Ctenonotus and Norops, show a series of rearrangements (fusions/fissions) involving autosomes/microchromosomes that led to the formation of their current sex chromosome systems. These results demonstrate that different autosomes were involved in translocations with sex chromosomes in closely related lineages of anole lizards and that several sequential microautosome/sex chromosome fusions led to a remarkable increase in size of Norops sagrei sex chromosomes.

    Chromosoma 2016

  • Rapid Karyotype Evolution in Lasiopodomys Involved at Least Two Autosome - Sex Chromosome Translocations.

    Gladkikh OL, Romanenko SA, Lemskaya NA, Serdyukova NA, O'Brien PC, Kovalskaya JM, Smorkatcheva AV, Golenishchev FN, Perelman PL, Trifonov VA, Ferguson-Smith MA, Yang F and Graphodatsky AS

    Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.

    The generic status of Lasiopodomys and its division into subgenera Lasiopodomys (L. mandarinus, L. brandtii) and Stenocranius (L. gregalis, L. raddei) are not generally accepted because of contradictions between the morphological and molecular data. To obtain cytogenetic evidence for the Lasiopodomys genus and its subgenera and to test the autosome to sex chromosome translocation hypothesis of sex chromosome complex origin in L. mandarinus proposed previously, we hybridized chromosome painting probes from the field vole (Microtus agrestis, MAG) and the Arctic lemming (Dicrostonyx torquatus, DTO) onto the metaphases of a female Mandarin vole (L. mandarinus, 2n = 47) and a male Brandt's vole (L. brandtii, 2n = 34). In addition, we hybridized Arctic lemming painting probes onto chromosomes of a female narrow-headed vole (L. gregalis, 2n = 36). Cross-species painting revealed three cytogenetic signatures (MAG12/18, 17a/19, and 22/24) that could validate the genus Lasiopodomys and indicate the evolutionary affinity of L. gregalis to the genus. Moreover, all three species retained the associations MAG1bc/17b and 2/8a detected previously in karyotypes of all arvicolins studied. The associations MAG2a/8a/19b, 8b/21, 9b/23, 11/13b, 12b/18, 17a/19a, and 5 fissions of ancestral segments appear to be characteristic for the subgenus Lasiopodomys. We also validated the autosome to sex chromosome translocation hypothesis on the origin of complex sex chromosomes in L. mandarinus. Two translocations of autosomes onto the ancestral X chromosome in L. mandarinus led to a complex of neo-X1, neo-X2, and neo-X3 elements. Our results demonstrate that genus Lasiopodomys represents a striking example of rapid chromosome evolution involving both autosomes and sex chromosomes. Multiple reshuffling events including Robertsonian fusions, chromosomal fissions, inversions and heterochromatin expansion have led to the formation of modern species karyotypes in a very short time, about 2.4 MY.

    PloS one 2016;11;12;e0167653

  • GENOMICS. A federated ecosystem for sharing genomic, clinical data.

    Global Alliance for Genomics and Health

    Science (New York, N.Y.) 2016;352;6291;1278-80

  • Chromosomal phylogeny of Vampyressine bats (Chiroptera, Phyllostomidae) with description of two new sex chromosome systems.

    Gomes AJ, Nagamachi CY, Rodrigues LR, Benathar TC, Ribas TF, O'Brien PC, Yang F, Ferguson-Smith MA and Pieczarka JC

    Laboratório de Citogenética, CEABIO, ICB, Universidade Federal do Pará, Belém, Brazil.

    Background: The subtribe Vampyressina (sensu Baker et al. 2003) encompasses approximately 43 species and seven genera and is a recent and diversified group of New World leaf-nosed bats specialized in fruit eating. The systematics of this group continues to be debated mainly because of the lack of congruence between topologies generated by molecular and morphological data. We analyzed seven species of all genera of vampyressine bats by multidirectional chromosome painting, using whole-chromosome-painting probes from Carollia brevicauda and Phyllostomus hastatus. Phylogenetic analyses were performed using shared discrete chromosomal segments as characters and the Phylogenetic Analysis Using Parsimony (PAUP) software package, using Desmodontinae as outgroup. We also used the Tree Analysis Using New Technology (TNT) software.

    Results: The results showed a well-supported phylogeny congruent with molecular topologies regarding the sister taxa relationship of the genera Vampyressa and Mesophylla, as well as the close relationship between the genera Chiroderma and Vampyriscus.

    Conclusions: Our results supported the hypothesis that all genera of this subtribe have compound sex chromosome systems that originated from an X-autosome translocation, an ancestral condition observed in the Stenodermatinae. Additional rearrangements occurred independently in the genera Vampyressa and Mesophylla, yielding the X1X1X2X2/X1X2Y sex chromosome system. This work presents additional data supporting the hypothesis based on molecular studies regarding the polyphyly of the genus Vampyressa and its sister relationship to Mesophylla.

    BMC evolutionary biology 2016;16;1;119

  • Standardized Welfare Terms for the Zebrafish Community.

    Goodwin N, Karp NA, Blackledge S, Clark B, Keeble R, Kovacs C, Murray KN, Price M, Thompson P and Bussell J

    Research Support Facility, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Managing the welfare of laboratory animals is critical to animal health, vital to understanding the phenotypes created by treatment or genetic alteration, and necessary for compliance with regulations. Part of an animal welfare assessment is the requirement to record observations, ensuring all those responsible for the animals are aware of their health status and can act accordingly. Although the use of zebrafish in research continues to increase, guidelines for conducting welfare assessments and the reporting of observations are considered unclear compared to mammalian species. To support the movement of zebrafish between facilities, significant improvement would be achieved through the use of standardized terms to ensure clarity and consistency between facilities. Improving the clarity of terminology around welfare not only addresses our ethical obligation but also supports the research goals and provides a searchable description of the phenotypes. A collaboration between the Wellcome Trust Sanger Institute and Cambridge University (Department of Medicine-Laboratory of Molecular Biology) has led to the creation of the zebrafish welfare terms from which standardization of terminology can be achieved.

    Zebrafish 2016

  • Evaluating and Optimizing Fish Health and Welfare During Experimental Procedures.

    Goodwin N, Westall L, Karp NA, Hazlehurst D, Kovacs C, Keeble R, Thompson P, Collins R and Bussell J

    Research Support Facility, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Many facilities house fish in separate static containers post-procedure, for example, while awaiting genotyping results. This ensures fish can be easily identified, but it does not allow for provision of continuous filtered water or diet. At the Wellcome Trust Sanger Institute, concern over the housing conditions led to the development of an individual housing system (GeneS) enabling feeding and water filtration. Trials to compare the water quality measures between the various systems found that fish housed in static containers experienced rapid deterioration in water quality. By day 1, measures of ammonia were outside the Institute's prescribed values and continued to rise until they were 25-fold higher than recommended levels. Nitrite levels were also outside recommended levels for all fish by day 9 and were twofold higher by the end of the trial. The water quality measures for tanks held on the recirculating system were stable even though food was provided. These results indicate that for housing zebrafish, running water or appropriately timed water changes are a critical component to ensure that the ethical obligations are met.

    Zebrafish 2016

  • Meta-analysis of 375,000 individuals identifies 38 susceptibility loci for migraine.

    Gormley P, Anttila V, Winsvold BS, Palta P, Esko T, Pers TH, Farh KH, Cuenca-Leon E, Muona M, Furlotte NA, Kurth T, Ingason A, McMahon G, Ligthart L, Terwindt GM, Kallela M, Freilinger TM, Ran C, Gordon SG, Stam AH, Steinberg S, Borck G, Koiranen M, Quaye L, Adams HH, Lehtimäki T, Sarin AP, Wedenoja J, Hinds DA, Buring JE, Schürks M, Ridker PM, Hrafnsdottir MG, Stefansson H, Ring SM, Hottenga JJ, Penninx BW, Färkkilä M, Artto V, Kaunisto M, Vepsäläinen S, Malik R, Heath AC, Madden PA, Martin NG, Montgomery GW, Kurki MI, Kals M, Mägi R, Pärn K, Hämäläinen E, Huang H, Byrnes AE, Franke L, Huang J, Stergiakouli E, Lee PH, Sandor C, Webber C, Cader Z, Muller-Myhsok B, Schreiber S, Meitinger T, Eriksson JG, Salomaa V, Heikkilä K, Loehrer E, Uitterlinden AG, Hofman A, van Duijn CM, Cherkas L, Pedersen LM, Stubhaug A, Nielsen CS, Männikkö M, Mihailov E, Milani L, Göbel H, Esserlind AL, Christensen AF, Hansen TF, Werge T, International Headache Genetics Consortium, Kaprio J, Aromaa AJ, Raitakari O, Ikram MA, Spector T, Järvelin MR, Metspalu A, Kubisch C, Strachan DP, Ferrari MD, Belin AC, Dichgans M, Wessman M, van den Maagdenberg AM, Zwart JA, Boomsma DI, Smith GD, Stefansson K, Eriksson N, Daly MJ, Neale BM, Olesen J, Chasman DI, Nyholt DR and Palotie A

    Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.

    Migraine is a debilitating neurological disorder affecting around one in seven people worldwide, but its molecular mechanisms remain poorly understood. There is some debate about whether migraine is a disease of vascular dysfunction or a result of neuronal dysfunction with secondary vascular changes. Genome-wide association (GWA) studies have thus far identified 13 independent loci associated with migraine. To identify new susceptibility loci, we carried out a genetic study of migraine on 59,674 affected subjects and 316,078 controls from 22 GWA studies. We identified 44 independent single-nucleotide polymorphisms (SNPs) significantly associated with migraine risk (P < 5 × 10(-8)) that mapped to 38 distinct genomic loci, including 28 loci not previously reported and a locus that to our knowledge is the first to be identified on chromosome X. In subsequent computational analyses, the identified loci showed enrichment for genes expressed in vascular and smooth muscle tissues, consistent with a predominant theory of migraine that highlights vascular etiologies.

    Nature genetics 2016

  • Invasion of hepatocytes by Plasmodium sporozoites requires cGMP-dependent protein kinase and calcium dependent protein kinase 4.

    Govindasamy K, Jebiwott S, Jaijyan DK, Davidow A, Ojo KK, Van Voorhis WC, Brochet M, Billker O and Bhanot P

    Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers - New Jersey Medical School, Newark, NJ, USA.

    Invasion of hepatocytes by sporozoites is essential for Plasmodium to initiate infection of the mammalian host. The parasite's subsequent intracellular differentiation in the liver is the first developmental step of its mammalian cycle. Despite their biological significance, surprisingly little is known of the signalling pathways required for sporozoite invasion. We report that sporozoite invasion of hepatocytes requires signalling through two second-messengers - cGMP mediated by the parasite's cGMP-dependent protein kinase (PKG), and Ca(2+) , mediated by the parasite's calcium-dependent protein kinase 4 (CDPK4). Sporozoites expressing a mutated form of Plasmodium berghei PKG or carrying a deletion of the CDPK4 gene are defective in invasion of hepatocytes. Using specific and potent inhibitors of Plasmodium PKG and CDPK4, we demonstrate that PKG and CDPK4 are required for sporozoite motility, and that PKG regulates the secretion of TRAP, an adhesin that is essential for motility. Chemical inhibition of PKG decreases parasite egress from hepatocytes by inhibiting either the formation or release of merosomes. In contrast, genetic inhibition of CDPK4 does not significantly decrease the number of merosomes. By revealing the requirement for PKG and CDPK4 in Plasmodium sporozoite invasion, our work enables a better understanding of kinase pathways that act in different Plasmodium stages.

    Molecular microbiology 2016

  • Genomic epidemiology of gonococcal resistance to extended spectrum cephalosporins, macrolides, and fluoroquinolones in the US, 2000-2013.

    Grad YH, Harris SR, Kirkcaldy RD, Green AG, Marks DS, Bentley SD, Trees D and Lipsitch M

    Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston MA, USA; Division of Infectious Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston MA, USA.

    Background: Treatment of Neisseria gonorrhoeae infection is empiric and based on population-wide susceptibilities. Increasing antimicrobial resistance underscores the potential importance of rapid diagnostics, including sequence-based tests, to guide therapy. However, the utility of sequence-based diagnostics depends on the prevalence and dynamics of the resistance mechanisms.

    Methods: We define the prevalence and dynamics of resistance markers to extended spectrum cephalosporins (ESC), macrolides, and fluoroquinolones in 1102 resistant and susceptible clinical N. gonorrhoeae isolates collected from 2000-2013 via the CDC's Gonococcal Isolate Surveillance Project (GISP).

    Results: Reduced ESC susceptibility (ESC(RS)) is predominantly clonal and associated with the mosaic penA XXXIV allele and derivatives (sensitivity 98% for cefixime, 91% for ceftriaxone), but alternative resistance mechanisms have sporadically emerged. Reduced azithromycin susceptibility (Azi(RS)) has arisen through multiple mechanisms and shows limited clonal spread; the basis for resistance in 36% of Azi(RS) isolates is unclear. Quinolone resistant N. gonorrhoeae (QRNG) have arisen multiple times, with extensive clonal spread.

    Conclusion: QRNG and reduced cefixime susceptibility appear amenable to development of sequence-based diagnostics, whereas the undefined mechanisms of resistance to ceftriaxone and azithromycin underscore the importance of phenotypic surveillance. The identification of multidrug-resistant isolates highlights the need for additional measures to respond to the threat of untreatable gonorrhea.

    The Journal of infectious diseases 2016

  • Genes Required for the Fitness of Salmonella enterica Serovar Typhimurium during Infection of Immunodeficient gp91-/- phox Mice.

    Grant AJ, Oshota O, Chaudhuri RR, Mayho M, Peters SE, Clare S, Maskell DJ and Mastroeni P

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom

    Salmonella enterica causes systemic diseases (typhoid and paratyphoid fever), nontyphoidal septicemia (NTS), and gastroenteritis in humans and other animals worldwide. An important but underrecognized emerging infectious disease problem in sub-Saharan Africa is NTS in children and immunocompromised adults. A current goal is to identify Salmonella mutants that are not pathogenic in the absence of key components of the immune system such as might be found in immunocompromised hosts. Such attenuated strains have the potential to be used as live vaccines. We have used transposon-directed insertion site sequencing (TraDIS) to screen mutants of Salmonella enterica serovar Typhimurium for their ability to infect and grow in the tissues of wild-type and immunodeficient mice. This was to identify bacterial genes that might be deleted for the development of live attenuated vaccines that would be safer to use in situations and/or geographical areas where immunodeficiencies are prevalent. The relative fitness of each of 9,356 transposon mutants, representing mutations in 3,139 different genes, was determined in gp91(-/-) phox mice. Mutations in certain genes led to reduced fitness in both wild-type and mutant mice. To validate these results, these genes were mutated by allelic replacement, and resultant mutants were retested for fitness in the mice. A defined deletion mutant of cysE was attenuated in C57BL/6 wild-type mice and immunodeficient gp91(-/-) phox mice and was effective as a live vaccine in wild-type mice.

    Funded by: Biotechnology and Biological Sciences Research Council: APG19115; Medical Research Council: G1100102; Wellcome Trust: WT098051

    Infection and immunity 2016;84;4;989-97

  • Genetic invalidation of Lp-PLA2 as a therapeutic target: Large-scale study of five functional Lp-PLA2-lowering alleles.

    Gregson JM, Freitag DF, Surendran P, Stitziel NO, Chowdhury R, Burgess S, Kaptoge S, Gao P, Staley JR, Willeit P, Nielsen SF, Caslake M, Trompet S, Polfus LM, Kuulasmaa K, Kontto J, Perola M, Blankenberg S, Veronesi G, Gianfagna F, Männistö S, Kimura A, Lin H, Reilly DF, Gorski M, Mijatovic V, CKDGen consortium, Munroe PB, Ehret GB, International Consortium for Blood Pressure, Thompson A, Uria-Nickelsen M, Malarstig A, Dehghan A, CHARGE inflammation working group, Vogt TF, Sasaoka T, Takeuchi F, Kato N, Yamada Y, Kee F, Müller-Nurasyid M, Ferrières J, Arveiler D, Amouyel P, Salomaa V, Boerwinkle E, Thompson SG, Ford I, Wouter Jukema J, Sattar N, Packard CJ, Shafi Majumder AA, Alam DS, Deloukas P, Schunkert H, Samani NJ, Kathiresan S, MICAD Exome consortium, Nordestgaard BG, Saleheen D, Howson JM, Di Angelantonio E, Butterworth AS, Danesh J and EPIC-CVD consortium and the CHD Exome+ consortium

    MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, UK.

    Aims: Darapladib, a potent inhibitor of lipoprotein-associated phospholipase A2 (Lp-PLA2), has not reduced risk of cardiovascular disease outcomes in recent randomized trials. We aimed to test whether Lp-PLA2 enzyme activity is causally relevant to coronary heart disease.

    Methods: In 72,657 patients with coronary heart disease and 110,218 controls in 23 epidemiological studies, we genotyped five functional variants: four rare loss-of-function mutations (c.109+2T > C (rs142974898), Arg82His (rs144983904), Val279Phe (rs76863441), Gln287Ter (rs140020965)) and one common modest-impact variant (Val379Ala (rs1051931)) in PLA2G7, the gene encoding Lp-PLA2. We supplemented de-novo genotyping with information on a further 45,823 coronary heart disease patients and 88,680 controls in publicly available databases and other previous studies. We conducted a systematic review of randomized trials to compare effects of darapladib treatment on soluble Lp-PLA2 activity, conventional cardiovascular risk factors, and coronary heart disease risk with corresponding effects of Lp-PLA2-lowering alleles.

    Results: Lp-PLA2 activity was decreased by 64% (p = 2.4 × 10(-25)) with carriage of any of the four loss-of-function variants, by 45% (p < 10(-300)) for every allele inherited at Val279Phe, and by 2.7% (p = 1.9 × 10(-12)) for every allele inherited at Val379Ala. Darapladib 160 mg once-daily reduced Lp-PLA2 activity by 65% (p < 10(-300)). Causal risk ratios for coronary heart disease per 65% lower Lp-PLA2 activity were: 0.95 (0.88-1.03) with Val279Phe; 0.92 (0.74-1.16) with carriage of any loss-of-function variant; 1.01 (0.68-1.51) with Val379Ala; and 0.95 (0.89-1.02) with darapladib treatment.

    Conclusions: In a large-scale human genetic study, none of a series of Lp-PLA2-lowering alleles was related to coronary heart disease risk, suggesting that Lp-PLA2 is unlikely to be a causal risk factor.

    European journal of preventive cardiology 2016

  • Rapid parallel acquisition of somatic mutations after NPM1 in acute myeloid leukaemia evolution.

    Grove CS, Bolli N, Manes N, Varela I, Van't Veer M, Bench A, Eldaly H, Wedge D, Van Loo P and Vassiliou GS

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    British journal of haematology 2016

  • Role of Plasmodium vivax Duffy-binding protein 1 in invasion of Duffy-null Africans.

    Gunalan K, Lo E, Hostetler JB, Yewhalaw D, Mu J, Neafsey DE, Yan G and Miller LH

    Laboratory of Malaria and Vector Research, National Institutes of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852.

    The ability of the malaria parasite Plasmodium vivax to invade erythrocytes is dependent on the expression of the Duffy blood group antigen on erythrocytes. Consequently, Africans who are null for the Duffy antigen are not susceptible to P. vivax infections. Recently, P. vivax infections in Duffy-null Africans have been documented, raising the possibility that P. vivax, a virulent pathogen in other parts of the world, may expand malarial disease in Africa. P. vivax binds the Duffy blood group antigen through its Duffy-binding protein 1 (DBP1). To determine if mutations in DBP1 resulted in the ability of P. vivax to bind Duffy-null erythrocytes, we analyzed P. vivax parasites obtained from two Duffy-null individuals living in Ethiopia where Duffy-null and -positive Africans live side-by-side. We determined that, although the DBP1s from these parasites contained unique sequences, they failed to bind Duffy-null erythrocytes, indicating that mutations in DBP1 did not account for the ability of P. vivax to infect Duffy-null Africans. However, an unusual DNA expansion of DBP1 (three and eight copies) in the two Duffy-null P. vivax infections suggests that an expansion of DBP1 may have been selected to allow low-affinity binding to another receptor on Duffy-null erythrocytes. Indeed, we show that Salvador (Sal) I P. vivax infects Squirrel monkeys independently of DBP1 binding to Squirrel monkey erythrocytes. We conclude that P. vivax Sal I and perhaps P. vivax in Duffy-null patients may have adapted to use new ligand-receptor pairs for invasion.

    Proceedings of the National Academy of Sciences of the United States of America 2016

  • Naive Pluripotent Stem Cells Derived Directly from Isolated Cells of the Human Inner Cell Mass.

    Guo G, von Meyenn F, Santos F, Chen Y, Reik W, Bertone P, Smith A and Nichols J

    Wellcome Trust - Medical Research Council Stem Cell Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK.

    Conventional generation of stem cells from human blastocysts produces a developmentally advanced, or primed, stage of pluripotency. In vitro resetting to a more naive phenotype has been reported. However, whether the reset culture conditions of selective kinase inhibition can enable capture of naive epiblast cells directly from the embryo has not been determined. Here, we show that in these specific conditions individual inner cell mass cells grow into colonies that may then be expanded over multiple passages while retaining a diploid karyotype and naive properties. The cells express hallmark naive pluripotency factors and additionally display features of mitochondrial respiration, global gene expression, and genome-wide hypomethylation distinct from primed cells. They transition through primed pluripotency into somatic lineage differentiation. Collectively these attributes suggest classification as human naive embryonic stem cells. Human counterparts of canonical mouse embryonic stem cells would argue for conservation in the phased progression of pluripotency in mammals.

    Stem cell reports 2016;6;4;437-46

  • Functional analysis of an unusual type IV pilus in the Gram-positive Streptococcus sanguinis.

    Gurung I, Spielman I, Davies MR, Lala R, Gaustad P, Biais N and Pelicic V

    MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK.

    Type IV pili (Tfp), which have been studied extensively in a few Gram-negative species, are the paradigm of a group of widespread and functionally versatile nano-machines. Here, we performed the most detailed molecular characterisation of Tfp in a Gram-positive bacterium. We demonstrate that the naturally competent Streptococcus sanguinis produces retractable Tfp, which like their Gram-negative counterparts can generate hundreds of piconewton of tensile force and promote intense surface-associated motility. Tfp power 'train-like' directional motion parallel to the long axis of chains of cells, leading to spreading zones around bacteria grown on plates. However, S. sanguinis Tfp are not involved in DNA uptake, which is mediated by a related but distinct nano-machine, and are unusual because they are composed of two pilins in comparable amounts, rather than one as normally seen. Whole genome sequencing identified a locus encoding all the genes involved in Tfp biology in S. sanguinis. A systematic mutational analysis revealed that Tfp biogenesis in S. sanguinis relies on a more basic machinery (only 10 components) than in Gram-negative species and that a small subset of four proteins dispensable for pilus biogenesis are essential for motility. Intriguingly, one of the piliated mutants that does not exhibit spreading retains microscopic motility but moves sideways, which suggests that the corresponding protein controls motion directionality. Besides establishing S. sanguinis as a useful new model for studying Tfp biology, these findings have important implications for our understanding of these widespread filamentous nano-machines.

    Molecular microbiology 2016;99;2;380-92

  • Functional implications of disease-specific variants in loci jointly associated with coeliac disease and rheumatoid arthritis.

    Gutierrez-Achury J, Zorro MM, Ricaño-Ponce I, Zhernakova DV, Coeliac Disease Immunochip Consortium, RACI Consortium, Diogo D, Raychaudhuri S, Franke L, Trynka G, Wijmenga C and Zhernakova A

    Department of Genetics, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands.

    Hundreds of genomic loci have been associated with a significant number of immune-mediated diseases, and a large proportion of these associated loci are shared among traits. Both the molecular mechanisms by which these loci confer disease susceptibility and the extent to which shared loci are implicated in a common pathogenesis are unknown. We therefore sought to dissect the functional components at loci shared between two autoimmune diseases: coeliac disease (CeD) and rheumatoid arthritis (RA). We used a cohort of 12 381 CeD cases and 7827 controls, and another cohort of 13 819 RA cases and 12 897 controls, all genotyped with the Immunochip platform. In the joint analysis, we replicated 19 previously identified loci shared by CeD and RA and discovered five new non-HLA loci shared by CeD and RA. Our fine-mapping results indicate that in nine of 24 shared loci the associated variants are distinct in the two diseases. Using cell-type-specific histone markers, we observed that loci which pointed to the same variants in both diseases were enriched for marks of promoters active in CD14+ and CD34+ immune cells (P < 0.001), while loci pointing to distinct variants in one of the two diseases showed enrichment for marks of more specialized cell types, like CD4+ regulatory T cells in CeD (P < 0.0001) compared with Th17 and CD15+ in RA (P = 0.0029).

    Funded by: Wellcome Trust: WT098051

    Human molecular genetics 2016;25;1;180-90

  • Ancient DNA and the rewriting of human history: be sparing with Occam's razor.

    Haber M, Mezzavilla M, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Ancient DNA research is revealing a human history far more complex than that inferred from parsimonious models based on modern DNA. Here, we review some of the key events in the peopling of the world in the light of the findings of work on ancient DNA.

    Funded by: Wellcome Trust: 098051

    Genome biology 2016;17;1

  • Genetic evidence for an origin of the Armenians from Bronze Age mixing of multiple populations.

    Haber M, Mezzavilla M, Xue Y, Comas D, Gasparini P, Zalloua P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    The Armenians are a culturally isolated population who historically inhabited a region in the Near East bounded by the Mediterranean and Black seas and the Caucasus, but remain under-represented in genetic studies and have a complex history including a major geographic displacement during World War I. Here, we analyse genome-wide variation in 173 Armenians and compare them with 78 other worldwide populations. We find that Armenians form a distinctive cluster linking the Near East, Europe, and the Caucasus. We show that Armenian diversity can be explained by several mixtures of Eurasian populations that occurred between ~3000 and ~2000 BCE, a period characterized by major population migrations after the domestication of the horse, appearance of chariots, and the rise of advanced civilizations in the Near East. However, genetic signals of population mixture cease after ~1200 BCE when Bronze Age civilizations in the Eastern Mediterranean world suddenly and violently collapsed. Armenians have since remained isolated and genetic structure within the population developed ~500 years ago when Armenia was divided between the Ottomans and the Safavid Empire in Iran. Finally, we show that Armenians have higher genetic affinity to Neolithic Europeans than other present-day Near Easterners, and that 29% of Armenian ancestry may originate from an ancestral population that is best represented by Neolithic Europeans.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2016;24;6;931-6

  • Wide distribution and altitude correlation of an archaic high-altitude-adaptive EPAS1 haplotype in the Himalayas.

    Hackinger S, Kraaijenbrink T, Xue Y, Mezzavilla M, Asan, van Driem G, Jobling MA, de Knijff P, Tyler-Smith C and Ayub Q

    The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    High-altitude adaptation in Tibetans is influenced by introgression of a 32.7-kb haplotype from the Denisovans, an extinct branch of archaic humans, lying within the endothelial PAS domain protein 1 (EPAS1), and has also been reported in Sherpa. We genotyped 19 variants in this genomic region in 1507 Eurasian individuals, including 1188 from Bhutan and Nepal residing at altitudes between 86 and 4550 m above sea level. Derived alleles for five SNPs characterizing the core Denisovan haplotype (AGGAA) were present at high frequency not only in Tibetans and Sherpa, but also among many populations from the Himalayas, showing a significant correlation with altitude (Spearman's correlation coefficient = 0.75, p value 3.9 × 10(-11)). Seven East- and South-Asian 1000 Genomes Project individuals shared the Denisovan haplotype extending beyond the 32-kb region, enabling us to refine the haplotype structure and identify a candidate regulatory variant (rs370299814) that might be interacting in an additive manner with the derived G allele of rs150877473, the variant previously associated with high-altitude adaptation in Tibetans. Denisovan-derived alleles were also observed at frequencies of 3-14% in the 1000 Genomes Project African samples. The closest African haplotype is, however, separated from the Asian high-altitude haplotype by 22 mutations whereas only three mutations, including rs150877473, separate the Asians from the Denisovan, consistent with distant shared ancestry for African and Asian haplotypes and Denisovan adaptive introgression.

    Funded by: Wellcome Trust: 087576, 098051

    Human genetics 2016;135;4;393-402

  • A bit of a mouthful.

    Hadfield J and David S

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    This month's Genome Watch explores recent advances in the identification of species-level and strain-level diversity in microbiome studies, and highlights how these have provided insights into the tropism and persistence of Neisseria spp. in the human oral cavity.

    Nature reviews. Microbiology 2016;14;9;548

  • Great ape Y Chromosome and mitochondrial DNA phylogenies reflect subspecies structure and patterns of mating and dispersal.

    Hallast P, Maisano Delser P, Batini C, Zadik D, Rocchi M, Schempp W, Tyler-Smith C and Jobling MA

    Department of Genetics, University of Leicester, Leicester LE1 7RH, United Kingdom; Institute of Molecular and Cell Biology, University of Tartu, Tartu 51010, Estonia.

    The distribution of genetic diversity in great ape species is likely to have been affected by patterns of dispersal and mating. This has previously been investigated by sequencing autosomal and mitochondrial DNA (mtDNA), but large-scale sequence analysis of the male-specific region of the Y Chromosome (MSY) has not yet been undertaken. Here, we use the human MSY reference sequence as a basis for sequence capture and read mapping in 19 great ape males, combining the data with sequences extracted from the published whole genomes of 24 additional males to yield a total sample of 19 chimpanzees, four bonobos, 14 gorillas, and six orangutans, in which interpretable MSY sequence ranges from 2.61 to 3.80 Mb. This analysis reveals thousands of novel MSY variants and defines unbiased phylogenies. We compare these with mtDNA-based trees in the same individuals, estimating time-to-most-recent common ancestor (TMRCA) for key nodes in both cases. The two loci show high topological concordance and are consistent with accepted (sub)species definitions, but time depths differ enormously between loci and (sub)species, likely reflecting different dispersal and mating patterns. Gorillas and chimpanzees/bonobos present generally low and high MSY diversity, respectively, reflecting polygyny versus multimale-multifemale mating. However, particularly marked differences exist among chimpanzee subspecies: The western chimpanzee MSY phylogeny has a TMRCA of only 13.2 (10.8-15.8) thousand years, but that for central chimpanzees exceeds 1 million years. Cross-species comparison within a single MSY phylogeny emphasizes the low human diversity, and reveals species-specific branch length variation that may reflect differences in long-term generation times.

    Genome research 2016;26;4;427-39

  • Powerful decomposition of complex traits in a diploid model.

    Hallin J, Märtens K, Young AI, Zackrisson M, Salinas F, Parts L, Warringer J and Liti G

    Institute for Research on Cancer and Aging, Nice (IRCAN), CNRS UMR7284, INSERM U1081, University of Nice Sophia Antipolis, 06107 Nice, France.

    Explaining trait differences between individuals is a core and challenging aim of life sciences. Here, we introduce a powerful framework for complete decomposition of trait variation into its underlying genetic causes in diploid model organisms. We sequence and systematically pair the recombinant gametes of two intercrossed natural genomes into an array of diploid hybrids with fully assembled and phased genomes, termed Phased Outbred Lines (POLs). We demonstrate the capacity of this approach by partitioning fitness traits of 6,642 Saccharomyces cerevisiae POLs across many environments, achieving near complete trait heritability and precisely estimating additive (73%), dominance (10%), second (7%) and third (1.7%) order epistasis components. We map quantitative trait loci (QTLs) and find nonadditive QTLs to outnumber (3:1) additive loci, dominant contributions to heterosis to outnumber overdominant, and extensive pleiotropy. The POL framework offers the most complete decomposition of diploid traits to date and can be adapted to most model organisms.

    Nature communications 2016;7;13311

  • Exploitation of the Apoptosis-Primed State of MYCN-Amplified Neuroblastoma to Develop a Potent and Specific Targeted Therapy Combination.

    Ham J, Costa C, Sano R, Lochmann TL, Sennott EM, Patel NU, Dastur A, Gomez-Caraballo M, Krytska K, Hata AN, Floros KV, Hughes MT, Jakubik CT, Heisey DA, Ferrell JT, Bristol ML, March RJ, Yates C, Hicks MA, Nakajima W, Gowda M, Windle BE, Dozmorov MG, Garnett MJ, McDermott U, Harada H, Taylor SM, Morgan IM, Benes CH, Engelman JA, Mossé YP and Faber AC

    Philips Institute for Oral Health Research, VCU School of Dentistry and Massey Cancer Center, Virginia Commonwealth University, Perkinson Building, Richmond, VA 23298, USA.

    Fewer than half of children with high-risk neuroblastoma survive. Many of these tumors harbor high-level amplification of MYCN, which correlates with poor disease outcome. Using data from our large drug screen we predicted, and subsequently demonstrated, that MYCN-amplified neuroblastomas are sensitive to the BCL-2 inhibitor ABT-199. This sensitivity occurs in part through low anti-apoptotic BCL-xL expression, high pro-apoptotic NOXA expression, and paradoxical, MYCN-driven upregulation of NOXA. Screening for enhancers of ABT-199 sensitivity in MYCN-amplified neuroblastomas, we demonstrate that the Aurora Kinase A inhibitor MLN8237 combines with ABT-199 to induce widespread apoptosis. In diverse models of MYCN-amplified neuroblastoma, including a patient-derived xenograft model, this combination uniformly induced tumor shrinkage, and in multiple instances led to complete tumor regression.

    Cancer cell 2016;29;2;159-72

  • Association of breast cancer risk in BRCA1 and BRCA2 mutation carriers with genetic variants showing differential allelic expression: identification of a modifier of breast cancer risk at locus 11q22.3.

    Hamdi Y, Soucy P, Kuchenbaeker KB, Pastinen T, Droit A, Lemaçon A, Adlard J, Aittomäki K, Andrulis IL, Arason A, Arnold N, Arun BK, Azzollini J, Bane A, Barjhoux L, Barrowdale D, Benitez J, Berthet P, Blok MJ, Bobolis K, Bonadona V, Bonanni B, Bradbury AR, Brewer C, Buecher B, Buys SS, Caligo MA, Chiquette J, Chung WK, Claes KB, Daly MB, Damiola F, Davidson R, De la Hoya M, De Leeneer K, Diez O, Ding YC, Dolcetti R, Domchek SM, Dorfling CM, Eccles D, Eeles R, Einbeigi Z, Ejlertsen B, EMBRACE, Engel C, Gareth Evans D, Feliubadalo L, Foretova L, Fostira F, Foulkes WD, Fountzilas G, Friedman E, Frost D, Ganschow P, Ganz PA, Garber J, Gayther SA, GEMO Study Collaborators, Gerdes AM, Glendon G, Godwin AK, Goldgar DE, Greene MH, Gronwald J, Hahnen E, Hamann U, Hansen TV, Hart S, Hays JL, HEBON, Hogervorst FB, Hulick PJ, Imyanitov EN, Isaacs C, Izatt L, Jakubowska A, James P, Janavicius R, Jensen UB, John EM, Joseph V, Just W, Kaczmarek K, Karlan BY, KConFab Investigators, Kets CM, Kirk J, Kriege M, Laitman Y, Laurent M, Lazaro C, Leslie G, Lester J, Lesueur F, Liljegren A, Loman N, Loud JT, Manoukian S, Mariani M, Mazoyer S, McGuffog L, Meijers-Heijboer HE, Meindl A, Miller A, Montagna M, Mulligan AM, Nathanson KL, Neuhausen SL, Nevanlinna H, Nussbaum RL, Olah E, Olopade OI, Ong KR, Oosterwijk JC, Osorio A, Papi L, Park SK, Pedersen IS, Peissel B, Segura PP, Peterlongo P, Phelan CM, Radice P, Rantala J, Rappaport-Fuerhauser C, Rennert G, Richardson A, Robson M, Rodriguez GC, Rookus MA, Schmutzler RK, Sevenet N, Shah PD, Singer CF, Slavin TP, Snape K, Sokolowska J, Sønderstrup IM, Southey M, Spurdle AB, Stadler Z, Stoppa-Lyonnet D, Sukiennicki G, Sutter C, Tan Y, Tea MK, Teixeira MR, Teulé A, Teo SH, Terry MB, Thomassen M, Tihomirova L, Tischkowitz M, Tognazzo S, Toland AE, Tung N, van den Ouweland AM, van der Luijt RB, van Engelen K, van Rensburg EJ, Varon-Mateeva R, Wappenschmidt B, Wijnen JT, Rebbeck T, Chenevix-Trench G, Offit K, Couch FJ, Nord S, Easton DF, Antoniou AC and Simard J

    Genomics Center, Centre Hospitalier Universitaire de Québec Research Center and Laval University, 2705 Laurier Boulevard, Quebec, QC, G1V 4G2, Canada.

    Purpose: Cis-acting regulatory SNPs resulting in differential allelic expression (DAE) may, in part, explain the underlying phenotypic variation associated with many complex diseases. To investigate whether common variants associated with DAE were involved in breast cancer susceptibility among BRCA1 and BRCA2 mutation carriers, a list of 175 genes was developed based on their involvement in cancer-related pathways.

    Methods: Using data from a genome-wide map of SNPs associated with allelic expression, we assessed the association of ~320 SNPs located in the vicinity of these genes with breast and ovarian cancer risks in 15,252 BRCA1 and 8211 BRCA2 mutation carriers ascertained from 54 studies participating in the Consortium of Investigators of Modifiers of BRCA1/2.

    Results: We identified a region on 11q22.3 that is significantly associated with breast cancer risk in BRCA1 mutation carriers (most significant SNP rs228595 p = 7 × 10(-6)). This association was absent in BRCA2 carriers (p = 0.57). The 11q22.3 region notably encompasses genes such as ACAT1, NPAT, and ATM. Expression quantitative trait loci associations were observed in both normal breast and tumors across this region, namely for ACAT1, ATM, and other genes. In silico analysis revealed some overlap between top risk-associated SNPs and relevant biological features in mammary cell data, which suggests potential functional significance.

    Conclusion: We identified 11q22.3 as a new modifier locus in BRCA1 carriers. Replication in larger studies using estrogen receptor (ER)-negative or triple-negative (i.e., ER-, progesterone receptor-, and HER2-negative) cases could therefore be helpful to confirm the association of this locus with breast cancer risk.

    Breast cancer research and treatment 2016

  • A small Acinetobacter plasmid carrying the tet39 tetracycline resistance determinant.

    Hamidian M, Holt KE, Pickard D and Hall RM

    School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia

    The Journal of antimicrobial chemotherapy 2016;71;1;269-71

  • Rubinstein-Taybi syndrome type 2: report of nine new cases that extend the phenotypic and genotypic spectrum.

    Hamilton MJ, Newbury-Ecob R, Holder-Espinasse M, Yau S, Lillis S, Hurst JA, Clement E, Reardon W, Joss S, Hobson E, Blyth M, Al-Shehhi M, Lynch SA, Suri M and DDD Study

    (a) Department of Clinical Genetics, Nottingham City Hospital, Nottingham; (b) Department of Clinical Genetics, University Hospitals Bristol, Bristol; (c) Clinical Genetics Service; (d) Viapath Analytics LLP, Guy's and St Thomas' Hospital; (e) Clinical Genetics Unit, Great Ormond Street Hospital for Children, London; (f) West of Scotland Clinical Genetics Service, Queen Elizabeth University Hospital, Glasgow; (g) Yorkshire Regional Genetics Service, Chapel Allerton Hospital, Leeds; (h) Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; (i) Department of Clinical Genetics, Our Lady's Hospital for Children; (j) ACoRD, University College Dublin, Dublin, Ireland.

    Rubinstein-Taybi syndrome (RTS) is an autosomal dominant neurodevelopmental disorder characterized by growth deficiency, broad thumbs and great toes, intellectual disability and characteristic craniofacial appearance. Mutations in CREBBP account for around 55% of cases, with a further 8% attributed to the paralogous gene EP300. Comparatively few reports exist describing the phenotype of Rubinstein-Taybi because of EP300 mutations. Clinical and genetic data were obtained from nine patients from the UK and Ireland with pathogenic EP300 mutations, identified either by targeted testing or by exome sequencing. All patients had mild or moderate intellectual impairment. Behavioural or social difficulties were noted in eight patients, including three with autistic spectrum disorders. Typical dysmorphic features of Rubinstein-Taybi were only variably present. Additional observations include maternal pre-eclampsia (2/9), syndactyly (3/9), feeding or swallowing issues (3/9), delayed bone age (2/9) and scoliosis (2/9). Six patients had truncating mutations in EP300, with pathogenic missense mutations identified in the remaining three. The findings support previous observations that microcephaly, maternal pre-eclampsia, mild growth restriction and a mild to moderate intellectual disability are key pointers to the diagnosis of EP300-related RTS. Variability in the presence of typical facial features of Rubinstein-Taybi further highlights clinical heterogeneity, particularly among patients identified by exome sequencing. Features that overlap with Floating-Harbor syndrome, including craniofacial dysmorphism and delayed osseous maturation, were observed in three patients. Previous reports have only described mutations predicted to cause haploinsufficiency of EP300, whereas this cohort includes the first described pathogenic missense mutations in EP300.

    Clinical dysmorphology 2016;25;4;135-45

  • Extreme mutation bias and high AT content in Plasmodium falciparum.

    Hamilton WL, Claessens A, Otto TD, Kekre M, Fairhurst RM, Rayner JC and Kwiatkowski D

    Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK

    For reasons that remain unknown, the Plasmodium falciparum genome has an exceptionally high AT content compared to other Plasmodium species and eukaryotes in general - nearly 80% in coding regions and approaching 90% in non-coding regions. Here, we examine how this phenomenon relates to genome-wide patterns of de novo mutation. Mutation accumulation experiments were performed by sequential cloning of six P. falciparum isolates growing in human erythrocytes in vitro for 4 years, with 279 clones sampled for whole genome sequencing at different time points. Genome sequence analysis of these samples revealed a significant excess of G:C to A:T transitions compared to other types of nucleotide substitution, which would naturally cause AT content to equilibrate close to the level seen across the P. falciparum reference genome (80.6% AT). These data also uncover an extremely high rate of small indel mutation relative to other species, primarily associated with repetitive AT-rich sequences, in addition to larger-scale structural rearrangements focused in antigen-coding var genes. In conclusion, high AT content in P. falciparum is driven by a systematic mutational bias and ultimately leads to an unusual level of microstructural plasticity, raising the question of whether this contributes to adaptive evolution.

    Nucleic acids research 2016
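
    A minimal sketch of the equilibrium argument in the abstract above (Hamilton et al.), assuming a simple two-state model in which each G:C site mutates to A:T at rate u and each A:T site mutates back at rate v. Under that assumption the AT fraction f settles where u(1 - f) = v f, that is, f = u / (u + v). The function names and the rate values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def equilibrium_at_fraction(gc_to_at_rate: float, at_to_gc_rate: float) -> float:
    """Equilibrium AT fraction under an assumed two-state mutation model.

    At equilibrium the flux GC->AT equals the flux AT->GC:
        u * (1 - f) = v * f   =>   f = u / (u + v)
    """
    if gc_to_at_rate < 0 or at_to_gc_rate < 0:
        raise ValueError("mutation rates must be non-negative")
    total = gc_to_at_rate + at_to_gc_rate
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return gc_to_at_rate / total


def simulate_at_fraction(start: float, gc_to_at_rate: float,
                         at_to_gc_rate: float, generations: int) -> float:
    """Iterate the per-generation update and return the final AT fraction."""
    fraction = start
    for _ in range(generations):
        gained = gc_to_at_rate * (1.0 - fraction)  # G:C sites becoming A:T
        lost = at_to_gc_rate * fraction            # A:T sites becoming G:C
        fraction = fraction + gained - lost
    return fraction


if __name__ == "__main__":
    # Hypothetical rates: GC->AT four times as frequent as AT->GC.
    u, v = 4e-3, 1e-3
    print(equilibrium_at_fraction(u, v))
    print(simulate_at_fraction(0.5, u, v, 10_000))
```

    In this toy model only the ratio of the two rates sets the equilibrium; their absolute size sets how quickly a genome approaches it.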

  • Public health interventions to protect against falsified medicines: a systematic review of international, national and local policies.

    Hamilton WL, Doyle C, Halliwell-Ewen M and Lambert G

    University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0SP, UK and Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

    Background: Falsified medicines are deliberately fraudulent drugs that pose a direct risk to patient health and undermine healthcare systems, causing global morbidity and mortality.

    Objective: To produce an overview of anti-falsifying public health interventions deployed at international, national and local scales in low and middle income countries (LMIC).

    Data sources: We conducted a systematic search of the PubMed, Web of Science, Embase and Cochrane Central Register of Controlled Trials databases for healthcare or pharmaceutical policies relevant to reducing the burden of falsified medicines in LMIC.

    Results: Our initial search identified 660 unique studies, of which 203 met title/abstract inclusion criteria and were categorised according to their primary focus: international; national; local pharmacy; internet pharmacy; drug analysis tools. Eighty-four were included in the qualitative synthesis, along with 108 articles and website links retrieved through secondary searches.

    Discussion: On the international stage, we discuss the need for accessible pharmacovigilance (PV) global reporting systems, international leadership and funding incorporating multiple stakeholders (healthcare, pharmaceutical, law enforcement) and multilateral trade agreements that emphasise public health. On the national level, we explore the importance of establishing adequate medicine regulatory authorities and PV capacity, with drug screening along the supply chain. This requires interdepartmental coordination, drug certification and criminal justice legislation and enforcement that recognise the severity of medicine falsification. Local healthcare professionals can receive training on medicine quality assessments, drug registration and pharmacological testing equipment. Finally, we discuss novel technologies for drug analysis which allow rapid identification of fake medicines in low-resource settings. Innovative point-of-purchase systems like mobile phone verification allow consumers to check the authenticity of their medicines.

    Conclusions: Combining anti-falsifying strategies targeting different levels of the pharmaceutical supply chain provides multiple barriers of protection from falsified medicines. This requires the political will to drive policy implementation; otherwise, people around the world remain at risk.

    Health policy and planning 2016

  • Divergent evolution of vitamin B9 binding underlies Juno-mediated adhesion of mammalian gametes.

    Han L, Nishimura K, Sadat Al Hosseini H, Bianchi E, Wright GJ and Jovine L

    Department of Biosciences and Nutrition & Center for Innovative Medicine, Karolinska Institutet, Huddinge, SE-141 83, Sweden.

    The interaction between egg and sperm is the first necessary step of fertilization in all sexually reproducing organisms. A decade-long search for a protein pair mediating this event in mammals culminated in the identification of the glycosylphosphatidylinositol (GPI)-anchored glycoprotein Juno as the egg plasma membrane receptor of sperm Izumo1 [1,2]. The Juno-Izumo1 interaction was shown to be essential for fertilization since mice lacking either gene exhibit sex-specific sterility, making these proteins promising non-hormonal contraceptive targets [1,3]. No structural information is available on how gamete membranes interact at fertilization, and it is unclear how Juno - which was previously named folate receptor (FR) 4, based on sequence similarity considerations - triggers membrane adhesion by binding Izumo1. Here, we report the crystal structure of Juno and find that the overall fold is similar to that of FRα and FRβ but with significant flexibility within the area that corresponds to the rigid ligand-binding site of these bona fide folate receptors. This explains both the inability of Juno to bind vitamin B9/folic acid [1], and why mutations within the flexible region can either abolish or change the species specificity of this interaction. Furthermore, structural similarity between Juno and the cholesterol-binding Niemann-Pick disease type C1 protein (NPC1) suggests how the modified binding surface of Juno may recognize the helical structure of the amino-terminal domain of Izumo1. As Juno appears to be a mammalian innovation, our study indicates that a key evolutionary event in mammalian reproduction originated from the neofunctionalization of the vitamin B9-binding pocket of an ancestral folate receptor molecule.

    Current biology : CB 2016;26;3;R100-1

  • Fast, Accurate and Automatic Ancient Nucleosome and Methylation Maps with epiPALEOMIX.

    Hanghøj K, Seguin-Orlando A, Schubert M, Madsen T, Pedersen JS, Willerslev E and Orlando L

    Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark; Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, Université de Toulouse, University Paul Sabatier, Toulouse, France.

    The first epigenomes from archaic hominins (AH) and ancient anatomically modern humans (AMH) have recently been characterized, based, however, on a limited number of samples. The extent to which ancient genome-wide epigenetic landscapes can be reconstructed thus remains contentious. Here, we present epiPALEOMIX, an open-source and user-friendly pipeline that exploits post-mortem DNA degradation patterns to reconstruct ancient methylomes and nucleosome maps from shotgun and/or capture-enrichment data. Applying epiPALEOMIX to the sequence data underlying 35 ancient genomes including AMH, AH, equids and aurochs, we investigate the temporal, geographical and preservation range of ancient epigenetic signatures. We first assess the quality of inferred ancient epigenetic signatures within well-characterized genomic regions. We find that tissue-specific methylation signatures can be obtained across a wider range of DNA preparation types than previously thought, including when no particular experimental procedures have been used to remove deaminated cytosines prior to sequencing. We identify a large subset of samples for which DNA associated with nucleosomes is protected from post-mortem degradation, and nucleosome positioning patterns can be reconstructed. Finally, we describe parameters and conditions such as DNA damage levels and sequencing depth that limit the preservation of epigenetic signatures in ancient samples. When such conditions are met, we propose that epigenetic profiles of CTCF binding regions can be used to help data authentication. Our work, including epiPALEOMIX, opens the way for further investigations of ancient epigenomes through time, especially those aimed at tracking possible epigenetic changes during major evolutionary, environmental, socioeconomic, and cultural shifts.

    Molecular biology and evolution 2016

  • Germline TERT promoter mutations are rare in familial melanoma.

    Harland M, Petljak M, Robles-Espinoza CD, Ding Z, Gruis NA, van Doorn R, Pooley KA, Dunning AM, Aoude LG, Wadt KA, Gerdes AM, Brown KM, Hayward NK, Newton-Bishop JA, Adams DJ and Bishop DT

    Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, LS9 7TF, UK.

    Germline CDKN2A mutations occur in 40 % of 3-or-more case melanoma families while mutations of CDK4, BAP1, and genes involved in telomere function (ACD, TERF2IP, POT1), have also been implicated in melanomagenesis. Mutation of the promoter of the telomerase reverse transcriptase (TERT) gene (c.-57 T>G variant) has been reported in one family. We tested for the TERT promoter variant in 675 multicase families wild-type for the known high penetrance familial melanoma genes, 1863 UK population-based melanoma cases and 529 controls. Germline lymphocyte telomere length was estimated in carriers. The c.-57 T>G TERT promoter variant was identified in one 7-case family with multiple primaries and early age of onset (earliest, 15 years) but not among population cases or controls. One family member had multiple primary melanomas, basal cell carcinomas and a bladder tumour. The blood leukocyte telomere length of a carrier was similar to wild-type cases. We provide evidence confirming that a rare promoter variant of TERT (c.-57 T>G) is associated with high penetrance, early onset melanoma and potentially other cancers, and explains <1 % of UK melanoma multicase families. The identification of POT1 and TERT germline mutations highlights the importance of telomere integrity in melanoma biology.

    Funded by: Cancer Research UK: 13031, C588/A19167, C8197/A16565, C8216/A6129; Intramural NIH HHS; NCI NIH HHS: CA83115, R01 CA083115

    Familial cancer 2016;15;1;139-44

  • Transmission of methicillin-resistant Staphylococcus aureus in long-term care facilities and their related healthcare networks.

    Harrison EM, Ludden C, Brodrick HJ, Blane B, Brennan G, Morris D, Coll F, Reuter S, Brown NM, Holmes MA, O'Connell B, Parkhill J, Török ME, Cormican M and Peacock SJ

    Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Box 157, Hills Road, Cambridge, CB2 0QQ, UK.

    Background: Long-term care facilities (LTCF) are potential reservoirs for methicillin-resistant Staphylococcus aureus (MRSA), control of which may reduce MRSA transmission and infection elsewhere in the healthcare system. Whole-genome sequencing (WGS) has been used successfully to understand MRSA epidemiology and transmission in hospitals and has the potential to identify transmission between these and LTCF.

    Methods: Two prospective observational studies of MRSA carriage were conducted in LTCF in England and Ireland. MRSA isolates were whole-genome sequenced and analyzed using established methods. Genomic data were available for MRSA isolated in the local healthcare systems (isolates submitted by hospitals and general practitioners).

    Results: We sequenced a total of 181 MRSA isolates from the two study sites. The majority of MRSA were multilocus sequence type (ST)22. WGS identified one likely transmission event between residents in the English LTCF and three putative transmission events in the Irish LTCF. WGS also identified closely related isolates present in colonized Irish residents and their immediate environment. Based on phylogenetic reconstruction, closely related MRSA clades were identified between the LTCF and their healthcare referral network, together with putative MRSA acquisition by LTCF residents during hospital admission.

    Conclusions: These data confirm that MRSA is transmitted between residents of LTCF and is both acquired and transmitted to others in referral hospitals and beyond. Our data present compelling evidence for the importance of environmental contamination in MRSA transmission, reinforcing the importance of environmental cleaning. The use of WGS in this study highlights the need to consider infection control in hospitals and community healthcare facilities as a continuum.

    Genome medicine 2016;8;1;102

  • Genome-wide time-to-event analysis on smoking progression stages in a family-based study.

    He L, Pitkäniemi J, Heikkilä K, Chou YL, Madden PA, Korhonen T, Sarin AP, Ripatti S, Kaprio J and Loukola A

    Department of Public Health, University of Helsinki, Helsinki, Finland.

    Background: Various pivotal stages in smoking behavior can be identified, including initiation, conversion from experimenting to established use, development of tolerance, and cessation. Previous studies have shown high heritability for age of smoking initiation and cessation; however, time-to-event genome-wide association studies aiming to identify underpinning genes that accelerate or delay these transitions are missing to date.

    Methods: We investigated which single nucleotide polymorphisms (SNPs) across the whole genome contribute to the hazard ratio of transition between different stages of smoking behavior by performing time-to-event analyses within a large Finnish twin family cohort (N = 1962), and further conducted mediation analyses of plausible intermediate traits for significant SNPs.

    Results: Genome-wide significant signals were detected for three of the four transitions: (1) for smoking cessation on 10p14 (P = 4.47e-08 for rs72779075 flanked by RP11-575N15 and GATA3), (2) for tolerance on 11p13 (P = 1.29e-08 for rs11031684 in RP1-65P5.1), mediated by smoking quantity, and on 9q34.12 (P = 3.81e-08 for rs2304808 in FUBP3), independent of smoking quantity, and (3) for smoking initiation on 19q13.33 (P = 3.37e-08 for rs73050610 flanked by TRPM4 and SLC6A16) in analysis adjusted for first time sensations. Although our top SNPs did not replicate, another SNP in the TRPM4-SLC6A16 gene region showed statistically significant association after region-based multiple testing correction in an independent Australian twin family sample.

    Conclusion: Our results suggest that the functional effect of the TRPM4-SLC6A16 gene region deserves further investigation, and that complex neurotransmitter networks including dopamine and glutamate may play a critical role in smoking initiation. Moreover, comparison of these results implies that genetic contributions to the complex smoking behavioral phenotypes vary among the transitions.

    Brain and behavior 2016;e00462

  • Linear mixed model for heritability estimation that explicitly addresses environmental variation.

    Heckerman D, Gurdasani D, Kadie C, Pomilla C, Carstensen T, Martin H, Ekoru K, Nsubuga RN, Ssenyomo G, Kamali A, Kaleebu P, Widmer C and Sandhu MS

    Microsoft Research, Los Angeles, CA 90024;

    The linear mixed model (LMM) is now routinely used to estimate heritability. Unfortunately, as we demonstrate, LMM estimates of heritability can be inflated when using a standard model. To help reduce this inflation, we used a more general LMM with two random effects: one based on genomic variants and one based on easily measured spatial location as a proxy for environmental effects. We investigated this approach with simulated data and with data from a Uganda cohort of 4,778 individuals for 34 phenotypes including anthropometric indices, blood factors, glycemic control, blood pressure, lipid tests, and liver function tests. For the genomic random effect, we used identity-by-descent estimates from accurately phased genome-wide data. For the environmental random effect, we constructed a covariance matrix based on a Gaussian radial basis function. Across the simulated and Ugandan data, narrow-sense heritability estimates were lower using the more general model. Thus, our approach addresses, in part, the issue of "missing heritability" in the sense that much of the heritability previously thought to be missing was fictional. Software is available at

    Proceedings of the National Academy of Sciences of the United States of America 2016;113;27;7377-82
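
    The environmental random effect described in the abstract above (Heckerman et al.) can be illustrated with a short sketch. It builds a Gaussian radial basis function covariance matrix from spatial coordinates and combines it with a genomic similarity matrix into one phenotype covariance. All function names, the length-scale parameter, the example coordinates and the variance values are assumptions made for illustration, and the heritability ratio shown is a common convention rather than a definition confirmed by the source. This is not the authors' software.

```python
import math


def gaussian_rbf_kernel(coords, length_scale):
    """Covariance K[i][j] = exp(-||x_i - x_j||^2 / (2 * length_scale^2))."""
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    n = len(coords)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * length_scale ** 2))
    return kernel


def combine_covariance(k_genomic, k_env, var_g, var_e, var_noise):
    """Phenotype covariance for an LMM with two random effects plus noise:
    V = var_g * K_genomic + var_e * K_env + var_noise * I
    """
    n = len(k_genomic)
    return [
        [
            var_g * k_genomic[i][j]
            + var_e * k_env[i][j]
            + (var_noise if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def narrow_sense_heritability(var_g, var_e, var_noise):
    """Share of total variance attributed to the genomic component (assumed form)."""
    return var_g / (var_g + var_e + var_noise)


if __name__ == "__main__":
    # Hypothetical household locations (arbitrary units) and genomic similarity.
    locations = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    k_env = gaussian_rbf_kernel(locations, length_scale=5.0)
    k_gen = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    v = combine_covariance(k_gen, k_env, var_g=0.3, var_e=0.2, var_noise=0.5)
    print(v[0][1], narrow_sense_heritability(0.3, 0.2, 0.5))
```

    In a real analysis the variance components would be estimated from data (for example by restricted maximum likelihood) rather than fixed by hand as they are here.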

  • Conserved Features in the Structure, Mechanism, and Biogenesis of the Inverse Autotransporter Protein Family.

    Heinz E, Stubenrauch CJ, Grinter R, Croft NP, Purcell AW, Strugnell RA, Dougan G and Lithgow T

    Department of Microbiology, Infection & Immunity Program, Biomedicine Discovery Institute, Monash University, Clayton, Australia; Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The bacterial cell surface proteins intimin and invasin are virulence factors that share a common domain structure and bind selectively to host cell receptors in the course of bacterial pathogenesis. The β-barrel domains of intimin and invasin show significant sequence and structural similarities. Conversely, a variety of proteins with sometimes limited sequence similarity have also been annotated as "intimin-like" and "invasin" in genome datasets, while other recent work on apparently unrelated virulence-associated proteins ultimately revealed similarities to intimin and invasin. Here we characterize the sequence and structural relationships across this complex protein family. Surprisingly, intimins and invasins represent a very small minority of the sequence diversity in what has been previously the "intimin/invasin protein family". Analysis of the assembly pathway for expression of the classic intimin, EaeA, and a characteristic example of the most prevalent members of the group, FdeC, revealed a dependence on the translocation and assembly module as a common feature for both these proteins. While the majority of the sequences in the grouping are most similar to FdeC, a further and widespread group is two-partner secretion systems that use the β-barrel domain as the delivery device for secretion of a variety of virulence factors. This comprehensive analysis supports the adoption of the "inverse autotransporter protein family" as the most accurate nomenclature for the family and, in turn, has important consequences for our overall understanding of the Type V secretion systems of bacterial pathogens.

    Genome biology and evolution 2016;8;6;1690-705

  • Ensembl comparative genomics resources.

    Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M, Vilella AJ, Searle SM, Amode R, Brent S, Spooner W, Kulesha E, Yates A and Flicek P

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, Bill Lyons Informatics Centre, UCL Cancer Institute, University College London, London WC1E 6DD,

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL:

    Database : the journal of biological databases and curation 2016;2016

  • Evidence for three genetic loci involved in both anorexia nervosa risk and variation of body mass index.

    Hinney A, Kesselmeier M, Jall S, Volckmar AL, Föcker M, Antel J, GCAN, WTCCC3, Heid IM, Winkler TW, GIANT, Grant SF, EGG, Guo Y, Bergen AW, Kaye W, Berrettini W, Hakonarson H, Price Foundation Collaborative Group, Children’s Hospital of Philadelphia/Price Foundation, Herpertz-Dahlmann B, de Zwaan M, Herzog W, Ehrlich S, Zipfel S, Egberts KM, Adan R, Brandys M, van Elburg A, Boraska Perica V, Franklin CS, Tschöp MH, Zeggini E, Bulik CM, Collier D, Scherag A, Müller TD and Hebebrand J

    Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany.

    The maintenance of normal body weight is disrupted in patients with anorexia nervosa (AN) for prolonged periods of time. Prior to the onset of AN, premorbid body mass index (BMI) spans the entire range from underweight to obese. After recovery, patients have reduced rates of overweight and obesity. As such, loci involved in body weight regulation may also be relevant for AN and vice versa. Our primary analysis comprised a cross-trait analysis of the 1000 single-nucleotide polymorphisms (SNPs) with the lowest P-values in a genome-wide association meta-analysis (GWAMA) of AN (GCAN) for evidence of association in the largest published GWAMA for BMI (GIANT). Subsequently we performed sex-stratified analyses for these 1000 SNPs. Functional ex vivo studies on four genes ensued. Lastly, a look-up of GWAMA-derived BMI-related loci was performed in the AN GWAMA. We detected significant associations (P-values <5 × 10^-5, Bonferroni-corrected P<0.05) for nine SNP alleles at three independent loci. Interestingly, all AN susceptibility alleles were consistently associated with increased BMI. None of the genes (chr. 10: CTBP2, chr. 19: CCNE1, chr. 2: CARF and NBEAL1; the latter is a region with high linkage disequilibrium) nearest to these SNPs has previously been associated with AN or obesity. Sex-stratified analyses revealed that the strongest BMI signal originated predominantly from females (chr. 10 rs1561589; P_overall: 2.47 × 10^-6 / P_females: 3.45 × 10^-7 / P_males: 0.043). Functional ex vivo studies in mice revealed reduced hypothalamic expression of Ctbp2 and Nbeal1 after fasting. Hypothalamic expression of Ctbp2 was increased in diet-induced obese (DIO) mice as compared with age-matched lean controls. We observed no evidence for associations for the look-up of BMI-related loci in the AN GWAMA. A cross-trait analysis of AN and BMI loci revealed variants at three chromosomal loci with potential joint impact. The chromosome 10 locus is particularly promising given that the association with obesity was primarily driven by females. In addition, the detected altered hypothalamic expression patterns of Ctbp2 and Nbeal1 as a result of fasting and DIO implicate these genes in weight regulation. Molecular Psychiatry advance online publication, 17 May 2016; doi:10.1038/mp.2016.71.

    Molecular psychiatry 2016
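
    The significance threshold quoted in the abstract above (Hinney et al.) follows from a Bonferroni correction over the 1000 SNPs carried into the cross-trait look-up: 0.05 / 1000 = 5 × 10^-5. The sketch below works through that arithmetic. The function names are hypothetical, and applying the same factor of 1000 to the sex-stratified P-values is an assumption made only to illustrate the calculation.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    n_snps = 1000  # SNPs taken forward from the AN GWAMA, per the abstract
    threshold = bonferroni_threshold(0.05, n_snps)
    print(f"per-SNP threshold: {threshold:.1e}")

    # P-values reported in the abstract for rs1561589 on chromosome 10.
    reported = {"overall": 2.47e-6, "females": 3.45e-7, "males": 0.043}
    for label, p in reported.items():
        adjusted = bonferroni_adjust(p, n_snps)
        print(f"{label}: adjusted P = {adjusted:.3g}, passes = {p < threshold}")
```

    Under this assumed correction the overall and female P-values clear the threshold while the male P-value does not, which matches the abstract's statement that the signal originated predominantly from females.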

  • Study profile: the Durban Diabetes Study (DDS): a platform for chronic disease research.

    Hird TR, Young EH, Pirie FJ, Riha J, Esterhuizen TM, O'Leary B, McCarthy MI, Sandhu MS and Motala AA

    Global Health, Epidemiology and Genomics 2016;1;e2

  • Genomic Analysis of Companion Rabbit Staphylococcus aureus.

    Holmes MA, Harrison EM, Fisher EA, Graham EM, Parkhill J, Foster G and Paterson GK

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    In addition to being an important human pathogen, Staphylococcus aureus is able to cause a variety of infections in numerous other host species. While the S. aureus strains causing infection in several of these hosts have been well characterised, this is not the case for companion rabbits (Oryctolagus cuniculus), for which few data are available. To address this deficiency we have performed antimicrobial susceptibility testing and genome sequencing on a collection of S. aureus isolates from companion rabbits. The findings show that a diverse S. aureus population is able to cause infection in this host, and while antimicrobial resistance was uncommon, the isolates possess a range of known and putative virulence factors consistent with a diverse clinical presentation in companion rabbits including severe abscesses. We additionally show that companion rabbit isolates carry polymorphisms within dltB that have been described as underlying host adaptation of S. aureus to farmed rabbits. The availability of S. aureus genome sequences from companion rabbits provides an important aid to understanding the pathogenesis of disease in this host and in the clinical management and surveillance of these infections.

    PloS one 2016;11;3;e0151458

  • Five decades of genome evolution in the globally distributed, extensively antibiotic-resistant Acinetobacter baumannii global clone 1.

    Holt K, Kenyon JJ, Hamidian M, Schultz MB, Pickard DJ, Dougan G and Hall R

    Department of Biochemistry & Molecular Biology, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia.

    The majority of Acinetobacter baumannii isolates that are multiply, extensively and pan-antibiotic resistant belong to two globally disseminated clones, GC1 and GC2, that were first noticed in the 1970s. Here, we investigated microevolution and phylodynamics within GC1 via analysis of 45 whole-genome sequences, including 23 sequenced for this study. The most recent common ancestor of GC1 arose around 1960 and later diverged into two phylogenetically distinct lineages. In the 1970s, the main lineage acquired the AbaR resistance island, conferring resistance to older antibiotics, via a horizontal gene transfer event. We estimate a mutation rate of ∼5 SNPs genome^-1 year^-1 and detected extensive recombination within GC1 genomes, introducing nucleotide diversity into the population at >20 times the substitution rate (the ratio of SNPs introduced by recombination compared with mutation was 22). The recombination events were non-randomly distributed in the genome and created significant diversity within loci encoding outer surface molecules (including the capsular polysaccharide, the outer core lipooligosaccharide and the outer membrane protein CarO), and spread antimicrobial resistance-conferring mutations affecting the gyrA and parC genes and insertion sequence insertions activating the ampC gene. Both GC1 lineages accumulated resistance to newer antibiotics through various genetic mechanisms, including the acquisition of plasmids and transposons or mutations in chromosomal genes. Our data show that GC1 has diversified into multiple successful extensively antibiotic-resistant subclones that differ in their surface structures. This has important implications for all avenues of control, including epidemiological tracking, antimicrobial therapy and vaccination.

    Microbial genomics 2016;2;2;e000052

  • Palmitoyl Transferases have Critical Roles in the Development of Mosquito and Liver Stages of Plasmodium.

    Hopp CS, Balaban AE, Bushell E, Billker O, Rayner JC and Sinnis P

    Department of Molecular Microbiology & Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA.

    As the Plasmodium parasite transitions between mammalian and mosquito host, it has to adjust quickly to new environments. Palmitoylation, a reversible and dynamic lipid posttranslational modification, plays a central role in regulating this process and has been implicated in parasite morphology, motility and host cell invasion. While proteins associated with the gliding motility machinery have been described to be palmitoylated, no palmitoyl transferase responsible for regulating gliding motility has previously been identified. Here, we characterize two palmitoyl transferases with gene tagging and gene deletion approaches. We identify DHHC3, a palmitoyl transferase, as a mediator of ookinete development, with a crucial role for gliding motility in ookinetes and sporozoites, and we co-localize the protein with a marker for the inner membrane complex in the ookinete stage. Ookinetes and sporozoites lacking DHHC3 are impaired in gliding motility and exhibit a strong phenotype in vivo, with ookinetes being significantly less infectious to their mosquito host and sporozoites being non-infectious to mice. Importantly, genetic complementation of the DHHC3-ko parasite completely restored virulence. We generated parasites lacking both DHHC3 and the palmitoyl transferase DHHC9, and found an enhanced phenotype for these double knockout parasites, allowing insights into the functional overlap and compensational nature of the large family of PbDHHCs. These findings contribute to our understanding of the organization and mechanism of the gliding motility machinery, which, as is becoming increasingly clear, is mediated by palmitoylation.

    Cellular microbiology 2016

  • Retinol and ascorbate drive erasure of epigenetic memory and enhance reprogramming to naïve pluripotency by complementary mechanisms.

    Hore TA, von Meyenn F, Ravichandran M, Bachman M, Ficz G, Oxley D, Santos F, Balasubramanian S, Jurkowski TP and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, United Kingdom; Department of Anatomy, University of Otago, Dunedin 9016, New Zealand;

    Epigenetic memory, in particular DNA methylation, is established during development in differentiating cells and must be erased to create naïve (induced) pluripotent stem cells. The ten-eleven translocation (TET) enzymes can catalyze the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and further oxidized derivatives, thereby actively removing this memory. Nevertheless, the mechanism by which the TET enzymes are regulated, and the extent to which they can be manipulated, are poorly understood. Here we report that retinoic acid (RA) or retinol (vitamin A) and ascorbate (vitamin C) act as modulators of TET levels and activity. RA or retinol enhances 5hmC production in naïve embryonic stem cells by activation of TET2 and TET3 transcription, whereas ascorbate potentiates TET activity and 5hmC production through enhanced Fe2+ recycling, and not as a cofactor as reported previously. We find that both ascorbate and RA or retinol promote the derivation of induced pluripotent stem cells synergistically and enhance the erasure of epigenetic memory. This mechanistic insight has significance for the development of cell treatments for regenerative medicine, and enhances our understanding of how intrinsic and extrinsic signals shape the epigenome.

    Proceedings of the National Academy of Sciences of the United States of America 2016

  • Genome-wide associations for birth weight and correlations with adult disease.

    Horikoshi M, Beaumont RN, Day FR, Warrington NM, Kooijman MN, Fernandez-Tajes J, Feenstra B, van Zuydam NR, Gaulton KJ, Grarup N, Bradfield JP, Strachan DP, Li-Gao R, Ahluwalia TS, Kreiner E, Rueedi R, Lyytikäinen LP, Cousminer DL, Wu Y, Thiering E, Wang CA, Have CT, Hottenga JJ, Vilor-Tejedor N, Joshi PK, Boh ET, Ntalla I, Pitkänen N, Mahajan A, van Leeuwen EM, Joro R, Lagou V, Nodzenski M, Diver LA, Zondervan KT, Bustamante M, Marques-Vidal P, Mercader JM, Bennett AJ, Rahmioglu N, Nyholt DR, Ma RC, Tam CH, Tam WH, CHARGE Consortium Hematology Working Group, Ganesh SK, van Rooij FJ, Jones SE, Loh PR, Ruth KS, Tuke MA, Tyrrell J, Wood AR, Yaghootkar H, Scholtens DM, Paternoster L, Prokopenko I, Kovacs P, Atalay M, Willems SM, Panoutsopoulou K, Wang X, Carstensen L, Geller F, Schraut KE, Murcia M, van Beijsterveldt CE, Willemsen G, Appel EV, Fonvig CE, Trier C, Tiesler CM, Standl M, Kutalik Z, Bonàs-Guarch S, Hougaard DM, Sánchez F, Torrents D, Waage J, Hollegaard MV, de Haan HG, Rosendaal FR, Medina-Gomez C, Ring SM, Hemani G, McMahon G, Robertson NR, Groves CJ, Langenberg C, Luan J, Scott RA, Zhao JH, Mentch FD, MacKenzie SM, Reynolds RM, Early Growth Genetics (EGG) Consortium, Lowe WL, Tönjes A, Stumvoll M, Lindi V, Lakka TA, van Duijn CM, Kiess W, Körner A, Sørensen TI, Niinikoski H, Pahkala K, Raitakari OT, Zeggini E, Dedoussis GV, Teo YY, Saw SM, Melbye M, Campbell H, Wilson JF, Vrijheid M, de Geus EJ, Boomsma DI, Kadarmideen HN, Holm JC, Hansen T, Sebert S, Hattersley AT, Beilin LJ, Newnham JP, Pennell CE, Heinrich J, Adair LS, Borja JB, Mohlke KL, Eriksson JG, Widén E, Kähönen M, Viikari JS, Lehtimäki T, Vollenweider P, Bønnelykke K, Bisgaard H, Mook-Kanamori DO, Hofman A, Rivadeneira F, Uitterlinden AG, Pisinger C, Pedersen O, Power C, Hyppönen E, Wareham NJ, Hakonarson H, Davies E, Walker BR, Jaddoe VW, Järvelin MR, Grant SF, Vaag AA, Lawlor DA, Frayling TM, Smith GD, Morris AP, Ong KK, Felix JF, Timpson NJ, Perry JR, Evans DM, McCarthy MI and Freathy RM

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.

    Birth weight (BW) has been shown to be influenced by both fetal and maternal factors and in observational studies is reproducibly associated with future risk of adult metabolic diseases including type 2 diabetes (T2D) and cardiovascular disease. These life-course associations have often been attributed to the impact of an adverse early life environment. Here, we performed a multi-ancestry genome-wide association study (GWAS) meta-analysis of BW in 153,781 individuals, identifying 60 loci where fetal genotype was associated with BW (P < 5 × 10(-8)). Overall, approximately 15% of variance in BW was captured by assays of fetal genetic variation. Using genetic association alone, we found strong inverse genetic correlations between BW and systolic blood pressure (Rg = -0.22, P = 5.5 × 10(-13)), T2D (Rg = -0.27, P = 1.1 × 10(-6)) and coronary artery disease (Rg = -0.30, P = 6.5 × 10(-9)). In addition, using large-cohort datasets, we demonstrated that genetic factors were the major contributor to the negative covariance between BW and future cardiometabolic risk. Pathway analyses indicated that the protein products of genes within BW-associated regions were enriched for diverse processes including insulin signalling, glucose homeostasis, glycogen biosynthesis and chromatin remodelling. There was also enrichment of associations with BW in known imprinted regions (P = 1.9 × 10(-4)). We demonstrate that life-course associations between early growth phenotypes and adult cardiometabolic disease are in part the result of shared genetic effects and identify some of the pathways through which these causal genetic effects are mediated.

    Nature 2016

  • Transancestral fine-mapping of four type 2 diabetes susceptibility loci highlights potential causal regulatory mechanisms.

    Horikoshi M, Pasquali L, Wiltshire S, Huyghe JR, Mahajan A, Asimit JL, Ferreira T, Locke AE, Robertson NR, Wang X, Sim X, Fujita H, Hara K, Young R, Zhang W, Choi S, Chen H, Kaur I, Takeuchi F, Fontanillas P, Thuillier D, Yengo L, Below JE, Tam CH, Wu Y, Abecasis G, Altshuler D, Bell GI, Blangero J, Burtt NP, Duggirala R, Florez JC, Hanis CL, Seielstad M, Atzmon G, Chan JC, Ma RC, Froguel P, Wilson JG, Bharadwaj D, Dupuis J, Meigs JB, Cho YS, Park T, Kooner JS, Chambers JC, Saleheen D, Kadowaki T, Tai ES, Mohlke KL, Cox NJ, Ferrer J, Zeggini E, Kato N, Teo YY, Boehnke M, McCarthy MI, Morris AP and T2D-GENES Consortium

    Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK, Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, UK.

    To gain insight into potential regulatory mechanisms through which the effects of variants at four established type 2 diabetes (T2D) susceptibility loci (CDKAL1, CDKN2A-B, IGF2BP2 and KCNQ1) are mediated, we undertook transancestral fine-mapping in 22 086 cases and 42 539 controls of East Asian, European, South Asian, African American and Mexican American descent. Through high-density imputation and conditional analyses, we identified seven distinct association signals at these four loci, each with allelic effects on T2D susceptibility that were homogeneous across ancestry groups. By leveraging differences in the structure of linkage disequilibrium between diverse populations, and increased sample size, we localised the variants most likely to drive each distinct association signal. We demonstrated that integration of these genetic fine-mapping data with genomic annotation can highlight potential causal regulatory elements in T2D-relevant tissues. These analyses provide insight into the mechanisms through which T2D association signals are mediated, and suggest future routes to understanding the biology of specific disease susceptibility loci.

    Funded by: NIDDK NIH HHS: R01 DK072193, R01 DK078616, U01 DK078616, U01 DK105535

    Human molecular genetics 2016

  • Independent Origin and Global Distribution of Distinct Plasmodium vivax Duffy Binding Protein Gene Duplications.

    Hostetler JB, Lo E, Kanjee U, Amaratunga C, Suon S, Sreng S, Mao S, Yewhalaw D, Mascarenhas A, Kwiatkowski DP, Ferreira MU, Rathod PK, Yan G, Fairhurst RM, Duraisingh MT and Rayner JC

    Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Background: Plasmodium vivax causes the majority of malaria episodes outside Africa, but remains a relatively understudied pathogen. The pathology of P. vivax infection depends critically on the parasite's ability to recognize and invade human erythrocytes. This invasion process involves an interaction between P. vivax Duffy Binding Protein (PvDBP) in merozoites and the Duffy antigen receptor for chemokines (DARC) on the erythrocyte surface. Whole-genome sequencing of clinical isolates recently established that some P. vivax genomes contain two copies of the PvDBP gene. The frequency of this duplication is particularly high in Madagascar, where there is also evidence for P. vivax infection in DARC-negative individuals. The functional significance and global prevalence of this duplication, and whether there are other copy number variations at the PvDBP locus, are unknown.

    Methodology/principal findings: Using whole-genome sequencing and PCR to study the PvDBP locus in P. vivax clinical isolates, we found that PvDBP duplication is widespread in Cambodia. The boundaries of the Cambodian PvDBP duplication differ from those previously identified in Madagascar, meaning that current molecular assays were unable to detect it. The Cambodian PvDBP duplication did not associate with parasite density or DARC genotype, and ranged in prevalence from 20% to 38% over four annual transmission seasons in Cambodia. This duplication was also present in P. vivax isolates from Brazil and Ethiopia, but not India.

    Conclusions/significance: PvDBP duplications are much more widespread and complex than previously thought, and at least two distinct duplications are circulating globally. The same duplication boundaries were identified in parasites from three continents, and were found at high prevalence in human populations where DARC-negativity is essentially absent. It is therefore unlikely that PvDBP duplication is associated with infection of DARC-negative individuals, but functional tests will be required to confirm this hypothesis.

    PLoS neglected tropical diseases 2016;10;10;e0005091

  • WormBase 2016: expanding to enable helminth genomic research.

    Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ, Davis P, Done J, Down T, Gao S, Grove C, Harris TW, Kishore R, Lee R, Lomax J, Li Y, Muller HM, Nakamura C, Nuin P, Paulini M, Raciti D, Schindelman G, Stanley E, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wright A, Yook K, Berriman M, Kersey P, Schedl T, Stein L and Sternberg PW

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

    WormBase is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.

    Nucleic acids research 2016;44;D1;D774-80

  • WormBase ParaSite - a comprehensive resource for helminth genomics.

    Howe KL, Bolt BJ, Shafie M, Kersey P and Berriman M

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data.

    Molecular and biochemical parasitology 2016

  • Insulin resistance uncoupled from dyslipidemia due to C-terminal PIK3R1 mutations.

    Huang-Doran I, Tomlinson P, Payne F, Gast A, Sleigh A, Bottomley W, Harris J, Daly A, Rocha N, Rudge S, Clark J, Kwok A, Romeo S, McCann E, Müksch B, Dattani M, Zucchini S, Wakelam M, Foukas LC, Savage DB, Murphy R, O'Rahilly S, Barroso I and Semple RK

    The University of Cambridge Metabolic Research Laboratories, Wellcome Trust-MRC Institute of Metabolic Science, Cambridge, United Kingdom; The National Institute for Health Research Cambridge Biomedical Research Centre, Cambridge, United Kingdom.

    Obesity-related insulin resistance is associated with fatty liver, dyslipidemia, and low plasma adiponectin. Insulin resistance due to insulin receptor (INSR) dysfunction is associated with none of these, but when due to dysfunction of the downstream kinase AKT2 phenocopies obesity-related insulin resistance. We report 5 patients with SHORT syndrome and C-terminal mutations in PIK3R1, encoding the p85α/p55α/p50α subunits of PI3K, which act between INSR and AKT in insulin signaling. Four of 5 patients had extreme insulin resistance without dyslipidemia or hepatic steatosis. In 3 of these 4, plasma adiponectin was preserved, as in insulin receptor dysfunction. The fourth patient and her healthy mother had low plasma adiponectin associated with a potentially novel mutation, p.Asp231Ala, in adiponectin itself. Cells studied from one patient with the p.Tyr657X PIK3R1 mutation expressed abundant truncated PIK3R1 products and showed severely reduced insulin-stimulated association of mutant but not WT p85α with IRS1, but normal downstream signaling. In 3T3-L1 preadipocytes, mutant p85α overexpression attenuated insulin-induced AKT phosphorylation and adipocyte differentiation. Thus, PIK3R1 C-terminal mutations impair insulin signaling only in some cellular contexts and produce a subphenotype of insulin resistance resembling INSR dysfunction but unlike AKT2 dysfunction, implicating PI3K in the pathogenesis of key components of the metabolic syndrome.

    JCI insight 2016;1;17;e88766

  • The genomic basis of parasitism in the Strongyloides clade of nematodes.

    Hunt VL, Tsai IJ, Coghlan A, Reid AJ, Holroyd N, Foth BJ, Tracey A, Cotton JA, Stanley EJ, Beasley H, Bennett HM, Brooks K, Harsha B, Kajitani R, Kulkarni A, Harbecke D, Nagayasu E, Nichol S, Ogura Y, Quail MA, Randle N, Xia D, Brattig NW, Soblik H, Ribeiro DM, Sanchez-Flores A, Hayashi T, Itoh T, Denver DR, Grant W, Stoltzfus JD, Lok JB, Murayama H, Wastling J, Streit A, Kikuchi T, Viney M and Berriman M

    School of Biological Sciences, University of Bristol, Bristol, UK.

    Soil-transmitted nematodes, including the Strongyloides genus, cause one of the most prevalent neglected tropical diseases. Here we compare the genomes of four Strongyloides species, including the human pathogen Strongyloides stercoralis, and their close relatives that are facultatively parasitic (Parastrongyloides trichosuri) and free-living (Rhabditophanes sp. KR3021). A significant paralogous expansion of key gene families--families encoding astacin-like and SCP/TAPS proteins--is associated with the evolution of parasitism in this clade. Exploiting the unique Strongyloides life cycle, we compare the transcriptomes of the parasitic and free-living stages and find that these same gene families are upregulated in the parasitic stages, underscoring their role in nematode parasitism.

    Funded by: NCRR NIH HHS: P40 RR002512, RR02512; NIAID NIH HHS: AI050668, AI060516, AI105856, R01 AI050668, R21 AI105856, R33 AI105856, T32 AI060516; Wellcome Trust: 094462/Z/10/Z, 098051

    Nature genetics 2016;48;3;299-307

  • GWAS for executive function and processing speed suggests involvement of the CADM2 gene.

    Ibrahim-Verbaas CA, Bressler J, Debette S, Schuur M, Smith AV, Bis JC, Davies G, Trompet S, Smith JA, Wolf C, Chibnik LB, Liu Y, Vitart V, Kirin M, Petrovic K, Polasek O, Zgaga L, Fawns-Ritchie C, Hoffmann P, Karjalainen J, Lahti J, Llewellyn DJ, Schmidt CO, Mather KA, Chouraki V, Sun Q, Resnick SM, Rose LM, Oldmeadow C, Stewart M, Smith BH, Gudnason V, Yang Q, Mirza SS, Jukema JW, deJager PL, Harris TB, Liewald DC, Amin N, Coker LH, Stegle O, Lopez OL, Schmidt R, Teumer A, Ford I, Karbalai N, Becker JT, Jonsdottir MK, Au R, Fehrmann RS, Herms S, Nalls M, Zhao W, Turner ST, Yaffe K, Lohman K, van Swieten JC, Kardia SL, Knopman DS, Meeks WM, Heiss G, Holliday EG, Schofield PW, Tanaka T, Stott DJ, Wang J, Ridker P, Gow AJ, Pattie A, Starr JM, Hocking LJ, Armstrong NJ, McLachlan S, Shulman JM, Pilling LC, Eiriksdottir G, Scott RJ, Kochan NA, Palotie A, Hsieh YC, Eriksson JG, Penman A, Gottesman RF, Oostra BA, Yu L, DeStefano AL, Beiser A, Garcia M, Rotter JI, Nöthen MM, Hofman A, Slagboom PE, Westendorp RG, Buckley BM, Wolf PA, Uitterlinden AG, Psaty BM, Grabe HJ, Bandinelli S, Chasman DI, Grodstein F, Räikkönen K, Lambert JC, Porteous DJ, Generation Scotland, Price JF, Sachdev PS, Ferrucci L, Attia JR, Rudan I, Hayward C, Wright AF, Wilson JF, Cichon S, Franke L, Schmidt H, Ding J, de Craen AJ, Fornage M, Bennett DA, Deary IJ, Ikram MA, Launer LJ, Fitzpatrick AL, Seshadri S, van Duijn CM and Mosley TH

    Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands.

    To identify common variants contributing to normal variation in two specific domains of cognitive functioning, we conducted a genome-wide association study (GWAS) of executive functioning and information processing speed in non-demented older adults from the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) consortium. Neuropsychological testing was available for 5,429-32,070 subjects of European ancestry aged 45 years or older, free of dementia and clinical stroke at the time of cognitive testing from 20 cohorts in the discovery phase. We analyzed performance on the Trail Making Test parts A and B, the Letter Digit Substitution Test (LDST), the Digit Symbol Substitution Task (DSST), semantic and phonemic fluency tests, and the Stroop Color and Word Test. Replication was sought in 1,311-21,860 subjects from 20 independent cohorts. A significant association was observed in the discovery cohorts for the single-nucleotide polymorphism (SNP) rs17518584 (discovery P-value=3.12 × 10(-8)) and in the joint discovery and replication meta-analysis (P-value=3.28 × 10(-9) after adjustment for age, gender and education) in an intron of the gene cell adhesion molecule 2 (CADM2) for performance on the LDST/DSST. Rs17518584 is located about 170 kb upstream of the transcription start site of the major transcript for the CADM2 gene, but is within an intron of a variant transcript that includes an alternative first exon. The variant is associated with expression of CADM2 in the cingulate cortex (P-value=4 × 10(-4)). The protein encoded by CADM2 is involved in glutamate signaling (P-value=7.22 × 10(-15)), gamma-aminobutyric acid (GABA) transport (P-value=1.36 × 10(-11)) and neuron cell-cell adhesion (P-value=1.48 × 10(-13)). Our findings suggest that genetic variation in the CADM2 gene is associated with individual differences in information processing speed.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Medical Research Council: G0700704, MR/K026992/1; NCATS NIH HHS: UL1 TR000124; NCI NIH HHS: P01 CA055075, P01 CA087969, R01 CA047988, R01 CA049449, R01 CA050385, R01 CA065725, R01 CA067262, R01 CA134958, U01 CA067262, U01 CA098233; NEI NIH HHS: R01 EY009611, R01 EY015473; NHGRI NIH HHS: U01 HG004399, U01 HG004402, U01 HG004728; NHLBI NIH HHS: HHSN268200900020C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, HHSN268201200036C, N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC075150, N01 HC085082, N01 HC085084, N01 HC085085, N01 HC085086, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, R01 HL034594, R01 HL035464, R01 HL043851, R01 HL059367, R01 HL070825, R01 HL071917, R01 HL080295, R01 HL080467, R01 HL086694, R01 HL087641, R01 HL087652, R01 HL087660, R01 HL093029, R01 HL105756, U01 HL054457, U01 HL054463, U01 HL054464, U01 HL054481, U01 HL096917; NIA NIH HHS: K08 AG034290, K25 AG041906, N01 AG012100, N01 AG062101, N01 AG062103, N01 AG062106, N01 AG821336, N01 AG916413, P30 AG010161, P50 AG005133, R01 AG008122, R01 AG015819, R01 AG015928, R01 AG016495, R01 AG017917, R01 AG020098, R01 AG023629, R01 AG027058, R01 AG030146, R01 AG032098, R01 AG033193, U01 AG049505; NIDDK NIH HHS: P01 DK070756, P30 DK063491, R01 DK058845; NIMHD NIH HHS: 263 MD821336, 263 MD9164 13; NINDS NIH HHS: R01 NS017950, R01 NS041558

    Molecular psychiatry 2016;21;2;189-97

  • Evolutionary genomics of epidemic visceral leishmaniasis in the Indian subcontinent.

    Imamura H, Downing T, Van den Broeck F, Sanders MJ, Rijal S, Sundar S, Mannaert A, Vanaerschot M, Berg M, De Muylder G, Dumetz F, Cuypers B, Maes I, Domagalska M, Decuypere S, Rai K, Uranw S, Bhattarai NR, Khanal B, Prajapati VK, Sharma S, Stark O, Schönian G, De Koning HP, Settimo L, Vanhollebeke B, Roy S, Ostyn B, Boelaert M, Maes L, Berriman M, Dujardin JC and Cotton JA

    Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium.

    Leishmania donovani causes visceral leishmaniasis (VL), the second most deadly vector-borne parasitic disease. A recent epidemic in the Indian subcontinent (ISC) caused up to 80% of global VL and over 30,000 deaths per year. Resistance against antimonial drugs has probably been a contributing factor in the persistence of this epidemic. Here we use whole genome sequences from 204 clinical isolates to track the evolution and epidemiology of L. donovani from the ISC. We identify independent radiations that have emerged since a bottleneck coincident with 1960s DDT spraying campaigns. A genetically distinct population frequently resistant to antimonials has a two base-pair insertion in the aquaglyceroporin gene LdAQP1 that prevents the transport of trivalent antimonials. We find evidence of genetic exchange between ISC populations, and show that the mutation in LdAQP1 has spread by recombination. Our results reveal the complexity of L. donovani evolution in the ISC in response to drug treatment.

    eLife 2016;5

  • Comparative Antibody Responses Against three Antimalarial Vaccine Candidate Antigens from Urban and Rural Exposed Individuals in Gabon.

    Imboumy-Limoukou RK, Oyegue-Liabagui SL, Ndidi S, Pegha-Moukandja I, Kouna CL, Galaway F, Florent I and Lekana-Douki JB

    Unité de Parasitologie Médicale (UPARAM), Centre International de Recherches Médicales de Franceville (CIRMF), BP 769 Franceville, Gabon; Molécules de Communication et Adaptation des Microorganismes (MCAM, UMR 7245), Sorbonne Universités, Muséum National d'Histoire Naturelle, CNRS, CP52, 57 rue Cuvier 75005 Paris, France; Ecole Doctorale Régionale en Infectiologie Tropicale d'Afrique Centrale (ECODRAC), BP 876 Franceville, Gabon.

    The analysis of immune responses in diverse malaria endemic regions provides more information to understand the host's immune response to Plasmodium falciparum. Several plasmodial antigens have been reported as targets of human immunity. PfAMA1 is one of the most studied vaccine candidates; PfRH5 and Pf113 are new promising vaccine candidates. The aim of this study was to evaluate the humoral response against these three antigens among children of Lastourville (rural area) and Franceville (urban area). Malaria was diagnosed using rapid diagnostic tests. Plasma samples were tested against these antigens by enzyme-linked immunosorbent assay (ELISA). We found that malaria prevalence was five times higher in the rural area than in the urban area (p < 0.0001). The anti-PfAMA1 and anti-PfRH5 response levels were significantly higher in Lastourville than in Franceville (p < 0.0001; p = 0.005). The anti-PfAMA1 response was higher than the anti-Pf113 response, which in turn was higher than the anti-PfRH5 response in both sites. Anti-PfAMA1 levels were significantly higher in infected children than those in uninfected children (p = 0.001) in Franceville. Anti-Pf113 and anti-PfRH5 antibody levels were lowest in children presenting severe malarial anemia. These three antigens are targets of immunity in Gabon. Further studies on the role of Pf113 in antimalarial protection against severe anemia are needed.

    European journal of microbiology & immunology 2016;6;4;287-297

  • Evolution of atypical enteropathogenic E. coli by repeated acquisition of LEE pathogenicity island variants.

    Ingle DJ, Tauschek M, Edwards DJ, Hocking DM, Pickard DJ, Azzopardi KI, Amarasena T, Bennett-Wood V, Pearson JS, Tamboura B, Antonio M, Ochieng JB, Oundo J, Mandomando I, Qureshi S, Ramamurthy T, Hossain A, Kotloff KL, Nataro JP, Dougan G, Levine MM, Robins-Browne RM and Holt KE

    Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Victoria 3010, Australia.

    Atypical enteropathogenic Escherichia coli (aEPEC) is an umbrella term given to E. coli that possess a type III secretion system encoded in the locus of enterocyte effacement (LEE), but lack the virulence factors (stx, bfpA) that characterize enterohaemorrhagic E. coli and typical EPEC, respectively. The burden of disease caused by aEPEC has recently increased in industrialized and developing nations, yet the population structure and virulence profile of this emerging pathogen are poorly understood. Here, we generated whole-genome sequences of 185 aEPEC isolates collected during the Global Enteric Multicenter Study from seven study sites in Asia and Africa, and compared them with publicly available E. coli genomes. Phylogenomic analysis revealed ten distinct widely distributed aEPEC clones. Analysis of genetic variation in the LEE pathogenicity island identified 30 distinct LEE subtypes divided into three major lineages. Each LEE lineage demonstrated a preferred chromosomal insertion site and different complements of non-LEE encoded effector genes, indicating distinct patterns of evolution of these lineages. This study provides the first detailed genomic framework for aEPEC in the context of the EPEC pathotype and will facilitate further studies into the epidemiology and pathogenicity of EPEC by enabling the detection and tracking of specific clones and LEE variants.

    Nature microbiology 2016;1;15010

  • Molecular Surveillance Identifies Multiple Transmissions of Typhoid in West Africa.

    International Typhoid Consortium, Wong VK, Holt KE, Okoro C, Baker S, Pickard DJ, Marks F, Page AJ, Olanipekun G, Munir H, Alter R, Fey PD, Feasey NA, Weill FX, Le Hello S, Hart PJ, Kariuki S, Breiman RF, Gordon MA, Heyderman RS, Jacobs J, Lunguya O, Msefula C, MacLennan CA, Keddy KH, Smith AM, Onsare RS, De Pinna E, Nair S, Amos B, Dougan G and Obaro S

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Background: The burden of typhoid in sub-Saharan African (SSA) countries has been difficult to estimate, in part, due to suboptimal laboratory diagnostics. However, surveillance blood cultures at two sites in Nigeria have identified typhoid associated with Salmonella enterica serovar Typhi (S. Typhi) as an important cause of bacteremia in children.

    Methods: A total of 128 S. Typhi isolates from these studies in Nigeria were whole-genome sequenced, and the resulting data was used to place these Nigerian isolates into a worldwide context based on their phylogeny and carriage of molecular determinants of antibiotic resistance.

    Results: Several distinct S. Typhi genotypes were identified in Nigeria that were related to other clusters of S. Typhi isolates from north, west and central regions of Africa. The rapidly expanding S. Typhi clade 4.3.1 (H58) previously associated with multiple antimicrobial resistances in Asia and in east, central and southern Africa, was not detected in this study. However, antimicrobial resistance was common amongst the Nigerian isolates and was associated with several plasmids, including the IncHI1 plasmid commonly associated with S. Typhi.

    Conclusions: These data indicate that typhoid in Nigeria was established through multiple independent introductions into the country, with evidence of regional spread. MDR typhoid appears to be evolving independently of the haplotype H58 found in other typhoid endemic countries. This study highlights an urgent need for routine surveillance to monitor the epidemiology of typhoid and evolution of antimicrobial resistance within the bacterial population as a means to facilitate public health interventions to reduce the substantial morbidity and mortality of typhoid.

    PLoS neglected tropical diseases 2016;10;9;e0004781

  • A Landscape of Pharmacogenomic Interactions in Cancer.

    Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Gonçalves E, Barthorpe S, Lightfoot H, Cokelaer T, Greninger P, van Dyk E, Chang H, de Silva H, Heyn H, Deng X, Egan RK, Liu Q, Mironenko T, Mitropoulos X, Richardson L, Wang J, Zhang T, Moran S, Sayols S, Soleimani M, Tamborero D, Lopez-Bigas N, Ross-Macdonald P, Esteller M, Gray NS, Haber DA, Stratton MR, Benes CH, Wessels LFA, Saez-Rodriguez J, McDermott U and Garnett MJ

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.

    Systematic studies of cancer genomes have provided unprecedented insights into the molecular nature of cancer. Using this information to guide the development and application of therapies in the clinic is challenging. Here, we report how cancer-driven alterations identified in 11,289 tumors from 29 tissues (integrating somatic mutations, copy number alterations, DNA methylation, and gene expression) can be mapped onto 1,001 molecularly annotated human cancer cell lines and correlated with sensitivity to 265 drugs. We find that cell lines faithfully recapitulate oncogenic alterations identified in tumors, find that many of these associate with drug sensitivity/resistance, and highlight the importance of tissue lineage in mediating drug response. Logic-based modeling uncovers combinations of alterations that sensitize to drugs, while machine learning demonstrates the relative importance of different data types in predicting drug response. Our analysis and datasets are rich resources to link genotypes with cellular phenotypes and to identify therapeutic options for selected cancer sub-populations.

    Funded by: Cancer Research UK; European Research Council: 268626; Marie Curie; NCI NIH HHS: U24 CA143835; Wellcome Trust: 086375, 102696

    Cell 2016;166;3;740-754

  • Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps.

    Iotchkova V, Huang J, Morris JA, Jain D, Barbieri C, Walter K, Min JL, Chen L, Astle W, Cocca M, Deelen P, Elding H, Farmaki AE, Franklin CS, Franberg M, Gaunt TR, Hofman A, Jiang T, Kleber ME, Lachance G, Luan J, Malerba G, Matchan A, Mead D, Memari Y, Ntalla I, Panoutsopoulou K, Pazoki R, Perry JRB, Rivadeneira F, Sabater-Lleal M, Sennblad B, Shin SY, Southam L, Traglia M, van Dijk F, van Leeuwen EM, Zaza G, Zhang W, UK10K Consortium, Amin N, Butterworth A, Chambers JC, Dedoussis G, Dehghan A, Franco OH, Franke L, Frontini M, Gambaro G, Gasparini P, Hamsten A, Issacs A, Kooner JS, Kooperberg C, Langenberg C, Marz W, Scott RA, Swertz MA, Toniolo D, Uitterlinden AG, van Duijn CM, Watkins H, Zeggini E, Maurano MT, Timpson NJ, Reiner AP, Auer PL and Soranzo N

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Large-scale whole-genome sequence data sets offer novel opportunities to identify genetic variation underlying human traits. Here we apply genotype imputation based on whole-genome sequence data from the UK10K and 1000 Genomes Project into 35,981 study participants of European ancestry, followed by association analysis with 20 quantitative cardiometabolic and hematological traits. We describe 17 new associations, including 6 rare (minor allele frequency (MAF) < 1%) or low-frequency (1% < MAF < 5%) variants with platelet count (PLT), red blood cell indices (MCH and MCV) and HDL cholesterol. Applying fine-mapping analysis to 233 known and new loci associated with the 20 traits, we resolve the associations of 59 loci to credible sets of 20 or fewer variants and describe trait enrichments within regions of predicted regulatory function. These findings improve understanding of the allelic architecture of risk factors for cardiometabolic and hematological diseases and provide additional functional insights with the identification of potentially novel biological targets.

    Funded by: Medical Research Council: MC_PC_15018, MC_UU_12015/1; NHLBI NIH HHS: R21 HL121422; NIH HHS: S10 OD020069; Wellcome Trust: 091310

    Nature genetics 2016;48;11;1303-1312

  • DNA REPAIR. Drugging DNA repair.

    Jackson SP and Helleday T

    The Wellcome Trust/Cancer Research UK Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge CB2 1QN, UK. The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Science (New York, N.Y.) 2016;352;6290;1178-9

  • WGS analysis and molecular resistance mechanisms of azithromycin-resistant (MIC >2 mg/L) Neisseria gonorrhoeae isolates in Europe from 2009 to 2014.

    Jacobsson S, Golparian D, Cole M, Spiteri G, Martin I, Bergheim T, Borrego MJ, Crowley B, Crucitti T, Van Dam AP, Hoffmann S, Jeverica S, Kohl P, Mlynarczyk-Bonikowska B, Pakarna G, Stary A, Stefanelli P, Pavlik P, Tzelepi E, Abad R, Harris SR and Unemo M

    Örebro University, Örebro, Sweden.

    Objectives: To elucidate the genome-based epidemiology and phylogenomics of azithromycin-resistant (MIC >2 mg/L) Neisseria gonorrhoeae strains collected in 2009-14 in Europe and clarify the azithromycin resistance mechanisms.

    Methods: Seventy-five azithromycin-resistant (MIC 4 to >256 mg/L) N. gonorrhoeae isolates collected in 17 European countries during 2009-14 were examined using antimicrobial susceptibility testing and WGS.

    Results: Thirty-six N. gonorrhoeae multi-antigen sequence typing STs and five phylogenomic clades, including 4-22 isolates from several countries per clade, were identified. The azithromycin target mutation A2059G (Escherichia coli numbering) was found in all four alleles of the 23S rRNA gene in all isolates with high-level azithromycin resistance (n = 4; MIC ≥256 mg/L). The C2611T mutation was identified in two to four alleles of the 23S rRNA gene in the remaining 71 isolates. Mutations in mtrR and its promoter were identified in 43 isolates, comprising isolates within the whole azithromycin MIC range. No mutations associated with azithromycin resistance were found in the rplD gene or the rplV gene and none of the macrolide resistance-associated genes [mef(A/E), ere(A), ere(B), erm(A), erm(B), erm(C) and erm(F)] were identified in any isolate.

    Conclusions: Clonal spread of relatively few N. gonorrhoeae strains accounts for the majority of the azithromycin resistance (MIC >2 mg/L) in Europe. The four isolates with high-level resistance to azithromycin (MIC ≥256 mg/L) were widely separated in the phylogenomic tree and did not belong to any of the main clades. The main azithromycin resistance mechanisms were the A2059G mutation (high-level resistance) and the C2611T mutation (low- and moderate-level resistance) in the 23S rRNA gene.

    The Journal of antimicrobial chemotherapy 2016

  • Pan-genomic perspective on the evolution of the Staphylococcus aureus USA300 epidemic.

    Jamrozy DM, Harris SR, Mohamed N, Peacock SJ, Tan CY, Parkhill J, Anderson AS and Holden MT

    The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.

    Staphylococcus aureus USA300 represents the dominant community-associated methicillin-resistant S. aureus lineage in the USA, where it is a major cause of skin and soft tissue infections. Previous comparative genomic studies have described the population structure and evolution of USA300 based on geographically restricted isolate collections. Here, we investigated the USA300 population by sequencing genomes of a geographically distributed panel of 191 clinical S. aureus isolates belonging to clonal complex 8 (CC8), derived from the Tigecycline Evaluation and Surveillance Trial program. Isolates were collected at 12 healthcare centres across nine USA states in 2004, 2009 or 2010. Reconstruction of evolutionary relationships revealed that CC8 was dominated by USA300 isolates (154/191, 81 %), which were heterogeneous and demonstrated limited phylogeographic clustering. Analysis of the USA300 core genomes revealed an increase in median pairwise SNP distance from 62 to 98 between 2004 and 2010, with a stable pattern of above average dN/dS ratios. The phylogeny of the USA300 population indicated that early diversification events led to the formation of nested clades, which arose through cumulative acquisition of predominantly non-synonymous SNPs in various coding sequences. The accessory genome of USA300 was largely homogenous and consisted of elements previously associated with this lineage. We observed an emergence of SCCmec negative and ACME negative USA300 isolates amongst more recent samples, and an increase in the prevalence of ϕSa5 prophage. Together, the analysed S. aureus USA300 collection revealed an evolving pan-genome through increased core genome heterogeneity and temporal variation in the frequency of certain accessory elements.

    Microbial genomics 2016;2;5;e000058

  • Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters.

    Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, Cairns J, Wingett SW, Várnai C, Thiecke MJ, Burden F, Farrow S, Cutler AJ, Rehnström K, Downes K, Grassi L, Kostadima M, Freire-Pritchett P, Wang F, BLUEPRINT Consortium, Stunnenberg HG, Todd JA, Zerbino DR, Stegle O, Ouwehand WH, Frontini M, Wallace C, Spivakov M and Fraser P

    Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK.

    Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.

    Cell 2016;167;5;1369-1384.e19

  • Molecular characterisation of the Chlamydia pecorum plasmid from porcine, ovine, bovine, and koala strains indicates plasmid-strain co-evolution.

    Jelocnik M, Bachmann NL, Seth-Smith H, Thomson NR, Timms P and Polkinghorne AM

    Centre for Animal Health Innovation, University of the Sunshine Coast, Sippy Downs, Queensland, Australia.

    Background. Highly stable, evolutionarily conserved, small, non-integrative plasmids are commonly found in members of the Chlamydiaceae and, in some species, these plasmids have been strongly linked to virulence. To date, evidence for such a plasmid in Chlamydia pecorum has been ambiguous. In a recent comparative genomic study of porcine, ovine, bovine, and koala C. pecorum isolates, we identified plasmids (pCpec) in a pig and three koala strains, respectively. Screening of further porcine, ovine, bovine, and koala C. pecorum isolates for pCpec showed that pCpec is common, but not ubiquitous in C. pecorum from all of the infected hosts. Methods. We used a combination of (i) bioinformatic mining of previously sequenced C. pecorum genome data sets and (ii) pCpec PCR-amplicon sequencing to characterise a further 17 novel pCpecs in C. pecorum isolates obtained from livestock, including pigs, sheep, and cattle, as well as those from koala. Results and Discussion. This analysis revealed that pCpec is conserved with all eight coding domain sequences (CDSs) present in isolates from each of the hosts studied. Sequence alignments revealed that the 21 pCpecs show 99% nucleotide sequence identity, with 83 single nucleotide polymorphisms (SNPs) shown to differentiate all of the plasmids analysed in this study. SNPs were found to be mostly synonymous and were distributed evenly across all eight pCpec CDSs as well as in the intergenic regions. Although conserved, analyses of the 21 pCpec sequences resolved plasmids into 12 distinct genotypes, with five shared between pCpecs from different isolates, and the remaining seven genotypes being unique to a single pCpec. Phylogenetic analysis revealed congruency and co-evolution of pCpecs with their cognate chromosome, further supporting polyphyletic origin of the koala C. pecorum. This study provides further understanding of the complex epidemiology of this pathogen in livestock and koala hosts and paves the way for studies to evaluate the function of this putative C. pecorum virulence factor.

    PeerJ 2016;4;e1661

  • Whole-exome sequencing in an isolated population from the Dalmatian island of Vis.

    Jeroncic A, Memari Y, Ritchie GR, Hendricks AE, Kolb-Kokocinski A, Matchan A, Vitart V, Hayward C, Kolcic I, Glodzik D, Wright AF, Rudan I, Campbell H, Durbin R, Polašek O, Zeggini E and Boraska Perica V

    Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia.

    We have whole-exome sequenced 176 individuals from the isolated population of the island of Vis in Croatia in order to describe exonic variation architecture. We found 290 577 single nucleotide variants (SNVs), 65% of which are singletons, low frequency or rare variants. A total of 25 430 (9%) SNVs are novel, previously not catalogued in NHLBI GO Exome Sequencing Project, UK10K-Generation Scotland, 1000Genomes Project, ExAC or NCBI Reference Assembly dbSNP. The majority of these variants (76%) are singletons. Comparable to data obtained from UK10K-Generation Scotland that were sequenced and analysed using the same protocols, we detected an enrichment of potentially damaging variants (non-synonymous and loss-of-function) in the low frequency and common variant categories. On average 115 (range 93-140) genotypes with loss-of-function variants, 23 (15-34) of which were homozygous, were identified per person. The landscape of loss-of-function variants across an exome revealed that variants mainly accumulated in genes on the xenobiotic-related pathways, the majority of which coded for enzymes. The frequency of loss-of-function variants was additionally increased in Vis runs of homozygosity regions where variants mainly affected signalling pathways. This work confirms the isolate status of the Vis population by means of whole-exome sequence and reveals the pattern of loss-of-function mutations, which resembles the trails of adaptive evolution that were found in other species. By cataloguing the exomic variants and describing the allelic structure of the Vis population, this study will serve as a valuable resource for future genetic studies of human diseases, population genetics and evolution in this population.

    Funded by: Medical Research Council: MC_PC_U127561128; Wellcome Trust: 098051

    European journal of human genetics : EJHG 2016;24;10;1479-87

  • Heterogeneity of CD34 and CD38 expression in acute B lymphoblastic leukemia cells is reversible and not hierarchically organized.

    Jiang Z, Deng M, Wei X, Ye W, Xiao Y, Lin S, Wang S, Li B, Liu X, Zhang G, Lai P, Weng J, Wu D, Chen H, Wei W, Ma Y, Li Y, Liu P, Du X, Pei D, Yao Y, Xu B and Li P

    State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, 190 Kaiyuan Avenue, Science Park, Guangzhou, Guangdong, 510530, China.

    The existence and identification of leukemia-initiating cells in adult acute B lymphoblastic leukemia (B-ALL) remain controversial. We examined whether adult B-ALL is hierarchically organized into phenotypically distinct subpopulations of leukemogenic and non-leukemogenic cells or whether most B-ALL cells retain leukemogenic capacity, irrespective of their immunophenotype profiles. Our results suggest that adult B-ALL follows the stochastic stem cell model and that the expression of CD34 and CD38 in B-ALL is reversible and not hierarchically organized.

    Journal of hematology & oncology 2016;9;1;94

  • Identification of new heat-stable (STa) enterotoxin allele variants produced by human enterotoxigenic Escherichia coli (ETEC).

    Joffré E, von Mentzer A, Svennerholm AM and Sjöling Å

    Department of Microbiology and Immunology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; Institute of Molecular Biology and Biotechnology, Universidad Mayor de San Andrés, La Paz, Bolivia.

    We describe natural variants of the heat-stable toxin (STa) produced by enterotoxigenic Escherichia coli (ETEC) isolates collected worldwide. Previous studies of ETEC isolated from human diarrheal cases have reported the existence of three natural STa gene variants estA1, estA2 and estA3/4 where the first variant encodes STp (porcine, bovine, and human origin) and the two latter ones encode STh (human origin). We identified STa sequences by BLASTn and profiled ST amino acid polymorphisms in a collection of 118 clinical ETEC isolates from children and adults from Asia, Africa, and Latin America that were characterized by whole genome sequencing. Three novel variants of STp and STh were found and designated STa5 and STa6, and STa7, respectively. Presence of glucose significantly decreased the production of STh and STp toxin variants (p<0.05) as well as downregulated the gene expression (STh: p<0.001, STp: p<0.05). We found that the ETEC isolates producing the most common STp variant, STa5, co-expressed coli surface antigen CS6 and were significantly associated with disease in adults in this data set (p<0.001). Expression of mature STa5 peptide as well as gene expression of tolC, involved in ST secretion, increased in response to bile (p<0.05). ETEC expressing the common STh variant STa3/4 was associated with disease in children (p<0.05). The crp gene, which positively regulates estA3/4 encoding STa3/4, and estA3/4 itself had decreased transcriptional levels in the presence of bile. Since bile levels in the intestine are lower in children than adults, these results may suggest differences in pathogenicity of ETEC in children and adult populations.

    International journal of medical microbiology : IJMM 2016

  • The type III secretion system effector SptP of Salmonella enterica serovar Typhi.

    Johnson R, Byrne A, Berger CN, Klemm E, Crepin VF, Dougan G and Frankel G

    MRC Centre for Molecular Bacteriology and Infection, Department of Life Sciences, Imperial College London, London, United Kingdom.

    Salmonella enterica serovars cause gastroenteritis or typhoid fever in humans, with virulence depending on the action of two type III secretion systems (SPI-1 and SPI-2). SptP is a Salmonella SPI-1 effector, involved in mediating recovery of the host cytoskeleton post-infection. SptP requires a chaperone, SicP, for stability and secretion. SptP has 94% identity between S. Typhimurium and S. Typhi; direct comparison of the protein sequences revealed that S. Typhi SptP has numerous amino acid changes within its chaperone-binding domain. Subsequent comparison of ΔsptP S. Typhi and S. Typhimurium strains demonstrated that, unlike in S. Typhimurium, SptP in S. Typhi was not involved in invasion or cytoskeletal recovery post-infection. Investigating whether the observed amino acid changes within SptP of S. Typhi affected its function revealed that S. Typhi SptP was unable to complement S. Typhimurium ΔsptP due to an absence of secretion. We further demonstrated that whilst S. Typhimurium SptP is stable intracellularly within S. Typhi, S. Typhi SptP is unstable, although stability could be recovered following replacement of the chaperone-binding domain with that of S. Typhimurium. Direct assessment of the strength of interaction between SptP and SicP of both serovars via bacterial two hybrid demonstrated that S. Typhi SptP has a significantly weaker interaction with SicP than the equivalent proteins in S. Typhimurium. Taken together, our results suggest that changes within the chaperone-binding domain of SptP in S. Typhi hinder binding to its chaperone, resulting in instability and preventing translocation, and therefore restricting the intracellular activity of this effector.

    Importance: Studies investigating Salmonella pathogenesis typically rely on Salmonella Typhimurium, despite Salmonella Typhi causing the more severe disease in humans. As such, an understanding of S. Typhi pathogenesis is lacking. Differences within the type III secretion system effector, SptP, between typhoidal and non-typhoidal serovars led us to characterise this effector within S. Typhi. Our results suggest that SptP is not translocated from typhoidal serovars, despite loss of sptP resulting in virulence defects in S. Typhimurium. Although SptP is just one effector, our results exemplify that the behaviour of these serovars is significantly different, and genes identified as important for S. Typhimurium virulence may not translate to S. Typhi.

    Journal of bacteriology 2016

  • Heterozygous KIDINS220/ARMS nonsense variants cause spastic paraplegia, intellectual disability, nystagmus, and obesity.

    Josifova DJ, Monroe GR, Tessadori F, de Graaff E, van der Zwaag B, Mehta SG, DDD Study, Harakalova M, Duran KJ, Savelberg SM, Nijman IJ, Jungbluth H, Hoogenraad CC, Bakkers J, Knoers NV, Firth HV, Beales PL, van Haaften G and van Haelst MM

    Department of Clinical Genetics, Guys' and St. Thomas' Hospital, London SE1 7EH, UK.

    We identified de novo nonsense variants in KIDINS220/ARMS in three unrelated patients with spastic paraplegia, intellectual disability, nystagmus, and obesity (SINO). KIDINS220 is an essential scaffold protein coordinating neurotrophin signal pathways in neurites and is spatially and temporally regulated in the brain. Molecular analysis of patients' variants confirmed expression and translation of truncated transcripts similar to recently characterized alternative terminal exon splice isoforms of KIDINS220. KIDINS220 undergoes extensive alternative splicing in specific neuronal populations and developmental time points, reflecting its complex role in neuronal maturation. In mice and humans, KIDINS220 is alternatively spliced in the middle region as well as in the last exon. These full-length and KIDINS220 splice variants occur at precise moments in cortical, hippocampal, and motor neuron development, with splice variants similar to the variants seen in our patients and lacking the last exon of KIDINS220 occurring in adult rather than in embryonic brain. We conducted tissue-specific expression studies in zebrafish that resulted in spasms, confirming a functional link with disruption of the KIDINS220 levels in developing neurites. This work reveals a crucial physiological role of KIDINS220 in development and provides insight into how perturbation of the complex interplay of KIDINS220 isoforms and their relative expression can affect neuron control and human metabolism. Altogether, we here show that de novo protein-truncating KIDINS220 variants cause a new syndrome, SINO. This is the first report of KIDINS220 variants causing a human disease.

    Human molecular genetics 2016;25;11;2158-2167

  • New native South American Y chromosome lineages.

    Jota MS, Lacerda DR, Sandoval JR, Vieira PP, Ohasi D, Santos-Júnior JE, Acosta O, Cuellar C, Revollo S, Paz-Y-Miño C, Fujita R, Vallejo GA, Schurr TG, Tarazona-Santos EM, Pena SD, Ayub Q, Tyler-Smith C and Santos FR

    Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

    Many single-nucleotide polymorphisms (SNPs) in the non-recombining region of the human Y chromosome have been described in the last decade. High-coverage sequencing has helped to characterize new SNPs, which has in turn increased the level of detail in paternal phylogenies. However, these paternal lineages still provide insufficient information on population history and demography, especially for Native Americans. The present study aimed to identify informative paternal sublineages derived from the main founder lineage of the Americas, haplogroup Q-L54, in a sample of 1841 native South Americans. For this purpose, we used a Y-chromosomal genotyping multiplex platform and conventional genotyping methods to validate 34 new SNPs that were identified in the present study by sequencing, together with many Y-SNPs previously described in the literature. We updated the haplogroup Q phylogeny and identified two new Q-M3 and three new Q-L54*(xM3) sublineages defined by five informative SNPs, designated SA04, SA05, SA02, SA03 and SA29. Within the Q-M3, sublineage Q-SA04 was mostly found in individuals from ethnic groups belonging to the Tukanoan linguistic family in the northwest Amazon, whereas sublineage Q-SA05 was found in Peruvian and Bolivian Amazon ethnic groups. Within Q-L54*, the derived sublineages Q-SA03 and Q-SA02 were exclusively found among Coyaima individuals (Cariban linguistic family) from Colombia, while Q-SA29 was found only in Maxacali individuals (Jean linguistic family) from southeast Brazil. Furthermore, we validated the usefulness of several published SNPs among indigenous South Americans. This new Y chromosome haplogroup Q phylogeny offers an informative paternal genealogy to investigate the pre-Columbian history of South America. Journal of Human Genetics advance online publication, 31 March 2016; doi:10.1038/jhg.2016.26.

    Journal of human genetics 2016

  • Deficiency of the zinc finger protein ZFP106 causes motor and sensory neurodegeneration.

    Joyce PI, Fratta P, Landman AS, Mcgoldrick P, Wackerhage H, Groves M, Busam BS, Galino J, Corrochano S, Beskina OA, Esapa C, Ryder E, Carter S, Stewart M, Codner G, Hilton H, Teboul L, Tucker J, Lionikas A, Estabel J, Ramirez-Solis R, White JK, Brandner S, Plagnol V, Bennet DL, Abramov AY, Greensmith L, Fisher EM and Acevedo-Arozena A

    MRC Mammalian Genetics Unit, Harwell, Oxfordshire OX11 0RD, UK.

    Zinc finger motifs are distributed amongst many eukaryotic protein families, directing nucleic acid-protein and protein-protein interactions. Zinc finger protein 106 (ZFP106) has previously been associated with roles in immune response, muscle differentiation, testes development and DNA damage, although little is known about its specific function. To further investigate the function of ZFP106, we performed an in-depth characterization of Zfp106 deficient mice (Zfp106(-/-)), and we report a novel role for ZFP106 in motor and sensory neuronal maintenance and survival. Zfp106(-/-) mice develop severe motor abnormalities, major deficits in muscle strength and histopathological changes in muscle. Intriguingly, despite being highly expressed throughout the central nervous system, Zfp106(-/-) mice undergo selective motor and sensory neuronal and axonal degeneration specific to the spinal cord and peripheral nervous system. Neurodegeneration does not occur during development of Zfp106(-/-) mice, suggesting that ZFP106 is likely required for the maintenance of mature peripheral motor and sensory neurons. Analysis of embryonic Zfp106(-/-) motor neurons revealed deficits in mitochondrial function, with an inhibition of Complex I within the mitochondrial electron transport chain. Our results highlight a vital role for ZFP106 in sensory and motor neuron maintenance and reveal a novel player in mitochondrial dysfunction and neurodegeneration.

    Human molecular genetics 2016;25;2;291-307

  • Mutations at protein-protein interfaces: Small changes over big surfaces have large impacts on human health.

    Jubb HC, Pandurangan AP, Turner MA, Ochoa-Montaño B, Blundell TL and Ascher DB

    Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Many essential biological processes including cell regulation and signalling are mediated through the assembly of protein complexes. Changes to protein-protein interaction (PPI) interfaces can affect the formation of multiprotein complexes, and consequently lead to disruptions in interconnected networks of PPIs within and between cells, further leading to phenotypic changes as functional interactions are created or disrupted. Mutations altering PPIs have been linked to the development of genetic diseases including cancer and rare Mendelian diseases, and to the development of drug resistance. The importance of these protein mutations has led to the development of many resources for understanding and predicting their effects. We propose that a better understanding of how these mutations affect the structure, function, and formation of multiprotein complexes provides novel opportunities for tackling them, including the development of small-molecule drugs targeted specifically to mutated PPIs.

    Progress in biophysics and molecular biology 2016

  • Comparison of bacterial genome assembly software for MinION data and their applicability to medical microbiology.

    Judge K, Hunt M, Reuter S, Tracey A, Quail MA, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Level 5, Addenbrookes Hospital, CB2 0QQ Cambridge, UK.

    Translating the Oxford Nanopore MinION sequencing technology into medical microbiology requires on-going analysis that keeps pace with technological improvements to the instrument and release of associated analysis software. Here, we use a multidrug-resistant Enterobacter kobei isolate as a model organism to compare open source software for the assembly of genome data, and relate this to the time taken to generate actionable information. Three software tools (PBcR, Canu and miniasm) were used to assemble MinION data and a fourth (SPAdes) was used to combine MinION and Illumina data to produce a hybrid assembly. All four had a similar number of contigs and were more contiguous than the assembly using Illumina data alone, with SPAdes producing a single chromosomal contig. Evaluation of the four assemblies to represent the genome structure revealed a single large inversion in the SPAdes assembly, which also incorrectly integrated a plasmid into the chromosomal contig. Almost 50 %, 80 % and 90 % of MinION pass reads were generated in the first 6, 9 and 12 h, respectively. Using data from the first 6 h alone led to a less accurate, fragmented assembly, but data from the first 9 or 12 h generated similar assemblies to that from 48 h sequencing. Assemblies were generated in 2 h using Canu, indicating that going from isolate to assembled data is possible in less than 48 h. MinION data identified that genes responsible for resistance were carried by two plasmids encoding resistance to carbapenem and to sulphonamides, rifampicin and aminoglycosides, respectively.

    Microbial genomics 2016;2;9;e000085

  • Efficient gene targeting in mouse zygotes mediated by CRISPR/Cas9-protein.

    Jung CJ, Zhang J, Trenchard E, Lloyd KC, West DB, Rosen B and de Jong PJ

    University of California, San Francisco Benioff Children's Hospital Oakland Research Institute, Oakland, CA, 94609, USA.

    The CRISPR/Cas9 system has rapidly advanced targeted genome editing technologies. However, its efficiency in targeting with constructs in mouse zygotes via homology directed repair (HDR) remains low. Here, we systematically explored optimal parameters for targeting constructs in mouse zygotes via HDR using mouse embryonic stem cells as a model system. We characterized several parameters, including single guide RNA cleavage activity and the length and symmetry of homology arms in the construct, and we compared the targeting efficiency between Cas9, Cas9 nickase, and dCas9-FokI. We then applied the optimized conditions to zygotes, delivering Cas9 as either mRNA or protein. We found that Cas9 nucleo-protein complex promotes highly efficient, multiplexed targeting of circular constructs containing reporter genes and floxed exons. This approach allows for a one-step zygote injection procedure targeting multiple genes to generate conditional alleles via homologous recombination, and simultaneous knockout of corresponding genes in non-targeted alleles via non-homologous end joining.

    Transgenic research 2016

  • Targeting Chromatin Regulators Inhibits Leukemogenic Gene Expression in NPM1 Mutant Leukemia.

    Kühn MW, Song E, Feng Z, Sinha A, Chen CW, Deshpande AJ, Cusan M, Farnoud N, Mupo A, Grove C, Koche R, Bradner JE, de Stanchina E, Vassiliou GS, Hoshii T and Armstrong SA

    Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center.

    Homeobox (HOX) proteins and the receptor tyrosine kinase FLT3 are frequently highly expressed and mutated in acute myeloid leukemia (AML). Aberrant HOX expression is found in nearly all AMLs that harbor a mutation in the Nucleophosmin (NPM1) gene, and FLT3 is concomitantly mutated in approximately 60% of these cases. Little is known about how mutant NPM1 (NPM1mut) cells maintain aberrant gene expression. Here, we demonstrate that the histone modifiers MLL1 and DOT1L control HOX and FLT3 expression and differentiation in NPM1mut AML. Using a CRISPR-Cas9 genome editing domain screen, we show NPM1mut AML to be exceptionally dependent on the menin binding site in MLL1. Pharmacological small-molecule inhibition of the menin-MLL1 protein interaction had profound anti-leukemic activity in human and murine models of NPM1mut AML. Combined pharmacological inhibition of menin-MLL1 and DOT1L resulted in dramatic suppression of HOX and FLT3 expression, induction of differentiation, and superior activity against NPM1mut leukemia. Statement of significance: MLL1 and DOT1L are chromatin regulators that control HOX, MEIS1 and FLT3 expression and are therapeutic targets in NPM1mut AML. Combinatorial small-molecule inhibition has synergistic on-target activity and constitutes a novel therapeutic concept for this common AML subtype.

    Cancer discovery 2016

  • EPEC: a cocktail of virulence.

    Kallonen T and Boinett CJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Genomics studies are prompting a re-evaluation of the diversity of Escherichia coli pathovars and how this diversity corresponds to virulence.

    Funded by: Medical Research Council: G1100100

    Nature reviews. Microbiology 2016;14;4;196

  • Analysis with the exome array identifies multiple new independent variants in lipid loci.

    Kanoni S, Masca NG, Stirrups KE, Varga TV, Warren HR, Scott RA, Southam L, Zhang W, Yaghootkar H, Müller-Nurasyid M, Couto Alves A, Strawbridge RJ, Lataniotis L, Hashim NA, Besse C, Boland A, Braund PS, Connell JM, Dominiczak A, Farmaki AE, Franks S, Grallert H, Jansson JH, Karaleftheri M, Keinänen-Kiukaanniemi S, Matchan A, Pasko D, Peters A, Poulter N, Rayner NW, Renström F, Rolandsson O, Sabater-Lleal M, Sennblad B, Sever P, Shields D, Silveira A, Stanton AV, Strauch K, Tomaszewski M, Tsafantakis E, Waldenberger M, Blakemore AI, Dedoussis G, Escher SA, Kooner JS, McCarthy MI, Palmer CN, Wellcome Trust Case Control Consortium, Hamsten A, Caulfield MJ, Frayling TM, Tobin MD, Jarvelin MR, Zeggini E, Gieger C, Chambers JC, Wareham NJ, Munroe PB, Franks PW, Samani NJ and Deloukas P

    William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    It has been hypothesised that low frequency (1-5% MAF) and rare (<1% MAF) variants with large effect sizes may contribute to the missing heritability in complex traits. Here we report an association analysis of lipid traits (total cholesterol, LDL-cholesterol, HDL-cholesterol, triglycerides) in up to 27,312 individuals with a comprehensive set of low frequency coding variants (ExomeChip), combined with conditional analysis in the known lipid loci. No new locus reached genome-wide significance. However, we found a new lead variant in 26 known lipid association regions of which 16 were >1000 fold more significant than the previous sentinel variant and not in close LD (6 had MAF < 5%). Furthermore, conditional analysis revealed multiple independent signals (ranging from 1-5) in a third of the 98 lipid loci tested, including rare variants. Addition of our novel associations resulted in between 1.5-2.5 fold increase in the proportion of heritability explained for the different lipid traits. Our findings suggest that rare coding variants contribute to the genetic architecture of lipid traits.

    Human molecular genetics 2016

  • Retrospective Analysis of Serotype Switching of Vibrio cholerae O1 in a Cholera Endemic Region Shows It Is a Non-random Process.

    Karlsson SL, Thomson N, Mutreja A, Connor T, Sur D, Ali M, Clemens J, Dougan G, Holmgren J and Lebens M

    Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden.

    Genomic data generated from clinical Vibrio cholerae O1 isolates collected over a five-year period in an area of Kolkata, India with seasonal cholera outbreaks allowed a detailed genetic analysis of serotype switching that occurred from Ogawa to Inaba and back to Ogawa. The change from Ogawa to Inaba resulted from mutational disruption of the methyltransferase encoded by the wbeT gene. Re-emergence of the Ogawa serotype was found to result either from expansion of an already existing Ogawa clade or reversion of the mutation in an Inaba clade. Our data suggest that such transitions are not random events but rather driven by as yet unidentified selection mechanisms based on differences in the structure of the O1 antigen or in the serotype-determining wbeT gene.

    PLoS neglected tropical diseases 2016;10;10;e0005044

  • Improving the Identification of Phenotypic Abnormalities and Sexual Dimorphism in Mice When Studying Rare Event Categorical Characteristics.

    Karp NA, Heller R, Yaacoby S, White JK and Benjamini Y

    Wellcome Trust Sanger Institute.

    Biological research frequently involves the study of phenotyping data. Many of these studies focus on rare event categorical data, and in functional genomics typically study the presence or absence of an abnormal phenotype. With the growing interest in the role of sex, there is a need to assess the phenotype for sexual dimorphism. The identification of abnormal phenotypes for downstream research is challenged by the small sample size, the rare event nature, and the multiple testing problem, as many variables are monitored simultaneously. Here we develop a statistical pipeline to assess statistical and biological significance whilst managing the multiple testing problem. We propose a two-step pipeline to initially assess for a treatment effect, in our case example genotype, and then test for an interaction with sex. We compare multiple statistical methods and use simulations to investigate the control of the type one error rate and power. To maximize the power whilst addressing the multiple testing issue we implement filters to remove datasets where the hypotheses to be tested cannot achieve significance. A motivating case study utilizing a large scale high throughput mouse phenotyping dataset from the Wellcome Trust Sanger Institute Mouse Genetics Project, where the treatment is a gene ablation, demonstrates the benefits of the new pipeline on the downstream biological calls.

    Genetics 2016

  • The nucleosome landscape of Plasmodium falciparum reveals chromatin architecture and dynamics of regulatory sequences.

    Kensche PR, Hoeijmakers WA, Toenhake CG, Bras M, Chappell L, Berriman M and Bártfai R

    Department of Molecular Biology, Radboud University, 6525GA Nijmegen, The Netherlands.

    In eukaryotes, the chromatin architecture has a pivotal role in regulating all DNA-associated processes and it is central to the control of gene expression. For Plasmodium falciparum, a causative agent of human malaria, the nucleosome positioning profile of regulatory regions deserves particular attention because of their extreme AT-content. With the aid of a highly controlled MNase-seq procedure we reveal how positioning of nucleosomes provides a structural and regulatory framework to the transcriptional unit by demarcating landmark sites (transcription/translation start and end sites). In addition, our analysis provides strong indications for the function of positioned nucleosomes in splice site recognition. Transcription start sites (TSSs) are bordered by a small nucleosome-depleted region, but lack the stereotypic downstream nucleosome arrays, highlighting a key difference in chromatin organization compared to model organisms. Furthermore, we observe transcription-coupled eviction of nucleosomes on strong TSSs during intraerythrocytic development and demonstrate that nucleosome positioning and dynamics can be predictive for the functionality of regulatory DNA elements. Collectively, the strong nucleosome positioning over splice sites and surrounding putative transcription factor binding sites highlights the regulatory capacity of the nucleosome landscape in this deadly human pathogen.

    Funded by: Wellcome Trust: WT 098051

    Nucleic acids research 2016;44;5;2110-24

  • Polymorphism in a lincRNA Associates with a Doubled Risk of Pneumococcal Bacteremia in Kenyan Children.

    Kenyan Bacteraemia Study Group, Wellcome Trust Case Control Consortium 2 (WTCCC2), Rautanen A, Pirinen M, Mills TC, Rockett KA, Strange A, Ndungu AW, Naranbhai V, Gilchrist JJ, Bellenguez C, Freeman C, Band G, Bumpstead SJ, Edkins S, Giannoulatou E, Gray E, Dronov S, Hunt SE, Langford C, Pearson RD, Su Z, Vukcevic D, Macharia AW, Uyoga S, Ndila C, Mturi N, Njuguna P, Mohammed S, Berkley JA, Mwangi I, Mwarumba S, Kitsao BS, Lowe BS, Morpeth SC, Khandwalla I, Kilifi Bacteraemia Surveillance Group, Blackwell JM, Bramon E, Brown MA, Casas JP, Corvin A, Duncanson A, Jankowski J, Markus HS, Mathew CG, Palmer CN, Plomin R, Sawcer SJ, Trembath RC, Viswanathan AC, Wood NW, Deloukas P, Peltonen L, Williams TN, Scott JA, Chapman SJ, Donnelly P, Hill AV and Spencer CC

    Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

    Bacteremia (bacterial bloodstream infection) is a major cause of illness and death in sub-Saharan Africa but little is known about the role of human genetics in susceptibility. We conducted a genome-wide association study of bacteremia susceptibility in more than 5,000 Kenyan children as part of the Wellcome Trust Case Control Consortium 2 (WTCCC2). Both the blood-culture-proven bacteremia case subjects and healthy infants as controls were recruited from Kilifi, on the east coast of Kenya. Streptococcus pneumoniae is the most common cause of bacteremia in Kilifi and was thus the focus of this study. We identified an association between polymorphisms in a long intergenic non-coding RNA (lincRNA) gene (AC011288.2) and pneumococcal bacteremia and replicated the results in the same population (p combined = 1.69 × 10⁻⁹; OR = 2.47, 95% CI = 1.84-3.31). The susceptibility allele is African specific, derived rather than ancestral, and occurs at low frequency (2.7% in control subjects and 6.4% in case subjects). Our further studies showed AC011288.2 expression only in neutrophils, a cell type that is known to play a major role in pneumococcal clearance. Identification of this novel association will further focus research on the role of lincRNAs in human infectious disease.
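
    As a rough plausibility check on the figures quoted above, a crude allelic odds ratio can be computed directly from the two allele frequencies. This is only a sketch: the function name is hypothetical, the frequencies are the rounded values given in the abstract, and the published OR of 2.47 comes from the study's own association analysis, not from this shortcut.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls

# Allele frequencies as quoted in the abstract (already rounded there).
crude_or = allelic_odds_ratio(0.064, 0.027)
print(f"crude allelic OR = {crude_or:.2f}")
```

    The crude value comes out at about 2.46, close to the reported estimate and well inside its 95% confidence interval.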

    American journal of human genetics 2016;98;6;1092-100

  • High-throughput DNA methylation analysis in anorexia nervosa confirms TNXB hypermethylation.

    Kesselmeier M, Pütter C, Volckmar AL, Baurecht H, Grallert H, Illig T, Ismail K, Ollikainen M, Silén Y, Keski-Rahkonen A, Bulik CM, Collier DA, Zeggini E, Hebebrand J, Scherag A, Hinney A and GCAN and WTCCC3

    Clinical Epidemiology, Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Jena, Germany.

    Objectives: Patients with anorexia nervosa (AN) are ideally suited to identify differentially methylated genes in response to starvation.

    Methods: We examined high-throughput DNA methylation derived from whole blood of 47 females with AN, 47 lean females without AN and 100 population-based females to compare AN with both controls. To account for different cell type compositions, we applied two reference-free methods (FastLMM-EWASher, RefFreeEWAS) and searched for consensus CpG sites identified by both methods. We used a validation sample of five monozygotic AN-discordant twin pairs.

    Results: Fifty-one consensus sites were identified in AN vs. lean and 81 in AN vs. population-based comparisons. These sites have not been reported in AN methylation analyses, but for the latter comparison 54/81 sites showed directionally consistent differential methylation effects in the AN-discordant twins. For a single nucleotide polymorphism rs923768 in CSGALNACT1 a nearby site was nominally associated with AN. At the gene level, we confirmed hypermethylated sites at TNXB. We found support for a locus at NR1H3 in the AN vs. lean control comparison, but the methylation direction was opposite to the one previously reported.

    Conclusions: We confirm genes like TNXB previously described to comprise differentially methylated sites, and highlight further sites that might be specifically involved in AN starvation processes.

    The world journal of biological psychiatry : the official journal of the World Federation of Societies of Biological Psychiatry 2016;1-13

  • Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA.

    Kettunen J, Demirkan A, Würtz P, Draisma HH, Haller T, Rawal R, Vaarhorst A, Kangas AJ, Lyytikäinen LP, Pirinen M, Pool R, Sarin AP, Soininen P, Tukiainen T, Wang Q, Tiainen M, Tynkkynen T, Amin N, Zeller T, Beekman M, Deelen J, van Dijk KW, Esko T, Hottenga JJ, van Leeuwen EM, Lehtimäki T, Mihailov E, Rose RJ, de Craen AJ, Gieger C, Kähönen M, Perola M, Blankenberg S, Savolainen MJ, Verhoeven A, Viikari J, Willemsen G, Boomsma DI, van Duijn CM, Eriksson J, Jula A, Järvelin MR, Kaprio J, Metspalu A, Raitakari O, Salomaa V, Slagboom PE, Waldenberger M, Ripatti S and Ala-Korpela M

    Computational Medicine, Faculty of Medicine, University of Oulu, PO Box 5000, 90014 Oulu, Finland.

    Genome-wide association studies have identified numerous loci linked with complex diseases, for which the molecular mechanisms remain largely unclear. Comprehensive molecular profiling of circulating metabolites captures highly heritable traits, which can help to uncover metabolic pathophysiology underlying established disease variants. We conduct an extended genome-wide association study of genetic influences on 123 circulating metabolic traits quantified by nuclear magnetic resonance metabolomics from up to 24,925 individuals and identify eight novel loci for amino acids, pyruvate and fatty acids. The LPA locus link with cardiovascular risk exemplifies how detailed metabolic profiling may inform underlying aetiology via extensive associations with very-low-density lipoprotein and triglyceride metabolism. Genetic fine mapping and Mendelian randomization uncover wide-spread causal effects of lipoprotein(a) on overall lipoprotein metabolism and we assess potential pleiotropic consequences of genetically elevated lipoprotein(a) on diverse morbidities via electronic health-care records. Our findings strengthen the argument for safe LPA-targeted intervention to reduce cardiovascular risk.

    Nature communications 2016;7;11122

  • Adults with suspected central nervous system infection: A prospective study of diagnostic accuracy.

    Khatib U, van de Beek D, Lees JA and Brouwer MC

    Department of Neurology, Center of Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands.

    Objectives: To study the diagnostic accuracy of clinical and laboratory features in the diagnosis of central nervous system (CNS) infection and bacterial meningitis.

    Methods: We included consecutive adult episodes with suspected CNS infection who underwent cerebrospinal fluid (CSF) examination. The reference standard was the diagnosis classified into five categories: 1) CNS infection; 2) CNS inflammation without infection; 3) other neurological disorder; 4) non-neurological infection; and 5) other systemic disorder.

    Results: Between 2012 and 2015, 363 episodes of suspected CNS infection were included. CSF examination showed a leucocyte count >5/mm³ in 47% of episodes. Overall, 89 of 363 episodes were categorized as CNS infection (25%; most commonly viral meningitis [7%], bacterial meningitis [7%], and viral encephalitis [4%]), 36 (10%) episodes as CNS inflammatory disorder, 111 (31%) as systemic infection, 119 (33%) as other neurological disorder, and 8 (2%) as other systemic disorders. Diagnostic accuracy of individual clinical characteristics and blood tests for the diagnosis of CNS infection or bacterial meningitis was low. CSF leucocytosis differentiated best between bacterial meningitis and other diagnoses (area under the curve [AUC] 0.95) or any neurological infection versus other diagnoses (AUC 0.93).

    Conclusions: Clinical characteristics fail to differentiate between neurological infections and other diagnoses, and CSF analysis is the main contributor to the final diagnosis.

    The Journal of infection 2016

  • Diagnostic Yield of Sequencing Familial Hypercholesterolemia Genes in Patients with Severe Hypercholesterolemia.

    Khera AV, Won HH, Peloso GM, Lawson KS, Bartz TM, Deng X, van Leeuwen EM, Natarajan P, Emdin CA, Bick AG, Morrison AC, Brody JA, Gupta N, Nomura A, Kessler T, Duga S, Bis JC, van Duijn CM, Cupples LA, Psaty B, Rader DJ, Danesh J, Schunkert H, McPherson R, Farrall M, Watkins H, Lander E, Wilson JG, Correa A, Boerwinkle E, Merlini PA, Ardissino D, Saleheen D, Gabriel S and Kathiresan S

    Center for Human Genetic Research, Cardiovascular Research Center and Cardiology Division, Massachusetts General Hospital, Harvard Medical School, Boston MA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA.

    Background: About 7% of US adults have severe hypercholesterolemia (untreated LDL cholesterol ≥190 mg/dl). Such high LDL levels may be due to familial hypercholesterolemia (FH), a condition caused by a single mutation in any of three genes. Lifelong elevations in LDL cholesterol in FH mutation carriers may confer CAD risk beyond that captured by a single LDL cholesterol measurement.

    Objectives: Assess the prevalence of a FH mutation among those with severe hypercholesterolemia and determine whether CAD risk varies according to mutation status beyond the observed LDL cholesterol.

    Methods: Three genes causative for FH (LDLR, APOB, PCSK9) were sequenced in 26,025 participants from 7 case-control studies (5,540 CAD cases, 8,577 CAD-free controls) and 5 prospective cohort studies (11,908 participants). FH mutations included loss-of-function variants in LDLR, missense mutations in LDLR predicted to be damaging, and variants linked to FH in ClinVar, a clinical genetics database.

    Results: Among 8,577 CAD-free control participants, 430 had LDL cholesterol ≥190 mg/dl; of these, only eight (1.9%) carried a FH mutation. Similarly, among 11,908 participants from 5 prospective cohorts, 956 had LDL cholesterol ≥190 mg/dl and of these, only 16 (1.7%) carried a FH mutation. Within any stratum of observed LDL cholesterol, risk of CAD was higher among FH mutation carriers when compared with non-carriers. When compared to a reference group with LDL cholesterol <130 mg/dl and no mutation, participants with LDL cholesterol ≥190 mg/dl and no FH mutation had six-fold higher risk for CAD (OR 6.0; 95%CI 5.2-6.9) whereas those with LDL cholesterol ≥190 mg/dl as well as a FH mutation demonstrated twenty-two fold increased risk (OR 22.3; 95%CI 10.7-53.2).

    Conclusions: Among individuals with LDL cholesterol ≥190 mg/dl, gene sequencing identified a FH mutation in <2%. However, for any given observed LDL cholesterol, FH mutation carriers are at substantially increased risk for CAD.
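
    The yield figures in the Results paragraph follow directly from the stated counts. The short sketch below reproduces that arithmetic; the function name is hypothetical and only the counts given in the abstract are used.

```python
def diagnostic_yield(carriers: int, tested: int) -> float:
    """Percentage of tested participants who carry an FH mutation."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * carriers / tested

# Counts as stated in the Results paragraph (LDL cholesterol >= 190 mg/dl).
controls_yield = diagnostic_yield(8, 430)
cohort_yield = diagnostic_yield(16, 956)
print(f"CAD-free controls: {controls_yield:.1f}%")
print(f"Prospective cohorts: {cohort_yield:.1f}%")
```

    Both values round to the reported 1.9% and 1.7%, consistent with the conclusion that sequencing identified an FH mutation in fewer than 2% of those with severe hypercholesterolemia.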

    Journal of the American College of Cardiology 2016

  • Evolutionary dynamics of Anolis sex chromosomes revealed by sequencing of flow sorting-derived microchromosome-specific DNA.

    Kichigin IG, Giovannotti M, Makunin AI, Ng BL, Kabilov MR, Tupikin AE, Barucchi VC, Splendiani A, Ruggeri P, Rens W, O'Brien PC, Ferguson-Smith MA, Graphodatsky AS and Trifonov VA

    Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, 630090, Russia.

    Squamate reptiles show a striking diversity in modes of sex determination, including both genetic (XY or ZW) and temperature-dependent sex determination systems. The genomes of only a handful of species have been sequenced, analyzed and assembled including the genome of Anolis carolinensis. Despite a high genome coverage, only macrochromosomes of A. carolinensis were assembled whereas the content of most microchromosomes remained unclear. Most of the Anolis species have a homomorphic XY sex chromosome system. However, some species have large heteromorphic XY chromosomes (e.g., A. sagrei) and even multiple sex chromosome systems (e.g., A. pogus), that were shown to be derived from fusions of the ancestral XY with microautosomes. We applied next generation sequencing of flow sorting-derived chromosome-specific DNA pools to characterize the content and composition of microchromosomes in A. carolinensis and A. sagrei. Comparative analysis of sequenced chromosome-specific DNA pools revealed that the A. sagrei XY sex chromosomes contain regions homologous to several microautosomes of A. carolinensis. We suggest that the sex chromosomes of A. sagrei are derived by fusions of the ancestral sex chromosome with three microautosomes and subsequent loss of some genetic content on the Y chromosome.

    Molecular genetics and genomics : MGG 2016;291;5;1955-66

  • De Novo Mutations in SON Disrupt RNA Splicing of Genes Essential for Brain Development and Metabolism, Causing an Intellectual-Disability Syndrome.

    Kim JH, Shinde DN, Reijnders MRF, Hauser NS, Belmonte RL, Wilson GR, Bosch DGM, Bubulya PA, Shashi V, Petrovski S, Stone JK, Park EY, Veltman JA, Sinnema M, Stumpel CTRM, Draaisma JM, Nicolai J, University of Washington Center for Mendelian Genomics, Yntema HG, Lindstrom K, de Vries BBA, Jewett T, Santoro SL, Vogt J, Deciphering Developmental Disorders Study, Bachman KK, Seeley AH, Krokosky A, Turner C, Rohena L, Hempel M, Kortüm F, Lessel D, Neu A, Strom TM, Wieczorek D, Bramswig N, Laccone FA, Behunova J, Rehder H, Gordon CT, Rio M, Romana S, Tang S, El-Khechen D, Cho MT, McWalter K, Douglas G, Baskin B, Begtrup A, Funari T, Schoch K, Stegmann APA, Stevens SJC, Zhang DE, Traver D, Yao X, MacArthur DG, Brunner HG, Mancini GM, Myers RM, Owen LB, Lim ST, Stachura DL, Vissers LELM and Ahn EE

    Mitchell Cancer Institute, University of South Alabama, Mobile, AL 36604, USA.

    The overall understanding of the molecular etiologies of intellectual disability (ID) and developmental delay (DD) is increasing as next-generation sequencing technologies identify genetic variants in individuals with such disorders. However, detailed analyses conclusively confirming these variants, as well as the underlying molecular mechanisms explaining the diseases, are often lacking. Here, we report on an ID syndrome caused by de novo heterozygous loss-of-function (LoF) mutations in SON. The syndrome is characterized by ID and/or DD, malformations of the cerebral cortex, epilepsy, vision problems, musculoskeletal abnormalities, and congenital malformations. Knockdown of son in zebrafish resulted in severe malformation of the spine, brain, and eyes. Importantly, analyses of RNA from affected individuals revealed that genes critical for neuronal migration and cortex organization (TUBG1, FLNA, PNKP, WDR62, PSMD3, and HDAC6) and metabolism (PCK2, PFKL, IDH2, ACY1, and ADA) are significantly downregulated because of the accumulation of mis-spliced transcripts resulting from erroneous SON-mediated RNA splicing. Our data highlight SON as a master regulator governing neurodevelopment and demonstrate the importance of SON-mediated RNA splicing in human development.

    Funded by: NCI NIH HHS: R01 CA190688, R21 CA185818; NHGRI NIH HHS: U54 HG006493, UM1 HG006493; NIGMS NIH HHS: R15 GM084407

    American journal of human genetics 2016;99;3;711-719

  • Advances in Understanding Bacterial Pathogenesis Gained from Whole-Genome Sequencing and Phylogenetics.

    Klemm E and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The development of next-generation sequencing as a cost-effective technology has facilitated the analysis of bacterial population structure at a whole-genome level and at scale. From these data, phylogenetic trees have been constructed that define population structures at a local, national, and global level, providing a framework for genetic analysis. Although still at an early stage, these approaches have yielded progress in several areas, including pathogen transmission mapping, the genetics of niche colonization and host adaptation, as well as gene-to-phenotype association studies. Antibiotic resistance has proven to be a major challenge in the early 21st century, and phylogenetic analyses have uncovered the dramatic effect that the use of antibiotics has had on shaping bacterial population structures. An update on insights into bacterial evolution from comparative genomics is provided in this review.

    Cell host & microbe 2016;19;5;599-610

  • Emergence of host-adapted Salmonella Enteritidis through rapid evolution in an immunocompromised host.

    Klemm EJ, Gkrania-Klotsas E, Hadfield J, Forbester JL, Harris SR, Hale C, Heath JN, Wileman T, Clare S, Kane L, Goulding D, Otto TD, Kay S, Doffinger R, Cooke FJ, Carmichael A, Lever AM, Parkhill J, MacLennan CA, Kumararatne D, Dougan G and Kingsley RA

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Host adaptation is a key factor contributing to the emergence of new bacterial, viral and parasitic pathogens. Many pathogens are considered promiscuous because they cause disease across a range of host species, while others are host-adapted, infecting particular hosts. Host adaptation can potentially progress to host restriction where the pathogen is strictly limited to a single host species and is frequently associated with more severe symptoms. Host-adapted and host-restricted bacterial clades evolve from within a broader host-promiscuous species and sometimes target different niches within their specialist hosts, such as adapting from a mucosal to a systemic lifestyle. Genome degradation, marked by gene inactivation and deletion, is a key feature of host adaptation, although the triggers initiating genome degradation are not well understood. Here, we show that a chronic systemic non-typhoidal Salmonella infection in an immunocompromised human patient resulted in genome degradation targeting genes that are expendable for a systemic lifestyle. We present a genome-based investigation of a recurrent blood-borne Salmonella enterica serotype Enteritidis (S. Enteritidis) infection covering 15 years in an interleukin (IL)-12 β-1 receptor-deficient individual that developed into an asymptomatic chronic infection. The infecting S. Enteritidis harbored a mutation in the mismatch repair gene mutS that accelerated the genomic mutation rate. Phylogenetic analysis and phenotyping of multiple patient isolates provides evidence for a remarkable level of within-host evolution that parallels genome changes present in successful host-restricted bacterial pathogens but never before observed on this timescale. Our analysis identifies common pathways of host adaptation and demonstrates the role that immunocompromised individuals can play in this process.

    Funded by: Wellcome Trust: 098051

    Nature microbiology 2016;1;3

  • Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy.

    Knijnenburg TA, Klau GW, Iorio F, Garnett MJ, McDermott U, Shmulevich I and Wessels LF

    Institute for Systems Biology, Seattle, US.

    Mining large datasets using machine learning approaches often leads to models that are hard to interpret and not amenable to the generation of hypotheses that can be experimentally tested. We present 'Logic Optimization for Binary Input to Continuous Output' (LOBICO), a computational approach that infers small and easily interpretable logic models of binary input features that explain a continuous output variable. Applying LOBICO to a large cancer cell line panel, we find that logic combinations of multiple mutations are more predictive of drug response than single gene predictors. Importantly, we show that the use of the continuous information leads to robust and more accurate logic models. LOBICO implements the ability to uncover logic models around predefined operating points in terms of sensitivity and specificity. As such, it represents an important step towards practical application of interpretable logic models.
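
    To make the idea of a small logic model over binary inputs concrete, the toy sketch below exhaustively scores two-feature AND/OR models and penalises each misclassified sample by its distance from a binarisation threshold, so the continuous output weights the errors. It is an illustration only and not the published LOBICO algorithm: the function name, the brute-force search, the threshold and the data are all assumptions invented for this example.

```python
from itertools import combinations

def best_two_feature_model(X, y, threshold):
    """Exhaustively score AND/OR models over pairs of binary features.

    X: list of rows, each a list of 0/1 features (hypothetical mutation calls).
    y: continuous outputs (hypothetical drug-response values).
    Samples with y below `threshold` are treated as the positive class; each
    misclassified sample costs |y - threshold|, so the continuous values
    weight the errors instead of being discarded.
    """
    ops = {"AND": lambda a, b: a and b, "OR": lambda a, b: a or b}
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        for name, op in ops.items():
            cost = 0.0
            for row, value in zip(X, y):
                predicted = bool(op(row[i], row[j]))
                actual = value < threshold
                if predicted != actual:
                    cost += abs(value - threshold)
            # Keep the first model with the strictly lowest weighted error.
            if best is None or cost < best[0]:
                best = (cost, name, i, j)
    return best

# Toy data invented for illustration: the output is low only when
# features 0 and 2 are both present.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0]]
y = [0.1, 0.9, 0.8, 0.2, 1.0, 0.95]
cost, op_name, i, j = best_two_feature_model(X, y, threshold=0.5)
print(op_name, i, j, cost)
```

    On this toy input the search selects the model "feature 0 AND feature 2" with zero weighted error, the kind of compact, readable rule the abstract describes.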

    Funded by: NCI NIH HHS: U24 CA143835

    Scientific reports 2016;6;36812

  • A novel signalling screen demonstrates that CALR mutations activate essential MAPK signalling and facilitate megakaryocyte differentiation.

    Kollmann K, Warsch W, Gonzalez-Arias C, Nice FL, Avezov E, Milburn J, Li J, Dimitropoulou D, Biddie S, Wang M, Poynton E, Colzani M, Tijssen MR, Anand S, McDermott U, Huntly B and Green T

    Cambridge Institute for Medical Research and Wellcome Trust/MRC Stem Cell Institute, University of Cambridge, Cambridge, UK.

    Most MPN patients lacking JAK2 mutations harbour somatic CALR mutations that are thought to activate cytokine signalling although the mechanism is unclear. To identify kinases important for survival of CALR-mutant cells we developed a novel strategy (KISMET) which utilises the full range of kinase selectivity data available from each inhibitor and thus takes advantage of off-target noise that limits conventional siRNA or inhibitor screens. KISMET successfully identified known essential kinases in haematopoietic and non-haematopoietic cell lines and identified the MAPK pathway as required for growth of the CALR-mutated MARIMO cells. Expression of mutant CALR in murine or human haematopoietic cell lines was accompanied by MPL-dependent activation of MAPK signalling, and MPN patients with CALR mutations showed increased MAPK activity in CD34-cells, platelets and megakaryocytes. Although CALR mutations resulted in protein instability and proteasomal degradation, mutant CALR was able to enhance megakaryopoiesis and pro-platelet production from human CD34+ progenitors. These data link aberrant MAPK activation to the MPN phenotype and identify it as a potential therapeutic target in CALR-mutant positive MPNs. Leukemia accepted article preview online, 14 October 2016. doi:10.1038/leu.2016.280.

    Leukemia 2016

  • Bi-allelic Truncating Mutations in TANGO2 Cause Infancy-Onset Recurrent Metabolic Crises with Encephalocardiomyopathy.

    Kremer LS, Distelmaier F, Alhaddad B, Hempel M, Iuso A, Küpper C, Mühlhausen C, Kovacs-Nagy R, Satanovskij R, Graf E, Berutti R, Eckstein G, Durbin R, Sauer S, Hoffmann GF, Strom TM, Santer R, Meitinger T, Klopstock T, Prokisch H and Haack TB

    Institute of Human Genetics, Technische Universität München, 81675 München, Germany; Institute of Human Genetics, Helmholtz Zentrum München, 85764 Neuherberg, Germany.

    Molecular diagnosis of mitochondrial disorders is challenging because of extreme clinical and genetic heterogeneity. By exome sequencing, we identified three different bi-allelic truncating mutations in TANGO2 in three unrelated individuals with infancy-onset episodic metabolic crises characterized by encephalopathy, hypoglycemia, rhabdomyolysis, arrhythmias, and laboratory findings suggestive of a defect in mitochondrial fatty acid oxidation. Over the course of the disease, all individuals developed global brain atrophy with cognitive impairment and pyramidal signs. TANGO2 (transport and Golgi organization 2) encodes a protein with a putative function in redistribution of Golgi membranes into the endoplasmic reticulum in Drosophila and a mitochondrial localization has been confirmed in mice. Investigation of palmitate-dependent respiration in mutant fibroblasts showed evidence of a functional defect in mitochondrial β-oxidation. Our results establish TANGO2 deficiency as a clinically recognizable cause of pediatric disease with multi-organ involvement.

    American journal of human genetics 2016;98;2;358-62

  • Benzalkonium tolerance genes and outcome in Listeria monocytogenes meningitis.

    Kremer PH, Lees JA, Koopmans MM, Ferwerda B, Arends AW, Feller MM, Schipper K, Seron MV, van der Ende A, Brouwer MC, van de Beek D and Bentley SD

    Department of Neurology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands.

    Objectives: Listeria monocytogenes is a foodborne pathogen that can cause meningitis. The listerial genotype ST6 has been linked to increasing rates of unfavourable outcome over time. We investigated listerial genetic variation and the relation with clinical outcome in meningitis.

    Methods: We sequenced 96 isolates from adults with listerial meningitis included in two prospective nationwide cohort studies by whole genome sequencing, and evaluated associations between bacterial genetic variation and clinical outcome. We validated these results by screening listerial genotypes of 445 cerebrospinal fluid and blood isolates from patients over a 30-year period from the Dutch national surveillance cohort.

    Results: We identified a bacteriophage, phiLMST6 co-occurring with a novel plasmid, pLMST6 in ST6 isolates to be associated with unfavourable outcome in patients (P=2.83e-05). The plasmid carries a benzalkonium chloride tolerance gene, emrC, conferring decreased susceptibility to disinfectants used in the food-processing industry. Isolates harbouring emrC were growth inhibited at higher levels of benzalkonium chloride (median 60 mg/L versus 15 mg/L; P<0.001), and had higher minimum inhibitory concentrations for amoxicillin and gentamicin compared to isolates without emrC (both P<0.001). Transformation of pLMST6 into naïve strains led to benzalkonium chloride tolerance and higher minimum inhibitory concentrations for gentamicin.

    Conclusions: These results show that a novel plasmid, carrying the efflux transporter emrC, is associated with increased incidence of ST6 listerial meningitis in The Netherlands. Suggesting increased disease severity, our findings warrant consideration of disinfectants used in the food-processing industry that select for resistance mechanisms and may, inadvertently, lead to increased risk of poor disease outcome.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2016

  • Combined immunodeficiency with severe inflammation and allergy caused by ARPC1B deficiency.

    Kuijpers TW, Tool AT, van der Bijl I, de Boer M, van Houdt M, de Cuyper IM, Roos D, van Alphen F, van Leeuwen K, Cambridge E, Arends MJ, Dougan G, Clare S, Ramirez-Solis R, Pals ST, Adams DJ, Meijer AB and van den Berg TK

    Department of Pediatric Hematology, Immunology and Infectious Diseases, Emma Children's Hospital, Academic Medical Center (AMC), University of Amsterdam, Amsterdam, The Netherlands; Department of Blood Cell Research, Sanquin Research and Landsteiner Laboratory, University of Amsterdam, The Netherlands.

    The Journal of allergy and clinical immunology 2016

  • Integrated transcriptomic and proteomic analysis identifies protein kinase CK2 as a key signaling node in an inflammatory cytokine network in ovarian cancer cells.

    Kulbe H, Iorio F, Chakravarty P, Milagre CS, Moore R, Thompson RG, Everitt G, Canosa M, Montoya A, Drygin D, Braicu I, Sehouli J, Saez-Rodriguez J, Cutillas PR and Balkwill FR

    Centre for Cancer and Inflammation, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London, UK.

    We previously showed how key pathways in cancer-related inflammation and Notch signaling are part of an autocrine malignant cell network in ovarian cancer. This network, which we named the "TNF network", has paracrine actions within the tumor microenvironment, influencing angiogenesis and the immune cell infiltrate. The aim of this study was to identify critical regulators in the signaling pathways of the TNF network in ovarian cancer cells that might be therapeutic targets. To achieve our aim, we used a systems biology approach, combining data from phospho-proteomic mass spectrometry and gene expression array analysis. Among the potential therapeutic kinase targets identified was the protein kinase Casein kinase II (CK2). Knockdown of CK2 expression in malignant cells by siRNA or treatment with the specific CK2 inhibitor CX-4945 significantly decreased Notch signaling and reduced constitutive cytokine release in ovarian cancer cell lines that expressed the TNF network as well as malignant cells isolated from high grade serous ovarian cancer ascites. The expression of the same cytokines was also inhibited after treatment with CX-4945 in a 3D organotypic model. CK2 inhibition was associated with concomitant inhibition of proliferative activity, reduced angiogenesis and experimental peritoneal ovarian tumor growth. In conclusion, we have identified kinases, particularly CK2, associated with the TNF network that may play a central role in sustaining the cytokine network and/or mediating its effects in ovarian cancer.

    Funded by: Medical Research Council: G0501974

    Oncotarget 2016

  • Fine-mapping cellular QTLs with RASQUAL and ATAC-seq.

    Kumasaka N, Knights AJ and Gaffney DJ

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    When cellular traits are measured using high-throughput DNA sequencing, quantitative trait loci (QTLs) manifest as fragment count differences between individuals and allelic differences within individuals. We present RASQUAL (Robust Allele-Specific Quantitation and Quality Control), a new statistical approach for association mapping that models genetic effects and accounts for biases in sequencing data using a single, probabilistic framework. RASQUAL substantially improves fine-mapping accuracy and sensitivity relative to existing methods in RNA-seq, DNase-seq and ChIP-seq data. We illustrate how RASQUAL can be used to maximize association detection by generating the first map of chromatin accessibility QTLs (caQTLs) in a European population using ATAC-seq. Despite a modest sample size, we identified 2,707 independent caQTLs (at a false discovery rate of 10%) and demonstrated how RASQUAL and ATAC-seq can provide powerful information for fine-mapping gene-regulatory variants and for linking distal regulatory elements with gene promoters. Our results highlight how combining between-individual and allele-specific genetic signals improves the functional interpretation of noncoding variation.

    Funded by: Wellcome Trust: WT098051

    Nature genetics 2016;48;2;206-13

  • A genomic island in Vibrio cholerae with VPI-1 site-specific recombination characteristics contains CRISPR-Cas and type VI secretion modules.

    Labbate M, Orata FD, Petty NK, Jayatilleke ND, King WL, Kirchberger PC, Allen C, Mann G, Mutreja A, Thomson NR, Boucher Y and Charles IG

    University of Technology Sydney, School of Life Sciences, Sydney, 2007, Australia.

    Cholera is a devastating diarrhoeal disease caused by certain strains of serogroup O1/O139 Vibrio cholerae. Mobile genetic elements such as genomic islands (GIs) have been pivotal in the evolution of O1/O139 V. cholerae. Perhaps the most important GI involved in cholera disease is the V. cholerae pathogenicity island 1 (VPI-1). This GI contains the toxin-coregulated pilus (TCP) gene cluster that is necessary for colonization of the human intestine as well as being the receptor for infection by the cholera-toxin bearing CTX phage. In this study, we report a GI (designated GIVchS12) from a non-O1/O139 strain of V. cholerae that is present in the same chromosomal location as VPI-1, contains an integrase gene with 94% nucleotide and 100% protein identity to the VPI-1 integrase, and attachment (att) sites 100% identical to those found in VPI-1. However, instead of TCP and the other accessory genes present in VPI-1, GIVchS12 contains a CRISPR-Cas element and a type VI secretion system (T6SS). GIs similar to GIVchS12 were identified in other V. cholerae genomes, also containing CRISPR-Cas elements and/or T6SS's. This study highlights the diversity of GIs circulating in natural V. cholerae populations and identifies GIs with VPI-1 recombination characteristics as a propagator of CRISPR-Cas and T6SS modules.

    Scientific reports 2016;6;36891

  • Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq).

    Lagarde J, Uszczynska-Ratajczak B, Santoyo-Lopez J, Gonzalez JM, Tapanari E, Mudge JM, Steward CA, Wilming L, Tanzer A, Howald C, Chrast J, Vela-Boza A, Rueda A, Lopez-Domingo FJ, Dopazo J, Reymond A, Guigó R and Harrow J

    Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain.

    Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5' or 3', often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.

    Funded by: NHGRI NIH HHS: U41 HG007000, U41 HG007234, U54 HG007004, Z01 HG000070; NIMH NIH HHS: R01 MH101814

    Nature communications 2016;7;12339

  • Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    Laing R, Martinelli A, Tracey A, Holroyd N, Gilleard JS and Cotton JA

    University of Glasgow, Glasgow, Scotland, United Kingdom.

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model.

    Advances in parasitology 2016;93;569-98

  • Spatiotemporal Co-existence of Two Mycobacterium ulcerans Clonal Complexes in the Offin River Valley of Ghana.

    Lamelas A, Ampah KA, Aboagye S, Kerber S, Danso E, Asante-Poku A, Asare P, Parkhill J, Harris SR, Pluschke G, Yeboah-Manu D and Röltgen K

    Swiss Tropical and Public Health Institute, Basel, Switzerland.

    In recent years, comparative genome sequence analysis of African Mycobacterium ulcerans strains isolated from Buruli ulcer (BU) lesion specimen has revealed a very limited genetic diversity of closely related isolates and a striking association between genotype and geographical origin of the patients. Here, we compared whole genome sequences of five M. ulcerans strains isolated in 2004 or 2013 from BU lesions of four residents of the Offin river valley with 48 strains isolated between 2002 and 2005 from BU lesions of individuals residing in the Densu river valley of Ghana. While all M. ulcerans isolates from the Densu river valley belonged to the same clonal complex, members of two distinct clonal complexes were found in the Offin river valley over space and time. The Offin strains were closely related to genotypes from either the Densu region or from the Asante Akim North district of Ghana. These results point towards an occasional involvement of a mobile reservoir in the transmission of M. ulcerans, enabling the spread of bacteria across different regions.

    PLoS neglected tropical diseases 2016;10;7;e0004856

  • An isochore-like structure in the genome of the flatworm Schistosoma mansoni.

    Lamolle G, Protasio AV, Iriarte A, Jara E, Simón D and Musto H

    Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Udelar, Montevideo, Uruguay.

    Eukaryotic genomes are compositionally heterogeneous, i.e. composed of regions that differ in GC content (isochores). The most well documented case is that of vertebrates (mainly mammals), although it has also been noted among unicellular eukaryotes and invertebrates. In the human genome, regarded as a typical mammal, this heterogeneity is associated with several features. Specifically, genes located in GC-richest regions are the GC3-richest, display CpG islands and have shorter introns. Furthermore, these genes are more heavily expressed and tend to be located at the extremes of the chromosomes. Although the compositional heterogeneity seems to be widespread among eukaryotes, the associated properties noted in the human genome and other mammals have not been investigated in depth in other taxa. Here we provide evidence that the genome of the parasitic flatworm Schistosoma mansoni is compositionally heterogeneous and exhibits an isochore-like structure, displaying some features associated, until now, only with the human and other vertebrate genomes, with the exception of gene concentration.

    Genome biology and evolution 2016

  • Integrating population variation and protein structural analysis to improve clinical interpretation of missense variation: application to the WD40 domain.

    Laskowski RA, Tyagi N, Johnson D, Joss S, Kinning E, McWilliam C, Splitt M, Thornton JM, Firth HV, DDD Study and Wright CF

    European Bioinformatics Institute (EMBL-EBI) and.

    We present a generic, multidisciplinary approach for improving our understanding of novel missense variants in recently discovered disease genes exhibiting genetic heterogeneity, by combining clinical and population genetics with protein structural analysis. Using six new de novo missense diagnoses in TBL1XR1 from the Deciphering Developmental Disorders study, together with population variation data, we show that the β-propeller structure of the ubiquitous WD40 domain provides a convincing way to discriminate between pathogenic and benign variation. Children with likely pathogenic mutations in this gene have severely delayed language development, often accompanied by intellectual disability, autism, dysmorphology and gastrointestinal problems. Amino acids affected by likely pathogenic missense mutations are either crucial for the stability of the fold, forming part of a highly conserved symmetrically repeating hydrogen-bonded tetrad, or located at the top face of the β-propeller, where 'hotspot' residues affect the binding of β-catenin to the TBLR1 protein. In contrast, those altered by population variation are significantly less likely to be spatially clustered towards the top face or to be at buried or highly conserved residues. This result is useful not only for interpreting benign and pathogenic missense variants in this gene, but also in other WD40 domains, many of which are associated with disease.

    Funded by: Department of Health; Wellcome Trust: WT098051

    Human molecular genetics 2016;25;5;927-35

  • USF1 deficiency activates brown adipose tissue and improves cardiometabolic health.

    Laurila PP, Soronen J, Kooijman S, Forsström S, Boon MR, Surakka I, Kaiharju E, Coomans CP, Van Den Berg SA, Autio A, Sarin AP, Kettunen J, Tikkanen E, Manninen T, Metso J, Silvennoinen R, Merikanto K, Ruuth M, Perttilä J, Mäkelä A, Isomi A, Tuomainen AM, Tikka A, Ramadan UA, Seppälä I, Lehtimäki T, Eriksson J, Havulinna A, Jula A, Karhunen PJ, Salomaa V, Perola M, Ehnholm C, Lee-Rueckert M, Van Eck M, Roivainen A, Taskinen MR, Peltonen L, Mervaala E, Jalanko A, Hohtola E, Olkkonen VM, Ripatti S, Kovanen PT, Rensen PC, Suomalainen A and Jauhiainen M

    Genomics and Biomarkers Unit, National Institute for Health and Welfare, Helsinki FI-00251, Finland. Department of Medical Genetics, University of Helsinki, Helsinki FI-00014, Finland. Institute for Molecular Medicine Finland, FIMM, Helsinki FI-00251, Finland.

    USF1 (upstream stimulatory factor 1) is a transcription factor associated with familial combined hyperlipidemia and coronary artery disease in humans. However, whether USF1 is beneficial or detrimental to cardiometabolic health has not been addressed. By inactivating USF1 in mice, we demonstrate protection against diet-induced dyslipidemia, obesity, insulin resistance, hepatic steatosis, and atherosclerosis. The favorable plasma lipid profile, including increased high-density lipoprotein cholesterol and decreased triglycerides, was coupled with increased energy expenditure due to activation of brown adipose tissue (BAT). Usf1 inactivation directs triglycerides from the circulation to BAT for combustion via a lipoprotein lipase-dependent mechanism, thus enhancing plasma triglyceride clearance. Mice lacking Usf1 displayed increased BAT-facilitated, diet-induced thermogenesis with up-regulation of mitochondrial respiratory chain complexes, as well as increased BAT activity even at thermoneutrality and after BAT sympathectomy. A direct effect of USF1 on BAT activation was demonstrated by an amplified adrenergic response in brown adipocytes after Usf1 silencing, and by augmented norepinephrine-induced thermogenesis in mice lacking Usf1. In humans, individuals carrying SNP (single-nucleotide polymorphism) alleles that reduced USF1 mRNA expression also displayed a beneficial cardiometabolic profile, featuring improved insulin sensitivity, a favorable lipid profile, and reduced atherosclerosis. Our findings identify a new molecular link between lipid metabolism and energy expenditure, and point to the potential of USF1 as a therapeutic target for cardiometabolic disease.

    Science translational medicine 2016;8;323;323ra13

  • Bacterial GWAS: not just gilding the lily.

    Lees JA and Bentley SD

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2016;14;7;406

  • A high-content platform to characterise human induced pluripotent stem cell lines.

    Leha A, Moens N, Meleckyte R, Culley OJ, Gervasio MK, Kerz M, Reimer A, Cain SA, Streeter I, Folarin A, Stegle O, Kielty CM, HipSci Consortium, Durbin R, Watt FM and Danovi D

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Induced pluripotent stem cells (iPSCs) provide invaluable opportunities for future cell therapies as well as for studying human development, modelling diseases and discovering therapeutics. In order to realise the potential of iPSCs, it is crucial to comprehensively characterise cells generated from large cohorts of healthy and diseased individuals. The human iPSC initiative (HipSci) is assessing a large panel of cell lines to define cell phenotypes, dissect inter- and intra-line and donor variability and identify its key determinant components. Here we report the establishment of a high-content platform for phenotypic analysis of human iPSC lines. In the described assay, cells are dissociated and seeded as single cells onto 96-well plates coated with fibronectin at three different concentrations. This method allows assessment of cell number, proliferation, morphology and intercellular adhesion. Altogether, our strategy delivers robust quantification of phenotypic diversity within complex cell populations facilitating future identification of the genetic, biological and technical determinants of variance. Approaches such as the one described can be used to benchmark iPSCs from multiple donors and create novel platforms that can readily be tailored for disease modelling and drug discovery.

    Funded by: Medical Research Council: MC_PC_12026, MR/K026666/1, MR/L022699/1; Wellcome Trust: 098503

    Methods (San Diego, Calif.) 2016;96;85-96

  • Genome-Wide Meta-Analysis of Sciatica in Finnish Population.

    Lemmelä S, Solovieva S, Shiri R, Benner C, Heliövaara M, Kettunen J, Anttila V, Ripatti S, Perola M, Seppälä I, Juonala M, Kähönen M, Salomaa V, Viikari J, Raitakari OT, Lehtimäki T, Palotie A, Viikari-Juntura E and Husgafvel-Pursiainen K

    Health and Work Ability, Finnish Institute of Occupational Health, 00250 Helsinki, Finland.

    Sciatica or the sciatic syndrome is a common and often disabling low back disorder in the working-age population. It has a relatively high heritability but poorly understood molecular mechanisms. The Finnish population is a genetic isolate where small founder population and bottleneck events have led to enrichment of certain rare and low frequency variants. We performed here the first genome-wide association (GWAS) and meta-analysis of sciatica. The meta-analysis was conducted across two GWAS covering 291 Finnish sciatica cases and 3671 controls genotyped and imputed at 7.7 million autosomal variants. The most promising loci (p < 1 × 10⁻⁶) were replicated in 776 Finnish sciatica patients and 18,489 controls. We identified five intragenic variants, with relatively low frequencies, at two novel loci associated with sciatica at genome-wide significance. These included chr9:14344410:I (rs71321981) at 9p22.3 (NFIB gene; p = 1.30 × 10⁻⁸, MAF = 0.08) and four variants at 15q21.2: rs145901849, rs80035109, rs190200374 and rs117458827 (MYO5A; p = 1.34 × 10⁻⁸, MAF = 0.06; p = 2.32 × 10⁻⁸, MAF = 0.07; p = 3.85 × 10⁻⁸, MAF = 0.06; p = 4.78 × 10⁻⁸, MAF = 0.07, respectively). The most significant association in the meta-analysis, a single base insertion rs71321981 within the regulatory region of the transcription factor NFIB, replicated in an independent Finnish population sample (p = 0.04). Despite identifying 15q21.2 as a promising locus, we were not able to replicate it. It was differentiated; the lead variants within 15q21.2 were more frequent in Finland (6-7%) than in other European populations (1-2%). Imputation accuracies of the three significantly associated variants (chr9:14344410:I, rs190200374, and rs80035109) were validated by genotyping. In summary, our results suggest a novel locus, 9p22.3 (NFIB), which may be involved in susceptibility to sciatica. In addition, another locus, 15q21.2, emerged as a promising one, but failed to replicate.

    PloS one 2016;11;10;e0163877

  • Inherited platelet disorders: toward DNA-based diagnosis.

    Lentaigne C, Freson K, Laffan MA, Turro E, Ouwehand WH and BRIDGE-BPD Consortium and the ThromboGenomics Consortium

    Centre for Haematology, Imperial College Academic Health Sciences Centre, Imperial College London, London, United Kingdom; Imperial College Healthcare National Health Service Trust, London, United Kingdom;

    Variations in platelet number, volume, and function are largely genetically controlled, and many loci associated with platelet traits have been identified by genome-wide association studies (GWASs).(1) The genome also contains a large number of rare variants, of which a tiny fraction underlies the inherited diseases of humans. Research over the last 3 decades has led to the discovery of 51 genes harboring variants responsible for inherited platelet disorders (IPDs). However, the majority of patients with an IPD still do not receive a molecular diagnosis. Alongside the scientific interest, molecular or genetic diagnosis is important for patients. There is increasing recognition that a number of IPDs are associated with severe pathologies, including an increased risk of malignancy, and a definitive diagnosis can inform prognosis and care. In this review, we give an overview of these disorders grouped according to their effect on platelet biology and their clinical characteristics. We also discuss the challenge of identifying candidate genes and causal variants therein, how IPDs have been historically diagnosed, and how this is changing with the introduction of high-throughput sequencing. Finally, we describe how integration of large genomic, epigenomic, and phenotypic datasets, including whole genome sequencing data, GWASs, epigenomic profiling, protein-protein interaction networks, and standardized clinical phenotype coding, will drive the discovery of novel mechanisms of disease in the near future to improve patient diagnosis and management.

    Blood 2016;127;23;2814-23

  • Specific Roles of XRCC4 Paralogs PAXX and XLF during V(D)J Recombination.

    Lescale C, Lenden Hasse H, Blackford AN, Balmus G, Bianchi JJ, Yu W, Bacoccina L, Jarade A, Clouin C, Sivapalan R, Reina-San-Martin B, Jackson SP and Deriano L

    Departments of Immunology and Genomes and Genetics, Institut Pasteur, 75015 Paris, France.

    Paralog of XRCC4 and XLF (PAXX) is a member of the XRCC4 superfamily and plays a role in nonhomologous end-joining (NHEJ), a DNA repair pathway critical for lymphocyte antigen receptor gene assembly. Here, we find that the functions of PAXX and XLF in V(D)J recombination are masked by redundant joining activities. Thus, combined PAXX and XLF deficiency leads to an inability to join RAG-cleaved DNA ends. Additionally, we demonstrate that PAXX function in V(D)J recombination depends on its interaction with Ku. Importantly, we show that, unlike XLF, the role of PAXX during the repair of DNA breaks does not overlap with ATM and the RAG complex. Our findings illuminate the role of PAXX in V(D)J recombination and support a model in which PAXX and XLF function during NHEJ repair of DNA breaks, whereas XLF, the RAG complex, and the ATM-dependent DNA damage response promote end joining by stabilizing DNA ends.

    Cell reports 2016

  • Genetic Complexity of Crohn's Disease in 2 Large Ashkenazi Jewish Families.

    Levine AP, Pontikos N, Schiff ER, Jostins L, Speed D, NIDDK Inflammatory Bowel Disease Genetics Consortium, Lovat LB, Barrett JC, Grasberger H, Plagnol V and Segal AW

    Division of Medicine, University College London, London, WC1E 6JF, United Kingdom.

    Background & aims: Crohn's disease (CD) is a highly heritable disease that is particularly common in the Ashkenazi Jewish population. We studied 2 large Ashkenazi Jewish families with a high prevalence of CD in an attempt to identify novel genetic risk variants.

    Methods: Ashkenazi Jewish patients with CD and a positive family history were recruited from University College London Hospital. We used genome-wide single-nucleotide polymorphism data to assess the burden of common CD-associated risk variants and for linkage analysis. Exome sequencing was performed and rare variants predicted to be deleterious that were observed at a high frequency in cases were prioritized. We undertook within-family association analysis following imputation and assessed candidate variants for evidence of association with CD in an independent cohort of Ashkenazi Jewish individuals. We examined the effects of a variant in DUOX2 on hydrogen peroxide production in HEK293 cells.

    Results: We identified 2 families (1 with >800 members and 1 with >200 members) containing 54 and 26 cases of CD or colitis, respectively. Both families had a significant enrichment of previously described common CD-associated risk variants. No genome-wide significant linkage was observed. Exome sequencing identified candidate variants, including a missense mutation in DUOX2 that impaired its function and a frameshift mutation in CSF2RB that was associated with CD in an independent cohort of Ashkenazi Jewish individuals.

    Conclusions: In a study of 2 large Ashkenazi Jewish families with multiple cases of CD, we found the genetic basis of the disease to be complex, with a role for common and rare genetic variants. We identified a frameshift mutation in CSF2RB that replicated in an independent cohort. These findings demonstrate the value of family studies and the importance of the innate immune system in the pathogenesis of CD.

    Gastroenterology 2016

  • Exploring regulatory networks of miR-96 in the developing inner ear.

    Lewis MA, Buniello A, Hilton JM, Zhu F, Zhang WI, Evans S, van Dongen S, Enright AJ and Steel KP

    Wolfson Centre for Age-Related Diseases, King's College London, Guy's Campus, London SE1 1UL, UK.

    Mutations in the microRNA Mir96 cause deafness in mice and humans. In the diminuendo mouse, which carries a single base pair change in the seed region of miR-96, the sensory hair cells crucial for hearing fail to develop fully and retain immature characteristics, suggesting that miR-96 is important for coordinating hair cell maturation. Our previous transcriptional analyses show that many genes are misregulated in the diminuendo inner ear and we report here further misregulated genes. We have chosen three complementary approaches to explore potential networks controlled by miR-96 using these transcriptional data. Firstly, we used regulatory interactions manually curated from the literature to construct a regulatory network incorporating our transcriptional data. Secondly, we built a protein-protein interaction network using the InnateDB database. Thirdly, gene set enrichment analysis was used to identify gene sets in which the misregulated genes are enriched. We have identified several candidates for mediating some of the expression changes caused by the diminuendo mutation, including Fos, Myc, Trp53 and Nr3c1, and confirmed our prediction that Fos is downregulated in diminuendo homozygotes. Understanding the pathways regulated by miR-96 could lead to potential therapeutic targets for treating hearing loss due to perturbation of any component of the network.

    Scientific reports 2016;6;23363

  • Alkaline ceramidase 1 is essential for mammalian skin homeostasis and regulating whole body energy expenditure.

    Liakath-Ali K, Vancollie VE, Lelliott CJ, Speak AO, Lafont D, Protheroe HJ, Ingvorsen C, Galli A, Green A, Gleeson D, Ryder E, Glover L, Vizcay-Barrena G, Karp NA, Arends MJ, Brenn T, Spiegel S, Adams DJ, Watt FM and van der Weyden L

    Centre for Stem Cells and Regenerative Medicine, King's College London, 28th Floor, Tower Wing, Guy's Hospital, Great Maze Pond, London, SE1 9R, UK.

    The epidermis is the outermost layer of skin that acts as a barrier to protect the body from the external environment and to control water and heat loss. This barrier function is established through the multistage differentiation of keratinocytes and the presence of bioactive sphingolipids such as ceramides, the levels of which are tightly regulated by a balance of ceramide synthase and ceramidase activities. Here we reveal the essential role of alkaline ceramidase 1 (Acer1) in the skin. Acer1-deficient (Acer1(-/-)) mice showed elevated levels of ceramide in the skin, aberrant hair shaft cuticle formation and cyclic alopecia. We demonstrate that Acer1 is specifically expressed in differentiated interfollicular epidermis, infundibulum and sebaceous glands and consequently Acer1(-/-) mice have significant alterations in infundibulum and sebaceous gland architecture. Acer1(-/-) skin also shows perturbed hair follicle stem cell compartments. These alterations result in Acer1(-/-) mice showing increased trans-epidermal water loss and a hyper-metabolism phenotype with associated reduction of fat content with age. We conclude that Acer1 is indispensable for mammalian skin homeostasis and whole body energy homeostasis.

    The Journal of pathology 2016

  • Saturation analysis for whole-genome bisulfite sequencing data.

    Libertini E, Heath SC, Hamoudi RA, Gut M, Ziller MJ, Herrero J, Czyz A, Ruotti V, Stunnenberg HG, Frontini M, Ouwehand WH, Meissner A, Gut IG and Beck S

    Medical Genomics, UCL Cancer Institute, University College London, London, UK.

    Nature biotechnology 2016

  • UDP-galactose and acetyl-CoA transporters as Plasmodium multidrug resistance genes.

    Lim MY, LaMonte G, Lee MC, Reimer C, Tan BH, Corey V, Tjahjadi BF, Chua A, Nachon M, Wintjens R, Gedeck P, Malleret B, Renia L, Bonamy GM, Ho PC, Yeung BK, Chow ED, Lim L, Fidock DA, Diagana TT, Winzeler EA and Bifani P

    Novartis Institute for Tropical Diseases, 138670 Singapore.

    A molecular understanding of drug resistance mechanisms enables surveillance of the effectiveness of new antimicrobial therapies during development and deployment in the field. We used conventional drug resistance selection as well as a regime of limiting dilution at early stages of drug treatment to probe two antimalarial imidazolopiperazines, KAF156 and GNF179. The latter approach permits the isolation of low-fitness mutants that might otherwise be out-competed during selection. Whole-genome sequencing of 24 independently derived resistant Plasmodium falciparum clones revealed four parasites with mutations in the known cyclic amine resistance locus (pfcarl) and a further 20 with mutations in two previously unreported P. falciparum drug resistance genes, an acetyl-CoA transporter (pfact) and a UDP-galactose transporter (pfugt). Mutations were validated both in vitro by CRISPR editing in P. falciparum and in vivo by evolution of resistant Plasmodium berghei mutants. Both PfACT and PfUGT were localized to the endoplasmic reticulum by fluorescence microscopy. As mutations in pfact and pfugt conveyed resistance against additional unrelated chemical scaffolds, these genes are probably involved in broad mechanisms of antimalarial drug resistance.

    Funded by: NIAID NIH HHS: R01 AI090141, R01 AI103058

    Nature microbiology 2016;16166

  • A time transect of exomes from a Native American population before and after European contact.

    Lindo J, Huerta-Sánchez E, Nakagome S, Rasmussen M, Petzelt B, Mitchell J, Cybulski JS, Willerslev E, DeGiorgio M and Malhi RS

    Department of Human Genetics, University of Chicago, 920 E 58th Street, Chicago, Illinois 60637, USA.

    A major factor for the population decline of Native Americans after European contact has been attributed to infectious disease susceptibility. To investigate whether a pre-existing genetic component contributed to this phenomenon, here we analyse 50 exomes of a continuous population from the Northwest Coast of North America, dating from before and after European contact. We model the population collapse after European contact, inferring a 57% reduction in effective population size. We also identify signatures of positive selection on immune-related genes in the ancient but not the modern group, with the strongest signal deriving from the human leucocyte antigen (HLA) gene HLA-DQA1. The modern individuals show a marked frequency decrease in the same alleles, likely due to the environmental change associated with European colonization, whereby negative selection may have acted on the same gene after contact. The evident shift in selection pressures correlates to the regional European-borne epidemics of the 1800s.

    Nature communications 2016;7;13175

  • A secreted WNT-ligand-binding domain of FZD5 generated by a frameshift mutation causes autosomal dominant coloboma.

    Liu C, Widen SA, Williamson KA, Ratnapriya R, Gerth-Kahlert C, Rainger J, Alur RP, Strachan E, Manjunath SH, Balakrishnan A, Floyd JA, UK10K Consortium, Li T, Waskiewicz A, Brooks BP, Lehmann OJ, FitzPatrick DR and Swaroop A

    Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD 20892, USA, State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China.

    Ocular coloboma is a common eye malformation resulting from incomplete fusion of the optic fissure during development. Coloboma is often associated with microphthalmia and/or contralateral anophthalmia. Coloboma shows extensive locus heterogeneity associated with causative mutations identified in genes encoding developmental transcription factors or components of signaling pathways. We report an ultra-rare, heterozygous frameshift mutation in FZD5 (p.Ala219Glufs*49) that was identified independently in two branches of a large family with autosomal dominant non-syndromic coloboma. FZD5 has a single-coding exon and consequently a transcript with this frameshift variant is not a canonical substrate for nonsense-mediated decay. FZD5 encodes a transmembrane receptor with a conserved extracellular cysteine rich domain for ligand binding. The frameshift mutation results in the production of a truncated protein, which retains the Wingless-type MMTV integration site family member-ligand-binding domain, but lacks the transmembrane domain. The truncated protein was secreted from cells, and behaved as a dominant-negative FZD5 receptor, antagonizing both canonical and non-canonical WNT signaling. Expression of the resultant mutant protein caused coloboma and microphthalmia in zebrafish, and disruption of the apical junction of the retinal neural epithelium in mouse, mimicking the phenotype of Fz5/Fz8 compound conditional knockout mutants. Our studies have revealed a conserved role of Wnt-Frizzled (FZD) signaling in ocular development and directly implicate WNT-FZD signaling both in normal closure of the human optic fissure and pathogenesis of coloboma.

    Funded by: Medical Research Council: MC_PC_U127561093

    Human molecular genetics 2016;25;7;1382-91

  • Multigenomic Delineation of Plasmodium Species of the Laverania Subgenus Infecting Wild-living Chimpanzees and Gorillas.

    Liu W, Sundararaman SA, Loy DE, Learn GH, Li Y, Plenderleith LJ, Ndjango JN, Speede S, Atencia R, Cox D, Shaw GM, Ayouba A, Peeters M, Rayner JC, Hahn BH and Sharp PM

    Department of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA;

    Plasmodium falciparum, the major cause of malaria morbidity and mortality worldwide, is only distantly related to other human malaria parasites, and has thus been placed in a separate subgenus, termed Laverania. Parasites morphologically similar to P. falciparum have been identified in African apes, but only one other Laverania species, P. reichenowi from chimpanzees, has been formally described. Although recent studies have pointed to the existence of additional Laverania species, their precise number and host associations remain uncertain, primarily because of limited sampling and a paucity of parasite sequences other than from mitochondrial DNA. To address this, we used limiting dilution PCR to amplify additional parasite sequences from a large number of chimpanzee and gorilla blood and fecal samples collected at two sanctuaries and 30 field sites across equatorial Africa. Phylogenetic analyses of more than 2,000 new sequences derived from the mitochondrial, nuclear and apicoplast genomes revealed six divergent and well-supported clades within the Laverania parasite group. Although two of these clades exhibited deep subdivisions in phylogenies estimated from organelle gene sequences, these sublineages were geographically defined and not present in trees from four unlinked nuclear loci. This greatly expanded sequence data set thus confirms six, and not seven or more, ape Laverania species, of which P. reichenowi, P. gaboni, and P. billcollinsi only infect chimpanzees, while P. praefalciparum, P. adleri, and P. blacklocki only infect gorillas. The new sequence data also confirm the P. praefalciparum origin of human P. falciparum.

    Genome biology and evolution 2016

  • Genomic heterogeneity of multiple synchronous lung cancer.

    Liu Y, Zhang J, Li L, Yin G, Zhang J, Zheng S, Cheung H, Wu N, Lu N, Mao X, Yang L, Zhang J, Zhang L, Seth S, Chen H, Song X, Liu K, Xie Y, Zhou L, Zhao C, Han N, Chen W, Zhang S, Chen L, Cai W, Li L, Shen M, Xu N, Cheng S, Yang H, Lee JJ, Correa A, Fujimoto J, Behrens C, Chow CW, William WN, Heymach JV, Hong WK, Swisher S, Wistuba II, Wang J, Lin D, Liu X, Futreal PA and Gao Y

    State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Cancer Institute and Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100021, People's Republic of China.

    Multiple synchronous lung cancers (MSLCs) present a clinical dilemma as to whether individual tumours represent intrapulmonary metastases or independent tumours. In this study we analyse genomic profiles of 15 lung adenocarcinomas and one regional lymph node metastasis from 6 patients with MSLC. All 15 lung tumours demonstrate distinct genomic profiles, suggesting all are independent primary tumours, which is consistent with comprehensive histopathological assessment in 5 of the 6 patients. Lung tumours of the same individuals are no more similar to each other than are lung adenocarcinomas of different patients from the TCGA cohort matched for tumour size and smoking status. Several known cancer-associated genes have different mutations in different tumours from the same patients. These findings suggest that in the context of identical constitutional genetic background and environmental exposure, different lung cancers in the same individual may have distinct genomic profiles and can be driven by distinct molecular events.

    Nature communications 2016;7;13200

  • Demographic history of the genus Pan inferred from whole mitochondrial genome reconstructions.

    Lobon I, Tucci S, de Manuel M, Ghirotto S, Benazzo A, Prado-Martinez J, Lorente-Galdos B, Nam K, Dabad M, Hernandez-Rodriguez J, Comas D, Navarro A, Schierup MH, Andres AM, Barbujani G, Hvilsom C and Marques-Bonet T

    Institut de Biologia Evolutiva (CSIC-UPF), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain.

    The genus Pan is the closest genus to our own and it includes two species, Pan paniscus (bonobos) and Pan troglodytes (chimpanzees). The latter is constituted by four subspecies, all highly endangered. The study of the genus Pan has been incessantly complicated by the intricate relationship among subspecies and the statistical limitations imposed by the reduced number of samples or genomic markers analysed. Here, we present a new method to reconstruct complete mitochondrial genomes (mitogenomes) from whole genome shotgun (WGS) datasets, mtArchitect, showing that its reconstructions are highly accurate and consistent with long range PCR mitogenomes. We used this approach to build the mitochondrial genomes of 20 newly sequenced samples which, together with available genomes, allowed us to analyse the hitherto most complete Pan mitochondrial genome dataset including 156 chimpanzee and 44 bonobo individuals, with a proportional contribution from all chimpanzee subspecies. We estimated the separation time between chimpanzees and bonobos around 1.15 Mya [0.81-1.49]. Further, we found that under the most probable genealogical model the two clades of chimpanzees, Western+Nigeria-Cameroon and Central+Eastern, separated at 0.59 Mya [0.41-0.78] with further internal separations at 0.32 Mya [0.22-0.43] and 0.16 Mya [0.17-0.34], respectively. Finally, for a subset of our samples, we compared nuclear vs. mitochondrial genomes and we found that chimpanzee subspecies have different patterns of nuclear and mitochondrial diversity, which could be a result of either processes affecting the mitochondrial genome, such as hitchhiking or background selection, or a result of population dynamics.

    Genome biology and evolution 2016

  • Reference-based phasing using the Haplotype Reference Consortium panel.

    Loh PR, Danecek P, Palamara PF, Fuchsberger C, A Reshef Y, K Finucane H, Schoenherr S, Forer L, McCarthy S, Abecasis GR, Durbin R and L Price A

    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.

    Funded by: NCRR NIH HHS: S10 RR028832; NEI NIH HHS: R01 EY022005; NHGRI NIH HHS: F32 HG007805, R01 HG006399, R01 HG007022; NHLBI NIH HHS: R01 HL117626; NIMH NIH HHS: R01 MH101244

    Nature genetics 2016;48;11;1443-1448
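
    The positional Burrows-Wheeler transform named in the abstract above can be illustrated with a minimal sketch. It builds the positional prefix arrays for a small binary haplotype panel by stable partitioning at each site, so that haplotypes sharing long matches ending at a given site sit next to each other. This is a textbook-style illustration only, not Eagle2's data structure or code; the function name `pbwt_prefix_arrays` and the toy panel are hypothetical.

```python
from typing import List


def pbwt_prefix_arrays(haps: List[List[int]]) -> List[List[int]]:
    """Return prefix arrays a_0..a_N for a panel of binary haplotypes.

    haps[h][k] is the allele (0 or 1) of haplotype h at site k.
    arrays[k] lists haplotype indices sorted by their reversed prefix
    over sites 0..k-1, with ties kept in original index order.
    """
    m = len(haps)
    n = len(haps[0]) if m else 0
    order = list(range(m))
    arrays = [order[:]]
    for k in range(n):
        # Stable partition: haplotypes carrying allele 0 at site k come
        # first, then those carrying allele 1, each in their prior order.
        zeros = [h for h in order if haps[h][k] == 0]
        ones = [h for h in order if haps[h][k] == 1]
        order = zeros + ones
        arrays.append(order[:])
    return arrays


# Hypothetical toy panel: 4 haplotypes, 4 sites (not real data).
haps = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
arrays = pbwt_prefix_arrays(haps)
print(arrays[-1])
```

    In the toy panel, haplotypes 0 and 2 are identical and therefore end up adjacent in the final ordering, which is the property that makes searching for long shared matches efficient.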

  • No Association of Coronary Artery Disease with X-Chromosomal Variants in Comprehensive International Meta-Analysis.

    Loley C, Alver M, Assimes TL, Bjonnes A, Goel A, Gustafsson S, Hernesniemi J, Hopewell JC, Kanoni S, Kleber ME, Lau KW, Lu Y, Lyytikäinen LP, Nelson CP, Nikpay M, Qu L, Salfati E, Scholz M, Tukiainen T, Willenborg C, Won HH, Zeng L, Zhang W, Anand SS, Beutner F, Bottinger EP, Clarke R, Dedoussis G, Do R, Esko T, Eskola M, Farrall M, Gauguier D, Giedraitis V, Granger CB, Hall AS, Hamsten A, Hazen SL, Huang J, Kähönen M, Kyriakou T, Laaksonen R, Lind L, Lindgren C, Magnusson PK, Marouli E, Mihailov E, Morris AP, Nikus K, Pedersen N, Rallidis L, Salomaa V, Shah SH, Stewart AF, Thompson JR, Zalloua PA, Chambers JC, Collins R, Ingelsson E, Iribarren C, Karhunen PJ, Kooner JS, Lehtimäki T, Loos RJ, März W, McPherson R, Metspalu A, Reilly MP, Ripatti S, Sanghera DK, Thiery J, Watkins H, Deloukas P, Kathiresan S, Samani NJ, Schunkert H, Erdmann J and König IR

    Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.

    In recent years, genome-wide association studies have identified 58 independent risk loci for coronary artery disease (CAD) on the autosome. However, due to the sex-specific data structure of the X chromosome, it has been excluded from most of these analyses. While females have 2 copies of chromosome X, males have only one. Also, one of the female X chromosomes may be inactivated. Therefore, special test statistics and quality control procedures are required. Thus, little is known about the role of X-chromosomal variants in CAD. To fill this gap, we conducted a comprehensive X-chromosome-wide meta-analysis including more than 43,000 CAD cases and 58,000 controls from 35 international study cohorts. For quality control, sex-specific filters were used to adequately take the special structure of X-chromosomal data into account. For single study analyses, several logistic regression models were calculated allowing for inactivation of one female X-chromosome, adjusting for sex and investigating interactions between sex and genetic variants. Then, meta-analyses including all 35 studies were conducted using random effects models. None of the investigated models revealed genome-wide significant associations for any variant. Although we analyzed the largest-to-date sample, currently available methods were not able to detect any associations of X-chromosomal variants with CAD.

    Scientific reports 2016;6;35278

  • Unscrambling the genomic chaos of osteosarcoma reveals extensive transcript fusion, recurrent rearrangements and frequent novel TP53 aberrations.

    Lorenz S, Barøy T, Sun J, Nome T, Vodák D, Bryne JC, Håkelien AM, Fernandez-Cuesta L, Möhlendick B, Rieder H, Szuhai K, Zaikova O, Ahlquist TC, Thomassen GO, Skotheim RI, Lothe RA, Tarpey PS, Campbell P, Flanagan A, Myklebost O and Meza-Zepeda LA

    Department of Tumor Biology, Oslo University Hospital, Norwegian Radium Hospital, Oslo, Norway.

    In contrast to many other sarcoma subtypes, the chaotic karyotypes of osteosarcoma have precluded the identification of pathognomonic translocations. We here report hundreds of genomic rearrangements in osteosarcoma cell lines, showing clear characteristics of microhomology-mediated break-induced replication (MMBIR) and end-joining repair (MMEJ) mechanisms. However, at RNA level, the majority of the fused transcripts did not correspond to genomic rearrangements, suggesting the involvement of trans-splicing, which was further supported by typical trans-splicing characteristics. By combining genomic and transcriptomic analysis, certain recurrent rearrangements were identified and further validated in patient biopsies, including a PMP22-ELOVL5 gene fusion, genomic structural variations affecting RB1, MTAP/CDKN2A and MDM2, and, most frequently, rearrangements involving TP53. Most cell lines (7/11) and a large fraction of tumor samples (10/25) showed TP53 rearrangements, in addition to somatic point mutations (6 patient samples, 1 cell line) and MDM2 amplifications (2 patient samples, 2 cell lines). The resulting inactivation of p53 was demonstrated by a deficiency of the radiation-induced DNA damage response. Thus, TP53 rearrangements are the major mechanism of p53 inactivation in osteosarcoma. Together with active MMBIR and MMEJ, this inactivation probably contributes to the exceptional chromosomal instability in these tumors. Although rampant rearrangements appear to be a phenotype of osteosarcomas, we demonstrate that among the huge number of probable passenger rearrangements, specific recurrent, possibly oncogenic, events are present. This is the first time the genomic chaos of osteosarcoma has been characterized so thoroughly; the results deliver new insights into mechanisms involved in osteosarcoma development and may contribute to new diagnostic and therapeutic strategies.

    Oncotarget 2016;7;5;5273-88

  • Integrative genomic analysis implicates limited peripheral adipose storage capacity in the pathogenesis of human insulin resistance.

    Lotta LA, Gulati P, Day FR, Payne F, Ongen H, van de Bunt M, Gaulton KJ, Eicher JD, Sharp SJ, Luan J, De Lucia Rolfe E, Stewart ID, Wheeler E, Willems SM, Adams C, Yaghootkar H, EPIC-InterAct Consortium, Cambridge FPLD1 Consortium, Forouhi NG, Khaw KT, Johnson AD, Semple RK, Frayling T, Perry JR, Dermitzakis E, McCarthy MI, Barroso I, Wareham NJ, Savage DB, Langenberg C, O'Rahilly S and Scott RA

    MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.

    Insulin resistance is a key mediator of obesity-related cardiometabolic disease, yet the mechanisms underlying this link remain obscure. Using an integrative genomic approach, we identify 53 genomic regions associated with insulin resistance phenotypes (higher fasting insulin levels adjusted for BMI, lower HDL cholesterol levels and higher triglyceride levels) and provide evidence that their link with higher cardiometabolic risk is underpinned by an association with lower adipose mass in peripheral compartments. Using these 53 loci, we show a polygenic contribution to familial partial lipodystrophy type 1, a severe form of insulin resistance, and highlight shared molecular mechanisms in common/mild and rare/severe insulin resistance. Population-level genetic analyses combined with experiments in cellular models implicate CCDC92, DNAH10 and L3MBTL3 as previously unrecognized molecules influencing adipocyte differentiation. Our findings support the notion that limited storage capacity of peripheral adipose tissue is an important etiological component in insulin-resistant cardiometabolic disease and highlight genes and mechanisms underpinning this link.

    Nature genetics 2016

  • Association Between Low-Density Lipoprotein Cholesterol-Lowering Genetic Variants and Risk of Type 2 Diabetes: A Meta-analysis.

    Lotta LA, Sharp SJ, Burgess S, Perry JR, Stewart ID, Willems SM, Luan J, Ardanaz E, Arriola L, Balkau B, Boeing H, Deloukas P, Forouhi NG, Franks PW, Grioni S, Kaaks R, Key TJ, Navarro C, Nilsson PM, Overvad K, Palli D, Panico S, Quirós JR, Riboli E, Rolandsson O, Sacerdote C, Salamanca-Fernandez E, Slimani N, Spijkerman AM, Tjonneland A, Tumino R, van der A DL, van der Schouw YT, McCarthy MI, Barroso I, O'Rahilly S, Savage DB, Sattar N, Langenberg C, Scott RA and Wareham NJ

    MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom.

    Importance: Low-density lipoprotein cholesterol (LDL-C)-lowering alleles in or near NPC1L1 or HMGCR, encoding the respective molecular targets of ezetimibe and statins, have previously been used as proxies to study the efficacy of these lipid-lowering drugs. Alleles near HMGCR are associated with a higher risk of type 2 diabetes, similar to the increased incidence of new-onset diabetes associated with statin treatment in randomized clinical trials. It is unknown whether alleles near NPC1L1 are associated with the risk of type 2 diabetes.

    Objective: To investigate whether LDL-C-lowering alleles in or near NPC1L1 and other genes encoding current or prospective molecular targets of lipid-lowering therapy (ie, HMGCR, PCSK9, ABCG5/G8, LDLR) are associated with the risk of type 2 diabetes.

    Design, setting, and participants: The associations with type 2 diabetes and coronary artery disease of LDL-C-lowering genetic variants were investigated in meta-analyses of genetic association studies. Meta-analyses included 50 775 individuals with type 2 diabetes and 270 269 controls and 60 801 individuals with coronary artery disease and 123 504 controls. Data collection took place in Europe and the United States between 1991 and 2016.

    Exposures: Low-density lipoprotein cholesterol-lowering alleles in or near NPC1L1, HMGCR, PCSK9, ABCG5/G8, and LDLR.

    Main outcomes and measures: Odds ratios (ORs) for type 2 diabetes and coronary artery disease.

    Results: Low-density lipoprotein cholesterol-lowering genetic variants at NPC1L1 were inversely associated with coronary artery disease (OR for a genetically predicted 1-mmol/L [38.7-mg/dL] reduction in LDL-C of 0.61 [95% CI, 0.42-0.88]; P = .008) and directly associated with type 2 diabetes (OR for a genetically predicted 1-mmol/L reduction in LDL-C of 2.42 [95% CI, 1.70-3.43]; P < .001). For PCSK9 genetic variants, the OR for type 2 diabetes per 1-mmol/L genetically predicted reduction in LDL-C was 1.19 (95% CI, 1.02-1.38; P = .03). For a given reduction in LDL-C, genetic variants were associated with a similar reduction in coronary artery disease risk (I² = 0% for heterogeneity in genetic associations; P = .93). However, associations with type 2 diabetes were heterogeneous (I² = 77.2%; P = .002), indicating gene-specific associations with metabolic risk of LDL-C-lowering alleles.

    Conclusions and relevance: In this meta-analysis, exposure to LDL-C-lowering genetic variants in or near NPC1L1 and other genes was associated with a higher risk of type 2 diabetes. These data provide insights into potential adverse effects of LDL-C-lowering therapy.

    JAMA 2016;316;13;1383-1391
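
    The I² heterogeneity statistic reported in the Results above is conventionally derived from Cochran's Q under inverse-variance weighting, as I² = max(0, (Q - df) / Q) × 100%. The sketch below shows that standard calculation; it is not the authors' analysis code, the function name `i_squared` is hypothetical, and the input values are made-up illustrative numbers rather than estimates from the study.

```python
from typing import Sequence, Tuple


def i_squared(estimates: Sequence[float],
              std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Return (pooled estimate, Cochran's Q, I-squared in percent).

    estimates are per-gene effect sizes (for example log odds ratios)
    and std_errors their standard errors; weights are 1 / SE^2.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching SEs")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is truncated at zero when Q does not exceed its df.
    i2 = 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100.0
    return pooled, q, i2


# Made-up illustrative inputs (NOT values from the meta-analysis).
pooled, q, i2 = i_squared([0.0, 2.0], [0.5, 0.5])
print(f"pooled={pooled:.2f} Q={q:.2f} I2={i2:.1f}%")
```

    An I² near 0% means the per-gene estimates agree about as closely as their standard errors allow, while a large I² means they differ by more than sampling error alone would suggest.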

  • Oxford Nanopore MinION Sequencing and Genome Assembly.

    Lu H, Giordano F and Ning Z

    National Centre of Gene Research, Chinese Academy of Sciences, Shanghai 200233, China.

    The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications, and the release of the long-read sequencer MinION has thus generated much excitement and interest in the genomics community. Whilst de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since longer reads can span repetitive regions more easily, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

    Genomics, proteomics & bioinformatics 2016

  • Schistosome sex matters: a deep view into gonad-specific and pairing-dependent transcriptomes reveals a complex gender interplay.

    Lu Z, Sessler F, Holroyd N, Hahnel S, Quack T, Berriman M and Grevelding CG

    BFS, Institute of Parasitology, Justus-Liebig-University, Giessen, Germany.

    As a key event for maintaining life cycles, reproduction is a central part of platyhelminth biology. In the case of parasitic platyhelminths, reproductive processes can also contribute to pathology. One representative example is the trematode Schistosoma, which causes schistosomiasis, an infectious disease whose pathology is associated with egg production. Among the outstanding features of schistosomes is their dioecious lifestyle and the pairing-dependent differentiation of the female gonads which finally leads to egg synthesis. To analyze the reproductive biology of Schistosoma mansoni in-depth we isolated complete ovaries and testes from paired and unpaired schistosomes for comparative RNA-seq analyses. Of >7,000 transcripts found in the gonads, 243 (testes) and 3,600 (ovaries) occurred pairing-dependently. Besides the detection of genes transcribed preferentially or specifically in the gonads of both genders, we uncovered pairing-induced processes within the gonads including stem cell-associated and neural functions. Comparisons to work on neuropeptidergic signaling in planarians showed interesting parallels but also remarkable differences and highlight the importance of the nervous system for flatworm gonad differentiation. Finally, we postulated first functional hints for 235 hypothetical genes. Together, these results elucidate key aspects of flatworm reproductive biology and will be relevant for basic as well as applied, exploitable research aspects.

    Scientific reports 2016;6;31150

  • A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.

    Lun AT, McCarthy DJ and Marioni JC

    Cancer Research UK Cambridge Institute, Cambridge, UK.

    Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.

    F1000Research 2016;5;2122
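
    The normalization step mentioned in the abstract above can be sketched in its simplest form: scale each cell's counts by a library-size factor, then log-transform. This is a deliberately simplified stand-in written in plain Python, not the Bioconductor workflow itself, which may use more robust size-factor estimation; the function names and the toy count matrix are hypothetical.

```python
import math
from typing import List


def library_size_factors(counts: List[List[float]]) -> List[float]:
    """Per-cell size factors from a genes-by-cells count matrix.

    counts[g][c] is the count for gene g in cell c. Each factor is the
    cell's total count divided by the mean total, so factors average 1.
    """
    n_cells = len(counts[0])
    lib_sizes = [sum(row[c] for row in counts) for c in range(n_cells)]
    if min(lib_sizes) <= 0:
        raise ValueError("every cell needs a positive total count")
    mean_size = sum(lib_sizes) / n_cells
    return [size / mean_size for size in lib_sizes]


def log_normalize(counts: List[List[float]],
                  size_factors: List[float]) -> List[List[float]]:
    """Return log2(count / size_factor + 1) for every gene and cell."""
    return [
        [math.log2(x / sf + 1.0) for x, sf in zip(row, size_factors)]
        for row in counts
    ]


# Hypothetical toy matrix: 2 genes x 2 cells; cell 2 was sequenced
# twice as deeply as cell 1 but has the same relative expression.
counts = [[10, 20], [30, 60]]
size_factors = library_size_factors(counts)
normalized = log_normalize(counts, size_factors)
print(size_factors, normalized)
```

    After scaling, the two toy cells have identical normalized values, showing that the difference in sequencing depth has been removed while relative expression is preserved.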

  • Predicting quantitative traits from genome and phenome with near perfect accuracy.

    Märtens K, Hallin J, Warringer J, Liti G and Parts L

    Institute of Computer Science, University of Tartu, Tartu 50409, Estonia.

    In spite of decades of linkage and association studies and its potential impact on human health, reliable prediction of an individual's risk for heritable disease remains difficult. Large numbers of mapped loci do not explain substantial fractions of heritable variation, leaving an open question of whether accurate complex trait predictions can be achieved in practice. Here, we use a genome sequenced population of ∼7,000 yeast strains of high but varying relatedness, and predict growth traits from family information, effects of segregating genetic variants and growth in other environments with an average coefficient of determination R² of 0.91. This accuracy exceeds narrow-sense heritability, approaches limits imposed by measurement repeatability and is higher than achieved with a single assay in the laboratory. Our results prove that very accurate prediction of complex traits is possible, and suggest that additional data from families rather than reference cohorts may be more useful for this purpose.

    Nature communications 2016;7;11512
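
    The coefficient of determination used as the accuracy measure in the abstract above is commonly computed as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot the sum of squared deviations of the observations from their mean. The sketch below shows that standard definition; the function name `r_squared` is hypothetical, the study may have computed R² differently (for example under cross-validation), and the numbers are made-up illustrative values rather than yeast growth data.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up illustrative trait values (NOT data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 4))
```

    A value of 1 means the predictions reproduce the observations exactly, and a value of 0 means they do no better than predicting the mean for every strain.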

  • Effects of long-term ethanol consumption and Aldh1b1 depletion on intestinal tumourigenesis in mice.

    Müller MF, Zhou Y, Adams DJ and Arends MJ

    University of Edinburgh, Division of Pathology, Centre for Comparative Pathology, Cancer Research UK Edinburgh Centre, Institute of Genetics & Molecular Medicine, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XR, UK.

    Ethanol and its metabolite acetaldehyde have been classified as carcinogens for the upper aerodigestive tract, liver, breast and colorectum. Whereas mechanisms related to oxidative stress and Cyp2e1 induction seem to prevail in the liver, and acetaldehyde has been proposed to play a crucial role in the upper aerodigestive tract, pathological mechanisms in the colorectum have not yet been clarified. Moreover, all evidence for a pro-carcinogenic role of ethanol in colorectal cancer is derived from correlations observed in epidemiological studies or from rodent studies with additional carcinogen application or tumour suppressor gene inactivation. In the current study, wildtype mice and mice with depletion of aldehyde dehydrogenase 1b1 (Aldh1b1), an enzyme which has been proposed to play an important role in acetaldehyde detoxification in the intestines, received ethanol in drinking water for one year. Long-term ethanol consumption led to intestinal tumour development in wildtype and Aldh1b1-depleted mice, but no intestinal tumours were observed in water-treated controls. Moreover, a significant increase in DNA damage was detected in the large intestinal epithelium of ethanol-treated mice of both genotypes compared with the respective water-treated groups, along with increased proliferation of the small and large intestinal epithelium. Aldh1b1 depletion led to increased plasma acetaldehyde levels in ethanol-treated mice, to a significant aggravation of ethanol-induced intestinal hyperproliferation, and to more advanced features of intestinal tumours, but it did not affect intestinal tumour incidence. These data indicate that ethanol consumption can initiate intestinal tumourigenesis without any additional carcinogen treatment or tumour suppressor gene inactivation, and we provide evidence for a role of Aldh1b1 in protection of the intestines from ethanol-induced damage, as well as for both carcinogenic and tumour-promoting functions of acetaldehyde, including increased progression of ethanol-induced tumours.

    The Journal of pathology 2016

  • Single-Cell RNA-Sequencing Reveals a Continuous Spectrum of Differentiation in Hematopoietic Cells.

    Macaulay IC, Svensson V, Labalette C, Ferreira L, Hamey F, Voet T, Teichmann SA and Cvejic A

    Sanger Institute-EBI Single-Cell Genomics Centre, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    The transcriptional programs that govern hematopoiesis have been investigated primarily by population-level analysis of hematopoietic stem and progenitor cells, which cannot reveal the continuous nature of the differentiation process. Here we applied single-cell RNA-sequencing to a population of hematopoietic cells in zebrafish as they undergo thrombocyte lineage commitment. By reconstructing their developmental chronology computationally, we were able to place each cell along a continuum from stem cell to mature cell, refining the traditional lineage tree. The progression of cells along this continuum is characterized by a highly coordinated transcriptional program, displaying simultaneous suppression of genes involved in cell proliferation and ribosomal biogenesis as the expression of lineage specific genes increases. Within this program, there is substantial heterogeneity in the expression of the key lineage regulators. Overall, the total number of genes expressed, as well as the total mRNA content of the cell, decreases as the cells undergo lineage commitment.

    Funded by: Cancer Research UK: C45041/A14953; Medical Research Council: MC_PC_12009; Wellcome Trust

    Cell reports 2016;14;4;966-977

  • Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.

    Macaulay IC, Teng MJ, Haerty W, Kumar P, Ponting CP and Voet T

    Earlham Institute, Norwich Research Park, Norwich, UK.

    Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.

    Nature protocols 2016;11;11;2081-103

  • Environmental Correlation Analysis for Genes Associated with Protection against Malaria.

    Mackinnon MJ, Ndila C, Uyoga S, Macharia A, Snow RW, Band G, Rautanen A, Rockett KA, Kwiatkowski DP and Williams TN

    Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya.

    Genome-wide searches for loci involved in human resistance to malaria are currently being conducted on a large scale in Africa using case-control studies. Here, we explore the utility of an alternative approach, "environmental correlation analysis" (ECA), which tests for clines in allele frequencies across a gradient of an environmental selection pressure, to identify genes that have historically protected against death from malaria. We collected genotype data from 12,425 newborns on 57 candidate malaria resistance loci and 9,756 single nucleotide polymorphisms (SNPs) selected at random from across the genome, and examined their allele frequencies for geographic correlations with long-term malaria prevalence data based on 84,042 individuals living under different historical selection pressures from malaria in coastal Kenya. None of the 57 candidate SNPs showed significant (P < 0.05) correlations in allele frequency with local malaria transmission intensity after adjusting for population structure and multiple testing. In contrast, two of the random SNPs that had highly significant correlations (P < 0.01) were in genes previously linked to malaria resistance, namely, CDH13, encoding cadherin 13, and HS3ST3B1, encoding heparan sulfate 3-O-sulfotransferase 3B1. Both proteins play a role in glycoprotein-mediated cell-cell adhesion which has been widely implicated in cerebral malaria, the most life-threatening form of this disease. Other top genes, including CTNND2 which encodes δ-catenin, a molecular partner to cadherin, were significantly enriched in cadherin-mediated pathways affecting inflammation of the brain vascular endothelium. These results demonstrate the utility of ECA in the discovery of novel genes and pathways affecting infectious disease.

    Funded by: Wellcome Trust: 090770, 091758

    Molecular biology and evolution 2016

  • New Approaches for Needed Vaccines: Bacteria

    MacLennan CA, Mutreja A and Dougan G

    The Vaccine Book: Second Edition 2016;311-29

  • Quantitative proteomic analysis of Shigella flexneri and Shigella sonnei Generalized Modules for Membrane Antigens (GMMA) reveals highly pure preparations.

    Maggiore L, Yu L, Omasits U, Rossi O, Dougan G, Thomson NR, Saul A, Choudhary JS and Gerke C

    Novartis Vaccines Institute for Global Health, Via Fiorentina 1, 53100 Siena, Italy.

    Outer membrane blebs are naturally shed by Gram-negative bacteria and are candidates of interest for vaccine development. Genetic modification of bacteria to induce hyperblebbing greatly increases the yield of blebs, called Generalized Modules for Membrane Antigens (GMMA). The composition of the GMMA from hyperblebbing mutants of Shigella flexneri 2a and Shigella sonnei was quantitatively analyzed using high-sensitivity mass spectrometry with the label-free iBAQ procedure and compared to the composition of the solubilized cells of the GMMA-producing strains. There were 2306 proteins identified, 659 in GMMA and 2239 in bacteria, of which 290 (GMMA) and 1696 (bacteria) were common to both S. flexneri 2a and S. sonnei. Predicted outer membrane and periplasmic proteins constituted 95.7% and 98.7% of the protein mass of S. flexneri 2a and S. sonnei GMMA, respectively. Among the remaining proteins, small quantities of ribosomal proteins collectively accounted for more than half of the predicted cytoplasmic protein impurities in the GMMA. In GMMA, the outer membrane and periplasmic proteins were enriched 13.3-fold (S. flexneri 2a) and 8.3-fold (S. sonnei) compared to their abundance in the parent bacteria. Both periplasmic and outer membrane proteins were enriched similarly, suggesting that GMMA have a similar surface to volume ratio as the surface to periplasmic volume ratio in these mutant bacteria. Results in S. flexneri 2a and S. sonnei showed high reproducibility, indicating a robust GMMA-producing process, and the low contamination by cytoplasmic proteins supports the use of GMMA for vaccines. Data are available via ProteomeXchange with identifier PXD002517.

    Funded by: Wellcome Trust: WT098051

    International journal of medical microbiology : IJMM 2016;306;2;99-108

  • A Phylogenetic and Phenotypic Analysis of Salmonella enterica Serovar Weltevreden, an Emerging Agent of Diarrheal Disease in Tropical Regions.

    Makendi C, Page AJ, Wren BW, Le Thi Phuong T, Clare S, Hale C, Goulding D, Klemm EJ, Pickard D, Okoro C, Hunt M, Thompson CN, Phu Huong Lan N, Tran Do Hoang N, Thwaites GE, Le Hello S, Brisabois A, Weill FX, Baker S and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Salmonella enterica serovar Weltevreden (S. Weltevreden) is an emerging cause of diarrheal and invasive disease in humans residing in tropical regions. Despite the regional and international emergence of this Salmonella serovar, relatively little is known about its genetic diversity, genomics or virulence potential in model systems. Here we used whole genome sequencing and bioinformatics analyses to define the phylogenetic structure of a diverse global selection of S. Weltevreden. Phylogenetic analysis of more than 100 isolates demonstrated that the population of S. Weltevreden can be segregated into two main phylogenetic clusters, one associated predominantly with continental Southeast Asia and the other more internationally dispersed. Subcluster analysis suggested the local evolution of S. Weltevreden within specific geographical regions. Four of the isolates were sequenced using long read sequencing to produce high quality reference genomes. Phenotypic analysis in Hep-2 cells and in a murine infection model indicated that S. Weltevreden were significantly attenuated in these models compared to the classical S. Typhimurium reference strain SL1344. Our work outlines novel insights into this important emerging pathogen and provides a baseline understanding for future research studies.

    Funded by: Wellcome Trust: 100087

    PLoS neglected tropical diseases 2016;10;2;e0004446

  • Genomic epidemiology of artemisinin resistant malaria.

    MalariaGEN Plasmodium falciparum Community Project

    The current epidemic of artemisinin resistant Plasmodium falciparum in Southeast Asia is the result of a soft selective sweep involving at least 20 independent kelch13 mutations. In a large global survey, we find that kelch13 mutations which cause resistance in Southeast Asia are present at low frequency in Africa. We show that African kelch13 mutations have originated locally, and that kelch13 shows a normal variation pattern relative to other genes in Africa, whereas in Southeast Asia there is a great excess of non-synonymous mutations, many of which cause radical amino-acid changes. Thus, kelch13 is not currently undergoing strong selection in Africa, despite a deep reservoir of variations that could potentially allow resistance to emerge rapidly. The practical implications are that public health surveillance for artemisinin resistance should not rely on kelch13 data alone, and interventions to prevent resistance must account for local evolutionary conditions, shown by genomic epidemiology to differ greatly between geographical regions.

    Funded by: FIC NIH HHS: D43 TW006589; Medical Research Council: G0600718, MC_EX_MR/K02440X/1, MR/M006212/1; NIAID NIH HHS: R01 AI101713; Wellcome Trust: 091625

    eLife 2016;5

  • A genomic history of Aboriginal Australia.

    Malaspinas AS, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergström A, Athanasiadis G, Cheng JY, Crawford JE, Heupink TH, Macholdt E, Peischl S, Rasmussen S, Schiffels S, Subramanian S, Wright JL, Albrechtsen A, Barbieri C, Dupanloup I, Eriksson A, Margaryan A, Moltke I, Pugach I, Korneliussen TS, Levkivskyi IP, Moreno-Mayar JV, Ni S, Racimo F, Sikora M, Xue Y, Aghakhanian FA, Brucato N, Brunak S, Campos PF, Clark W, Ellingvåg S, Fourmile G, Gerbault P, Injie D, Koki G, Leavesley M, Logan B, Lynch A, Matisoo-Smith EA, McAllister PJ, Mentzer AJ, Metspalu M, Migliano AB, Murgha L, Phipps ME, Pomat W, Reynolds D, Ricaut FX, Siba P, Thomas MG, Wales T, Wall CM, Oppenheimer SJ, Tyler-Smith C, Durbin R, Dortch J, Manica A, Schierup MH, Foley RA, Lahr MM, Bowern C, Wall JD, Mailund T, Stoneking M, Nielsen R, Sandhu MS, Excoffier L, Lambert DM and Willerslev E

    Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark.

    The population history of Aboriginal Australians remains largely uncharacterized. Here we generate high-coverage genomes for 83 Aboriginal Australians (speakers of Pama-Nyungan languages) and 25 Papuans from the New Guinea Highlands. We find that Papuan and Aboriginal Australian ancestors diversified 25-40 thousand years ago (kya), suggesting pre-Holocene population structure in the ancient continent of Sahul (Australia, New Guinea and Tasmania). However, all of the studied Aboriginal Australians descend from a single founding population that differentiated ~10-32 kya. We infer a population expansion in northeast Australia during the Holocene epoch (past 10,000 years) associated with limited gene flow from this region to the rest of Australia, consistent with the spread of the Pama-Nyungan languages. We estimate that Aboriginal Australians and Papuans diverged from Eurasians 51-72 kya, following a single out-of-Africa dispersal, and subsequently admixed with archaic populations. Finally, we report evidence of selection in Aboriginal Australians potentially associated with living in the desert.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H005854/1

    Nature 2016;538;7624;207-214

  • Environmental context for understanding the iconic adaptive radiation of cichlid fishes in Lake Malawi.

    Malinsky M and Salzburger W

    Zoological Institute, Department of Environmental Sciences, University of Basel, 4051 Basel, Switzerland; Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom.

    Proceedings of the National Academy of Sciences of the United States of America 2016;113;42;11654-11656

  • The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.

    Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, Skoglund P, Lazaridis I, Sankararaman S, Fu Q, Rohland N, Renaud G, Erlich Y, Willems T, Gallo C, Spence JP, Song YS, Poletti G, Balloux F, van Driem G, de Knijff P, Romero IG, Jha AR, Behar DM, Bravi CM, Capelli C, Hervig T, Moreno-Estrada A, Posukh OL, Balanovska E, Balanovsky O, Karachanak-Yankova S, Sahakyan H, Toncheva D, Yepiskoposyan L, Tyler-Smith C, Xue Y, Abdullah MS, Ruiz-Linares A, Beall CM, Di Rienzo A, Jeong C, Starikovskaya EB, Metspalu E, Parik J, Villems R, Henn BM, Hodoglugil U, Mahley R, Sajantila A, Stamatoyannopoulos G, Wee JT, Khusainova R, Khusnutdinova E, Litvinov S, Ayodo G, Comas D, Hammer MF, Kivisild T, Klitz W, Winkler CA, Labuda D, Bamshad M, Jorde LB, Tishkoff SA, Watkins WS, Metspalu M, Dryomov S, Sukernik R, Singh L, Thangaraj K, Pääbo S, Kelso J, Patterson N and Reich D

    Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.

    Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.

    Funded by: NCATS NIH HHS: UL1 TR001067; NIGMS NIH HHS: R00 GM111744, R01 GM059290, R01 GM094402, R01 GM100233

    Nature 2016;538;7624;201-206

  • Analyzing tumor heterogeneity and driver genes in single myeloid leukemia cells with SBCapSeq.

    Mann KM, Newberg JY, Black MA, Jones DJ, Amaya-Manzanares F, Guzman-Rojas L, Kodama T, Ward JM, Rust AG, van der Weyden L, Yew CC, Waters JL, Leung ML, Rogers K, Rogers SM, McNoe LA, Selvanesan L, Navin N, Jenkins NA, Copeland NG and Mann MB

    Cancer Research Program, Houston Methodist Research Institute, Houston, Texas, USA.

    A central challenge in oncology is how to kill tumors containing heterogeneous cell populations defined by different combinations of mutated genes. Identifying these mutated genes and understanding how they cooperate requires single-cell analysis, but current single-cell analytic methods, such as PCR-based strategies or whole-exome sequencing, are biased, lack sequencing depth or are cost prohibitive. Transposon-based mutagenesis allows the identification of early cancer drivers, but current sequencing methods have limitations that prevent single-cell analysis. We report a liquid-phase, capture-based sequencing and bioinformatics pipeline, Sleeping Beauty (SB) capture hybridization sequencing (SBCapSeq), that facilitates sequencing of transposon insertion sites from single tumor cells in a SB mouse model of myeloid leukemia (ML). SBCapSeq analysis of just 26 cells from one tumor revealed the tumor's major clonal subpopulations, enabled detection of clonal insertion events not detected by other sequencing methods and led to the identification of dominant subclones, each containing a unique pair of interacting gene drivers along with three to six cooperating cancer genes with SB-driven expression changes.

    Nature biotechnology 2016

  • A Toll-like receptor-1 variant and its characteristic cellular phenotype is associated with severe malaria in Papua New Guinean children.

    Manning L, Cutts J, Stanisic DI, Laman M, Carmagnac A, Allen S, O'Donnell A, Karunajeewa H, Rosanas-Urgell A, Siba P, Davis TM, Michon P, Schofield L, Rockett K, Kwiatkowski D and Mueller I

    School of Medicine and Pharmacology, University of Western Australia, Harry Perkins Institute, Fiona Stanley Hospital, Bull Creek, Western Australia, Australia.

    Genetic factors are likely to contribute to low severe malaria case fatality rates in Melanesian populations, but association studies can be underpowered and may not provide plausible mechanistic explanations if significant associations are detected. In preparation for a genome-wide association study, 29 candidate single-nucleotide polymorphisms (SNPs) with minor allele frequencies >5% were examined in a case-control study of 504 Papua New Guinean children with severe malaria. In parallel, an immunological substudy was performed on convalescent peripheral blood mononuclear cells (PBMCs) from cases and controls. Following stimulation with a Toll-like receptor (TLR) 1/2 agonist, effector cytokines and chemokines were assayed. The only significant genetic association observed involved a nonsynonymous SNP (TLR1rs4833095) in the TLR1 gene. A recessive (TT) genotype was associated with reduced odds of severe malaria of 0.52 (95% confidence interval (0.29-0.90), P=0.006). Concentrations of pro-inflammatory cytokines interleukin-1β and tumour necrosis factor α were significantly higher in severe malaria cases compared with healthy controls, but lower in children with the protective recessive (TT) genotype. A genetic variant in TLR1 may contribute to the low severe malaria case fatality rates in this region through a reduced pro-inflammatory cellular phenotype.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0600718; Wellcome Trust: 090532/Z/09/Z, 090770, 090770/Z/09/Z, 098051, WT077383/Z/05/Z

    Genes and immunity 2016;17;1;52-9

  • Multiplexed pancreatic genome engineering and cancer induction by transfection-based CRISPR/Cas9 delivery in mice.

    Maresch R, Mueller S, Veltkamp C, Öllinger R, Friedrich M, Heid I, Steiger K, Weber J, Engleitner T, Barenboim M, Klein S, Louzada S, Banerjee R, Strong A, Stauber T, Gross N, Geumann U, Lange S, Ringelhan M, Varela I, Unger K, Yang F, Schmid RM, Vassiliou GS, Braren R, Schneider G, Heikenwalder M, Bradley A, Saur D and Rad R

    Department of Medicine II, Klinikum rechts der Isar, Technische Universität München, 81675 Munich, Germany.

    Mouse transgenesis has provided fundamental insights into pancreatic cancer, but is limited by the long duration of allele/model generation. Here we show transfection-based multiplexed delivery of CRISPR/Cas9 to the pancreas of adult mice, allowing simultaneous editing of multiple gene sets in individual cells. We use the method to induce pancreatic cancer and exploit CRISPR/Cas9 mutational signatures for phylogenetic tracking of metastatic disease. Our results demonstrate that CRISPR/Cas9-multiplexing enables key applications, such as combinatorial gene-network analysis, in vivo synthetic lethality screening and chromosome engineering. Negative-selection screening in the pancreas using multiplexed-CRISPR/Cas9 confirms the vulnerability of pancreatic cells to Brca2-inactivation in a Kras-mutant context. We also demonstrate modelling of chromosomal deletions and targeted somatic engineering of inter-chromosomal translocations, offering multifaceted opportunities to study complex structural variation, a hallmark of pancreatic cancer. The low-frequency mosaic pattern of transfection-based CRISPR/Cas9 delivery faithfully recapitulates the stochastic nature of human tumorigenesis, supporting wide applicability for biological/preclinical research.

    Funded by: European Research Council: ERC_648521; Medical Research Council: MRC_MC_PC_12009

    Nature communications 2016;7;10770

  • Mutations in genes encoding condensin complex proteins cause microcephaly through decatenation failure at mitosis.

    Martin CA, Murray JE, Carroll P, Leitch A, Mackenzie KJ, Halachev M, Fetit AE, Keith C, Bicknell LS, Fluteau A, Gautier P, Hall EA, Joss S, Soares G, Silva J, Bober MB, Duker A, Wise CA, Quigley AJ, Phadke SR, Deciphering Developmental Disorders Study, Wood AJ, Vagnarelli P and Jackson AP

    MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom.

    Compaction of chromosomes is essential for accurate segregation of the genome during mitosis. In vertebrates, two condensin complexes ensure timely chromosome condensation, sister chromatid disentanglement, and maintenance of mitotic chromosome structure. Here, we report that biallelic mutations in NCAPD2, NCAPH, or NCAPD3, encoding subunits of these complexes, cause microcephaly. In addition, hypomorphic Ncaph2 mice have significantly reduced brain size, with frequent anaphase chromatin bridge formation observed in apical neural progenitors during neurogenesis. Such DNA bridges also arise in condensin-deficient patient cells, where they are the consequence of failed sister chromatid disentanglement during chromosome compaction. This results in chromosome segregation errors, leading to micronucleus formation and increased aneuploidy in daughter cells. These findings establish "condensinopathies" as microcephalic disorders, with decatenation failure as an additional disease mechanism for microcephaly, implicating mitotic chromosome condensation as a key process ensuring mammalian cerebral cortex size.

    Funded by: European Research Council: 281847; Medical Research Council: MC_PC_U127580972; Wellcome Trust

    Genes & development 2016;30;19;2158-2172

  • Status of paratyphoid fever vaccine research and development.

    Martin LB, Simon R, MacLennan CA, Tennant SM, Sahastrabuddhe S and Khan MI

    Sclavo Behring Vaccines Institute for Global Health, Via Fiorentina 1, 53100 Siena, Italy.

    Salmonella enterica serovars Typhi and Paratyphi (S. Paratyphi) A and B cause enteric fever in humans. Of the paratyphoid group, S. Paratyphi A is the most common serovar. In 2000, there were an estimated 5.4 million cases of S. Paratyphi A worldwide. More recently paratyphoid fever has accounted for an increasing fraction of all cases of enteric fever. Although vaccines for typhoid fever have been developed and in use for decades, vaccines for paratyphoid fever have not yet been licensed. Several S. Paratyphi A vaccines, however, are in development and based on either whole cell live-attenuated strains or repeating units of the lipopolysaccharide O-antigen (O:2) conjugated to different protein carriers. An O-specific polysaccharide (O:2) of S. Paratyphi A conjugated to tetanus toxoid (O:2-TT), for example, has been determined to be safe and immunogenic after one dose in Phase I and Phase II trials. Two other conjugated vaccine candidates linked to diphtheria toxin and a live-attenuated oral vaccine candidate are currently in preclinical development. As promising vaccine candidates are advanced along the development pipeline, an adequate supply of vaccines will need to be generated to meet growing demand, particularly in the most affected countries.

    Vaccine 2016

  • Constrained positive selection on cancer mutations in normal skin.

    Martincorena I, Jones PH and Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Proceedings of the National Academy of Sciences of the United States of America 2016;113;9;E1128-9

  • The mechanisms shaping the single-cell transcriptional landscape.

    Martinez-Jimenez CP and Odom DT

    University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Recent technological and computational advances in understanding the transcriptional and chromatin features of single cells have begun answering longstanding questions about the extent and impact of biological heterogeneity. Here, we outline the intrinsic and extrinsic mechanisms that underlie the transcriptional and functional diversity within superficially homogeneous populations, and we discuss how fascinating new studies have afforded novel insight into each mechanism. The studies are chosen in part to include initial reports of novel functional genomics tools where the eventual applications will clearly have profound impact on our understanding of the dynamics of cell-to-cell transcriptional variation, from individual cells to whole organisms.

    Current opinion in genetics & development 2016;37;27-35

  • Impact of socioeconomic status on disease phenotype, genomic landscape and outcomes in myelodysplastic syndromes.

    Mastaglio F, Bedair K, Papaemmanuil E, Groves MJ, Hyslop A, Keenan N, Hothersall EJ, Campbell PJ, Bowen DT and Tauro S

    Dundee Cancer Centre, Ninewells Hospital & Medical School, University of Dundee, Dundee, UK.

    Genetic and epigenetic alterations contribute to the biological and clinical characteristics of myelodysplastic syndromes (MDS), but a role for socioeconomic environment remains unclear. Here, socioeconomic status (SES) for 283 MDS patients was estimated using the Scottish Index of Multiple Deprivation tool. Indices were assigned to quintile categorical indicators ranked from SES1 (lowest) to SES5 (highest). Clinicopathological features and outcomes between SES quintiles containing 15%, 20%, 19%, 30% and 16% of patients were compared. Prognostic scores identified lower-risk MDS in 82% of patients, with higher-risk disease in 18%. SES quintiles did not associate with age, gender, cytogenetics, International Prognostic scores or, in sub-analysis (n = 95), driver mutations. The odds ratio of a diagnosis of refractory anaemia was greater than other MDS sub-types in SES5 (OR 1·9, P = 0·024). Most patients (91%) exclusively received supportive care. SES did not associate with leukaemic transformation or cause of death. Cox regression models confirmed male gender (P < 0·05), disease-risk (P < 0·0001) and age (P < 0·01) as independent predictors of leukaemia-free survival, with leukaemic transformation an additional determinant of overall survival (P = 0·07). Thus, if access to healthcare is equitable, SES does not determine disease biology or survival in MDS patients receiving supportive treatment; additional studies are required to determine whether outcomes following disease-modifying therapies are influenced by SES.

    British journal of haematology 2016;174;2;227-34

  • Genomic analysis of Salmonella enterica serovar Typhimurium from wild passerines in England and Wales.

    Mather AE, Lawson B, de Pinna E, Wigley P, Parkhill J, Thomson NR, Page AJ, Holmes MA and Paterson GK

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Passerine salmonellosis is a well-recognised disease of birds in the order Passeriformes, including common songbirds such as finches and sparrows, caused by infection with Salmonella enterica serovar Typhimurium. Previous research has suggested that some subtypes of S. Typhimurium - definitive phage types (DT) 40, 56 variant, and 160 - are host-adapted to passerines, and that these birds may represent a reservoir of infection for humans and other animals. Here, we have used whole genome sequences of 11 isolates from British passerines, five isolates of similar DTs from humans and a domestic cat, and previously published S. Typhimurium genomes including similar DTs from other hosts to investigate the phylogenetic relatedness of passerine salmonellae in comparison with other S. Typhimurium, and investigate possible genetic features of the distinct disease pathogenesis of S. Typhimurium in passerines. Our results demonstrate that the 11 passerine isolates and 13 other isolates, including those from non-passerine hosts, were genetically closely related, with a median pairwise single nucleotide polymorphism (SNP) difference of 130 SNPs. These 24 isolates did not carry antimicrobial resistance genetic determinants or the S. Typhimurium virulence plasmid. Although our study does not provide evidence of Salmonella transmission from passerines to other hosts, our results are consistent with the hypothesis that wild birds represent a potential reservoir of these Salmonella subtypes, and thus, sensible personal hygiene precautions should be taken when feeding or handling garden birds.

    Importance: Passerine salmonellosis, caused by certain definitive phage types (DTs) of Salmonella Typhimurium, has been documented as a cause of wild passerine mortality since the 1950s in many countries, often in the vicinity of garden bird feeding stations. To gain better insight into its epidemiology and host-pathogen interactions, we genome-sequenced a collection of eleven isolates from wild passerine salmonellosis in England and Wales. Phylogenetic analysis showed these passerine isolates to be closely related to each other and to form a clade distinct from other strains of S. Typhimurium, which included a multidrug resistant isolate from invasive non-typhoidal Salmonella disease which shares the same phage type as several of the passerine isolates. Closely related to wild passerine isolates and within the same clade were four S. Typhimurium isolates from humans as well as isolates from horses, poultry, cattle, an unspecified wild bird, and a domestic cat and dog with similar DTs and/or multi-locus sequence types. This suggests the potential for cross-species transmission and the genome sequences provide a valuable resource to investigate passerine salmonellosis further.

    Applied and environmental microbiology 2016

  • A reference panel of 64,976 haplotypes for genotype imputation.

    McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Fuchsberger C, Danecek P, Sharp K, Luo Y, Sidore C, Kwong A, Timpson N, Koskinen S, Vrieze S, Scott LJ, Zhang H, Mahajan A, Veldink J, Peters U, Pato C, van Duijn CM, Gillies CE, Gandin I, Mezzavilla M, Gilly A, Cocca M, Traglia M, Angius A, Barrett JC, Boomsma D, Branham K, Breen G, Brummett CM, Busonero F, Campbell H, Chan A, Chen S, Chew E, Collins FS, Corbin LJ, Smith GD, Dedoussis G, Dorr M, Farmaki AE, Ferrucci L, Forer L, Fraser RM, Gabriel S, Levy S, Groop L, Harrison T, Hattersley A, Holmen OL, Hveem K, Kretzler M, Lee JC, McGue M, Meitinger T, Melzer D, Min JL, Mohlke KL, Vincent JB, Nauck M, Nickerson D, Palotie A, Pato M, Pirastu N, McInnis M, Richards JB, Sala C, Salomaa V, Schlessinger D, Schoenherr S, Slagboom PE, Small K, Spector T, Stambolian D, Tuke M, Tuomilehto J, Van den Berg LH, Van Rheenen W, Volker U, Wijmenga C, Toniolo D, Zeggini E, Gasparini P, Sampson MG, Wilson JF, Frayling T, de Bakker PI, Swertz MA, McCarroll S, Kooperberg C, Dekker A, Altshuler D, Willer C, Iacono W, Ripatti S, Soranzo N, Walter K, Swaroop A, Cucca F, Anderson CA, Myers RM, Boehnke M, McCarthy MI, Durbin R and Haplotype Reference Consortium

    Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.

    We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

    Funded by: Medical Research Council: MC_UU_12013/3; Medical Research Council: G0601261, MC_PC_15018, MR/J00314X/1; NEI NIH HHS: P30 EY001583; NHLBI NIH HHS: R01 HL117626; NIDA NIH HHS: R01 DA037904; NIDDK NIH HHS: R01 DK072193, U01 DK062370

    Nature genetics 2016;48;10;1279-83

  • Alternative haplotypes of antigen processing genes in zebrafish diverged early in vertebrate evolution.

    McConnell SC, Hernandez KM, Wcisel DJ, Kettleborough RN, Stemple DL, Yoder JA, Andrade J and de Jong JL

    Section of Hematology-Oncology and Stem Cell Transplant, Department of Pediatrics, The University of Chicago, Chicago, IL 60637.

    Antigen processing and presentation genes found within the MHC are among the most highly polymorphic genes of vertebrate genomes, providing populations with diverse immune responses to a wide array of pathogens. Here, we describe transcriptome, exome, and whole-genome sequencing of clonal zebrafish, uncovering the most extensive diversity within the antigen processing and presentation genes of any species yet examined. Our CG2 clonal zebrafish assembly provides genomic context within a remarkably divergent haplotype of the core MHC region on chromosome 19 for six expressed genes not found in the zebrafish reference genome: mhc1uga, proteasome-β 9b (psmb9b), psmb8f, and previously unknown genes psmb13b, tap2d, and tap2e. We identify ancient lineages for Psmb13 within a proteasome branch previously thought to be monomorphic and provide evidence of substantial lineage diversity within each of three major trifurcations of catalytic-type proteasome subunits in vertebrates: Psmb5/Psmb8/Psmb11, Psmb6/Psmb9/Psmb12, and Psmb7/Psmb10/Psmb13. Strikingly, nearby tap2 and MHC class I genes also retain ancient sequence lineages, indicating that alternative lineages may have been preserved throughout the entire MHC pathway since early diversification of the adaptive immune system ∼500 Mya. Furthermore, polymorphisms within the three MHC pathway steps (antigen cleavage, transport, and presentation) are each predicted to alter peptide specificity. Lastly, comparative analysis shows that antigen processing gene diversity is far more extensive than previously realized (with ancient coelacanth psmb8 lineages, shark psmb13, and tap2t and psmb10 outside the teleost MHC), implying distinct immune functions and conserved roles in shaping MHC pathway evolution throughout vertebrates.

    Proceedings of the National Academy of Sciences of the United States of America 2016;113;34;E5014-23

  • Phosphorylation of a constrained azacyclic FTY720 analog enhances anti-leukemic activity without inducing S1P receptor activation.

    McCracken AN, McMonigle RJ, Tessier J, Fransson R, Perryman MS, Chen B, Keebaugh A, Selwan E, Barr SA, Kim SM, Roy SG, Liu G, Fallegger D, Sernissi L, Brandt C, Moitessier N, Snider AJ, Clare S, Müschen M, Huwiler A, Kleinman MT, Hanessian S and Edinger AL

    Department of Developmental and Cell Biology, University of California, Irvine, CA.

    The frequency of poor outcomes in relapsed leukemia patients underscores the need for novel therapeutic approaches. The FDA-approved immunosuppressant FTY720 limits leukemia progression by activating protein phosphatase 2A and restricting nutrient access. Unfortunately, FTY720 cannot be re-purposed for use in cancer patients due to on-target toxicity associated with S1P receptor activation at the elevated, anti-neoplastic dose. Here we show that the constrained azacyclic FTY720 analog SH-RF-177 lacks S1P receptor activity but maintains anti-leukemic activity in vitro and in vivo. SH-RF-177 was not only more potent than FTY720, but killed via a distinct mechanism. Phosphorylation is dispensable for FTY720's anti-leukemic actions. However, chemical biology and genetic approaches demonstrated that the sphingosine kinase 2- (SPHK2) mediated phosphorylation of SH-RF-177 led to engagement of a pro-apoptotic target and increased potency. The cytotoxicity of membrane-permeant FTY720 phosphonate esters suggests that the enhanced potency of SH-RF-177 stems from its more efficient phosphorylation. The tight inverse correlation between SH-RF-177 IC50 and SPHK2 mRNA expression suggests a useful biomarker for SH-RF-177 sensitivity. In summary, these studies indicate that FTY720 analogs that are efficiently phosphorylated but fail to activate S1P receptors may be superior anti-leukemic agents compared to compounds that avoid cardiotoxicity by eliminating phosphorylation. Leukemia accepted article preview online, 30 August 2016. doi:10.1038/leu.2016.244.

    Leukemia 2016

  • A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease.

    McDonald JU, Kaforou M, Clare S, Hale C, Ivanova M, Huntley D, Dorner M, Wright VJ, Levin M, Martinon-Torres F, Herberg JA and Tregoning JS

    Mucosal Infection and Immunity Group, Section of Virology, Imperial College London, St. Mary's Campus, London, United Kingdom.

    Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis.

    Importance: Making the most of "big data" is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases.
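
    The prioritization idea described in this abstract (count how many independent data sets report each gene, then rank) can be illustrated with a minimal sketch. This is not the authors' pipeline: the study names, the gene-to-study assignments, and the `min_studies` cutoff below are hypothetical toy values chosen only to show the counting step.

```python
from collections import Counter

# Hypothetical toy data: each set stands for the genes reported as changed
# after infection in one published study. The assignments are illustrative
# only and are not taken from the paper.
datasets = {
    "study_a": {"IRF7", "IFI27", "OAS3"},
    "study_b": {"irf7 ", "IFI27", "GBP1"},
    "study_c": {"IRF7", "IFIT3"},
}


def rank_genes(datasets, min_studies=2):
    """Rank genes by the number of data sets in which they appear."""
    counts = Counter()
    for genes in datasets.values():
        # Standardize symbols so the same gene is counted once per study.
        normalized = {g.strip().upper() for g in genes}
        counts.update(normalized)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n) for gene, n in ranked if n >= min_studies]


candidates = rank_genes(datasets)
for gene, n in candidates:
    print(f"{gene}\t{n}")
```

    Genes seen in only one toy study fall below the cutoff and are dropped, which mirrors the stated assumption that genes detected more often across studies are more likely to matter.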

    mSystems 2016;1;3

  • A Restricted Repertoire of De Novo Mutations in ITPR1 Cause Gillespie Syndrome with Evidence for Dominant-Negative Effect.

    McEntagart M, Williamson KA, Rainger JK, Wheeler A, Seawright A, De Baere E, Verdin H, Bergendahl LT, Quigley A, Rainger J, Dixit A, Sarkar A, López Laso E, Sanchez-Carpintero R, Barrio J, Bitoun P, Prescott T, Riise R, McKee S, Cook J, McKie L, Ceulemans B, Meire F, Temple IK, Prieur F, Williams J, Clouston P, Németh AH, Banka S, Bengani H, Handley M, Freyer E, Ross A, DDD Study, van Heyningen V, Marsh JA, Elmslie F and FitzPatrick DR

    Medical Genetics, St George's University Hospitals NHS Foundation Trust, Cranmer Terrace, London SW17 0RE, UK.

    Gillespie syndrome (GS) is characterized by bilateral iris hypoplasia, congenital hypotonia, non-progressive ataxia, and progressive cerebellar atrophy. Trio-based exome sequencing identified de novo mutations in ITPR1 in three unrelated individuals with GS recruited to the Deciphering Developmental Disorders study. Whole-exome or targeted sequence analysis identified plausible disease-causing ITPR1 mutations in 10/10 additional GS-affected individuals. These ultra-rare protein-altering variants affected only three residues in ITPR1: Glu2094 missense (one de novo, one co-segregating), Gly2539 missense (five de novo, one inheritance uncertain), and Lys2596 in-frame deletion (four de novo). No clinical or radiological differences were evident between individuals with different mutations. ITPR1 encodes an inositol 1,4,5-triphosphate-responsive calcium channel. The homo-tetrameric structure has been solved by cryoelectron microscopy. Using estimations of the degree of structural change induced by known recessive- and dominant-negative mutations in other disease-associated multimeric channels, we developed a generalizable computational approach to indicate the likely mutational mechanism. This analysis supports a dominant-negative mechanism for GS variants in ITPR1. In GS-derived lymphoblastoid cell lines (LCLs), the proportion of ITPR1-positive cells using immunofluorescence was significantly higher in mutant than control LCLs, consistent with an abnormality of nuclear calcium signaling feedback control. Super-resolution imaging supports the existence of an ITPR1-lined nucleoplasmic reticulum. Mice with Itpr1 heterozygous null mutations showed no major iris defects. Purkinje cells of the cerebellum appear to be the most sensitive to impaired ITPR1 function in humans. Iris hypoplasia is likely to result from either complete loss of ITPR1 activity or structure-specific disruption of multimeric interactions.

    Funded by: Medical Research Council: MC_PC_U127561093, MC_U127527199, MC_U127561093, MR/K01563X/1, MR/M02122X/1

    American journal of human genetics 2016;98;5;981-992

  • Enhanced Methylation Analysis by Recovery of Unsequenceable Fragments.

    McInroy GR, Beraldi D, Raiber EA, Modrzynska K, van Delft P, Billker O and Balasubramanian S

    Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom.

    Bisulfite sequencing is a valuable tool for mapping the position of 5-methylcytosine in the genome at single base resolution. However, the associated chemical treatment causes strand scission, which depletes the number of sequenceable DNA fragments in a library and thus necessitates PCR amplification. The AT-rich nature of the library generated from bisulfite treatment adversely affects this amplification, resulting in the introduction of major biases that can confound methylation analysis. Here, we report a method that enables more accurate methylation analysis, by rebuilding bisulfite-damaged components of a DNA library. This recovery after bisulfite treatment (ReBuilT) approach enables PCR-free bisulfite sequencing from low nanogram quantities of genomic DNA. We apply the ReBuilT method for the first whole methylome analysis of the highly AT-rich genome of Plasmodium berghei. Side-by-side comparison to a commercial protocol involving amplification demonstrates a substantial improvement in uniformity of coverage and reduction of sequence context bias. Our method will be widely applicable for quantitative methylation analysis, even for technically challenging genomes, and where limited sample DNA is available.

    Funded by: Cancer Research UK; Wellcome Trust: 099232/Z/12/Z

    PloS one 2016;11;3;e0152322

  • A Genome-Wide Association Study for Regulators of Micronucleus Formation in Mice.

    McIntyre RE, Nicod J, Robles-Espinoza CD, Maciejowski J, Cai N, Hill J, Verstraten R, Iyer V, Rust AG, Balmus G, Mott R, Flint J and Adams DJ

    Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.

    In mammals the regulation of genomic instability plays a key role in tumor suppression and also controls genome plasticity, which is important for recombination during the processes of immunity and meiosis. Most studies to identify regulators of genomic instability have been performed in cells in culture or in systems that report on gross rearrangements of the genome, yet subtle differences in the level of genomic instability can contribute to whole organism phenotypes such as tumor predisposition. Here we performed a genome-wide association study in a population of 1379 outbred Crl:CFW(SW)-US_P08 mice to dissect the genetic landscape of micronucleus formation, a biomarker of chromosomal breaks, whole chromosome loss, and extranuclear DNA. Variation in micronucleus levels is a complex trait with a genome-wide heritability of 53.1%. We identify seven loci influencing micronucleus formation (false discovery rate <5%), and define candidate genes at each locus. Intriguingly at several loci we find evidence for sexual dimorphism in micronucleus formation, with a locus on chromosome 11 being specific to males.

    Funded by: Cancer Research UK: 12401

    G3 (Bethesda, Md.) 2016;6;8;2343-54

  • Navigating the Phenotype Frontier: The Monarch Initiative.

    McMurry JA, Köhler S, Washington NL, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E, Gourdine JP, Jacobsen JO, Keith D, Laraway B, Xuan JN, Shefchek K, Vasilevsky NA, Yuan Z, Lewis SE, Hochheiser H, Groza T, Smedley D, Robinson PN, Mungall CJ and Haendel MA

    Department of Medical Informatics and Epidemiology, and Oregon Health and Science University Library, Oregon Health and Science University, Portland, Oregon 97239.

    The principles of genetics apply across the entire tree of life. At the cellular level we share biological mechanisms with species from which we diverged millions, even billions of years ago. We can exploit this common ancestry to learn about health and disease, by analyzing DNA and protein sequences, but also through the observable outcomes of genetic differences, i.e. phenotypes. To solve challenging disease problems we need to unify the heterogeneous data that relates genomics to disease traits. Without a big-picture view of phenotypic data, many questions in genetics are difficult or impossible to answer. The Monarch Initiative provides tools for genotype-phenotype analysis, genomic diagnostics, and precision medicine across broad areas of disease.

    Genetics 2016;203;4;1491-5

  • 'Add, stir and reduce': Yersinia spp. as model bacteria for pathogen evolution.

    McNally A, Thomson NR, Reuter S and Wren BW

    Pathogen Research Group, Nottingham Trent University, Clifton Lane, Nottingham NG11 8NS, UK.

    Pathogenic species in the Yersinia genus have historically been targets for research aimed at understanding how bacteria evolve into mammalian pathogens. The advent of large-scale population genomic studies has greatly accelerated the progress in this field, and Yersinia pestis, Yersinia pseudotuberculosis and Yersinia enterocolitica have once again acted as model organisms to help shape our understanding of the evolutionary processes involved in pathogenesis. In this Review, we highlight the gene gain, gene loss and genome rearrangement events that have been identified by genomic studies in pathogenic Yersinia species, and we discuss how these findings are changing our understanding of pathogen evolution. Finally, as these traits are also found in the genomes of other species in the Enterobacteriaceae, we suggest that they provide a blueprint for the evolution of enteropathogenic bacteria.

    Nature reviews. Microbiology 2016;14;3;177-90

  • Mutation allele burden remains unchanged in chronic myelomonocytic leukaemia responding to hypomethylating agents.

    Merlevede J, Droin N, Qin T, Meldi K, Yoshida K, Morabito M, Chautard E, Auboeuf D, Fenaux P, Braun T, Itzykson R, de Botton S, Quesnel B, Commes T, Jourdan E, Vainchenker W, Bernard O, Pata-Merci N, Solier S, Gayevskiy V, Dinger ME, Cowley MJ, Selimoglu-Buet D, Meyer V, Artiguenave F, Deleuze JF, Preudhomme C, Stratton MR, Alexandrov LB, Padron E, Ogawa S, Koscielny S, Figueroa M and Solary E

    INSERM U1170, Gustave Roussy, 114, rue Edouard Vaillant, 94805 Villejuif, France.

    The cytidine analogues azacytidine and 5-aza-2'-deoxycytidine (decitabine) are commonly used to treat myelodysplastic syndromes, with or without a myeloproliferative component. It remains unclear whether the response to these hypomethylating agents results from a cytotoxic or an epigenetic effect. In this study, we address this question in chronic myelomonocytic leukaemia. We describe a comprehensive analysis of the mutational landscape of these tumours, combining whole-exome and whole-genome sequencing. We identify an average of 14±5 somatic mutations in coding sequences of sorted monocyte DNA and the signatures of three mutational processes. Serial sequencing demonstrates that the response to hypomethylating agents is associated with changes in DNA methylation and gene expression, without any decrease in the mutation allele burden, nor prevention of new genetic alteration occurrence. Our findings indicate that cytosine analogues restore a balanced haematopoiesis without decreasing the size of the mutated clone, arguing for a predominantly epigenetic effect.

    Nature communications 2016;7;10767

  • Strain features and distributions in pneumococci from children with invasive disease before and after 13-valent conjugate vaccine implementation in the USA.

    Metcalf BJ, Gertz RE, Gladstone RA, Walker H, Sherwood LK, Jackson D, Li Z, Law C, Hawkins PA, Chochua S, Sheth M, Rayamajhi N, Bentley SD, Kim L, Whitney CG, McGee L, Beall B and Active Bacterial Core surveillance team

    Centers for Disease Control and Prevention, National Center for Immunization and Respiratory Diseases, Atlanta, GA, USA.

    The effect of second-generation pneumococcal conjugate vaccines on invasive pneumococcal disease (IPD) strain distributions has not yet been well described. We analysed IPD isolates recovered from children aged <5 years through Active Bacterial Core surveillance before (2008-2009; n = 828) and after (2011-2013; n = 600) 13-valent pneumococcal conjugate vaccine (PCV13) implementation. We employed conventional testing, PCR/electrospray ionization mass spectrometry and whole genome sequence (WGS) analysis to identify serotypes, resistance features, genotypes, and pilus types. PCV13, licensed in February 2010, effectively targeted all major 19A and 7F genotypes, and decreased antimicrobial resistance, primarily owing to removal of the 19A/ST320 complex. The strain complex contributing most to the remaining β-lactam resistance during 2011-2013 was 35B/ST558. Significant emergence of non-vaccine clonal complexes was not evident. Because of the removal of vaccine serotype strains, positivity for one or both pilus types (PI-1 and PI-2) decreased in the post-PCV13 years 2011-2013 relative to 2008-2009 (decreases of 32-55% for PI-1, and >95% for PI-2 and combined PI-1 + PI-2). β-Lactam susceptibility phenotypes correlated consistently with transpeptidase region sequence combinations of the three major penicillin-binding proteins (PBPs) determined through WGS analysis. Other major resistance features were predictable by DNA signatures from WGS analysis. Multilocus sequence data combined with PBP combinations identified progeny, serotype donors and recipient strains in serotype switch events. PCV13 decreased the frequency of all PCV13 serotype clones and concurrently decreased the frequency of strain subsets with resistance and/or adherence features conducive to successful carriage. Our results serve as a reference describing key features of current paediatric IPD strains in the USA after PCV13 implementation.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2016;22;1;60.e9-60.e29

  • Whole genome sequencing to investigate a putative outbreak of the virulent community-associated methicillin-resistant Staphylococcus aureus ST93 clone in a remote Indigenous community.

    Meumann EM, Andersson P, Yeaman F, Oldfield S, Lilliebridge R, Bentley SD, Krause V, Beaman M, Currie BJ, Holt DC, Giffard PM and Tong SY

    Global and Tropical Health Division, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia; Department of Infectious Diseases, Royal Darwin Hospital, Darwin, Northern Territory, Australia; Centre for Disease Control, Department of Health, Northern Territory Government, Darwin, Northern Territory, Australia.

    We report two cases of severe pneumonia due to clone ST93 methicillin-resistant Staphylococcus aureus (MRSA) presenting from a remote Australian Indigenous community within a 2-week period, and the utilization of whole genome sequences to determine whether these were part of an outbreak. S. aureus was isolated from 12 of 92 nasal swabs collected from 25 community households (including the two index households); one isolate was ST93. Three of five skin lesion S. aureus isolates obtained at the community were ST93. Whole genome sequencing of the ST93 isolates from this study and a further 20 ST93 isolates from the same region suggested that recent transmission and progression to disease had not taken place. The proximity in time and space of the two severe pneumonia cases is probably a reflection of the high burden of disease due to ST93 MRSA in this population where skin infections and household crowding are common.

    Microbial genomics 2016;2;12;e000098

  • Quantifying side-chain conformational variations in protein structure.

    Miao Z and Cao Y

    Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 67000 Strasbourg, France.

    Protein side-chain conformation is closely related to biological function. Side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism of various types is widespread in proteins and has long been overlooked by side-chain prediction. Such conformational variations have not been quantitatively studied, and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large-scale data sets and found that side-chain conformational flexibility is closely related to solvent exposure, degrees of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variability in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider how side-chain prediction programs are assessed.

    Scientific reports 2016;6;37024

  • FANCD2 limits replication stress and genome instability in cells lacking BRCA2.

    Michl J, Zimmer J, Buffa FM, McDermott U and Tarsounas M

    The CR-UK/MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, U.K.

    The tumor suppressor BRCA2 plays a key role in genome integrity by promoting replication-fork stability and homologous recombination (HR) DNA repair. Here we report that human cancer cells lacking BRCA2 rely on the Fanconi anemia protein FANCD2 to limit replication-fork progression and genomic instability. Our results identify a new role of FANCD2 in limiting constitutive replication stress in BRCA2-deficient cells, thereby affecting cell survival and treatment responses.

    Funded by: Cancer Research UK: A8942

    Nature structural & molecular biology 2016;23;8;755-757

  • Attitudes of nearly 7000 health professionals, genomic researchers and publics toward the return of incidental results from sequencing research.

    Middleton A, Morley KI, Bragin E, Firth HV, Hurles ME, Wright CF, Parker M and DDD study

    Wellcome Trust Sanger Institute, Human Genetics, Cambridge, UK.

    Genome-wide sequencing in a research setting has the potential to reveal health-related information of personal or clinical utility for the study participant. There is increasing pressure to return research findings to participants that may not be related to the project aims, particularly when these could be used to prevent disease. Such secondary, unsolicited or 'incidental findings' (IFs) may be discovered unintentionally when interpreting sequence data, or as the result of a deliberate opportunistic screen. This cross-sectional, web-based survey investigated attitudes of 6944 individuals from 75 countries towards returning IFs from genome research. Participants included four relevant stakeholder groups: 4961 members of the public, 533 genetic health professionals, 843 non-genetic health professionals and 607 genomic researchers who were invited via traditional media, social media and professional e-mail list-serve. Treatability and perceived utility of incidental results were deemed important with 98% of stakeholders personally interested in learning about preventable life-threatening conditions. Although there was a generic interest in receiving genomic information, stakeholders did not expect researchers to opportunistically screen for IFs in a research setting. On many items, genetic health professionals had significantly more conservative views compared with other stakeholders. This finding demonstrates a disconnect between the views of those handling the findings of research and those participating in research. Exploring, evaluating and ultimately addressing this disconnect should form a priority for researchers and clinicians alike. This social sciences study offers the largest dataset, published to date, of attitudes towards issues surrounding the return of IFs from sequencing research.

    Funded by: Department of Health; Wellcome Trust

    European journal of human genetics : EJHG 2016;24;1;21-9

  • Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum.

    Miles A, Iqbal Z, Vauterin P, Pearson R, Campino S, Theron M, Gould K, Mead D, Drury E, O'Brien J, Ruano Rubio V, MacInnis B, Mwangi J, Samarakoon U, Ranford-Cartwright L, Ferdig M, Hayton K, Su XZ, Wellems T, Rayner J, McVean G and Kwiatkowski D

    MRC Centre for Genomics and Global Health, University of Oxford, Oxford OX3 7BN, United Kingdom; Malaria Programme, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.
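
    The use of Mendelian error rates as a proxy for genotyping accuracy can be sketched as follows. The sketch assumes haploid genotype calls, so that every progeny allele at a site should match one of the two parents; the function name, the input layout, and the toy calls are hypothetical and are not taken from the paper's pipeline.

```python
def mendelian_error_rate(parent1, parent2, progeny):
    """Fraction of called progeny alleles that match neither parent.

    parent1, parent2: one allele per variant site (None = missing call).
    progeny: for each site, a list of progeny alleles (None = missing call).
    Assumes haploid calls, so a progeny allele should equal one parent's allele.
    Returns None when no progeny genotype could be evaluated.
    """
    errors = 0
    called = 0
    for p1, p2, children in zip(parent1, parent2, progeny):
        if p1 is None or p2 is None:
            continue  # cannot judge inheritance without both parental calls
        for allele in children:
            if allele is None:
                continue
            called += 1
            if allele != p1 and allele != p2:
                errors += 1
    return errors / called if called else None


# Hypothetical toy calls at three sites for two progeny clones.
parent1 = ["A", "C", "G"]
parent2 = ["A", "T", "G"]
progeny = [["A", "A"], ["C", "T"], ["G", "T"]]

rate = mendelian_error_rate(parent1, parent2, progeny)
print(rate)
```

    In the toy data only the last progeny call ("T" where both parents carry "G") is inconsistent with inheritance, so one of six evaluated calls counts as an error. A lower rate under a given calling or filtering setting would suggest more accurate genotypes.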

    Genome research 2016;26;9;1288-99

  • Association of Forced Vital Capacity with the Developmental Gene NCOR2.

    Minelli C, Dean CH, Hind M, Alves AC, Amaral AF, Siroux V, Huikari V, Soler Artigas M, Evans DM, Loth DW, Bossé Y, Postma DS, Sin D, Thompson J, Demenais F, Henderson J, SpiroMeta consortium, CHARGE consortium, Bouzigon E, Jarvis D, Järvelin MR and Burney P

    Respiratory Epidemiology, Occupational Medicine and Public Health, National Heart and Lung Institute, Imperial College, London, United Kingdom.

    Background: Forced Vital Capacity (FVC) is an important predictor of all-cause mortality in the absence of chronic respiratory conditions. Epidemiological evidence highlights the role of early life factors on adult FVC, pointing to environmental exposures and genes affecting lung development as risk factors for low FVC later in life. Although FVC is highly heritable, only a small number of genes have been found to be associated with it, and we aimed to identify further genetic variants by focusing on lung development genes.

    Methods: Per-allele effects of 24,728 SNPs in 403 genes involved in lung development were tested in 7,749 adults from three studies (NFBC1966, ECRHS, EGEA). The most significant SNP for the top 25 genes was followed-up in 46,103 adults (CHARGE and SpiroMeta consortia) and 5,062 children (ALSPAC). Associations were considered replicated if the replication p-value survived Bonferroni correction (p<0.002; 0.05/25), with a nominal p-value considered as suggestive evidence. For SNPs with evidence of replication, effects on the expression levels of nearby genes in lung tissue were tested in 1,111 lung samples (Lung eQTL consortium), with further functional investigation performed using public epigenomic profiling data (ENCODE).

    Results: NCOR2-rs12708369 showed strong replication in children (p = 0.0002), with replication unavailable in adults due to low imputation quality. This intronic variant is in a strong transcriptional enhancer element in lung fibroblasts, but its eQTL effects could not be tested due to low imputation quality in the eQTL dataset. SERPINE2-rs6754561 replicated at nominal level in both adults (p = 0.036) and children (p = 0.045), while WNT16-rs2707469 replicated at nominal level only in adults (p = 0.026). The eQTL analyses showed association of WNT16-rs2707469 with expression levels of the nearby gene CPED1. We found no statistically significant eQTL effects for SERPINE2-rs6754561.

    Conclusions: We have identified a new gene, NCOR2, in the retinoic acid signalling pathway pointing to a role of vitamin A metabolism in the regulation of FVC. Our findings also support SERPINE2, a COPD gene with weak previous evidence of association with FVC, and suggest WNT16 as a further promising candidate.
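
    The replication rule stated in the Methods (Bonferroni threshold of 0.05/25, with nominal p < 0.05 treated as suggestive) can be checked against the p-values reported in the Results. The sketch below only restates that arithmetic; the function name and labels are illustrative choices.

```python
ALPHA = 0.05      # nominal significance level, from the Methods
N_TESTS = 25      # top 25 genes followed up, from the Methods
THRESHOLD = ALPHA / N_TESTS  # Bonferroni-corrected threshold (0.002)


def classify(p_value):
    """Label a replication p-value using the rule described in the Methods."""
    if p_value < THRESHOLD:
        return "replicated"
    if p_value < ALPHA:
        return "suggestive"
    return "not replicated"


# Replication p-values as reported in the Results.
reported = {
    ("NCOR2-rs12708369", "children"): 0.0002,
    ("SERPINE2-rs6754561", "adults"): 0.036,
    ("SERPINE2-rs6754561", "children"): 0.045,
    ("WNT16-rs2707469", "adults"): 0.026,
}

labels = {key: classify(p) for key, p in reported.items()}
for (snp, cohort), label in labels.items():
    print(f"{snp} ({cohort}): {label}")
```

    Under this rule only the NCOR2 variant in children passes the corrected threshold, while the SERPINE2 and WNT16 results fall in the nominal (suggestive) range, which matches the wording of the Results.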

    Funded by: Chief Scientist Office: CZD/16/6/4; Medical Research Council: MC_PC_15018

    PloS one 2016;11;2;e0147388

  • Genome engineering uncovers 54 evolutionarily conserved and testis-enriched genes that are not required for male fertility in mice.

    Miyata H, Castaneda JM, Fujihara Y, Yu Z, Archambeault DR, Isotani A, Kiyozumi D, Kriseman ML, Mashiko D, Matsumura T, Matzuk RM, Mori M, Noda T, Oji A, Okabe M, Prunskaite-Hyyrylainen R, Ramirez-Solis R, Satouh Y, Zhang Q, Ikawa M and Matzuk MM

    Research Institute for Microbial Diseases, Osaka University, Suita, Osaka 5650871, Japan.

    Gene-expression analysis studies from Schultz et al. estimate that more than 2,300 genes in the mouse genome are expressed predominantly in the male germ line. As of their 2003 publication [Schultz N, Hamra FK, Garbers DL (2003) Proc Natl Acad Sci USA 100(21):12201-12206], the functions of the majority of these testis-enriched genes during spermatogenesis and fertilization were largely unknown. Since the study by Schultz et al., functional analyses of hundreds of reproductive-tract-enriched genes have been performed, but there remain many testis-enriched genes whose relevance to reproduction remains unexplored or unreported. Historically, a gene knockout is the "gold standard" to determine whether a gene's function is essential in vivo. Although knockout mice without apparent phenotypes are rarely published, these knockout mouse lines and their phenotypic information need to be shared to prevent redundant experiments. Herein, we used bioinformatic and experimental approaches to uncover mouse testis-enriched genes that are evolutionarily conserved in humans. We then used gene-disruption approaches, including Knockout Mouse Project resources (targeting vectors and mice) and CRISPR/Cas9, to mutate and quickly analyze the fertility of these mutant mice. We discovered that 54 mutant mouse lines were fertile. Thus, despite evolutionary conservation of these genes in vertebrates and in some cases in all eukaryotes, our results indicate that these genes are not individually essential for male mouse fertility. Our phenotypic data are highly relevant in this fiscally tight funding period and postgenomic age when large numbers of genomes are being analyzed for disease association, and will prevent unnecessary expenditures and duplications of effort by others.

    Proceedings of the National Academy of Sciences of the United States of America 2016

  • Recent independent emergence of multiple multidrug-resistant Serratia marcescens clones within the United Kingdom and Ireland.

    Moradigaravand D, Boinett CJ, Martin V, Peacock SJ and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Serratia marcescens, a member of the Enterobacteriaceae family, is a Gram-negative bacterium responsible for a wide range of nosocomial infections. The emergence of multidrug-resistant strains is an increasing danger to public health. To design effective means to control the dissemination of S. marcescens, an in-depth analysis of the population structure and variation is required. Utilizing whole-genome sequencing, we characterized the population structure and variation, as well as the antimicrobial resistance determinants, of a systematic collection of antimicrobial-resistant S. marcescens associated with bloodstream infections in hospitals across the United Kingdom and Ireland between 2001 and 2011. Our results show that S. marcescens is a diverse species with a high level of genomic variation. However, the collection was largely composed of a limited number of clones that emerged from this diverse background within the past few decades. We identified potential recent transmissions of these clones, within and between hospitals, and showed that they have acquired antimicrobial resistance determinants for different beta-lactams, ciprofloxacin, and tetracyclines on multiple occasions. The expansion of these multidrug-resistant clones suggests that the treatment of S. marcescens infections will become increasingly difficult in the future.

    Funded by: Medical Research Council: G1100100; Wellcome Trust: 098600

    Genome research 2016;26;8;1101-9

  • dfrA thyA Double Deletion in para-Aminosalicylic Acid-Resistant Mycobacterium tuberculosis Beijing Strains.

    Moradigaravand D, Grandjean L, Martinez E, Li H, Zheng J, Coronel J, Moore D, Török ME, Sintchenko V, Huang H, Javid B, Parkhill J, Peacock SJ and Köser CU

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Funded by: Wellcome Trust: 098600

    Antimicrobial agents and chemotherapy 2016;60;6;3864-7

  • The dissemination of multidrug-resistant Enterobacter cloacae throughout the UK and Ireland.

    Moradigaravand D, Reuter S, Martin V, Peacock SJ and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Enterobacter cloacae is a clinically important Gram-negative member of the Enterobacteriaceae, which has increasingly been recognized as a major pathogen in nosocomial infections. Despite this, knowledge about the population structure and the distribution of virulence factors and antibiotic-resistance determinants of this species is scarce. In this study, we analysed a systematic collection of multidrug-resistant E. cloacae isolated between 2001 and 2011 from bloodstream infections across hospitals in the UK and Ireland. We found that the population is characterized by the presence of multiple clones formed at widely different time periods in the past. The clones exhibit a high degree of geographical heterogeneity, which indicates extensive dissemination of these E. cloacae clones across the UK and Ireland. These findings suggest that a diverse, community-based, commensal population underlies multidrug-resistant E. cloacae infections within hospitals.

    Funded by: Wellcome Trust: 098600

    Nature microbiology 2016;1;16173

  • Large-scale production of megakaryocytes from human pluripotent stem cells by chemically defined forward programming.

    Moreau T, Evans AL, Vasquez L, Tijssen MR, Yan Y, Trotter MW, Howard D, Colzani M, Arumugam M, Wu WH, Dalby A, Lampela R, Bouet G, Hobbs CM, Pask DC, Payne H, Ponomaryov T, Brill A, Soranzo N, Ouwehand WH, Pedersen RA and Ghevaert C

    Department of Haematology, University of Cambridge and NHS Blood and Transplant, Long Road, Cambridge CB2 0PT, UK.

    The production of megakaryocytes (MKs)--the precursors of blood platelets--from human pluripotent stem cells (hPSCs) offers exciting clinical opportunities for transfusion medicine. Here we describe an original approach for the large-scale generation of MKs in chemically defined conditions using a forward programming strategy relying on the concurrent exogenous expression of three transcription factors: GATA1, FLI1 and TAL1. The forward programmed MKs proliferate and differentiate in culture for several months with MK purity over 90% reaching up to 2 × 10^5 mature MKs per input hPSC. Functional platelets are generated throughout the culture allowing the prospective collection of several transfusion units from as few as 1 million starting hPSCs. The high cell purity and yield achieved by MK forward programming, combined with efficient cryopreservation and good manufacturing practice (GMP)-compatible culture, make this approach eminently suitable to both in vitro production of platelets for transfusion and basic research in MK and platelet biology.

    Funded by: British Heart Foundation: FS/09/039, FS/09/039/27788, FS/14/40/30921, RG/09/012/28096; Department of Health: RP-PG-0310-1002; Medical Research Council: MC_PC_12009, MR/L022982/1; Wellcome Trust: WT091310, WT098051

    Nature communications 2016;7;11208

  • Whole Genome Sequence of Two Wild-Derived Mus musculus domesticus Inbred Strains, LEWES/EiJ and ZALENDE/EiJ, with Different Diploid Numbers.

    Morgan AP, Didion JP, Doran AG, Holt JM, McMillan L, Keane TM and de Villena FP

    Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599-7264.

    Wild-derived mouse inbred strains are becoming increasingly popular for complex traits analysis, evolutionary studies, and systems genetics. Here, we report the whole-genome sequencing of two wild-derived mouse inbred strains, LEWES/EiJ and ZALENDE/EiJ, of Mus musculus domesticus origin. These two inbred strains were selected based on their geographic origin, karyotype, and use in ongoing research. We generated 14× and 18× coverage sequence, respectively, and discovered over 1.1 million novel variants, most of which are private to one of these strains. This report expands the number of wild-derived inbred genomes in the Mus genus from six to eight. The sequence variation can be accessed via an online query tool; variant calls (VCF format) and alignments (BAM format) are available for download from a dedicated ftp site. Finally, the sequencing data have also been stored in a lossless, compressed, and indexed format using the multi-string Burrows-Wheeler transform. All data can be used without restriction.

    Funded by: NIAID NIH HHS: U19 AI100625; NIGMS NIH HHS: P50 GM076468, T32 GM067553; NIMH NIH HHS: F30 MH103925

    G3 (Bethesda, Md.) 2016;6;12;4211-4216

  • The Evolutionary Fates of a Large Segmental Duplication in Mouse.

    Morgan AP, Holt JM, McMullan RC, Bell TA, Clayshulte AM, Didion JP, Yadgary L, Thybert D, Odom DT, Flicek P, McMillan L and de Villena FP

    Department of Genetics and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599.

    Gene duplication and loss are major sources of genetic polymorphism in populations, and are important forces shaping the evolution of genome content and organization. We have reconstructed the origin and history of a 127-kbp segmental duplication, R2d, in the house mouse (Mus musculus). R2d contains a single protein-coding gene, Cwc22. De novo assembly of both the ancestral (R2d1) and the derived (R2d2) copies reveals that they have been subject to nonallelic gene conversion events spanning tens of kilobases. R2d2 is also a hotspot for structural variation: its diploid copy number ranges from zero in the mouse reference genome to >80 in wild mice sampled from around the globe. Hemizygosity for high copy-number alleles of R2d2 is associated in cis with meiotic drive; suppression of meiotic crossovers; and copy-number instability, with a mutation rate in excess of 1 per 100 transmissions in some laboratory populations. Our results provide a striking example of allelic diversity generated by duplication and demonstrate the value of de novo assembly in a phylogenetic context for understanding the mutational processes affecting duplicate genes.

    Funded by: NIMH NIH HHS: F30 MH103925

    Genetics 2016;204;1;267-85

  • The topography of mutational processes in breast cancer genomes.

    Morganella S, Alexandrov LB, Glodzik D, Zou X, Davies H, Staaf J, Sieuwerts AM, Brinkman AB, Martin S, Ramakrishna M, Butler A, Kim HY, Borg Å, Sotiriou C, Futreal PA, Campbell PJ, Span PN, Van Laere S, Lakhani SR, Eyfjord JE, Thompson AM, Stunnenberg HG, van de Vijver MJ, Martens JW, Børresen-Dale AL, Richardson AL, Kong G, Thomas G, Sale J, Rada C, Stratton MR, Birney E and Nik-Zainal S

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridgeshire CB10 1SD, UK.

    Somatic mutations in human cancers show unevenness in genomic distribution that correlates with aspects of genome structure and function. These mutations are, however, generated by multiple mutational processes operating through the cellular lineage between the fertilized egg and the cancer cell, each composed of specific DNA damage and repair components and leaving its own characteristic mutational signature on the genome. Using somatic mutation catalogues from 560 breast cancer whole-genome sequences, here we show that each of 12 base substitution, 2 insertion/deletion (indel) and 6 rearrangement mutational signatures present in breast tissue exhibits distinct relationships with genomic features relating to transcription, DNA replication and chromatin organization. This signature-based approach permits visualization of the genomic distribution of mutational processes associated with APOBEC enzymes, mismatch repair deficiency and homologous recombinational repair deficiency, as well as mutational processes of unknown aetiology. Furthermore, it highlights mechanistic insights including a putative replication-dependent mechanism of APOBEC-related mutagenesis.

    Funded by: Medical Research Council: MC_U105178805, MC_U105178808; NCI NIH HHS: NIH/NCI 5 P50 CA168504-02, P50 CA168504; Wellcome Trust: 077012/Z/05/Z, 101126/B/13/Z, WT100183MA

    Nature communications 2016;7;11383

  • Filling in the Gap of Human Chromosome 4: Single Molecule Real Time Sequencing of Macrosatellite Repeats in the Facioscapulohumeral Muscular Dystrophy Locus.

    Morioka MS, Kitazume M, Osaki K, Wood J and Tanaka Y

    Dept. of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyoku, Tokyo, 113-8510, Japan.

    The majority of facioscapulohumeral muscular dystrophy (FSHD) cases are caused by contraction of macrosatellite repeats called D4Z4 that are located in the subtelomeric region of human chromosome 4q35. Sequencing the FSHD locus has been technically challenging due to its large size and the nearly identical nature of its repeat elements. Here we report sequencing and partial assembly of a BAC clone carrying an entire FSHD locus by a single molecule real time (SMRT) sequencing technology, which could produce long reads of up to about 18 kb containing D4Z4 repeats. De novo assembly by Hierarchical Genome Assembly Process 1 (HGAP.1) yielded a contig of 41 kb containing all but a part of the most distal D4Z4 element. The validity of the sequence model was confirmed by an independent approach employing anchored multiple sequence alignment by Kalign using reads containing unique flanking sequences. Our data will provide a basis for further optimization of sequencing and assembly conditions of D4Z4.

    PloS one 2016;11;3;e0151963

  • Genetic identification of thiosulfate sulfurtransferase as an adipocyte-expressed antidiabetic target in mice selected for leanness.

    Morton NM, Beltram J, Carter RN, Michailidou Z, Gorjanc G, McFadden C, Barrios-Llerena ME, Rodriguez-Cuenca S, Gibbins MT, Aird RE, Moreno-Navarrete JM, Munger SC, Svenson KL, Gastaldello A, Ramage L, Naredo G, Zeyda M, Wang ZV, Howie AF, Saari A, Sipilä P, Stulnig TM, Gudnason V, Kenyon CJ, Seckl JR, Walker BR, Webster SP, Dunbar DR, Churchill GA, Vidal-Puig A, Fernandez-Real JM, Emilsson V and Horvat S

    University-British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen's Medical Research Institute, Edinburgh, UK.

    The discovery of genetic mechanisms for resistance to obesity and diabetes may illuminate new therapeutic strategies for the treatment of this global health challenge. We used the polygenic 'lean' mouse model, which has been selected for low adiposity over 60 generations, to identify mitochondrial thiosulfate sulfurtransferase (Tst; also known as rhodanese) as a candidate obesity-resistance gene with selectively increased expression in adipocytes. Elevated adipose Tst expression correlated with indices of metabolic health across diverse mouse strains. Transgenic overexpression of Tst in adipocytes protected mice from diet-induced obesity and insulin-resistant diabetes. Tst-deficient mice showed markedly exacerbated diabetes, whereas pharmacological activation of TST ameliorated diabetes in mice. Mechanistically, TST selectively augmented mitochondrial function combined with degradation of reactive oxygen species and sulfide. In humans, TST mRNA expression in adipose tissue correlated positively with insulin sensitivity in adipose tissue and negatively with fat mass. Thus, the genetic identification of Tst as a beneficial regulator of adipocyte mitochondrial function may have therapeutic significance for individuals with type 2 diabetes.

    Nature medicine 2016

  • Infection Susceptibility in Gastric Intrinsic Factor (Vitamin B12)-Defective Mice Is Subject to Maternal Influences.

    Mottram L, Speak AO, Selek RM, Cambridge EL, McIntyre Z, Kane L, Mukhopadhyay S, Grove C, Colin A, Brandt C, Duque-Correa MA, Forbester J, Nguyen TA, Hale C, Vasilliou GS, Arends MJ, Wren BW, Dougan G and Clare S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Unlabelled: Mice harboring a mutation in the gene encoding gastric intrinsic factor (Gif), a protein essential for the absorption of vitamin B12/cobalamin (Cbl), have potential as a model to explore the role of vitamins in infection. The levels of Cbl in the blood of Gif(tm1a/tm1a) mutant mice were influenced by the maternal genotype, with offspring born to heterozygous (high Cbl, F1) mothers exhibiting a significantly higher serum Cbl level than those born to homozygous (low Cbl, F2) equivalents. Low Cbl levels correlated with susceptibility to an infectious challenge with Salmonella enterica serovar Typhimurium or Citrobacter rodentium, and this susceptibility phenotype was moderated by Cbl administration. Transcriptional and metabolic profiling revealed that Cbl-deficient mice exhibited a bioenergetic shift similar to the Warburg effect, a metabolic phenomenon commonly found in cancerous cells under hypoxic conditions, with this metabolic effect being exacerbated further by infection. Our findings demonstrate a role for Cbl in bacterial infection, with potential general relevance to dietary deficiency and infection susceptibility.

    Importance: Malnutrition continues to be a major public health problem in countries with weak infrastructures. In communities with a high prevalence of poor diet, malnourishment and infectious disease can impact vulnerable individuals such as pregnant women and children. Here, we describe a highly flexible murine model for monitoring maternal and environmental influences of vitamin B12 metabolism. We also demonstrate the potential importance of vitamin B12 in controlling susceptibility to bacterial pathogens such as C. rodentium and S. Typhimurium. We postulate that this model, along with similarly vitamin-deficient mice, could be used to further explore the mechanisms associated with micronutrients and susceptibility to diseases, thereby increasing our understanding of disease in the malnourished.

    mBio 2016;7;3

  • The state of play in higher eukaryote gene annotation.

    Mudge JM and Harrow J

    Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challe