Sanger Institute - Publications 2017

Number of papers published in 2017: 680

  • Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution.

    Abbosh C, Birkbak NJ, Wilson GA, Jamal-Hanjani M, Constantin T, Salari R, Le Quesne J, Moore DA, Veeriah S, Rosenthal R, Marafioti T, Kirkizlar E, Watkins TBK, McGranahan N, Ward S, Martinson L, Riley J, Fraioli F, Al Bakir M, Grönroos E, Zambrana F, Endozo R, Bi WL, Fennessy FM, Sponer N, Johnson D, Laycock J, Shafi S, Czyzewska-Khan J, Rowan A, Chambers T, Matthews N, Turajlic S, Hiley C, Lee SM, Forster MD, Ahmad T, Falzon M, Borg E, Lawrence D, Hayward M, Kolvekar S, Panagiotopoulos N, Janes SM, Thakrar R, Ahmed A, Blackhall F, Summers Y, Hafez D, Naik A, Ganguly A, Kareht S, Shah R, Joseph L, Marie Quinn A, Crosbie PA, Naidu B, Middleton G, Langman G, Trotter S, Nicolson M, Remmen H, Kerr K, Chetty M, Gomersall L, Fennell DA, Nakas A, Rathinam S, Anand G, Khan S, Russell P, Ezhil V, Ismail B, Irvin-Sellers M, Prakash V, Lester JF, Kornaszewska M, Attanoos R, Adams H, Davies H, Oukrif D, Akarca AU, Hartley JA, Lowe HL, Lock S, Iles N, Bell H, Ngai Y, Elgar G, Szallasi Z, Schwarz RF, Herrero J, Stewart A, Quezada SA, Peggs KS, Van Loo P, Dive C, Lin CJ, Rabinowitz M, Aerts HJWL, Hackshaw A, Shaw JA, Zimmermann BG, TRACERx consortium, PEACE consortium and Swanton C

    Cancer Research UK Lung Cancer Centre of Excellence London and Manchester, University College London Cancer Institute, Paul O'Gorman Building, 72 Huntley Street, London WC1E 6DD, UK.

    The early detection of relapse following primary surgery for non-small-cell lung cancer and the characterization of emerging subclones, which seed metastatic sites, might offer new therapeutic approaches for limiting tumour recurrence. The ability to track the evolutionary dynamics of early-stage lung cancer non-invasively in circulating tumour DNA (ctDNA) has not yet been demonstrated. Here we use a tumour-specific phylogenetic approach to profile the ctDNA of the first 100 TRACERx (Tracking Non-Small-Cell Lung Cancer Evolution Through Therapy (Rx)) study participants, including one patient who was also recruited to the PEACE (Posthumous Evaluation of Advanced Cancer Environment) post-mortem study. We identify independent predictors of ctDNA release and analyse the tumour-volume detection limit. Through blinded profiling of postoperative plasma, we observe evidence of adjuvant chemotherapy resistance and identify patients who are very likely to experience recurrence of their lung cancer. Finally, we show that phylogenetic ctDNA profiling tracks the subclonal nature of lung cancer relapse and metastasis, providing a new approach for ctDNA-driven therapeutic studies.

    Funded by: Medical Research Council: G108/596, MC_UP_1203/1; NCATS NIH HHS: UL1 TR001863; Wellcome Trust: FC001169, FC001202

    Nature 2017;545;7655;446-451

  • Proteomic analysis of extracellular vesicles from a Plasmodium falciparum Kenyan clinical isolate defines a core parasite secretome.

    Abdi A, Yu L, Goulding D, Rono MK, Bejon P, Choudhary J and Rayner J

    Pwani University Bioscience Research Centre, Pwani University, Kilifi, Kenya.

    Background: Many pathogens secrete effector molecules to subvert host immune responses, to acquire nutrients, and/or to prepare host cells for invasion. One of the ways that effector molecules are secreted is through extracellular vesicles (EVs) such as exosomes. Recently, the malaria parasite P. falciparum has been shown to produce EVs that can mediate transfer of genetic material between parasites and induce sexual commitment. Characterizing the content of these vesicles may improve our understanding of P. falciparum pathogenesis and virulence.

    Methods: Previous studies of P. falciparum EVs have been limited to long-term adapted laboratory isolates. In this study, we isolated EVs from a Kenyan P. falciparum clinical isolate adapted to in vitro culture for a short period and characterized their protein content by mass spectrometry (data are available via ProteomeXchange, with identifier PXD006925).

    Results: We show that P. falciparum extracellular vesicles (PfEVs) are enriched in proteins found within the exomembrane compartments of infected erythrocytes such as Maurer's clefts (MCs), as well as the secretory endomembrane compartments in the apical end of the merozoites, suggesting that these proteins play a role in parasite-host interactions. Comparison of this novel clinically relevant dataset with previously published datasets helps to define a core secretome present in Plasmodium EVs.

    Conclusions: P. falciparum extracellular vesicles contain virulence-associated parasite proteins. Therefore, analysis of PfEV contents from a range of clinical isolates, together with their functional validation, may improve our understanding of the virulence mechanisms of the parasite and potentially identify targets for interventions or diagnostics.

    Wellcome open research 2017;2;50

  • Rapid identification of genes controlling virulence and immunity in malaria parasites.

    Abkallo HM, Martinelli A, Inoue M, Ramaprasad A, Xangsayarath P, Gitaka J, Tang J, Yahata K, Zoungrana A, Mitaka H, Acharjee A, Datta PP, Hunt P, Carter R, Kaneko O, Mustonen V, Illingworth CJR, Pain A and Culleton R

    Malaria Unit, Department of Pathology, Institute of Tropical Medicine, Nagasaki University, Nagasaki, Japan.

    Identifying the genetic determinants of phenotypes that impact disease severity is of fundamental importance for the design of new interventions against malaria. Here we present a rapid genome-wide approach capable of identifying multiple genetic drivers of medically relevant phenotypes within malaria parasites via a single experiment at single gene or allele resolution. In a proof of principle study, we found that a previously undescribed single nucleotide polymorphism in the binding domain of the erythrocyte binding-like protein (EBL) conferred a dramatic change in red blood cell invasion in mutant rodent malaria parasites (Plasmodium yoelii). In the same experiment, we implicated merozoite surface protein 1 (MSP1) and other polymorphic proteins as the major targets of strain-specific immunity. Using allelic replacement, we provide functional validation of the substitution in the EBL gene controlling the growth rate in the blood stages of the parasites.

    PLoS pathogens 2017;13;7;e1006447

  • Sanger Institute series: uncovering the genetics of cancer: an interview with David Adams.

    Adams DJ

    Wellcome Trust Sanger Institute, Cambridge, UK, CB10 1SA.

    Dr David Adams speaks to Jade Parker, Editor of Oncology Central. Based at the Wellcome Trust Sanger Institute as a senior group leader, David Adams uses DNA sequencing of patients and genetic screens in human cells and mice to identify cancer genes and genetic interactions. The Adams group studies the mechanisms of cancer development, particularly skin cancer (melanoma) and colorectal cancer. They sequence DNA from families with a high incidence of cancer, and from patients' tumors, to understand why some people are at greater risk of tumor development and how cancers evolve. The group also performs functional studies in cultured cells and in mice to understand how factors such as DNA mutations and the immune system influence tumor growth.

    Future oncology (London, England) 2017;13;24;2133-2135

  • Phylogenetic characterisation of circulating, clinical influenza isolates from Bali, Indonesia: preliminary report from the BaliMEI project.

    Adisasmito W, Budayanti SN, Aisyah DN, Gallo Cassarino T, Rudge JW, Watson SJ, Kozlakidis Z, Smith GJD and Coker R

    Universitas Indonesia, Depok, Indonesia.

    Background: Human influenza represents a major public health concern, especially in south-east Asia, where the risk of emergence and spread of novel influenza viruses is particularly high. The BaliMEI study aims to conduct five years of active surveillance and characterisation of influenza viruses in Bali using an extensive network of participating healthcare facilities.

    Methods: Samples were collected during routine diagnostic treatment in healthcare facilities. In addition to standard clinical and molecular methods for influenza typing, next-generation sequencing and subsequent de novo genome assembly were performed to investigate the phylogeny of the collected patient samples.

    Results: The samples collected are characteristic of the seasonally circulating influenza viruses with indications of phylogenetic links to other samples characterised in neighbouring countries during the same time period.

    Conclusions: There were some strong phylogenetic links with sequences from samples collected in geographically proximal regions, with some of the samples from the same time period forming small clusters at the tips of the tree. Overall, this work, the first of its kind performed entirely within Indonesia, supports the view that the seasonal influenza circulating in Bali reflects the strains circulating in geographically neighbouring areas, as would be expected in a busy regional transit centre.

    Funded by: Department of Health [UK]; Wellcome Trust

    BMC infectious diseases 2017;17;1;583

  • Enhanced nasopharyngeal infection and shedding associated with an epidemic lineage of emm3 group A Streptococcus.

    Afshar B, Turner CE, Lamagni TL, Smith KC, Al-Shahib A, Underwood A, Holden MTG, Efstratiou A and Sriskandan S

    Department of Medicine, Imperial College London, London, UK.

    Background: A group A Streptococcus (GAS) lineage of genotype emm3, sequence type 15 (ST15), was associated with a 6-month upsurge in invasive GAS disease in the UK. The epidemic lineage (Lineage C) had lost two typical emm3 prophages, Φ315.1 and Φ315.2, associated with the superantigen ssa, but had gained a different prophage (ΦUK-M3.1) associated with a different superantigen, speC, and a DNase, spd1.

    Methods and results: The presence of speC and spd1 in Lineage C ST15 strains enhanced both in vitro mitogenic and DNase activities relative to non-Lineage C ST15 strains. Invasive disease models in Galleria mellonella and SPEC-sensitive transgenic mice revealed no difference in overall invasiveness of Lineage C ST15 strains compared with non-Lineage C ST15 strains, consistent with clinical and epidemiological analysis. Lineage C strains did, however, markedly prolong murine nasal infection, with enhanced nasal and airborne shedding compared with non-Lineage C strains. Deletion of speC or spd1 in two Lineage C strains identified a possible role for spd1 in airborne shedding from the murine nasopharynx.

    Conclusions: Nasopharyngeal infection and shedding of Lineage C strains was enhanced compared with non-Lineage C strains and this was, in part, mediated by the gain of the DNase spd1 through prophage acquisition.

    Funded by: Medical Research Council: G0800777

    Virulence 2017;8;7;1390-1400

  • Transmission patterns and evolution of respiratory syncytial virus in a community outbreak identified by genomic analysis.

    Agoti CN, Munywoki PK, Phan MVT, Otieno JR, Kamau E, Bett A, Kombe I, Githinji G, Medley GF, Cane PA, Kellam P, Cotten M and Nokes DJ

    Epidemiology and Demography Department, Kenya Medical Research Institute (KEMRI) - Wellcome Trust Research Collaborative Programme, Kilifi, Kenya.

    Detailed information on the source, spread and evolution of respiratory syncytial virus (RSV) during seasonal community outbreaks remains sparse. Molecular analyses of attachment (G) gene sequences from hospitalized cases suggest that multiple genotypes and variants co-circulate during epidemics and that RSV persistence over successive seasons is characterized by replacement and multiple new introductions of variants. No studies have defined the patterns of introduction, spread and evolution of RSV at the local community and household level. We present a whole-genome sequence analysis of 131 RSV group A viruses collected during 6-month household-based RSV infection surveillance in coastal Kenya in 2010, within an area of 12 km². RSV infections were identified by regular symptom-independent screening of all household members twice weekly. Phylogenetic analysis revealed that the RSV A viruses in nine households were closely related to genotype GA2 and fell within a single branch of the global phylogeny. Genomic analysis allowed the detection of household-specific variation in seven households. For comparison, using only G gene analysis, household-specific variation was found in only one of the nine households. Nucleotide changes were observed both intra-host (viruses identified from the same individual in follow-up sampling) and inter-host (viruses identified from different household members), and these, coupled with sampling dates, enabled a partial reconstruction of the within-household transmission chains. The genomic evolutionary rate for the household dataset was estimated as 2.307 × 10⁻³ (95% highest posterior density: 0.935–4.165 × 10⁻³) substitutions/site/year.
    We conclude that (i) at the household level, most RSV infections arise from the introduction of a single virus variant, followed by accumulation of household-specific variation, and (ii) analysis of complete virus genomes is crucial to better understand viral transmission in the community. A key question arising is whether preventing RSV introduction or spread within the household by vaccinating key transmitting household members would reduce onward community-wide transmission.

    Virus evolution 2017;3;1;vex006

  • Embedding gender equality into institutional strategy.

    Ahmed S

    Wellcome Trust Sanger Institute, Human Genetics, Cambridge, Cambridgeshire, UK.

    The SiS (Sex in Science) Programme on the WGC (Wellcome Genome Campus) was established in 2011. Key participants include the Wellcome Trust Sanger Institute, EMBL-EBI (EMBL-European Bioinformatics Institute), Open Targets and Elixir. The key objectives are to catalyse cultural change, develop partnerships, communicate activities and champion our women in science work at a national and international level. In this paper, we highlight some of the many initiatives that have taken place since 2013 to address gender inequality at the highest levels, the challenges we have faced and how we have overcome them, and the future direction of travel.

    Global health, epidemiology and genomics 2017;2;e5

  • Some Synonymous and Nonsynonymous gyrA Mutations in Mycobacterium tuberculosis Lead to Systematic False-Positive Fluoroquinolone Resistance Results with the Hain GenoType MTBDRsl Assays.

    Ajileye A, Alvarez N, Merker M, Walker TM, Akter S, Brown K, Moradigaravand D, Schön T, Andres S, Schleusener V, Omar SV, Coll F, Huang H, Diel R, Ismail N, Parkhill J, de Jong BC, Peto TE, Crook DW, Niemann S, Robledo J, Smith EG, Peacock SJ and Köser CU

    Public Health England West Midlands Public Health Laboratory, Heartlands Hospital, Birmingham, United Kingdom.

    In this study, using the Hain GenoType MTBDRsl assays (versions 1 and 2), we found that some nonsynonymous and synonymous mutations in gyrA in Mycobacterium tuberculosis result in systematic false-resistance results to fluoroquinolones by preventing the binding of wild-type probes. Moreover, such mutations can prevent the binding of mutant probes designed for the identification of specific resistance mutations. Although these mutations are likely rare globally, they occur in approximately 7% of multidrug-resistant tuberculosis strains in some settings.

    Funded by: Department of Health; Wellcome Trust: 201344/Z/16/Z

    Antimicrobial agents and chemotherapy 2017;61;4

  • The Helicase Aquarius/EMB-4 Is Required to Overcome Intronic Barriers to Allow Nuclear RNAi Pathways to Heritably Silence Transcription.

    Akay A, Di Domenico T, Suen KM, Nabih A, Parada GE, Larance M, Medhi R, Berkyurek AC, Zhang X, Wedeles CJ, Rudolph KLM, Engelhardt J, Hemberg M, Ma P, Lamond AI, Claycomb JM and Miska EA

    Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.

    Small RNAs play a crucial role in genome defense against transposable elements and guide Argonaute proteins to nascent RNA transcripts to induce co-transcriptional gene silencing. However, the molecular basis of this process remains unknown. Here, we identify the conserved RNA helicase Aquarius/EMB-4 as a direct and essential link between small RNA pathways and the transcriptional machinery in Caenorhabditis elegans. Aquarius physically interacts with the germline Argonaute HRDE-1. Aquarius is required to initiate small-RNA-induced heritable gene silencing. HRDE-1 and Aquarius silence overlapping sets of genes and transposable elements. Surprisingly, removal of introns from a target gene abolishes the requirement for Aquarius, but not HRDE-1, for small RNA-dependent gene silencing. We conclude that Aquarius allows small RNA pathways to compete for access to nascent transcripts undergoing co-transcriptional splicing in order to detect and silence transposable elements. Thus, Aquarius and HRDE-1 act as gatekeepers coordinating gene expression and genome defense.

    Funded by: CIHR: MOP-274660; Cancer Research UK: C13474/A18583, C6946/A14492; European Research Council: 260688; NIGMS NIH HHS: R01 GM113242, R01 GM122080; Wellcome Trust: 092096/Z/10/Z, 104640/Z/14/Z, 108058/Z/15/Z

    Developmental cell 2017;42;3;241-255.e6

  • Genetic association analysis identifies variants associated with disease progression in primary sclerosing cholangitis.

    Alberts R, de Vries EMG, Goode EC, Jiang X, Sampaziotis F, Rombouts K, Böttcher K, Folseraas T, Weismüller TJ, Mason AL, Wang W, Alexander G, Alvaro D, Bergquist A, Björkström NK, Beuers U, Björnsson E, Boberg KM, Bowlus CL, Bragazzi MC, Carbone M, Chazouillères O, Cheung A, Dalekos G, Eaton J, Eksteen B, Ellinghaus D, Färkkilä M, Festen EAM, Floreani A, Franceschet I, Gotthardt DN, Hirschfield GM, Hoek BV, Holm K, Hohenester S, Hov JR, Imhann F, Invernizzi P, Juran BD, Lenzen H, Lieb W, Liu JZ, Marschall HU, Marzioni M, Melum E, Milkiewicz P, Müller T, Pares A, Rupp C, Rust C, Sandford RN, Schramm C, Schreiber S, Schrumpf E, Silverberg MS, Srivastava B, Sterneck M, Teufel A, Vallier L, Verheij J, Vila AV, Vries B, Zachou K, International PSC Study Group, The UK PSC Consortium, Chapman RW, Manns MP, Pinzani M, Rushbrook SM, Lazaridis KN, Franke A, Anderson CA, Karlsen TH, Ponsioen CY and Weersma RK

    Department of Gastroenterology and Hepatology, University of Groningen and University Medical Centre Groningen, Groningen, The Netherlands.

    Objective: Primary sclerosing cholangitis (PSC) is a genetically complex, inflammatory bile duct disease of largely unknown aetiology often leading to liver transplantation or death. Little is known about the genetic contribution to the severity and progression of PSC. The aim of this study is to identify genetic variants associated with PSC disease progression and development of complications.

    Design: We collected standardised PSC subphenotypes in a large cohort of 3402 patients with PSC. After quality control, we combined 130,422 single nucleotide polymorphisms of all patients (obtained using the Illumina immunochip) with their disease subphenotypes. Using logistic regression and Cox proportional hazards models, we identified genetic variants associated with binary and time-to-event PSC subphenotypes.

    Results: We identified genetic variant rs853974 to be associated with liver transplant-free survival (p=6.07×10⁻⁹). Kaplan-Meier survival analysis showed a 50.9% (95% CI 41.5% to 59.5%) transplant-free survival for homozygous AA allele carriers of rs853974 compared with 72.8% (95% CI 69.6% to 75.7%) for GG carriers at 10 years after PSC diagnosis. For the candidate gene in the region, RSPO3, we demonstrated expression in key liver-resident effector cells, such as human and murine cholangiocytes and human hepatic stellate cells.

    Conclusion: We present a large international PSC cohort, and report genetic loci associated with PSC disease progression. For liver transplant-free survival, we identified a genome-wide significant signal and demonstrated expression of the candidate gene RSPO3 in key liver-resident effector cells. This warrants further assessments of the role of this potential key PSC modifier gene.

    Funded by: Medical Research Council: MC_PC_12009, MR/L016761/1; NIDDK NIH HHS: R01 DK084960; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/N001540/1

    Gut 2017;67;8;1517-1524

  • Antimicrobial resistance in human populations: challenges and opportunities.

    Allcock S, Young EH, Holmes M, Gurdasani D, Dougan G, Sandhu MS, Solomon L and Török ME

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Antimicrobial resistance (AMR) is a global public health threat. Emergence of AMR occurs naturally, but can also be selected for by antimicrobial exposure in clinical and veterinary medicine. Despite growing worldwide attention to AMR, there are substantial limitations in our understanding of the burden, distribution and determinants of AMR at the population level. We highlight the importance of population-based approaches to assess the association between antimicrobial use and AMR in humans and animals. Such approaches are needed to improve our understanding of the development and spread of AMR in order to inform strategies for the prevention, detection and management of AMR, and to support the sustainable use of antimicrobials in healthcare.

    Funded by: Medical Research Council: G1001787, MR/K013491/1; Wellcome Trust

    Global health, epidemiology and genomics 2017;2;e4

  • Adipocyte Accumulation in the Bone Marrow during Obesity and Aging Impairs Stem Cell-Based Hematopoietic and Bone Regeneration.

    Ambrosi TH, Scialdone A, Graja A, Gohlke S, Jank AM, Bocian C, Woelk L, Fan H, Logan DW, Schürmann A, Saraiva LR and Schulz TJ

    German Institute of Human Nutrition Potsdam-Rehbrücke, 14558 Nuthetal, Germany.

    Aging and obesity induce ectopic adipocyte accumulation in bone marrow cavities. This process is thought to impair osteogenic and hematopoietic regeneration. Here we specify the cellular identities of the adipogenic and osteogenic lineages of the bone. While aging impairs the osteogenic lineage, high-fat diet feeding activates expansion of the adipogenic lineage, an effect that is significantly enhanced in aged animals. We further describe a mesenchymal sub-population with stem cell-like characteristics that gives rise to both lineages and, at the same time, acts as a principal component of the hematopoietic niche by promoting competitive repopulation following lethal irradiation. Conversely, bone-resident cells committed to the adipocytic lineage inhibit hematopoiesis and bone healing, potentially by producing excessive amounts of dipeptidyl peptidase-4, a protease that is a target of diabetes therapies. These studies delineate the molecular identity of the bone-resident adipocytic lineage, and they establish its involvement in age-dependent dysfunction of bone and hematopoietic regeneration.

    Cell stem cell 2017;20;6;771-784.e6

  • The OncoArray Consortium: A Network for Understanding the Genetic Architecture of Common Cancers.

    Amos CI, Dennis J, Wang Z, Byun J, Schumacher FR, Gayther SA, Casey G, Hunter DJ, Sellers TA, Gruber SB, Dunning AM, Michailidou K, Fachal L, Doheny K, Spurdle AB, Li Y, Xiao X, Romm J, Pugh E, Coetzee GA, Hazelett DJ, Bojesen SE, Caga-Anan C, Haiman CA, Kamal A, Luccarini C, Tessier D, Vincent D, Bacot F, Van Den Berg DJ, Nelson S, Demetriades S, Goldgar DE, Couch FJ, Forman JL, Giles GG, Conti DV, Bickeböller H, Risch A, Waldenberger M, Brüske-Hohlfeld I, Hicks BD, Ling H, McGuffog L, Lee A, Kuchenbaecker K, Soucy P, Manz J, Cunningham JM, Butterbach K, Kote-Jarai Z, Kraft P, FitzGerald L, Lindström S, Adams M, McKay JD, Phelan CM, Benlloch S, Kelemen LE, Brennan P, Riggan M, O'Mara TA, Shen H, Shi Y, Thompson DJ, Goodman MT, Nielsen SF, Berchuck A, Laboissiere S, Schmit SL, Shelford T, Edlund CK, Taylor JA, Field JK, Park SK, Offit K, Thomassen M, Schmutzler R, Ottini L, Hung RJ, Marchini J, Amin Al Olama A, Peters U, Eeles RA, Seldin MF, Gillanders E, Seminara D, Antoniou AC, Pharoah PD, Chenevix-Trench G, Chanock SJ, Simard J and Easton DF

    Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire.

    Background: Common cancers develop through a multistep process often including inherited susceptibility. Collaboration among multiple institutions, and funding from multiple sources, has allowed the development of an inexpensive genotyping microarray, the OncoArray. The array includes a genome-wide backbone, comprising 230,000 SNPs tagging most common genetic variants, together with dense mapping of known susceptibility regions, rare variants from sequencing experiments, pharmacogenetic markers, and cancer-related traits.

    Methods: The OncoArray can be genotyped using a novel technology developed by Illumina to facilitate efficient genotyping. The consortium developed standard approaches for selecting SNPs for study, for quality control of markers, and for ancestry analysis. The array was genotyped at selected sites and with prespecified replicate samples to permit evaluation of genotyping accuracy among centers and by ethnic background.

    Results: The OncoArray consortium genotyped 447,705 samples. A total of 494,763 SNPs passed quality control steps, with a sample success rate of 97%. Participating sites performed ancestry analysis using a common set of markers and a scoring algorithm based on principal components analysis.

    Conclusions: Results from these analyses will enable researchers to identify new susceptibility loci, perform fine-mapping of new or known loci associated with either single or multiple cancers, assess the degree of overlap in cancer causation and pleiotropic effects of loci that have been identified for disease-specific risk, and jointly model genetic, environmental, and lifestyle-related exposures.

    Impact: Ongoing analyses will shed light on etiology and risk assessment for many types of cancer. Cancer Epidemiol Biomarkers Prev; 26(1); 126-35. ©2016 AACR.

    Funded by: Cancer Research UK: 10118, 10124, 11174; NCI NIH HHS: P30 CA008748, P30 CA014089, P30 CA015083, P30 CA023108, P30 CA138313, P50 CA116201, P50 CA136393, R01 CA081488, R01 CA122443, R01 CA133996, R01 CA136924, R01 CA149429, R01 CA190182, R01 CA192393, R25 CA134286, U01 CA196386, U19 CA148065, U19 CA148107, U19 CA148112, U19 CA148127, U19 CA148537, UM1 CA164920, UM1 CA167551; NIGMS NIH HHS: P20 GM103534; NIH HHS: S10 OD020069; NLM NIH HHS: T32 LM012204; World Health Organization: 001

    Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology 2017;26;1;126-135

  • mRNA processing in mutant zebrafish lines generated by chemical and CRISPR-mediated mutagenesis produces unexpected transcripts that escape nonsense-mediated decay.

    Anderson JL, Mulligan TS, Shen MC, Wang H, Scahill CM, Tan FJ, Du SJ, Busch-Nentwich EM and Farber SA

    Carnegie Institution for Science, Department of Embryology, Baltimore, Maryland, United States of America.

    As model organism-based research shifts from forward to reverse genetics approaches, largely due to the ease of genome editing technology, a low frequency of abnormal phenotypes is being observed in lines with mutations predicted to lead to deleterious effects on the encoded protein. In zebrafish, this low frequency is in part explained by compensation by genes of redundant or similar function, often resulting from the additional round of teleost-specific whole genome duplication within vertebrates. Here we offer additional explanations for the low frequency of mutant phenotypes. We analyzed mRNA processing in seven zebrafish lines with mutations expected to disrupt gene function, generated by CRISPR/Cas9 or ENU mutagenesis methods. Five of the seven lines showed evidence of altered mRNA processing: one through a skipped exon that did not lead to a frame shift, one through nonsense-associated splicing that did not lead to a frame shift, and three through the use of cryptic splice sites. These results highlight the need for a methodical analysis of the mRNA produced in mutant lines before making conclusions or embarking on studies that assume loss of function as a result of a given genomic change. Furthermore, recognition of the types of adaptations that can occur may inform the strategies of mutant generation.

    Funded by: NIDDK NIH HHS: R01 DK093399; NIGMS NIH HHS: R01 GM063904, T32 GM007231

    PLoS genetics 2017;13;11;e1007105

  • One-step generation of conditional and reversible gene knockouts.

    Andersson-Rolf A, Mustata RC, Merenda A, Kim J, Perera S, Grego T, Andrews K, Tremble K, Silva JC, Fink J, Skarnes WC and Koo BK

    Wellcome Trust-Medical Research Council Stem Cell Institute, University of Cambridge, Cambridge, UK.

    Loss-of-function studies are key for investigating gene function, and CRISPR technology has made genome editing widely accessible in model organisms and cells. However, conditional gene inactivation in diploid cells is still difficult to achieve. Here, we present CRISPR-FLIP, a strategy that provides an efficient, rapid and scalable method for biallelic conditional gene knockouts in diploid or aneuploid cells, such as pluripotent stem cells, 3D organoids and cell lines, by co-delivery of CRISPR-Cas9 and a universal conditional intronic cassette.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Nature methods 2017;14;3;287-289

  • DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning.

    Angermueller C, Lee HJ, Reik W and Stegle O

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    Recent technological advances have enabled DNA methylation to be assayed at single-cell resolution. However, current protocols are limited by incomplete CpG coverage and hence methods to predict missing methylation states are critical to enable genome-wide analyses. We report DeepCpG, a computational approach based on deep neural networks to predict methylation states in single cells. We evaluate DeepCpG on single-cell methylation data from five cell types generated using alternative sequencing protocols. DeepCpG yields substantially more accurate predictions than previous methods. Additionally, we show that the model parameters can be interpreted, thereby providing insights into how sequence composition affects methylation variability.

    Funded by: Wellcome Trust

    Genome biology 2017;18;1;67

  • Genetic diversity of the African malaria vector Anopheles gambiae.

    Anopheles gambiae 1000 Genomes Consortium, Data analysis group, Partner working group, Sample collections (Angola, Burkina Faso, Cameroon, Gabon, Guinea, Guinea-Bissau, Kenya, Uganda), Crosses, Sequencing and data production, Web application development and Project coordination

    The sustainability of malaria control in Africa is threatened by the rise of insecticide resistance in Anopheles mosquitoes, which transmit the disease. To gain a deeper understanding of how mosquito populations are evolving, here we sequenced the genomes of 765 specimens of Anopheles gambiae and Anopheles coluzzii sampled from 15 locations across Africa, and identified over 50 million single nucleotide polymorphisms within the accessible genome. These data revealed complex population structure and patterns of gene flow, with evidence of ancient expansions, recent bottlenecks, and local variation in effective population size. Strong signals of recent selection were observed in insecticide-resistance genes, with several sweeps spreading over large geographical distances and between species. The design of new tools for mosquito control using gene-drive systems will need to take account of high levels of genetic diversity in natural mosquito populations.

    Funded by: Medical Research Council: G0600718, G1002624, G1100339, MR/M006212/1; NIAID NIH HHS: R01 AI082734, U19 AI089674; NIGMS NIH HHS: R01 GM117241; Wellcome Trust: 090532/Z/09/Z, 090770/Z/09/Z, 098051

    Nature 2017;552;7683;96-100

  • Molecular markers for artemisinin and partner drug resistance in natural Plasmodium falciparum populations following increased insecticide treated net coverage along the slope of mount Cameroon: cross-sectional study.

    Apinjoh TO, Mugri RN, Miotto O, Chi HF, Tata RB, Anchang-Kimbi JK, Fon EM, Tangoh DA, Nyingchu RV, Jacob C, Amato R, Djimde A, Kwiatkowski D, Achidi EA and Amambua-Ngwa A

    Department of Biochemistry and Molecular Biology, University of Buea, Buea, Cameroon.

    Background: Drug resistance is one of the greatest challenges facing malaria control programmes, and monitoring parasite resistance to artemisinins or to Artemisinin Combination Therapy (ACT) partner drugs is critical to elimination efforts. Markers of resistance to a wide panel of antimalarials were assessed in natural parasite populations from southwestern Cameroon.

    Methods: Individuals with asymptomatic parasitaemia or uncomplicated malaria were enrolled through cross-sectional surveys from May 2013 to March 2014 along the slope of Mount Cameroon. Plasmodium falciparum-parasitaemic blood, screened by light microscopy, was depleted of leucocytes using CF11 cellulose columns, and the parasite genotype was ascertained by sequencing on the Illumina HiSeq platform.

    Results: A total of 259 participants from three different altitudes were enrolled in this study. While some alleles associated with drug resistance in pfdhfr, pfmdr1 and pfcrt were highly prevalent, fewer than 3% of all samples carried mutations in the pfkelch13 gene, and none of these were among those associated with slow artemisinin parasite clearance rates in Southeast Asia. The most prevalent haplotypes were the triple mutants Pfdhfr I51R59N108I164 (99%) and Pfcrt C72V73I74E75T76 (47.3%), and the single mutants Pfdhps S436G437K540A581A613 (69%) and Pfmdr1 N86F184D1246 (53.2%).

    Conclusions: The predominance of Pfcrt CVIET and Pfdhfr IRN triple-mutant parasites, together with the absence of pfkelch13 resistance alleles, suggests that the amodiaquine and pyrimethamine components of artesunate-amodiaquine (AS-AQ) and sulfadoxine-pyrimethamine (SP) may no longer be effective, and that chloroquine resistance still persists in southwestern Cameroon.

    Funded by: Medical Research Council: MC_EX_MR/K02440X/1, MR/M006212/1; Wellcome Trust

    Infectious diseases of poverty 2017;6;1;136

  • Rare Variant, Gene-Based Association Study of Hereditary Melanoma Using Whole-Exome Sequencing.

    Artomov M, Stratigos AJ, Kim I, Kumar R, Lauss M, Reddy BY, Miao B, Daniela Robles-Espinoza C, Sankar A, Njauw CN, Shannon K, Gragoudas ES, Marie Lane A, Iyer V, Newton-Bishop JA, Timothy Bishop D, Holland EA, Mann GJ, Singh T, Daly MJ and Tsao H

    MGH Analytic and Translational Genetics Unit, MGH and Broad Institute, Boston, MA.

    Background: Extraordinary progress has been made in our understanding of common variants in many diseases, including melanoma. Because the contribution of rare coding variants is not as well characterized, we performed an exome-wide, gene-based association study of familial cutaneous melanoma (CM) and ocular melanoma (OM).

    Methods: Using 11 990 jointly processed individual DNA samples, whole-exome sequencing was performed, followed by large-scale joint variant calling using GATK (Genome Analysis ToolKit). PLINK/SEQ was used for statistical analysis of genetic variation. Four models were used to estimate the association among different types of variants. In vitro functional validation was performed using three human melanoma cell lines in 2D and 3D proliferation assays. In vivo tumor growth was assessed using xenografts of human A375 melanoma cells in nude mice (eight mice per group). All statistical tests were two-sided.

    Results: Strong signals were detected for CDKN2A (Pmin = 6.16 × 10^-8) in the CM cohort (n = 273) and BAP1 (Pmin = 3.83 × 10^-6) in the OM cohort (n = 99). Eleven genes that exhibited borderline association (P < 10^-4) were independently validated using The Cancer Genome Atlas melanoma cohort (379 CM, 47 OM) and a matched set of 3563 European controls, with CDKN2A (P = .009), BAP1 (P = .03), and EBF3 (P = 4.75 × 10^-4), a candidate risk locus, all showing evidence of replication. EBF3 was then evaluated using germline data from a set of 132 familial melanoma cases and 4769 controls of UK origin (joint P = 1.37 × 10^-5). Somatically, loss of EBF3 expression correlated with progression, poorer outcome, and high-MITF tumors. Functionally, induction of EBF3 in melanoma cells reduced cell growth in vitro, retarded tumor formation in vivo, and reduced MITF levels.

    Conclusions: The results of this large rare variant germline association study further define the mutational landscape of hereditary melanoma and implicate EBF3 as a possible CM predisposition gene.

    Funded by: NCI NIH HHS: K24 CA149202

    Journal of the National Cancer Institute 2017;109;12

  • Evidence for large-scale gene-by-smoking interaction effects on pulmonary function.

    Aschard H, Tobin MD, Hancock DB, Skurnik D, Sood A, James A, Vernon Smith A, Manichaikul AW, Campbell A, Prins BP, Hayward C, Loth DW, Porteous DJ, Strachan DP, Zeggini E, O'Connor GT, Brusselle GG, Boezen HM, Schulz H, Deary IJ, Hall IP, Rudan I, Kaprio J, Wilson JF, Wilk JB, Huffman JE, Hua Zhao J, de Jong K, Lyytikäinen LP, Wain LV, Jarvelin MR, Kähönen M, Fornage M, Polasek O, Cassano PA, Barr RG, Rawal R, Harris SE, Gharib SA, Enroth S, Heckbert SR, Lehtimäki T, Gyllensten U, Understanding Society Scientific Group, Jackson VE, Gudnason V, Tang W, Dupuis J, Soler Artigas M, Joshi AD, London SJ and Kraft P

    Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA.

    Background: Smoking is the strongest environmental risk factor for reduced pulmonary function. The genetic component of various pulmonary traits has also been demonstrated, and at least 26 loci have been reproducibly associated with either FEV1 (forced expiratory volume in 1 second) or FEV1/FVC (FEV1/forced vital capacity). Although the main effects of smoking and genetic loci are well established, the question of potential gene-by-smoking interaction effects remains unanswered. The aim of the present study was to assess, using a genetic risk score approach, whether the effect of these 26 loci on pulmonary function is influenced by smoking.

    Methods: We evaluated the interaction between smoking exposure, considered as either ever vs never or pack-years, and a genetic risk score based on 26 single nucleotide polymorphisms (SNPs) in relation to FEV1 or FEV1/FVC in 50 047 participants of European ancestry from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) and SpiroMeta consortia.

    Results: We identified an interaction (βint = -0.036, 95% confidence interval, -0.040 to -0.032, P = 0.00057) between an unweighted 26-SNP genetic risk score and smoking status (ever/never) on the FEV1/FVC ratio. In interpreting this interaction, we showed that the genetic risk of falling below the FEV1/FVC threshold used to diagnose chronic obstructive pulmonary disease is higher among ever smokers than among never smokers. A replication analysis in two independent datasets, although not statistically significant, showed a similar trend in the interaction effect.

    Conclusions: This study highlights the benefit of using genetic risk scores for identifying interactions missed when studying individual SNPs and shows, for the first time, that persons with the highest genetic risk for low FEV1/FVC may be more susceptible to the deleterious effects of smoking.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Chief Scientist Office: CZD/16/6/4; Medical Research Council: G0902313, G1000861, MC_PC_U127561128, MR/K026992/1; NHGRI NIH HHS: R21 HG007687; NHLBI NIH HHS: R01 HL077612, R01 HL093081; NIDDK NIH HHS: K01 DK110267

    International journal of epidemiology 2017;46;3;894-904

  • A two-stage inter-rater approach for enrichment testing of variants associated with multiple traits.

    Asimit JL, Payne F, Morris AP, Cordell HJ and Barroso I

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Shared genetic aetiology may explain the co-occurrence of diseases in individuals more often than expected by chance. On identifying associated variants shared between two traits, one objective is to determine whether such overlap may be explained by specific genomic characteristics (eg, functional annotation). In clinical studies, inter-rater agreement approaches assess concordance among expert opinions on the presence/absence of a complex disease for each subject. We adapt a two-stage inter-rater agreement model to the genetic association setting to identify features predictive of overlap variants, while accounting for their marginal trait associations. The resulting corrected overlap and marginal enrichment test (COMET) also assesses enrichment at the individual trait level. Multiple categories may be tested simultaneously and the method is computationally efficient, not requiring permutations to assess significance. In an extensive simulation study, COMET identifies features predictive of enrichment with high power and has well-calibrated type I error. In contrast, testing for overlap with a single-trait enrichment test has inflated type I error. COMET is applied to three glycaemic traits using a set of functional annotation categories as predictors, followed by further analyses that focus on tissue-specific regulatory variants. The results support previous findings that regulatory variants in pancreatic islets are enriched for fasting glucose-associated variants, and give insight into differences/similarities between characteristics of variants associated with glycaemic traits. Also, despite regulatory variants in pancreatic islets being enriched for variants that are marginally associated with fasting glucose and fasting insulin, there is no enrichment of shared variants between the traits.

    Funded by: Medical Research Council: MR/K021486/1; Wellcome Trust: 098017, 098051, 102858

    European journal of human genetics : EJHG 2017;25;3;341-349

  • Single-cell RNA-sequencing uncovers transcriptional states and fate decisions in haematopoiesis.

    Athanasiadis EI, Botthof JG, Andres H, Ferreira L, Lio P and Cvejic A

    Department of Haematology, University of Cambridge, Cambridge, CB2 0XY, UK.

    The success of marker-based approaches for dissecting haematopoiesis in mouse and human relies on the presence of well-defined cell surface markers specific for diverse progenitor populations. An inherent problem with this approach is that the presence of specific cell surface markers does not directly reflect the transcriptional state of a cell. Here, we used a marker-free approach to computationally reconstruct the blood lineage tree in zebrafish and order cells along their differentiation trajectory, based on their global transcriptional differences. Within the population of transcriptionally similar stem and progenitor cells, our analysis reveals considerable cell-to-cell differences in their probability to transition to another committed state. Once a fate decision is executed, the suppression of transcription of ribosomal genes and the upregulation of lineage-specific factors coordinately control lineage differentiation. Evolutionary analysis further demonstrates that this haematopoietic programme is highly conserved between zebrafish and higher vertebrates.

    Funded by: Cancer Research UK: C45041/A14953; Medical Research Council: MC_PC_12009; Wellcome Trust

    Nature communications 2017;8;1;2045

  • EuPathDB: the eukaryotic pathogen genomics database resource.

    Aurrecoechea C, Barreto A, Basenko EY, Brestelli J, Brunk BP, Cade S, Crouch K, Doherty R, Falke D, Fischer S, Gajria B, Harb OS, Heiges M, Hertz-Fowler C, Hu S, Iodice J, Kissinger JC, Lawrence C, Li W, Pinney DF, Pulman JA, Roos DS, Shanmugasundram A, Silva-Franco F, Steinbiss S, Stoeckert CJ, Spruill D, Wang H, Warrenfeltz S and Zheng J

    Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA.

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB has been updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data, while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.

    Funded by: NIAID NIH HHS: HHSN272201400030C; Wellcome Trust: 108443/Z/15/Z, WT085822MA

    Nucleic acids research 2017;45;D1;D581-D591

  • Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia.

    Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium

    Background: Over the past decade, genome-wide association studies (GWAS) have been applied to aid in understanding the biology of traits. The success of this approach is governed by the effect sizes carried by the true risk variants and the corresponding statistical power to detect such effects, given the study design and sample size under investigation. Previous autism spectrum disorder (ASD) GWAS have identified genome-wide significant (GWS) risk loci; however, these studies had low statistical power to identify GWS loci with smaller effect sizes (odds ratio (OR) <1.15).

    Methods: We conducted a large-scale coordinated international collaboration to combine independent genotyping data to improve the statistical power and aid in robust discovery of GWS loci. This study uses genome-wide genotyping data from a discovery sample (7387 ASD cases and 8567 controls) followed by meta-analysis of summary statistics from two replication sets (7783 ASD cases and 11359 controls; and 1369 ASD cases and 137308 controls).

    Results: We observe a GWS locus at 10q24.32 that overlaps several genes, including PITX3, which encodes a transcription factor identified as playing a role in neuronal differentiation, and CUEDC2, previously reported to be associated with social skills in an independent population cohort. We also observe overlap with regions previously implicated in schizophrenia, which was further supported by a strong genetic correlation between these disorders (Rg = 0.23; P = 9 × 10^-6). We further combined these Psychiatric Genomics Consortium (PGC) ASD GWAS data with the recent PGC schizophrenia GWAS to identify additional regions that may be important in a common neurodevelopmental phenotype, and identified 12 novel GWS loci. These include loci previously implicated in ASD, such as FOXP1 at 3p13, ATP2B2 at 3p25.3, and a 'neurodevelopmental hub' on chromosome 8p11.23.

    Conclusions: This study is an important step in the ongoing endeavour to identify the loci that underpin the common variant signal in ASD. In addition to novel GWS loci, we have identified a significant genetic correlation with schizophrenia and association of ASD with several neurodevelopmental-related genes such as EXT1, ASTN2, MACROD2, and HDAC4.

    Funded by: CIHR; Medical Research Council: MR/L010305/1; NCBDD CDC HHS: U01 DD000498, U10 DD000180, U10 DD000181, U10 DD000182, U10 DD000183, U10 DD000184; NIMH NIH HHS: K99 MH101367, R00 MH101367, R01 MH094293, U01 MH094432, U01 MH109514

    Molecular autism 2017;8;21

  • Association of Pneumococcal Protein Antigen Serology With Age and Antigenic Profile of Colonizing Isolates.

    Azarian T, Grant LR, Georgieva M, Hammitt LL, Reid R, Bentley SD, Goldblatt D, Santosham M, Weatherholtz R, Burbidge P, Goklish N, Thompson CM, Hanage WP, O'Brien KL and Lipsitch M

    Center for Communicable Disease Dynamics, Department of Epidemiology, T. H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA.

    Background: Several Streptococcus pneumoniae proteins play a role in pathogenesis and are being investigated as vaccine targets. It is largely unknown whether naturally acquired antibodies reduce the risk of colonization with strains expressing a particular antigenic variant.

    Methods: Serum immunoglobulin G (IgG) titers to 28 pneumococcal protein antigens were measured among 242 individuals aged <6 months to 78 years in Native American communities between 2007 and 2009. Nasopharyngeal swabs were collected ≥30 days after serum collection, and the antigen variant in each pneumococcal isolate was determined using genomic data. We assessed the association between preexisting variant-specific antibody titers and subsequent carriage of pneumococcus expressing a particular antigen variant.

    Results: Antibody titers often increased across pediatric groups before decreasing among adults. Individuals with low titers against group 3 pneumococcal surface protein C (PspC) variants were more likely to be colonized with pneumococci expressing those variants. For other antigens, variant-specific IgG titers do not predict colonization.

    Conclusion: We observed an inverse association between variant-specific antibody concentration and homologous pneumococcal colonization for only 1 protein. Further assessment of antibody repertoires may elucidate the nature of antipneumococcal antibody-mediated mucosal immunity while informing vaccine development.

    Funded by: NIAID NIH HHS: R01 AI048935, R01 AI106786

    The Journal of infectious diseases 2017;215;5;713-722

  • Heterogeneity of the Epstein-Barr Virus (EBV) Major Internal Repeat Reveals Evolutionary Mechanisms of EBV and a Functional Defect in the Prototype EBV Strain B95-8.

    Ba Abdullah MM, Palermo RD, Palser AL, Grayson NE, Kellam P, Correia S, Szymula A and White RE

    Section of Virology, Imperial College Faculty of Medicine, St. Mary's Hospital, Norfolk Place, London, United Kingdom.

    Epstein-Barr virus (EBV) is a ubiquitous pathogen of humans that can cause several types of lymphoma and carcinoma. Like other herpesviruses, EBV has diversified through both coevolution with its host and genetic exchange between virus strains. Sequence analysis of the EBV genome is unusually challenging because of the large number and length of repeat regions within the virus. Here we describe the sequence assembly and analysis of the large internal repeat 1 of EBV (IR1; also known as the BamW repeats) for more than 70 strains. The diversity of the latency protein EBV nuclear antigen leader protein (EBNA-LP) resides predominantly within the exons downstream of IR1. The integrity of the putative BWRF1 open reading frame (ORF) is retained in over 80% of strains, and deletions truncating IR1 always spare BWRF1. Conserved regions include the IR1 latency promoter (Wp), one zone upstream of BWRF1, and two zones within it. IR1 is heterogeneous in 70% of strains, and this heterogeneity arises from sequence exchange between strains as well as from spontaneous mutation, with interstrain recombination being more common in tumor-derived viruses. This genetic exchange often incorporates regions of <1 kb, and allelic gene conversion changes the frequency of small regions within the repeat but not close to the flanks. These observations suggest that IR1 (and, by extension, EBV) diversifies through both recombination and breakpoint repair, while concerted evolution of IR1 is driven by gene conversion of small regions. Finally, the prototype EBV strain B95-8 contains four nonconsensus variants within a single IR1 repeat unit, including a stop codon in the EBNA-LP gene. Repairing IR1 improves EBNA-LP levels and the quality of transformation by the B95-8 bacterial artificial chromosome (BAC).

    Importance: Epstein-Barr virus (EBV) infects the majority of the world population but causes illness in only a small minority of people. Nevertheless, over 1% of cancers worldwide are attributable to EBV. Recent sequencing projects investigating virus diversity, to see whether different strains have different disease impacts, have excluded regions of repeating sequence because they are more technically challenging. Here we analyze the sequence of the largest repeat in EBV (IR1). We first characterized the variations in protein sequences encoded across IR1. In studying variations within the repeat of each strain, we identified a mutation in the main laboratory strain of EBV that impairs virus function, and we suggest that tumor-associated viruses may be more likely to contain DNA mixed from two strains. The patterns of this mixing suggest that sequences can spread between strains (and also within the repeat) by copying sequence from another strain (or repeat unit) to repair DNA damage.

    Funded by: Medical Research Council: MR/L008432/1, MR/N010388/1; Wellcome Trust: 098051

    Journal of virology 2017;91;23

  • Differentiation dynamics of mammary epithelial cells revealed by single-cell RNA sequencing.

    Bach K, Pensa S, Grzelak M, Hadfield J, Adams DJ, Marioni JC and Khaled WT

    Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK.

    Characterising the hierarchy of mammary epithelial cells (MECs) and how they are regulated during adult development is important for understanding how breast cancer arises. Here we report the use of single-cell RNA sequencing to determine the gene expression profile of MECs across four developmental stages: nulliparous, mid-gestation, lactation and post-involution. Our analysis of 23,184 cells identifies 15 clusters, few of which could be fully characterised by a single marker gene. We argue instead that the epithelial cells, especially in the luminal compartment, should be conceptualised as part of a continuous spectrum of differentiation. Furthermore, our data support the existence of a common luminal progenitor cell giving rise to intermediate, restricted alveolar and hormone-sensing progenitors. This luminal progenitor compartment undergoes transcriptional changes in response to a full pregnancy, lactation and involution. In summary, our results provide a global, unbiased view of adult mammary gland development.

    Funded by: Cancer Research UK: C47525/A17348

    Nature communications 2017;8;1;2128

  • A pilot study to understand feasibility and acceptability of stool and cord blood sample collection for a large-scale longitudinal birth cohort.

    Bailey SR, Townsend CL, Dent H, Mallet C, Tsaliki E, Riley EM, Noursadeghi M, Lawley TD, Rodger AJ, Brocklehurst P and Field N

    UCL Institute of Child Health, University College London, London, UK.

    Background: Few data are available to guide biological sample collection around the time of birth for large-scale birth cohorts. We are designing a large UK birth cohort to investigate the role of infection and the developing immune system in determining future health and disease. We undertook a pilot to develop methodology for the main study, gain practical experience of collecting samples, and understand the acceptability of sample collection to women in late pregnancy.

    Methods: Between February and July 2014, we piloted the feasibility and acceptability of collecting maternal stool, baby stool and cord blood samples from participants recruited at prolonged pregnancy and planned pre-labour caesarean section clinics at University College London Hospital. Participating women were asked to complete acceptability questionnaires.

    Results: Overall, 265 women were approached and 171 (65%) participated, with ≥1 sample collected from 113 women or their baby (66%). Women had a mean age of 34 years, were primarily of white ethnicity (130/166, 78%), and half were nulliparous (86/169, 51%). Women undergoing planned pre-labour caesarean section were more likely than those who delivered vaginally to provide ≥1 sample (98% vs 54%), but less likely to provide maternal stool (10% vs 43%). Pre-sample questionnaires were completed by 110/171 women (64%). Most women reported feeling comfortable with samples being collected from their baby (<10% uncomfortable), but were less comfortable about their own stool (19% uncomfortable) or a vaginal swab (24% uncomfortable).

    Conclusions: It is possible to collect a range of biological samples from women around the time of delivery, and this was acceptable for most women. These data inform study design and protocol development for large-scale birth cohorts.

    Funded by: Wellcome Trust: WT101169MA

    BMC pregnancy and childbirth 2017;17;1;439

  • Whole genome sequencing of Shigella sonnei through PulseNet Latin America and Caribbean: advancing global surveillance of foodborne illnesses.

    Baker KS, Campos J, Pichel M, Della Gaspera A, Duarte-Martínez F, Campos-Chacón E, Bolaños-Acuña HM, Guzmán-Verri C, Mather AE, Diaz Velasco S, Zamudio Rojas ML, Forbester JL, Connor TR, Keddy KH, Smith AM, López de Delgado EA, Angiolillo G, Cuaical N, Fernández J, Aguayo C, Morales Aguilar M, Valenzuela C, Morales Medrano AJ, Sirok A, Weiler Gustafson N, Diaz Guevara PL, Montaño LA, Perez E and Thomson NR

    University of Liverpool, Department of Functional and Comparative Genomics, Liverpool, England, United Kingdom; Wellcome Trust Sanger Institute, Pathogen Variation Programme, Hinxton, England, United Kingdom.

    Objectives: Shigella sonnei is a globally important diarrhoeal pathogen tracked through the surveillance network PulseNet Latin America and Caribbean (PNLA&C), which participates in PulseNet International. PNLA&C laboratories use common molecular techniques to track pathogens causing foodborne illness. We aimed to demonstrate the possibility and advantages of transitioning to whole genome sequencing (WGS) for surveillance within existing networks across a continent where S. sonnei is endemic.

    Methods: We applied WGS to representative archive isolates of S. sonnei (n = 323) from laboratories in nine PNLA&C countries to generate a regional phylogenomic reference for S. sonnei and place it in the global context. We used this reference to contextualise 16 S. sonnei isolates from three Argentinian outbreaks, using locally generated sequence data. Assembled genome sequences were used to predict antimicrobial resistance (AMR) phenotypes and identify AMR determinants.

    Results: S. sonnei isolates clustered in five Latin American sublineages in the global phylogeny, with many (46%, 149 of 323) belonging to previously undescribed sublineages. Predicted multidrug resistance was common (77%, 249 of 323), and clinically relevant differences in AMR were found among sublineages. The regional overview showed that Argentinian outbreak isolates belonged to distinct sublineages and had different epidemiologic origins.

    Conclusions: Latin America contains novel genetic diversity of S. sonnei that is relevant on a global scale and commonly exhibits multidrug resistance. Retrospective passive surveillance with WGS has utility for informing treatment, identifying regionally epidemic sublineages and providing a framework for interpretation of prospective, locally sequenced outbreaks.

    Funded by: NCI NIH HHS: U01 CA207167; World Health Organization: 001

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2017;23;11;845-853

  • Genetic differentiation between upland and lowland populations shapes the Y-chromosomal landscape of West Asia.

    Balanovsky O, Chukhryaeva M, Zaporozhchenko V, Urasin V, Zhabagin M, Hovhannisyan A, Agdzhoyan A, Dibirova K, Kuznetsova M, Koshel S, Pocheshkhova E, Alborova I, Skhalyakho R, Utevska O, Genographic Consortium, Mustafin K, Yepiskoposyan L, Tyler-Smith C and Balanovska E

    Vavilov Institute of General Genetics, Moscow, Russia.

    Y-chromosomal variation in West Asian populations has so far been studied in less detail than in neighboring Europe. Here, we analyzed 598 Y-chromosomes from two West Asian subregions (Transcaucasia and the Armenian plateau) using 40 Y-SNPs and 17 Y-STRs and combined them with previously published data from the region. The West Asian populations fell into two clusters: upland populations from the Anatolian, Armenian and Iranian plateaus, and lowland populations from the Levant, Mesopotamia and the Arabian Peninsula. This geographic subdivision corresponds with the linguistic difference between Indo-European and Turkic speakers, on the one hand, and Semitic speakers, on the other. This subdivision could be traced back to the Neolithic epoch, when upland populations from the Anatolian and Iranian plateaus carried similar haplogroup spectra but did not overlap with lowland populations from the Levant. We also found that the initial gene pool of the Armenian motherland population has been well preserved in most groups of the Armenian Diaspora. In view of the contribution of West Asians to the autosomal gene pool of the steppe Yamnaya archaeological culture, we sequenced a large portion of the Y-chromosome in haplogroup R1b samples from present-day East European steppe populations. The ancient Yamnaya samples are located on the "eastern" R-GG400 branch of haplogroup R1b-L23, showing that the paternal descendants of the Yamnaya still live in the Pontic steppe and that the ancient Yamnaya population was not an important source of paternal lineages in present-day West Europeans.

    Funded by: Wellcome Trust: 098051

    Human genetics 2017;136;4;437-450

  • Phylogeography of human Y-chromosome haplogroup Q3-L275 from an academic/citizen science collaboration.

    Balanovsky O, Gurianov V, Zaporozhchenko V, Balaganskaya O, Urasin V, Zhabagin M, Grugni V, Canada R, Al-Zahery N, Raveane A, Wen SQ, Yan S, Wang X, Zalloua P, Marafi A, Koshel S, Semino O, Tyler-Smith C and Balanovska E

    Vavilov Institute of General Genetics, Moscow, Russia.

    Background: The Y-chromosome haplogroup Q has three major branches: Q1, Q2, and Q3. Q1 is found in both Asia and the Americas where it accounts for about 90% of indigenous Native American Y-chromosomes; Q2 is found in North and Central Asia; but little is known about the third branch, Q3, also named Q1b-L275. Here, we combined the efforts of population geneticists and genetic genealogists to use the potential of full Y-chromosome sequencing for reconstructing haplogroup Q3 phylogeography and suggest possible linkages to events in population history.

    Results: We analyzed 47 fully sequenced Y-chromosomes and reconstructed the haplogroup Q3 phylogenetic tree in detail. Haplogroup Q3-L275, derived from the oldest known split within Eurasian/American haplogroup Q, most likely occurred in West or Central Asia in the Upper Paleolithic period. During the Mesolithic and Neolithic epochs, Q3 remained a minor component of the West Asian Y-chromosome pool and gave rise to five branches (Q3a to Q3e), which spread across West, Central and parts of South Asia. Around 3-4 millennia ago (Bronze Age), the Q3a branch underwent a rapid expansion, splitting into seven branches, some of which entered Europe. One of these branches, Q3a1, was acquired by a population ancestral to Ashkenazi Jews and grew within this population during the 1st millennium AD, reaching up to 5% in present-day Ashkenazi Jews.

    Conclusions: This study dataset was generated by a massive Y-chromosome genotyping effort in the genetic genealogy community, and phylogeographic patterns were revealed by a collaboration of population geneticists and genetic genealogists. This positive experience of collaboration between academic and citizen science provides a model for further joint projects. Merging data and skills of academic and citizen science promises to combine, respectively, quality and quantity, generalization and specialization, and achieve a well-balanced and careful interpretation of the paternal-side history of human populations.

    Funded by: Wellcome Trust: 098051

    BMC evolutionary biology 2017;17;Suppl 1;18

  • Compound heterozygous variants in NBAS as a cause of atypical osteogenesis imperfecta.

    Balasubramanian M, Hurst J, Brown S, Bishop NJ, Arundel P, DeVile C, Pollitt RC, Crooks L, Longman D, Caceres JF, Shackley F, Connolly S, Payne JH, Offiah AC, Hughes D, DDD Study, Parker MJ, Hide W and Skerry TM

    Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, UK; Highly Specialised Service for Severe, Complex and Atypical OI, UK.

    Background: Osteogenesis imperfecta (OI), the commonest inherited bone fragility disorder, affects 1 in 15,000 live births, resulting in frequent fractures and reduced mobility, with significant impact on quality of life. Early diagnosis is important, as therapeutic advances can lead to improved clinical outcome and patient benefit.

    Report: Whole exome sequencing in patients with OI identified, in two patients with a multi-system phenotype, compound heterozygous variants in NBAS (neuroblastoma amplified sequence). Patient 1: NBAS c.5741G>A p.(Arg1914His); c.3010C>T p.(Arg1004*) in a 10-year-old boy with significant short stature, bone fragility requiring treatment with bisphosphonates, developmental delay and immunodeficiency. Patient 2: NBAS c.5741G>A p.(Arg1914His); c.2032C>T p.(Gln678*) in a 5-year-old boy with similar presenting features, bone fragility, mild developmental delay, abnormal liver function tests and immunodeficiency.

    Discussion: Homozygous missense NBAS variants cause SOPH syndrome (short stature; optic atrophy; Pelger-Huet anomaly); in our patients, the same missense variant was found on one allele and a nonsense variant on the other. Recent literature suggests a multi-system phenotype. In this study, patient fibroblasts showed reduced collagen expression compared with control cells, and RNA-seq studies in bone cells show that NBAS is expressed in osteoblasts and osteocytes of rodents and primates. These findings provide proof-of-concept that NBAS mutations have mechanistic effects in bone, and that NBAS variants are a novel cause of bone fragility, which is distinguishable from 'Classical' OI.

    Conclusions: Here we report on variants in NBAS as a cause of bone fragility in humans and expand the phenotypic spectrum associated with NBAS. We explore the mechanism linking NBAS to the striking skeletal phenotype in our patients.

    Funded by: Department of Health UK; Medical Research Council: MC_PC_15018, MC_PC_U127584479; Wellcome Trust: WT098051

    Bone 2017;94;65-74

  • Chitayat syndrome: hyperphalangism, characteristic facies, hallux valgus and bronchomalacia results from a recurrent c.266A>G p.(Tyr89Cys) variant in the ERF gene.

    Balasubramanian M, Lord H, Levesque S, Guturu H, Thuriot F, Sillon G, Wenger AM, Sureka DL, Lester T, Johnson DS, Bowen J, Calhoun AR, Viskochil DH, DDD Study, Bejerano G, Bernstein JA and Chitayat D

    Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK.

    Background: In 1993, Chitayat <i>et al.</i> reported a newborn with hyperphalangism, facial anomalies, and bronchomalacia. We identified three additional families with similar findings. Features include bilateral accessory phalanges resulting in shortened index fingers, hallux valgus, a distinctive face and respiratory compromise.

    Objectives: To identify the genetic aetiology of Chitayat syndrome and identify a unifying cause for this specific form of hyperphalangism.

    Methods: Through ongoing collaboration, we had collected patients with a strikingly similar phenotype. Trio-based exome sequencing was first performed in Patient 2 through the Deciphering Developmental Disorders study. Proband-only exome sequencing had previously been performed independently in Patient 4. Following identification of a candidate gene variant in Patient 2, the same variant was subsequently confirmed from exome data in Patient 4. Sanger sequencing was used to validate this variant in Patients 1 and 3 and to confirm paternal inheritance in Patient 5.

    Results: A recurrent, novel variant NM_006494.2:c.266A>G p.(Tyr89Cys) in <i>ERF</i> was identified in five affected individuals: de novo in patients 1, 2 and 3, and inherited from an affected father in patients 4 and 5. p.Tyr89Cys is an aromatic polar neutral to polar neutral amino acid substitution at a highly conserved position and lies within the functionally important ETS domain of the protein. The recurrent <i>ERF</i> c.266A>G p.(Tyr89Cys) variant causes Chitayat syndrome.

    Discussion: <i>ERF</i> variants have previously been associated with complex craniosynostosis. In contrast, none of the patients with the c.266A>G p.(Tyr89Cys) variant have craniosynostosis.

    Conclusions: We report the molecular aetiology of Chitayat syndrome and discuss potential mechanisms for this distinctive phenotype associated with the p.Tyr89Cys substitution in <i>ERF</i>.

    Funded by: NIMH NIH HHS: U01 MH105949; Wellcome Trust: WT098051

    Journal of medical genetics 2017;54;3;157-165

  • Delineating the phenotypic spectrum of Bainbridge-Ropers syndrome: 12 new patients with de novo, heterozygous, loss-of-function mutations in ASXL3 and review of published literature.

    Balasubramanian M, Willoughby J, Fry AE, Weber A, Firth HV, Deshpande C, Berg JN, Chandler K, Metcalfe KA, Lam W, Pilz DT and Tomkins S

    Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK.

    Background: Bainbridge-Ropers syndrome (BRPS) is a recently described developmental disorder caused by <i>de novo</i> truncating mutations in the additional sex combs like 3 (<i>ASXL3</i>) gene. To date, there have been fewer than 10 reported patients.

    Objectives: Here, we delineate the BRPS phenotype further by describing a series of 12 previously unreported patients identified by the Deciphering Developmental Disorders study.

    Methods: Trio-based exome sequencing was performed on all 12 patients included in this study and identified a <i>de novo</i> truncating mutation in <i>ASXL3</i> in each. Detailed phenotypic information and patient images were collected and summarised as part of this study.

    Results: By obtaining genotype-phenotype data, we have been able to demonstrate a second mutation cluster region within <i>ASXL3</i>. This report expands the phenotype of older patients with BRPS; common emerging features include severe intellectual disability (11/12), poor/absent speech (12/12), autistic traits (9/12), distinct facial features (arched eyebrows, prominent forehead, high-arched palate, hypertelorism and downslanting palpebral fissures) (9/12), hypotonia (11/12) and significant feeding difficulties when young (9/12).

    Discussion: Features shared between previously reported patients and this cohort included distinctive craniofacial features, feeding problems, absent/limited speech and intellectual disability. Shared behavioural phenotypes include autistic traits, hand-flapping, rocking, aggressive behaviour and sleep disturbance.

    Conclusions: This series expands the phenotypic spectrum of this severe disorder and highlights its surprisingly high frequency. With the advent of advanced genomic screening, we are likely to identify more variants in this gene presenting with a variable phenotype, which this study will explore.

    Funded by: Wellcome Trust: WT098051

    Journal of medical genetics 2017;54;8;537-543

  • MiR-211 is essential for adult cone photoreceptor maintenance and visual function.

    Barbato S, Marrocco E, Intartaglia D, Pizzo M, Asteriti S, Naso F, Falanga D, Bhat RS, Meola N, Carissimo A, Karali M, Prosser HM, Cangiano L, Surace EM, Banfi S and Conte I

    Telethon Institute of Genetics and Medicine, Via Campi Flegrei 34, Pozzuoli (Naples), 80078, Italy.

    MicroRNAs (miRNAs) are key post-transcriptional regulators of gene expression that play an important role in the control of fundamental biological processes in both physiological and pathological conditions. Their function in retinal cells is just beginning to be elucidated, and a few have been found to play a role in photoreceptor maintenance and function. MiR-211 is one of the most abundant miRNAs in the developing and adult eye. However, its role in controlling vertebrate visual system development, maintenance and function so far remains largely unexplored. Here, by targeted inactivation in a mouse model, we identify a critical role of miR-211 in cone photoreceptor function and survival. MiR-211 knockout (-/-) mice exhibited a progressive cone dystrophy accompanied by significant alterations in visual function. Transcriptome analysis of the retina from miR-211-/- mice during cone degeneration revealed significant alteration of pathways related to cell metabolism. Collectively, this study highlights for the first time the impact of miR-211 function in the retina and significantly contributes to unravelling the role of specific miRNAs in cone photoreceptor function and survival.

    Funded by: European Research Council: 311682; Wellcome Trust: 098051

    Scientific reports 2017;7;1;17004

  • Promoter-bound METTL3 maintains myeloid leukaemia by m6A-dependent translation control.

    Barbieri I, Tzelepis K, Pandolfini L, Shi J, Millán-Zambrano G, Robson SC, Aspris D, Migliori V, Bannister AJ, Han N, De Braekeleer E, Ponstingl H, Hendrick A, Vakoc CR, Vassiliou GS and Kouzarides T

    The Gurdon Institute and Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK.

    N<sup>6</sup>-methyladenosine (m<sup>6</sup>A) is an abundant internal RNA modification in both coding and non-coding RNAs that is catalysed by the METTL3-METTL14 methyltransferase complex. However, the specific role of these enzymes in cancer is still largely unknown. Here we define a pathway that is specific for METTL3 and is implicated in the maintenance of a leukaemic state. We identify METTL3 as an essential gene for growth of acute myeloid leukaemia cells in two distinct genetic screens. Downregulation of METTL3 results in cell cycle arrest, differentiation of leukaemic cells and failure to establish leukaemia in immunodeficient mice. We show that METTL3, independently of METTL14, associates with chromatin and localizes to the transcriptional start sites of active genes. The vast majority of these genes have the CCAAT-box binding protein CEBPZ present at the transcriptional start site, and this is required for recruitment of METTL3 to chromatin. Promoter-bound METTL3 induces m<sup>6</sup>A modification within the coding region of the associated mRNA transcript, and enhances its translation by relieving ribosome stalling. We show that genes regulated by METTL3 in this way are necessary for acute myeloid leukaemia. Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.

    Funded by: Cancer Research UK: 10827, A17001, A23015; European Research Council: 268569; Medical Research Council: MC_PC_12009; Wellcome Trust: 092096, 095663, 098051, C6946/AI4492, WT095663MA

    Nature 2017;552;7683;126-131

  • Evaluation of applicability of DNA microarray-based characterization of bovine Shiga toxin-producing Escherichia coli isolates using whole genome sequence analysis.

    Barth SA, Menge C, Eichhorn I, Semmler T, Pickard D and Geue L

    Friedrich-Loeffler-Institut/Federal Research Institute for Animal Health, Institute of Molecular Pathogenesis, Jena, Germany (Barth, Menge, Geue).

    We assessed the ability of a commercial DNA microarray to characterize bovine Shiga toxin-producing Escherichia coli (STEC) isolates and evaluated the results using in silico hybridization of the microarray probes within whole genome sequencing scaffolds. From a total of 69,954 reactions (393 probes with 178 isolates), 68,706 (98.2%) gave identical results by DNA microarray and in silico probe hybridization. Results were more congruent when detecting the genoserotype (209 differing results from 19,758 in total; 1.1%) or antimicrobial resistance genes (AMRGs; 141 of 26,878; 0.5%) than when detecting virulence-associated genes (VAGs; 876 of 22,072; 4.0%). Owing to the limited coverage of O-antigens by the microarray, only 37.2% of the isolates could be genoserotyped. However, the microarray proved suitable to rapidly screen bovine STEC strains for the occurrence of high numbers of VAGs and AMRGs and is suitable for molecular surveillance workflows.

    Journal of veterinary diagnostic investigation : official publication of the American Association of Veterinary Laboratory Diagnosticians, Inc 2017;29;5;721-724

  • Dynamic variation of CD5 surface expression levels within individual chronic lymphocytic leukemia clones.

    Bashford-Rogers RJ, Palser AL, Hodkinson C, Baxter J, Follows GA, Vassiliou GS and Kellam P

    Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.

    Chronic lymphocytic leukemia (CLL) is characterized by the accumulation of clonally derived mature CD5<sup>high</sup> B cells; however, the cellular origin of CLL is still unknown. Patients with CLL also harbor variable numbers of CD5<sup>low</sup> B cells, but the clonal relationship of these cells to the bulk disease is unknown and can have important implications for monitoring, treating, and understanding the biology of CLL. Here, we use B-cell receptors (BCRs) as molecular barcodes to first show by single-cell BCR sequencing that the great majority of CD5<sup>low</sup> B cells in the blood of CLL patients are clonally related to CD5<sup>high</sup> CLL B cells. We investigate whether CD5 state switching was likely to occur continuously as a common event or as a rare event in CLL by tracking somatic BCR mutations in bulk CLL B cells and using them to reconstruct the phylogenetic relationships and evolutionary history of the CLL in four patients. Using statistical methods, we show that there is no parsimonious route from a single or low number of CD5<sup>low</sup> switch events to the CD5<sup>high</sup> population, but rather, large-scale and/or dynamic switching between these CD5 states is the most likely explanation. The overlapping BCR repertoires between CD5<sup>high</sup> and CD5<sup>low</sup> cells from CLL patient peripheral blood reveal that CLL exists in a continuum of CD5 expression. The majority of CD5<sup>low</sup> B cells in patients are leukemic, thus identifying CD5<sup>low</sup> B cells as an important component of CLL, with implications for CLL pathogenesis, clinical monitoring, and the development of anti-CD5-directed therapies.

    Funded by: Medical Research Council: MC_PC_12009

    Experimental hematology 2017;46;31-37.e10

  • Accurate characterization of the IFITM locus using MiSeq and PacBio sequencing shows genetic variation in Galliformes.

    Bassano I, Ong SH, Lawless N, Whitehead T, Fife M and Kellam P

    The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: Interferon inducible transmembrane (IFITM) proteins are effectors of the immune system widely characterized for their role in restricting infection by diverse enveloped and non-enveloped viruses. The chicken IFITM (chIFITM) genes are clustered on chromosome 5 and to date four genes have been annotated, namely chIFITM1, chIFITM3, chIFITM5 and chIFITM10. However, due to poor assembly of this locus in the Gallus gallus v4 genome, accurate characterization has so far proven problematic. Recently, a new chicken reference genome assembly, Gallus gallus v5, was generated using Sanger, 454, Illumina and PacBio sequencing technologies, identifying considerable differences in the chIFITM locus compared with previous genome releases.

    Methods: We re-sequenced the locus using both Illumina MiSeq and PacBio RS II sequencing technologies and we mapped RNA-seq data from the European Nucleotide Archive (ENA) to this finalized chIFITM locus. Using SureSelect capture probes designed against the finalized chIFITM locus, we sequenced the locus of a different chicken breed, namely a White Leghorn, and a turkey.

    Results: We confirmed the Gallus gallus v5 consensus except for two insertions of 5 and 1 base pair within the chIFITM3 and B4GALNT4 genes, respectively, and a single base pair deletion within the B4GALNT4 gene. The pull-down revealed a single amino acid substitution of A63V in the CIL domain of IFITM2 compared to Red Junglefowl and 13, 13 and 11 differences between IFITM1, 2 and 3 of chickens and turkeys, respectively. RNA-seq shows chIFITM2 and chIFITM3 expression in numerous tissue types of different chicken breeds and avian cell lines, while the expression of the putative chIFITM1 is limited to the testis, caecum and ileum tissues.

    Conclusions: Locus resequencing using these capture probes and RNA-seq based expression analysis will allow the further characterization of genetic diversity within Galliformes.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/L00397X/2, BB/L00397X/1, BB/L003996/1, BBS/E/I/00007031

    BMC genomics 2017;18;1;419

  • Editing the genome of hiPSC with CRISPR/Cas9: disease models.

    Bassett AR

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The advent of human-induced pluripotent stem cell (hiPSC) technology has provided a unique opportunity to establish cellular models of disease from individual patients, and to study the effects of the underlying genetic aberrations upon multiple different cell types, many of which would not normally be accessible. Combining this with recent advances in genome editing techniques such as the clustered regularly interspaced short palindromic repeat (CRISPR) system has provided an ability to repair putative causative alleles in patient lines, or introduce disease alleles into a healthy "WT" cell line. This has enabled analysis of isogenic cell pairs that differ in a single genetic change, which allows a thorough assessment of the molecular and cellular phenotypes that result from this abnormality. Importantly, this establishes the true causative lesion, which is often impossible to ascertain from human genetic studies alone. These isogenic cell lines can be used not only to understand the cellular consequences of disease mutations, but also to perform high throughput genetic and pharmacological screens to both understand the underlying pathological mechanisms and to develop novel therapeutic agents to prevent or treat such diseases. In the future, optimising and developing such genetic manipulation technologies may facilitate the provision of cellular or molecular gene therapies, to intervene and ultimately cure many debilitating genetic disorders.

    Funded by: Wellcome: Core funding; Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 2017;28;7-8;348-364

  • A Family Based Study of Carbon Monoxide and Nitric Oxide Signalling Genes and Preeclampsia.

    Bauer AE, Avery CL, Shi M, Weinberg CR, Olshan AF, Harmon QE, Luo J, Yang J, Manuck TA, Wu MC, Williams N, McGinnis R, Morgan L, Klungsøyr K, Trogstad L, Magnus P and Engel SM

    Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA.

    Background: Preeclampsia is thought to originate during placentation, with incomplete remodelling and perfusion of the spiral arteries leading to reduced placental vascular capacity. Nitric oxide (NO) and carbon monoxide (CO) are powerful vasodilators that play a role in the placental vascular system. Although family clustering of preeclampsia has been observed, the existing genetic literature is limited by a failure to consider both mother and child.

    Methods: We conducted a nested case-control study within the Norwegian Mother and Child Birth Cohort of 1545 case-pairs and 995 control-pairs from 2540 validated dyads (2011 complete pairs, 529 missing mother or child genotype). We selected 1518 single-nucleotide polymorphisms (SNPs) with minor allele frequency >5% in NO and CO signalling pathways. We used log-linear Poisson regression models and likelihood ratio tests to assess maternal and child effects.

    Results: One SNP met criteria for a false discovery rate Q-value <0.05. The child variant, rs12547243 in adenylate cyclase 8 (ADCY8), was associated with an increased risk (relative risk [RR] 1.42, 95% confidence interval [CI] 1.20, 1.69 for AG vs. GG, RR 2.14, 95% CI 1.47, 3.11 for AA vs. GG, Q = 0.03). The maternal variant, rs30593 in PDE1C was associated with a decreased risk for the subtype of preeclampsia accompanied by early delivery (RR 0.45, 95% CI 0.27, 0.75 for TC vs. CC; Q = 0.02). None of the associations were replicated after correction for multiple testing.

    Conclusions: This study uses a novel approach to disentangle maternal and child genotypic effects of NO and CO signalling genes on preeclampsia.

    Funded by: British Heart Foundation: RG/99006; NICHD NIH HHS: R01 HD058008, T32 HD052468; NIEHS NIH HHS: N01ES75558, P30 ES010126; NINDS NIH HHS: U01 NS047537; Wellcome Trust: 076113, 083948/Z/07/Z, 088841/Z/09/Z

    Paediatric and perinatal epidemiology 2017;32;1;1-12

  • Evolution of complexity in the zebrafish synapse proteome.

    Bayés À, Collins MO, Reig-Viader R, Gou G, Goulding D, Izquierdo A, Choudhary JS, Emes RD and Grant SG

    Molecular Physiology of the Synapse Laboratory, Biomedical Research Institute Sant Pau (IIB Sant Pau), Sant Antoni Maria Claret 167, 08025 Barcelona, Spain.

    The proteome of human brain synapses is highly complex and is mutated in over 130 diseases. This complexity arose from two whole-genome duplications early in the vertebrate lineage. Zebrafish are used in modelling human diseases; however, their synapse proteome is uncharacterized, and whether the teleost-specific genome duplication (TSGD) influenced its complexity is unknown. We report the characterization of the proteomes and ultrastructure of central synapses in zebrafish and analyse the importance of the TSGD. While the TSGD increases overall synapse proteome complexity, the postsynaptic density (PSD) proteome of zebrafish has lower complexity than that of mammals. A highly conserved set of ∼1,000 proteins is shared across vertebrates. PSD ultrastructural features are also conserved. Lineage-specific proteome differences indicate that vertebrate species evolved distinct synapse types and functions. The data sets are a resource for a wide range of studies and have important implications for the use of zebrafish in modelling human synaptic diseases.

    Nature communications 2017;8;14613

  • The evolving craniofacial phenotype of a patient with Sensenbrenner syndrome caused by IFT140 compound heterozygous mutations.

    Bayat A, Kerr B, Douzgou S and DDD Study

    Department of Pediatrics, University Hospital of Hvidovre, Hvidovre, Denmark; Manchester Centre for Genomic Medicine, St Mary's Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre; School of Biological Sciences, Division of Evolution and Genomic Sciences, University of Manchester, Manchester; Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Clinical dysmorphology 2017;26;4;247-251

  • The Promise of Whole Genome Pathogen Sequencing for the Molecular Epidemiology of Emerging Aquaculture Pathogens.

    Bayliss SC, Verner-Jeffreys DW, Bartie KL, Aanensen DM, Sheppard SK, Adams A and Feil EJ

    The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath Bath, UK.

    Aquaculture is the fastest growing food-producing sector, and the sustainability of this industry is critical both for global food security and economic welfare. The management of infectious disease represents a key challenge. Here, we discuss the opportunities afforded by whole genome sequencing of bacterial and viral pathogens of aquaculture to mitigate disease emergence and spread. We outline, by way of comparison, how sequencing technology is transforming the molecular epidemiology of pathogens of public health importance, emphasizing the importance of community-oriented databases and analysis tools.

    Frontiers in microbiology 2017;8;121

  • Genetics: Taking single-cell transcriptomics to the bedside.

    Behjati S and Haniffa M

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; and at the Department of Paediatrics, University of Cambridge, Cambridge, Hills Road, CB2 0QQ, UK.

    Nature reviews. Clinical oncology 2017;14;10;590-592

  • Recurrent mutation of IGF signalling genes and distinct patterns of genomic rearrangement in osteosarcoma.

    Behjati S, Tarpey PS, Haase K, Ye H, Young MD, Alexandrov LB, Farndon SJ, Collord G, Wedge DC, Martincorena I, Cooke SL, Davies H, Mifsud W, Lidgren M, Martin S, Latimer C, Maddison M, Butler AP, Teague JW, Pillay N, Shlien A, McDermott U, Futreal PA, Baumhoer D, Zaikova O, Bjerkehagen B, Myklebost O, Amary MF, Tirabosco R, Van Loo P, Stratton MR, Flanagan AM and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Osteosarcoma is a primary malignancy of bone that affects children and adults. Here, we present the largest sequencing study of osteosarcoma to date, comprising 112 childhood and adult tumours encompassing all major histological subtypes. A key finding of our study is the identification of mutations in insulin-like growth factor (IGF) signalling genes in 8/112 (7%) of cases. We validate this observation using fluorescence in situ hybridization (FISH) in an additional 87 osteosarcomas, with IGF1 receptor (IGF1R) amplification observed in 14% of tumours. These findings may inform patient selection in future trials of IGF1R inhibitors in osteosarcoma. Analysing patterns of mutation, we identify distinct rearrangement profiles including a process characterized by chromothripsis and amplification. This process operates recurrently at discrete genomic regions and generates driver mutations. It may represent an age-independent mutational mechanism that contributes to the development of osteosarcoma in children and adults alike.

    Funded by: Medical Research Council: MR/N005813/1; Wellcome Trust

    Nature communications 2017;8;15936

  • Clinical and molecular consequences of disease-associated de novo mutations in SATB2.

    Bengani H, Handley M, Alvi M, Ibitoye R, Lees M, Lynch SA, Lam W, Fannemel M, Nordgren A, Malmgren H, Kvarnung M, Mehta S, McKee S, Whiteford M, Stewart F, Connell F, Clayton-Smith J, Mansour S, Mohammed S, Fryer A, Morton J, UK10K Consortium, Grozeva D, Asam T, Moore D, Sifrim A, McRae J, Hurles ME, Firth HV, Raymond FL, Kini U, Nellåker C, DDD Study and FitzPatrick DR

    MRC Human Genetics Unit, IGMM, University of Edinburgh, Western General Hospital, Edinburgh, UK.

    Purpose: To characterize features associated with de novo mutations affecting SATB2 function in individuals ascertained on the basis of intellectual disability.

    Methods: Twenty previously unreported individuals with 19 different SATB2 mutations (11 loss-of-function and 8 missense variants) were studied. Fibroblasts were used to measure mutant protein production. Subcellular localization and mobility of wild-type and mutant SATB2 were assessed using fluorescently tagged protein.

    Results: Recurrent clinical features included neurodevelopmental impairment (19/19), absent/near absent speech (16/19), normal somatic growth (17/19), cleft palate (9/19), drooling (12/19), and dental anomalies (8/19). Six of eight missense variants clustered in the first CUT domain. Sibling recurrence due to gonadal mosaicism was seen in one family. A nonsense mutation in the last exon resulted in production of a truncated protein retaining all three DNA-binding domains. SATB2 nuclear mobility was mutation-dependent; p.Arg389Cys in CUT1 increased mobility and both p.Gly515Ser in CUT2 and p.Gln566Lys between CUT2 and HOX reduced mobility. The clinical features in individuals with missense variants were indistinguishable from those with loss of function.

    Conclusion: SATB2 haploinsufficiency is a common cause of syndromic intellectual disability. When mutant SATB2 protein is produced, the protein appears functionally inactive with a disrupted pattern of chromatin or matrix association. Genet Med advance online publication 02 February 2017.

    Funded by: Medical Research Council: MC_PC_U127561093, MR/M014568/1

    Genetics in medicine : official journal of the American College of Medical Genetics 2017;19;8;900-908

  • Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies.

    Benner C, Havulinna AS, Järvelin MR, Salomaa V, Ripatti S and Pirinen M

    Institute for Molecular Medicine Finland, University of Helsinki, 00014 Helsinki, Finland; Department of Public Health, University of Helsinki, 00014 Helsinki, Finland.

    During the past few years, various novel statistical methods have been developed for fine-mapping with the use of summary statistics from genome-wide association studies (GWASs). Although these approaches require information about the linkage disequilibrium (LD) between variants, there has not been a comprehensive evaluation of how estimation of the LD structure from reference genotype panels performs in comparison with that from the original individual-level GWAS data. Using population genotype data from Finland and the UK Biobank, we show here that a reference panel of 1,000 individuals from the target population is adequate for a GWAS cohort of up to 10,000 individuals, whereas smaller panels, such as those from the 1000 Genomes Project, should be avoided. We also show, both theoretically and empirically, that the size of the reference panel needs to scale with the GWAS sample size; this has important consequences for the application of these methods in ongoing GWAS meta-analyses and large biobank studies. We conclude by providing software tools and by recommending practices for sharing LD information to more efficiently exploit summary statistics in genetics research.

    Funded by: Medical Research Council: MC_QA137853

    American journal of human genetics 2017;101;4;539-551

  • Citrobacter rodentium Subverts ATP Flux and Cholesterol Homeostasis in Intestinal Epithelial Cells In Vivo.

    Berger CN, Crepin VF, Roumeliotis TI, Wright JC, Carson D, Pevsner-Fischer M, Furniss RCD, Dougan G, Dori-Bachash M, Yu L, Clements A, Collins JW, Elinav E, Larrouy-Maumus GJ, Choudhary JS and Frankel G

    MRC Centre for Molecular Bacteriology and Infection, Department of Life Sciences, Imperial College London, London, UK.

    The intestinal epithelial cells (IECs) that line the gut form a robust line of defense against ingested pathogens. We investigated the impact of infection with the enteric pathogen Citrobacter rodentium on mouse IEC metabolism using global proteomics and targeted metabolomics and lipidomics. The major signatures of the infection were upregulation of the sugar transporter Sglt4, aerobic glycolysis, and production of phosphocreatine, which mobilizes cytosolic energy. In contrast, biogenesis of mitochondrial cardiolipins, essential for ATP production, was inhibited, which coincided with increased levels of mucosal O2 and a reduction in colon-associated anaerobic commensals. In addition, IECs responded to infection by activating Srebp2 and the cholesterol biosynthetic pathway. Unexpectedly, infected IECs also upregulated the cholesterol efflux proteins AbcA1, AbcG8, and ApoA1, resulting in higher levels of fecal cholesterol and a bloom of Proteobacteria. These results suggest that C. rodentium manipulates host metabolism to evade innate immune responses and establish a favorable gut ecosystem.

    Funded by: Medical Research Council: MR/J006874/1, MR/K019007/1, MR/L01632X/1; Wellcome Trust

    Cell metabolism 2017;26;5;738-752.e6

  • Paleolithic networking.

    Bergström A and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 2017;358;6363;586-587

  • A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea.

    Bergström A, Oppenheimer SJ, Mentzer AJ, Auckland K, Robson K, Attenborough R, Alpers MP, Koki G, Pomat W, Siba P, Xue Y, Sandhu MS and Tyler-Smith C

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    New Guinea shows human occupation since ~50 thousand years ago (ka), independent adoption of plant cultivation ~10 ka, and great cultural and linguistic diversity today. We performed genome-wide single-nucleotide polymorphism genotyping on 381 individuals from 85 language groups in Papua New Guinea and find a sharp divide originating 10 to 20 ka between lowland and highland groups and a lack of non-New Guinean admixture in the latter. All highlanders share ancestry within the last 10 thousand years, with major population growth in the same period, suggesting population structure was reshaped following the Neolithic lifestyle transition. However, genetic differentiation between groups in Papua New Guinea is much stronger than in comparable regions in Eurasia, demonstrating that such a transition does not necessarily limit the genetic and linguistic diversity of human societies.

    Funded by: European Research Council: 294557; Wellcome Trust: 090532, 098051, 106289

    Science (New York, N.Y.) 2017;357;6356;1160-1163

  • Cross-Species Y Chromosome Function Between Malaria Vectors of the Anopheles gambiae Species Complex.

    Bernardini F, Galizi R, Wunderlich M, Taxiarchi C, Kranjc N, Kyrou K, Hammond A, Nolan T, Lawniczak MNK, Papathanos PA, Crisanti A and Windbichler N

    Department of Life Sciences, Imperial College London, South Kensington Campus, SW7 2AZ, United Kingdom.

    Y chromosome function, structure and evolution are poorly understood in many species, including the Anopheles genus of mosquitoes, an emerging model system for studying speciation that also represents the major vectors of malaria. While the Anopheline Y had previously been implicated in male mating behavior, recent data from the Anopheles gambiae complex suggest that, apart from the putative primary sex-determiner, no other genes are conserved on the Y. Studying the functional basis of the evolutionary divergence of the Y chromosome in the gambiae complex is complicated by complete F1 male hybrid sterility. Here, we used an F1 × F0 crossing scheme to overcome a severe bottleneck of male hybrid incompatibilities, which enabled us to experimentally purify a genetically labeled A. gambiae Y chromosome in an A. arabiensis background. Whole genome sequencing (WGS) confirmed that the A. gambiae Y retained its original sequence content in the A. arabiensis genomic background. In contrast to comparable experiments in Drosophila, we find that the presence of a heterospecific Y chromosome has no significant effect on the expression of A. arabiensis genes, and transcriptional differences can be explained almost exclusively as a direct consequence of transcripts arising from sequence elements present on the A. gambiae Y chromosome itself. We find that Y hybrids show no obvious fertility defects and no substantial reduction in male competitiveness. Our results demonstrate that, despite their radically different structure, the Y chromosomes of these two species of the gambiae complex, which diverged an estimated 1.85 million years ago, function interchangeably, indicating that the Y chromosome does not harbor loci contributing to hybrid incompatibility. Therefore, Y chromosome gene flow between members of the gambiae complex is possible even at their current level of divergence. Importantly, this also suggests that malaria control interventions based on sex-distorting Y drive would be transferable, whether intentionally or incidentally, between the major malaria vector species.

    Funded by: European Research Council: 335724; Medical Research Council: G1100339; Wellcome Trust: 098051

    Genetics 2017;207;2;729-740

  • An endosiRNA-Based Repression Mechanism Counteracts Transposon Activation during Global DNA Demethylation in Embryonic Stem Cells.

    Berrens RV, Andrews S, Spensberger D, Santos F, Dean W, Gould P, Sharif J, Olova N, Chandra T, Koseki H, von Meyenn F and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, UK; University of Cambridge, The Old Schools, Trinity Lane, Cambridge CB2 1TN, UK.

    Erasure of DNA methylation and repressive chromatin marks in the mammalian germline leads to risk of transcriptional activation of transposable elements (TEs). Here, we used mouse embryonic stem cells (ESCs) to identify an endosiRNA-based mechanism involved in suppression of TE transcription. In ESCs with DNA demethylation induced by acute deletion of Dnmt1, we saw an increase in sense transcription at TEs, resulting in an abundance of sense/antisense transcripts leading to high levels of ARGONAUTE2 (AGO2)-bound small RNAs. Inhibition of Dicer or Ago2 expression revealed that small RNAs are involved in an immediate response to demethylation-induced transposon activation, while the deposition of repressive histone marks follows as a chronic response. In vivo, we also found TE-specific endosiRNAs present during primordial germ cell development. Our results suggest that antisense TE transcription is a "trap" that elicits an endosiRNA response to restrain acute transposon activity during epigenetic reprogramming in the mammalian germline.

    Funded by: Wellcome Trust

    Cell stem cell 2017;21;5;694-703.e7

  • Cracking Ali Baba's code.

    Billker O

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    A protein called P36 holds the key to how different species of malaria parasite invade liver cells.

    eLife 2017;6

  • Genomic Dissection of an Icelandic Epidemic of Respiratory Disease in Horses and Associated Zoonotic Cases.

    Björnsdóttir S, Harris SR, Svansson V, Gunnarsson E, Sigurðardóttir ÓG, Gammeljord K, Steward KF, Newton JR, Robinson C, Charbonneau ARL, Parkhill J, Holden MTG and Waller AS

    MAST Icelandic Food and Veterinary Authority, Reykjavik, Iceland.

    Iceland is free of the major infectious diseases of horses. However, in 2010 an epidemic of respiratory disease of unknown cause spread through the country's native horse population of 77,000. Microbiological investigations ruled out known viral agents but identified the opportunistic pathogen Streptococcus equi subsp. zooepidemicus (S. zooepidemicus) in diseased animals. We sequenced the genomes of 257 isolates of S. zooepidemicus to differentiate epidemic from endemic strains. We found that although multiple endemic clones of S. zooepidemicus were present, one particular clone, sequence type 209 (ST209), was likely to have been responsible for the epidemic. Concurrent with the epidemic, ST209 was also recovered from a human case of septicemia, highlighting the pathogenic potential of this strain. Epidemiological investigation revealed that the incursion of this strain into one training yard during February 2010 provided a nidus for the infection of multiple horses that then transmitted the strain to farms throughout Iceland. This study represents the first time that whole-genome sequencing has been used to investigate an epidemic on a national scale to identify the likely causative agent and the link to an associated zoonotic infection. Our data highlight the importance of national biosecurity to protect vulnerable populations of animals and also demonstrate the potential impact of S. zooepidemicus transmission to other animals, including humans.

    IMPORTANCE: An epidemic of respiratory disease affected almost the entire native Icelandic horse population of 77,000 animals in 2010, resulting in a self-imposed ban on the export of horses and significant economic costs to associated industries. Although the speed of transmission suggested that a viral pathogen was responsible, only the presence of the opportunistic pathogen Streptococcus zooepidemicus was consistent with the observed clinical signs. We applied genomic sequencing to differentiate epidemic from endemic strains and to shed light on the rapid transmission of the epidemic strain throughout Iceland. We further highlight the ability of epidemic and endemic strains of S. zooepidemicus to infect other animals, including humans. This study represents the first time that whole-genome sequencing has been used to elucidate an outbreak on a national scale and identify the likely causative agent.

    Funded by: Wellcome Trust: 098051

    mBio 2017;8;4

  • Variants of AbGRI3 carrying the armA gene in extensively antibiotic-resistant Acinetobacter baumannii from Singapore.

    Blackwell GA, Holt KE, Bentley SD, Hsu LY and Hall RM

    School of Life and Environmental Sciences, The University of Sydney, NSW 2006, Australia.

    Objectives: To investigate the context of the ribosomal RNA methyltransferase gene armA in carbapenem-resistant global clone 2 (GC2) Acinetobacter baumannii isolates from Singapore.

    Methods: Antibiotic resistance was determined using disc diffusion; PCR was used to identify resistance genes. Whole genome sequences were determined and contigs were assembled and ordered using PCR. Resistance regions in unsequenced isolates were mapped.

    Results: Fifteen GC2 A. baumannii isolates collected at Singapore General Hospital over the period 2004-11 and found to carry the armA gene were resistant to carbapenems, third-generation cephalosporins, fluoroquinolones and most aminoglycosides. In these isolates, the armA gene was located in a third chromosomal resistance island, previously designated AbGRI3. In four isolates, armA was in a 19 kb IS26-bounded transposon, designated Tn6180. In three of them, a 2.7 kb transposon carrying the aphA1b gene, designated Tn6179, was found adjacent to and sharing an IS26 with Tn6180. However, in these four isolates a 3.1 kb segment of the adjacent chromosomal DNA had been inverted by an IS26-mediated event. The remaining 11 isolates all contained a derivative of Tn6180 that had lost part of the central segment, and only one retained Tn6179. The chromosomal inversion was present in four of these, and in seven the deletion extended beyond the inversion into adjacent chromosomal DNA. AbGRI3 forms were found in available GC2 sequences carrying armA.

    Conclusions: In GC2 A. baumannii, the armA gene is located in various forms of a third genomic resistance island named AbGRI3. An aphA1b transposon is variably present in AbGRI3.

    The Journal of antimicrobial chemotherapy 2017;72;4;1031-1039

  • Viral genetic variation accounts for a third of variability in HIV-1 set-point viral load in Europe.

    Blanquart F, Wymant C, Cornelissen M, Gall A, Bakker M, Bezemer D, Hall M, Hillebregt M, Ong SH, Albert J, Bannert N, Fellay J, Fransen K, Gourlay AJ, Grabowski MK, Gunsenheimer-Bartmeyer B, Günthard HF, Kivelä P, Kouyos R, Laeyendecker O, Liitsola K, Meyer L, Porter K, Ristola M, van Sighem A, Vanham G, Berkhout B, Kellam P, Reiss P, Fraser C and BEEHIVE collaboration

    Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom.

    HIV-1 set-point viral load (the approximately stable value of viraemia in the first years of chronic infection) is a strong predictor of clinical outcome and is highly variable across infected individuals. To better understand HIV-1 pathogenesis and the evolution of the viral population, we must quantify the heritability of set-point viral load, which is the fraction of variation in this phenotype attributable to viral genetic variation. However, current estimates of heritability vary widely, from 6% to 59%. Here we used a dataset of 2,028 seroconverters infected between 1985 and 2013 from 5 European countries (Belgium, Switzerland, France, the Netherlands and the United Kingdom) and estimated the heritability of set-point viral load at 31% (CI 15%-43%). Specifically, heritability was measured using models of character evolution describing how viral load evolves on the phylogeny of whole-genome viral sequences. In contrast to previous studies, (i) we measured viral loads using standardized assays on a sample collected in a strict time window of 6 to 24 months after infection, from which the viral genome was also sequenced; (ii) we compared 2 models of character evolution, the classical "Brownian motion" model and another model ("Ornstein-Uhlenbeck") that includes stabilising selection on viral load; (iii) we controlled for covariates, including age and sex, which may inflate estimates of heritability; and (iv) we developed a goodness-of-fit test based on the correlation of viral loads in cherries of the phylogenetic tree, showing that both models of character evolution fit the data well. An overall heritability of 31% (CI 15%-43%) is consistent with other studies based on regression of viral load in donor-recipient pairs. Thus, about a third of variation in HIV-1 virulence is attributable to viral genetic variation.

    Funded by: Medical Research Council: MR/L01632X/1; NIAID NIH HHS: K01 AI125086

    PLoS biology 2017;15;6;e2001855

  • Induction of Cell Cycle and NK Cell Responses by Live-Attenuated Oral Vaccines against Typhoid Fever.

    Blohmke CJ, Hill J, Darton TC, Carvalho-Burger M, Eustace A, Jones C, Schreiber F, Goodier MR, Dougan G, Nakaya HI and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, University of Oxford, NIHR Oxford Biomedical Research Centre, Oxford, United Kingdom.

    The mechanisms by which oral, live-attenuated vaccines protect against typhoid fever are poorly understood. Here, we analyze transcriptional responses after vaccination with Ty21a or the vaccine candidate M01ZH09. Alterations in response profiles were related to vaccine-induced immune responses and subsequent outcome after wild-type Salmonella Typhi challenge. Despite broad genetic similarity, we detected differences in transcriptional responses to each vaccine. Seven days after M01ZH09 vaccination, marked cell cycle activation was identified and associated with humoral immunogenicity. By contrast, vaccination with Ty21a was associated with NK cell activity, which was validated in peripheral blood mononuclear cell stimulation assays confirming superior induction of an NK cell response. Moreover, transcriptional signatures of amino acid metabolism in Ty21a recipients were associated with protection against infection, including increased incubation time and decreased severity. Our data provide detailed insight into molecular immune responses to typhoid vaccines, which could aid the rational design of improved oral, live-attenuated vaccines against enteric pathogens.

    Funded by: Medical Research Council: MR/M02637X/1; Wellcome Trust

    Frontiers in immunology 2017;8;1276

  • Galleria mellonella is low cost and suitable surrogate host for studying virulence of human pathogenic Vibrio cholerae.

    Bokhari H, Ali A, Noreen Z, Thomson N and Wren BW

    Department of Biosciences, COMSATS Institute of Information Technology, Islamabad, Pakistan.

    Vibrio cholerae causes a severe diarrheal disease affecting millions of people worldwide, particularly in low-income countries. V. cholerae persists successfully in aquatic environments, and its pathogenic strains cause severe enteric disease in humans. This dual lifestyle contributes to its survival and persistence both inside the host gut and in the environment. Alternative animal replacement models are of great value for studying host-pathogen interaction and for rapidly screening pathogenic strains. One such model is Galleria mellonella, a wax moth with a complex innate immune system. Here we investigate its suitability as a model for clinical human isolates of the O1 El Tor, Ogawa serotype belonging to two genetically distinct subclades found in Pakistan (PSC-1 and PSC-2). We demonstrate that the PSC-2 strain D59, frequently isolated from inland areas, was more virulent than the PSC-1 strain K7, mainly isolated from coastal areas (p=0.0001). In addition, we compared the relative biofilm-forming capability of the representative strains as an indicator of their survival and persistence in the environment, and K7 showed enhanced biofilm formation (p=0.004). Finally, we present the annotated genomes of strains D59 and K7 and compare them with the reference strain N16961.

    Gene 2017;628;1-7

  • Next-generation sequencing of a family with a high penetrance of monoclonal gammopathies for the identification of candidate risk alleles.

    Bolli N, Barcella M, Salvi E, D'Avila F, Vendramin A, De Philippis C, Munshi NC, Avet-Loiseau H, Campbell PJ, Mussetti A, Carniti C, Maura F, Barlassina C, Corradini P and Montefusco V

    Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy.

    Background: The authors describe a family with a high penetrance of plasma cell dyscrasias, suggesting inheritance of an autosomal dominant risk allele.

    Methods: The authors performed whole-exome sequencing and reported on a combined approach aimed at the identification of causative variants and risk loci, using the wealth of data provided by this approach.

    Results: The authors identified gene mutations and single-nucleotide polymorphisms of potential significance, and pinpointed a known risk locus for myeloma as a potential area of transmissible risk in the family.

    Conclusions: To the authors' knowledge, the current study is the first to provide a whole-exome sequencing approach to such cases, and a framework analysis that could be applied to further understanding of the inherited risk of developing plasma cell dyscrasias.

    Cancer 2017;123;19;3701-3708

  • Analysis of the genomic landscape of multiple myeloma highlights novel prognostic markers and disease subgroups.

    Bolli N, Biancon G, Moarii M, Gimondi S, Li Y, de Philippis C, Maura F, Sathiaseelan V, Tai YT, Mudie L, O'Meara S, Raine K, Teague JW, Butler AP, Carniti C, Gerstung M, Bagratuni T, Kastritis E, Dimopoulos M, Corradini P, Anderson K, Moreau P, Minvielle S, Campbell PJ, Papaemmanuil E, Avet-Loiseau H and Munshi NC

    University of Milan, Department of Oncology and Onco-Hematology, Milan, Italy.

    In multiple myeloma, next generation sequencing (NGS) has expanded our knowledge of genomic lesions and highlighted a dynamic and heterogeneous composition of the tumor. Here, we used NGS to characterize the genomic landscape of 418 multiple myeloma cases at diagnosis and correlated this with prognosis and classification. Translocations and copy number alterations (CNAs) contributed more than gene mutations to defining the genotype and prognosis of each case. Known and novel independent prognostic markers were identified in our cohort of proteasome inhibitor- and IMiD-treated patients with long follow-up, including events with context-specific prognostic value, such as deletions of the PRDM1 gene. Taking advantage of the comprehensive genomic annotation of each case, we used innovative statistical approaches to identify potential novel myeloma subgroups. We observed clusters of patients stratified by the overall number of mutations and the number and type of CNAs, with distinct effects on survival, suggesting that extended genotyping of multiple myeloma at diagnosis may lead to improved disease classification and prognostication. Leukemia accepted article preview online, 06 December 2017. doi:10.1038/leu.2017.344.

    Funded by: BLRD VA: I01 BX001584; NCI NIH HHS: P01 CA155258; Wellcome Trust

    Leukemia 2017

  • The impact of rare and low-frequency genetic variants in common disease.

    Bomba L, Walter K and Soranzo N

    Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK.

    Despite thousands of genetic loci identified to date, a large proportion of genetic variation predisposing to complex disease and traits remains unaccounted for. Advances in sequencing technology enable focused explorations on the contribution of low-frequency and rare variants to human traits. Here we review experimental approaches and current knowledge on the contribution of these genetic variants in complex disease and discuss challenges and opportunities for personalised medicine.

    Funded by: Wellcome Trust: WT091310, WT098051

    Genome biology 2017;18;1;77

  • KDM2A integrates DNA and histone modification signals through a CXXC/PHD module and direct interaction with HP1.

    Borgel J, Tyl M, Schiller K, Pusztai Z, Dooley CM, Deng W, Wooding C, White RJ, Warnecke T, Leonhardt H, Busch-Nentwich EM and Bartke T

    MRC Clinical Sciences Centre (CSC), Du Cane Road, London, UK.

    Functional genomic elements are marked by characteristic DNA and histone modification signatures. How combinatorial chromatin modification states are recognized by epigenetic reader proteins and how this is linked to their biological function is largely unknown. Here we provide a detailed molecular analysis of chromatin recognition by the lysine demethylase KDM2A. Using biochemical approaches we identify a nucleosome interaction module within KDM2A consisting of a CXXC type zinc finger, a PHD domain and a newly identified Heterochromatin Protein 1 (HP1) interaction motif that mediates direct binding between KDM2A and HP1. This nucleosome interaction module enables KDM2A to decode nucleosomal H3K9me3 modification in addition to CpG methylation signals. The multivalent engagement with DNA and HP1 results in a nucleosome binding circuit in which KDM2A can be recruited to H3K9me3-modified chromatin through HP1, and HP1 can be recruited to unmodified chromatin by KDM2A. A KDM2A mutant deficient in HP1-binding is inactive in an in vivo overexpression assay in zebrafish embryos demonstrating that the HP1 interaction is essential for KDM2A function. Our results reveal a complex regulation of chromatin binding for both KDM2A and HP1 that is modulated by DNA- and H3K9-methylation, and suggest a direct role for KDM2A in chromatin silencing.

    Funded by: European Research Council: 309952; Medical Research Council: MC_UP_1102/7; Wellcome Trust

    Nucleic acids research 2017;45;3;1114-1129

  • Revealing hidden complexities of genomic rearrangements generated with Cas9.

    Boroviak K, Fu B, Yang F, Doe B and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Modelling human diseases caused by large genomic rearrangements has become more accessible since the adoption of CRISPR/Cas9 in mammalian systems. In a previous study, we showed that genomic rearrangements of up to one million base pairs can be generated by direct injection of CRISPR/Cas9 reagents into mouse zygotes. Although these rearrangements are ascertained by junction PCR, we describe here a variety of unanticipated structural changes, often involving reintegration of the region demarcated by the gRNAs in the vicinity of the edited locus. We illustrate some of this diversity, detected by high-resolution fibre-FISH, and conclude that extensive molecular analysis is required to fully understand the structure of engineered chromosomes generated by Cas9.

    Funded by: NIH HHS: U42 OD011174; Wellcome Trust: WT206194

    Scientific reports 2017;7;1;12867

  • Whole Genome Sequencing for Surveillance of Antimicrobial Resistance in Actinobacillus pleuropneumoniae.

    Bossé JT, Li Y, Rogers J, Fernandez Crespo R, Li Y, Chaudhuri RR, Holden MT, Maskell DJ, Tucker AW, Wren BW, Rycroft AN and Langford PR

    Section of Paediatrics, Department of Medicine, Imperial College London, London, UK.

    The aim of this study was to evaluate the correlation between antimicrobial resistance (AMR) profiles of 96 clinical isolates of Actinobacillus pleuropneumoniae, an important porcine respiratory pathogen, and the identification of AMR genes in whole genome sequence (WGS) data. Susceptibility of the isolates to nine antimicrobial agents (ampicillin, enrofloxacin, erythromycin, florfenicol, sulfisoxazole, tetracycline, tilmicosin, trimethoprim, and tylosin) was determined by agar dilution susceptibility testing. Except for the macrolides tested, elevated MICs were highly correlated with the presence of AMR genes identified in WGS data using ResFinder or BLASTn. Of the isolates tested, 57% were resistant to tetracycline [MIC ≥ 4 mg/L; 94.8% with either tet(B) or tet(H)]; 48% to sulfisoxazole (MIC ≥ 256 mg/L or DD = 6; 100% with sul2), 20% to ampicillin (MIC ≥ 4 mg/L; 100% with blaROB-1), 17% to trimethoprim (MIC ≥ 32 mg/L; 100% with dfrA14), and 6% to enrofloxacin (MIC ≥ 0.25 mg/L; 100% with GyrA S83F). Only 33% of the isolates did not have detectable AMR genes and were sensitive by MICs to the antimicrobial agents tested. Although 23 isolates had MIC ≥ 32 mg/L for tylosin, all isolates had MIC ≤ 16 mg/L for both erythromycin and tilmicosin, and no macrolide resistance genes or known point mutations were detected. Other than the GyrA S83F mutation, the AMR genes detected were mapped to potential plasmids. In addition to its presence on plasmid(s), the tet(B) gene was also found chromosomally, either as part of a 56 kb integrative conjugative element (ICEApl1) in 21 isolates or as part of a Tn7 insertion in 15 isolates. Our results indicate that, with the exception of macrolides, WGS data can be used to accurately predict resistance of A. pleuropneumoniae to the tested antimicrobial agents and provide added value for routine surveillance.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1

    Frontiers in microbiology 2017;8;311

  • Loss of the homologous recombination gene rad51 leads to Fanconi anemia-like symptoms in zebrafish.

    Botthof JG, Bielczyk-Maczyńska E, Ferreira L and Cvejic A

    Department of Haematology, University of Cambridge, Addenbrookes Hospital, Cambridge CB2 0XY, United Kingdom.

    RAD51 is an indispensable homologous recombination protein, necessary for strand invasion and crossing over. It has recently been designated as a Fanconi anemia (FA) gene, following the discovery of two patients carrying dominant-negative mutations. FA is a hereditary DNA-repair disorder characterized by various congenital abnormalities, progressive bone marrow failure, and cancer predisposition. In this report, we describe a viable vertebrate model of RAD51 loss. Zebrafish rad51 loss-of-function mutants developed key features of FA, including hypocellular kidney marrow, sensitivity to cross-linking agents, and decreased size. We show that some of these symptoms stem from both decreased proliferation and increased apoptosis of embryonic hematopoietic stem and progenitor cells. Comutation of p53 was able to rescue the hematopoietic defects seen in the single mutants, but led to tumor development. We further demonstrate that prolonged inflammatory stress can exacerbate the hematological impairment, leading to an additional decrease in kidney marrow cell numbers. These findings strengthen the assignment of RAD51 as a Fanconi gene and provide more evidence for the notion that aberrant p53 signaling during embryogenesis leads to the hematological defects seen later in life in FA. Further research on this zebrafish FA model will lead to a deeper understanding of the molecular basis of bone marrow failure in FA and the cellular role of RAD51.

    Funded by: Cancer Research UK: C45041/A14953; Medical Research Council: MC_PC_12009; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;22;E4452-E4461

  • Semantic prioritization of novel causative genomic variants.

    Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN and Hoehndorf R

    King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia.

    Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we present the PhenomeNET Variant Predictor (PVP) system, which exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.

    Funded by: Medical Research Council: MC_UU_12012/5; Wellcome Trust: 100585/Z/12/Z

    PLoS computational biology 2017;13;4;e1005500

  • A large scale hearing loss screen reveals an extensive unexplored genetic landscape for auditory dysfunction.

    Bowl MR, Simon MM, Ingham NJ, Greenaway S, Santos L, Cater H, Taylor S, Mason J, Kurbatova N, Pearson S, Bower LR, Clary DA, Meziane H, Reilly P, Minowa O, Kelsey L, International Mouse Phenotyping Consortium, Tocchini-Valentini GP, Gao X, Bradley A, Skarnes WC, Moore M, Beaudet AL, Justice MJ, Seavitt J, Dickinson ME, Wurst W, de Angelis MH, Herault Y, Wakana S, Nutter LMJ, Flenniken AM, McKerlie C, Murray SA, Svenson KL, Braun RE, West DB, Lloyd KCK, Adams DJ, White J, Karp N, Flicek P, Smedley D, Meehan TF, Parkinson HE, Teboul LM, Wells S, Steel KP, Mallon AM and Brown SDM

    Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire, OX11 0RD, UK.

    The developmental and physiological complexity of the auditory system is likely reflected in the underlying set of genes involved in auditory function. In humans, over 150 non-syndromic loci have been identified, and there are more than 400 human genetic syndromes with a hearing loss component. Over 100 non-syndromic hearing loss genes have been identified in mouse and human, but we remain ignorant of the full extent of the genetic landscape involved in auditory dysfunction. As part of the International Mouse Phenotyping Consortium, we undertook a hearing loss screen in a cohort of 3006 mouse knockout strains. In total, we identify 67 candidate hearing loss genes. We detect known hearing loss genes, but the vast majority, 52, of the candidate genes were novel. Our analysis reveals a large and unexplored genetic landscape involved with auditory function. The full extent of the genetic basis for hearing impairment is unknown. Here, as part of the International Mouse Phenotyping Consortium, the authors perform a hearing loss screen in 3006 mouse knockout strains and identify 52 new candidate genes for genetic hearing loss.

    Funded by: Medical Research Council: G0300212, MC_QA137918, MC_U142684171, MC_U142684175; NHGRI NIH HHS: U54 HG006332, U54 HG006348, U54 HG006364, U54 HG006370, UM1 HG006348, UM1 HG006370; NIH HHS: U42 OD011174, U42 OD011175, U42 OD011185, U42 OD012210, UM1 OD023221, UM1 OD023222; Wellcome Trust

    Nature communications 2017;8;1;886

  • International Cooperation to Enable the Diagnosis of All Rare Genetic Diseases.

    Boycott KM, Rath A, Chong JX, Hartley T, Alkuraya FS, Baynam G, Brookes AJ, Brudno M, Carracedo A, den Dunnen JT, Dyke SOM, Estivill X, Goldblatt J, Gonthier C, Groft SC, Gut I, Hamosh A, Hieter P, Höhn S, Hurles ME, Kaufmann P, Knoppers BM, Krischer JP, Macek M, Matthijs G, Olry A, Parker S, Paschall J, Philippakis AA, Rehm HL, Robinson PN, Sham PC, Stefanov R, Taruscio D, Unni D, Vanstone MR, Zhang F, Brunner H, Bamshad MJ and Lochmüller H

    Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON K1H 8L1, Canada.

    Provision of a molecularly confirmed diagnosis in a timely manner for children and adults with rare genetic diseases shortens their "diagnostic odyssey," improves disease management, and fosters genetic counseling with respect to recurrence risks while assuring reproductive choices. In a general clinical genetics setting, the current diagnostic rate is approximately 50%, but for those who do not receive a molecular diagnosis after the initial genetics evaluation, that rate is much lower. Diagnostic success for these more challenging affected individuals depends to a large extent on progress in the discovery of genes associated with, and mechanisms underlying, rare diseases. Thus, continued research is required for moving toward a more complete catalog of disease-related genes and variants. The International Rare Diseases Research Consortium (IRDiRC) was established in 2011 to bring together researchers and organizations invested in rare disease research to develop a means of achieving molecular diagnosis for all rare diseases. Here, we review the current and future bottlenecks to gene discovery and suggest strategies for enabling progress in this regard. Each successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the corresponding rare disease, enabling precision medicine for this patient population.

    Funded by: NHGRI NIH HHS: U41 HG006627, U54 HG006493, U54 HG006542, UM1 HG006493, UM1 HG008900; Wellcome Trust

    American journal of human genetics 2017;100;5;695-705

  • Genome-wide chemical mutagenesis screens allow unbiased saturation of the cancer genome and identification of drug resistance mutations.

    Brammeld JS, Petljak M, Martincorena I, Williams SP, Alonso LG, Dalmases A, Bellosillo B, Robles-Espinoza CD, Price S, Barthorpe S, Tarpey P, Alifrangis C, Bignell G, Vidal J, Young J, Stebbings L, Beal K, Stratton MR, Saez-Rodriguez J, Garnett M, Montagut C, Iorio F and McDermott U

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    Drug resistance is an almost inevitable consequence of cancer therapy and ultimately proves fatal for the majority of patients. In many cases, this is the consequence of specific gene mutations that have the potential to be targeted to resensitize the tumor. The ability to uniformly saturate the genome with point mutations without chromosome or nucleotide sequence context bias would open the door to identify all putative drug resistance mutations in cancer models. Here, we describe such a method for elucidating drug resistance mechanisms using genome-wide chemical mutagenesis allied to next-generation sequencing. We show that chemically mutagenizing the genome of cancer cells dramatically increases the number of drug-resistant clones and allows the detection of both known and novel drug resistance mutations. We used an efficient computational process that allows for the rapid identification of involved pathways and druggable targets. Such a priori knowledge would greatly empower serial monitoring strategies for drug resistance in the clinic as well as the development of trials for drug-resistant patients.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    Genome research 2017;27;4;613-625

  • Artificial and natural RNA interactions between bacteria and C. elegans.

    Braukmann F, Jordan D and Miska E

    Gurdon Institute, University of Cambridge, Cambridge, UK.

    Nineteen years after Lisa Timmons and Andy Fire first described RNA transfer from bacteria to C. elegans in an experimental setting[48], the biological role of this trans-kingdom RNA-based communication remains unknown. Here we summarize our current understanding of the mechanism and potential role of such social RNA.

    RNA biology 2017;14;4;415-420

  • Efficient CRISPR/Cas9-assisted gene targeting enables rapid and precise genetic manipulation of mammalian neural stem cells.

    Bressan RB, Dewari PS, Kalantzaki M, Gangoso E, Matjusaitis M, Garcia-Diaz C, Blin C, Grant V, Bulstrode H, Gogolok S, Skarnes WC and Pollard SM

    MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, UK.

    Mammalian neural stem cell (NSC) lines provide a tractable model for discovery across stem cell and developmental biology, regenerative medicine and neuroscience. They can be derived from foetal or adult germinal tissues and continuously propagated in vitro as adherent monolayers. NSCs are clonally expandable, genetically stable, and easily transfectable: experimental attributes compatible with targeted genetic manipulations. However, gene targeting, which is crucial for functional studies of embryonic stem cells, has not been exploited to date in NSC lines. Here, we deploy CRISPR/Cas9 technology to demonstrate a variety of sophisticated genetic modifications via gene targeting in both mouse and human NSC lines, including: (1) efficient targeted transgene insertion at safe harbour loci (Rosa26 and AAVS1); (2) biallelic knockout of neurodevelopmental transcription factor genes; (3) simple knock-in of epitope tags and fluorescent reporters (e.g. Sox2-V5 and Sox2-mCherry); and (4) engineering of glioma mutations (TP53 deletion; H3F3A point mutations). These resources and optimised methods enable facile and scalable genome editing in mammalian NSCs, providing significant new opportunities for functional genetic analysis.

    Funded by: Cancer Research UK: A19778; Medical Research Council: MR/L012766/1

    Development (Cambridge, England) 2017;144;4;635-648

  • Longitudinal genomic surveillance of multidrug-resistant Escherichia coli carriage in a long-term care facility in the United Kingdom.

    Brodrick HJ, Raven KE, Kallonen T, Jamrozy D, Blane B, Brown NM, Martin V, Török ME, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Box 157, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 0QQ, UK.

    Background: Residents of long-term care facilities (LTCF) may have high carriage rates of multidrug-resistant pathogens, but are not currently included in surveillance programmes for antimicrobial resistance or healthcare-associated infections. Here, we describe the value derived from a longitudinal epidemiological and genomic surveillance study of drug-resistant Escherichia coli in an LTCF in the United Kingdom (UK).

    Methods: Forty-five of 90 (50%) residents were recruited and followed for six months in 2014. Participants were screened weekly for carriage of extended-spectrum beta-lactamase (ESBL)-producing E. coli. Participants positive for ESBL E. coli were also screened for ESBL-negative E. coli. Phenotypic antibiotic susceptibility of E. coli was determined using the Vitek2 instrument, and isolates were sequenced on an Illumina HiSeq2000 instrument. Information was collected on episodes of clinical infection and antibiotic consumption.

    Results: Seventeen of 45 participants (38%) carried ESBL E. coli. Twenty-three of the 45 participants (51%) had 63 documented episodes of clinical infection treated with antibiotics. Treatment with antibiotics was associated with higher risk of carrying ESBL E. coli. ESBL E. coli was mainly sequence type (ST)131 (16/17, 94%). Non-ESBL E. coli from these 17 cases was more genetically diverse, but ST131 was found in eight (47%) cases. Whole-genome analysis of 297 ST131 E. coli from the 17 cases demonstrated highly related strains from six participants, indicating acquisition from a common source or person-to-person transmission. Five participants carried highly related strains of both ESBL-positive and ESBL-negative ST131. Genome-based comparison of ST131 isolates from the LTCF study participants with ST131 associated with bloodstream infection at a nearby acute hospital and in hospitals across England revealed sharing of highly related lineages between the LTCF and a local hospital.

    Conclusions: This study demonstrates the power of genomic surveillance to detect multidrug-resistant pathogens and confirm their connectivity within a healthcare network.

    Genome medicine 2017;9;1;70

  • Human primary liver cancer-derived organoid cultures for disease modeling and drug screening.

    Broutier L, Mastrogiovanni G, Verstegen MM, Francies HE, Gavarró LM, Bradshaw CR, Allen GE, Arnes-Benito R, Sidorova O, Gaspersz MP, Georgakopoulos N, Koo BK, Dietmann S, Davies SE, Praseedom RK, Lieshout R, IJzermans JNM, Wigmore SJ, Saeb-Parsy K, Garnett MJ, van der Laan LJ and Huch M

    The Wellcome Trust/CRUK Gurdon Institute, University of Cambridge, Cambridge, UK.

    Human liver cancer research currently lacks in vitro models that can faithfully recapitulate the pathophysiology of the original tumor. We recently described a novel, near-physiological organoid culture system, wherein primary human healthy liver cells form long-term expanding organoids that retain liver tissue function and genetic stability. Here we extend this culture system to the propagation of primary liver cancer (PLC) organoids from three of the most common PLC subtypes: hepatocellular carcinoma (HCC), cholangiocarcinoma (CC) and combined HCC/CC (CHC) tumors. PLC-derived organoid cultures preserve the histological architecture, gene expression and genomic landscape of the original tumor, allowing for discrimination between different tumor tissues and subtypes, even after long-term expansion in culture in the same medium conditions. Xenograft studies demonstrate that the tumorigenic potential, histological features and metastatic properties of PLC-derived organoids are preserved in vivo. PLC-derived organoids are amenable to biomarker identification and drug screening, and this led to the identification of the ERK inhibitor SCH772984 as a potential therapeutic agent for primary liver cancer. We thus demonstrate the wide-ranging biomedical utilities of PLC-derived organoid models in furthering the understanding of liver cancer biology and in developing personalized-medicine approaches for the disease.

    Funded by: National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/R001162/1; Wellcome Trust: 104151

    Nature medicine 2017;23;12;1424-1435

  • Targeting DNA Repair in Cancer: Beyond PARP Inhibitors.

    Brown JS, O'Carrigan B, Jackson SP and Yap TA

    Royal Marsden NHS Foundation Trust, London, United Kingdom.

    Germline aberrations in critical DNA-repair and DNA damage-response (DDR) genes cause cancer predisposition, whereas various tumors harbor somatic mutations causing defective DDR/DNA repair. The concept of synthetic lethality can be exploited in such malignancies, as exemplified by the approval of poly(ADP-ribose) polymerase (PARP) inhibitors for treating BRCA1/2-mutated ovarian cancers. Herein, we detail how cellular DDR processes engage various proteins that sense DNA damage, initiate signaling pathways to promote cell-cycle checkpoint activation, trigger apoptosis, and coordinate DNA repair. We focus on novel therapeutic strategies targeting promising DDR targets and discuss challenges of patient selection and the development of rational drug combinations.

    Significance: Various inhibitors of DDR components are in preclinical and clinical development. A thorough understanding of DDR pathway complexities must now be combined with strategies and lessons learned from the successful registration of PARP inhibitors in order to fully exploit the potential of DDR inhibitors and to ensure their long-term clinical success.

    Funded by: Cancer Research UK: C6/A18796, C6946/A14492; Wellcome Trust: WT092096

    Cancer discovery 2017;7;1;20-37

  • Transmission of the gut microbiota: spreading of health.

    Browne HP, Neville BA, Forster SC and Lawley TD

    Host-Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.

    Transmission of commensal intestinal bacteria between humans could promote health by establishing, maintaining and replenishing microbial diversity in the microbiota of an individual. Unlike pathogens, the routes of transmission for commensal bacteria remain unappreciated and poorly understood, despite the likely commonalities between both. Consequently, broad infection control measures that are designed to prevent pathogen transmission and infection, such as oversanitation and the overuse of antibiotics, may inadvertently affect human health by altering normal commensal transmission. In this Review, we discuss the mechanisms and factors that influence host-to-host transmission of the intestinal microbiota and examine how a better understanding of these processes will identify new approaches to nurture and restore transmission routes that are used by beneficial bacteria.

    Funded by: Medical Research Council: MR/K000551/1; Wellcome Trust: 098051

    Nature reviews. Microbiology 2017;15;9;531-543

  • Expanding the clinical spectrum of recessive truncating mutations of KLHL7 to a Bohring-Opitz-like phenotype.

    Bruel AL, Bigoni S, Kennedy J, Whiteford M, Buxton C, Parmeggiani G, Wherlock M, Woodward G, Greenslade M, Williams M, St-Onge J, Ferlini A, Garani G, Ballardini E, van Bon BW, Acuna-Hidalgo R, Bohring A, Deleuze JF, Boland A, Meyer V, Olaso R, Ginglinger E, Study D, Rivière JB, Brunner HG, Hoischen A, Newbury-Ecob R, Faivre L, Thauvin-Robinet C and Thevenon J

    Inserm UMR 1231 GAD Team, Genetics of Developmental Anomalies, Université de Bourgogne-Franche Comté, Dijon, France.

    Background: Bohring-Opitz syndrome (BOS) is a rare genetic disorder characterised by a recognisable craniofacial appearance and a typical 'BOS' posture. BOS is caused by sporadic mutations of ASXL1. However, several patients with typical BOS have no molecular diagnosis, suggesting clinical and genetic heterogeneity.

    Objectives: To expand the phenotypic spectrum of autosomal recessive variants of KLHL7, reported as causing a Crisponi syndrome/cold-induced sweating syndrome type 1 (CS/CISS1)-like syndrome.

    Methods: We performed whole-exome sequencing in two families with a suspected recessive mode of inheritance. We used the Matchmaker Exchange initiative to identify additional patients.

    Results: Here, we report six patients with microcephaly, facial dysmorphism (including exophthalmos and nevus flammeus of the glabella) and joint contractures, with a suspected BOS posture in five of the six patients. We identified autosomal recessive truncating mutations in the KLHL7 gene. KLHL7 encodes a BTB-kelch protein implicated in the cell cycle and in protein degradation by the ubiquitin-proteasome pathway. Recently, biallelic mutations in the KLHL7 gene were reported in four families and associated with CS/CISS1, which is characterised by clinical features overlapping with those of our patients.

    Conclusion: We have expanded the clinical spectrum of KLHL7 autosomal recessive variants by describing a syndrome with features overlapping CS/CISS1 and BOS.

    Journal of medical genetics 2017;54;12;830-835

  • Antibody-independent mechanisms regulate the establishment of chronic Plasmodium infection.

    Brugat T, Reid AJ, Lin J, Cunningham D, Tumwine I, Kushinga G, McLaughlin S, Spence P, Böhme U, Sanders M, Conteh S, Bushell E, Metcalf T, Billker O, Duffy PE, Newbold C, Berriman M and Langhorne J

    The Francis Crick Institute, London NW1 1AT, UK.

    Malaria is caused by parasites of the genus Plasmodium. All human-infecting Plasmodium species can establish long-lasting chronic infections[1-5], creating an infectious reservoir to sustain transmission[1,6]. It is widely accepted that the maintenance of chronic infection involves evasion of adaptive immunity by antigenic variation[7]. However, genes involved in this process have been identified in only two of the five human-infecting species: Plasmodium falciparum and Plasmodium knowlesi. Furthermore, little is understood about the early events in the establishment of chronic infection in these species. Using a rodent model, we demonstrate that, from the infecting population, only a minority of parasites, expressing one of several clusters of virulence-associated pir genes, establishes a chronic infection. This process occurs in different species of parasites and in different hosts. Establishment of chronicity is independent of adaptive immunity and therefore different from the mechanism proposed for maintenance of chronic P. falciparum infections[7-9]. Furthermore, we show that the proportions of parasites expressing different types of pir genes regulate the time taken to establish a chronic infection. Because pir genes are common to most, if not all, species of Plasmodium[10], this process may be a common way of regulating the establishment of chronic infections.

    Funded by: Medical Research Council: MC_EX_G0901345, MR/M003906/1; Wellcome Trust

    Nature microbiology 2017;2;16276

  • f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq.

    Buettner F, Pratanwanich N, McCarthy DJ, Marioni JC and Stegle O

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    Single-cell RNA sequencing (scRNA-seq) allows the study of heterogeneity in gene expression in large cell populations. Such heterogeneity can arise from technical or biological factors, making it difficult to decompose the sources of variation. We here describe f-scLVM (factorial single-cell latent variable model), a method based on factor analysis that uses pathway annotations to guide the inference of interpretable factors underpinning the heterogeneity. Our model jointly estimates the relevance of individual factors, refines gene set annotations, and infers factors without annotation. In applications to multiple scRNA-seq datasets, we find that f-scLVM robustly decomposes scRNA-seq datasets into interpretable components, thereby facilitating the identification of novel subpopulations.

    Funded by: Medical Research Council: MR/M01536X/1

    Genome biology 2017;18;1;212

  • Chromosome contacts in activated T cells identify autoimmune disease candidate genes.

    Burren OS, Rubio García A, Javierre BM, Rainbow DB, Cairns J, Cooper NJ, Lambourne JJ, Schofield E, Castro Dopico X, Ferreira RC, Coulson R, Burden F, Rowlston SP, Downes K, Wingett SW, Frontini M, Ouwehand WH, Fraser P, Spivakov M, Todd JA, Wicker LS, Cutler AJ and Wallace C

    Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, CB2 0SP, UK.

    Background: Autoimmune disease-associated variants are preferentially found in regulatory regions in immune cells, particularly CD4+ T cells. Linking such regulatory regions to gene promoters in disease-relevant cell contexts facilitates identification of candidate disease genes.

    Results: Within 4 h, activation of CD4+ T cells invokes changes in histone modifications and enhancer RNA transcription that correspond to altered expression of the interacting genes identified by promoter capture Hi-C. By integrating promoter capture Hi-C data with genetic associations for five autoimmune diseases, we prioritised 245 candidate genes with a median distance from peak signal to prioritised gene of 153 kb. Just under half (108/245) of the prioritised genes were related to activation-sensitive interactions. These included IL2RA, for which allele-specific expression analyses were consistent with interaction-mediated regulation, illustrating the utility of the approach.

    Conclusions: Our systematic experimental framework offers an alternative approach to candidate causal gene identification for variants with cell state-specific functional effects, with achievable sample sizes.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/J004480/1; Medical Research Council: MC_UP_1302/5, MC_UU_00002/4, MR/L007150/1; NIDDK NIH HHS: U01 DK062418; Wellcome Trust: 089989, 091157, 107881

    Genome biology 2017;18;1;165

  • Functional Profiling of a Plasmodium Genome Reveals an Abundance of Essential Genes.

    Bushell E, Gomes AR, Sanderson T, Anar B, Girling G, Herd C, Metcalf T, Modrzynska K, Schwach F, Martin RE, Mather MW, McFadden GI, Parts L, Rutledge GG, Vaidya AB, Wengelnik K, Rayner JC and Billker O

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.

    The genomes of malaria parasites contain many genes of unknown function. To assist drug development through the identification of essential genes and pathways, we have measured competitive growth rates in mice of 2,578 barcoded Plasmodium berghei knockout mutants, representing >50% of the genome, and created a phenotype database. At a single stage of its complex life cycle, P. berghei requires two-thirds of genes for optimal growth, the highest proportion reported from any organism and a probable consequence of functional optimization necessitated by genomic reductions during the evolution of parasitism. In contrast, extreme functional redundancy has evolved among expanded gene families operating at the parasite-host interface. The level of genetic redundancy in a single-celled organism may thus reflect the degree of environmental variation it experiences. In the case of Plasmodium parasites, this helps rationalize both the relative successes of drugs and the greater difficulty of making an effective vaccine.

    Funded by: Medical Research Council: G0501670; NIAID NIH HHS: R01 AI028398, R56 AI028398; Wellcome Trust

    Cell 2017;170;2;260-272.e8

  • Synergistic malaria vaccine combinations identified by systematic antigen screening.

    Bustamante LY, Powell GT, Lin YC, Macklin MD, Cross N, Kemp A, Cawkill P, Sanderson T, Crosnier C, Muller-Sienerth N, Doumbo OK, Traore B, Crompton PD, Cicuta P, Tran TM, Wright GJ and Rayner JC

    Malaria Programme, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom.

    A highly effective vaccine would be a valuable weapon in the drive toward malaria elimination. No such vaccine currently exists, and only a handful of the hundreds of potential candidates in the parasite genome have been evaluated. In this study, we systematically evaluated 29 antigens likely to be involved in erythrocyte invasion, an essential developmental stage during which the malaria parasite is vulnerable to antibody-mediated inhibition. Testing antigens alone and in combination identified several strain-transcending targets that had synergistic combinatorial effects in vitro, while studies in an endemic population revealed that combinations of the same antigens were associated with protection from febrile malaria. Video microscopy established that the most effective combinations targeted multiple discrete stages of invasion, suggesting a mechanistic explanation for synergy. Overall, this study both identifies specific antigen combinations for high-priority clinical testing and establishes a generalizable approach that is more likely to produce effective vaccines.

    Funded by: Medical Research Council: MR/J002283/1; NCATS NIH HHS: KL2 TR000163; NIAID NIH HHS: K08 AI125682; Wellcome Trust: 090851

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;45;12045-12050

  • Guideline for the investigation and management of eosinophilia.

    Butt NM, Lambert J, Ali S, Beer PA, Cross NC, Duncombe A, Ewing J, Harrison CN, Knapper S, McLornan D, Mead AJ, Radia D, Bain BJ and British Committee for Standards in Haematology

    Royal Liverpool and Broadgreen University Teaching Hospitals NHS Trust, Liverpool, UK.

    Funded by: Medical Research Council: G84/6443, MC_UU_12009/16, MR/L006340/1

    British journal of haematology 2017;176;4;553-572

  • Genetic loci associated with coronary artery disease harbor evidence of selection and antagonistic pleiotropy.

    Byars SG, Huang QQ, Gray LA, Bakshi A, Ripatti S, Abraham G, Stearns SC and Inouye M

    Centre for Systems Genomics, School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia.

    Traditional genome-wide scans for positive selection have mainly uncovered selective sweeps associated with monogenic traits. While selection on quantitative traits is much more common, very few signals have been detected because of their polygenic nature. We searched for positive selection signals underlying coronary artery disease (CAD) in worldwide populations, using novel approaches to quantify relationships between polygenic selection signals and CAD genetic risk. We identified new candidate adaptive loci that appear to have been directly modified by disease pressures, given their significant associations with CAD genetic risk. These candidates were all uniquely and consistently associated with many different male and female reproductive traits, suggesting that selection may also have targeted them because of their direct effects on fitness. We found that CAD loci are significantly enriched for lifetime reproductive success relative to the rest of the human genome, with evidence that the relationship between CAD and lifetime reproductive success is antagonistic. This supports the presence of antagonistic-pleiotropic tradeoffs on CAD loci and provides a novel explanation for the maintenance and high prevalence of CAD in modern humans. Lastly, using HapMap3 lymphoblastoid cell lines, we found that positive selection more often targeted CAD gene regulatory variants, which further highlights the unique biological significance of candidate adaptive loci underlying CAD. Our study provides a novel approach for detecting selection on polygenic traits and evidence that modern human genomes have evolved in response to CAD-induced selection pressures and other early-life traits sharing pleiotropic links with CAD.

    PLoS genetics 2017;13;6;e1006328

  • 11,670 whole-genome sequences representative of the Han Chinese population from the CONVERGE project.

    Cai N, Bigdeli TB, Kretzschmar WW, Li Y, Liang J, Hu J, Peterson RE, Bacanu S, Webb BT, Riley B, Li Q, Marchini J, Mott R, Kendler KS and Flint J

    Wellcome Trust Centre for Human Genetics, OX3 7BN Oxford, UK.

    The China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) project on Major Depressive Disorder (MDD) sequenced 11,670 Han Chinese women at low coverage (1.7X), providing the first large-scale whole-genome sequencing resource representative of the largest ethnic group in the world. Samples were collected from 58 hospitals in 23 provinces across China. We called 22 million high-quality single nucleotide polymorphisms (SNPs) from the nuclear genome, representing the largest SNP call set from an East Asian population to date. We used these variants to impute genotypes across all samples, which allowed us to perform a successful genome-wide association study (GWAS) on MDD. The utility of these data can be extended to studies of genetic ancestry in the Han Chinese and, when integrated with data from other populations, to evolutionary genetics. Molecular phenotypes, such as copy number variations and structural variations, can be detected, quantified and analysed in similar ways.

    Funded by: European Research Council: 617306; NIMH NIH HHS: R01 MH100549, T32 MH020030; Wellcome Trust

    Scientific data 2017;4;170011

  • Comparative Genome Analysis and Global Phylogeny of the Toxin Variant Clostridium difficile PCR Ribotype 017 Reveals the Evolution of Two Independent Sublineages.

    Cairns MD, Preston MD, Hall CL, Gerding DN, Hawkey PM, Kato H, Kim H, Kuijper EJ, Lawley TD, Pituch H, Reid S, Kullin B, Riley TV, Solomon K, Tsai PJ, Weese JS, Stabler RA and Wren BW

    Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom.

    The diarrheal pathogen Clostridium difficile consists of at least six distinct evolutionary lineages. The RT017 lineage is anomalous, as its strains express only toxin B, whereas strains from other lineages produce toxins A and B and, occasionally, binary toxin. RT017 was initially reported in Asia but has now been reported worldwide. We used whole-genome sequencing and phylogenetic analysis to investigate the patterns of global spread and population structure of 277 RT017 isolates of animal and human origin from six continents, isolated between 1990 and 2013. We reveal two distinct, evenly split sublineages (SL1 and SL2) of C. difficile RT017 that contain multiple independent clonal expansions. All 24 animal isolates were contained within SL1 along with human isolates, suggesting potential transmission between animals and humans. Genetic analyses revealed an overrepresentation of antibiotic resistance genes. Phylogeographic analyses show a North American origin for RT017, as has been found for the recently emerged epidemic RT027 lineage. Despite having only one toxin, RT017 strains have evolved in parallel from at least two independent sources and can readily transmit between continents.

    Funded by: Medical Research Council: G1000214, MR/K000551/1; Wellcome Trust

    Journal of clinical microbiology 2017;55;3;865-876

  • Appraising the relevance of DNA copy number loss and gain in prostate cancer using whole genome DNA sequence data.

    Camacho N, Van Loo P, Edwards S, Kay JD, Matthews L, Haase K, Clark J, Dennis N, Thomas S, Kremeyer B, Zamora J, Butler AP, Gundem G, Merson S, Luxton H, Hawkins S, Ghori M, Marsden L, Lambert A, Karaszi K, Pelvender G, Massie CE, Kote-Jarai Z, Raine K, Jones D, Howat WJ, Hazell S, Livni N, Fisher C, Ogden C, Kumar P, Thompson A, Nicol D, Mayer E, Dudderidge T, Yu Y, Zhang H, Shah NC, Gnanapragasam VJ, CRUK-ICGC Prostate Group, Isaacs W, Visakorpi T, Hamdy F, Berney D, Verrill C, Warren AY, Wedge DC, Lynch AG, Foster CS, Lu YJ, Bova GS, Whitaker HC, McDermott U, Neal DE, Eeles R, Cooper CS and Brewer DS

    Division of Genetics and Epidemiology, The Institute of Cancer Research, London, United Kingdom.

    A variety of models have been proposed to explain regions of recurrent somatic copy number alteration (SCNA) in human cancer. Our study employs whole-genome DNA sequence (WGS) data from tumor samples (n = 103) to comprehensively assess the role of the Knudson two-hit genetic model in SCNA generation in prostate cancer. Sixty-four recurrent regions of loss and gain were detected, of which 28 were novel, including regions of loss with more than 15% frequency at Chr4p15.2-p15.1 (15.53%), Chr6q27 (16.50%) and Chr18q12.3 (17.48%). Comprehensive mutation screens of genes, lincRNA encoding sequences, control regions and conserved domains within SCNAs demonstrated that a two-hit genetic model was supported in only a minor proportion of recurrent SCNA losses examined (15/40). We found that recurrent breakpoints and regions of inversion often occur within Knudson model SCNAs, leading to the identification of ZNF292 as a target gene for the deletion at 6q14.3-q15 and NKX3.1 as a two-hit target at 8p21.3-p21.2. The importance of alterations of lincRNA sequences was illustrated by the identification of a novel mutational hotspot at the KCCAT42, FENDRR, CAT1886 and STCAT2 loci at the 16q23.1-q24.3 loss. Our data confirm that the burden of SCNAs is predictive of biochemical recurrence, define nine individual regions that are associated with relapse, and highlight the possible importance of ion channel and G-protein-coupled receptor (GPCR) pathways in cancer development. We concluded that a two-hit genetic model accounts for about one third of SCNA losses, indicating that other mechanisms, such as haploinsufficiency and epigenetic inactivation, account for the remaining SCNA losses.

    Funded by: Cancer Research UK: 14835, A11566, A14835; Medical Research Council: G0500966

    PLoS genetics 2017;13;9;e1007001

  • Cliques and Schisms of Cancer Genes.

    Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK; Department of Haematology, University of Cambridge, Cambridge CB2 2XY, UK.

    With a few exceptions, cancers typically carry more than one driver mutation, sometimes five, ten, or more, and these driver mutations do not necessarily assort randomly. In this issue of Cancer Cell, Mina et al. systematically characterize patterns of co-mutation and mutual exclusivity in 6,456 cancers across 23 tumor types.

    Cancer cell 2017;32;2;129-130

  • CamOptimus: a tool for exploiting complex adaptive evolution to optimize experiments and processes in biotechnology.

    Cankorur-Cetinkaya A, Dias JML, Kludas J, Slater NKH, Rousu J, Oliver SG and Dikicioglu D

    Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK.

    Multiple interacting factors affect the performance of engineered biological systems in synthetic biology projects. The complexity of these biological systems means that experimental design should often be treated as a multiparametric optimization problem. However, the available methodologies are either impractical, due to a combinatorial explosion in the number of experiments to be performed, or are inaccessible to most experimentalists due to the lack of publicly available, user-friendly software. Although evolutionary algorithms may be employed as alternative approaches to optimize experimental design, the lack of simple-to-use software again restricts their use to specialist practitioners. In addition, the lack of subsidiary approaches to further investigate critical factors and their interactions prevents the full analysis and exploitation of the biotechnological system. We have addressed these problems and, here, provide a simple-to-use and freely available graphical user interface to empower a broad range of experimental biologists to employ complex evolutionary algorithms to optimize their experimental designs. Our approach exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model to evaluate the sensitivity of the experiment to each parameter under investigation. We demonstrate the utility of this method using an example in which the culture conditions for the microbial production of a bioactive human protein are optimized. CamOptimus is available through: (

    Funded by: Biotechnology and Biological Sciences Research Council

    Microbiology (Reading, England) 2017;163;6;829-839

  • Single-cell transcriptome analysis of fish immune cells provides insight into the evolution of vertebrate immune cell types.

    Carmona SJ, Teichmann SA, Ferreira L, Macaulay IC, Stubbington MJ, Cvejic A and Gfeller D

    Ludwig Center for Cancer Research, University of Lausanne, 1066 Epalinges, Switzerland.

    The immune system of vertebrate species consists of many different cell types that have distinct functional roles and are subject to different evolutionary pressures. Here, we first analyzed conservation of genes specific for all major immune cell types in human and mouse. Our results revealed higher gene turnover and faster evolution of <i>trans</i>-membrane proteins in NK cells compared with other immune cell types, and especially T cells, but similar conservation of nuclear and cytoplasmic protein coding genes. To validate these findings in a distant vertebrate species, we used single-cell RNA sequencing of <i>lck:GFP</i> cells in zebrafish and obtained the first transcriptome of specific immune cell types in a nonmammalian species. Unsupervised clustering and single-cell <i>TCR</i> locus reconstruction identified three cell populations, T cells, a novel type of NK-like cells, and a smaller population of myeloid-like cells. Differential expression analysis uncovered new immune-cell-specific genes, including novel immunoglobulin-like receptors, and neofunctionalization of recently duplicated paralogs. Evolutionary analyses confirmed the higher gene turnover of <i>trans</i>-membrane proteins in NK cells compared with T cells in fish species, suggesting that this is a general property of immune cell types across all vertebrates.

    Funded by: Cancer Research UK: C45041/A14953; European Research Council: 677501; Medical Research Council: MC_PC_12009; Wellcome Trust

    Genome research 2017;27;3;451-461

  • 'Basic and Applied Thermogenesis Research' Bridging the Gap.

    Carobbio S, Guénantin AC and Vidal-Puig A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK; Metabolic Research Laboratories, Addenbrooke's Treatment Centre, Institute of Metabolic Science, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK.

    Obesity is a major health problem without satisfactory pharmacological treatment. A promising strategy is to promote energy dissipation by activating brown/beige adipose tissue. However, for this strategy to succeed, transferability among cellular, murine, and human systems must be improved and the gap between basic and clinical research bridged.

    Funded by: British Heart Foundation: PG/12/53/29714; Medical Research Council: MC_UU_00014/2, MC_UU_00014/5, MC_UU_12012/2

    Trends in endocrinology and metabolism: TEM 2017;29;1;5-7

  • Adipose Tissue Function and Expandability as Determinants of Lipotoxicity and the Metabolic Syndrome.

    Carobbio S, Pellegrinelli V and Vidal-Puig A

    Metabolic Research Laboratories, Wellcome Trust-MRC Institute of Metabolic Science, University of Cambridge, Addenbrooke's Hospital, Box 289, Cambridge, CB2 0QQ, UK.

    The adipose tissue organ is organised as distinct anatomical depots located all along the body axis and is composed of three types of adipocytes (white, beige and brown), which are integrated with vascular, immune, neural and extracellular stroma cells. These distinct adipocytes serve different specialised functions. The main function of white adipocytes is to ensure healthy storage of excess nutrients/energy and its rapid mobilisation to supply the demand for energy imposed by physiological cues in other organs, whereas brown and beige adipocytes are designed for heat production through uncoupling lipid oxidation from energy production. The concerted action of the three types of adipocytes/tissues has been reported to ensure an optimal metabolic status in rodents. However, when one or more of these adipose depots become dysfunctional as a consequence of sustained lipid/nutrient overload, insulin resistance and associated metabolic complications ensue. These metabolic alterations negatively affect adipose tissue function and compromise global metabolic homeostasis. Optimising white adipose tissue expandability and its functional metabolic flexibility and/or promoting brown/beige-mediated thermogenic activity counteracts obesity and its associated lipotoxic metabolic effects. The development of these therapeutic approaches requires a deep understanding of adipose tissue in all its aspects. In this chapter we will discuss the characteristics of the different adipose tissue depots with respect to origins and precursor recruitment, plasticity, cellular composition and expandability, as well as molecular and metabolic signatures in both physiological and pathophysiological conditions.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H002731/1; British Heart Foundation: RG/12/13/29853; Medical Research Council: MC_UU_12012/2

    Advances in experimental medicine and biology 2017;960;161-196

  • Genome-wide association study of nevirapine hypersensitivity in a sub-Saharan African HIV-infected population.

    Carr DF, Bourgeois S, Chaponda M, Takeshita LY, Morris AP, Castro EM, Alfirevic A, Jones AR, Rigden DJ, Haldenby S, Khoo S, Lalloo DG, Heyderman RS, Dandara C, Kampira E, van Oosterhout JJ, Ssali F, Munderi P, Novelli G, Borgiani P, Nelson MR, Holden A, Deloukas P and Pirmohamed M

    Department of Molecular and Clinical Pharmacology, University of Liverpool, Liverpool, UK.

    Background: The antiretroviral nevirapine is associated with hypersensitivity reactions in 6%-10% of patients, including hepatotoxicity, maculopapular exanthema, Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN).

    Objectives: To undertake a genome-wide association study (GWAS) to identify genetic predisposing factors for the different clinical phenotypes associated with nevirapine hypersensitivity.

    Methods: A GWAS was undertaken in a discovery cohort of 151 nevirapine-hypersensitive and 182 tolerant, HIV-infected Malawian adults. Replication of signals was determined in a cohort of 116 cases and 68 controls obtained from Malawi, Uganda and Mozambique. Interaction with ERAP genes was determined in patients positive for HLA-C*04:01. In silico docking studies were also performed for HLA-C*04:01.

    Results: Fifteen SNPs demonstrated nominal significance (P < 1 × 10⁻⁵) with one or more of the hypersensitivity phenotypes. The most promising signal was seen in SJS/TEN, where rs5010528 (HLA-C locus) approached genome-wide significance (P < 8.5 × 10⁻⁸) and was below HLA-wide significance (P < 2.5 × 10⁻⁴) in the meta-analysis of discovery and replication cohorts [OR 4.84 (95% CI 2.71-8.61)]. rs5010528 is a strong proxy for HLA-C*04:01 carriage: in silico docking showed that two residues (33 and 123) in the B pocket were the most likely nevirapine interactors. There was no interaction between HLA-C*04:01 and ERAP1, but there is a potential protective effect with ERAP2 [P = 0.019, OR 0.43 (95% CI 0.21-0.87)].

    Conclusions: HLA-C*04:01 predisposes to nevirapine-induced SJS/TEN in sub-Saharan Africans, but not to other hypersensitivity phenotypes. This is likely to be mediated via binding to the B pocket of the HLA-C peptide. Whether this risk is modulated by ERAP2 variants requires further study.

    Funded by: Department of Health: II-LB-0313-20008; Medical Research Council: G0600344, G0900753, MR/K002279/1, MR/L006758/1; Wellcome Trust: WT078857MA, WT098017

    The Journal of antimicrobial chemotherapy 2017;72;4;1152-1162

  • TCTE1 is a conserved component of the dynein regulatory complex and is required for motility and metabolism in mouse spermatozoa.

    Castaneda JM, Hua R, Miyata H, Oji A, Guo Y, Cheng Y, Zhou T, Guo X, Cui Y, Shen B, Wang Z, Hu Z, Zhou Z, Sha J, Prunskaite-Hyyrylainen R, Yu Z, Ramirez-Solis R, Ikawa M, Matzuk MM and Liu M

    Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030.

    Flagella and cilia are critical cellular organelles that provide a means for cells to sense and progress through their environment. The central component of flagella and cilia is the axoneme, which comprises the "9+2" microtubule arrangement, dynein arms, radial spokes, and the nexin-dynein regulatory complex (<i>N</i>-DRC). Failure to properly assemble components of the axoneme leads to defective flagella and, in humans, to a collection of diseases referred to as ciliopathies. Ciliopathies can manifest as severe syndromic diseases that affect lung and kidney function, central nervous system development, bone formation, visceral organ organization, and reproduction. T-Complex-Associated-Testis-Expressed 1 (TCTE1) is an evolutionarily conserved axonemal protein present from <i>Chlamydomonas</i> (DRC5) to mammals that localizes to the <i>N</i>-DRC. Here, we show that mouse TCTE1 is testis-enriched in its expression, with its mRNA appearing in early round spermatids and protein localized to the flagellum. TCTE1 is 498 aa in length with a leucine-rich repeat domain at the C terminus and is present in eukaryotes containing a flagellum. Knockout of <i>Tcte1</i> results in male sterility because <i>Tcte1</i>-null spermatozoa show aberrant motility. Although the axoneme is structurally normal in <i>Tcte1</i> mutant spermatozoa, <i>Tcte1</i>-null sperm demonstrate a significant decrease of ATP, which is used by dynein motors to generate the bending force of the flagellum. These data provide a link to defining the molecular intricacies required for axoneme function, sperm motility, and male fertility.

    Funded by: NHGRI NIH HHS: U01 HG004080; NICHD NIH HHS: R01 HD088412; Wellcome Trust: 079643, 098051

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;27;E5370-E5378

  • Transcriptional repression of Plxnc1 by Lmx1a and Lmx1b directs topographic dopaminergic circuit formation.

    Chabrat A, Brisson G, Doucet-Beaupré H, Salesse C, Schaan Profes M, Dovonou A, Akitegetse C, Charest J, Lemstra S, Côté D, Pasterkamp RJ, Abrudan MI, Metzakopian E, Ang SL and Lévesque M

    Department of Psychiatry and Neurosciences, Faculty of Medicine, Université Laval, Québec, Quebec, G1V 0A6, Canada.

    Mesodiencephalic dopamine neurons play central roles in the regulation of a wide range of brain functions, including voluntary movement and behavioral processes. These functions are served by distinct subtypes of mesodiencephalic dopamine neurons located in the substantia nigra pars compacta and the ventral tegmental area, which form the nigrostriatal, mesolimbic, and mesocortical pathways. Until now, mechanisms involved in dopaminergic circuit formation remained largely unknown. Here, we show that Lmx1a, Lmx1b, and Otx2 transcription factors control subtype-specific mesodiencephalic dopamine neurons and their appropriate axon innervation. Our results revealed that the expression of Plxnc1, an axon guidance receptor, is repressed by Lmx1a/b and enhanced by Otx2. We also found that Sema7a/Plxnc1 interactions are responsible for the segregation of nigrostriatal and mesolimbic dopaminergic pathways. These findings identify Lmx1a/b, Otx2, and Plxnc1 as determinants of dopaminergic circuit formation and should assist in engineering mesodiencephalic dopamine neurons capable of regenerating appropriate connections for cell therapy.

    Midbrain dopaminergic neurons (mDAs) in the VTA and SNpc project to different regions and form distinct circuits. Here the authors show that transcription factors Lmx1a, Lmx1b, and Otx2 control the axon guidance of mDAs and the segregation of mesolimbic and nigrostriatal dopaminergic pathways.

    Funded by: Parkinson's UK: G-0906; Wellcome Trust

    Nature communications 2017;8;1;933

  • Adaptation... that's what you need?

    Chaguza C and Bentley SD

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2017;15;8;452

  • Population genetic structure, antibiotic resistance, capsule switching and evolution of invasive pneumococci before conjugate vaccination in Malawi.

    Chaguza C, Cornick JE, Andam CP, Gladstone RA, Alaerts M, Musicha P, Peno C, Bar-Zeev N, Kamng'ona AW, Kiran AM, Msefula CL, McGee L, Breiman RF, Kadioglu A, French N, Heyderman RS, Hanage WP, Bentley SD and Everett DB

    Department of Clinical Infection, Microbiology and Immunology, Institute of Infection and Global Health, University of Liverpool, Liverpool, UK; Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi.

    Introduction: Pneumococcal infections cause a high death toll in sub-Saharan Africa (SSA), but the recently rolled-out pneumococcal conjugate vaccines (PCV) will reduce the disease burden. To better understand the population impact of these vaccines, comprehensive analysis of large collections of pneumococcal isolates sampled prior to vaccination is required. Here we present a population genomic study of the invasive pneumococcal isolates sampled before the implementation of PCV13 in Malawi.

    Materials and methods: We retrospectively sampled and whole-genome sequenced 585 invasive isolates from 2004 to 2010. We determined the pneumococcal population genetic structure and assessed serotype prevalence, antibiotic resistance rates, and the occurrence of serotype switching.

    Results: Population structure analysis revealed 22 genetically distinct sequence clusters (SCs), which consisted of closely related isolates. Serotype 1 (ST217), a vaccine-associated serotype in clade SC2, showed the highest prevalence (19.3%) and was associated with the highest multidrug resistance (MDR) rate (81.9%), followed by serotype 12F, a non-vaccine serotype in clade SC10 with an MDR rate of 57.9%. Prevalence of serotypes was stable prior to vaccination, although there was an increase in the PMEN19 clone, serotype 5 ST289, in clade SC1 in 2010, suggesting a potential undetected local outbreak. Coalescent analysis revealed recent emergence of the SCs, and there was evidence of natural capsule switching in the absence of vaccine-induced selection pressure. Furthermore, the majority of the highly prevalent capsule-switched isolates were associated with acquisition of vaccine-targeted capsules.

    Conclusions: This study provides descriptions of capsule-switched serotypes and serotypes with potential to cause serotype replacement post-vaccination such as 12F. Continued surveillance is critical to monitor these serotypes and antibiotic resistance in order to design better infection prevention and control measures such as inclusion of emerging replacement serotypes in future conjugate vaccines.

    Funded by: Medical Research Council: MR/R003076/1; Wellcome Trust: 084679/Z/08/Z, OPP1023440, OPP1034556

    Vaccine 2017;35;35 Pt B;4594-4602

  • The evolutionary and phylogeographic history of woolly mammoths: a comprehensive mitogenomic analysis.

    Chang D, Knapp M, Enk J, Lippold S, Kircher M, Lister A, MacPhee RD, Widga C, Czechowski P, Sommer R, Hodges E, Stümpel N, Barnes I, Dalén L, Derevianko A, Germonpré M, Hillebrand-Voiculescu A, Constantin S, Kuznetsova T, Mol D, Rathgeber T, Rosendahl W, Tikhonov AN, Willerslev E, Hannon G, Lalueza-Fox C, Joger U, Poinar H, Hofreiter M and Shapiro B

    Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.

    Near the end of the Pleistocene epoch, populations of the woolly mammoth (Mammuthus primigenius) were distributed across parts of three continents, from western Europe and northern Asia through Beringia to the Atlantic seaboard of North America. Nonetheless, questions about the connectivity and temporal continuity of mammoth populations and species remain unanswered. We use a combination of targeted enrichment and high-throughput sequencing to assemble and interpret a data set of 143 mammoth mitochondrial genomes, sampled from fossils recovered from across their Holarctic range. Our dataset includes 54 previously unpublished mitochondrial genomes and significantly increases the coverage of the Eurasian range of the species. The resulting global phylogeny confirms that the Late Pleistocene mammoth population comprised three distinct mitochondrial lineages that began to diverge ~1.0-2.0 million years ago (Ma). We also find that mammoth mitochondrial lineages were strongly geographically partitioned throughout the Pleistocene. In combination, our genetic results and the pattern of morphological variation in time and space suggest that male-mediated gene flow, rather than large-scale dispersals, was important in the Pleistocene evolutionary history of mammoths.

    Scientific reports 2017;7;44585

  • THE REAL McCOIL: A method for the concurrent estimation of the complexity of infection and SNP allele frequency for malaria parasites.

    Chang HH, Worby CJ, Yeka A, Nankabirwa J, Kamya MR, Staedke SG, Dorsey G, Murphy M, Neafsey DE, Jeffreys AE, Hubbart C, Rockett KA, Amato R, Kwiatkowski DP, Buckee CO and Greenhouse B

    Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States.

    As many malaria-endemic countries move towards elimination of Plasmodium falciparum, the most virulent human malaria parasite, effective tools for monitoring malaria epidemiology are urgent priorities. P. falciparum population genetic approaches offer promising tools for understanding transmission and spread of the disease, but a high prevalence of multi-clone or polygenomic infections can render estimation of even the most basic parameters, such as allele frequencies, challenging. A previous method, COIL, was developed to estimate complexity of infection (COI) from single nucleotide polymorphism (SNP) data, but it relies on monogenomic infections to estimate allele frequencies or requires external allele frequency data, which may not be available. Estimates limited to monogenomic infections may not be representative, however, and when the average COI is high, they can be difficult or impossible to obtain. Therefore, we developed THE REAL McCOIL, Turning HEterozygous SNP data into Robust Estimates of ALelle frequency, via Markov chain Monte Carlo, and Complexity Of Infection using Likelihood, to incorporate polygenomic samples and simultaneously estimate allele frequency and COI. This approach was tested via simulations and then applied to SNP data from cross-sectional surveys performed in three Ugandan sites with varying malaria transmission. We show that THE REAL McCOIL consistently outperforms COIL on simulated data, particularly when most infections are polygenomic. Using field data we show that, unlike with COIL, we can distinguish epidemiologically relevant differences in COI between and within these sites. Surprisingly, for example, we estimated high average COI in a peri-urban subregion with lower transmission intensity, suggesting that many of these cases were imported from surrounding regions with higher transmission intensity. THE REAL McCOIL therefore provides a robust tool for understanding the molecular epidemiology of malaria across transmission settings.

    Funded by: FIC NIH HHS: D43 TW010132; Medical Research Council: MR/M006212/1; NIAID NIH HHS: U19 AI089674; NIGMS NIH HHS: U54 GM088558; Wellcome Trust: 090770/Z/09/Z, 098051

    PLoS computational biology 2017;13;1;e1005348

  • Defining the ABC of gene essentiality in streptococci.

    Charbonneau ARL, Forman OP, Cain AK, Newland G, Robinson C, Boursnell M, Parkhill J, Leigh JA, Maskell DJ and Waller AS

    Animal Health Trust, Lanwades Park, Newmarket, Suffolk, UK.

    Background: Utilising next generation sequencing to interrogate saturated bacterial mutant libraries provides unprecedented information for the assignment of genome-wide gene essentiality. Exposure of saturated mutant libraries to specific conditions and subsequent sequencing can be exploited to uncover gene essentiality relevant to the condition. Here we present a barcoded transposon directed insertion-site sequencing (TraDIS) system to define an essential gene list for Streptococcus equi subsp. equi, the causative agent of strangles in horses, for the first time. The gene essentiality data for this group C Streptococcus was compared to that of group A and B streptococci.

    Results: Six barcoded variants of pGh9:ISS1 were designed and used to generate mutant libraries containing between 33,000 and 66,000 unique mutants. TraDIS was performed on DNA extracted from each library, and data were analysed separately and as a combined master pool. Gene essentiality analysis determined that 19.5% of the S. equi genome was essential. Gene essentialities were compared to those of group A and group B streptococci, identifying concordances of 90.2% and 89.4%, respectively, and an overall concordance of 83.7% between the three species.

    Conclusions: The use of barcoded pGh9:ISS1 to generate mutant libraries provides a highly useful tool for the assignment of gene function in S. equi and other streptococci. The shared essential gene set of group A, B and C streptococci provides further evidence of the close genetic relationships between these important pathogenic bacteria. Therefore, the ABC of gene essentiality reported here provides a solid foundation towards reporting the functional genome of streptococci.

    Funded by: Biotechnology and Biological Sciences Research Council: 1503883; Medical Research Council: G1100100

    BMC genomics 2017;18;1;426

  • The exported chaperone Hsp70-x supports virulence functions for Plasmodium falciparum blood stage parasites.

    Charnaud SC, Dixon MWA, Nie CQ, Chappell L, Sanders PR, Nebl T, Hanssen E, Berriman M, Chan JA, Blanch AJ, Beeson JG, Rayner JC, Przyborski JM, Tilley L, Crabb BS and Gilson PR

    Burnet Institute, Melbourne, Victoria, Australia.

    Malaria is caused by five different Plasmodium spp. in humans, each of which modifies the host erythrocyte to survive and replicate. The two main causes of malaria, P. falciparum and P. vivax, differ in their ability to cause severe disease, mainly due to differences in the cytoadhesion of infected erythrocytes (IE) in the microvasculature. Cytoadhesion of P. falciparum in the brain leads to a large number of deaths each year and is a consequence of exported parasite proteins, some of which modify the erythrocyte cytoskeleton while others such as PfEMP1 project onto the erythrocyte surface where they bind to endothelial cells. Here we investigate the effects of knocking out an exported Hsp70-type chaperone termed Hsp70-x that is present in P. falciparum but not P. vivax. Although the growth of Δhsp70-x parasites was unaffected, the export of PfEMP1 cytoadherence proteins was delayed and Δhsp70-x IE had reduced adhesion. The Δhsp70-x IE were also more rigid than wild-type controls, indicating changes in the way the parasites modified their host erythrocyte. To investigate the cause of this, transcriptional and translational changes in exported and chaperone proteins were monitored, and some changes were observed. We propose that PfHsp70-x is not essential for survival in vitro, but may be required for the efficient export and functioning of some P. falciparum exported proteins.

    PloS one 2017;12;7;e0181656

  • "Like sugar in milk": reconstructing the genetic history of the Parsi population.

    Chaubey G, Ayub Q, Rai N, Prakash S, Mushrif-Tripathy V, Mezzavilla M, Pathak AK, Tamang R, Firasat S, Reidla M, Karmin M, Rani DS, Reddy AG, Parik J, Metspalu E, Rootsi S, Dalal K, Khaliq S, Mehdi SQ, Singh L, Metspalu M, Kivisild T, Tyler-Smith C, Villems R and Thangaraj K

    Evolutionary Biology Group, Estonian Biocentre, Riia23b, Tartu, 51010, Estonia.

    Background: The Parsis are one of the smallest religious communities in the world. To understand the population structure and demographic history of this group in detail, we analyzed Indian and Pakistani Parsi populations using high-resolution genetic variation data on autosomal and uniparental loci (Y-chromosomal and mitochondrial DNA). We also assayed mitochondrial DNA polymorphisms among ancient Parsi DNA samples excavated from Sanjan, in present-day Gujarat, the place of their original settlement in India.

    Results: Among present-day populations, the Parsis are genetically closest to Iranian and Caucasus populations rather than to their South Asian neighbors. They also share the highest number of haplotypes with present-day Iranians, and we estimate that the admixture of the Parsis with Indian populations occurred ~1,200 years ago. Enriched homozygosity in the Parsis reflects their recent isolation and inbreeding. We also observed 48% South-Asian-specific mitochondrial lineages among the ancient samples, which might have resulted from the assimilation of local females during the initial settlement. Finally, we show that Parsis are genetically closer to Neolithic Iranians than to modern Iranians, who have witnessed a more recent wave of admixture from the Near East.

    Conclusions: Our results are consistent with the historically recorded migration of the Parsi populations to South Asia in the 7th century and in agreement with their assimilation into the Indian sub-continent's population and cultural milieu "like sugar in milk". Moreover, in a wider context our results support a major demographic transition in West Asia due to the Islamic conquest.

    Funded by: Wellcome Trust

    Genome biology 2017;18;1;110

  • Transposon insertional mutagenesis in mice identifies human breast cancer susceptibility genes and signatures for stratification.

    Chen L, Jenjaroenpun P, Pillai AM, Ivshina AV, Ow GS, Efthimios M, Zhiqun T, Tan TZ, Lee SC, Rogers K, Ward JM, Mori S, Adams DJ, Jenkins NA, Copeland NG, Ban KH, Kuznetsov VA and Thiery JP

    Institute of Molecular and Cell Biology, Singapore 138673.

    Robust prognostic gene signatures and therapeutic targets are difficult to derive from expression profiling because of the significant heterogeneity within breast cancer (BC) subtypes. Here, we performed forward genetic screening in mice using Sleeping Beauty transposon mutagenesis to identify candidate BC driver genes in an unbiased manner, using a stabilized N-terminal truncated β-catenin gene as a sensitizer. We identified 134 mouse susceptibility genes from 129 common insertion sites within 34 mammary tumors. Of these, 126 genes were orthologous to protein-coding genes in the human genome (hereafter, human BC susceptibility genes, hBCSGs), 70% of which are previously reported cancer-associated genes, and ∼16% are known BC suppressor genes. Network analysis revealed a gene hub consisting of E1A binding protein P300 (<i>EP300</i>), CD44 molecule (<i>CD44</i>), neurofibromin (<i>NF1</i>) and phosphatase and tensin homolog (<i>PTEN</i>), which are linked to a significant number of mutated hBCSGs. From our survival prediction analysis of the expression of human BC genes in 2,333 BC cases, we isolated a six-gene-pair classifier that stratifies BC patients with high confidence into prognostically distinct low-, moderate-, and high-risk subgroups. Furthermore, we proposed prognostic classifiers identifying three basal and three claudin-low tumor subgroups. Intriguingly, our hBCSGs are mostly unrelated to cell cycle/mitosis genes and are distinct from the prognostic signatures currently used for stratifying BC patients. Our findings illustrate the strength and validity of integrating functional mutagenesis screens in mice with human cancer transcriptomic data to identify highly prognostic BC subtyping biomarkers.

    Funded by: Cancer Research UK: 13031

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;11;E2215-E2224

  • Pan-cancer analysis of homozygous deletions in primary tumours uncovers rare tumour suppressors.

    Cheng J, Demeulemeester J, Wedge DC, Vollan HKM, Pitt JJ, Russnes HG, Pandey BP, Nilsen G, Nord S, Bignell GR, White KP, Børresen-Dale AL, Campbell PJ, Kristensen VN, Stratton MR, Lingjærde OC, Moreau Y and Van Loo P

    Department of Electrical Engineering (ESAT) and iMinds Future Health Department, University of Leuven, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium.

    Homozygous deletions are rare in cancers and often target tumour suppressor genes. Here, we build a compendium of 2218 primary tumours across 12 human cancer types and systematically screen for homozygous deletions, aiming to identify rare tumour suppressors. Our analysis defines 96 genomic regions recurrently targeted by homozygous deletions. These recurrent homozygous deletions occur either over tumour suppressors or over fragile sites, regions of increased genomic instability. We construct a statistical model that separates fragile sites from regions showing signatures of positive selection for homozygous deletions and identify candidate tumour suppressors within those regions. We find 16 established tumour suppressors and propose 27 candidate tumour suppressors. Several of these genes (including MGMT, RAD17, and USP44) show prior evidence of a tumour suppressive function. Other candidate tumour suppressors, such as MAFTRR, KIAA1551, and IGF2BP2, are novel. Our study demonstrates how rare tumour suppressors can be identified through copy number meta-analysis.

    Funded by: Cancer Research UK: FC001202; Medical Research Council: FC001202; NIGMS NIH HHS: T32 GM007197; Wellcome Trust: FC001202

    Nature communications 2017;8;1;1221

  • Functional variation in allelic methylomes underscores a strong genetic contribution and reveals novel epigenetic alterations in the human epigenome.

    Cheung WA, Shao X, Morin A, Siroux V, Kwan T, Ge B, Aïssi D, Chen L, Vasquez L, Allum F, Guénard F, Bouzigon E, Simon MM, Boulier E, Redensek A, Watt S, Datta A, Clarke L, Flicek P, Mead D, Paul DS, Beck S, Bourque G, Lathrop M, Tchernof A, Vohl MC, Demenais F, Pin I, Downes K, Stunnenberg HG, Soranzo N, Pastinen T and Grundberg E

    Department of Human Genetics, McGill University, Montreal, Quebec, Canada.

    Background: The functional impact of genetic variation has been extensively surveyed, revealing that genetic changes correlated with phenotypes lie mostly in non-coding genomic regions. Studies have linked allele-specific genetic changes to gene expression, DNA methylation, and histone marks, but these investigations have only been carried out in a limited set of samples.

    Results: We describe a large-scale coordinated study of allelic and non-allelic effects on DNA methylation, histone mark deposition, and gene expression, detecting the interrelations between epigenetic and functional features at unprecedented resolution. We use information from whole genome and targeted bisulfite sequencing from 910 samples to perform genotype-dependent analyses of allele-specific methylation (ASM) and non-allelic methylation (mQTL). In addition, we introduce a novel genotype-independent test to detect methylation imbalance between chromosomes. Of the ~2.2 million CpGs tested for ASM, mQTL, and genotype-independent effects, we identify ~32% as being genetically regulated (ASM or mQTL) and ~14% as being putatively epigenetically regulated. We also show that epigenetically driven effects are strongly enriched in repressed regions and near transcription start sites, whereas the genetically regulated CpGs are enriched in enhancers. Known imprinted regions are enriched among epigenetically regulated loci, but we also observe several novel genomic regions (e.g., HOX genes) as being epigenetically regulated. Finally, we use our ASM datasets for functional interpretation of disease-associated loci and show the advantage of utilizing naïve T cells for understanding autoimmune diseases.

    Conclusions: Our rich catalogue of haploid methylomes across multiple tissues will allow validation of epigenome association studies and exploration of new biological models for allelic exclusion in the human genome.

    Funded by: British Heart Foundation: RG/08/014/24067, SP/09/002; CIHR: EP1-120608, TEC-128093; Medical Research Council: G0800270, MR/L003120/1

    Genome biology 2017;18;1;50

  • Global and regional dissemination and evolution of Burkholderia pseudomallei.

    Chewapreecha C, Holden MT, Vehkala M, Välimäki N, Yang Z, Harris SR, Mather AE, Tuanyok A, De Smet B, Le Hello S, Bizet C, Mayo M, Wuthiekanun V, Limmathurotsakul D, Phetsouvanh R, Spratt BG, Corander J, Keim P, Dougan G, Dance DA, Currie BJ, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, CB2 0QQ, UK.

    The environmental bacterium Burkholderia pseudomallei causes an estimated 165,000 cases of human melioidosis per year worldwide and is also classified as a biothreat agent. We used whole genome sequences of 469 B. pseudomallei isolates from 30 countries collected over 79 years to explore its geographic transmission. Our data point to Australia as an early reservoir, with transmission to Southeast Asia followed by onward transmission to South Asia and East Asia. Repeated reintroductions were observed within the Malay Peninsula and between countries bordered by the Mekong River. Our data support an African origin of the Central and South American isolates with introduction of B. pseudomallei into the Americas between 1650 and 1850, providing a temporal link with the slave trade. We also identified geographically distinct genes/variants in Australasian or Southeast Asian isolates alone, with virulence-associated genes being among those over-represented. This provides a potential explanation for clinical manifestations of melioidosis that are geographically restricted.

    Funded by: Wellcome Trust

    Nature microbiology 2017;2;16263

  • Whole-genome view of the consequences of a population bottleneck using 2926 genome sequences from Finland and United Kingdom.

    Chheda H, Palta P, Pirinen M, McCarthy S, Walter K, Koskinen S, Salomaa V, Daly M, Durbin R, Palotie A, Aittokallio T and Ripatti S

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.

    Isolated populations with enrichment of variants due to recent population bottlenecks provide a powerful resource for identifying disease-associated genetic variants and genes. As a model of an isolated population, we sequenced the genomes of 1463 Finnish individuals as part of the Sequencing Initiative Suomi (SISu) Project. We compared the genomic profiles of the 1463 Finns to a sample of 1463 British individuals that were sequenced in parallel as part of the UK10K Project. Whereas there were no major differences in the allele frequency of common variants, a significant depletion of variants in the rare frequency spectrum was observed in Finns when comparing the two populations. On the other hand, we observed >2.1 million variants that were twice as frequent among Finns compared with Britons and 800 000 variants that were more than 10 times more frequent in Finns. Furthermore, in Finns we observed a relative proportional enrichment of variants in the minor allele frequency range between 2 and 5% (P < 2.2 × 10⁻¹⁶). When stratified by their functional annotations, loss-of-function variants showed the highest proportional enrichment in Finns (P=0.0291). In the non-coding part of the genome, variants in conserved regions (P=0.002) and promoters (P=0.01) were also significantly enriched in the Finnish samples. These functional categories represent the highest a priori power for downstream association studies of rare variants using population isolates.

    Funded by: Wellcome Trust

    European journal of human genetics : EJHG 2017;25;4;477-484

  • MRSA Transmission Dynamics Among Interconnected Acute, Intermediate-Term, and Long-Term Healthcare Facilities in Singapore.

    Chow A, Lim VW, Khan A, Pettigrew K, Lye DCB, Kanagasabai K, Phua K, Krishnan P, Ang B, Marimuthu K, Hon PY, Koh J, Leong I, Parkhill J, Hsu LY and Holden MTG

    Departments of Clinical Epidemiology and.

    Background: Methicillin-resistant Staphylococcus aureus (MRSA) is the most common healthcare-associated multidrug-resistant organism. Despite the interconnectedness between acute care hospitals (ACHs) and intermediate- and long-term care facilities (ILTCFs), the transmission dynamics of MRSA between healthcare settings are not well understood.

    Methods: We conducted a cross-sectional study in a network comprising an ACH and 5 closely affiliated ILTCFs in Singapore. A total of 1700 inpatients were screened for MRSA over a 6-week period in 2014. MRSA isolates underwent whole-genome sequencing, with a pairwise single-nucleotide polymorphism (Hamming distance) cutoff of 60 core genome single-nucleotide polymorphisms used to define recent transmission clusters (clades) for the 3 major clones.

    Results: MRSA prevalence was significantly higher in intermediate-term (29.9%) and long-term (20.4%) care facilities than in the ACH (11.8%) (P < .001). The predominant clones were sequence type [ST] 22 (n = 183; 47.8%), ST45 (n = 129; 33.7%), and ST239 (n = 26; 6.8%), with greater diversity of STs in ILTCFs relative to the ACH. A large proportion of the clades in ST22 (14 of 21 clades; 67%) and ST45 (7 of 13; 54%) included inpatients from the ACH and ILTCFs. The most frequent source of the interfacility transmissions was the ACH (n = 28 transmission events; 36.4%).

    Conclusions: MRSA transmission dynamics between the ACH and ILTCFs were complex. The greater diversity of STs in ILTCFs suggests that the ecosystem in such settings might be more conducive for intrafacility transmission events. ST22 and ST45 have successfully established themselves in ILTCFs. The importance of interconnected infection prevention and control measures and strategies cannot be overemphasized.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2017;64;suppl_2;S76-S81

  • Pathways to understanding the genomic aetiology of osteoarthritis.

    Cibrián Uhalte E, Wilkinson JM, Southam L and Zeggini E

    Human Genetics and Cellular Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Osteoarthritis is a common, complex disease with no curative therapy. In this review, we summarize current knowledge on disease aetiopathogenesis and outline genetics and genomics approaches that are helping catalyse a much-needed improved understanding of the biological underpinning of disease development and progression.

    Human molecular genetics 2017;26;R2;R193-R201

  • Culture adaptation of malaria parasites selects for convergent loss-of-function mutants.

    Claessens A, Affara M, Assefa SA, Kwiatkowski DP and Conway DJ

    London School of Hygiene and Tropical Medicine, London, UK.

    Cultured human pathogens may differ significantly from source populations. To investigate the genetic basis of laboratory adaptation in malaria parasites, clinical Plasmodium falciparum isolates were sampled from patients and cultured in vitro for up to three months. Genome sequence analysis was performed on multiple culture time point samples from six monoclonal isolates, and single nucleotide polymorphism (SNP) variants emerging over time were detected. Out of a total of five positively selected SNPs, four represented nonsense mutations resulting in stop codons, three of these in a single ApiAP2 transcription factor gene, and one in SRPK1. To survey further for nonsense mutants associated with culture, genome sequences of eleven long-term laboratory-adapted parasite strains were examined, revealing four independently acquired nonsense mutations in two other ApiAP2 genes, and five in Epac. No mutants of these genes exist in a large database of parasite sequences from uncultured clinical samples. This implicates putative master regulator genes in which multiple independent stop codon mutations have convergently led to culture adaptation, affecting most laboratory lines of P. falciparum. Understanding the adaptive processes should guide development of experimental models, which could include targeted gene disruption to adapt fastidious malaria parasite species to culture.

    Scientific reports 2017;7;41303

  • Genome-wide base-resolution mapping of DNA methylation in single cells using single-cell bisulfite sequencing (scBS-seq).

    Clark SJ, Smallwood SA, Lee HJ, Krueger F, Reik W and Kelsey G

    Epigenetics Programme, Babraham Institute, Cambridge, UK.

    DNA methylation (DNAme) is an important epigenetic mark in diverse species. Our current understanding of DNAme is based on measurements from bulk cell samples, which obscures intercellular differences and prevents analyses of rare cell types. Thus, the ability to measure DNAme in single cells has the potential to make important contributions to the understanding of several key biological processes, such as embryonic development, disease progression and aging. We have recently reported a method for generating genome-wide DNAme maps from single cells, using single-cell bisulfite sequencing (scBS-seq), allowing the quantitative measurement of DNAme at up to 50% of CpG dinucleotides throughout the mouse genome. Here we present a detailed protocol for scBS-seq that includes our most recent developments to optimize recovery of CpGs, mapping efficiency and success rate; reduce hands-on time; and increase sample throughput with the option of using an automated liquid handler. We provide step-by-step instructions for each stage of the method, comprising cell lysis and bisulfite (BS) conversion, preamplification and adaptor tagging, library amplification, sequencing and, lastly, alignment and methylation calling. An individual with relevant molecular biology expertise can complete library preparation within 3 d. Subsequent computational steps require 1-3 d for someone with bioinformatics expertise.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: MR/K011332/1; Wellcome Trust

    Nature protocols 2017;12;3;534-547

  • Characterisation of the opposing effects of G6PD deficiency on cerebral malaria and severe malarial anaemia.

    Clarke GM, Rockett K, Kivinen K, Hubbart C, Jeffreys AE, Rowlands K, Jallow M, Conway DJ, Bojang KA, Pinder M, Usen S, Sisay-Joof F, Sirugo G, Toure O, Thera MA, Konate S, Sissoko S, Niangaly A, Poudiougou B, Mangano VD, Bougouma EC, Sirima SB, Modiano D, Amenga-Etego LN, Ghansah A, Koram KA, Wilson MD, Enimil A, Evans J, Amodu OK, Olaniyan S, Apinjoh T, Mugri R, Ndi A, Ndila CM, Uyoga S, Macharia A, Peshu N, Williams TN, Manjurano A, Sepúlveda N, Clark TG, Riley E, Drakeley C, Reyburn H, Nyirongo V, Kachala D, Molyneux M, Dunstan SJ, Phu NH, Quyen NN, Thai CQ, Hien TT, Manning L, Laman M, Siba P, Karunajeewa H, Allen S, Allen A, Davis TM, Michon P, Mueller I, Molloy SF, Campino S, Kerasidou A, Cornelius VJ, Hart L, Shah SS, Band G, Spencer CC, Agbenyega T, Achidi E, Doumbo OK, Farrar J, Marsh K, Taylor T, Kwiatkowski DP and MalariaGEN Consortium

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Glucose-6-phosphate dehydrogenase (G6PD) deficiency is believed to confer protection against Plasmodium falciparum malaria, but the precise nature of the protective effect has proved difficult to define, as G6PD deficiency has multiple allelic variants with different effects in males and females, and it has heterogeneous effects on the clinical outcome of P. falciparum infection. Here we report an analysis of multiple allelic forms of G6PD deficiency in a large multi-centre case-control study of severe malaria, using the WHO classification of G6PD mutations to estimate each individual's level of enzyme activity from their genotype. Aggregated across all genotypes, we find that increasing levels of G6PD deficiency are associated with decreasing risk of cerebral malaria, but with increased risk of severe malarial anaemia. Models of balancing selection based on these findings indicate that an evolutionary trade-off between different clinical outcomes of P. falciparum infection could have been a major cause of the high levels of G6PD polymorphism seen in human populations.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9, MC_UP_A900_1118, MR/M006212/1; Wellcome Trust

    eLife 2017;6

  • Longitudinal genomic surveillance of MRSA in the UK reveals transmission patterns in hospitals and the community.

    Coll F, Harrison EM, Toleman MS, Reuter S, Raven KE, Blane B, Palmer B, Kappeler ARM, Brown NM, Török ME, Parkhill J and Peacock SJ

    London School of Hygiene and Tropical Medicine, London, UK.

    Genome sequencing has provided snapshots of the transmission of methicillin-resistant Staphylococcus aureus (MRSA) during suspected outbreaks in isolated hospital wards. Scale-up to populations is now required to establish the full potential of this technology for surveillance. We prospectively identified all individuals over a 12-month period who had at least one MRSA-positive sample processed by a routine diagnostic microbiology laboratory in the East of England, which received samples from three hospitals and 75 general practitioner (GP) practices. We sequenced at least one MRSA isolate from each of 1465 individuals (2282 MRSA isolates in total) and recorded epidemiological data. An integrated epidemiological and phylogenetic analysis revealed 173 transmission clusters containing between 2 and 44 cases and involving 598 people (40.8%). Of these, 118 clusters (371 people) involved hospital contacts alone, 27 clusters (72 people) involved community contacts alone, and 28 clusters (157 people) had both types of contact. Community- and hospital-associated MRSA lineages were equally capable of transmission in the community, with instances of spread in households, long-term care facilities, and GP practices. Our study provides a comprehensive picture of MRSA transmission in a sampled population of 1465 people and suggests the need to review existing infection control policy and practice.

    Funded by: Medical Research Council: G1000803, MR/N029399/1; Wellcome Trust: 098051, 201344

    Science translational medicine 2017;9;413

  • Global, site-specific analysis of neuronal protein S-acylation.

    Collins MO, Woodley KT and Choudhary JS

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    Protein S-acylation (palmitoylation) is a reversible lipid modification that is an important regulator of dynamic membrane-protein interactions. Proteomic approaches have uncovered many putative palmitoylated proteins; however, methods for comprehensive palmitoylation site characterization are lacking. We demonstrate a quantitative site-specific Acyl-Biotin Exchange (ssABE) method that allowed the identification of 906 putative palmitoylation sites on 641 proteins from mouse forebrain. Of these sites, 62% map to known palmitoylated proteins, and 102 individual palmitoylation sites are known from the literature. 54% of palmitoylation sites map to synaptic proteins, including many GPCRs, receptors/ion channels and peripheral membrane proteins. Phosphorylation sites were also identified on a subset of peptides that were palmitoylated, demonstrating for the first time co-identification of these modifications by mass spectrometry. Palmitoylation sites were identified on over half of the family of palmitoyl-acyltransferases (PATs) that mediate protein palmitoylation, including active site thioester-linked palmitoyl intermediates. Distinct palmitoylation motifs and site topology were identified for integral membrane and soluble proteins, indicating potential differences in associated PAT specificity and palmitoylation function. ssABE allows the global identification of palmitoylation sites as well as measurement of the active site modification state of PATs, enabling palmitoylation to be studied at a systems level.

    Funded by: Wellcome Trust

    Scientific reports 2017;7;1;4683

  • The Driver Mutational Landscape of Ovarian Squamous Cell Carcinomas Arising in Mature Cystic Teratoma.

    Cooke SL, Ennis D, Evers L, Dowson S, Chan MY, Paul J, Hirschowitz L, Glasspool RM, Singh N, Bell S, Day E, Kochman A, Wilkinson N, Beer P, Martin S, Millan D, Biankin AV, McNeish IA and Scottish Genomes Partnership

    Institute of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom.

    Purpose: We sought to identify the genomic abnormalities in squamous cell carcinomas (SCC) arising in ovarian mature cystic teratoma (MCT), a rare gynecological malignancy with a poor prognosis.

    Experimental design: We performed copy number, mutational state, and zygosity analysis of 151 genes in SCC arising in MCT (n = 25) using next-generation sequencing. The presence of high-/intermediate-risk HPV genotypes was assessed by quantitative PCR. Genomic events were correlated with clinical features and outcome.

    Results: MCT had a low mutation burden, with a mean of only one mutation per case. Zygosity analyses of MCT indicated four separate patterns, suggesting that MCT can arise from errors at various stages of oogenesis. A total of 244 abnormalities were identified in 79 genes in MCT-associated SCC, and the overall mutational burden was high (mean 10.2 mutations per megabase). No SCC was positive for HPV. The most frequently altered genes in SCC were TP53 (20/25 cases, 80%), PIK3CA (13/25 cases, 52%), and CDKN2A (11/25 cases, 44%). Mutation in TP53 was associated with improved overall survival. In 8 of the 20 cases with TP53 mutations, two or more variants were identified, and these were bi-allelic.

    Conclusions: Ovarian SCC arising in MCT has a high mutational burden, with TP53 mutation the most common abnormality. The presence of TP53 mutation is a good prognostic factor. SCC arising in MCT share similar mutation profiles to other SCC. Given their rarity, they should be included in basket studies that recruit patients with SCC of other organs. Clin Cancer Res; 23(24); 7633-40. ©2017 AACR.

    Funded by: Chief Scientist Office: SGP/1; Medical Research Council: G0501974, G0601891, MC_PC_15080, MR/N005813/1

    Clinical cancer research : an official journal of the American Association for Cancer Research 2017;23;24;7633-7640

  • Frequency-dependent selection in vaccine-associated pneumococcal population dynamics.

    Corander J, Fraser C, Gutmann MU, Arnold B, Hanage WP, Bentley SD, Lipsitch M and Croucher NJ

    Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014, Helsinki, Finland.

    Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions.

    Funded by: NIAID NIH HHS: R01 AI048935, R01 AI106786; Wellcome Trust

    Nature ecology & evolution 2017;1;12;1950-1960

  • The global distribution and diversity of protein vaccine candidate antigens in the highly virulent Streptococcus pneumoniae serotype 1.

    Cornick JE, Tastan Bishop Ö, Yalcin F, Kiran AM, Kumwenda B, Chaguza C, Govindpershad S, Ousmane S, Senghore M, du Plessis M, Pluschke G, Ebruke C, McGee L, Sigaùque B, Collard JM, Bentley SD, Kadioglu A, Antonio M, von Gottberg A, French N, Klugman KP, Heyderman RS, Alderson M, Everett DB and PAGe consortium

    Malawi-Liverpool Wellcome Trust Clinical Research Programme, Queen Elizabeth Central Hospital, Blantyre, Malawi; Clinical Infection, Microbiology and Immunology, Institute of Infection and Global Health, University of Liverpool, Liverpool L69 7BE, UK. Electronic address:

    Serotype 1 is one of the most common causes of pneumococcal disease worldwide. Pneumococcal protein vaccines are currently being developed as an alternative intervention strategy to pneumococcal conjugate vaccines. Prerequisites for an efficacious pneumococcal protein vaccine are universal presence and minimal variation of the target antigen in the pneumococcal population, and the capability to induce a robust human immune response. We used in silico analysis to assess the prevalence of seven protein vaccine candidates (CbpA, PcpA, PhtD, PspA, SP0148, SP1912, SP2108) among 445 serotype 1 pneumococci from 26 different countries across four continents. CbpA (76%), PspA (68%), PhtD (28%) and PcpA (11%) were not universally encoded in the study population and would not provide full coverage against serotype 1. PcpA was widely present in the European (82%), but not in the African (2%), population. A multivalent vaccine incorporating CbpA, PcpA, PhtD and PspA was predicted to provide coverage against 86% of the global population. SP0148, SP1912 and SP2108 were universally encoded, and we further assessed their predicted amino acid, antigenic and structural variation. Multiple allelic variants of these proteins were identified, and different allelic variants dominated on different continents. The observed variation was predicted to affect the antigenicity and structure of two SP0148 variants, one SP1912 variant and four SP2108 variants; however, each of these variants was present in only a small fraction of the global population (<2%). The vast majority of the observed variation was predicted to have no impact on the efficaciousness of a protein vaccine incorporating a single variant of SP0148, SP1912 and/or SP2108 from S. pneumoniae TIGR4. Our findings emphasise the importance of taking geographic differences into account when designing global vaccine interventions and support the continued development of SP0148, SP1912 and SP2108 as protein vaccine candidates against this important pneumococcal serotype.

    Funded by: Medical Research Council: MC_PC_14110, MC_U190074190, MC_U190081991, MR/R003076/1; NHGRI NIH HHS: U41 HG006941; Wellcome Trust

    Vaccine 2017;35;6;972-980

  • Natural variation of Epstein-Barr virus genes, proteins and pri-miRNA (revised).

    Correia S, Palser A, Elgueta Karstegl C, Middeldorp JM, Ramayanti O, Cohen JI, Hildesheim A, Fellner MD, Wiels J, White RE, Kellam P and Farrell PJ

    Section of Virology, Imperial College Faculty of Medicine, Norfolk Place, London W2 1PG, UK.

    Viral gene sequences from an enlarged set of about 200 Epstein-Barr virus (EBV) strains, including many primary isolates, have been used to investigate variation in key viral genetic regions, particularly LMP1, Zp, gp350, EBNA1 and the BART miRNA cluster 2. Determination of type 1 and type 2 EBV in saliva samples from people from a wide range of geographic and ethnic backgrounds demonstrates a small percentage of healthy white Caucasian British people carrying predominantly type 2 EBV. Linkage of Zp and gp350 variants to type 2 EBV is likely to be due to their genes being adjacent to the EBNA3 locus, which is one of the major determinants of the type 1/type 2 distinction. A novel classification of EBNA1 DNA binding domains named QCIGP results from phylogenetic analysis of their protein sequences but is not linked to the type 1/type 2 classification. The BART cluster 2 miRNA region is classified into three major variants through SNPs in the pri-miRNA outside of the mature miRNA sequences. These SNPs can result in altered levels of expression of some miRNAs from the BART variant frequently present in Chinese and Indonesian nasopharyngeal carcinoma (NPC) samples. The EBV genetic variants identified here provide a basis for more directed future analysis of the association of specific EBV variation with EBV biology and EBV-associated diseases.

    Importance: The incidence of diseases associated with EBV varies greatly in different parts of the world. Relationships between EBV genome sequence variation and the health, disease, geography and ethnicity of the host may thus be important for understanding the role of EBV in diseases and for the development of an effective EBV vaccine. This paper provides the most comprehensive analysis so far of variation in specific EBV genes relevant to these diseases and proposed EBV vaccines. By focussing on variation in LMP1, Zp, gp350, EBNA1 and the BART miRNA cluster 2, new relationships to the known type 1/type 2 strains are demonstrated and novel classification of EBNA1 and the BART miRNAs is proposed.

    Journal of virology 2017

  • The Expanding World of Human Leishmaniasis.

    Cotton JA

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambs, CB10 1SA, UK. Electronic address:

    New Leishmania isolates form a novel group of human parasites related to Leishmania enrietti, with cases in Ghana, Thailand, and Martinique; other relatives infect Australian and South American wildlife. These parasites apparently cause both cutaneous and visceral disease, and may have evolved a novel transmission mechanism exploiting blood-feeding midges.

    Funded by: Wellcome Trust

    Trends in parasitology 2017;33;5;341-344

  • The genome of Leishmania adleri from a mammalian host highlights chromosome fission in Sauroleishmania.

    Coughlan S, Mulhair P, Sanders M, Schonian G, Cotton JA and Downing T

    School of Mathematics, Applied Mathematics and Statistics, National University of Ireland, Galway, Ireland.

    Control of pathogens arising from humans, livestock and wild animals can be enhanced by genome-based investigation. Phylogenetic classification and optimal assembly of these genomes from short sequence reads are key to this process. We examined the mammal-infecting unicellular parasite Leishmania adleri, which belongs to the lizard-infecting Sauroleishmania subgenus. L. adleri has been associated with cutaneous disease in humans but can be asymptomatic in wild animals. We sequenced, assembled and investigated the L. adleri genome isolated from an asymptomatic Ethiopian rodent (MARV/ET/75/HO174) and verified it as L. adleri by comparison with other Sauroleishmania species. Chromosome-level scaffolding was achieved by combining reference-guided with de novo assembly, followed by extensive improvement steps to produce a final draft genome with contiguity comparable with other references. L. tarentolae and L. major genome annotations were transferred, and these gene models were manually verified and improved. This first high-quality draft Leishmania adleri reference genome is also the first Sauroleishmania genome from a non-reptilian host. Comparison of the L. adleri HO174 genome with those of L. tarentolae Parrot-TarII and lizard-infecting L. adleri RLAT/KE/1957/SKINK-7 showed extensive gene amplifications, pervasive aneuploidy, and fission of chromosomes 30 and 36. There was little genetic differentiation between L. adleri extracted from mammals and reptiles, highlighting challenges for leishmaniasis surveillance.

    Funded by: Wellcome Trust: 098051

    Scientific reports 2017;7;43747

  • Using whole genome sequencing to investigate transmission in a multi-host system: bovine tuberculosis in New Zealand.

    Crispell J, Zadoks RN, Harris SR, Paterson B, Collins DM, de-Lisle GW, Livingstone P, Neill MA, Biek R, Lycett SJ, Kao RR and Price-Carter M

    Institute of Biodiversity, Animal Health, and Comparative Medicine, University of Glasgow, Glasgow, Scotland, G61 1QH, UK.

    Background: Bovine tuberculosis (bTB), caused by Mycobacterium bovis, is an important livestock disease raising public health and economic concerns around the world. In New Zealand, a number of wildlife species are implicated in the spread and persistence of bTB in cattle populations, most notably the brushtail possum (Trichosurus vulpecula). Whole Genome Sequenced (WGS) M. bovis isolates sourced from infected cattle and wildlife across New Zealand were analysed. Bayesian phylogenetic analyses were conducted to estimate the substitution rate of the sampled population and investigate the role of wildlife. In addition, the utility of WGS was examined with a view to these methods being incorporated into routine bTB surveillance.

    Results: A high rate of exchange was evident between the sampled wildlife and cattle populations, but directional estimates of inter-species transmission were sensitive to the sampling strategy employed. A relatively high substitution rate was estimated; this, in combination with a strong spatial signature and good agreement with previous typing methods, endorses WGS as a typing tool.

    Conclusions: In agreement with current knowledge of bTB in New Zealand, transmission of M. bovis between cattle and wildlife was evident. Without a direction of transmission these estimates are less informative, but taken in conjunction with the low prevalence of bTB in New Zealand's cattle population, it is likely that wildlife populations are currently acting as the main bTB reservoir. Wildlife should therefore continue to be targeted if bTB is to be eradicated from New Zealand. WGS will be a considerable aid to bTB eradication by greatly improving the discriminatory power of molecular typing data. The substitution rates estimated here will be an important part of epidemiological investigations using WGS data.

    Funded by: Biotechnology and Biological Sciences Research Council

    BMC genomics 2017;18;1;180

  • Diverse evolutionary patterns of pneumococcal antigens identified by pangenome-wide immunological screening.

    Croucher NJ, Campo JJ, Le TQ, Liang X, Bentley SD, Hanage WP and Lipsitch M

    Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, United Kingdom.

    Characterizing the immune response to pneumococcal proteins is critical in understanding this bacterium's epidemiology and vaccinology. Probing a custom-designed proteome microarray with sera from 35 healthy US adults revealed a continuous distribution of IgG affinities for 2,190 potential antigens from the species-wide pangenome. Reproducibly elevated IgG binding was elicited by 208 "antibody binding targets" (ABTs), which included 109 variants of the diverse pneumococcal surface proteins A and C (PspA and PspC) and zinc metalloprotease A and B (ZmpA and ZmpB) proteins. Functional analysis found ABTs were enriched in motifs for secretion and cell surface association, with extensive representation of cell wall synthesis machinery, adhesins, transporter solute-binding proteins, and degradative enzymes. ABTs showed stronger evidence of evolving under positive selection, although this varied between functional categories, as did rates of diversification through recombination. Particularly rapid variation was observed at some immunogenic accessory loci, including a phage protein and a phase-variable glycosyltransferase ubiquitous among the diverse set of genomic islands encoding the serine-rich PsrP glycoprotein. Nevertheless, many antigens were conserved in the core genome, and strains' antigenic profiles were generally stable. No strong evidence was found for any epistasis between antigens driving population dynamics, for redundancy between functionally similar accessory ABTs, or for age stratification of antigen profiles. These results highlight the paradox of why substantial variation is observed in only a subset of epitopes. This may indicate that only some interactions between immunoglobulins and ABTs clear pneumococcal colonization, or that acquired immunity to pneumococci is an accumulation of individually weak responses to ABTs evolving under different levels of functional constraint.

    Funded by: NIAID NIH HHS: R01 AI048935, R01 AI066304; NIGMS NIH HHS: U54 GM088558; Wellcome Trust: 098051, 104169/Z/14/Z

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;3;E357-E366

  • ACTB Loss-of-Function Mutations Result in a Pleiotropic Developmental Disorder.

    Cuvertino S, Stuart HM, Chandler KE, Roberts NA, Armstrong R, Bernardini L, Bhaskar S, Callewaert B, Clayton-Smith J, Davalillo CH, Deshpande C, Devriendt K, Digilio MC, Dixit A, Edwards M, Friedman JM, Gonzalez-Meneses A, Joss S, Kerr B, Lampe AK, Langlois S, Lennon R, Loget P, Ma DYT, McGowan R, Des Medt M, O'Sullivan J, Odent S, Parker MJ, Pebrel-Richard C, Petit F, Stark Z, Stockler-Ipsiroglu S, Tinschert S, Vasudevan P, Villa O, White SM, Zahir FR, DDD Study, Woolf AS and Banka S

    Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine, and Health, The University of Manchester, M13 9PL Manchester, UK.

    ACTB encodes β-actin, an abundant cytoskeletal housekeeping protein. In humans, postulated gain-of-function missense mutations cause Baraitser-Winter syndrome (BRWS), characterized by intellectual disability, cortical malformations, coloboma, sensorineural deafness, and typical facial features. To date, the consequences of loss-of-function ACTB mutations have not been proven conclusively. We describe heterozygous ACTB deletions and nonsense and frameshift mutations in 33 individuals with developmental delay, apparent intellectual disability, increased frequency of internal organ malformations (including those of the heart and the renal tract), growth retardation, and a recognizable facial gestalt (interrupted wavy eyebrows, dense eyelashes, wide nose, wide mouth, and a prominent chin) that is distinct from characteristics of individuals with BRWS. Strikingly, this spectrum overlaps with that of several chromatin-remodeling developmental disorders. In wild-type mouse embryos, β-actin expression was prominent in the kidney, heart, and brain. ACTB mRNA expression levels in lymphoblastic lines and fibroblasts derived from affected individuals were decreased in comparison to those in control cells. Fibroblasts derived from an affected individual and ACTB siRNA knockdown in wild-type fibroblasts showed altered cell shape and migration, consistent with known roles of cytoplasmic β-actin. We also demonstrate that ACTB haploinsufficiency leads to reduced cell proliferation, altered expression of cell-cycle genes, and decreased amounts of nuclear, but not cytoplasmic, β-actin. In conclusion, we show that heterozygous loss-of-function ACTB mutations cause a distinct pleiotropic malformation syndrome with intellectual disability. Our biological studies suggest that a critically reduced amount of this protein alters cell shape, migration, proliferation, and gene expression to the detriment of brain, heart, and kidney development.

    Funded by: Medical Research Council: MR/L002744/1; Wellcome Trust

    American journal of human genetics 2017;101;6;1021-1033

  • BCFtools/csq: haplotype-aware variant consequences.

    Danecek P and McCarthy SA

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

    Motivation: Prediction of functional variant consequences is an important part of sequencing pipelines, allowing the categorization and prioritization of genetic variants for follow up analysis. However, current predictors analyze variants as isolated events, which can lead to incorrect predictions when adjacent variants alter the same codon, or when a frame-shifting indel is followed by a frame-restoring indel. Exploiting known haplotype information when making consequence predictions can resolve these issues.

    Results: BCFtools/csq is a fast program for haplotype-aware consequence calling which can take into account known phase. Consequence predictions are changed for 501 of 5019 compound variants found in the 81.7M variants in the 1000 Genomes Project data, with an average of 139 compound variants per haplotype. Predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory.

    Availability and implementation: The program is freely available for commercial and non-commercial use in the BCFtools package, which is available for download.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2017;33;13;2037-2039

  • The STRATAA study protocol: a programme to assess the burden of enteric fever in Bangladesh, Malawi and Nepal using prospective population census, passive surveillance, serological studies and healthcare utilisation surveys.

    Darton TC, Meiring JE, Tonks S, Khan MA, Khanam F, Shakya M, Thindwa D, Baker S, Basnyat B, Clemens JD, Dougan G, Dolecek C, Dunstan SJ, Gordon MA, Heyderman RS, Holt KE, Pitzer VE, Qadri F, Zaman K, Pollard AJ and STRATAA Study Consortium

    The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Introduction: Invasive infections caused by Salmonella enterica serovar Typhi and Paratyphi A are estimated to account for 12-27 million febrile illness episodes worldwide annually. Determining the true burden of typhoidal Salmonellae infections is hindered by a lack of population-based studies and adequate laboratory diagnostics. The Strategic Typhoid alliance across Africa and Asia (STRATAA) study takes a systematic approach to measuring the age-stratified burden of clinical and subclinical disease caused by typhoidal Salmonellae infections at three high-incidence urban sites in Africa and Asia. We aim to explore the natural history of Salmonella transmission in endemic settings, address key uncertainties relating to the epidemiology of enteric fever identified through mathematical models, and enable optimisation of vaccine strategies.

    Methods/design: Using census-defined denominator populations of ≥100 000 individuals at sites in Malawi, Bangladesh and Nepal, the primary outcome is to characterise the burden of enteric fever in these populations over a 24-month period. During passive surveillance, clinical and household data, and laboratory samples will be collected from febrile individuals. In parallel, healthcare utilisation and water, sanitation and hygiene surveys will be performed to characterise healthcare-seeking behaviour and assess potential routes of transmission. The rates of both undiagnosed and subclinical exposure to typhoidal Salmonellae (seroincidence), identification of chronic carriage and population seroprevalence of typhoid infection will be assessed through age-stratified serosurveys performed at each site. Secondary attack rates will be estimated among household contacts of acute enteric fever cases and possible chronic carriers.

    Ethics and dissemination: This protocol has been ethically approved by the Oxford Tropical Research Ethics Committee, the icddr,b Institutional Review Board, the Malawian National Health Sciences Research Committee and College of Medicine Research Ethics Committee and Nepal Health Research Council. The study is being conducted in accordance with the principles of the Declaration of Helsinki and Good Clinical Practice. Informed consent was obtained before study enrolment. Results will be submitted to international peer-reviewed journals and presented at international conferences.

    Trial registration number: ISRCTN 12131979.

    Ethics references: Oxford (Oxford Tropical Research Ethics Committee 39-15). Bangladesh (icddr,b Institutional Review Board PR-15119). Malawi (National Health Sciences Research Committee 15/5/1599). Nepal (Nepal Health Research Council 306/2015).

    Funded by: Wellcome Trust: 106158/Z/14/Z

    BMJ open 2017;7;6;e016283

  • No evidence for maintenance of a sympatric Heliconius species barrier by chromosomal inversions.

    Davey JW, Barker SL, Rastas PM, Pinharanda A, Martin SH, Durbin R, McMillan WO, Merrill RM and Jiggins CD

    Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, United Kingdom.

    Mechanisms that suppress recombination are known to help maintain species barriers by preventing the breakup of coadapted gene combinations. The sympatric butterfly species Heliconius melpomene and Heliconius cydno are separated by many strong barriers, but the species still hybridize infrequently in the wild, and around 40% of the genome is influenced by introgression. We tested the hypothesis that genetic barriers between the species are maintained by inversions or other mechanisms that reduce between-species recombination rate. We constructed fine-scale recombination maps for Panamanian populations of both species and their hybrids to directly measure recombination rate within and between species, and generated long sequence reads to detect inversions. We find no evidence for a systematic reduction in recombination rates in F1 hybrids, and also no evidence for inversions longer than 50 kb that might be involved in generating or maintaining species barriers. This suggests that mechanisms leading to global or local reduction in recombination do not play a significant role in the maintenance of species barriers between H. melpomene and H. cydno.

    Funded by: Wellcome Trust

    Evolution letters 2017;1;3;138-154

  • Seeding and Establishment of Legionella pneumophila in Hospitals: Implications for Genomic Investigations of Nosocomial Legionnaires' Disease.

    David S, Afshar B, Mentasti M, Ginevra C, Podglajen I, Harris SR, Chalker VJ, Jarraud S, Harrison TG and Parkhill J

    Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, UK.

    Background: Legionnaires' disease is an important cause of hospital-acquired pneumonia and is caused by infection with the bacterium Legionella. Because current typing methods often fail to resolve the infection source in possible nosocomial cases, we aimed to determine whether whole-genome sequencing (WGS) could be used to support or refute suspected links between cases and hospitals. We focused on cases involving a major nosocomial-associated strain, L. pneumophila sequence type (ST) 1.

    Methods: WGS data from 229 L. pneumophila ST1 isolates were analyzed, including 99 isolates from the water systems of 17 hospitals and 42 clinical isolates from patients with confirmed or suspected hospital-acquired infections, as well as isolates obtained from or associated with community-acquired sources of Legionnaires' disease.

    Results: Phylogenetic analysis demonstrated that all hospitals from which multiple isolates were obtained have been colonized by 1 or more distinct ST1 populations. However, deep sampling of 1 hospital also revealed the existence of substantial diversity and ward-specific microevolution within the population. Across all hospitals, suspected links with cases were supported with WGS, although the degree of support was dependent on the depth of environmental sampling and available contextual information. Finally, phylogeographic analysis revealed that hospitals have been seeded with L. pneumophila via both local and international spread of ST1.

    Conclusions: WGS can be used to support or refute suspected links between hospitals and Legionnaires' disease cases. However, deep hospital sampling is frequently required due to the potential coexistence of multiple populations, existence of substantial diversity, and similarity of hospital isolates to local populations.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2017;64;9;1251-1259

  • Dynamics and impact of homologous recombination on the evolution of Legionella pneumophila.

    David S, Sánchez-Busó L, Harris SR, Marttinen P, Rusniok C, Buchrieser C, Harrison TG and Parkhill J

    Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Legionella pneumophila is an environmental bacterium and the causative agent of Legionnaires' disease. Previous genomic studies have shown that recombination accounts for a high proportion (>96%) of diversity within several major disease-associated sequence types (STs) of L. pneumophila. This suggests that recombination represents a potentially important force shaping adaptation and virulence. Despite this, little is known about the biological effects of recombination in L. pneumophila, particularly with regard to homologous recombination (whereby genes are replaced with alternative allelic variants). Using newly available population genomic data, we have disentangled events arising from homologous and non-homologous recombination in six major disease-associated STs of L. pneumophila (subsp. pneumophila), and subsequently performed a detailed characterisation of the dynamics and impact of homologous recombination. We identified genomic "hotspots" of homologous recombination that include regions containing outer membrane proteins, the lipopolysaccharide (LPS) region and Dot/Icm effectors, which provide interesting clues to the selection pressures faced by L. pneumophila. Inference of the origin of the recombined regions showed that isolates have most frequently imported DNA from isolates belonging to their own clade, but also occasionally from other major clades of the same subspecies. This supports the hypothesis that the possibility for horizontal exchange of new adaptations between major clades of the subspecies may have been a critical factor in the recent emergence of several clinically important STs from diverse genomic backgrounds. However, acquisition of recombined regions from another subspecies, L. pneumophila subsp. fraseri, was rarely observed, suggesting the existence of a recombination barrier and/or the possibility of ongoing speciation between the two subspecies. Finally, we suggest that multi-fragment recombination may occur in L. pneumophila, whereby multiple non-contiguous segments that originate from the same molecule of donor DNA are imported into a recipient genome during a single episode of recombination.

    Funded by: Wellcome Trust

    PLoS genetics 2017;13;6;e1006855

  • A point mutation in the ion conduction pore of AMPA receptor GRIA3 causes dramatically perturbed sleep patterns as well as intellectual disability.

    Davies B, Brown LA, Cais O, Watson J, Clayton AJ, Chang VT, Biggs D, Preece C, Hernandez-Pliego P, Krohn J, Bhomra A, Twigg SRF, Rimmer A, Kanapin A, WGS500 Consortium, Sen A, Zaiwalla Z, McVean G, Foster R, Donnelly P, Taylor JC, Blair E, Nutt D, Aricescu AR, Greger IH, Peirson SN, Flint J and Martin HC

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, Oxfordshire OX3 7BN, UK.

    The discovery of genetic variants influencing sleep patterns can shed light on the physiological processes underlying sleep. As part of a large clinical sequencing project, WGS500, we sequenced a family in which the two male children had severe developmental delay and a dramatically disturbed sleep-wake cycle, with very long wake and sleep durations, reaching up to 106 h awake and 48 h asleep. The most likely causal variant identified was a novel missense variant in the X-linked GRIA3 gene, which has been implicated in intellectual disability. GRIA3 encodes GluA3, a subunit of AMPA-type ionotropic glutamate receptors (AMPARs). The mutation (A653T) falls within the highly conserved transmembrane domain of the ion channel gate, immediately adjacent to the analogous residue in the Grid2 (glutamate receptor) gene, which is mutated in the mouse neurobehavioral mutant Lurcher. In vitro, the GRIA3(A653T) mutation stabilizes the channel in a closed conformation, in contrast to Lurcher. We introduced the orthologous mutation into a mouse strain by CRISPR-Cas9 mutagenesis and found that hemizygous mutants displayed significant differences in the structure of their activity and sleep compared to wild-type littermates. Typically, mice are polyphasic, exhibiting multiple bouts of sleep, each several minutes long, within a 24-h period. The Gria3A653T mouse showed significantly fewer brief bouts of activity and sleep than the wild-types. Furthermore, Gria3A653T mice showed enhanced period lengthening under constant light compared to wild-type mice, suggesting an increased sensitivity to light. Our results suggest a role for GluA3 channel activity in the regulation of sleep behavior in both mice and humans.

    Funded by: Medical Research Council: MC_U105174197, MC_UP_1201/15, MR/L009609/1, MR/L016265/1; Wellcome Trust

    Human molecular genetics 2017;26;20;3869-3882

  • HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures.

    Davies H, Glodzik D, Morganella S, Yates LR, Staaf J, Zou X, Ramakrishna M, Martin S, Boyault S, Sieuwerts AM, Simpson PT, King TA, Raine K, Eyfjord JE, Kong G, Borg Å, Birney E, Stunnenberg HG, van de Vijver MJ, Børresen-Dale AL, Martens JW, Span PN, Lakhani SR, Vincent-Salomon A, Sotiriou C, Tutt A, Thompson AM, Van Laere S, Richardson AL, Viari A, Campbell PJ, Stratton MR and Nik-Zainal S

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Approximately 1-5% of breast cancers are attributed to inherited mutations in BRCA1 or BRCA2 and are selectively sensitive to poly(ADP-ribose) polymerase (PARP) inhibitors. In other cancer types, germline and/or somatic mutations in BRCA1 and/or BRCA2 (BRCA1/BRCA2) also confer selective sensitivity to PARP inhibitors. Thus, assays to detect BRCA1/BRCA2-deficient tumors have been sought. Recently, somatic substitution, insertion/deletion and rearrangement patterns, or 'mutational signatures', were associated with BRCA1/BRCA2 dysfunction. Herein we used a lasso logistic regression model to identify six distinguishing mutational signatures predictive of BRCA1/BRCA2 deficiency. A weighted model called HRDetect was developed to accurately detect BRCA1/BRCA2-deficient samples. HRDetect identifies BRCA1/BRCA2-deficient tumors with 98.7% sensitivity (area under the curve (AUC) = 0.98). Application of this model in a cohort of 560 individuals with breast cancer, of whom 22 were known to carry a germline BRCA1 or BRCA2 mutation, allowed us to identify an additional 22 tumors with somatic loss of BRCA1 or BRCA2 and 47 tumors with functional BRCA1/BRCA2 deficiency where no mutation was detected. We validated HRDetect on independent cohorts of breast, ovarian and pancreatic cancers and demonstrated its efficacy in alternative sequencing strategies. Integrating all of the classes of mutational signatures thus reveals a larger proportion of individuals with breast cancer harboring BRCA1/BRCA2 deficiency (up to 22%) than hitherto appreciated (∼1-5%) who could have selective therapeutic sensitivity to PARP inhibition.

    Funded by: NCI NIH HHS: P50 CA168504; Wellcome Trust

    Nature medicine 2017;23;4;517-525

  • Whole-Genome Sequencing Reveals Breast Cancers with Mismatch Repair Deficiency.

    Davies H, Morganella S, Purdie CA, Jang SJ, Borgen E, Russnes H, Glodzik D, Zou X, Viari A, Richardson AL, Børresen-Dale AL, Thompson A, Eyfjord JE, Kong G, Stratton MR and Nik-Zainal S

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Mismatch repair (MMR)-deficient cancers have been found to be highly responsive to immune therapies such as PD-1 checkpoint blockade, making their identification in patients, where they may be relatively rare, paramount for treatment decisions. In this study, we used patterns of mutagenesis known as mutational signatures, which are imprints of the mutagenic processes associated with MMR deficiency, to identify MMR-deficient breast tumors in a whole-genome sequencing dataset from a cohort of 640 patients. We identified 11 of 640 tumors as MMR deficient, but only 2 of these 11 exhibited germline mutations in MMR genes or Lynch syndrome. Two additional tumors had a substantially reduced proportion of mutations attributed to MMR deficiency, in which the predominant mutational signatures were related to APOBEC enzymatic activity. Overall, 6 of the 11 MMR-deficient cases in this cohort were confirmed genetically or epigenetically as having abrogation of MMR genes. However, IHC analysis of MMR-related proteins identified all but one of the 10 samples available for testing as MMR deficient. Thus, the mutational signatures reported MMR deficiency more faithfully than sequencing of MMR genes, because they represent a direct pathophysiologic readout of repair pathway abnormalities. As whole-genome sequencing becomes more affordable, it could be used to expose individually abnormal tumors in tissue types where MMR deficiency has rarely been detected, but also rarely sought.

    Funded by: Cancer Research UK: C60100/A23916; NCI NIH HHS: P50 CA168504; Wellcome Trust: WT101126/B/13/Z

    Cancer research 2017;77;18;4755-4762

  • Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk.

    Day FR, Thompson DJ, Helgason H, Chasman DI, Finucane H, Sulem P, Ruth KS, Whalen S, Sarkar AK, Albrecht E, Altmaier E, Amini M, Barbieri CM, Boutin T, Campbell A, Demerath E, Giri A, He C, Hottenga JJ, Karlsson R, Kolcic I, Loh PR, Lunetta KL, Mangino M, Marco B, McMahon G, Medland SE, Nolte IM, Noordam R, Nutile T, Paternoster L, Perjakova N, Porcu E, Rose LM, Schraut KE, Segrè AV, Smith AV, Stolk L, Teumer A, Andrulis IL, Bandinelli S, Beckmann MW, Benitez J, Bergmann S, Bochud M, Boerwinkle E, Bojesen SE, Bolla MK, Brand JS, Brauch H, Brenner H, Broer L, Brüning T, Buring JE, Campbell H, Catamo E, Chanock S, Chenevix-Trench G, Corre T, Couch FJ, Cousminer DL, Cox A, Crisponi L, Czene K, Davey Smith G, de Geus EJCN, de Mutsert R, De Vivo I, Dennis J, Devilee P, Dos-Santos-Silva I, Dunning AM, Eriksson JG, Fasching PA, Fernández-Rhodes L, Ferrucci L, Flesch-Janys D, Franke L, Gabrielson M, Gandin I, Giles GG, Grallert H, Gudbjartsson DF, Guénel P, Hall P, Hallberg E, Hamann U, Harris TB, Hartman CA, Heiss G, Hooning MJ, Hopper JL, Hu F, Hunter DJ, Ikram MA, Im HK, Järvelin MR, Joshi PK, Karasik D, Kellis M, Kutalik Z, LaChance G, Lambrechts D, Langenberg C, Launer LJ, Laven JSE, Lenarduzzi S, Li J, Lind PA, Lindstrom S, Liu Y, Luan J, Mägi R, Mannermaa A, Mbarek H, McCarthy MI, Meisinger C, Meitinger T, Menni C, Metspalu A, Michailidou K, Milani L, Milne RL, Montgomery GW, Mulligan AM, Nalls MA, Navarro P, Nevanlinna H, Nyholt DR, Oldehinkel AJ, O'Mara TA, Padmanabhan S, Palotie A, Pedersen N, Peters A, Peto J, Pharoah PDP, Pouta A, Radice P, Rahman I, Ring SM, Robino A, Rosendaal FR, Rudan I, Rueedi R, Ruggiero D, Sala CF, Schmidt MK, Scott RA, Shah M, Sorice R, Southey MC, Sovio U, Stampfer M, Steri M, Strauch K, Tanaka T, Tikkanen E, Timpson NJ, Traglia M, Truong T, Tyrer JP, Uitterlinden AG, Edwards DRV, Vitart V, Völker U, Vollenweider P, Wang Q, Widen E, van Dijk KW, Willemsen G, Winqvist R, Wolffenbuttel BHR, Zhao JH, Zoledziewska M, Zygmunt M, Alizadeh BZ, Boomsma DI, Ciullo M, Cucca F, Esko T, Franceschini N, Gieger C, Gudnason V, Hayward C, Kraft P, Lawlor DA, Magnusson PKE, Martin NG, Mook-Kanamori DO, Nohr EA, Polasek O, Porteous D, Price AL, Ridker PM, Snieder H, Spector TD, Stöckl D, Toniolo D, Ulivi S, Visser JA, Völzke H, Wareham NJ, Wilson JF, LifeLines Cohort Study, InterAct Consortium, kConFab/AOCS Investigators, Endometrial Cancer Association Consortium, Ovarian Cancer Association Consortium, PRACTICAL consortium, Spurdle AB, Thorsteindottir U, Pollard KS, Easton DF, Tung JY, Chang-Claude J, Hinds D, Murray A, Murabito JM, Stefansson K, Ong KK and Perry JRB

    MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of Metabolic Science, Cambridge Biomedical Campus, Cambridge, UK.

    The timing of puberty is a highly polygenic childhood trait that is epidemiologically associated with various adult diseases. Using 1000 Genomes Project-imputed genotype data in up to ∼370,000 women, we identify 389 independent signals (P < 5 × 10⁻⁸) for age at menarche, a milestone in female pubertal development. In Icelandic data, these signals explain ∼7.4% of the population variance in age at menarche, corresponding to ∼25% of the estimated heritability. We implicate ∼250 genes via coding variation or associated expression, demonstrating significant enrichment in neural tissues. Rare variants near the imprinted genes MKRN3 and DLK1 were identified, exhibiting large effects when paternally inherited. Mendelian randomization analyses suggest causal inverse associations, independent of body mass index (BMI), between puberty timing and risks for breast and endometrial cancers in women and prostate cancer in men. In aggregate, our findings highlight the complexity of the genetic regulation of puberty timing and support causal links with cancer susceptibility.

    Funded by: British Heart Foundation: SP/07/008/24066; Cancer Research UK: 10124, 14136; Medical Research Council: G0401527, G1000143, G1001357, MC_UU_12013/1, MC_UU_12013/3, MC_UU_12013/5, MC_UU_12015/1, MC_UU_12015/2, MR/J012165/1; NCI NIH HHS: R01 CA192393, UM1 CA182913; NHLBI NIH HHS: T32 HL007055; NIA NIH HHS: R56 AG029451, Z01 AG006000-01; NICHD NIH HHS: P2C HD050924

    Nature genetics 2017;49;6;834-841

  • Cytogenetic Resources and Information.

    De Braekeleer E, Huret JL, Mossafa H and Dessen P

    Haematological Cancer Genetics & Stem Cell Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    The main databases devoted stricto sensu to cancer cytogenetics are the "Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer", the "Atlas of Genetics and Cytogenetics in Oncology and Haematology", and COSMIC. However, because cancer is a complex multistep process, cancer cytogenetics has broadened into "cytogenomics", with complementary resources including general databases (nucleic acid and protein sequence databases; cartography browsers: GenBank, RefSeq, UCSC, Ensembl, UniProtKB, and Entrez Gene), cancer genomic portals associated with recent international integrated programs such as TCGA or ICGC, other fusion gene databases, array CGH databases, copy number variation databases, and mutation databases. Other resources such as the International System for Human Cytogenomic Nomenclature (ISCN), the International Classification of Diseases for Oncology (ICD-O), and the Human Gene Nomenclature Database (HGNC) provide a common language. Data within the scientific/medical community should be freely available. However, most institutional stakeholders are now gradually disengaging, and well-known databases are forced to beg or to disappear (which may happen!).

    Methods in molecular biology (Clifton, N.J.) 2017;1541;311-331

  • Prognostic impact of p15 gene aberrations in acute leukemia.

    De Braekeleer M, Douet-Guilbert N and De Braekeleer E

    Laboratoire d'Histologie, Embryologie et Cytogénétique, Faculté de Médecine et des Sciences de la Santé, Université de Brest, Brest, France.

    The p15 gene (also known as CDKN2B, INK4B, p15INK4B), located in band 9p21, encodes a protein that induces G1-phase cell cycle arrest through inhibition of CDK4/6 (cyclin-dependent kinase 4/6). It also plays an important role in regulating the cellular commitment of hematopoietic progenitor cells and in myeloid cell differentiation. p15 can be silenced by several mechanisms, including deletion and hypermethylation of its promoter. Homozygous p15 deletion is rare in acute myeloblastic leukemia (AML) and myelodysplastic syndromes (MDS) but frequent in acute lymphoblastic leukemia (ALL). Conversely, methylation of the p15 promoter is identified in some 50% of patients with AML and MDS, but is less frequent in ALL. Analysis of the 28 studies available in the literature revealed conflicting results (unfavorable, favorable or no impact) that may be due, at least in part, to methodological and/or biological pitfalls. Among these are the heterogeneity of the methylation patterns of the p15 gene and the lack of a comprehensive analysis including transcriptional and translational inactivation, which have a major impact on its expression. Therefore, detection of p15 mRNA expression (quantitative or not) may represent a more appropriate method to determine the prognostic impact of the p15 gene.

    Leukemia & lymphoma 2017;58;2;257-265

  • A single-copy Sleeping Beauty transposon mutagenesis screen identifies new PTEN-cooperating tumor suppressor genes.

    de la Rosa J, Weber J, Friedrich MJ, Li Y, Rad L, Ponstingl H, Liang Q, de Quirós SB, Noorani I, Metzakopian E, Strong A, Li MA, Astudillo A, Fernández-García MT, Fernández-García MS, Hoffman GJ, Fuente R, Vassiliou GS, Rad R, López-Otín C, Bradley A and Cadiñanos J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    The overwhelming number of genetic alterations identified through cancer genome sequencing requires complementary approaches to interpret their significance and interactions. Here we developed a novel whole-body insertional mutagenesis screen in mice, which was designed for the discovery of Pten-cooperating tumor suppressors. Toward this aim, we coupled mobilization of a single-copy inactivating Sleeping Beauty transposon to Pten disruption within the same genome. The analysis of 278 transposition-induced prostate, breast and skin tumors detected tissue-specific and shared data sets of known and candidate genes involved in cancer. We validated ZBTB20, CELF2, PARD3, AKAP13 and WAC, which were identified by our screens in multiple cancer types, as new tumor suppressor genes in prostate cancer. We demonstrated their synergy with PTEN in preventing invasion in vitro and confirmed their clinical relevance. Further characterization of Wac in vivo showed obligate haploinsufficiency for this gene (which encodes an autophagy-regulating factor) in a Pten-deficient context. Our study identified complex PTEN-cooperating tumor suppressor networks in different cancer types, with potential clinical implications.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Nature genetics 2017;49;5;730-741

  • Disentangling PTEN-cooperating tumor suppressor gene networks in cancer.

    de la Rosa J, Weber J, Rad R, Bradley A and Cadiñanos J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    We have recently performed a whole-body, genome-wide screen in mice using a single-copy inactivating transposon for the identification of Pten (phosphatase and tensin homolog)-cooperating tumor suppressor genes (TSGs). We identified known and putative TSGs in multiple cancer types and validated the functional and clinical relevance of several promising candidates for human prostate cancer.

    Molecular & cellular oncology 2017;4;4;e1325550

  • Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease.

    de Lange KM, Moutsianas L, Lee JC, Lamb CA, Luo Y, Kennedy NA, Jostins L, Rice DL, Gutierrez-Achury J, Ji SG, Heap G, Nimmo ER, Edwards C, Henderson P, Mowat C, Sanderson J, Satsangi J, Simmons A, Wilson DC, Tremelling M, Hart A, Mathew CG, Newman WG, Parkes M, Lees CW, Uhlig H, Hawkey C, Prescott NJ, Ahmad T, Mansfield JC, Anderson CA and Barrett JC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Genetic association studies have identified 215 risk loci for inflammatory bowel disease, thereby uncovering fundamental aspects of its molecular biology. We performed a genome-wide association study of 25,305 individuals and conducted a meta-analysis with published summary statistics, yielding a total sample size of 59,957 subjects. We identified 25 new susceptibility loci, 3 of which contain integrin genes that encode proteins in pathways that have been identified as important therapeutic targets in inflammatory bowel disease. The associated variants are correlated with expression changes in response to immune stimulus at two of these genes (ITGA4 and ITGB8) and at previously implicated loci (ITGAL and ICAM1). In all four cases, the expression-increasing allele also increases disease risk. We also identified likely causal missense variants in a gene implicated in primary immune deficiency, PLCG2, and a negative regulator of inflammation, SLAMF8. Our results demonstrate that new associations at common variants continue to identify genes relevant to therapeutic target identification and prioritization.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Department of Health: NIHR-RP-R3-12-026; Medical Research Council: G0600329, G0800675, G0800759, MC_UU_00008/7, MC_UU_12010/7, MR/J00314X/1, MR/M00533X/1, MR/N01104X/1, MR/N01104X/2; Wellcome Trust

    Nature genetics 2017;49;2;256-261

  • Identifying transposon insertions and their effects from RNA-sequencing data.

    de Ruiter JR, Kas SM, Schut E, Adams DJ, Koudijs MJ, Wessels LFA and Jonkers J

    Division of Molecular Pathology and Cancer Genomics Netherlands, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066 CX, The Netherlands.

    Insertional mutagenesis using engineered transposons is a potent forward genetic screening technique used to identify cancer genes in mouse model systems. In the analysis of these screens, transposon insertion sites are typically identified by targeted DNA-sequencing and subsequently assigned to predicted target genes using heuristics. As such, these approaches provide no direct evidence that insertions actually affect their predicted targets or how transcripts of these genes are affected. To address this, we developed IM-Fusion, an approach that identifies insertion sites from gene-transposon fusions in standard single- and paired-end RNA-sequencing data. We demonstrate IM-Fusion on two separate transposon screens of 123 mammary tumors and 20 B-cell acute lymphoblastic leukemias, respectively. We show that IM-Fusion accurately identifies transposon insertions and their true target genes. Furthermore, by combining the identified insertion sites with expression quantification, we show that we can determine the effect of a transposon insertion on its target gene(s) and prioritize insertions that have a significant effect on expression. We expect that IM-Fusion will significantly enhance the accuracy of cancer gene discovery in forward genetic screens and provide initial insight into the biological effects of insertions on candidate cancer genes.

    Funded by: Cancer Research UK: 13031; European Research Council: 319661

    Nucleic acids research 2017;45;12;7064-7077

  • Rapid establishment of the European Bank for induced Pluripotent Stem Cells (EBiSC) - the Hot Start experience.

    De Sousa PA, Steeg R, Wachter E, Bruce K, King J, Hoeve M, Khadun S, McConnachie G, Holder J, Kurtz A, Seltmann S, Dewender J, Reimann S, Stacey G, O'Shea O, Chapman C, Healy L, Zimmermann H, Bolton B, Rawat T, Atkin I, Veiga A, Kuebler B, Serano BM, Saric T, Hescheler J, Brüstle O, Peitz M, Thiele C, Geijsen N, Holst B, Clausen C, Lako M, Armstrong L, Gupta SK, Kvist AJ, Hicks R, Jonebring A, Brolén G, Ebneth A, Cabrera-Socorro A, Foerch P, Geraerts M, Stummann TC, Harmon S, George C, Streeter I, Clarke L, Parkinson H, Harrison PW, Faulconbridge A, Cherubin L, Burdett T, Trigueros C, Patel MJ, Lucas C, Hardy B, Predan R, Dokler J, Brajnik M, Keminer O, Pless O, Gribbon P, Claussen C, Ringwald A, Kreisel B, Courtney A and Allsopp TE

    Centre for Clinical Brain Sciences, Chancellors Building, 49 Little France Crescent, University of Edinburgh, Edinburgh EH16 4SB, UK; Roslin Cells Ltd, Head office, Nine Edinburgh Bioquarter, 9 Little France Rd, Edinburgh EH16 4UX, UK; EBiSC banking facility, Babraham Research Campus, B260 Meditrina, Cambridge CB22 3AT, UK.

    A fast track "Hot Start" process was implemented to launch the European Bank for Induced Pluripotent Stem Cells (EBiSC) to provide early release of a range of established control and disease-linked human induced pluripotent stem cell (hiPSC) lines. Established practice among consortium members was surveyed to arrive at harmonised and publicly accessible Standard Operating Procedures (SOPs) for tissue procurement, bio-sample tracking, iPSC expansion, cryopreservation, qualification and distribution to the research community. These were implemented to create a quality-managed foundational collection of lines and associated data made available for distribution. Here we report on the successful outcome of this experience and the workflow for banking and facilitating access to an otherwise disparate European resource, with lessons to benefit the international research community. ETOC: The report focuses on the EBiSC experience of rapidly establishing an operational capacity to procure, bank and distribute a foundational collection of established hiPSC lines. It validates the feasibility and defines the challenges of harnessing and integrating the capability and productivity of centres across Europe using commonly available resources currently in the field.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E012841/1, BBS/B/14779; Medical Research Council: G0301182; NHGRI NIH HHS: P41 HG003619

    Stem cell research 2017;20;105-114

  • Phase-variable methylation and epigenetic regulation by type I restriction-modification systems.

    De Ste Croix M, Vacca I, Kwun MJ, Ralph JD, Bentley SD, Haigh R, Croucher NJ and Oggioni MR

    Department of Genetics, University of Leicester, Leicester LE1 7RH, UK.

    Epigenetic modifications in bacteria, such as DNA methylation, have been shown to affect gene regulation, thereby generating cells that are isogenic but have distinctly different phenotypes. Restriction-modification (RM) systems contain prototypic methylases that are responsible for much of bacterial DNA methylation. This review focuses on a distinctive group of type I RM loci that, through phase variation, can modify their methylation target specificity and can thereby switch bacteria between alternative patterns of DNA methylation. Phase variation occurs at the level of the target recognition domains of the hsdS (specificity) gene via reversible recombination processes acting upon multiple hsdS alleles. We describe the global distribution of such loci throughout the prokaryotic kingdom and highlight the differences in loci structure across the various bacterial species. Although RM systems are often considered simply as an evolutionary response to bacteriophages, these multi-hsdS type I systems have also shown the capacity to change bacterial phenotypes. The ability of these RM systems to allow bacteria to reversibly switch between different physiological states, combined with the existence of such loci across many species of medical and industrial importance, highlights the potential of phase-variable DNA methylation to act as a global regulatory mechanism in bacteria.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/N002903/1; Medical Research Council: MR/M003078/1

    FEMS microbiology reviews 2017;41;Supp_1;S3-S15

  • Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study.

    de Vries PS, Sabater-Lleal M, Chasman DI, Trompet S, Ahluwalia TS, Teumer A, Kleber ME, Chen MH, Wang JJ, Attia JR, Marioni RE, Steri M, Weng LC, Pool R, Grossmann V, Brody JA, Venturini C, Tanaka T, Rose LM, Oldmeadow C, Mazur J, Basu S, Frånberg M, Yang Q, Ligthart S, Hottenga JJ, Rumley A, Mulas A, de Craen AJ, Grotevendt A, Taylor KD, Delgado GE, Kifley A, Lopez LM, Berentzen TL, Mangino M, Bandinelli S, Morrison AC, Hamsten A, Tofler G, de Maat MP, Draisma HH, Lowe GD, Zoledziewska M, Sattar N, Lackner KJ, Völker U, McKnight B, Huang J, Holliday EG, McEvoy MA, Starr JM, Hysi PG, Hernandez DG, Guan W, Rivadeneira F, McArdle WL, Slagboom PE, Zeller T, Psaty BM, Uitterlinden AG, de Geus EJ, Stott DJ, Binder H, Hofman A, Franco OH, Rotter JI, Ferrucci L, Spector TD, Deary IJ, März W, Greinacher A, Wild PS, Cucca F, Boomsma DI, Watkins H, Tang W, Ridker PM, Jukema JW, Scott RJ, Mitchell P, Hansen T, O'Donnell CJ, Smith NL, Strachan DP and Dehghan A

    Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands.

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10^-8 is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10^-8), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Medical Research Council: MR/K026992/1; NCATS NIH HHS: UL1 TR000124; NCI NIH HHS: R01 CA047988, UM1 CA182913; NCRR NIH HHS: UL1 RR025005; NHGRI NIH HHS: U01 HG004402; NHLBI NIH HHS: HHSN268200800007C, HHSN268201100005C, HHSN268201100005G, HHSN268201100005I, HHSN268201100006C, HHSN268201100007C, HHSN268201100007I, HHSN268201100008C, HHSN268201100008I, HHSN268201100009C, HHSN268201100009I, HHSN268201100010C, HHSN268201100011C, HHSN268201100011I, HHSN268201100012C, HHSN268201200036C, N01HC25195, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086, R01 HL043851, R01 HL059367, R01 HL080467, R01 HL086694, R01 HL087641, R01 HL087652, R01 HL103612, R01 HL105756, R01 HL120393, U01 HL080295; NIA NIH HHS: R01 AG023629, R01 AG033193; NIAMS NIH HHS: F32 AR059469; NIDDK NIH HHS: K24 DK080140, P30 DK063491, U01 DK062418; NIMH NIH HHS: R01 MH081802, RC2 MH089951, U24 MH068457; NIMHD NIH HHS: R01 MD009164; NINDS NIH HHS: R01 NS017950; Wellcome Trust

    PloS one 2017;12;1;e0167742

  • Prevalence and architecture of de novo mutations in developmental disorders.

    Deciphering Developmental Disorders Study

    The genomes of individuals with severe, undiagnosed developmental disorders are enriched in damaging de novo mutations (DNMs) in developmentally important genes. Here we have sequenced the exomes of 4,293 families containing individuals with developmental disorders, and meta-analysed these data with data from another 3,287 individuals with similar disorders. We show that the most important factors influencing the diagnostic yield of DNMs are the sex of the affected individual, the relatedness of their parents, whether close relatives are affected and the parental ages. We identified 94 genes enriched in damaging DNMs, including 14 that previously lacked compelling evidence of involvement in developmental disorders. We have also characterized the phenotypic diversity among these disorders. We estimate that 42% of our cohort carry pathogenic DNMs in coding sequences; approximately half of these DNMs disrupt gene function and the remainder result in altered protein function. We estimate that developmental disorders caused by DNMs have an average prevalence of 1 in 213 to 1 in 448 births, depending on parental age. Given current global demographics, this equates to almost 400,000 children born per year.

    Funded by: Medical Research Council: G0800674, MC_PC_U127561093, MR/M014568/1; Wellcome Trust; Wellcome Trust Sanger Institute: WT098051

    Nature 2017;542;7642;433-438

  • Environmental DNA metabarcoding: Transforming how we survey animal and plant communities.

    Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, Creer S, Bista I, Lodge DM, de Vere N, Pfrender ME and Bernatchez L

    Atkinson Center for a Sustainable Future, Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA.

    The genomic revolution has fundamentally changed how we survey biodiversity on earth. High-throughput sequencing ("HTS") platforms now enable the rapid sequencing of DNA from diverse kinds of environmental samples (termed "environmental DNA" or "eDNA"). Coupling HTS with our ability to associate sequences from eDNA with a taxonomic name is called "eDNA metabarcoding" and offers a powerful molecular tool capable of noninvasively surveying species richness from many ecosystems. Here, we review the use of eDNA metabarcoding for surveying animal and plant richness, and the challenges in using eDNA approaches to estimate relative abundance. We highlight eDNA applications in freshwater, marine and terrestrial environments, and in this broad context, we distill what is known about the ability of different eDNA sample types to approximate richness in space and across time. We provide guiding questions for study design and discuss the eDNA metabarcoding workflow with a focus on primers and library preparation methods. We additionally discuss important criteria for consideration of bioinformatic filtering of data sets, with recommendations for increasing transparency. Finally, looking to the future, we discuss emerging applications of eDNA metabarcoding in ecology, conservation, invasion biology, biomonitoring, and how eDNA metabarcoding can empower citizen science and biodiversity education.

    Molecular ecology 2017;26;21;5872-5895

  • Principles of Reconstructing the Subclonal Architecture of Cancers.

    Dentro SC, Wedge DC and Van Loo P

    Wellcome Trust Sanger Institute, Cambridge CB10 1HH, United Kingdom.

    Most cancers evolve from a single founder cell through a series of clonal expansions that are driven by somatic mutations. These clonal expansions can lead to several coexisting subclones sharing subsets of mutations. Analysis of massively parallel sequencing data can infer a tumor's subclonal composition through the identification of populations of cells with shared mutations. We describe the principles that underlie subclonal reconstruction through single nucleotide variants (SNVs) or copy number alterations (CNAs) from bulk or single-cell sequencing. These principles include estimating the fraction of tumor cells carrying each SNV and CNA, clustering SNVs in single-sample and multisample cases, and analyzing single-cell sequencing data. The application of subclonal reconstruction methods is providing key insights into tumor evolution, identifying subclonal driver mutations, patterns of parallel evolution and differences in mutational signatures between cellular populations, and characterizing the mechanisms of therapy resistance, spread, and metastasis.

    Funded by: Wellcome Trust

    Cold Spring Harbor perspectives in medicine 2017;7;8

  • Bacterial microbiota of the upper respiratory tract and childhood asthma.

    Depner M, Ege MJ, Cox MJ, Dwyer S, Walker AW, Birzele LT, Genuneit J, Horak E, Braun-Fahrländer C, Danielewicz H, Maier RM, Moffatt MF, Cookson WO, Heederik D, von Mutius E and Legatzki A

    Dr von Hauner Children's Hospital, LMU Munich, Munich, Germany.

    Background: Patients with asthma and healthy controls differ in bacterial colonization of the respiratory tract. The upper airways have been shown to reflect colonization of the lower airways, the actual site of inflammation in asthma, which is hardly accessible in population studies.

    Objective: We sought to characterize the bacterial communities at 2 sites of the upper respiratory tract obtained from children from a rural area and to relate these to asthma.

    Methods: The microbiota of 327 throat and 68 nasal samples from school-age farm and nonfarm children were analyzed by 454-pyrosequencing of the bacterial 16S ribosomal RNA gene.

    Results: Alterations in nasal microbiota, but not in throat microbiota, were associated with asthma. Children with asthma had lower α- and β-diversity of the nasal microbiota as compared with healthy control children. Furthermore, asthma presence was positively associated with a specific operational taxonomic unit from the genus Moraxella in children not exposed to farming, whereas in farm children Moraxella colonization was unrelated to asthma. In nonfarm children, Moraxella colonization largely explained the association between bacterial diversity and asthma.

    Conclusions: Asthma was mainly associated with an altered nasal microbiota characterized by lower diversity and Moraxella abundance. Children living on farms might not be susceptible to the disadvantageous effect of Moraxella. Prospective studies may clarify whether Moraxella outgrowth is a cause or a consequence of loss in diversity.

    Funded by: Medical Research Council: G1000758

    The Journal of allergy and clinical immunology 2017;139;3;826-834.e13

  • The rise and fall of pneumococcal serotypes carried in the PCV era.

    Devine VT, Cleary DW, Jefferies JM, Anderson R, Morris DE, Tuck AC, Gladstone RA, O'Doherty G, Kuruparan P, Bentley SD, Faust SN and Clarke SC

    Faculty of Medicine and Institute for Life Sciences, University of Southampton, Southampton SO17 1BJ, UK.

    Streptococcus pneumoniae is a major cause of meningitis, sepsis and pneumonia worldwide. Vaccination using pneumococcal conjugate vaccines (PCV) has therefore been part of the UK's childhood immunisation programme since 2006. Here we describe pneumococcal carriage rates in children under five years of age attending the paediatric department of a large UK hospital in response to vaccine implementation over seven winter seasons from 2006 to 2013. S. pneumoniae (n=696) were isolated from nasopharyngeal swabs (n=2267) collected during seven consecutive winters, October to March, 2006/7 to 2012/13. This includes the period immediately following the introduction of the seven-valent pneumococcal conjugate vaccine (PCV7) in 2006, in addition to the periods before and after PCV13 introduction in 2010. We show a decrease in PCV13 vaccine serotypes (VT) in the three years following PCV13 vaccine implementation (2010/11 to 2012/13). Serotype 6A was the only VT observed following PCV13 implementation, with all others (including PCV7 serotypes) absent from carriage. Overall pneumococcal carriage, attributable to non-VT (NVT), was consistent across all sampling years, with a mean of 31·1%. The ten most frequently isolated NVTs were 6C, 11A, 15B, 23B, 15A, 21, 22F, 35F, 23A and 15C, although fluctuations in the prevalence of each were noted. Comparing prevalence in 2006/07 with 2012/13, only 15A was shown to have increased significantly (p value of 0·003) during the course of PCV implementation. These data support the increasing evidence that the primary effect of PCVs is population immunity, achieved by reducing or eliminating the carriage of invasive VT serotypes. With invasive pneumococcal disease (IPD) increasingly attributed to non-vaccine serotypes, carriage surveillance continues to act as an early warning system for vaccine design and public health policy, both of which require continual data on carried pneumococcal serotypes and on the serotypes causing IPD.

    Vaccine 2017;35;9;1293-1298

  • Principles guiding embryo selection following genome-wide haplotyping of preimplantation embryos.

    Dimitriadou E, Melotte C, Debrock S, Esteki MZ, Dierickx K, Voet T, Devriendt K, de Ravel T, Legius E, Peeraer K, Meuleman C and Vermeesch JR

    Department of Human Genetics, Centre for Human Genetics, University Hospitals Leuven, O&N I Herestraat 49 - box 602, KU Leuven, 3000 Leuven, Belgium.

    Study question: How to select and prioritize embryos during PGD following genome-wide haplotyping?

    Summary answer: In addition to genetic disease-specific information, the embryo selected for transfer is based on ranking criteria including the existence of mitotic and/or meiotic aneuploidies, but not carriership of mutations causing recessive disorders.

    What is known already: Embryo selection for monogenic diseases has been mainly performed using targeted disease-specific assays. Recently, these targeted approaches are being complemented by generic genome-wide genetic analysis methods such as karyomapping or haplarithmisis, which are based on genomic haplotype reconstruction of cell(s) biopsied from embryos. This provides not only information about the inheritance of Mendelian disease alleles but also about numerical and structural chromosome anomalies and haplotypes genome-wide. Reflections on how to use this information in the diagnostic laboratory are lacking.

    Study design, size, duration: We present the results of the first 101 PGD cycles (373 embryos) using haplarithmisis, performed in the Centre for Human Genetics, UZ Leuven. The questions raised were addressed by a multidisciplinary team of clinical geneticists, fertility specialists and ethicists.

    Participants/materials, setting, methods: Sixty-three couples enrolled in the genome-wide haplotyping-based PGD program. Families presented with either inherited genetic variants causing known disorders and/or chromosomal rearrangements that could lead to unbalanced translocations in the offspring.

    Main results and the role of chance: Embryos were selected based on the absence or presence of the disease allele, a trisomy or other chromosomal abnormality leading to known developmental disorders. In addition, morphologically normal Day 5 embryos were prioritized for transfer based on the presence of other chromosomal imbalances and/or carrier information.

    Limitations, reasons for caution: Some of the choices made and principles put forward are specific for cleavage-stage-based genetic testing. The proposed guidelines are subject to continuous update based on the accumulating knowledge from the implementation of genome-wide methods for PGD in many different centers world-wide as well as the results of ongoing scientific research.

    Wider implications of the findings: Our embryo selection principles have a profound impact on the organization of PGD operations and on the information that is transferred among the genetic unit, the fertility clinic and the patients. These principles are also important for the organization of pre- and post-counseling and influence the interpretation and reporting of preimplantation genotyping results. As novel genome-wide approaches for embryo selection are revolutionizing the field of reproductive genetics, national and international discussions to set general guidelines are warranted.

    Study funding/competing interest(s): The European Union's Research and Innovation funding programs FP7-PEOPLE-2012-IAPP SARM: 324509 and Horizon 2020 WIDENLIFE: 692065 to J.R.V., T.V., E.D. and M.Z.E. J.R.V., T.V. and M.Z.E. have patents ZL910050-PCT/EP2011/060211-WO/2011/157846 ('Methods for haplotyping single cells') with royalties paid and ZL913096-PCT/EP2014/068315-WO/2015/028576 ('Haplotyping and copy-number typing using polymorphic variant allelic frequencies') with royalties paid, licensed to Cartagenia (Agilent Technologies). J.R.V. also has a patent ZL912076-PCT/EP2013/070858 ('High throughput genotyping by sequencing') with royalties paid.

    Trial registration number: N/A.

    Human reproduction (Oxford, England) 2017;32;3;687-697

  • Integrated view of Vibrio cholerae in the Americas.

    Domman D, Quilici ML, Dorman MJ, Njamkepo E, Mutreja A, Mather AE, Delgado G, Morales-Espinosa R, Grimont PAD, Lizárraga-Partida ML, Bouchier C, Aanensen DM, Kuri-Morales P, Tarr CL, Dougan G, Parkhill J, Campos J, Cravioto A, Weill FX and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

    Latin America has experienced two of the largest cholera epidemics in modern history; one in 1991 and the other in 2010. However, confusion still surrounds the relationships between globally circulating pandemic Vibrio cholerae clones and local bacterial populations. We used whole-genome sequencing to characterize cholera across the Americas over a 40-year time span. We found that both epidemics were the result of intercontinental introductions of seventh pandemic El Tor V. cholerae and that at least seven lineages local to the Americas are associated with disease that differs epidemiologically from epidemic cholera. Our results consolidate historical accounts of pandemic cholera with data to show the importance of local lineages, presenting an integrated view of cholera that is important to the design of future disease control strategies.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 2017;358;6364;789-793

  • Wounding induces dedifferentiation of epidermal Gata6+ cells and acquisition of stem cell properties.

    Donati G, Rognoni E, Hiratsuka T, Liakath-Ali K, Hoste E, Kar G, Kayikci M, Russell R, Kretzschmar K, Mulder KW, Teichmann SA and Watt FM

    King's College London Centre for Stem Cells and Regenerative Medicine, 28th Floor, Tower Wing, Guy's Campus, Great Maze Pond, London SE1 9RT, UK.

    The epidermis is maintained by multiple stem cell populations whose progeny differentiate along diverse, and spatially distinct, lineages. Here we show that the transcription factor Gata6 controls the identity of the previously uncharacterized sebaceous duct (SD) lineage and identify the Gata6 downstream transcription factor network that specifies a lineage switch between sebocytes and SD cells. During wound healing, differentiated Gata6+ cells migrate from the SD into the interfollicular epidermis and dedifferentiate, acquiring the ability to undergo long-term self-renewal and differentiate into a much wider range of epidermal lineages than in undamaged tissue. Our data not only demonstrate that the structural and functional complexity of the junctional zone is regulated by Gata6, but also reveal that dedifferentiation is a previously unrecognized property of post-mitotic, terminally differentiated cells that have lost contact with the basement membrane. This resolves the long-standing debate about the contribution of terminally differentiated cells to epidermal wound repair.

    Funded by: Medical Research Council: G1100073, MC_U105185859

    Nature cell biology 2017;19;6;603-613

  • Population genetic structuring of methicillin-resistant Staphylococcus aureus clone EMRSA-15 within UK reflects patient referral patterns.

    Donker T, Reuter S, Scriberras J, Reynolds R, Brown NM, Török ME, James R, Network EOEMR, Aanensen DM, Bentley SD, Holden MTG, Parkhill J, Spratt BG, Peacock SJ, Feil EJ and Grundmann H

    Department of Medical Microbiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands.

    Antibiotic resistance forms a serious threat to the health of hospitalised patients, rendering otherwise treatable bacterial infections potentially life-threatening. A thorough understanding of the mechanisms by which resistance spreads between patients in different hospitals is required in order to design effective control strategies. We measured the differences between bacterial populations of 52 hospitals in the United Kingdom and Ireland, using whole-genome sequences from 1085 MRSA clonal complex 22 isolates collected between 1998 and 2012. The genetic differences between bacterial populations were compared with the number of patients transferred between hospitals and their regional structure. The MRSA populations within single hospitals, regions and countries were genetically distinct from the rest of the bacterial population at each of these levels. Hospitals from the same patient referral regions showed more similar MRSA populations, as did hospitals sharing many patients. Furthermore, the bacterial populations from different time-periods within the same hospital were generally more similar to each other than contemporaneous bacterial populations from different hospitals. We conclude that, while a large part of the dispersal and expansion of MRSA takes place among patients seeking care in single hospitals, inter-hospital spread of resistant bacteria is by no means a rare occurrence. Hospitals are exposed to constant introductions of MRSA on a number of levels: (1) most MRSA is received from hospitals that directly transfer large numbers of patients, while (2) fewer introductions happen between regions or (3) across national borders, reflecting lower numbers of transferred patients. A joint coordinated control effort between hospitals is therefore paramount for the national control of MRSA, antibiotic-resistant bacteria and other hospital-associated pathogens.

    Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office; Department of Health; Medical Research Council: G1000803, MR/N029399/1; Wellcome Trust: 089472, 098051

    Microbial genomics 2017;3;7;e000113

  • No Functional Role for microRNA-342 in a Mouse Model of Pancreatic Acinar Carcinoma.

    Dooley J, Lagou V, Pasciuto E, Linterman MA, Prosser HM, Himmelreich U and Liston A

    Translational Immunology Laboratory, VIB, Leuven, Belgium.

    The intronic microRNA (miR)-342 has been proposed as a potent tumor-suppressor gene. miR-342 is found to be downregulated or epigenetically silenced in multiple different tumor sites, and this loss of expression permits the upregulation of several key oncogenic pathways. In several different cell lines, lower miR-342 expression results in enhanced proliferation and metastasis potential, both <i>in vitro</i> and in xenogenic transplant conditions. Here, we sought to determine the function of miR-342 in an <i>in vivo</i> spontaneous cancer model, using the Ela1-TAg transgenic model of pancreatic acinar carcinoma. Through longitudinal magnetic resonance imaging monitoring of Ela1-TAg transgenic mice, either wild-type or knockout for <i>miR-342</i>, we found no role for miR-342 in the development, growth rate, or pathogenicity of pancreatic acinar carcinoma. These results highlight the importance of assessing miR function in the complex physiology of <i>in vivo</i> model systems and indicate that further functional testing of miR-342 is required before concluding it is a bona fide tumor-suppressor-miR.

    Frontiers in oncology 2017;7;101

  • Control of virulence gene transcription by indirect readout in Vibrio cholerae and Salmonella enterica serovar Typhimurium.

    Dorman CJ and Dorman MJ

    Department of Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin, Ireland.

    Indirect readout mechanisms of transcription control rely on the recognition of DNA shape by transcription factors (TFs). TFs may also employ a direct readout mechanism that involves the reading of the base sequence in the DNA major groove at the binding site. TFs with winged helix-turn-helix (wHTH) motifs use an alpha helix to read the base sequence in the major groove while inserting a beta sheet 'wing' into the adjacent minor groove. Such wHTH proteins are important regulators of virulence gene transcription in many pathogens; they also control housekeeping genes. This article considers the cases of the non-invasive Gram-negative pathogen Vibrio cholerae and the invasive pathogen Salmonella enterica serovar Typhimurium. Both possess clusters of A + T-rich horizontally acquired virulence genes that are silenced by the nucleoid-associated protein H-NS and regulated positively or negatively by wHTH TFs: for example, ToxR and LeuO in V. cholerae; HilA, LeuO, SlyA and OmpR in S. Typhimurium. Because of their relatively relaxed base sequence requirements for target recognition, indirect readout mechanisms have the potential to engage regulatory proteins with many more targets than might be the case using direct readout, making indirect readout an important, yet often ignored, contributor to the expression of pathogenic phenotypes.

    Funded by: Wellcome Trust: 098051

    Environmental microbiology 2017;19;10;3834-3845

  • Genome watch: Klebsiella pneumoniae: when a colonizer turns bad.

    Dorman MJ and Short FL

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2017;15;7;384

  • Typhoid in Africa and vaccine deployment.

    Dougan G

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.

    The Lancet. Global health 2017;5;3;e236-e237

  • Molecular synergy underlies the co-occurrence patterns and phenotype of NPM1-mutant acute myeloid leukemia.

    Dovey OM, Cooper JL, Mupo A, Grove CS, Lynn C, Conte N, Andrews RM, Pacharne S, Tzelepis K, Vijayabaskar MS, Green P, Rad R, Arends M, Wright P, Yusa K, Bradley A, Varela I and Vassiliou GS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    <i>NPM1</i> mutations define the commonest subgroup of acute myeloid leukemia (AML) and frequently co-occur with <i>FLT3</i> internal tandem duplications (ITD) or, less commonly, <i>NRAS</i> or <i>KRAS</i> mutations. Co-occurrence of mutant <i>NPM1</i> with <i>FLT3-ITD</i> carries a significantly worse prognosis than <i>NPM1-RAS</i> combinations. To understand the molecular basis of these observations, we compare the effects of the 2 combinations on hematopoiesis and leukemogenesis in knock-in mice. Early effects of these mutations on hematopoiesis show that compound <i>Npm1</i><sup><i>cA/+</i></sup><i>;Nras</i><sup><i>G12D/+</i></sup> or <i>Npm1</i><sup><i>cA</i></sup><i>;Flt3</i><sup><i>ITD</i></sup> share a number of features: <i>Hox</i> gene overexpression, enhanced self-renewal, expansion of hematopoietic progenitors, and myeloid differentiation bias. However, <i>Npm1</i><sup><i>cA</i></sup><i>;Flt3</i><sup><i>ITD</i></sup> mutants displayed significantly higher peripheral leukocyte counts, early depletion of common lymphoid progenitors, and a monocytic bias in comparison with the granulocytic bias in <i>Npm1</i><sup><i>cA/+</i></sup><i>;Nras</i><sup><i>G12D/+</i></sup> mutants. Underlying this was a striking molecular synergy manifested as a dramatically altered gene expression profile in <i>Npm1</i><sup><i>cA</i></sup><i>;Flt3</i><sup><i>ITD</i></sup>, but not <i>Npm1</i><sup><i>cA/+</i></sup><i>;Nras</i><sup><i>G12D/+</i></sup>, progenitors compared with wild-type. Both double-mutant models developed high-penetrance AML, although latency was significantly longer with <i>Npm1</i><sup><i>cA/+</i></sup><i>;Nras</i><sup><i>G12D/+</i></sup>. During AML evolution, both models acquired additional copies of the mutant <i>Flt3</i> or <i>Nras</i> alleles, but only <i>Npm1</i><sup><i>cA/+</i></sup><i>;Nras</i><sup><i>G12D/+</i></sup> mice showed acquisition of other human AML mutations, including <i>IDH1</i> R132Q. We also find, using primary Cas9-expressing AMLs, that <i>Hoxa</i> genes and selected interactors or downstream targets are required for survival of both types of double-mutant AML. Our results show that molecular complementarity underlies the higher frequency and significantly worse prognosis associated with <i>NPM1</i>c/<i>FLT3-ITD</i> vs <i>NPM1/NRAS-G12D-</i>mutant AML and functionally confirm the role of <i>HOXA</i> genes in NPM1c-driven AML.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust: WT095663MA

    Blood 2017;130;17;1911-1922

  • Genome-wide analysis of ivermectin response by Onchocerca volvulus reveals that genetic drift and soft selective sweeps contribute to loss of drug sensitivity.

    Doyle SR, Bourguinat C, Nana-Djeunga HC, Kengne-Ouafo JA, Pion SDS, Bopda J, Kamgno J, Wanji S, Che H, Kuesel AC, Walker M, Basáñez MG, Boakye DA, Osei-Atweneboana MY, Boussinesq M, Prichard RK and Grant WN

    Department of Animal, Plant and Soil Sciences, La Trobe University, Bundoora, Australia.

    Background: Treatment of onchocerciasis using mass ivermectin administration has reduced morbidity and transmission throughout Africa and Central/South America. Mass drug administration is likely to exert selection pressure on parasites, and phenotypic and genetic changes in several Onchocerca volvulus populations from Cameroon and Ghana, exposed to more than a decade of regular ivermectin treatment, have raised concern that sub-optimal responses to ivermectin's anti-fecundity effect are becoming more frequent and may spread.

    Methodology/principal findings: Pooled next generation sequencing (Pool-seq) was used to characterise genetic diversity within and between 108 adult female worms differing in ivermectin treatment history and response. Genome-wide analyses revealed genetic variation that significantly differentiated good responder (GR) and sub-optimal responder (SOR) parasites. These variants were not randomly distributed but clustered in ~31 quantitative trait loci (QTLs), with little overlap in putative QTL position and gene content between the two countries. Published candidate ivermectin SOR genes were largely absent in these regions; QTLs differentiating GR and SOR worms were enriched for genes in molecular pathways associated with neurotransmission, development, and stress responses. Finally, single worm genotyping demonstrated that geographic isolation and genetic change over time (in the presence of drug exposure) had a significantly greater role in shaping genetic diversity than the evolution of SOR.

    Conclusions/significance: This study is one of the first genome-wide association analyses in a parasitic nematode, and provides insight into the genomics of ivermectin response and population structure of O. volvulus. We argue that ivermectin response is a polygenically-determined quantitative trait (QT) whereby identical or related molecular pathways but not necessarily individual genes are likely to determine the extent of ivermectin response in different parasite populations. Furthermore, we propose that genetic drift rather than genetic selection of SOR is the underlying driver of population differentiation, which has significant implications for the emergence and potential spread of SOR within and between these parasite populations.

    Funded by: Wellcome Trust; World Health Organization: 001

    PLoS neglected tropical diseases 2017;11;7;e0005816

  • Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer.

    Drost J, van Boxtel R, Blokzijl F, Mizutani T, Sasaki N, Sasselli V, de Ligt J, Behjati S, Grolleman JE, van Wezel T, Nik-Zainal S, Kuiper RP, Cuppen E and Clevers H

    Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW) and University Medical Center (UMC) Utrecht, 3584CT Utrecht, Netherlands.

    Mutational processes underlie cancer initiation and progression. Signatures of these processes in cancer genomes may explain cancer etiology and could hold diagnostic and prognostic value. We developed a strategy that can be used to explore the origin of cancer-associated mutational signatures. We used CRISPR-Cas9 technology to delete key DNA repair genes in human colon organoids, followed by delayed subcloning and whole-genome sequencing. We found that mutation accumulation in organoids deficient in the mismatch repair gene <i>MLH1</i> is driven by replication errors and accurately models the mutation profiles observed in mismatch repair-deficient colorectal cancers. Application of this strategy to the cancer predisposition gene <i>NTHL1</i>, which encodes a base excision repair protein, revealed a mutational footprint (signature 30) previously observed in a breast cancer cohort. We show that signature 30 can arise from germline <i>NTHL1</i> mutations.

    Funded by: Wellcome Trust: 100183

    Science (New York, N.Y.) 2017;358;6360;234-238

  • The Experimental Design Assistant.

    du Sert NP, Bamsey I, Bate ST, Berdoy M, Clark RA, Cuthill IC, Fry D, Karp NA, Macleod M, Moon L, Stanford SC and Lings B

    National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs), London, UK.

    Funded by: National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/L000970/1

    Nature methods 2017;14;11;1024-1025

  • Virus genomes reveal factors that spread and sustained the Ebola epidemic.

    Dudas G, Carvalho LM, Bedford T, Tatem AJ, Baele G, Faria NR, Park DJ, Ladner JT, Arias A, Asogun D, Bielejec F, Caddy SL, Cotten M, D'Ambrozio J, Dellicour S, Di Caro A, Diclaro JW, Duraffour S, Elmore MJ, Fakoli LS, Faye O, Gilbert ML, Gevao SM, Gire S, Gladden-Young A, Gnirke A, Goba A, Grant DS, Haagmans BL, Hiscox JA, Jah U, Kugelman JR, Liu D, Lu J, Malboeuf CM, Mate S, Matthews DA, Matranga CB, Meredith LW, Qu J, Quick J, Pas SD, Phan MVT, Pollakis G, Reusken CB, Sanchez-Lockhart M, Schaffner SF, Schieffelin JS, Sealfon RS, Simon-Loriere E, Smits SL, Stoecker K, Thorne L, Tobin EA, Vandi MA, Watson SJ, West K, Whitmer S, Wiley MR, Winnicki SM, Wohl S, Wölfel R, Yozwiak NL, Andersen KG, Blyden SO, Bolay F, Carroll MW, Dahn B, Diallo B, Formenty P, Fraser C, Gao GF, Garry RF, Goodfellow I, Günther S, Happi CT, Holmes EC, Kargbo B, Keïta S, Kellam P, Koopmans MPG, Kuhn JH, Loman NJ, Magassouba N, Naidoo D, Nichol ST, Nyenswah T, Palacios G, Pybus OG, Sabeti PC, Sall A, Ströher U, Wurie I, Suchard MA, Lemey P and Rambaut A

    Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, Edinburgh EH9 3FL, UK.

    The 2013-2016 West African epidemic caused by the Ebola virus was of unprecedented magnitude, duration and impact. Here we reconstruct the dispersal, proliferation and decline of Ebola virus throughout the region by analysing 1,610 Ebola virus genomes, which represent over 5% of the known cases. We test the association of geography, climate and demography with viral movement among administrative regions, inferring a classic 'gravity' model, with intense dispersal between larger and closer populations. Despite attenuation of international dispersal after border closures, cross-border transmission had already sown the seeds for an international epidemic, rendering these measures ineffective at curbing the epidemic. We address why the epidemic did not spread into neighbouring countries, showing that these countries were susceptible to substantial outbreaks but at lower risk of introductions. Finally, we reveal that this large epidemic was a heterogeneous and spatially dissociated collection of transmission clusters of varying size, duration and connectivity. These insights will help to inform interventions in future epidemics.

    Funded by: European Research Council: 260864; Medical Research Council: MR/L015080/1, MR/M501621/1; NCATS NIH HHS: UL1 TR001114; NCRR NIH HHS: UL1 RR025774; NHGRI NIH HHS: R01 HG006139, U01 HG007480; NIAID NIH HHS: HHSN272200700016I, HHSN272201400048C, R01 AI081982, R01 AI104621, R01 AI107034, R01 AI114855, R01 AI117011, R13 AI104216, R43 AI088843, R44 AI088843, R44 AI115754, U01 AI082119, U19 AI110818; NIGMS NIH HHS: R35 GM119774; NIH HHS: 1U01HG007480-01, AI081982, AI082119, AI082805, AI088843, AI104216, AI104621, AI115754, S10 OD020069; National Institute of Allergy and Infectious Disease: 5R01AI/114855-03; Wellcome Trust: 106866/Z/15/Z; World Health Organization: 001

    Nature 2017;544;7650;309-315

  • Population genetic structure and adaptation of malaria parasites on the edge of endemic distribution.

    Duffy CW, Ba H, Assefa S, Ahouidi AD, Deh YB, Tandia A, Kirsebom FCM, Kwiatkowski DP and Conway DJ

    Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK.

    To determine whether the major human malaria parasite Plasmodium falciparum exhibits fragmented population structure or local adaptation at the northern limit of its African distribution where the dry Sahel zone meets the Sahara, samples were collected from diverse locations within Mauritania over a range of ~1000 km. Microsatellite genotypes were obtained for 203 clinical infection samples from eight locations, and Illumina paired-end sequences were obtained to yield high coverage genomewide single nucleotide polymorphism (SNP) data for 65 clinical infection samples from four locations. Most infections contained single parasite genotypes, reflecting low rates of transmission and superinfection locally, in contrast to the situation seen in population samples from countries further south. A minority of infections shared related or identical genotypes locally, indicating some repeated transmission of parasite clones without recombination. This caused some multilocus linkage disequilibrium and local divergence, but aside from the effect of repeated genotypes there was minimal differentiation between locations. Several chromosomal regions had elevated integrated haplotype scores (|iHS|) indicating recent selection, including those containing drug resistance genes. A genomewide F<sub>ST</sub> scan comparison with previous sequence data from an area in West Africa with higher infection endemicity indicates that regional gene flow prevents genetic isolation, but revealed allele frequency differentiation at three drug resistance loci and an erythrocyte invasion ligand gene. Contrast of extended haplotype signatures revealed none to be unique to Mauritania. Discrete foci of infection on the edge of the Sahara are genetically highly connected to the wider continental parasite population, and local elimination would be difficult to achieve without very substantial reduction in malaria throughout the region.

    Funded by: European Research Council: 294428; Medical Research Council: G0600718, G1100123, MR/M006212/1; Wellcome Trust

    Molecular ecology 2017;26;11;2880-2894

  • Modulation of Aneuploidy in Leishmania donovani during Adaptation to Different In Vitro and In Vivo Environments and Its Impact on Gene Expression.

    Dumetz F, Imamura H, Sanders M, Seblova V, Myskova J, Pescher P, Vanaerschot M, Meehan CJ, Cuypers B, De Muylder G, Späth GF, Bussotti G, Vermeesch JR, Berriman M, Cotton JA, Volf P, Dujardin JC and Domagalska MA

    Molecular Parasitology, Institute of Tropical Medicine, Antwerp, Belgium.

    Aneuploidy is usually deleterious in multicellular organisms but appears to be tolerated and potentially beneficial in unicellular organisms, including pathogens. <i>Leishmania</i>, a major protozoan parasite, is emerging as a new model for aneuploidy, since <i>in vitro</i>-cultivated strains are highly aneuploid, with interstrain diversity and intrastrain mosaicism. The alternation of two life stages in different environments (extracellular promastigotes and intracellular amastigotes) offers a unique opportunity to study the impact of environment on aneuploidy and gene expression. We sequenced the whole genomes and transcriptomes of <i>Leishmania donovani</i> strains throughout their adaptation to <i>in vivo</i> conditions mimicking natural vertebrate and invertebrate host environments. The nucleotide sequences were almost unchanged within a strain, in contrast to highly variable aneuploidy. Although high in promastigotes <i>in vitro</i>, aneuploidy dropped significantly in hamster amastigotes, in a progressive and strain-specific manner, accompanied by the emergence of new polysomies. After a passage through a sand fly, smaller yet consistent karyotype changes were detected. Changes in chromosome copy numbers were correlated with the corresponding transcript levels, but additional aneuploidy-independent regulation of gene expression was observed. This affected stage-specific gene expression, downregulation of the entire chromosome 31, and upregulation of gene arrays on chromosomes 5 and 8. Aneuploidy changes in <i>Leishmania</i> are probably adaptive and exploited to modulate the dosage and expression of specific genes; they are well tolerated, but additional mechanisms may exist to regulate the transcript levels of other genes located on aneuploid chromosomes. Our model should allow studies of the impact of aneuploidy on molecular adaptations and cellular fitness.

    <b>IMPORTANCE</b> Aneuploidy is usually detrimental in multicellular organisms, but in several microorganisms, it can be tolerated and even beneficial. <i>Leishmania</i>, a protozoan parasite that kills more than 30,000 people each year, is emerging as a new model for aneuploidy studies, as unexpectedly high levels of aneuploidy are found in clinical isolates. <i>Leishmania</i> lacks classical regulation of transcription at initiation through promoters, so aneuploidy could represent a major adaptive strategy of this parasite to modulate gene dosage in response to stressful environments. For the first time, we document the dynamics of aneuploidy throughout the life cycle of the parasite, <i>in vitro</i> and <i>in vivo</i>. We show its adaptive impact on transcription and its interaction with regulation. Besides offering a new model for aneuploidy studies, we show that further genomic studies should be done directly in clinical samples without parasite isolation and that adequate methods should be developed for this.

    Funded by: Wellcome Trust: 098051

    mBio 2017;8;3

  • "Matching" consent to purpose: The example of the Matchmaker Exchange.

    Dyke SOM, Knoppers BM, Hamosh A, Firth HV, Hurles M, Brudno M, Boycott KM, Philippakis AA and Rehm HL

    Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada.

    The Matchmaker Exchange (MME) connects rare disease clinicians and researchers to facilitate the sharing of data from undiagnosed patients for the purpose of novel gene discovery. Such sharing raises the odds that two or more similar patients with candidate genes in common may be found, thereby allowing their condition to be more readily studied and understood. Consent considerations for data sharing in MME included both the ethical and legal differences between clinical and research settings and the level of privacy risk involved in sharing varying amounts of rare disease patient data to enable patient matches. In this commentary, we discuss these consent considerations and the resulting MME Consent Policy as they may be relevant to other international data sharing initiatives.

    Funded by: CIHR; NHGRI NIH HHS: U41 HG006627, U41 HG006834, U54 HG006542, UM1 HG008900; Wellcome Trust

    Human mutation 2017;38;10;1281-1285

  • Genome-wide analysis of differential transcriptional and epigenetic variability across human immune cell types.

    Ecker S, Chen L, Pancaldi V, Bagger FO, Fernández JM, Carrillo de Santa Pau E, Juan D, Mann AL, Watt S, Casale FP, Sidiropoulos N, Rapin N, Merkel A, BLUEPRINT Consortium, Stunnenberg HG, Stegle O, Frontini M, Downes K, Pastinen T, Kuijpers TW, Rico D, Valencia A, Beck S, Soranzo N and Paul DS

    Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro 3, 28029, Madrid, Spain.

    Background: A healthy immune system requires immune cells that adapt rapidly to environmental challenges. This phenotypic plasticity can be mediated by transcriptional and epigenetic variability.

    Results: We apply a novel analytical approach to measure and compare transcriptional and epigenetic variability genome-wide across CD14<sup>+</sup>CD16<sup>-</sup> monocytes, CD66b<sup>+</sup>CD16<sup>+</sup> neutrophils, and CD4<sup>+</sup>CD45RA<sup>+</sup> naïve T cells from the same 125 healthy individuals. We discover substantially increased variability in neutrophils compared to monocytes and T cells. In neutrophils, genes with hypervariable expression are found to be implicated in key immune pathways and are associated with cellular properties and environmental exposure. We also observe increased sex-specific gene expression differences in neutrophils. Neutrophil-specific DNA methylation hypervariable sites are enriched at dynamic chromatin regions and active enhancers.

    Conclusions: Our data highlight the importance of transcriptional and epigenetic variability for the key role of neutrophils as the first responders to inflammatory stimuli. We provide a resource to enable further functional studies into the plasticity of immune cells, which can be accessed from: .

    Funded by: British Heart Foundation: RG/08/014/24067, RG/13/13/30194; Medical Research Council: G0800270, MR/L003120/1; Wellcome Trust: WT091310, WT098051

    Genome biology 2017;18;1;18

  • Drug Resistance Mechanisms in Colorectal Cancer Dissected with Cell Type-Specific Dynamic Logic Models.

    Eduati F, Doldàn-Martelli V, Klinger B, Cokelaer T, Sieber A, Kogera F, Dorel M, Garnett MJ, Blüthgen N and Saez-Rodriguez J

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Genomic features are used as biomarkers of sensitivity to kinase inhibitors used widely to treat human cancer, but effective patient stratification based on these principles remains limited in impact. Insofar as kinase inhibitors interfere with signaling dynamics, and, in turn, signaling dynamics affect inhibitor responses, in this study we investigated associations between cell-specific dynamic signaling pathways and drug sensitivity. Specifically, we measured 14 phosphoproteins under 43 different perturbed conditions (combinations of 5 stimuli and 7 inhibitors) in 14 colorectal cancer cell lines, building cell line-specific dynamic logic models of underlying signaling networks. Model parameters representing pathway dynamics were used as features to predict sensitivity to a panel of 27 drugs. Specific parameters of signaling dynamics correlated strongly with drug sensitivity for 14 of the drugs, 9 of which had no genomic biomarker. Following one of these associations, we validated a drug combination predicted to overcome resistance to MEK inhibitors by coblockade of GSK3, a combination that had not been found through associations with genomic data. These results suggest that to better understand cancer drug resistance and move toward personalized medicine, it is essential to consider signaling network dynamics that cannot be inferred from static genotypes.

    Cancer research 2017;77;12;3364-3375

  • Phylogenetic Analysis of Klebsiella pneumoniae from Hospitalized Children, Pakistan.

    Ejaz H, Wang N, Wilksch JJ, Page AJ, Cao H, Gujaran S, Keane JA, Lithgow T, Ul-Haq I, Dougan G, Strugnell RA and Heinz E

    Klebsiella pneumoniae shows increasing emergence of multidrug-resistant lineages, including strains resistant to all available antimicrobial drugs. We conducted whole-genome sequencing of 178 highly drug-resistant isolates from a tertiary hospital in Lahore, Pakistan. Phylogenetic analyses to place these isolates into global context demonstrate the expansion of multiple independent lineages, including K. quasipneumoniae.

    Funded by: Wellcome Trust

    Emerging infectious diseases 2017;23;11;1872-1875

  • Deriving an optimal threshold of waist circumference for detecting cardiometabolic risk in sub-Saharan Africa.

    Ekoru K, Murphy GAV, Young EH, Delisle H, Jerome CS, Assah F, Longo-Mbenza B, Nzambi JPD, On'Kin JBK, Buntix F, Muyer MC, Christensen DL, Wesseh CS, Sabir A, Okafor C, Gezawa ID, Puepet F, Enang O, Raimi T, Ohwovoriole E, Oladapo OO, Bovet P, Mollentze W, Unwin N, Gray WK, Walker R, Agoudavi K, Siziya S, Chifamba J, Njelekela M, Fourie CM, Kruger S, Schutte AE, Walsh C, Gareta D, Kamali A, Seeley J, Norris SA, Crowther NJ, Pillay D, Kaleebu P, Motala AA and Sandhu MS

    Sandhu Group, Department of Medicine, University of Cambridge, Cambridge, UK.

    Background: Waist circumference (WC) thresholds derived from western populations continue to be used in sub-Saharan Africa (SSA) despite increasing evidence of ethnic variation in the association between adiposity and cardiometabolic disease, and despite the availability of data from African populations. We aimed to derive an SSA-specific optimal WC cut-point for identifying individuals at increased cardiometabolic risk.

    Methods: We used individual level cross-sectional data on 24 181 participants aged ⩾15 years from 17 studies conducted between 1990 and 2014 in eight countries in SSA. Receiver operating characteristic curves were used to derive optimal WC cut-points for detecting the presence of at least two components of metabolic syndrome (MS), excluding WC.

    Results: The optimal WC cut-point was 81.2 cm (95% CI 78.5-83.8 cm) and 81.0 cm (95% CI 79.2-82.8 cm) for men and women, respectively, with comparable accuracy in men and women. Sensitivity was higher in women (64%, 95% CI 63-65) than in men (53%, 95% CI 51-55), and increased with the prevalence of obesity. Having WC above the derived cut-point was associated with more than double the odds of having at least two components of MS (age-adjusted odds ratio 2.6, 95% CI 2.4-2.9, for men and 2.2, 95% CI 2.0-2.3, for women).

    Conclusion: The optimal WC cut-point for identifying men at increased cardiometabolic risk is lower (⩾81.2 cm) than current guidelines (⩾94.0 cm) recommend, and similar to that in women in SSA. Prospective studies are needed to confirm these cut-points based on cardiometabolic outcomes.

    International Journal of Obesity advance online publication, 31 October 2017; doi:10.1038/ijo.2017.240.

    Funded by: Medical Research Council: MR/K013491/1

    International journal of obesity (2005) 2017

  • A reversible haploid mouse embryonic stem cell biobank resource for functional genomics.

    Elling U, Wimmer RA, Leibbrandt A, Burkard T, Michlits G, Leopoldi A, Micheler T, Abdeen D, Zhuk S, Aspalter IM, Handl C, Liebergesell J, Hubmann M, Husa AM, Kinzer M, Schuller N, Wetzel E, van de Loo N, Martinez JAZ, Estoppey D, Riedl R, Yang F, Fu B, Dechat T, Ivics Z, Agu CA, Bell O, Blaas D, Gerhardt H, Hoepfner D, Stark A and Penninger JM

    Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna Biocenter (VBC), Dr. Bohr Gasse 3, Vienna, Austria.

    The ability to directly uncover the contributions of genes to a given phenotype is fundamental for biology research. However, ostensibly homogeneous cell populations exhibit large clonal variance that can confound analyses and undermine reproducibility. Here we used genome-saturated mutagenesis to create a biobank of over 100,000 individual haploid mouse embryonic stem (mES) cell lines targeting 16,970 genes with genetically barcoded, conditional and reversible mutations. This Haplobank is, to our knowledge, the largest resource of hemi/homozygous mutant mES cells to date and is available to all researchers. Reversible mutagenesis overcomes clonal variance by permitting functional annotation of the genome directly in sister cells. We use the Haplobank in reverse genetic screens to investigate the temporal resolution of essential genes in mES cells, and to identify novel genes that control sprouting angiogenesis and lineage specification of blood vessels. Furthermore, a genome-wide forward screen with Haplobank identified PLA2G16 as a host factor that is required for cytotoxicity by rhinoviruses, which cause the common cold. Therefore, clones from the Haplobank combined with the use of reversible technologies enable high-throughput, reproducible, functional annotation of the genome.

    Funded by: Austrian Science Fund FWF: P 23308; European Research Council: 341036

    Nature 2017;550;7674;114-118

  • A non-endoscopic device to sample the oesophageal microbiota: a case-control study.

    Elliott DRF, Walker AW, O'Donovan M, Parkhill J and Fitzgerald RC

    Medical Research Centre Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, UK.

    Background: The strongest risk factor for oesophageal adenocarcinoma is reflux disease, and the rising incidence of this coincides with the eradication of Helicobacter pylori, both of which might alter the oesophageal microbiota. We aimed to profile the microbiota at different stages of Barrett's carcinogenesis and investigate the Cytosponge as a minimally invasive tool for sampling the oesophageal microbiota.

    Methods: In this case-control study, 16S rRNA gene amplicon sequencing was done on 210 oesophageal samples from 86 patients representing the Barrett's oesophagus progression sequence (normal squamous controls [n=20], non-dysplastic [n=24] and dysplastic Barrett's oesophagus [n=23], and oesophageal adenocarcinoma [n=19]), relevant negative controls, and replicates on the Illumina MiSeq platform. Samples were taken from patients enrolled in the BEST2 study at five UK hospitals and the OCCAMS study at six UK hospitals. We compared fresh frozen tissue, fresh frozen endoscopic brushings, and the Cytosponge device for microbial DNA yield (qPCR), diversity, and community composition.

    Findings: There was decreased microbial diversity in oesophageal adenocarcinoma tissue compared with tissue from healthy control patients as measured by the observed operational taxonomic unit (OTU) richness (p=0·0012), Chao estimated total richness (p=0·0004), and Shannon diversity index (p=0·0075). Lactobacillus fermentum was enriched in oesophageal adenocarcinoma (p=0·028), and lactic acid bacteria dominated the microenvironment in seven (47%) of 15 cases of oesophageal adenocarcinoma. Comparison of oesophageal sampling methods showed that the Cytosponge yielded more than ten-times higher quantities of microbial DNA than did endoscopic brushes or biopsies using quantitative PCR (p<0·0001). The Cytosponge samples contained the majority of taxa detected in biopsy and brush samples, but were enriched for genera from the oral cavity and stomach, including Fusobacterium, Megasphaera, Campylobacter, Capnocytophaga, and Dialister. The Cytosponge detected decreased microbial diversity in patients with high-grade dysplasia in comparison to control patients, as measured by the observed OTU richness (p=0·0147), Chao estimated total richness (p=0·023), and Shannon diversity index (p=0·0085).

    Interpretation: Alterations in microbial communities occur in the lower oesophagus in Barrett's carcinogenesis, which can be detected at the pre-invasive stage of high-grade dysplasia with the novel Cytosponge device. Our findings are potentially applicable to early disease detection, and future test development should focus on longitudinal sampling of the microbiota to monitor for changes in microbial diversity in a larger cohort of patients.

    Funding: Cancer Research UK, National Institute for Health Research, Medical Research Council, Wellcome Trust, The Scottish Government (RESAS).

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    The lancet. Gastroenterology & hepatology 2017;2;1;32-42

  • Application of rare variant transmission disequilibrium tests to epileptic encephalopathy trio sequence data.

    Epi4K Consortium, EuroEPINOMICS-RES Consortium and Epilepsy Phenome Genome Project

    The classic epileptic encephalopathies, including infantile spasms (IS) and Lennox-Gastaut syndrome (LGS), are severe seizure disorders that usually arise sporadically. De novo variants in genes mainly encoding ion channel and synaptic proteins have been found to account for over 15% of patients with IS or LGS. The contribution of autosomal recessive genetic variation, however, is less well understood. We implemented a rare variant transmission disequilibrium test (TDT) to search for autosomal recessive epileptic encephalopathy genes in a cohort of 320 outbred patient-parent trios that were generally prescreened for rare metabolic disorders. In the current sample, our rare variant transmission disequilibrium test did not identify individual genes with significantly distorted transmission over expectation after correcting for the multiple tests. While the rare variant transmission disequilibrium test did not find evidence of a role for individual autosomal recessive genes, our current sample is insufficiently powered to assess the overall role of autosomal recessive genotypes in an outbred epileptic encephalopathy population.

    Funded by: NHLBI NIH HHS: RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010, UC2 HL102923, UC2 HL102924, UC2 HL102925, UC2 HL102926, UC2 HL103010; NIA NIH HHS: P30 AG028377; NIAID NIH HHS: R56 AI098588, U19 AI067854, UM1 AI100645; NIMH NIH HHS: K01 MH098126, R01 MH097993; NINDS NIH HHS: U01 NS053998, U01 NS077274, U01 NS077276, U01 NS077303, U01 NS077364; Wellcome Trust

    European journal of human genetics : EJHG 2017;25;7;894-899

  • Integration of Tmc1/2 into the mechanotransduction complex in zebrafish hair cells is regulated by Transmembrane O-methyltransferase (Tomt).

    Erickson T, Morgan CP, Olt J, Hardy K, Busch-Nentwich EM, Maeda R, Clemens-Grisham R, Krey JF, Nechiporuk AV, Barr-Gillespie PG, Marcotti W and Nicolson T

    Oregon Hearing Research Center and the Vollum Institute, Oregon Health and Science University, Portland, United States.

    Transmembrane O-methyltransferase (TOMT / LRTOMT) is responsible for non-syndromic deafness DFNB63. However, the specific defects that lead to hearing loss have not been described. Using a zebrafish model of DFNB63, we show that the auditory and vestibular phenotypes are due to a lack of mechanotransduction (MET) in Tomt-deficient hair cells. GFP-tagged Tomt is enriched in the Golgi of hair cells, suggesting that Tomt might regulate the trafficking of other MET components to the hair bundle. We found that Tmc1/2 proteins are specifically excluded from the hair bundle in tomt mutants, whereas other MET complex proteins can still localize to the bundle. Furthermore, mouse TOMT and TMC1 can directly interact in HEK 293 cells, and this interaction is modulated by His183 in TOMT. Thus, we propose a model of MET complex assembly where Tomt and the Tmcs interact within the secretory pathway to traffic Tmc proteins to the hair bundle.

    Funded by: NICHD NIH HHS: R01 HD072844

    eLife 2017;6

  • A Temporal Proteomic Map of Epstein-Barr Virus Lytic Replication in B Cells.

    Ersing I, Nobre L, Wang LW, Soday L, Ma Y, Paulo JA, Narita Y, Ashbaugh CW, Jiang C, Grayson NE, Kieff E, Gygi SP, Weekes MP and Gewurz BE

    Division of Infectious Disease, Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA; Institut für Klinische und Molekulare Virologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany.

    Epstein-Barr virus (EBV) replication contributes to multiple human diseases, including infectious mononucleosis, nasopharyngeal carcinoma, B cell lymphomas, and oral hairy leukoplakia. We performed systematic quantitative analyses of temporal changes in host and EBV proteins during lytic replication to gain insights into virus-host interactions, using conditional Burkitt lymphoma models of type I and II EBV infection. We quantified profiles of >8,000 cellular and 69 EBV proteins, including >500 plasma membrane proteins, providing temporal views of the lytic B cell proteome and EBV virome. Our approach revealed EBV-induced remodeling of cell cycle, innate and adaptive immune pathways, including upregulation of the complement cascade and proteasomal degradation of the B cell receptor complex, conserved between EBV types I and II. Cross-comparison with proteomic analyses of human cytomegalovirus infection and of a Kaposi-sarcoma-associated herpesvirus immunoevasin identified host factors targeted by multiple herpesviruses. Our results provide an important resource for studies of EBV replication.

    Funded by: NCI NIH HHS: K08 CA140780, R01 CA085180; NIDDK NIH HHS: K01 DK098285; NIGMS NIH HHS: R01 GM067945; Wellcome Trust

    Cell reports 2017;19;7;1479-1493

  • Loss of PBRM1 rescues VHL dependent replication stress to promote renal carcinogenesis.

    Espana-Agusti J, Warren A, Chew SK, Adams DJ and Matakidou A

    Department of Oncology, University of Cambridge, CRUK Cambridge Institute, Cambridge, CB2 0RE, UK.

    Inactivation of the VHL (von Hippel-Lindau) tumour suppressor has long been recognised as necessary for the pathogenesis of clear cell renal cancer (ccRCC); however, the molecular mechanisms underlying transformation and the requirement for additional genetic hits remain unclear. Here, we show that loss of VHL alone results in DNA replication stress and damage accumulation, effects that constrain cellular growth and transformation. By contrast, concomitant loss of the chromatin remodelling factor PBRM1 (mutated in 40% of ccRCC) rescues VHL-induced replication stress, maintaining cellular fitness and allowing proliferation. In line with these data we demonstrate that combined deletion of Vhl and Pbrm1 in the mouse kidney is sufficient for the development of fully-penetrant, multifocal carcinomas, closely mimicking human ccRCC. Our results illustrate how VHL and PBRM1 co-operate to drive renal transformation and uncover replication stress as an underlying vulnerability of all VHL mutated renal cancers that could be therapeutically exploited.

    Funded by: Cancer Research UK: 12177, C37839/A12177; Wellcome Trust

    Nature communications 2017;8;1;2026

  • Identification and initial characterisation of a protein involved in Campylobacter jejuni cell shape.

    Esson D, Gupta S, Bailey D, Wigley P, Wedley A, Mather AE, Méric G, Mastroeni P, Sheppard SK, Thomson NR, Parkhill J, Maskell DJ, Christie G and Grant AJ

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, UK.

    Campylobacter jejuni is the leading cause of bacterial foodborne illness. While helical cell shape is considered important for C. jejuni pathogenesis, this bacterium is capable of adopting other morphologies. To better understand how helical-shaped C. jejuni maintain their shape, and thus any associated colonisation, pathogenicity or other advantage, it is first important to identify the genes and proteins involved. So far, two peptidoglycan-modifying enzymes, Pgp1 and Pgp2, have been shown to be required for C. jejuni helical cell shape. We performed a visual screen of ∼2000 transposon mutants of C. jejuni for cell shape mutants. Analysis of whole genome sequence data from the mutants with altered cell shape, directed mutants, wild type stocks and isolated helical and rod-shaped 'wild type' C. jejuni identified a number of different mutations in pgp1 and pgp2 that result in a change from helical to rod-shaped cells. We also identified an isolate with a loss of curvature. In this study, we have identified the genomic change in this isolate, and found that targeted deletion of the gene with the change resulted in bacteria with loss of curvature. Helical cell shape was restored by supplying the gene in trans. We examined the effect of loss of the gene on bacterial motility, adhesion and invasion of tissue culture cells and chicken colonisation, as well as the effect on the muropeptide profile of the peptidoglycan sacculus. Our work identifies another factor involved in helical cell shape.

    Funded by: Medical Research Council: MR/L015080/1

    Microbial pathogenesis 2017;104;202-211

  • Bringing Treponema into the spotlight.

    Everall I and Sánchez-Busó L

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2017;15;4;196

  • Genomic epidemiology of a national outbreak of post-surgical Mycobacterium abscessus wound infections in Brazil.

    Everall I, Nogueira CL, Bryant JM, Sánchez-Busó L, Chimara E, Duarte RDS, Ramos JP, Lima KVB, Lopes ML, Palaci M, Kipnis A, Monego F, Floto RA, Parkhill J, Leão SC and Harris SR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    An epidemic of post-surgical wound infections, caused by a non-tuberculous mycobacterium, has been ongoing in Brazil. It has been unclear whether one or multiple lineages are responsible and whether their wide geographical distribution across Brazil is due to spread from a single point source or is the result of human-mediated transmission. 188 isolates, collected from nine Brazilian states, were whole genome sequenced and analysed using phylogenetic and comparative genomic approaches. The isolates from Brazil formed a single clade, which was estimated to have emerged in 2003. We observed temporal and geographic structure within the lineage that enabled us to infer the movement of sub-lineages across Brazil. The Brazilian lineage had a reduced genome size relative to most strains in the three subspecies of Mycobacterium abscessus and contained a novel plasmid, pMAB02, in addition to the previously described pMAB01 plasmid. One lineage, which emerged just prior to the initial outbreak, is responsible for the epidemic of post-surgical wound infections in Brazil. Phylogenetic analysis indicates that multiple transmission events led to its spread. The presence of a novel plasmid and the reduced genome size suggest that the lineage has undergone adaptation to the surgical niche.

    Funded by: Wellcome Trust: 098051, 10224/Z/15/Z, 107032AIA

    Microbial genomics 2017;3;5;e000111

  • Structural analysis of pathogenic mutations in the DYRK1A gene in patients with developmental disorders.

    Evers JM, Laskowski RA, Bertolli M, Clayton-Smith J, Deshpande C, Eason J, Elmslie F, Flinter F, Gardiner C, Hurst JA, Kingston H, Kini U, Lampe AK, Lim D, Male A, Naik S, Parker MJ, Price S, Robert L, Sarkar A, Straub V, Woods G, Thornton JM, DDD Study and Wright CF

    European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Haploinsufficiency in DYRK1A is associated with a recognizable developmental syndrome, though the mechanism of action of pathogenic missense mutations is currently unclear. Here we present 19 de novo mutations in this gene, including five missense mutations, identified by the Deciphering Developmental Disorders study. Protein structural analysis reveals that the missense mutations are either close to the ATP or peptide binding sites within the kinase domain, or are important for protein stability, suggesting a loss-of-function mechanism. Furthermore, there is some correlation between the magnitude of the change and the severity of the resultant phenotype. A comparison of the distribution of the pathogenic mutations along the length of DYRK1A with that of natural variants, as found in the ExAC database, confirms that mutations in the N-terminal end of the kinase domain are more disruptive of protein function. In particular, pathogenic mutations occur in significantly closer proximity to the ATP and the substrate peptide than the natural variants. Overall, we suggest that de novo dominant mutations in DYRK1A account for nearly 0.5% of severe developmental disorders due to substantially reduced kinase function.

    Funded by: Wellcome Trust: WT098051

    Human molecular genetics 2017;26;3;519-526

  • Integrated genome and transcriptome sequencing identifies a noncoding mutation in the genome replication factor DONSON as the cause of microcephaly-micromelia syndrome.

    Evrony GD, Cordero DR, Shen J, Partlow JN, Yu TW, Rodin RE, Hill RS, Coulter ME, Lam AN, Jayaraman D, Gerrelli D, Diaz DG, Santos C, Morrison V, Galli A, Tschulena U, Wiemann S, Martel MJ, Spooner B, Ryu SC, Elhosary PC, Richardson JM, Tierney D, Robinson CA, Chibbar R, Diudea D, Folkerth R, Wiebe S, Barkovich AJ, Mochida GH, Irvine J, Lemire EG, Blakley P and Walsh CA

    Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, Massachusetts 02115, USA.

    While next-generation sequencing has accelerated the discovery of human disease genes, progress has been largely limited to the "low hanging fruit" of mutations with obvious exonic coding or canonical splice site impact. In contrast, the lack of high-throughput, unbiased approaches for functional assessment of most noncoding variants has bottlenecked gene discovery. We report the integration of transcriptome sequencing (RNA-seq), which surveys all mRNAs to reveal functional impacts of variants at the transcription level, into the gene discovery framework for a unique human disease, microcephaly-micromelia syndrome (MMS). MMS is an autosomal recessive condition described thus far in only a single First Nations population and causes intrauterine growth restriction, severe microcephaly, craniofacial anomalies, skeletal dysplasia, and neonatal lethality. Linkage analysis of affected families, including a very large pedigree, identified a single locus on chromosome 21 linked to the disease (LOD > 9). Comprehensive genome sequencing did not reveal any pathogenic coding or canonical splicing mutations within the linkage region but identified several nonconserved noncoding variants. RNA-seq analysis detected aberrant splicing in DONSON due to one of these noncoding variants, showing a causative role for DONSON disruption in MMS. We show that DONSON is expressed in progenitor cells of embryonic human brain and other proliferating tissues, is co-expressed with components of the DNA replication machinery, and that Donson is essential for early embryonic development in mice as well, suggesting an essential conserved role for DONSON in the cell cycle. Our results demonstrate the utility of integrating transcriptomics into the study of human genetic disease when DNA sequencing alone is not sufficient to reveal the underlying pathogenic mutation.

    Funded by: Medical Research Council: G0700089, MC_PC_15004; NICHD NIH HHS: K12 HD001255, U54 HD090255; NIDCD NIH HHS: R03 DC013866; NIGMS NIH HHS: T32 GM007753; NIMH NIH HHS: U24 MH081810; NINDS NIH HHS: R01 NS035129

    Genome research 2017;27;8;1323-1335

  • Multiple short windows of calcium-dependent protein kinase 4 activity coordinate distinct cell cycle events during Plasmodium gametogenesis.

    Fang H, Klages N, Baechler B, Hillner E, Yu L, Pardo M, Choudhary J and Brochet M

    Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland.

    Malaria transmission relies on the production of gametes following ingestion by a mosquito. Here, we show that Ca²⁺-dependent protein kinase 4 (CDPK4) controls three processes essential to progress from a single haploid microgametocyte to the release of eight flagellated microgametes in Plasmodium berghei. A myristoylated isoform is activated by Ca²⁺ to initiate a first genome replication within twenty seconds of activation. This role is mediated by a protein of the SAPS-domain family involved in S-phase entry. At the same time, CDPK4 is required for the assembly of the subsequent mitotic spindle and to phosphorylate a microtubule-associated protein important for mitotic spindle formation. Finally, a non-myristoylated isoform is essential to complete cytokinesis by activating motility of the male flagellum. This role has been linked to phosphorylation of an uncharacterised flagellar protein. Altogether, this study reveals how a kinase integrates and transduces multiple signals to control key cell-cycle transitions during Plasmodium gametogenesis.

    Funded by: Wellcome Trust

    eLife 2017;6

  • Neutrophil-mediated IL-6 receptor trans-signaling and the risk of chronic obstructive pulmonary disease and asthma.

    Farahi N, Paige E, Balla J, Prudence E, Ferreira RC, Southwood M, Appleby SL, Bakke P, Gulsvik A, Litonjua AA, Sparrow D, Silverman EK, Cho MH, Danesh J, Paul DS, Freitag DF and Chilvers ER

    Division of Respiratory Medicine, Department of Medicine, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK.

    The Asp358Ala variant in the interleukin-6 receptor (IL-6R) gene has been implicated in asthma, autoimmune and cardiovascular disorders, but its role in other respiratory conditions such as chronic obstructive pulmonary disease (COPD) has not been investigated. The aims of this study were to evaluate whether there is an association between Asp358Ala and COPD or asthma risk, and to explore the role of the Asp358Ala variant in shedding of soluble IL-6R (sIL-6R) from neutrophils and its pro-inflammatory effects in the lung. We undertook logistic regression using data from the UK Biobank and the ECLIPSE COPD cohort. Results were meta-analyzed with summary data from a further three COPD cohorts (7,519 total cases and 35,653 total controls), showing no association between Asp358Ala and COPD (OR = 1.02 [95% CI: 0.96, 1.07]). Data from the UK Biobank showed a positive association between the Asp358Ala variant and atopic asthma (OR = 1.07 [1.01, 1.13]). In a series of in vitro studies using blood samples from 37 participants, we found that shedding of sIL-6R from neutrophils was greater in carriers of the Asp358Ala minor allele than in non-carriers. Human pulmonary artery endothelial cells cultured with serum from homozygous carriers showed an increase in MCP-1 release in carriers of the minor allele, with the difference eliminated upon addition of tocilizumab. In conclusion, there is evidence that neutrophils may be an important source of sIL-6R in the lungs, and the Asp358Ala variant may have pro-inflammatory effects in lung cells. However, we were unable to identify evidence for an association between Asp358Ala and COPD.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/13/13/30194; Medical Research Council: G0800270, MC_QA137853, MR/J00345X/1, MR/L003120/1; NHLBI NIH HHS: R01 HL089856, R01 HL089897, U01 HL089856, U01 HL089897; NIEHS NIH HHS: R25 ES011080

    Human molecular genetics 2017;26;8;1584-1596

  • How to use… lymph node biopsy in paediatrics.

    Farndon S, Behjati S, Jonas N and Messahel B

    Cancer Genome Project, Wellcome Trust Sanger Institute, Cambridge, UK.

    Lymphadenopathy is a common finding in children. It often causes anxiety among parents and healthcare professionals because it can be a sign of cancer. There is limited high-quality evidence to guide clinicians as to which children should be referred for lymph node biopsy. The gold standard method for evaluating lymphadenopathy of unknown cause is an excision biopsy. In this Interpretation, we discuss the use of lymph node biopsy in children.

    Archives of disease in childhood. Education and practice edition 2017;102;5;244-248

  • Association of Genetic Variants Related to CETP Inhibitors and Statins With Lipoprotein Levels and Cardiovascular Risk.

    Ference BA, Kastelein JJP, Ginsberg HN, Chapman MJ, Nicholls SJ, Ray KK, Packard CJ, Laufs U, Brook RD, Oliver-Williams C, Butterworth AS, Danesh J, Smith GD, Catapano AL and Sabatine MS

    Division of Cardiovascular Medicine, Wayne State University School of Medicine, Detroit, Michigan.

    Importance: Some cholesteryl ester transfer protein (CETP) inhibitors lower low-density lipoprotein cholesterol (LDL-C) levels without reducing cardiovascular events, suggesting that the clinical benefit of lowering LDL-C may depend on how LDL-C is lowered.

    Objective: To estimate the association between changes in levels of LDL-C (and other lipoproteins) and the risk of cardiovascular events related to variants in the CETP gene, both alone and in combination with variants in the 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) gene.

    Design, setting, and participants: Mendelian randomization analyses evaluating the association between CETP and HMGCR scores, changes in lipid and lipoprotein levels, and the risk of cardiovascular events involving 102 837 participants from 14 cohort or case-control studies conducted in North America or the United Kingdom between 1948 and 2012. The associations with cardiovascular events were externally validated in 189 539 participants from 48 studies conducted between 2011 and 2015.

    Exposures: Differences in mean high-density lipoprotein cholesterol (HDL-C), LDL-C, and apolipoprotein B (apoB) levels in participants with CETP scores at or above vs below the median.

    Main outcomes and measures: Odds ratio (OR) for major cardiovascular events.

    Results: The primary analysis included 102 837 participants (mean age, 59.9 years; 58% women) who experienced 13 821 major cardiovascular events. The validation analyses included 189 539 participants (mean age, 58.5 years; 39% women) with 62 240 cases of coronary heart disease (CHD). Considered alone, the CETP score was associated with higher levels of HDL-C, lower LDL-C, concordantly lower apoB, and a corresponding lower risk of major vascular events (OR, 0.946 [95% CI, 0.921-0.972]) that was similar in magnitude to the association between the HMGCR score and risk of major cardiovascular events per unit change in levels of LDL-C (and apoB). When combined with the HMGCR score, the CETP score was associated with the same reduction in LDL-C levels but an attenuated reduction in apoB levels and a corresponding attenuated nonsignificant risk of major cardiovascular events (OR, 0.985 [95% CI, 0.955-1.015]). In external validation analyses, a genetic score consisting of variants with naturally occurring discordance between levels of LDL-C and apoB was associated with a similar risk of CHD per unit change in apoB level (OR, 0.782 [95% CI, 0.720-0.845] vs 0.793 [95% CI, 0.774-0.812]; P = .79 for difference), but a significantly attenuated risk of CHD per unit change in LDL-C level (OR, 0.916 [95% CI, 0.890-0.943] vs 0.831 [95% CI, 0.816-0.847]; P < .001) compared with a genetic score associated with concordant changes in levels of LDL-C and apoB.

    Conclusions and relevance: Combined exposure to variants in the genes that encode the targets of CETP inhibitors and statins was associated with discordant reductions in LDL-C and apoB levels and a corresponding risk of cardiovascular events that was proportional to the attenuated reduction in apoB but significantly less than expected per unit change in LDL-C. The clinical benefit of lowering LDL-C levels may therefore depend on the corresponding reduction in apoB-containing lipoprotein particles.

    Funded by: British Heart Foundation: RG/08/014/24067; Medical Research Council: MC_UU_12013/1, MR/L003120/1

    JAMA 2017;318;10;947-956

  • Arc Requires PSD95 for Assembly into Postsynaptic Complexes Involved with Neural Dysfunction and Intelligence.

    Fernández E, Collins MO, Frank RAW, Zhu F, Kopanitsa MV, Nithianantharajah J, Lemprière SA, Fricker D, Elsegood KA, McLaughlin CL, Croning MDR, Mclean C, Armstrong JD, Hill WD, Deary IJ, Cencelli G, Bagni C, Fromer M, Purcell SM, Pocklington AJ, Choudhary JS, Komiyama NH and Grant SGN

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK; KU Leuven, Center for Human Genetics and Leuven Institute for Neurodegenerative Diseases (LIND), and VIB Center for the Biology of Disease, Leuven, Belgium.

    Arc is an activity-regulated neuronal protein, but little is known about its interactions, assembly into multiprotein complexes, and role in human disease and cognition. We applied an integrated proteomic and genetic strategy by targeting a tandem affinity purification (TAP) tag and Venus fluorescent protein into the endogenous Arc gene in mice. This allowed biochemical and proteomic characterization of native complexes in wild-type and knockout mice. We identified many Arc-interacting proteins, of which PSD95 was the most abundant. PSD95 was essential for Arc assembly into 1.5-MDa complexes and activity-dependent recruitment to excitatory synapses. Integrating human genetic data with proteomic data showed that Arc-PSD95 complexes are enriched in schizophrenia, intellectual disability, autism, and epilepsy mutations and normal variants in intelligence. We propose that Arc-PSD95 postsynaptic complexes potentially affect human cognitive function.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Medical Research Council: G0801418, G0802238, MR/K026992/1, MR/L010305/1, MR/M501682/1, MR/P005748/1; Wellcome Trust

    Cell reports 2017;21;3;679-691

  • An efficient method for generation of bi-allelic null mutant mouse embryonic stem cells and its application for investigating epigenetic modifiers.

    Fisher CL, Marks H, Cho LT, Andrews R, Wormald S, Carroll T, Iyer V, Tate P, Rosen B, Stunnenberg HG, Fisher AG and Skarnes WC

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Mouse embryonic stem (ES) cells are a popular model system to study biological processes, though uncovering recessive phenotypes requires inactivating both alleles. Building upon resources from the International Knockout Mouse Consortium (IKMC), we developed a targeting vector for second allele inactivation in conditional-ready IKMC 'knockout-first' ES cell lines. We applied our technology to several epigenetic regulators, recovering bi-allelic targeted clones with a high efficiency of 60% and using Flp recombinase to restore expression in two null cell lines to demonstrate how our system confirms causality through mutant phenotype reversion. We designed our strategy to select against re-targeting the 'knockout-first' allele and identify essential genes in ES cells, including the histone methyltransferase Setdb1. For confirmation, we exploited the flexibility of our system, enabling tamoxifen inducible conditional gene ablation while controlling for genetic background and tamoxifen effects. Setdb1 ablated ES cells exhibit severe growth inhibition, which is not rescued by exogenous Nanog expression or culturing in naive pluripotency '2i' media, suggesting that the self-renewal defect is mediated through pathways independent of the pluripotency network. Our strategy to generate null mutant mouse ES cells is applicable to thousands of genes and repurposes existing IKMC Intermediate Vectors.

    Funded by: Medical Research Council: MC_U120027516

    Nucleic acids research 2017;45;21;e174

  • Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

    Flannick J, Fuchsberger C, Mahajan A, Teslovich TM, Agarwala V, Gaulton KJ, Caulkins L, Koesterer R, Ma C, Moutsianas L, McCarthy DJ, Rivas MA, Perry JRB, Sim X, Blackwell TW, Robertson NR, Rayner NW, Cingolani P, Locke AE, Tajes JF, Highland HM, Dupuis J, Chines PS, Lindgren CM, Hartl C, Jackson AU, Chen H, Huyghe JR, van de Bunt M, Pearson RD, Kumar A, Müller-Nurasyid M, Grarup N, Stringham HM, Gamazon ER, Lee J, Chen Y, Scott RA, Below JE, Chen P, Huang J, Go MJ, Stitzel ML, Pasko D, Parker SCJ, Varga TV, Green T, Beer NL, Day-Williams AG, Ferreira T, Fingerlin T, Horikoshi M, Hu C, Huh I, Ikram MK, Kim BJ, Kim Y, Kim YJ, Kwon MS, Lee J, Lee S, Lin KH, Maxwell TJ, Nagai Y, Wang X, Welch RP, Yoon J, Zhang W, Barzilai N, Voight BF, Han BG, Jenkinson CP, Kuulasmaa T, Kuusisto J, Manning A, Ng MCY, Palmer ND, Balkau B, Stančáková A, Abboud HE, Boeing H, Giedraitis V, Prabhakaran D, Gottesman O, Scott J, Carey J, Kwan P, Grant G, Smith JD, Neale BM, Purcell S, Butterworth AS, Howson JMM, Lee HM, Lu Y, Kwak SH, Zhao W, Danesh J, Lam VKL, Park KS, Saleheen D, So WY, Tam CHT, Afzal U, Aguilar D, Arya R, Aung T, Chan E, Navarro C, Cheng CY, Palli D, Correa A, Curran JE, Rybin D, Farook VS, Fowler SP, Freedman BI, Griswold M, Hale DE, Hicks PJ, Khor CC, Kumar S, Lehne B, Thuillier D, Lim WY, Liu J, Loh M, Musani SK, Puppala S, Scott WR, Yengo L, Tan ST, Taylor HA, Thameem F, Wilson G, Wong TY, Njølstad PR, Levy JC, Mangino M, Bonnycastle LL, Schwarzmayr T, Fadista J, Surdulescu GL, Herder C, Groves CJ, Wieland T, Bork-Jensen J, Brandslund I, Christensen C, Koistinen HA, Doney ASF, Kinnunen L, Esko T, Farmer AJ, Hakaste L, Hodgkiss D, Kravic J, Lyssenko V, Hollensted M, Jørgensen ME, Jørgensen T, Ladenvall C, Justesen JM, Käräjämäki A, Kriebel J, Rathmann W, Lannfelt L, Lauritzen T, Narisu N, Linneberg A, Melander O, Milani L, Neville M, Orho-Melander M, Qi L, Qi Q, Roden M, Rolandsson O, Swift A, Rosengren AH, Stirrups K, Wood AR, Mihailov E, Blancher C, Carneiro MO, Maguire J, Poplin R, Shakir K, Fennell T, DePristo M, de Angelis MH, Deloukas P, Gjesing AP, Jun G, Nilsson P, Murphy J, Onofrio R, Thorand B, Hansen T, Meisinger C, Hu FB, Isomaa B, Karpe F, Liang L, Peters A, Huth C, O'Rahilly SP, Palmer CNA, Pedersen O, Rauramaa R, Tuomilehto J, Salomaa V, Watanabe RM, Syvänen AC, Bergman RN, Bharadwaj D, Bottinger EP, Cho YS, Chandak GR, Chan JC, Chia KS, Daly MJ, Ebrahim SB, Langenberg C, Elliott P, Jablonski KA, Lehman DM, Jia W, Ma RCW, Pollin TI, Sandhu M, Tandon N, Froguel P, Barroso I, Teo YY, Zeggini E, Loos RJF, Small KS, Ried JS, DeFronzo RA, Grallert H, Glaser B, Metspalu A, Wareham NJ, Walker M, Banks E, Gieger C, Ingelsson E, Im HK, Illig T, Franks PW, Buck G, Trakalo J, Buck D, Prokopenko I, Mägi R, Lind L, Farjoun Y, Owen KR, Gloyn AL, Strauch K, Tuomi T, Kooner JS, Lee JY, Park T, Donnelly P, Morris AD, Hattersley AT, Bowden DW, Collins FS, Atzmon G, Chambers JC, Spector TD, Laakso M, Strom TM, Bell GI, Blangero J, Duggirala R, Tai ES, McVean G, Hanis CL, Wilson JG, Seielstad M, Frayling TM, Meigs JB, Cox NJ, Sladek R, Lander ES, Gabriel S, Mohlke KL, Meitinger T, Groop L, Abecasis G, Scott LJ, Morris AP, Kang HM, Altshuler D, Burtt NP, Florez JC, Boehnke M and McCarthy MI

    Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA.

    To investigate the genetic basis of type 2 diabetes (T2D) at high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27 million SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/14/5/30893; Medical Research Council: G0601261, G0601966, G0700931, MC_PC_13040, MC_UU_12015/1, MC_UU_12015/2, MR/K002414/1, MR/K007017/1, MR/L003120/1, MR/L01341X/1, MR/L01632X/1, MR/M501633/1, MR/M501633/2; NHLBI NIH HHS: T32 HL007055; NIA NIH HHS: P30 AG038072; NIDDK NIH HHS: F32 DK079466, K24 DK080140, P30 DK020541, P30 DK020572, P30 DK020595, R01 DK072193, R01 DK093757, R01 DK101478, R01 DK106236, R01 DK107904, U01 DK078616, U01 DK085524; NIGMS NIH HHS: U54 GM115428; NIH HHS: S10 OD018522

    Scientific data 2017;4;170179

  • Genome editing reveals a role for OCT4 in human embryogenesis.

    Fogarty NME, McCarthy A, Snijders KE, Powell BE, Kubikova N, Blakeley P, Lea R, Elder K, Wamaitha SE, Kim D, Maciulyte V, Kleinjung J, Kim JS, Wells D, Vallier L, Bertero A, Turner JMA and Niakan KK

    Human Embryo and Stem Cell Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK.

    Despite their fundamental biological and clinical importance, the molecular mechanisms that regulate the first cell fate decisions in the human embryo are not well understood. Here we use CRISPR-Cas9-mediated genome editing to investigate the function of the pluripotency transcription factor OCT4 during human embryogenesis. We identified an efficient OCT4-targeting guide RNA using an inducible human embryonic stem cell-based system and microinjection of mouse zygotes. Using these refined methods, we efficiently and specifically targeted the gene encoding OCT4 (POU5F1) in diploid human zygotes and found that blastocyst development was compromised. Transcriptomics analysis revealed that, in POU5F1-null cells, gene expression was downregulated not only for extra-embryonic trophectoderm genes, such as CDX2, but also for regulators of the pluripotent epiblast, including NANOG. By contrast, Pou5f1-null mouse embryos maintained the expression of orthologous genes, and blastocyst development was established, but maintenance was compromised. We conclude that CRISPR-Cas9-mediated genome editing is a powerful method for investigating gene function in the context of human development.

    Funded by: British Heart Foundation: FS/11/77/39327; Cancer Research UK: FC001120, FC001193; Medical Research Council: FC001120, FC001193, MC_PC_12009; Wellcome Trust: FC001120, FC001193

    Nature 2017;550;7674;67-73

  • Deletion of the MAD2L1 spindle assembly checkpoint gene is tolerated in mouse models of acute T-cell lymphoma and hepatocellular carcinoma.

    Foijer F, Albacker LA, Bakker B, Spierings DC, Yue Y, Xie SZ, Davis S, Lutum-Jehle A, Takemoto D, Hare B, Furey B, Bronson RT, Lansdorp PM, Bradley A and Sorger PK

    European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

    Chromosome instability (CIN) is deleterious to normal cells because of the burden of aneuploidy. However, most human solid tumors have an abnormal karyotype, implying that gain and loss of chromosomes by cancer cells confers a selective advantage. CIN can be induced in the mouse by inactivating the spindle assembly checkpoint (SAC). This is lethal in the germline, but we show here that adult T cells and hepatocytes can survive conditional inactivation of the Mad2l1 SAC gene and the resulting CIN. This causes rapid onset of T-cell acute lymphoblastic leukemia (T-ALL) and progressive development of hepatocellular carcinoma (HCC), both lethal diseases. The resulting DNA copy number variation and patterns of chromosome loss and gain are tumor-type specific, suggesting differential selective pressures on the two tumor cell types.

    Funded by: NCI NIH HHS: P01 CA139980, R01 CA084179

    eLife 2017;6

  • Conservation and diversification of small RNA pathways within flatworms.

    Fontenla S, Rinaldi G, Smircich P and Tort JF

    Departamento de Genética, Facultad de Medicina, Universidad de la República (UDELAR), Gral. Flores 2125, CP11800, Montevideo, MVD, Uruguay.

    Background: Small non-coding RNAs, including miRNAs, and gene silencing mediated by RNA interference have been described in free-living and parasitic lineages of flatworms, but only a few key factors of the small RNA pathways have been exhaustively investigated in a limited number of species. The availability of flatworm draft genomes and predicted proteomes allowed us to perform an extended survey of the genes involved in small non-coding RNA pathways in this phylum.

    Results: Overall, the findings show that the small non-coding RNA pathways are conserved in all the analyzed flatworm lineages; however, notable peculiarities were identified. While Piwi genes are amplified in free-living worms, they are completely absent in all parasitic species. Remarkably, all flatworms share a specific Argonaute family (FL-Ago) that has been independently amplified in different lineages. Other key factors such as Dicer are also duplicated, with Dicer-2 showing structural differences between trematodes, cestodes and free-living flatworms. Similarly, a very divergent GW182 Argonaute-interacting protein was identified in all flatworm lineages. In contrast, genes involved in the amplification of the RNA interference signal were detected only in the ancestral free-living species Macrostomum lignano. Here we describe all the putative small RNA pathways present in both free-living and parasitic flatworm lineages.

    Conclusion: These findings highlight innovations that evolved specifically in platyhelminths, presumably associated with novel mechanisms of gene expression regulation mediated by small RNA pathways that differ from those classically described in model organisms. Understanding these phylum-specific innovations and the differences between free-living and parasitic species might provide clues to adaptations to parasitism, and would be relevant to the development of gene-silencing technology for parasitic flatworms that infect hundreds of millions of people worldwide.

    BMC evolutionary biology 2017;17;1;215

  • Mycoplasma genitalium: whole genome sequence analysis, recombination and population structure.

    Fookes MC, Hadfield J, Harris S, Parmar S, Unemo M, Jensen JS and Thomson NR

    Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Background: Although Mycoplasma genitalium is a common sexually transmitted pathogen causing clinically distinct diseases in both males and females, few genomes have been sequenced to date, mainly because of its fastidious nature and slow growth. Hence, we lack a robust phylogenetic framework to provide insights into the population structure of the species. Currently, our understanding of the nature and diversity of M. genitalium relies on molecular tests targeting specific genes or regions of the genome, and knowledge is limited by general under-testing internationally. This is set against a background of drug resistance, with M. genitalium having developed resistance to nearly all therapeutic antimicrobials.

    Results: We sequenced 28 genomes of Mycoplasma genitalium from temporally (1980-2010) and geographically (Europe, Japan, Australia) diverse sources. All the strains showed essentially the same genomic content, with no accessory regions found. However, we identified extensive recombination across their genomes, with a total of 25 regions showing heightened levels of SNP density. These regions include the MgPar loci, associated with host interactions, as well as other genes that could also be involved in this role. Using these data, we generated a robust phylogeny which shows that there are two main clades with differing degrees of genomic variability. SNPs found in region V of 23S rRNA and in parC were consistent with azithromycin/erythromycin and fluoroquinolone resistance, respectively, and with their phenotypic MIC data.

    Conclusions: The sequence data generated here are essential for designing rational approaches to type and track Mycoplasma genitalium as antibiotic resistance increases. They represent a first approach to the population genetics of the species, to better appreciate the role of this organism as a sexually transmitted pathogen.

    Funded by: Wellcome Trust: 098051

    BMC genomics 2017;18;1;993

  • Genome-wide genetic screening with chemically mutagenized haploid embryonic stem cells.

    Forment JV, Herzog M, Coates J, Konopka T, Gapp BV, Nijman SM, Adams DJ, Keane TM and Jackson SP

    The Wellcome Trust and Cancer Research UK Gurdon Institute, and Department of Biochemistry, University of Cambridge, Cambridge, UK.

    In model organisms, classical genetic screening via random mutagenesis provides key insights into the molecular bases of genetic interactions, helping to define synthetic lethality, synthetic viability and drug-resistance mechanisms. The limited genetic tractability of diploid mammalian cells, however, precludes this approach. Here, we demonstrate the feasibility of classical genetic screening in mammalian systems by using haploid cells, chemical mutagenesis and next-generation sequencing, providing a new tool to explore mammalian genetic interactions.

    Funded by: Cancer Research UK: 13031, A11224; European Research Council: 311166

    Nature chemical biology 2017;13;1;12-14

  • Illuminating microbial diversity.

    Forster SC

    Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK and the Hudson Institute of Medical Research, Clayton, Victoria 3168, Australia.

    Nature reviews. Microbiology 2017;15;10;578

  • Identification of highly-protective combinations of Plasmodium vivax recombinant proteins for vaccine development.

    França CT, White MT, He WQ, Hostetler JB, Brewster J, Frato G, Malhotra I, Gruszczyk J, Huon C, Lin E, Kiniboro B, Yadava A, Siba P, Galinski MR, Healer J, Chitnis C, Cowman AF, Takashima E, Tsuboi T, Tham WH, Fairhurst RM, Rayner JC, King CL and Mueller I

    Division of Population Health and Immunity, Walter and Eliza Hall Institute, Parkville, Australia.

    The study of antigenic targets of naturally-acquired immunity is essential to identify and prioritize antigens for further functional characterization. We measured total IgG antibodies to 38 P. vivax antigens, investigating their relationship with prospective risk of malaria in a cohort of 1-3-year-old Papua New Guinean children. Using simulated annealing algorithms, we investigated for the first time the potential protective efficacy of antibodies to multiple antigen combinations and the antibody thresholds associated with protection. High antibody levels to multiple known and newly identified proteins were strongly associated with protection (IRR 0.44-0.74, p<0.001-0.041). Among the five-antigen combinations with the strongest protective effect (>90%), EBP, DBPII, RBP1a, CyRPA, and PVX_081550 were most frequently identified, several of them requiring very low antibody levels to show a protective association. These data identify individual antigens that should be prioritized for further functional testing and establish a clear path to testing a multicomponent P. vivax vaccine.

    Funded by: Medical Research Council: MR/J002283/1, MR/L012170/1; NIAID NIH HHS: U19 AI089686; NIH HHS: P51 OD011132; Wellcome Trust: 098051

    eLife 2017;6
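    The abstract names simulated annealing as the search strategy over antigen combinations. The Python sketch below illustrates only that general technique: it anneals over fixed-size subsets of candidate items to maximise a user-supplied score. Several pieces are assumptions made for illustration: the score function, the swap-one-item neighbourhood, the geometric cooling schedule and the name `anneal_subset`. The study's actual objective, protective efficacy under fitted antibody thresholds, is not reproduced here.

```python
import math
import random


def anneal_subset(items, k, score, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over k-element subsets of items, maximising score(subset).

    Hypothetical helper: score is any callable mapping a list of items to a number.
    """
    rng = random.Random(seed)
    current = rng.sample(items, k)
    cur_score = score(current)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(n_iter):
        outside = [x for x in items if x not in current]
        if not outside:  # k == len(items): only one possible subset
            break
        # Neighbour move: replace one chosen item with one not currently chosen
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_score = score(candidate)
        delta = cand_score - cur_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        temp *= cooling  # geometric cooling makes worse moves ever less likely
    return sorted(best), best_score
```

    A toy additive score such as `sum` makes a convenient sanity check, because its global optimum is simply the k largest items.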

  • Genome-wide transposon screening and quantitative insertion site sequencing for cancer gene discovery in mice.

    Friedrich MJ, Rad L, Bronner IF, Strong A, Wang W, Weber J, Mayho M, Ponstingl H, Engleitner T, Grove C, Pfaus A, Saur D, Cadiñanos J, Quail MA, Vassiliou GS, Liu P, Bradley A and Rad R

    The Wellcome Trust Sanger Institute, Genome Campus, Hinxton/Cambridge, UK.

    Transposon-mediated forward genetics screening in mice has emerged as a powerful tool for cancer gene discovery. It pinpoints cancer drivers that are difficult to find with other approaches, thus complementing the sequencing-based census of human cancer genes. We describe here a large series of mouse lines for insertional mutagenesis that are compatible with two transposon systems, PiggyBac and Sleeping Beauty, and give guidance on the use of different engineered transposon variants for constitutive or tissue-specific cancer gene discovery screening. We also describe a method for semiquantitative transposon insertion site sequencing (QiSeq). The QiSeq library preparation protocol exploits acoustic DNA fragmentation to reduce bias inherent to widely used restriction-digestion-based approaches for ligation-mediated insertion site amplification. Extensive multiplexing in combination with next-generation sequencing allows affordable ultra-deep transposon insertion site recovery in high-throughput formats within 1 week. Finally, we describe principles of data analysis and interpretation for obtaining insights into cancer gene function and genetic tumor evolution.

    Funded by: Medical Research Council: MC_PC_12009

    Nature protocols 2017;12;2;289-309

  • Document retrieval on repetitive string collections.

    Gagie T, Hartikainen A, Karhu K, Kärkkäinen J, Navarro G, Puglisi SJ and Sirén J

    CeBiB - Center of Biotechnology and Bioengineering, School of Computer Science and Telecommunications, Diego Portales University, Santiago, Chile.

    Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple [Formula: see text] model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.

    Funded by: Wellcome Trust

    Information retrieval 2017;20;3;253-291
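    The document retrieval abstract above defines three query types. To make them concrete, here is a naive, uncompressed Python baseline that simply scans every document. It shows only what each query returns. It is not the interleaved-LCP or precomputed-document-list indexes the paper develops. The function names and the tie-breaking rule are hypothetical.

```python
def occurrences(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def document_listing(docs: list[str], pattern: str) -> list[int]:
    """Document listing: ids of every document in which pattern appears."""
    return [i for i, doc in enumerate(docs) if pattern in doc]


def document_counting(docs: list[str], pattern: str) -> int:
    """Document counting: how many documents contain pattern."""
    return len(document_listing(docs, pattern))


def top_k_documents(docs: list[str], pattern: str, k: int) -> list[int]:
    """Top-k retrieval: ids of the k documents where pattern occurs most often.

    Ties are broken by smaller document id (an arbitrary choice for this sketch).
    """
    freqs = [(i, occurrences(doc, pattern)) for i, doc in enumerate(docs)]
    ranked = sorted((p for p in freqs if p[1] > 0), key=lambda p: (-p[1], p[0]))
    return [i for i, _ in ranked[:k]]
```

    Consider docs = ["abracadabra", "cadabra", "zebra"] with the pattern "abra". Listing returns [0, 1], counting returns 2 and top-1 retrieval returns [0], because "abra" occurs twice in the first document. A compressed index answers the same queries without scanning each document.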

  • Wheeler graphs: A framework for BWT-based data structures.

    Gagie T, Manzini G and Sirén J

    Diego Portales University and CEBIB, Santiago, Chile.

    The famous Burrows-Wheeler Transform (BWT) was originally defined for a single string but variations have been developed for sets of strings, labeled trees, de Bruijn graphs, etc. In this paper we propose a framework that includes many of these variations and that we hope will simplify the search for more. We first define Wheeler graphs and show they have a property we call path coherence. We show that if the state diagram of a finite-state automaton is a Wheeler graph then, by its path coherence, we can order the nodes such that, for any string, the nodes reachable from the initial state or states by processing that string are consecutive. This means that even if the automaton is non-deterministic, we can still store it compactly and process strings with it quickly. We then rederive several variations of the BWT by designing straightforward finite-state automata for the relevant problems and showing that their state diagrams are Wheeler graphs.

    Theoretical computer science 2017;698;67-78
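    The Wheeler graph abstract rests on one key property: the nodes reached by processing a string are consecutive in the chosen order. That property is easiest to see in the classical single-string BWT that the framework generalises. The Python sketch below builds the BWT of a string and runs backward search. Backward search narrows one contiguous range of sorted rotations, one pattern symbol at a time. The function names are hypothetical, and rotation sorting is quadratic while rank is a linear scan, so this is for illustration only. It is not the paper's construction for general automata.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler Transform by sorting all rotations (quadratic; illustration only).

    Assumes text ends with a unique sentinel '$' that sorts before every other symbol.
    """
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' sentinel")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)


def backward_search(last: str, pattern: str) -> tuple[int, int]:
    """Half-open range [lo, hi) of sorted rotations that start with pattern.

    The answer is always a single contiguous range: the single-string case of
    the ordering property that Wheeler graphs generalise.
    """
    counts = Counter(last)
    # C[c] = number of symbols in the text that are smaller than c
    C, running = {}, 0
    for c in sorted(counts):
        C[c] = running
        running += counts[c]
    lo, hi = 0, len(last)
    for c in reversed(pattern):  # extend the match one symbol to the left
        if c not in C:
            return (0, 0)
        # last[:i].count(c) is the rank of c among the first i rows; practical
        # indexes replace this linear scan with a succinct rank structure
        lo = C[c] + last[:lo].count(c)
        hi = C[c] + last[:hi].count(c)
        if lo >= hi:
            return (lo, lo)
    return (lo, hi)
```

    For example, `bwt("banana$")` returns `"annb$aa"`, and `backward_search("annb$aa", "ana")` returns `(2, 4)`. Those are two consecutive sorted rotations, matching the two occurrences of "ana" in "banana".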

  • P113 is a merozoite surface protein that binds the N terminus of Plasmodium falciparum RH5.

    Galaway F, Drought LG, Fala M, Cross N, Kemp AC, Rayner JC and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.

    Invasion of erythrocytes by Plasmodium falciparum merozoites is necessary for malaria pathogenesis and is therefore a primary target for vaccine development. RH5 is a leading subunit vaccine candidate because anti-RH5 antibodies inhibit parasite growth and the interaction with its erythrocyte receptor basigin is essential for invasion. RH5 is secreted, complexes with other parasite proteins including CyRPA and RIPR, and contains a conserved N-terminal region (RH5Nt) of unknown function that is cleaved from the native protein. Here, we identify P113 as a merozoite surface protein that directly interacts with RH5Nt. Using recombinant proteins and a sensitive protein interaction assay, we establish the binding interdependencies of all the other known RH5 complex components and conclude that the RH5Nt-P113 interaction provides a releasable mechanism for anchoring RH5 to the merozoite surface. We exploit these findings to design a chemically synthesized peptide corresponding to RH5Nt, which could contribute to a cost-effective malaria vaccine.

    Nature communications 2017;8;14333

  • Epigenetic germline inheritance in mammals: looking to the past to understand the future.

    Gapp K and Bohacek J

    Gurdon Institute, University of Cambridge, Cambridge, UK.

    Life experiences can induce epigenetic changes in mammalian germ cells, which can influence the developmental trajectory of the offspring and impact health and disease across generations. While this concept of epigenetic germline inheritance has long been met with skepticism, evidence in support of this route of information transfer is now overwhelming, and some key mechanisms underlying germline transmission of acquired information are emerging. This review focuses specifically on sperm RNAs as causal vectors of inheritance. We examine how they might become altered in the germline, and how different classes of sperm RNAs might interact with other epimodifications in germ cells or in the zygote. We integrate the latest findings with earlier pioneering work in this field, point out major questions and challenges, and suggest how new experiments could address them.

    Genes, brain, and behavior 2017

  • Transcription Factor Activities Enhance Markers of Drug Sensitivity in Cancer.

    Garcia-Alonso L, Iorio F, Matchan A, Fonseca N, Jaaks P, Peat G, Pignatelli M, Falcone F, Benes CH, Dunham I, Bignell G, McDade SS, Garnett MJ and Saez-Rodriguez J

    European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom.

    Transcriptional dysregulation induced by aberrant transcription factors (TF) is a key feature of cancer, but its global influence on drug sensitivity has not been examined. Here, we infer the transcriptional activity of 127 TFs through analysis of RNA-seq gene expression data newly generated for 448 cancer cell lines, combined with publicly available datasets to survey a total of 1,056 cancer cell lines and 9,250 primary tumors. Predicted TF activities are supported by their agreement with independent shRNA essentiality profiles and homozygous gene deletions, and recapitulate mutant-specific mechanisms of transcriptional dysregulation in cancer. By analyzing cell line responses to 265 compounds, we uncovered numerous TFs whose activity interacts with anticancer drugs. Importantly, combining existing pharmacogenomic markers with TF activities often improves the stratification of cell lines in response to drug treatment. Our results, which can be queried freely at, offer a broad foundation for discovering opportunities to refine personalized cancer therapies.

    Significance: Systematic analysis of transcriptional dysregulation in cancer cell lines and patient tumor specimens offers a publicly searchable foundation to discover new opportunities to refine personalized cancer therapies. Cancer Res; 78(3); 769-80. ©2017 AACR.

    Funded by: Wellcome Trust: 102696

    Cancer research 2017;78;3;769-780

  • Platelet responses to agonists in a cohort of highly characterised platelet donors are consistent over time.

    Garner SF, Furnell A, Kahan BC, Jones CI, Attwood A, Harrison P, Kelly AM, Goodall AH, Cardigan R and Ouwehand WH

    NHS Blood and Transplant, Cambridge, UK.

    Background and objectives: Platelet function shows significant inheritance that is at least partially genetically controlled. There is also evidence that the platelet response is stable over time, but there are few studies that have assessed consistency of platelet function over months and years. We aimed to measure platelet function in platelet donors over time in individuals selected from a cohort of 956 donors whose platelet function had been previously characterised.

    Materials and methods: Platelet function was assessed by flow cytometry, measuring fibrinogen binding and P-selectin expression after stimulation with either cross-linked collagen-related peptide or adenosine 5'-diphosphate. Eighty-nine donors from the Cambridge Platelet Function Cohort whose platelet responses were initially within the lower or upper decile of reactivity were retested between 4 months and five and a half years later.

    Results: There was moderate-to-high correlation between the initial and repeat platelet function results for all assays (P ≤ 0·007, r² 0·2961-0·7625); furthermore, the range of results observed in the initial low and high responder groups remained significantly different at the time of the second test (P ≤ 0·0005).

    Conclusion: Platelet function remains consistent over time. This implies that this potential influence on the quality of donated platelet concentrates will remain essentially constant for a given donor.

    Funded by: British Heart Foundation: RG/09/012/28096; Department of Health: RP-PG-0310-1002

    Vox sanguinis 2017;112;1;18-24

  • Minimal genetic change in Vibrio cholerae in Mozambique over time: Multilocus variable number tandem repeat analysis and whole genome sequencing.

    Garrine M, Mandomando I, Vubil D, Nhampossa T, Acacio S, Li S, Paulson JN, Almeida M, Domman D, Thomson NR, Alonso P and Stine OC

    Centro de Investigação em Saúde de Manhiça (CISM), Maputo, Mozambique.

    Although cholera is a major public health concern in Mozambique, its transmission patterns remain unknown. We surveyed the genetic relatedness of 75 Vibrio cholerae isolates from patients at Manhiça District Hospital between 2002 and 2012 and 3 isolates from a river, using multilocus variable-number tandem-repeat analysis (MLVA) and whole genome sequencing (WGS). MLVA revealed 22 genotypes in two clonal complexes and four unrelated genotypes. WGS revealed i) the presence of recombination, ii) that 67 isolates descended monophyletically from a single source connected to Wave 3 of the Seventh Pandemic, and iii) four clinical isolates lacking the cholera toxin gene. This Wave 3 strain persisted for at least eight years, either in an environmental reservoir or circulating within the human population. Our data raise important questions about where these isolates persist and how identical isolates can be collected years apart, despite our understanding of the high rate of change of MLVA loci and the V. cholerae molecular clock.

    Funded by: NIAID NIH HHS: R01 AI123422

    PLoS neglected tropical diseases 2017;11;6;e0005671

  • No genetic association between attention-deficit/hyperactivity disorder (ADHD) and Parkinson's disease in nine ADHD candidate SNPs.

    Geissler JM, International Parkinson Disease Genomics Consortium members, Romanos M, Gerlach M, Berg D and Schulte C

    Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Margarete-Höppel-Platz 1, 97080, Würzburg, Germany.

    Attention-deficit/hyperactivity disorder (ADHD) and Parkinson's disease (PD) involve pathological changes in brain structures such as the basal ganglia, which are essential for the control of motor and cognitive behavior and impulsivity. The causes of ADHD and PD remain unknown, but there is increasing evidence that both result from a complicated interplay of genetic and environmental factors affecting numerous cellular processes and brain regions. To explore the possibility of common genetic pathways within the respective pathophysiologies, nine ADHD candidate single nucleotide polymorphisms (SNPs) in seven genes were tested for association with PD in 5,333 cases and 12,019 healthy controls. These comprised one variant each in the genes coding for synaptosomal-associated protein 25 k (SNAP25), the dopamine (DA) transporter (SLC6A3; DAT1), DA receptor D4 (DRD4), serotonin receptor 1B (HTR1B), tryptophan hydroxylase 2 (TPH2) and the norepinephrine transporter (SLC6A2), and three SNPs in cadherin 13 (CDH13). Information was extracted from a recent meta-analysis of five genome-wide association studies, in which 7,689,524 SNPs in European samples were successfully imputed. No significant association was observed after correction for multiple testing. Therefore, it is reasonable to conclude that candidate variants implicated in the pathogenesis of ADHD do not play a substantial role in PD.

    Funded by: Intramural NIH HHS: ZIA AG000933-03; NINDS NIH HHS: R01 NS075321; Parkinson's UK: J-0901; Wellcome Trust

    Attention deficit and hyperactivity disorders 2017;9;2;121-127

  • MPRAnator: a web-based tool for the design of massively parallel reporter assay experiments.

    Georgakopoulos-Soares I, Jain N, Gray JM and Hemberg M

    Department of Computational Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    Motivation: With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as massively parallel reporter assays (MPRAs) and similar methods remains challenging.

    Results: We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the investigation of the rules that govern transcription factor (TF) occupancy. MPRA single-nucleotide polymorphism design can be used to systematically examine the functional effects of single or combinations of single-nucleotide polymorphisms at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs.

    Availability and implementation: The MPRAnator tool set is implemented in Python, Perl and JavaScript and is freely available online; the source code is available under the MIT license. The REST API allows programmatic access to MPRAnator using simple URLs.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: NIMH NIH HHS: R01 MH101528; Wellcome Trust

    Bioinformatics (Oxford, England) 2017;33;1;137-138

  • Precision oncology for acute myeloid leukemia using a knowledge bank approach.

    Gerstung M, Papaemmanuil E, Martincorena I, Bullinger L, Gaidzik VI, Paschka P, Heuser M, Thol F, Bolli N, Ganly P, Ganser A, McDermott U, Döhner K, Schlenk RF, Döhner H and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    Underpinning the vision of precision medicine is the concept that causative mutations in a patient's cancer drive its biology and, by extension, its clinical features and treatment response. However, considerable between-patient heterogeneity in driver mutations complicates evidence-based personalization of cancer care. Here, by reanalyzing data from 1,540 patients with acute myeloid leukemia (AML), we explore how large knowledge banks of matched genomic-clinical data can support clinical decision-making. Inclusive, multistage statistical models accurately predicted likelihoods of remission, relapse and mortality, which were validated using data from independent patients in The Cancer Genome Atlas. Comparison of long-term survival probabilities under different treatments enables therapeutic decision support, which is available in exploratory form online. Personally tailored management decisions could reduce the number of hematopoietic cell transplants in patients with AML by 20-25% while maintaining overall survival rates. Power calculations show that databases require information from thousands of patients for accurate decision support. Knowledge banks facilitate personally tailored therapeutic decisions but require sustainable updating, inclusive cohorts and large sample sizes.

    Funded by: NCI NIH HHS: P30 CA008748; Wellcome Trust

    Nature genetics 2017;49;3;332-340

  • Evidence for Contemporary Switching of the O-Antigen Gene Cluster between Shiga Toxin-Producing Escherichia coli Strains Colonizing Cattle.

    Geue L, Menge C, Eichhorn I, Semmler T, Wieler LH, Pickard D, Berens C and Barth SA

    Friedrich-Loeffler-Institut/Federal Research Institute for Animal Health, Institute of Molecular Pathogenesis Jena, Germany.

    Shiga toxin-producing Escherichia coli (STEC) comprise a group of zoonotic enteric pathogens with ruminants, especially cattle, as the main reservoir. O-antigens are instrumental for host colonization and bacterial niche adaptation. They are highly immunogenic and, therefore, targeted by the adaptive immune system. The O-antigen is one of the most diverse bacterial cell constituents, and variation exists not only between different bacterial species but also between individual isolates/strains within a single species. We recently identified STEC persistently infecting cattle and belonging to the different serotypes O156:H25 (n = 21) and O182:H25 (n = 15) that were of the MLST sequence types ST300 or ST688. These STs differ by a single nucleotide in purA only. Fitness- and virulence-associated genome regions, and CRISPR/CAS (clustered regularly interspaced short palindromic repeats/CRISPR associated sequence) arrays of these STEC O156:H25 and O182:H25 isolates were highly similar, and identical genomic integration sites for the stx converting bacteriophages and the core LEE, identical Shiga toxin converting bacteriophage genes for stx1a, identical complete LEE loci, and identical sets of chemotaxis and flagellar genes were identified. In contrast to this genomic similarity, the nucleotide sequences of the O-antigen gene cluster (O-AGC) regions between galF and gnd and very few flanking genes differed fundamentally and were specific for the respective serotype. Sporadic aEPEC O156:H8 isolates (n = 5) were isolated in temporal and spatial proximity. While the O-AGC and the corresponding 5' and 3' flanking regions of these aEPEC isolates were identical to the respective region in the STEC O156:H25 isolates, the core genome, the virulence associated genome regions and the CRISPR/CAS elements differed profoundly. Our cumulative epidemiological and molecular data suggest a recent switch of the O-AGC between isolates, with O156:H8 strains having served as DNA donors. Such O-antigen switches can affect the evaluation of a strain's pathogenic and virulence potential, suggesting that NGS methods might lead to a more reliable risk assessment.

    Frontiers in microbiology 2017;8;424

  • A staging system for correct phenotype interpretation of mouse embryos harvested on embryonic day 14 (E14.5).

    Geyer SH, Reissig L, Rose J, Wilson R, Prin F, Szumska D, Ramirez-Solis R, Tudor C, White J, Mohun TJ and Weninger WJ

    Centre for Anatomy and Cell Biology & MIC, Medical University of Vienna, Vienna, Austria.

    We present a simple and quick system for accurately scoring the developmental progress of mouse embryos harvested on embryonic day 14 (E14.5). Based solely on the external appearance of the maturing forelimb, we provide a convenient way to distinguish six developmental sub-stages. Using a variety of objective morphometric data obtained from the commonly used C57BL/6N mouse strain, we show that these stages correlate precisely with the growth of the entire embryo and its organs. Applying the new staging system to phenotype analyses of E14.5 embryos of 58 embryonic lethal null mutant lines from the DMDD research programme and its pilot, we show that homozygous mutant embryos are frequently delayed in development. To demonstrate the importance of our staging system for correct phenotype interpretation, we describe stage-specific changes of the palate, heart and gut, and provide examples in which correct diagnosis of malformations relies on correct staging.

    Journal of anatomy 2017;230;5;710-719

  • Morphology, topology and dimensions of the heart and arteries of genetically normal and mutant mouse embryos at stages S21-S23.

    Geyer SH, Reissig LF, Hüsemann M, Höfle C, Wilson R, Prin F, Szumska D, Galli A, Adams DJ, White J, Mohun TJ and Weninger WJ

    Division of Anatomy & MIC, Medical University of Vienna, Vienna, Austria.

    Accurate identification of abnormalities in the mouse embryo depends not only on comparisons with appropriate, developmental stage-matched controls, but also on an appreciation of the range of anatomical variation that can be expected during normal development. Here we present a morphological, topological and metric analysis of the heart and arteries of mouse embryos harvested on embryonic day (E)14.5, based on digital volume data of whole embryos analysed by high-resolution episcopic microscopy (HREM). By comparing data from 206 genetically normal embryos, we have analysed the range and frequency of normal anatomical variations in the heart and major arteries across Theiler stages S21-S23. Using this, we have identified abnormalities in these structures among 298 embryos from mutant mouse lines carrying embryonic lethal gene mutations produced for the Deciphering the Mechanisms of Developmental Disorders (DMDD) programme. We present examples of both commonly occurring abnormal phenotypes and novel pathologies that most likely alter haemodynamics in these genetically altered mouse embryos. Our findings offer a reference baseline for accurately identifying abnormalities of the heart and arteries in embryos that have largely completed organogenesis.

    Journal of anatomy 2017

  • Activation of the Aryl Hydrocarbon Receptor Interferes with Early Embryonic Development.

    Gialitakis M, Tolaini M, Li Y, Pardo M, Yu L, Toribio A, Choudhary JS, Niakan K, Papayannopoulos V and Stockinger B

    The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK.

    The transcriptional program of early embryonic development is tightly regulated by a set of well-defined transcription factors that suppress premature expression of differentiation genes and sustain the pluripotent identity. It is generally accepted that this program can be perturbed by environmental factors such as chemical pollutants; however, the precise molecular mechanisms remain unknown. The aryl hydrocarbon receptor (AHR) is a widely expressed nuclear receptor that senses environmental stimuli and modulates target gene expression. Here, we have investigated the AHR interactome in embryonic stem cells by mass spectrometry and show that ectopic activation of AHR during early differentiation disrupts the differentiation program via the chromatin remodeling complex NuRD (nucleosome remodeling and deacetylation). The activated AHR/NuRD complex altered the expression of differentiation-specific genes that control the first two developmental decisions without affecting the pluripotency program. These findings identify a mechanism that allows environmental stimuli to disrupt embryonic development through AHR signaling.

    Funded by: Cancer Research UK: FC001120, FC001129, FC001159; Medical Research Council; Wellcome Trust: 100910/Z/13/Z, WT098051

    Stem cell reports 2017;9;5;1377-1386

  • Increased Expression of a MicroRNA Correlates with Anthelmintic Resistance in Parasitic Nematodes.

    Gillan V, Maitland K, Laing R, Gu H, Marks ND, Winter AD, Bartley D, Morrison A, Skuce PJ, Rezansoff AM, Gilleard JS, Martinelli A, Britton C and Devaney E

    Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

    Resistance to anthelmintic drugs is a major problem in the global fight against parasitic nematodes infecting humans and animals. While previous studies have identified mutations in drug target genes in resistant parasites, changes in the expression levels of both targets and transporters have also been reported. The mechanisms underlying these changes in gene expression are unresolved. Here, we take a novel approach to this problem by investigating the role of small regulatory RNAs in drug resistant strains of the important parasite Haemonchus contortus. microRNAs (miRNAs) are small (22 nt) non-coding RNAs that regulate gene expression by binding predominantly to the 3' UTR of mRNAs. Changes in miRNA expression have been implicated in drug resistance in a variety of tumor cells. In this study, we focused on two geographically distinct ivermectin resistant strains of H. contortus and two lines generated by multiple rounds of backcrossing between susceptible and resistant parents, with ivermectin selection. All four resistant strains showed significantly increased expression of a single miRNA, hco-miR-9551, compared to the susceptible strain. This same miRNA is also upregulated in a multi-drug-resistant strain of the related nematode Teladorsagia circumcincta. hco-miR-9551 is enriched in female worms, is likely to be located on the X chromosome and is restricted to clade V parasitic nematodes. Genes containing predicted binding sites for hco-miR-9551 were identified computationally and refined based on differential expression in a transcriptomic dataset prepared from the same drug resistant and susceptible strains. This analysis identified three putative target mRNAs, one of which, a CHAC domain containing protein, is located in a region of the H. contortus genome introgressed from the resistant parent. hco-miR-9551 was shown to interact with the 3' UTR of this gene by dual luciferase assay. This study is the first to suggest a role for miRNAs and the genes they regulate in drug resistant parasitic nematodes. miR-9551 also has potential as a biomarker of resistance in different nematode species.

    Funded by: Wellcome Trust: 086823/Z/08/Z

    Frontiers in cellular and infection microbiology 2017;7;452

  • De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms.

    Giordano F, Aigrain L, Quail MA, Coupland P, Bonfield JK, Davies RM, Tischler G, Jackson DK, Keane TM, Li J, Yue JX, Liti G, Durbin R and Ning Z

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore MinION are capable of producing long sequencing reads with average fragment lengths of over 10,000 base-pairs and maximum lengths reaching 100,000 base-pairs. Compared with short reads, the assemblies obtained from long-read sequencing platforms have much higher contig continuity and genome completeness as long fragments are able to extend paths into problematic or repetitive regions. Many successful assembly applications of the Pacific Biosciences technology have been reported ranging from small bacterial genomes to large plant and animal genomes. Recently, genome assemblies using Oxford Nanopore MinION data have attracted much attention due to the portability and low cost of this novel sequencing instrument. In this paper, we re-sequenced a well characterized genome, the Saccharomyces cerevisiae S288C strain using three different platforms: MinION, PacBio and MiSeq. We present a comprehensive metric comparison of assemblies generated by various pipelines and discuss how the platform associated data characteristics affect the assembly quality. With a given read depth of 31X, the assemblies from both Pacific Biosciences and Oxford Nanopore MinION show excellent continuity and completeness for the 16 nuclear chromosomes, but not for the mitochondrial genome, whose reconstruction still represents a significant challenge.

    Funded by: Wellcome Trust

    Scientific reports 2017;7;1;3935

  • New insights into sex chromosome evolution in anole lizards (Reptilia, Dactyloidae).

    Giovannotti M, Trifonov VA, Paoletti A, Kichigin IG, O'Brien PC, Kasai F, Giovagnoli G, Ng BL, Ruggeri P, Cerioni PN, Splendiani A, Pereira JC, Olmo E, Rens W, Caputo Barucchi V and Ferguson-Smith MA

    Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, via Brecce Bianche, 60131, Ancona, Italy.

    Anoles are a clade of iguanian lizards that underwent an extensive radiation between 125 and 65 million years ago. Their karyotypes show wide variation in diploid number spanning from 26 (Anolis evermanni) to 44 (A. insolitus). This chromosomal variation involves their sex chromosomes, ranging from simple systems (XX/XY), with heterochromosomes represented by either micro- or macrochromosomes, to multiple systems (X₁X₁X₂X₂/X₁X₂Y). Here, for the first time, the homology relationships of sex chromosomes have been investigated in nine anole lizards at the whole chromosome level. Cross-species chromosome painting using sex chromosome paints from A. carolinensis, Ctenonotus pogus and Norops sagrei and gene mapping of X-linked genes demonstrated that the anole ancestral sex chromosome system constituted by microchromosomes is retained in all the species with the ancestral karyotype (2n = 36, 12 macro- and 24 microchromosomes). In contrast, species with a derived karyotype, namely those belonging to genera Ctenonotus and Norops, show a series of rearrangements (fusions/fissions) involving autosomes/microchromosomes that led to the formation of their current sex chromosome systems. These results demonstrate that different autosomes were involved in translocations with sex chromosomes in closely related lineages of anole lizards and that several sequential microautosome/sex chromosome fusions led to a remarkable increase in size of Norops sagrei sex chromosomes.

    Chromosoma 2017;126;2;245-260

  • Pre-vaccine serotype composition within a lineage signposts its serotype replacement - a carriage study over 7 years following pneumococcal conjugate vaccine use in the UK.

    Gladstone RA, Devine V, Jones J, Cleary D, Jefferies JM, Bentley SD, Faust SN and Clarke SC

    Infection Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Serotype replacement has been reported in carriage and disease after pneumococcal conjugate vaccine (PCV) introductions in the UK and globally. We previously described concurrent expansion and decline of sequence types associated with serotype replacement over 5 years following PCV introductions in the UK. Here we use whole-genome sequencing to fully characterise the population structure of pneumococcal isolates collected over seven winters encompassing PCV7 and PCV13 introductions in the UK, investigating the importance of lineages in serotype replacement. We analysed 672 pneumococcal genomes from colonised children aged 4 years or younger. The temporal prevalence of 20 lineages, defined by hierarchical Bayesian analysis of population structure (BAPS), was assessed in the context of serotype replacement. Multiple serotypes were detected in the primary winter of sampling within three vaccine-type (VT) lineages, BAPS4, BAPS10 and BAPS11, in which serotype replacement was observed. In contrast, serotype replacement was not seen in the remaining three VT lineages (BAPS1, BAPS13 and BAPS14), which expressed a single serotype (6B, 6A and 3, respectively) in the primary winter. One lineage, BAPS1 serotype 6B, was undetectable in the population towards the end of the study period. The dynamics of serotype replacement in this UK population were preceded by the presence or absence of multiple serotypes within VT lineages in the pre-PCV population. This observation could help predict which non-vaccine types (NVTs) may be involved in replacement in future PCV introductions here and elsewhere. It could further indicate whether any antibiotic resistance associated with the lineages is likely to be affected by replacement.

    Funded by: Wellcome Trust

    Microbial genomics 2017;3;6;e000119

  • Immunogenomic approaches to understand the function of immune disease variants.

    Glinos DA, Soskic B and Trynka G

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Mapping hundreds of genetic variants through genome-wide association studies provided an opportunity to gain insights into the pathobiology of immune-mediated diseases. However, as most of the disease variants fall outside gene coding sequences, the exact functional role of the associated variants remains to be determined. The integration of disease-associated variants with large-scale genomic maps of cell-type-specific gene regulation at both chromatin and transcript levels delivers examples of functionally prioritized causal variants and genes. In particular, the enrichment of disease variants with histone marks can point towards the cell types most relevant to disease development. Furthermore, chromatin contact maps that link enhancers directly to promoter regions allow the identification of genes that can be regulated by the disease variants. Candidate genes implicated with such approaches can be further examined through the correlation of gene expression with genotypes. Additionally, in the context of immune-mediated diseases it is important to combine genomics with immunology approaches. Genotype correlations with the immune system as a whole, as well as with cellular responses to different stimuli, provide a valuable platform for understanding the functional impact of disease-associated variants. The intersection of immunogenomic resources with disease-associated variants paints a detailed picture of disease causal mechanisms. Here, we provide an overview of recent studies that combine these approaches to identify disease-vulnerable pathways.

    Funded by: Wellcome Trust: WT206194

    Immunology 2017;152;4;527-535

  • A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers.

    Glodzik D, Morganella S, Davies H, Simpson PT, Li Y, Zou X, Diez-Perez J, Staaf J, Alexandrov LB, Smid M, Brinkman AB, Rye IH, Russnes H, Raine K, Purdie CA, Lakhani SR, Thompson AM, Birney E, Stunnenberg HG, van de Vijver MJ, Martens JW, Børresen-Dale AL, Richardson AL, Kong G, Viari A, Easton D, Evan G, Campbell PJ, Stratton MR and Nik-Zainal S

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Somatic rearrangements contribute to the mutagenized landscape of cancer genomes. Here, we systematically interrogated rearrangements in 560 breast cancers by using a piecewise constant fitting approach. We identified 33 hotspots of large (>100 kb) tandem duplications, a mutational signature associated with homologous-recombination-repair deficiency. Notably, these tandem-duplication hotspots were enriched in breast cancer germline susceptibility loci (odds ratio (OR) = 4.28) and breast-specific 'super-enhancer' regulatory elements (OR = 3.54). These hotspots may be sites of selective susceptibility to double-strand-break damage due to high transcriptional activity or, through incrementally increasing copy number, may be sites of secondary selective pressure. The transcriptomic consequences ranged from strong individual oncogene effects to weak but quantifiable multigene expression effects. We thus present a somatic-rearrangement mutational process affecting coding sequences and noncoding regulatory elements and contributing a continuum of driver consequences, from modest to strong effects, thereby supporting a polygenic model of cancer development.

    Funded by: Cancer Research UK: 12077; European Research Council: 322737; Wellcome Trust: 098051

    Nature genetics 2017;49;3;341-348

  • Genetic diversity of next generation antimalarial targets: A baseline for drug resistance surveillance programmes.

    Gomes AR, Ravenhall M, Benavente ED, Talman A, Sutherland C, Roper C, Clark TG and Campino S

    Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK.

    Drug resistance is a recurrent problem in the fight against malaria. Genetic and epidemiological surveillance of antimalarial resistant parasite alleles is crucial to guide drug therapies and clinical management. New antimalarial compounds are currently at various stages of clinical trials and regulatory evaluation. Using ∼2000 Plasmodium falciparum genome sequences, we investigated the genetic diversity of eleven gene-targets of promising antimalarial compounds and assessed their potential efficiency across malaria endemic regions. We determined if the loci are under selection prior to the introduction of new drugs and established a baseline of genetic variance, including potential resistant alleles, for future surveillance programmes.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/J014567/1; Medical Research Council: MC_PC_15103, MR/K000551/1, MR/M01360X/1, MR/N010469/1

    International journal for parasitology. Drugs and drug resistance 2017;7;2;174-180

  • Immuno-oncology from the perspective of somatic evolution.

    González S, Volkova N, Beer P and Gerstung M

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.

    The past years have witnessed significant success for cancer immunotherapies that activate a patient's immune system against their cancer cells. At the same time, our understanding of the genetic changes driving tumor evolution has progressed dramatically. The study of cancer genomes has shown that tumors are best understood as cell populations governed by the rules of evolution, leading to the emergence and spread of cell lineages with pathogenic mutations. Moreover, somatic evolution can explain the acquisition of mutations conferring drug resistance in the everlasting battle for reaching even fitter cell states. Here, we review the current state of the art of somatic cancer evolution and mechanisms of immune control and escape. We also revisit the principles of immunotherapy from the perspective of somatic evolution and discuss the basic rules of resistance to immunotherapies as dictated by evolution.

    Seminars in cancer biology 2017

  • Stem cell senescence drives age-attenuated induction of pituitary tumours in mouse models of paediatric craniopharyngioma.

    Gonzalez-Meljem JM, Haston S, Carreno G, Apps JR, Pozzi S, Stache C, Kaushal G, Virasami A, Panousopoulos L, Mousavy-Gharavy SN, Guerrero A, Rashid M, Jani N, Goding CR, Jacques TS, Adams DJ, Gil J, Andoniadou CL and Martinez-Barbera JP

    Developmental Biology and Cancer Programme, Birth Defects Research Centre, UCL Institute of Child Health, London, WC1N 1EH, UK.

    Senescent cells may promote tumour progression through the activation of a senescence-associated secretory phenotype (SASP); however, whether these cells are capable of initiating tumourigenesis in vivo is not known. Expression of oncogenic β-catenin in Sox2+ young adult pituitary stem cells leads to formation of clusters of stem cells and induction of tumours resembling human adamantinomatous craniopharyngioma (ACP), derived from Sox2- cells in a paracrine manner. Here, we uncover the mechanisms underlying this paracrine tumourigenesis. We show that expression of oncogenic β-catenin in Hesx1+ embryonic precursors also results in stem cell clusters and paracrine tumours. We reveal that human and mouse clusters are analogous and share a common signature of senescence and SASP. Finally, we show that mice with reduced senescence and SASP responses exhibit decreased tumour-inducing potential. Together, we provide evidence that senescence and a stem cell-associated SASP drive cell transformation and tumour initiation in vivo in an age-dependent fashion.

    Funded by: Cancer Research UK: 13031; Medical Research Council: MC_U120085810, MR/L016729/1, MR/M000125/1; Wellcome Trust

    Nature communications 2017;8;1;1819

  • Gastrointestinal Carriage Is a Major Reservoir of Klebsiella pneumoniae Infection in Intensive Care Patients.

    Gorrie CL, Mirceta M, Wick RR, Edwards DJ, Thomson NR, Strugnell RA, Pratt NF, Garlick JS, Watson KM, Pilcher DV, McGloughlin SA, Spelman DW, Jenney AWJ and Holt KE

    Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute.

    Background: Klebsiella pneumoniae is an opportunistic pathogen and leading cause of hospital-associated infections. Intensive care unit (ICU) patients are particularly at risk. Klebsiella pneumoniae is part of the healthy human microbiome, providing a potential reservoir for infection. However, the frequency of gut colonization and its contribution to infections are not well characterized.

    Methods: We conducted a 1-year prospective cohort study in which 498 ICU patients were screened for rectal and throat carriage of K. pneumoniae shortly after admission. Klebsiella pneumoniae isolated from screening swabs and clinical diagnostic samples were characterized using whole genome sequencing and combined with epidemiological data to identify likely transmission events.

    Results: Klebsiella pneumoniae carriage frequencies were estimated at 6% (95% confidence interval [CI], 3%-8%) among ICU patients admitted direct from the community, and 19% (95% CI, 14%-51%) among those with recent healthcare contact. Gut colonization on admission was significantly associated with subsequent infection (infection risk 16% vs 3%, odds ratio [OR] = 6.9, P < .001), and genome data indicated matching carriage and infection isolates in 80% of isolate pairs. Five likely transmission chains were identified, responsible for 12% of K. pneumoniae infections in ICU. In sum, 49% of K. pneumoniae infections were caused by the patients' own unique strain, and 48% of screened patients with infections were positive for prior colonization.

    Conclusions: These data confirm K. pneumoniae colonization is a significant risk factor for infection in ICU, and indicate ~50% of K. pneumoniae infections result from patients' own microbiota. Screening for colonization on admission could limit risk of infection in the colonized patient and others.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2017;65;2;208-215

  • Genome-wide physical activity interactions in adiposity - A meta-analysis of 200,452 adults.

    Graff M, Scott RA, Justice AE, Young KL, Feitosa MF, Barata L, Winkler TW, Chu AY, Mahajan A, Hadley D, Xue L, Workalemahu T, Heard-Costa NL, den Hoed M, Ahluwalia TS, Qi Q, Ngwa JS, Renström F, Quaye L, Eicher JD, Hayes JE, Cornelis M, Kutalik Z, Lim E, Luan J, Huffman JE, Zhang W, Zhao W, Griffin PJ, Haller T, Ahmad S, Marques-Vidal PM, Bien S, Yengo L, Teumer A, Smith AV, Kumari M, Harder MN, Justesen JM, Kleber ME, Hollensted M, Lohman K, Rivera NV, Whitfield JB, Zhao JH, Stringham HM, Lyytikäinen LP, Huppertz C, Willemsen G, Peyrot WJ, Wu Y, Kristiansson K, Demirkan A, Fornage M, Hassinen M, Bielak LF, Cadby G, Tanaka T, Mägi R, van der Most PJ, Jackson AU, Bragg-Gresham JL, Vitart V, Marten J, Navarro P, Bellis C, Pasko D, Johansson Å, Snitker S, Cheng YC, Eriksson J, Lim U, Aadahl M, Adair LS, Amin N, Balkau B, Auvinen J, Beilby J, Bergman RN, Bergmann S, Bertoni AG, Blangero J, Bonnefond A, Bonnycastle LL, Borja JB, Brage S, Busonero F, Buyske S, Campbell H, Chines PS, Collins FS, Corre T, Smith GD, Delgado GE, Dueker N, Dörr M, Ebeling T, Eiriksdottir G, Esko T, Faul JD, Fu M, Færch K, Gieger C, Gläser S, Gong J, Gordon-Larsen P, Grallert H, Grammer TB, Grarup N, van Grootheest G, Harald K, Hastie ND, Havulinna AS, Hernandez D, Hindorff L, Hocking LJ, Holmens OL, Holzapfel C, Hottenga JJ, Huang J, Huang T, Hui J, Huth C, Hutri-Kähönen N, James AL, Jansson JO, Jhun MA, Juonala M, Kinnunen L, Koistinen HA, Kolcic I, Komulainen P, Kuusisto J, Kvaløy K, Kähönen M, Lakka TA, Launer LJ, Lehne B, Lindgren CM, Lorentzon M, Luben R, Marre M, Milaneschi Y, Monda KL, Montgomery GW, De Moor MHM, Mulas A, Müller-Nurasyid M, Musk AW, Männikkö R, Männistö S, Narisu N, Nauck M, Nettleton JA, Nolte IM, Oldehinkel AJ, Olden M, Ong KK, Padmanabhan S, Paternoster L, Perez J, Perola M, Peters A, Peters U, Peyser PA, Prokopenko I, Puolijoki H, Raitakari OT, Rankinen T, Rasmussen-Torvik LJ, Rawal R, Ridker PM, Rose LM, Rudan I, Sarti C, Sarzynski MA, Savonen K, Scott WR, Sanna S, Shuldiner AR, Sidney S, Silbernagel G, Smith BH, Smith JA, Snieder H, Stančáková A, Sternfeld B, Swift AJ, Tammelin T, Tan ST, Thorand B, Thuillier D, Vandenput L, Vestergaard H, van Vliet-Ostaptchouk JV, Vohl MC, Völker U, Waeber G, Walker M, Wild S, Wong A, Wright AF, Zillikens MC, Zubair N, Haiman CA, Lemarchand L, Gyllensten U, Ohlsson C, Hofman A, Rivadeneira F, Uitterlinden AG, Pérusse L, Wilson JF, Hayward C, Polasek O, Cucca F, Hveem K, Hartman CA, Tönjes A, Bandinelli S, Palmer LJ, Kardia SLR, Rauramaa R, Sørensen TIA, Tuomilehto J, Salomaa V, Penninx BWJH, de Geus EJC, Boomsma DI, Lehtimäki T, Mangino M, Laakso M, Bouchard C, Martin NG, Kuh D, Liu Y, Linneberg A, März W, Strauch K, Kivimäki M, Harris TB, Gudnason V, Völzke H, Qi L, Järvelin MR, Chambers JC, Kooner JS, Froguel P, Kooperberg C, Vollenweider P, Hallmans G, Hansen T, Pedersen O, Metspalu A, Wareham NJ, Langenberg C, Weir DR, Porteous DJ, Boerwinkle E, Chasman DI, CHARGE Consortium, EPIC-InterAct Consortium, PAGE Consortium, Abecasis GR, Barroso I, McCarthy MI, Frayling TM, O'Connell JR, van Duijn CM, Boehnke M, Heid IM, Mohlke KL, Strachan DP, Fox CS, Liu CT, Hirschhorn JN, Klein RJ, Johnson AD, Borecki IB, Franks PW, North KE, Cupples LA, Loos RJF and Kilpeläinen TO

    Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.

    Physical activity (PA) may modify the genetic effects that give rise to increased risk of obesity. To identify adiposity loci whose effects are modified by PA, we performed genome-wide interaction meta-analyses of BMI and BMI-adjusted waist circumference and waist-hip ratio in up to 200,452 adults of European (n = 180,423) or other ancestry (n = 20,029). We standardized PA as a dichotomous variable under which, on average, 23% of participants were classified as inactive and 77% as physically active. While we replicate the interaction with PA for the strongest known obesity-risk locus, in the FTO gene, whose effect is attenuated by ~30% in physically active individuals compared with inactive individuals, we do not identify additional loci that are sensitive to PA. In additional genome-wide meta-analyses adjusting for PA and interaction with PA, we identify 11 novel adiposity loci, suggesting that accounting for PA or other environmental factors that contribute to variation in adiposity may facilitate gene discovery.

    Funded by: British Heart Foundation: RG/10/12/28456; Medical Research Council: MC_U106179473, MC_UU_12013/1, MC_UU_12015/1, MC_UU_12015/2, MC_UU_12015/3, MC_UU_12019/1, MR/K002414/1; NCATS NIH HHS: KL2 TR001109; NCI NIH HHS: UM1 CA182913; NEI NIH HHS: T32 EY022303; NHLBI NIH HHS: K99 HL130580, R01 HL105756, R01 HL117078; NICHD NIH HHS: P2C HD050924; NIDDK NIH HHS: P30 DK020541, P30 DK020572, P30 DK072488, R01 DK072193, R01 DK089256, R01 DK093757; NIEHS NIH HHS: P30 ES010126; NIH HHS: S10 OD018522, S10 OD020069

    PLoS genetics 2017;13;4;e1006528

  • Convergent evolution and topologically disruptive polymorphisms among multidrug-resistant tuberculosis in Peru.

    Grandjean L, Gilman RH, Iwamoto T, Köser CU, Coronel J, Zimic M, Török ME, Ayabina D, Kendall M, Fraser C, Harris S, Parkhill J, Peacock SJ, Moore DAJ and Colijn C

    University College London, Institute of Child Health, London, United Kingdom.

    Background: Multidrug-resistant tuberculosis poses a major threat to the success of tuberculosis control programs worldwide. Understanding how drug-resistant tuberculosis evolves can inform the development of new therapeutic and preventive strategies.

    Methods: Here, we use novel genome-wide analysis techniques to identify polymorphisms that are associated with drug resistance, adaptive evolution and the structure of the phylogenetic tree. A total of 471 samples from different patients, collected between 2009 and 2013 in the Lima suburbs of Callao and Lima South, were sequenced on the Illumina MiSeq platform with 150 bp paired-end reads. After alignment to the H37Rv reference genome, variants were called using a standardized methodology. Genome-wide analysis was undertaken using custom-written scripts implemented in R.

    Results: High-quality homoplastic single nucleotide polymorphisms were observed in genes known to confer drug resistance, as well as in genes of the Mycobacterium tuberculosis ESX secreted protein pathway, in pks12, and close to toxin/anti-toxin pairs. Correlation analysis of homoplastic variant sites showed that many were significantly correlated, suggestive of epistasis. Variation in genes coding for ESX secreted proteins also significantly disrupted phylogenetic structure. Mutations in ESX genes at key antigenic epitope positions were also found to disrupt tree topology.

    Conclusion: Variation in these genes has a biologically plausible effect on immunogenicity and virulence. Functional characterization is therefore warranted to determine the effects of these polymorphisms on bacterial fitness and transmission.

    Funded by: Medical Research Council: MR/K007467/1

    PloS one 2017;12;12;e0189838

  • Consumer Health Informatics Aspects of Direct-to-Consumer Personal Genomic Testing.

    Gray K, Stephen R, Terrill B, Wilson B, Middleton A, Tytherleigh R, Turbitt E, Gaff C, Savard J, Hickerton C, Newson A and Metcalfe S

    The University of Melbourne, VIC Australia.

    This paper uses consumer health informatics as a framework to explore whether and how direct-to-consumer personal genomic testing can be regarded as a form of information that assists consumers in managing their health. It presents findings from qualitative content analysis of web sites that offer testing services, and of transcripts from focus groups conducted as part of a study of the Australian public's expectations of personal genomics. Content analysis showed that service offerings have some features of consumer health information but lack consistency. Focus group participants were mostly unfamiliar with the specifics of test reports and related information services. Some of their ideas about aids to knowledge were in line with the benefits described on provider web sites, but some expectations were inflated. People were ambivalent about whether these services would address consumers' health needs, interests and contexts and whether they would support consumers' health self-management decisions and outcomes. There is scope for consumer health informatics approaches to refine the usage and the utility of direct-to-consumer personal genomic testing. Further research may focus on how uptake is affected by consumers' health literacy or by services' engagement with consumers about what they really want.

    Studies in health technology and informatics 2017;245;89-93

  • De novo SETD5 loss-of-function variant as a cause for intellectual disability in a 10-year old boy with an aberrant blind ending bronchus.

    Green C, Willoughby J, DDD Study and Balasubramanian M

    Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK.

    Although rare, 3p microdeletion cases have been well described in the clinical literature. The clinical phenotype includes intellectual disability (ID), growth retardation, facial dysmorphism, and cardiac malformations. Advances in chromosome microarray (CMA) testing narrowed the 3p25 critical region to 124 kb, and recent whole-exome sequencing (WES) studies have suggested that the SETD5 gene contributes significantly to the 3p25 phenotype. Loss-of-function (LoF) variants in SETD5 are now considered a likely cause of ID. We report here a patient with a frameshift LoF variant in exon 12 of SETD5. This patient has features overlapping with those of other patients described with LoF SETD5 variants, including similar facial morphology, feeding difficulties, ID, behavioral abnormalities, and leg length discrepancy. In addition, he presents with an aberrant blind-ending bronchus. This report adds to publications describing intragenic mutations in SETD5 and supports the assertion that de novo LoF mutations in SETD5 present with a phenotype that overlaps with, but is distinct from, that of 3p25 microdeletion syndromes.

    American journal of medical genetics. Part A 2017;173;12;3165-3171

  • Genetic invalidation of Lp-PLA2 as a therapeutic target: Large-scale study of five functional Lp-PLA2-lowering alleles.

    Gregson JM, Freitag DF, Surendran P, Stitziel NO, Chowdhury R, Burgess S, Kaptoge S, Gao P, Staley JR, Willeit P, Nielsen SF, Caslake M, Trompet S, Polfus LM, Kuulasmaa K, Kontto J, Perola M, Blankenberg S, Veronesi G, Gianfagna F, Männistö S, Kimura A, Lin H, Reilly DF, Gorski M, Mijatovic V, CKDGen consortium, Munroe PB, Ehret GB, International Consortium for Blood Pressure, Thompson A, Uria-Nickelsen M, Malarstig A, Dehghan A, CHARGE inflammation working group, Vogt TF, Sasaoka T, Takeuchi F, Kato N, Yamada Y, Kee F, Müller-Nurasyid M, Ferrières J, Arveiler D, Amouyel P, Salomaa V, Boerwinkle E, Thompson SG, Ford I, Wouter Jukema J, Sattar N, Packard CJ, Shafi Majumder AA, Alam DS, Deloukas P, Schunkert H, Samani NJ, Kathiresan S, MICAD Exome consortium, Nordestgaard BG, Saleheen D, Howson JM, Di Angelantonio E, Butterworth AS, Danesh J and EPIC-CVD consortium and the CHD Exome+ consortium

    MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, UK.

    Aims: Darapladib, a potent inhibitor of lipoprotein-associated phospholipase A2 (Lp-PLA2), has not reduced the risk of cardiovascular disease outcomes in recent randomized trials. We aimed to test whether Lp-PLA2 enzyme activity is causally relevant to coronary heart disease.

    Methods: In 72,657 patients with coronary heart disease and 110,218 controls in 23 epidemiological studies, we genotyped five functional variants in PLA2G7, the gene encoding Lp-PLA2: four rare loss-of-function mutations (c.109+2T>C (rs142974898), Arg82His (rs144983904), Val279Phe (rs76863441) and Gln287Ter (rs140020965)) and one common modest-impact variant (Val379Ala (rs1051931)). We supplemented de novo genotyping with information on a further 45,823 coronary heart disease patients and 88,680 controls from publicly available databases and other previous studies. We conducted a systematic review of randomized trials to compare the effects of darapladib treatment on soluble Lp-PLA2 activity, conventional cardiovascular risk factors, and coronary heart disease risk with the corresponding effects of Lp-PLA2-lowering alleles.

    Results: Lp-PLA2 activity was decreased by 64% (p = 2.4 × 10⁻²⁵) with carriage of any of the four loss-of-function variants, by 45% (p < 10⁻³⁰⁰) for every allele inherited at Val279Phe, and by 2.7% (p = 1.9 × 10⁻¹²) for every allele inherited at Val379Ala. Darapladib 160 mg once daily reduced Lp-PLA2 activity by 65% (p < 10⁻³⁰⁰). Causal risk ratios for coronary heart disease per 65% lower Lp-PLA2 activity were 0.95 (0.88-1.03) with Val279Phe, 0.92 (0.74-1.16) with carriage of any loss-of-function variant, 1.01 (0.68-1.51) with Val379Ala, and 0.95 (0.89-1.02) with darapladib treatment.

    Conclusions: In a large-scale human genetic study, none of a series of Lp-PLA2-lowering alleles was related to coronary heart disease risk, suggesting that Lp-PLA2 is unlikely to be a causal risk factor.

    Funded by: British Heart Foundation: RG/14/5/30893, SP/09/002; European Research Council: 268834; Medical Research Council: G0800270, MC_UU_00002/7

    European journal of preventive cardiology 2017;24;5;492-504

  • Mosaic autosomal aneuploidies are detectable from single-cell RNAseq data.

    Griffiths JA, Scialdone A and Marioni JC

    Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, Cambridge, UK.

    Background: Aneuploidies are copy number variants that affect entire chromosomes. They are seen commonly in cancer, embryonic stem cells, human embryos, and in various trisomic diseases. Aneuploidies frequently affect only a subset of cells in a sample; this is known as "mosaic" aneuploidy. A cell that harbours an aneuploidy exhibits disrupted gene expression patterns which can alter its behaviour. However, detection of aneuploidies using conventional single-cell DNA-sequencing protocols is slow and expensive.

    Methods: We have developed a method that uses chromosome-wide expression imbalances to identify aneuploidies from single-cell RNA-seq data. The method provides quantitative aneuploidy calls, and is integrated into an R software package available on GitHub and as an Additional file of this manuscript.

    Results: We validate our approach using data with known copy number, identifying the vast majority of aneuploidies with a low rate of false discovery. We show further support for the method's efficacy by exploiting allele-specific gene expression levels, and differential expression analyses.

    Conclusions: The method is quick and easy to apply, straightforward to interpret, and represents a substantial cost saving compared to single-cell genome sequencing techniques. However, the method is less well suited to data where gene expression is highly variable. The results obtained from the method can be used to investigate the consequences of aneuploidy itself, or to exclude aneuploidy-affected expression values from conventional scRNA-seq data analysis.

    Funded by: Cancer Research UK: A17197; Wellcome Trust: 105031/B/14/Z, 109081/Z/15/A

    BMC genomics 2017;18;1;904

  • Megakaryocytes in Myeloproliferative Neoplasms Have Unique Somatic Mutations.

    Guo BB, Allcock RJ, Mirzai B, Malherbe JA, Choudry FA, Frontini M, Chuah H, Liang J, Kavanagh SE, Howman R, Ouwehand WH, Fuller KA and Erber WN

    School of Biomedical Sciences, University of Western Australia, Crawley, Western Australia, Australia.

    Myeloproliferative neoplasms (MPNs) are a group of related clonal hemopoietic stem cell disorders associated with hyperproliferation of myeloid cells. They are driven by mutations in the hemopoietic stem cell, most notably JAK2 V617F, CALR, and MPL. Clinically, they have the propensity to progress to myelofibrosis and transform to acute myeloid leukemia. Megakaryocytic hyperplasia with abnormal features is characteristic, and these cells are thought to stimulate and drive fibrotic progression. The biological defects underpinning this remain to be explained. In this study we examined the megakaryocyte genome in 12 patients with MPNs to determine whether there are somatic variants and whether there is any association with marrow fibrosis. We performed targeted next-generation sequencing for 120 genes associated with myeloid neoplasms on megakaryocytes isolated from aspirated bone marrow. Ten of the 12 patients had genomic defects in megakaryocytes that were not present in nonmegakaryocytic hemopoietic marrow cells from the same patient. The greatest allelic burden was in patients with increased reticulin deposition. The megakaryocyte-unique mutations were predominantly in genes that regulate chromatin remodeling, chromosome alignment, and stability. These findings show that genomic abnormalities are present in megakaryocytes in MPNs and that these appear to be associated with progression to bone marrow fibrosis.

    The American journal of pathology 2017;187;7;1512-1522

  • Epigenetic resetting of human pluripotency.

    Guo G, von Meyenn F, Rostovskaya M, Clarke J, Dietmann S, Baker D, Sahakyan A, Myers S, Bertone P, Reik W, Plath K and Smith A

    Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 1QR, UK

    Much attention has focussed on the conversion of human pluripotent stem cells (PSCs) to a more naïve developmental status. Here we provide a method for resetting via transient histone deacetylase inhibition. The protocol is effective across multiple PSC lines and can proceed without karyotype change. Reset cells can be expanded without feeders with a doubling time of around 24 h. WNT inhibition stabilises the resetting process. The transcriptome of reset cells diverges markedly from that of primed PSCs and shares features with human inner cell mass (ICM). Reset cells activate expression of primate-specific transposable elements. DNA methylation is globally reduced to a level equivalent to that in the ICM and is non-random, with gain of methylation at specific loci. Methylation imprints are mostly lost, however. Reset cells can be re-primed to undergo tri-lineage differentiation and germline specification. In female reset cells, appearance of biallelic X-linked gene transcription indicates reactivation of the silenced X chromosome. On reconversion to primed status, XIST-induced silencing restores monoallelic gene expression. The facile and robust conversion routine with accompanying data resources will enable widespread utilisation, interrogation, and refinement of candidate naïve cells.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K010867/1; Medical Research Council: G1001028, G1100526, MR/P00072X/1; Wellcome Trust: 095645/Z/11/Z

    Development (Cambridge, England) 2017;144;15;2748-2763

  • First Draft Genome Sequence of the Dourine Causative Agent: Trypanosoma Equiperdum Strain OVI.

    Hébert L, Moumen B, Madeline A, Steinbiss S, Lakhdar L, Van Reet N, Büscher P, Laugier C, Cauchard J and Petry S

    ANSES, Dozulé Laboratory for Equine Diseases, Bacteriology and Parasitology Unit, 14430 Goustranville, France.

    Trypanosoma equiperdum is the causative agent of dourine, a sexually transmitted infection of horses. This parasite belongs to the subgenus Trypanozoon, which also includes the agents of sleeping sickness (Trypanosoma brucei) and surra (Trypanosoma evansi). We herein report the genome sequence of T. equiperdum strain OVI, isolated from a horse in South Africa in 1976. This is the first genome sequence of the T. equiperdum species, and its availability will provide important insights for future studies on genetic classification of the subgenus Trypanozoon.

    Journal of genomics 2017;5;1-3

  • Continuity and Admixture in the Last Five Millennia of Levantine History from Ancient Canaanite and Present-Day Lebanese Genome Sequences.

    Haber M, Doumet-Serhal C, Scheib C, Xue Y, Danecek P, Mezzavilla M, Youhanna S, Martiniano R, Prado-Martinez J, Szpak M, Matisoo-Smith E, Schutkowski H, Mikulski R, Zalloua P, Kivisild T and Tyler-Smith C

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The Canaanites inhabited the Levant region during the Bronze Age and established a culture that became influential in the Near East and beyond. However, the Canaanites, unlike most other ancient Near Easterners of this period, left few surviving textual records and thus their origin and relationship to ancient and present-day populations remain unclear. In this study, we sequenced five whole genomes from ∼3,700-year-old individuals from the city of Sidon, a major Canaanite city-state on the Eastern Mediterranean coast. We also sequenced the genomes of 99 individuals from present-day Lebanon to catalog modern Levantine genetic diversity. We find that a Bronze Age Canaanite-related ancestry was widespread in the region, shared among urban populations inhabiting the coast (Sidon) and inland populations (Jordan) who likely lived in farming societies or were pastoral nomads. This Canaanite-related ancestry derived from mixture between local Neolithic populations and eastern migrants genetically related to Chalcolithic Iranians. We estimate, using linkage-disequilibrium decay patterns, that admixture occurred 6,600-3,550 years ago, coinciding with recorded massive population movements in Mesopotamia during the mid-Holocene. We show that present-day Lebanese derive most of their ancestry from a Canaanite-related population, which therefore implies substantial genetic continuity in the Levant since at least the Bronze Age. In addition, we find Eurasian ancestry in the Lebanese not present in Bronze Age or earlier Levantines. We estimate that this Eurasian ancestry arrived in the Levant around 3,750-2,170 years ago during a period of successive conquests by distant populations.

    Funded by: Wellcome Trust

    American journal of human genetics 2017;101;2;274-282

  • Statistical methods to detect pleiotropy in human complex traits.

    Hackinger S and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    In recent years, pleiotropy, the phenomenon of one genetic locus influencing several traits, has become a widely researched field in human genetics. With the increasing availability of genome-wide association study summary statistics, as well as the establishment of deeply phenotyped sample collections, it is now possible to systematically assess the genetic overlap between multiple traits and diseases. In addition to increasing power to detect associated variants, multi-trait methods can also aid our understanding of how different disorders are aetiologically linked by highlighting relevant biological pathways. A plethora of tools for such analyses is available, each with its own advantages and limitations. In this review, we outline some of the currently available methods to conduct multi-trait analyses. First, we briefly introduce the concept of pleiotropy and outline the current landscape of pleiotropy research in human genetics; second, we describe analytical considerations and analysis methods; finally, we discuss future directions for the field.

    Funded by: Wellcome Trust: WT098051

    Open biology 2017;7;11

  • Evaluation of shared genetic aetiology between osteoarthritis and bone mineral density identifies SMAD3 as a novel osteoarthritis risk locus.

    Hackinger S, Trajanoska K, Styrkarsdottir U, Zengini E, Steinberg J, Ritchie GRS, Hatzikotoulas K, Gilly A, Evangelou E, Kemp JP, arcOGEN Consortium, GEFOS Consortium, Evans D, Ingvarsson T, Jonsson H, Thorsteinsdottir U, Stefansson K, McCaskie AW, Brooks RA, Wilkinson JM, Rivadeneira F and Zeggini E

    Human Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1HH, UK.

    Osteoarthritis (OA) is a common complex disease with high public health burden and no curative therapy. High bone mineral density (BMD) is associated with an increased risk of developing OA, suggesting a shared underlying biology. Here, we performed the first systematic overlap analysis of OA and BMD on a genome-wide scale. We used summary statistics from the GEFOS consortium for lumbar spine (n = 31,800) and femoral neck (n = 32,961) BMD, and from the arcOGEN consortium for three OA phenotypes (hip, n cases = 3,498; knee, n cases = 3,266; hip and/or knee, n cases = 7,410; n controls = 11,009). Using LD score regression, we found a significant genetic correlation between the combined OA phenotype (hip and/or knee) and lumbar spine BMD (rg = 0.18, P = 2.23 × 10⁻²), which may be driven by the presence of spinal osteophytes. We identified 143 variants with evidence for cross-phenotype association, which we took forward for replication in independent large-scale OA datasets and subsequent meta-analysis with arcOGEN, for a total sample size of up to 23,425 cases and 236,814 controls. We found robustly replicating evidence for association with OA at rs12901071 (OR 1.08, 95% CI 1.05-1.11, Pmeta = 3.12 × 10⁻¹⁰), an intronic variant in the SMAD3 gene, which is known to play a role in bone remodeling and cartilage maintenance. We were able to confirm expression of SMAD3 in intact and degraded cartilage of the knee and hip. Our findings provide the first systematic evaluation of pleiotropy between OA and BMD, highlight genes with biological relevance to both traits, and establish a robust new OA genetic risk locus at SMAD3.

    Funded by: Medical Research Council: G0000934, G0100594, G0600237, G0900753, G0901461, MC_PC_12009, MC_UU_12013/4, MR/K002279/1; NIDDK NIH HHS: U01 DK062418; Wellcome Trust

    Human molecular genetics 2017;26;19;3850-3858

  • The Hidden Genomics of Chlamydia trachomatis.

    Hadfield J, Bénard A, Domman D and Thomson N

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    The application of whole-genome sequencing has moved us on from sequencing single genomes to unravelling population structures in different niches, at the species, serotype or even genus level, and in local, national and global settings. This has been instrumental in cataloguing and revealing a huge range of diversity in this bacterium, when at first we thought there was little. Genomics has challenged assumptions and added insight, as well as confusion and glimpses of truths. What is clear is that, at a time when we are starting to realise the extent and nature of the diversity contained within a genus or a species like this, the deep knowledge that research communities have developed through cell biology, as well as through newfound molecular approaches, will be more precious than ever for linking genotype to phenotype. Here we detail the technological developments and insights we have seen during the relatively short time since we began to see the hidden genome of Chlamydia trachomatis.

    Current topics in microbiology and immunology 2017

  • Comprehensive global genome dynamics of Chlamydia trachomatis show ancient diversification followed by contemporary mixing and recent lineage expansion.

    Hadfield J, Harris SR, Seth-Smith HMB, Parmar S, Andersson P, Giffard PM, Schachter J, Moncada J, Ellison L, Vaulet MLG, Fermepin MR, Radebe F, Mendoza S, Ouburg S, Morré SA, Sachse K, Puolakkainen M, Korhonen SJ, Sonnex C, Wiggins R, Jalal H, Brunelli T, Casprini P, Pitt R, Ison C, Savicheva A, Shipitsyna E, Hadad R, Kari L, Burton MJ, Mabey D, Solomon AW, Lewis D, Marsh P, Unemo M, Clarke IN, Parkhill J and Thomson NR

    Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Chlamydia trachomatis is the world's most prevalent bacterial sexually transmitted infection and the leading infectious cause of blindness, yet it is one of the least understood human pathogens, in part due to the difficulties of in vitro culturing and the lack of available tools for genetic manipulation. Genome sequencing has reinvigorated this field, shedding light on the contemporary history of this pathogen. Here, we analyze 563 full genomes, 455 of which are novel, to show that the history of the species comprises two phases, and conclude that the currently circulating lineages are the result of evolution in different genomic ecotypes. Temporal analysis indicates these lineages have recently expanded in the space of thousands of years, rather than millions of years as previously thought, a finding that dramatically changes our understanding of this pathogen's history. Finally, at a time when almost every pathogen is becoming increasingly resistant to antimicrobials, we show that there is no evidence of circulating genomic resistance in C. trachomatis.

    Funded by: Wellcome Trust

    Genome research 2017;27;7;1220-1229

  • Dietary restriction protects from age-associated DNA methylation and induces epigenetic reprogramming of lipid metabolism.

    Hahn O, Grönke S, Stubbs TM, Ficz G, Hendrich O, Krueger F, Andrews S, Zhang Q, Wakelam MJ, Beyer A, Reik W and Partridge L

    Max Planck Institute for Biology of Ageing, 50931, Cologne, Germany.

    Background: Dietary restriction (DR), a reduction in food intake without malnutrition, increases most aspects of health during aging and extends lifespan in diverse species, including rodents. However, the mechanisms by which DR interacts with the aging process to improve health in old age are poorly understood. DNA methylation could play an important role in mediating the effects of DR because it is sensitive to the effects of nutrition and can affect gene expression memory over time.

    Results: Here, we profile genome-wide changes in DNA methylation, gene expression and lipidomics in response to DR and aging in female mouse liver. DR is generally strongly protective against age-related changes in DNA methylation. During aging with DR, DNA methylation becomes targeted to gene bodies and is associated with reduced gene expression, particularly of genes involved in lipid metabolism. The lipid profile of the livers of DR mice is correspondingly shifted towards lowered triglyceride content and shorter chain length of triglyceride-associated fatty acids, and these effects become more pronounced with age.

    Conclusions: Our results indicate that DR remodels genome-wide patterns of DNA methylation so that age-related changes are profoundly delayed, while changes at loci involved in lipid metabolism affect gene expression and the resulting lipid profile.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/000C0413, BBS/E/B/000C0417; Medical Research Council: MR/M004821/1; Wellcome Trust

    Genome biology 2017;18;1;56

  • The Y chromosomes of the great apes.

    Hallast P and Jobling MA

    Institute of Molecular and Cell Biology, University of Tartu, Tartu, 51010, Estonia.

    The great apes (orangutans, gorillas, chimpanzees, bonobos and humans) descended from a common ancestor around 13 million years ago, and since then their sex chromosomes have followed very different evolutionary paths. While great-ape X chromosomes are highly conserved, their Y chromosomes, reflecting the general lability and degeneration of this male-specific part of the genome since its early mammalian origin, have evolved rapidly both between and within species. Understanding great-ape Y chromosome structure, gene content and diversity would provide a valuable evolutionary context for the human Y, and would also illuminate sex-biased behaviours, and the effects of the evolutionary pressures exerted by different mating strategies on this male-specific part of the genome. High-quality Y-chromosome sequences are available for human and chimpanzee (and low-quality for gorilla). The chromosomes differ in size, sequence organisation and content, and while retaining a relatively stable set of ancestral single-copy genes, show considerable variation in content and copy number of ampliconic multi-copy genes. Studies of Y-chromosome diversity in other great apes are relatively undeveloped compared to those in humans, but have nevertheless provided insights into speciation, dispersal, and mating patterns. Future studies, including data from larger sample sizes of wild-born and geographically well-defined individuals, and full Y-chromosome sequences from bonobos, gorillas and orangutans, promise to further our understanding of population histories, male-biased behaviours, mutation processes, and the functions of Y-chromosomal genes.

    Human genetics 2017

  • High Rate of Recurrent De Novo Mutations in Developmental and Epileptic Encephalopathies.

    Hamdan FF, Myers CT, Cossette P, Lemay P, Spiegelman D, Laporte AD, Nassif C, Diallo O, Monlong J, Cadieux-Dion M, Dobrzeniecka S, Meloche C, Retterer K, Cho MT, Rosenfeld JA, Bi W, Massicotte C, Miguet M, Brunga L, Regan BM, Mo K, Tam C, Schneider A, Hollingsworth G, Deciphering Developmental Disorders Study, FitzPatrick DR, Donaldson A, Canham N, Blair E, Kerr B, Fry AE, Thomas RH, Shelagh J, Hurst JA, Brittain H, Blyth M, Lebel RR, Gerkes EH, Davis-Keppen L, Stein Q, Chung WK, Dorison SJ, Benke PJ, Fassi E, Corsten-Janssen N, Kamsteeg EJ, Mau-Them FT, Bruel AL, Verloes A, Õunap K, Wojcik MH, Albert DVF, Venkateswaran S, Ware T, Jones D, Liu YC, Mohammad SS, Bizargity P, Bacino CA, Leuzzi V, Martinelli S, Dallapiccola B, Tartaglia M, Blumkin L, Wierenga KJ, Purcarin G, O'Byrne JJ, Stockler S, Lehman A, Keren B, Nougues MC, Mignot C, Auvin S, Nava C, Hiatt SM, Bebin M, Shao Y, Scaglia F, Lalani SR, Frye RE, Jarjour IT, Jacques S, Boucher RM, Riou E, Srour M, Carmant L, Lortie A, Major P, Diadori P, Dubeau F, D'Anjou G, Bourque G, Berkovic SF, Sadleir LG, Campeau PM, Kibar Z, Lafrenière RG, Girard SL, Mercimek-Mahmutoglu S, Boelman C, Rouleau GA, Scheffer IE, Mefford HC, Andrade DM, Rossignol E, Minassian BA and Michaud JL

    Centre Hospitalier Universitaire Sainte-Justine Research Center, Montreal, QC H3T1C5, Canada.

    Developmental and epileptic encephalopathy (DEE) is a group of conditions characterized by the co-occurrence of epilepsy and intellectual disability (ID), typically with developmental plateauing or regression associated with frequent epileptiform activity. The cause of DEE remains unknown in the majority of cases. We performed whole-genome sequencing (WGS) in 197 individuals with unexplained DEE and pharmaco-resistant seizures and in their unaffected parents. We focused our attention on de novo mutations (DNMs) and identified candidate genes containing such variants. We sought to identify additional subjects with DNMs in these genes by performing targeted sequencing in another series of individuals with DEE and by mining various sequencing datasets. We also performed meta-analyses to document enrichment of DNMs in candidate genes by combining our WGS dataset with those of several DEE and ID series. By combining these strategies, we were able to provide a causal link between DEE and the following genes: NTRK2, GABRB2, CLTC, DHDDS, NUS1, RAB11A, GABBR2, and SNAP25. Overall, we established a molecular diagnosis in 63/197 (32%) individuals in our WGS series. The main cause of DEE in these individuals was de novo point mutations (53/63 solved cases), followed by inherited mutations (6/63 solved cases) and de novo CNVs (4/63 solved cases). De novo missense variants explained a larger proportion of individuals in our series than in other series that were primarily ascertained because of ID. Moreover, these DNMs were more frequently recurrent than those identified in ID series. These observations indicate that the genetic landscape of DEE might be different from that of ID without epilepsy.

    Funded by: Medical Research Council: MC_PC_U127561093; NHGRI NIH HHS: UM1 HG008900; NICHD NIH HHS: U54 HD083091; NIGMS NIH HHS: T32 GM007748; NINDS NIH HHS: R01 NS069605

    American journal of human genetics 2017;101;5;664-685

  • Extreme mutation bias and high AT content in Plasmodium falciparum.

    Hamilton WL, Claessens A, Otto TD, Kekre M, Fairhurst RM, Rayner JC and Kwiatkowski D

    Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    For reasons that remain unknown, the Plasmodium falciparum genome has an exceptionally high AT content compared to other Plasmodium species and eukaryotes in general - nearly 80% in coding regions and approaching 90% in non-coding regions. Here, we examine how this phenomenon relates to genome-wide patterns of de novo mutation. Mutation accumulation experiments were performed by sequential cloning of six P. falciparum isolates growing in human erythrocytes in vitro for 4 years, with 279 clones sampled for whole genome sequencing at different time points. Genome sequence analysis of these samples revealed a significant excess of G:C to A:T transitions compared to other types of nucleotide substitution, which would naturally cause AT content to equilibrate close to the level seen across the P. falciparum reference genome (80.6% AT). These data also uncover an extremely high rate of small indel mutation relative to other species, primarily associated with repetitive AT-rich sequences, in addition to larger-scale structural rearrangements focused in antigen-coding var genes. In conclusion, high AT content in P. falciparum is driven by a systematic mutational bias and ultimately leads to an unusual level of microstructural plasticity, raising the question of whether this contributes to adaptive evolution.

    Funded by: Medical Research Council: G0600718, MR/M006212/1; Wellcome Trust: 098051

    Nucleic acids research 2017;45;4;1889-1901

  • Comparing Ancient DNA Preservation in Petrous Bone and Tooth Cementum.

    Hansen HB, Damgaard PB, Margaryan A, Stenderup J, Lynnerup N, Willerslev E and Allentoft ME

    Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark.

    Large-scale genomic analyses of ancient human populations have become feasible partly due to refined sampling methods. The inner part of petrous bones and the cementum layer in tooth roots are currently recognized as the best substrates for such research. We present a comparative analysis of DNA preservation in these two substrates obtained from the same human skulls, across a range of different ages and preservation environments. Both substrates display significantly higher endogenous DNA content (average of 16.4% and 40.0% for teeth and petrous bones, respectively) than parietal skull bone (average of 2.2%). Despite sample-to-sample variation, petrous bone overall performs better than tooth cementum (p = 0.001). This difference, however, is driven largely by a cluster of Viking skeletons from one particular locality, showing relatively poor molecular tooth preservation (<10% endogenous DNA). In the remaining skeletons there is no systematic difference between the two substrates. A crude preservation assessment (good/bad) applied to each sample prior to DNA extraction predicted the above/below 10% endogenous DNA threshold in 80% of the cases. Interestingly, we observe significantly higher levels of cytosine to thymine deamination damage and lower proportions of mitochondrial/nuclear DNA in petrous bone compared to tooth cementum. Lastly, we show that petrous bones from ancient cremated individuals contain no measurable levels of authentic human DNA. Based on these findings we discuss the pros and cons of sampling the different elements.

    PloS one 2017;12;1;e0170940

  • A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications.

    Haque A, Engel J, Teichmann SA and Lönnberg T

    QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, 4006, Australia.

    RNA sequencing (RNA-seq) is a genomic approach for the detection and quantitative analysis of messenger RNA molecules in a biological sample and is useful for studying cellular responses. RNA-seq has fueled much discovery and innovation in medicine over recent years. For practical reasons, the technique is usually conducted on samples comprising thousands to millions of cells. However, this has hindered direct assessment of the fundamental unit of biology: the cell. Since the first single-cell RNA-sequencing (scRNA-seq) study was published in 2009, many more have been conducted, mostly by specialist laboratories with unique skills in wet-lab single-cell genomics, bioinformatics, and computation. However, with the increasing commercial availability of scRNA-seq platforms, and the rapid ongoing maturation of bioinformatics approaches, a point has been reached where any biomedical researcher or clinician can use scRNA-seq to make exciting discoveries. In this review, we present a practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation.

    Genome medicine 2017;9;1;75

  • Methicillin-resistant Staphylococcus aureus emerged long before the introduction of methicillin into clinical practice.

    Harkins CP, Pichon B, Doumith M, Parkhill J, Westh H, Tomasz A, de Lencastre H, Bentley SD, Kearns AM and Holden MTG

    School of Medicine, University of St Andrews, St Andrews, KY16 9TF, UK.

    Background: The spread of drug-resistant bacterial pathogens poses a major threat to global health. It is widely recognised that the widespread use of antibiotics has generated selective pressures that have driven the emergence of resistant strains. Methicillin-resistant Staphylococcus aureus (MRSA) was first observed in 1960, less than one year after the introduction of this second-generation beta-lactam antibiotic into clinical practice. Epidemiological evidence has always suggested that resistance arose around this period, when the mecA gene encoding methicillin resistance, carried on an SCCmec element, was horizontally transferred to an intrinsically sensitive strain of S. aureus.

    Results: Whole-genome sequencing of a collection of the first MRSA isolates allows us to reconstruct the evolutionary history of the archetypal MRSA. We apply Bayesian phylogenetic reconstruction to infer the time point at which this early MRSA lineage arose and when SCCmec was acquired. MRSA emerged in the mid-1940s, following the acquisition of an ancestral type I SCCmec element, some 14 years before the first therapeutic use of methicillin.

    Conclusions: Methicillin use was not the original driving factor in the evolution of MRSA as previously thought. Rather, it was the widespread use of first-generation beta-lactams such as penicillin in the years prior to the introduction of methicillin that selected for S. aureus strains carrying the mecA determinant. Crucially, this highlights how new drugs, introduced to circumvent known resistance mechanisms, can be rendered ineffective by unrecognised adaptations in the bacterial population due to the historic selective landscape created by the widespread use of other antibiotics.

    Funded by: Wellcome Trust: 098051, 104241/Z/14/Z

    Genome biology 2017;18;1;130

  • Genomic surveillance reveals low prevalence of livestock-associated methicillin-resistant Staphylococcus aureus in the East of England.

    Harrison EM, Coll F, Toleman MS, Blane B, Brown NM, Török ME, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Box 157 Addenbrooke's Hospital, Hills Road, Cambridge, CB2 0QQ, United Kingdom.

    Livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) is an emerging problem in many parts of the world. LA-MRSA has been isolated previously from animals and humans in the United Kingdom (UK), but the prevalence is unknown. The aim of this study was to determine the prevalence and to describe the molecular epidemiology of LA-MRSA isolated in the East of England (broadly Cambridge and the surrounding area). We accessed whole-genome sequence data for 2,283 MRSA isolates from 1,465 people identified during a 12-month prospective study between 2012 and 2013 conducted in the East of England, United Kingdom. The study's diagnostic laboratory serves four hospitals and 75 general practices. We screened the collection for multilocus sequence types (STs) and for host-specific resistance and virulence factors previously associated with LA-MRSA. We identified 13 putative LA-MRSA isolates from 12 individuals, giving an estimated prevalence of 0.82% (95% CI 0.47% to 1.43%). Twelve isolates were mecC-MRSA (ten CC130, one ST425 and one ST1943) and a single isolate was ST398. Our data demonstrate a low burden of LA-MRSA in the East of England, but the detection of mecC-MRSA and ST398 indicates the need for vigilance. Genomic surveillance provides a mechanism to detect and track the emergence and spread of MRSA clones of human importance.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G1000803, MR/N029399/1; Wellcome Trust: 201344/Z/16/Z

    Scientific reports 2017;7;1;7406

  • Key features of invasive pneumococcal isolates recovered in Lima, Peru determined through whole genome sequencing.

    Hawkins P, Mercado E, Chochua S, Castillo ME, Reyes I, Chaparro E, Gladstone R, Bentley SD, Breiman RF, Metcalf BJ, Beall B, Ochoa TJ and McGee L

    Emory University, Atlanta, USA; Centers for Disease Control and Prevention, Atlanta, USA.

    Before PCV7 introduction, invasive pneumococcal disease (IPD) was responsible for approximately 12,000-18,000 deaths annually among children <5 years in Latin America. In Peru, PCV7 was introduced in 2009. We used whole-genome sequencing to deduce key features of invasive strains collected in Lima, Peru from 2006 to 2011. We sequenced 212 IPD isolates from 16 hospitals in Lima before (2006-2009; n=133) and after (2010-2011; n=79) PCV7 introduction; 130 (61.3%) isolates were from children ≤5 years old. CDC's Streptococcus lab bioinformatics pipeline revealed serotypes, sequence types (STs), pilus genes, PBP types and other resistance determinants. During the pre-PCV7 period, serotype 14 was the most common serotype (24.8%), followed by 6B (20.3%), 19F (10.5%), and 23F (6.8%). Post-PCV7, the proportion of PCV7 serotype 6B decreased significantly (to 6.3%), while 19F (16.3%), 14 (15.0%), 23F (7.5%), and 19A (7.5%) were the most common serotypes; only serotypes 3 and 10A increased significantly. Overall, 82% (n=173) of all isolates carried at least one resistance determinant, including 72 (34%) isolates that carried resistance determinants against 3 or more antimicrobial classes; of these 72 isolates, 56 (78%) belonged to a PCV7 serotype. Eighty-two STs were identified, with 53 of them organized in 14 clonal complexes. ST frequencies were distributed differently pre- and post-PCV7 introduction, with only 18 of the 57 STs identified in 2006-2009 isolates also observed in 2010-2011 isolates. The apparent expansion of a 19F/ST1421 lineage with predicted β-lactam resistance (PBP type 13:16:20) and carrying resistance determinants against four additional antimicrobial classes was observed.

    International journal of medical microbiology : IJMM 2017;307;7;415-421

  • Antimicrobial resistance determinants and susceptibility profiles of pneumococcal isolates recovered in Trinidad and Tobago.

    Hawkins PA, Akpaka PE, Nurse-Lucas M, Gladstone R, Bentley SD, Breiman RF, McGee L and Swanston WH

    Emory University, Atlanta, GA, USA; US Centers for Disease Control and Prevention (CDC), Atlanta, GA, USA.

    Objectives: In Latin America and the Caribbean, pneumococcal infections are estimated to account for 12000-18000 deaths, 327000 pneumonia cases, 4000 meningitis cases and 1229 sepsis cases each year in children under five years old. Pneumococcal antimicrobial resistance has evolved into a worldwide health problem in the last few decades. This study aimed to determine the antimicrobial susceptibility profiles of pneumococcal isolates collected in Trinidad and Tobago and their associated genetic determinants.

    Methods: Whole-genome sequences were obtained from 98 pneumococcal isolates recovered at several regional hospitals, including 83 invasive and 15 non-invasive strains, recovered before (n=25) and after (n=73) introduction of pneumococcal conjugate vaccines (PCVs). A bioinformatics pipeline was used to identify core genomic and accessory elements conferring antimicrobial resistance phenotypes, including β-lactam non-susceptibility.

    Results and discussion: Forty-one isolates (41.8%) were predicted as resistant to at least one antimicrobial class, including 13 (13.3%) resistant to at least three classes. The most common serotypes associated with antimicrobial resistance were 23F (n=10), 19F (n=8), 6B (n=6) and 14 (n=5). The most common serotypes associated with penicillin non-susceptibility were 19F (n=7) and 14 (n=5). Thirty-nine isolates (39.8%) were positive for PI-1 or PI-2 type pili: 30 (76.9%) were PI-1+, 4 (10.3%) were PI-2+ and 5 (12.8%) were positive for both PI-1 and PI-2. Of the 13 multidrug-resistant isolates, 10 belonged to globally distributed clones PMEN3 and PMEN14 and were isolated in the post-PCV period, suggesting clonal expansion.

    Journal of global antimicrobial resistance 2017;11;148-151

  • Circulating and Tissue-Resident CD4+ T Cells With Reactivity to Intestinal Microbiota Are Abundant in Healthy Individuals and Function Is Altered During Inflammation.

    Hegazy AN, West NR, Stubbington MJT, Wendt E, Suijker KIM, Datsi A, This S, Danne C, Campion S, Duncan SH, Owens BMJ, Uhlig HH, McMichael A, Oxford IBD Cohort Investigators, Bergthaler A, Teichmann SA, Keshav S and Powrie F

    Translational Gastroenterology Unit, Nuffield Department of Clinical Medicine, Experimental Medicine Division, John Radcliffe Hospital, University of Oxford, United Kingdom; Kennedy Institute of Rheumatology, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, United Kingdom.

    Background & aims: Interactions between commensal microbes and the immune system are tightly regulated and maintain intestinal homeostasis, but little is known about these interactions in humans. We investigated responses of human CD4+ T cells to the intestinal microbiota. We measured the abundance of T cells in circulation and intestinal tissues that respond to intestinal microbes and determined their clonal diversity. We also assessed their functional phenotypes and effects on intestinal resident cell populations, and studied alterations in microbe-reactive T cells in patients with chronic intestinal inflammation.

    Methods: We collected samples of peripheral blood mononuclear cells and intestinal tissues from healthy individuals (controls, n = 13-30) and patients with inflammatory bowel diseases (n = 119; 59 with ulcerative colitis and 60 with Crohn's disease). We used 2 independent assays (CD154 detection and carboxy-fluorescein succinimidyl ester dilution assays) and 9 intestinal bacterial species (Escherichia coli, Lactobacillus acidophilus, Bifidobacterium animalis subsp lactis, Faecalibacterium prausnitzii, Bacteroides vulgatus, Roseburia intestinalis, Ruminococcus obeum, Salmonella typhimurium, and Clostridium difficile) to quantify, expand, and characterize microbe-reactive CD4+ T cells. We sequenced T-cell receptor Vβ genes in expanded microbe-reactive T-cell lines to determine their clonal diversity. We examined the effects of microbe-reactive CD4+ T cells on intestinal stromal and epithelial cell lines. Cytokines, chemokines, and gene expression patterns were measured by flow cytometry and quantitative polymerase chain reaction.

    Results: Circulating and gut-resident CD4+ T cells from controls responded to bacteria at frequencies of 40-4000 per million for each bacterial species tested. Microbiota-reactive CD4+ T cells were mainly of a memory phenotype, present in peripheral blood mononuclear cells and intestinal tissue, and had a diverse T-cell receptor Vβ repertoire. These cells were functionally heterogeneous, produced barrier-protective cytokines, and stimulated intestinal stromal and epithelial cells via interleukin 17A, interferon gamma, and tumor necrosis factor. In patients with inflammatory bowel diseases, microbiota-reactive CD4+ T cells were reduced in the blood compared with intestine; T-cell responses that we detected had an increased frequency of interleukin 17A production compared with responses of T cells from blood or intestinal tissues of controls.

    Conclusions: In an analysis of peripheral blood mononuclear cells and intestinal tissues from patients with inflammatory bowel diseases vs controls, we found that reactivity to intestinal bacteria is a normal property of the human CD4+ T-cell repertoire, and does not necessarily indicate disrupted interactions between immune cells and the commensal microbiota. T-cell responses to commensals might support intestinal homeostasis, by producing barrier-protective cytokines and providing a large pool of T cells that react to pathogens.

    Funded by: European Research Council: 260507; NIAID NIH HHS: UM1 AI100645; Wellcome Trust

    Gastroenterology 2017;153;5;1320-1337.e16

  • The great escape.

    Heinz E

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2017;16;1;4

  • Reshaping the tree of life.

    Heinz E and Domman D

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    This month's Genome Watch highlights how metagenomics is continuing to reveal the diversity of microorganisms in the environment and how it is challenging and expanding our understanding of how life evolved on Earth.

    Nature reviews. Microbiology 2017;15;6;322

  • Summing up the parts of the hypothalamus.

    Hemberg M

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Nature neuroscience 2017;20;3;378-379

  • Rare Variant Analysis of Human and Rodent Obesity Genes in Individuals with Severe Childhood Obesity.

    Hendricks AE, Bochukova EG, Marenne G, Keogh JM, Atanassova N, Bounds R, Wheeler E, Mistry V, Henning E, Körner A, Muddyman D, McCarthy S, Hinney A, Hebebrand J, Scott RA, Langenberg C, Wareham NJ, Surendran P, Howson JM, Butterworth AS, Danesh J, Nordestgaard BG, Nielsen SF, Afzal S, Papadia S, Ashford S, Garg S, Millhauser GL, Palomino RI, Kwasniewska A, Tachmazidou I, O'Rahilly S, Zeggini E, Barroso I, Farooqi IS, Understanding Society Scientific Group, EPIC-CVD Consortium and UK10K Consortium

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Obesity is a genetically heterogeneous disorder. Using targeted and whole-exome sequencing, we studied 32 human and 87 rodent obesity genes in 2,548 severely obese children and 1,117 controls. We identified 52 variants contributing to obesity in 2% of cases, including multiple novel variants in GNAS, which were sometimes found with accelerated growth rather than short stature as described previously. Nominally significant associations were found for rare functional variants in BBS1, BBS9, GNAS, MKKS, CLOCK and ANGPTL6. The p.S284X variant in ANGPTL6 drives the association signal (rs201622589, MAF ~0.1%, odds ratio = 10.13, p-value = 0.042) and results in complete loss of secretion in cells. Further analysis including additional case-control studies and population controls (N = 260,642) did not support association of this variant with obesity (odds ratio = 2.34, p-value = 2.59 × 10⁻³), highlighting the challenges of testing rare variant associations and the need for very large sample sizes. Further validation in cohorts with severe obesity and engineering the variants in model organisms will be needed to explore whether human variants in ANGPTL6 and other genes that lead to obesity when deleted in mice do contribute to obesity. Such studies may yield druggable targets for weight loss therapies.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/15/14/31880, RG/16/4/32218; European Research Council: 268834; Medical Research Council: G0800270, G0900554, MC_UU_12012/1, MC_UU_12012/5, MC_UU_12015/1, MR/L003120/1, MR/L010305/1, MR/P013880/1, MR/P02811X/1; NIDDK NIH HHS: R01 DK064265, R01 DK110403; NIGMS NIH HHS: R25 GM058903; Wellcome Trust

    Scientific reports 2017;7;1;4394

  • Genetic Screen for Postembryonic Development in the Zebrafish (Danio rerio): Dominant Mutations Affecting Adult Form.

    Henke K, Daane JM, Hawkins MB, Dooley CM, Busch-Nentwich EM, Stemple DL and Harris MP

    Department of Orthopedic Research, Boston Children's Hospital, Massachusetts 02115

    Large-scale forward genetic screens have been instrumental for identifying genes that regulate development, homeostasis, and regeneration, as well as the mechanisms of disease. The zebrafish, Danio rerio, is an established genetic and developmental model used in genetic screens to uncover genes necessary for early development. However, the regulation of postembryonic development has received less attention as these screens are more labor intensive and require extensive resources. The lack of systematic interrogation of late development leaves large aspects of the genetic regulation of adult form and physiology unresolved. To understand the genetic control of postembryonic development, we performed a dominant screen for phenotypes affecting the adult zebrafish. In our screen, we identified 72 adult viable mutants showing changes in the shape of the skeleton as well as defects in pigmentation. For efficient mapping of these mutants and mutation identification, we devised a new mapping strategy based on identification of mutant-specific haplotypes. Using this method in combination with a candidate gene approach, we were able to identify linked mutations for 22 out of 25 mutants analyzed. Broadly, our mutational analysis suggests that there are key genes and pathways associated with late development. Many of these pathways are shared with humans and are affected in various disease conditions, suggesting constraint in the genetic pathways that can lead to change in adult form. Taken together, these results show that dominant screens are a feasible and productive means to identify mutations that can further our understanding of gene function during postembryonic development and in disease.

    Funded by: NIDCR NIH HHS: U01 DE024434

    Genetics 2017;207;2;609-623

  • HLA haplotypes in primary sclerosing cholangitis patients of admixed and non-European ancestry.

    Henriksen EKK, Viken MK, Wittig M, Holm K, Folseraas T, Mucha S, Melum E, Hov JR, Lazaridis KN, Juran BD, Chazouillères O, Färkkilä M, Gotthardt DN, Invernizzi P, Carbone M, Hirschfield GM, Rushbrook SM, Goode E, UK-PSC Consortium, Ponsioen CY, Weersma RK, Eksteen B, Yimam KK, Gordon SC, Goldberg D, Yu L, Bowlus CL, Franke A, Lie BA and Karlsen TH

    Norwegian PSC Research Center, Department of Transplantation Medicine, Division of Surgery, Inflammatory Medicine and Transplantation, Oslo University Hospital Rikshospitalet, Oslo, Norway.

    Primary sclerosing cholangitis (PSC) is strongly associated with several human leukocyte antigen (HLA) haplotypes. Due to extensive linkage disequilibrium and multiple polymorphic candidate genes in the HLA complex, identifying the alleles responsible for these associations has proven difficult. We aimed to evaluate whether studying populations of admixed or non-European descent could help in defining the causative HLA alleles. When assessing haplotypes carrying HLA-DRB1*13:01 (hypothesized to specifically increase the susceptibility to chronic cholangitis), we observed that every haplotype in the Scandinavian PSC population carried HLA-DQB1*06:03. In contrast, only 65% of HLA-DRB1*13:01 haplotypes in an admixed/non-European PSC population carried this allele, suggesting that further assessments of the PSC-associated haplotype HLA-DRB1*13:01-DQA1*01:03-DQB1*06:03 in admixed or multi-ethnic populations could aid in identifying the causative allele.

    Funded by: NIDDK NIH HHS: R01 DK084960

    HLA 2017;90;4;228-233

  • Molecular epidemiology of Klebsiella pneumoniae invasive infections over a decade at Kilifi County Hospital in Kenya.

    Henson SP, Boinett CJ, Ellington MJ, Kagia N, Mwarumba S, Nyongesa S, Mturi N, Kariuki S, Scott JAG, Thomson NR and Morpeth SC

    KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya; Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, Oxford University, Oxford, United Kingdom.

    Multidrug-resistant (MDR) Klebsiella pneumoniae is a common cause of nosocomial infections worldwide. Recent years have seen an explosion of resistance mediated by extended-spectrum β-lactamases (ESBLs) and the emergence of carbapenem resistance. Here, we examine 198 invasive K. pneumoniae isolates collected over more than a decade at Kilifi County Hospital (KCH) in Kenya. We observe a significant increase in MDR K. pneumoniae isolates, particularly in resistance to third-generation cephalosporins conferred by ESBLs. Using whole-genome sequences, we describe the population structure and the distribution of antimicrobial resistance genes within it. More than half of the isolates examined in this study were ESBL-positive, encoding CTX-M-15, SHV-2, SHV-12 and SHV-27, and 79% were MDR, conferring resistance to at least three antimicrobial classes. Although no isolates in our dataset were found to be resistant to carbapenems, we did find a plasmid with the genetic architecture of a known New Delhi metallo-β-lactamase-1 (NDM)-carrying plasmid in 25 isolates. In the absence of carbapenem use in KCH and because of the instability of the NDM-1 gene in the plasmid, the NDM-1 gene has been lost in these isolates. Our data suggest that isolates that encode NDM-1 could be present in the population; should carbapenems be introduced as treatment in public hospitals in Kenya, resistance is likely to ensue rapidly.

    International journal of medical microbiology : IJMM 2017;307;7;422-429

  • PGBD5 promotes site-specific oncogenic mutations in human tumors.

    Henssen AG, Koche R, Zhuang J, Jiang E, Reed C, Eisenberg A, Still E, MacArthur IC, Rodríguez-Fos E, Gonzalez S, Puiggròs M, Blackford AN, Mason CE, de Stanchina E, Gönen M, Emde AK, Shah M, Arora K, Reeves C, Socci ND, Perlman E, Antonescu CR, Roberts CWM, Steen H, Mullen E, Jackson SP, Torrents D, Weng Z, Armstrong SA and Kentsis A

    Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

    Genomic rearrangements are a hallmark of human cancers. Here, we identify the piggyBac transposable element derived 5 (PGBD5) gene as encoding an active DNA transposase expressed in the majority of childhood solid tumors, including lethal rhabdoid tumors. Using assembly-based whole-genome DNA sequencing, we found previously undefined genomic rearrangements in human rhabdoid tumors. These rearrangements involved PGBD5-specific signal (PSS) sequences at their breakpoints and recurrently inactivated tumor-suppressor genes. PGBD5 was physically associated with genomic PSS sequences that were also sufficient to mediate PGBD5-induced DNA rearrangements in rhabdoid tumor cells. Ectopic expression of PGBD5 in primary immortalized human cells was sufficient to promote cell transformation in vivo. This activity required specific catalytic residues in the PGBD5 transposase domain as well as end-joining DNA repair and induced structural rearrangements with PSS breakpoints. These results define PGBD5 as an oncogenic mutator and provide a plausible mechanism for site-specific DNA rearrangements in childhood and adult solid tumors.

    Funded by: NCATS NIH HHS: UL1 TR000457, UL1 TR002384; NCI NIH HHS: K08 CA160660, P30 CA008748, P50 CA140146, R21 CA188881; NIH HHS: U54 OD020355; Wellcome Trust

    Nature genetics 2017;49;7;1005-1014

  • Evidence for three genetic loci involved in both anorexia nervosa risk and variation of body mass index.

    Hinney A, Kesselmeier M, Jall S, Volckmar AL, Föcker M, Antel J, GCAN, WTCCC3, Heid IM, Winkler TW, GIANT, Grant SF, EGG, Guo Y, Bergen AW, Kaye W, Berrettini W, Hakonarson H, Price Foundation Collaborative Group, Children’s Hospital of Philadelphia/Price Foundation, Herpertz-Dahlmann B, de Zwaan M, Herzog W, Ehrlich S, Zipfel S, Egberts KM, Adan R, Brandys M, van Elburg A, Boraska Perica V, Franklin CS, Tschöp MH, Zeggini E, Bulik CM, Collier D, Scherag A, Müller TD and Hebebrand J

    Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany.

    The maintenance of normal body weight is disrupted in patients with anorexia nervosa (AN) for prolonged periods of time. Prior to the onset of AN, premorbid body mass index (BMI) spans the entire range from underweight to obese. After recovery, patients have reduced rates of overweight and obesity. As such, loci involved in body weight regulation may also be relevant for AN and vice versa. Our primary analysis comprised a cross-trait analysis of the 1000 single-nucleotide polymorphisms (SNPs) with the lowest P-values in a genome-wide association meta-analysis (GWAMA) of AN (GCAN) for evidence of association in the largest published GWAMA for BMI (GIANT). Subsequently, we performed sex-stratified analyses for these 1000 SNPs. Functional ex vivo studies on four genes followed. Lastly, a look-up of GWAMA-derived BMI-related loci was performed in the AN GWAMA. We detected significant associations (P < 5 × 10⁻⁵, Bonferroni-corrected P < 0.05) for nine SNP alleles at three independent loci. Interestingly, all AN susceptibility alleles were consistently associated with increased BMI. None of the genes nearest to these SNPs (chr. 10: CTBP2; chr. 19: CCNE1; chr. 2: CARF and NBEAL1, the latter in a region of high linkage disequilibrium) has previously been associated with AN or obesity. Sex-stratified analyses revealed that the strongest BMI signal originated predominantly from females (chr. 10 rs1561589; P(overall) = 2.47 × 10⁻⁶, P(females) = 3.45 × 10⁻⁷, P(males) = 0.043). Functional ex vivo studies in mice revealed reduced hypothalamic expression of Ctbp2 and Nbeal1 after fasting. Hypothalamic expression of Ctbp2 was increased in diet-induced obese (DIO) mice compared with age-matched lean controls. We observed no evidence of association in the look-up of BMI-related loci in the AN GWAMA. A cross-trait analysis of AN and BMI loci revealed variants at three chromosomal loci with potential joint impact. The chromosome 10 locus is particularly promising given that the association with obesity was primarily driven by females. In addition, the altered hypothalamic expression patterns of Ctbp2 and Nbeal1 detected after fasting and in DIO mice implicate these genes in weight regulation.

    Funded by: Medical Research Council: MC_UU_12013/4, MR/K013351/1; NICHD NIH HHS: R01 HD056465, U54 HD086984; NIDDK NIH HHS: R01 DK075787; NIMH NIH HHS: K01 MH100435, K01 MH106675, K01 MH109782, T32 MH076694; Wellcome Trust

    Molecular psychiatry 2017;22;2;192-201

  • Early loss of Crebbp confers malignant stem cell properties on lymphoid progenitors.

    Horton SJ, Giotopoulos G, Yun H, Vohra S, Sheppard O, Bashford-Rogers R, Rashid M, Clipson A, Chan WI, Sasca D, Yiangou L, Osaki H, Basheer F, Gallipoli P, Burrows N, Erdem A, Sybirna A, Foerster S, Zhao W, Sustic T, Petrunkina Harrison A, Laurenti E, Okosun J, Hodson D, Wright P, Smith KG, Maxwell P, Fitzgibbon J, Du MQ, Adams DJ and Huntly BJP

    Wellcome Trust-MRC Cambridge Stem Cell Institute, Cambridge, UK.

    Loss-of-function mutations of cyclic-AMP response element-binding protein binding protein (CREBBP) are prevalent in lymphoid malignancies. However, the tumour suppressor functions of CREBBP remain unclear. We demonstrate that loss of Crebbp in murine haematopoietic stem and progenitor cells (HSPCs) leads to increased development of B-cell lymphomas. This is preceded by accumulation of hyperproliferative lymphoid progenitors with a defective DNA damage response (DDR) due to a failure to acetylate p53. We identify a premalignant lymphoma stem cell population with decreased H3K27ac, which undergoes transcriptional and genetic evolution due to the altered DDR, resulting in lymphomagenesis. Importantly, when Crebbp is lost later in lymphopoiesis, these cellular abnormalities are absent and tumour generation is attenuated. We also document that CREBBP mutations may occur in HSPCs from patients with CREBBP-mutated lymphoma. These data suggest that earlier loss of Crebbp is advantageous for lymphoid transformation, and they inform our understanding of the cellular origins and subsequent evolution of lymphoid malignancies.

    Funded by: Cancer Research UK: 13031; European Research Council: 647685; Medical Research Council: G1000288, MC_PC_12009, MR/M008584/1, MR/M008975/1, MR/M010392/1; Worldwide Cancer Research: 14-1069

    Nature cell biology 2017;19;9;1093-1104

  • Clinical and biological insights from viral genome sequencing.

    Houldcroft CJ, Beale MA and Breuer J

    Department of Infection, Immunity and Inflammation, Great Ormond Street Institute of Child Health, University College London, London WC1N 1EH, UK; and the Division of Biological Anthropology, University of Cambridge, Cambridge CB2 3QG, UK.

    Whole-genome sequencing (WGS) of pathogens is becoming increasingly important not only for basic research but also for clinical science and practice. In virology, WGS is important for the development of novel treatments and vaccines, and for increasing the power of molecular epidemiology and evolutionary genomics. In this Opinion article, we suggest that WGS of viruses in a clinical setting will become increasingly important for patient care. We give an overview of different WGS methods that are used in virology and summarize their advantages and disadvantages. Although the technical, financial and ethical issues surrounding the clinical application of viral WGS have been only partially addressed, this technique provides important insights into virus transmission, evolution and pathogenesis.

    Nature reviews. Microbiology 2017;15;3;183-192

  • WormBase ParaSite - a comprehensive resource for helminth genomics.

    Howe KL, Bolt BJ, Shafie M, Kersey P and Berriman M

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising their utility to researchers. We have developed a portal called WormBase ParaSite for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K020080; Medical Research Council: MR/L001020/1

    Molecular and biochemical parasitology 2017;215;2-10

  • Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms.

    Howson JMM, Zhao W, Barnes DR, Ho WK, Young R, Paul DS, Waite LL, Freitag DF, Fauman EB, Salfati EL, Sun BB, Eicher JD, Johnson AD, Sheu WHH, Nielsen SF, Lin WY, Surendran P, Malarstig A, Wilk JB, Tybjærg-Hansen A, Rasmussen KL, Kamstrup PR, Deloukas P, Erdmann J, Kathiresan S, Samani NJ, Schunkert H, Watkins H, CARDIoGRAMplusC4D, Do R, Rader DJ, Johnson JA, Hazen SL, Quyyumi AA, Spertus JA, Pepine CJ, Franceschini N, Justice A, Reiner AP, Buyske S, Hindorff LA, Carty CL, North KE, Kooperberg C, Boerwinkle E, Young K, Graff M, Peters U, Absher D, Hsiung CA, Lee WJ, Taylor KD, Chen YH, Lee IT, Guo X, Chung RH, Hung YJ, Rotter JI, Juang JJ, Quertermous T, Wang TD, Rasheed A, Frossard P, Alam DS, Majumder AAS, Di Angelantonio E, Chowdhury R, EPIC-CVD, Chen YI, Nordestgaard BG, Assimes TL, Danesh J, Butterworth AS and Saleheen D

    MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.

    Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide. Although 58 genomic regions have been associated with CAD thus far, most of the heritability is unexplained, indicating that additional susceptibility loci await identification. An efficient discovery strategy may be larger-scale evaluation of promising associations suggested by genome-wide association studies (GWAS). Hence, we genotyped 56,309 participants using a targeted gene array derived from earlier GWAS results and meta-analysed the results with those from 194,427 previously genotyped participants, for a total of 88,192 CAD cases and 162,544 controls. We identified 25 new SNP-CAD associations (P < 5 × 10⁻⁸ in fixed-effects meta-analysis) from 15 genomic regions, including SNPs in or near genes involved in cellular adhesion, leukocyte migration and atherosclerosis (PECAM1, rs1867624), coagulation and inflammation (PROCR, rs867186 (p.Ser219Gly)) and vascular smooth muscle cell differentiation (LMOD1, rs2820315). Correlation of these regions with cell-type-specific gene expression and plasma protein levels sheds light on potential disease mechanisms.

    Funded by: British Heart Foundation: CH/12/2/29428, RG/08/014/24067, RG/14/5/30893, SP/02/002/14543, SP/09/002/27676; European Research Council: 268834; Medical Research Council: G0800270, MR/L003120/1; NCATS NIH HHS: UL1 TR000124; NHLBI NIH HHS: K99 HL130580, R21 HL123677, T32 HL098049; NIDDK NIH HHS: K23 DK088942, P30 DK063491, R56 DK104806; NIEHS NIH HHS: P30 ES010126; NIH HHS: S10 OD020069

    Nature genetics 2017;49;7;1113-1119
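    The sample sizes in the coronary artery disease abstract above can be cross-checked: the newly genotyped and previously genotyped participants should add up to the reported cases plus controls. The short sketch below does that bookkeeping using only the counts given in the abstract; the variable names are illustrative.

```python
# Sample bookkeeping for the CAD meta-analysis (all counts from the abstract).
new_genotyped = 56_309          # genotyped on the targeted array in this study
previously_genotyped = 194_427  # participants from earlier genotyping
cases, controls = 88_192, 162_544

total_participants = new_genotyped + previously_genotyped

# Both ways of counting should agree: 250,736 participants in total.
assert total_participants == cases + controls

case_fraction = cases / total_participants
print(f"{total_participants:,} participants, {case_fraction:.1%} cases")
```

    The two totals agree at 250,736, so roughly a third of the combined sample are CAD cases.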

  • Comparative genomics reveals convergent evolution between the bamboo-eating giant and red pandas.

    Hu Y, Wu Q, Ma S, Ma T, Shan L, Wang X, Nie Y, Ning Z, Yan L, Xiu Y and Wei F

    Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.

    Phenotypic convergence between distantly related taxa often mirrors adaptation to similar selective pressures and may be driven by genetic convergence. The giant panda (Ailuropoda melanoleuca) and red panda (Ailurus fulgens) belong to different families in the order Carnivora, but both have evolved a specialized bamboo diet and an adaptive pseudothumb, representing a classic model of convergent evolution. However, the genetic bases of these morphological and physiological convergences remain unknown. Through de novo sequencing of the red panda genome and an improved giant panda genome assembly incorporating additional data, we identified genomic signatures of convergent evolution. The limb development genes DYNC2H1 and PCNT have undergone adaptive convergence and may be important candidate genes for pseudothumb development. As evolutionary responses to a bamboo diet, adaptive convergence has occurred in genes involved in the digestion and utilization of bamboo nutrients such as essential amino acids, fatty acids, and vitamins. Similarly, the umami taste receptor gene TAS1R1 has been pseudogenized in both pandas. These findings offer insights into genetic convergence mechanisms underlying phenotypic convergence and adaptation to a specialized bamboo diet.

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;5;1081-1086

  • Fine-mapping inflammatory bowel disease loci to single-variant resolution.

    Huang H, Fang M, Jostins L, Umićević Mirkov M, Boucher G, Anderson CA, Andersen V, Cleynen I, Cortes A, Crins F, D'Amato M, Deffontaine V, Dmitrieva J, Docampo E, Elansary M, Farh KK, Franke A, Gori AS, Goyette P, Halfvarson J, Haritunians T, Knight J, Lawrance IC, Lees CW, Louis E, Mariman R, Meuwissen T, Mni M, Momozawa Y, Parkes M, Spain SL, Théâtre E, Trynka G, Satsangi J, van Sommeren S, Vermeire S, Xavier RJ, International Inflammatory Bowel Disease Genetics Consortium, Weersma RK, Duerr RH, Mathew CG, Rioux JD, McGovern DPB, Cho JH, Georges M, Daly MJ and Barrett JC

    Analytic and Translational Genetics Unit, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA.

    Inflammatory bowel diseases are chronic gastrointestinal inflammatory disorders that affect millions of people worldwide. Genome-wide association studies have identified 200 inflammatory bowel disease-associated loci, but few have been conclusively resolved to specific functional variants. Here we report fine-mapping of 94 inflammatory bowel disease loci using high-density genotyping in 67,852 individuals. We pinpoint 18 associations to a single causal variant with greater than 95% certainty, and an additional 27 associations to a single variant with greater than 50% certainty. These 45 variants are significantly enriched for protein-coding changes (n = 13), direct disruption of transcription-factor binding sites (n = 3), and tissue-specific epigenetic marks (n = 10), with the last category showing enrichment in specific immune cells among associations stronger in Crohn's disease and in gut mucosa among associations stronger in ulcerative colitis. The results of this study suggest that high-resolution fine-mapping in large samples can convert many discoveries from genome-wide association studies into statistically convincing causal variants, providing a powerful substrate for experimental elucidation of disease mechanisms.

    Funded by: Chief Scientist Office: ETM/137; Medical Research Council: G0600329, G0800759, MR/M00533X/1; NCI NIH HHS: R01 CA141743; NIAID NIH HHS: U01 AI067068; NIDCR NIH HHS: U54 DE023789; NIDDK NIH HHS: P01 DK046763, P30 DK043351, R01 DK064869, R01 DK092235, R01 DK106593, U01 DK062413, U01 DK062420, U01 DK062422, U01 DK062429, U01 DK062432, U24 DK062429; Wellcome Trust: 098051, 098759

    Nature 2017;547;7662;173-178
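    The inflammatory bowel disease abstract above grades associations by how confidently a single variant can be called causal (above 95% versus above 50%). One common way to obtain such per-variant probabilities, assumed here purely for illustration (the paper's own fine-mapping methods may differ), is Wakefield's approximate Bayes factor under a model with one causal variant per locus. The prior effect-size SD of 0.2, the z-scores and the standard errors below are hypothetical.

```python
import math


def log_abf(z: float, se: float, prior_sd: float = 0.2) -> float:
    """Wakefield's approximate log Bayes factor in favour of association.

    z: association z-score (beta / se); se: standard error of beta;
    prior_sd: assumed prior standard deviation of the true effect size.
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def posterior_probs(zs, ses, prior_sd=0.2):
    """Posterior probability that each variant is the single causal one."""
    logs = [log_abf(z, se, prior_sd) for z, se in zip(zs, ses)]
    m = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(x - m) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


def certainty_tier(pps):
    """Classify a locus by the posterior probability of its best variant."""
    top = max(pps)
    if top > 0.95:
        return ">95%"
    if top > 0.50:
        return ">50%"
    return "unresolved"


# Hypothetical locus: one variant is far stronger than its neighbours.
pps = posterior_probs(zs=[9.0, 5.0, 4.5], ses=[0.02, 0.02, 0.02])
print([round(p, 3) for p in pps], certainty_tier(pps))
```

    Because the posteriors are normalised within a locus, a variant can only reach high certainty when its evidence clearly exceeds that of every correlated neighbour, which is why dense genotyping in large samples matters for this kind of resolution.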

  • Stella modulates transcriptional and endogenous retrovirus programs during maternal-to-zygotic transition.

    Huang Y, Kim JK, Do DV, Lee C, Penfold CA, Zylicz JJ, Marioni JC, Hackett JA and Surani MA

    Wellcome Trust/Cancer Research United Kingdom Gurdon Institute, University of Cambridge, Cambridge, United Kingdom.

    The maternal-to-zygotic transition (MZT) marks the period when the embryonic genome is activated and acquires control of development. Maternally inherited factors play a key role in this critical developmental process, which occurs at the 2-cell stage in mice. We investigated the function of the maternally inherited factor Stella (encoded by Dppa3) using single-cell/embryo approaches. We show that loss of maternal Stella results in widespread transcriptional mis-regulation and a partial failure of MZT. Strikingly, activation of endogenous retroviruses (ERVs) is significantly impaired in Stella maternal/zygotic knockout embryos, which in turn leads to a failure to upregulate chimeric transcripts. Amongst ERVs, MuERV-L activation is particularly affected by the absence of Stella, and direct in vivo knockdown of MuERV-L impacts the developmental potential of the embryo. We propose that Stella is involved in ensuring activation of ERVs, which themselves play a potentially key role during early development, either directly or through influencing embryonic gene expression.

    Funded by: Cancer Research UK: C6946/A14492; Wellcome Trust: 092096, 096738

    eLife 2017;6

  • ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads.

    Hunt M, Mather AE, Sánchez-Busó L, Page AJ, Parkhill J, Keane JA and Harris SR

    Infection Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    Antimicrobial resistance (AMR) is one of the major threats to human and animal health worldwide, yet few high-throughput tools exist to analyse and predict the resistance of a bacterial isolate from sequencing data. Here we present a new tool, ARIBA, that identifies AMR-associated genes and single nucleotide polymorphisms directly from short reads, and generates detailed and customizable output. The accuracy and advantages of ARIBA over other tools are demonstrated on three datasets from Gram-positive and Gram-negative bacteria, with ARIBA outperforming existing methods.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M014088/1; Medical Research Council: MR/L015080/1; Wellcome Trust: 206194

    Microbial genomics 2017;3;10;e000131

  • Pooling strategy and chromosome painting characterize a living zebroid for the first time.

    Iannuzzi A, Pereira J, Iannuzzi C, Fu B and Ferguson-Smith M

    Laboratory of Animal Cytogenetics and Genomics, National Research Council of Italy, Institute of Animal Production Systems in Mediterranean Environments (ISPAAM), Naples, Italy.

    We have investigated the complex karyotype of a living zebra-donkey hybrid for the first time, using chromosome-specific painting probes produced from flow-sorted chromosomes of a zebra (Equus burchelli) and a horse (Equus caballus). Because the chromosomes proved difficult to distinguish from one another, a successful new strategy was devised to resolve the difficulty and characterize each chromosome. It was based on selecting five panels of whole-chromosome painting probes that could differentiate zebra and donkey chromosomes, with the probes labelled with either FITC or Cy3 fluorochromes. Each panel was hybridized sequentially to the same G-Q-banded metaphases, and the results were combined so that every zebra and donkey chromosome in each suitable metaphase could be identified. A diploid number of 2n = 53, XY was found, comprising haploid sets of 22 chromosomes from the zebra and 31 from the donkey, with no evidence of chromosome rearrangement. This strategy may have several applications in resolving other complex hybrid karyotypes and chromosomal aberrations.

    PloS one 2017;12;7;e0180158
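    The 2n = 53 reported for the zebroid above follows from each parent contributing one haploid set. The abstract gives only the haploid contributions (22 and 31); the parental diploid numbers used below (44 for the plains zebra, 62 for the donkey) are standard values added here as an assumption and are not stated in the abstract.

```python
# Expected chromosome count of an F1 zebra x donkey hybrid.
ZEBRA_2N = 44   # plains zebra (Equus burchelli); assumption, not in the abstract
DONKEY_2N = 62  # domestic donkey (Equus asinus); assumption, not in the abstract


def hybrid_diploid_number(parent_a_2n: int, parent_b_2n: int) -> int:
    """An F1 hybrid receives one haploid set (n = 2n / 2) from each parent."""
    return parent_a_2n // 2 + parent_b_2n // 2


print(hybrid_diploid_number(ZEBRA_2N, DONKEY_2N))  # 22 + 31 = 53
```

    The odd diploid number is one reason such hybrids are hard to karyotype: chromosomes cannot be paired off as homologues, so each must be assigned to its parental set individually.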

  • Gene Expression in Leishmania Is Regulated Predominantly by Gene Dosage.

    Iantorno SA, Durrant C, Khan A, Sanders MJ, Beverley SM, Warren WC, Berriman M, Sacks DL, Cotton JA and Grigg ME

    Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA.

    Leishmania tropica, a unicellular eukaryotic parasite present in North and East Africa, the Middle East, and the Indian subcontinent, has been linked to large outbreaks of cutaneous leishmaniasis in displaced populations in Iraq, Jordan, and Syria. Here, we report the genome sequence of this pathogen and 7,863 identified protein-coding genes, and we show that the majority of clinical isolates possess high levels of allelic diversity, genetic admixture, heterozygosity, and extensive aneuploidy. By pairing genome-wide high-throughput DNA sequencing (DNA-seq) with RNA-seq, we found that gene dosage, at the level of individual genes or chromosomal "somy" (a general term covering disomy, trisomy, tetrasomy, etc.), accounted for more than 85% of total gene expression variation in genes with a 2-fold or greater change in expression. High gene copy number variation (CNV) among membrane-bound transporters, a class of proteins previously implicated in drug resistance, was found for the most highly differentially expressed genes. Our results suggest that gene dosage is an adaptive trait that confers phenotypic plasticity among natural Leishmania populations by rapid down- or upregulation of transporter proteins to limit the effects of environmental stresses, such as drug selection.

    IMPORTANCE: Leishmania is a genus of unicellular eukaryotic parasites responsible for a spectrum of human diseases ranging from cutaneous leishmaniasis (CL) and mucocutaneous leishmaniasis (MCL) to life-threatening visceral leishmaniasis (VL). Developmental and strain-specific gene expression in this genus is largely thought to depend on mRNA stability or posttranscriptional regulatory networks, as its genome is organized into polycistronic gene clusters in the absence of promoter-mediated regulation of transcription initiation of nuclear genes. Genetic hybridization has been shown to yield dramatic structural genomic variation, but whether such changes in gene dosage affect gene expression has not been formally investigated. Here we show that the predominant mechanism determining transcript abundance differences (>85%) in Leishmania tropica is gene dosage at the level of individual genes or chromosomal somy.

    Funded by: NIAID NIH HHS: R01 AI029646; Wellcome Trust: 206194

    mBio 2017;8;5

  • Variation in olfactory neuron repertoires is genetically controlled and environmentally modulated.

    Ibarra-Soria X, Nakahara TS, Lilue J, Jiang Y, Trimmer C, Souza MA, Netto PH, Ikegami K, Murphy NR, Kusma M, Kirton A, Saraiva LR, Keane TM, Matsunami H, Mainland J, Papes F and Logan DW

    Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    The mouse olfactory sensory neuron (OSN) repertoire comprises 10 million cells, each of which expresses one olfactory receptor (OR) gene from a pool of over 1000. Thus, the nose is sub-stratified into more than a thousand OSN subtypes. Here, we employ and validate an RNA-sequencing-based method to quantify the abundance of all OSN subtypes in parallel, and investigate the genetic and environmental factors that contribute to neuronal diversity. We find that the OSN subtype distribution is stereotyped in genetically identical mice, but varies extensively between different strains. Further, we identify cis-acting genetic variation as the greatest component influencing OSN composition and demonstrate its independence from OR function. However, we show that olfactory stimulation with particular odorants modulates dozens of OSN subtypes in a subtle but reproducible, specific and time-dependent manner. Together, these mechanisms generate a highly individualized olfactory sensory system by promoting neuronal diversity.

    Funded by: Medical Research Council: MR/L007428/1; NIDCD NIH HHS: F32 DC014202, P30 DC011735, R01 DC013339, R01 DC014423

    eLife 2017;6

  • An untypeable enterotoxigenic Escherichia coli represents one of the dominant types causing human disease.

    Iguchi A, von Mentzer A, Kikuchi T and Thomson NR

    University of Miyazaki, Miyazaki, Japan.

    Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrhoea in children under 5 years of age in endemic areas, and a primary cause of diarrhoea in travellers visiting developing countries. Epidemiological analysis of E. coli pathovars has traditionally relied on serotyping. However, genomic analysis of a global collection of 362 ETEC isolates taken from patients revealed nine novel O-antigen biosynthesis gene clusters that had previously gone unrecognized and had collectively been designated as unclassified. When put in the context of all isolates sequenced, one of the novel O-genotypes, OgN5, was found to be the second most common ETEC O-genotype causing disease, after O6, in a globally representative ETEC collection. ETEC OgN5 isolates have also spread globally. These novel O-genotypes have now been included in our comprehensive O-genotyping scheme and can be detected using PCR-based and in silico typing methods. This will assist epidemiological studies as well as ETEC vaccine development.

    Microbial genomics 2017;3;9;e000121

  • On the effective depth of viral sequence data.

    Illingworth CJR, Roy S, Beale MA, Tutill H, Williams R and Breuer J

    Department of Genetics, University of Cambridge, Cambridge, UK.

    Genome sequence data are of great value in describing evolutionary processes in viral populations. However, in such studies, the extent to which the data accurately describe the viral population is a matter of importance. Multiple factors may influence the accuracy of a dataset, including the quantity and nature of the sample collected and the subsequent steps in viral processing. To investigate this, we sequenced replicate datasets spanning a range of viruses, in which the point at which samples were split differed in each case: from a dataset in which independent samples were collected from a single patient, to one in which all processing steps up to sequencing were applied to a single sample before it was split and each replicate sequenced. We conclude that neither a high read depth nor a high template number in a sample guarantees the precision of a dataset. Measures of consistency calculated from within a single biological sample may also be insufficient; distortion of the composition of a population by the experimental procedure, or genuine within-host diversity between samples, may each affect the results. Where possible, data from replicate samples should be collected to validate the consistency of short-read sequence data.

    Virus evolution 2017;3;2;vex030

  • The spread of artemisinin-resistant Plasmodium falciparum in the Greater Mekong subregion: a molecular epidemiology observational study.

    Imwong M, Suwannasin K, Kunasol C, Sutawong K, Mayxay M, Rekol H, Smithuis FM, Hlaing TM, Tun KM, van der Pluijm RW, Tripura R, Miotto O, Menard D, Dhorda M, Day NPJ, White NJ and Dondorp AM

    Department of Molecular Tropical Medicine and Genetics, Mahidol University, Bangkok, Thailand; Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand.

    Background: Evidence suggests that the PfKelch13 mutations that confer artemisinin resistance in falciparum malaria have multiple independent origins across the Greater Mekong subregion, which has motivated a regional malaria elimination agenda. We aimed to use molecular genotyping to assess antimalarial drug resistance selection and spread in the Greater Mekong subregion.

    Methods: In this observational study, we tested Plasmodium falciparum isolates from Myanmar, northeastern Thailand, southern Laos, and western Cambodia for PfKelch13 mutations and for Pfplasmepsin2 gene amplification (indicating piperaquine resistance). We collected blood spots from patients with microscopy-confirmed or rapid-test-confirmed uncomplicated falciparum malaria. We used microsatellite genotyping to assess genetic relatedness.

    Findings: As part of studies on the epidemiology of artemisinin-resistant malaria between Jan 1, 2008, and Dec 31, 2015, we collected 434 isolates. In 2014-15, a single long PfKelch13 C580Y haplotype (-50 to +31·5 kb) lineage, which emerged in western Cambodia in 2008, was detected in 65 of 88 isolates from northeastern Thailand, 86 of 111 isolates from southern Laos, and 14 of 14 isolates from western Cambodia, signifying a hard transnational selective sweep. Pfplasmepsin2 amplification occurred only within this lineage, and by 2015 these closely related parasites were found in ten of the 14 isolates from Cambodia and 15 of 15 isolates from northeastern Thailand. C580Y mutated parasites from Myanmar had a different genetic origin.

    Interpretation: Our results suggest that the dominant artemisinin-resistant P falciparum C580Y lineage probably arose in western Cambodia and then spread to Thailand and Laos, outcompeting other parasites and acquiring piperaquine resistance. The emergence and spread of fit artemisinin-resistant P falciparum parasite lineages, which then acquire partner drug resistance across the Greater Mekong subregion, threatens regional malaria control and elimination goals. Elimination of falciparum malaria from this region should be accelerated while available antimalarial drugs remain effective.

    Funding: The Wellcome Trust and the Bill and Melinda Gates Foundation.

    Funded by: Wellcome Trust

    The Lancet. Infectious diseases 2017;17;5;491-497
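    The artemisinin-resistance abstract above reports the 2014-15 C580Y lineage detections as raw counts. Converting them to proportions with a confidence interval shows how much the small denominators (14 isolates in western Cambodia) limit precision. The abstract does not report intervals; the Wilson score interval used below is a standard choice for binomial proportions, added here as an illustration rather than as the authors' analysis.

```python
import math

# 2014-15 counts of isolates carrying the C580Y lineage (from the abstract).
COUNTS = {
    "northeastern Thailand": (65, 88),
    "southern Laos": (86, 111),
    "western Cambodia": (14, 14),
}


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (~95% for z = 1.96)."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


for region, (k, n) in COUNTS.items():
    lo, hi = wilson_interval(k, n)
    print(f"{region}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

    Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and remains informative when every isolate is positive, as in the 14/14 Cambodian sample.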

  • The role of sex and body weight on the metabolic effects of high-fat diet in C57BL/6N mice.

    Ingvorsen C, Karp NA and Lelliott CJ

    Mouse Pipelines, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Background: Metabolic disorders are commonly investigated using knockout and transgenic mouse models on the C57BL/6N genetic background because of its genetic susceptibility to the deleterious metabolic effects of a high-fat diet (HFD). There is growing awareness of the need to consider sex in disease progression, but limited attention has been paid to sexual dimorphism in mouse models and its impact on metabolic phenotypes. We assessed the effect of HFD and the impact of sex on metabolic variables in this strain.

    Methods: We generated a reference data set encompassing glucose tolerance, body composition and plasma chemistry data from 586 C57BL/6N mice fed a standard chow and 733 fed a HFD collected as part of a high-throughput phenotyping pipeline. Linear mixed model regression analysis was used in a dual analysis to assess the effect of HFD as an absolute change in phenotype, but also as a relative change accounting for the potential confounding effect of body weight.

    Results: HFD had a significant impact on all variables tested, with an average absolute effect size of 29%. For the majority of variables (78%), the treatment effect was modified by sex, and this was dominated by male-specific or male-stronger effects. On average, there was a 13.2% difference in effect size between male and female mice for sexually dimorphic variables. HFD led to a significant body weight phenotype (24% increase), which acts as a confounder for the other analysed variables. For 79% of the variables, body weight was found to be a significant source of variation, but even after accounting for this confounding effect, the HFD-induced phenotypic changes were similar to those found without accounting for weight.

    Conclusion: HFD and sex are powerful modifiers of metabolic parameters in C57BL/6N mice. We also demonstrate the value of considering body size as a covariate to obtain a richer understanding of metabolic phenotypes.

    Funded by: Wellcome Trust: WT098051

    Nutrition & diabetes 2017;7;4;e261
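    The high-fat diet abstract above describes a "dual analysis": estimating the HFD effect as an absolute change, and again with body weight as a covariate. The sketch below illustrates that contrast on simulated data using ordinary least squares, a simplified stand-in for the linear mixed model in the paper (it has no random effects and no sex term). All numbers are invented. The weight-adjusted estimate is computed via the Frisch-Waugh-Lovell theorem, which gives the same diet coefficient as a multiple regression on diet and weight.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, on):
    """Residuals of v after regressing it on `on` (with intercept)."""
    b = slope(on, v)
    a = mean(v) - b * mean(on)
    return [vi - (a + b * oi) for vi, oi in zip(v, on)]


def diet_effects(y, diet, weight):
    """Return (absolute, weight-adjusted) diet effects.

    The adjusted estimate partials body weight out of both y and diet,
    then regresses residual on residual (Frisch-Waugh-Lovell).
    """
    absolute = slope(diet, y)
    adjusted = slope(residualize(diet, weight), residualize(y, weight))
    return absolute, adjusted


# Hypothetical mice: HFD adds 6 g of body weight, and the phenotype depends on
# weight (2 units per g) plus a direct diet effect of 1 unit.
rng = random.Random(42)
n = 2000
diet = [i % 2 for i in range(n)]  # 0 = chow, 1 = HFD
weight = [25 + 6 * d + rng.gauss(0, 1) for d in diet]
y = [2 * w + 1 * d + rng.gauss(0, 1) for w, d in zip(weight, diet)]

absolute, adjusted = diet_effects(y, diet, weight)
print(f"absolute effect {absolute:.2f}, weight-adjusted effect {adjusted:.2f}")
```

    In this simulation the absolute effect (about 13 units) mostly reflects weight gain, while the adjusted effect recovers the direct diet contribution (about 1 unit). The abstract reports that, for the real data, the two analyses gave broadly similar HFD-induced changes.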

  • Sub-minute Phosphoregulation of Cell Cycle Systems during Plasmodium Gamete Formation.

    Invergo BM, Brochet M, Yu L, Choudhary J, Beltrao P and Billker O

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK; Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.

    The transmission of malaria parasites to mosquitoes relies on the rapid induction of sexual reproduction once the parasites are ingested in a blood meal. Haploid female and male gametocytes become activated and emerge from their host cells, and the males enter the cell cycle to produce eight microgametes. The synchronized nature of gametogenesis allowed us to investigate phosphorylation signaling during its first minute in Plasmodium berghei via a high-resolution time course of the phosphoproteome. This revealed an unexpectedly broad response, with proteins related to distinct cell cycle events undergoing simultaneous phosphoregulation. We implicate several protein kinases in the process, and we validate our analyses on the plant-like calcium-dependent protein kinase 4 (CDPK4) and a homolog of serine/arginine-rich protein kinases (SRPK1). Mutants in these kinases displayed distinct phosphoproteomic disruptions, consistent with differences in their phenotypes. The results reveal the central role of protein phosphorylation in the atypical cell cycle regulation of a divergent eukaryote.

    Cell reports 2017;21;7;2017-2029

  • Distinct Campylobacter fetus lineages adapted as livestock pathogens and human pathobionts in the intestinal microbiota.

    Iraola G, Forster SC, Kumar N, Lehours P, Bekal S, García-Peña FJ, Paolicchi F, Morsella C, Hotzel H, Hsueh PR, Vidal A, Lévesque S, Yamazaki W, Balzan C, Vargas A, Piccirillo A, Chaban B, Hill JE, Betancor L, Collado L, Truyers I, Midwinter AC, Dagi HT, Mégraud F, Calleros L, Pérez R, Naya H and Lawley TD

    Unidad de Bioinformática, Institut Pasteur Montevideo, 11400, Montevideo, Uruguay.

    Campylobacter fetus is a venereal pathogen of cattle and sheep, and an opportunistic human pathogen. It is often assumed that C. fetus infection occurs in humans as a zoonosis through food chain transmission. Here we show that mammalian C. fetus consists of distinct evolutionary lineages, primarily associated with either human or bovine hosts. We use whole-genome phylogenetics on 182 strains from 17 countries to provide evidence that C. fetus may have originated in humans around 10,500 years ago and may have "jumped" into cattle during the livestock domestication period. We detect C. fetus genomes in 8% of healthy human fecal metagenomes, where the human-associated lineages are the dominant type (78%). Thus, our work suggests that C. fetus is an unappreciated human intestinal pathobiont, likely spread by human-to-human transmission. This genome-based evolutionary framework will facilitate C. fetus epidemiology research and the development of improved molecular diagnostics and prevention schemes for this neglected pathogen.

    Funded by: Medical Research Council: PF451; Wellcome Trust: 098051

    Nature communications 2017;8;1;1367

  • Insecticide-induced leg loss does not eliminate biting and reproduction in Anopheles gambiae mosquitoes.

    Isaacs AT, Lynd A and Donnelly MJ

    Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK.

    Recent successes in malaria control have been largely attributable to the deployment of insecticide-based vector control tools such as bed nets and indoor residual spraying. Pyrethroid-treated bed nets are acutely neurotoxic to mosquitoes, inducing symptoms such as loss of coordination, paralysis, and violent spasms. One result of pyrethroid exposure often seen in laboratory tests is mosquito leg loss, a condition that has thus far been assumed to equate to mortality, as females are not expected to blood feed. However, whilst limb loss is unlikely to be adaptive, females with missing limbs may play a role in the propagation of both their species and pathogens. To test the hypothesis that leg loss inhibits mosquitoes from biting and reproducing, mosquitoes with one, two, or six legs were evaluated for their success in feeding upon a human. These experiments demonstrated that insecticide-induced leg loss had no significant effect upon blood feeding or egg laying success. We conclude that studies of pyrethroid efficacy should not discount mosquitoes that survive insecticide exposure with fewer than six legs, as they may still be capable of biting humans, reproducing, and contributing to malaria transmission.

    Funded by: NIAID NIH HHS: U19 AI089674

    Scientific reports 2017;7;46674

  • DNA methylation homeostasis in human and mouse development.

    Iurlaro M, von Meyenn F and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, UK.

    The molecular pathways that regulate gain and loss of DNA methylation during mammalian development need to be tightly balanced to maintain a physiological equilibrium. Here we explore the relative contributions of the different pathways and enzymatic activities involved in methylation homeostasis in the context of genome-wide and locus-specific epigenetic reprogramming in mammals. An adaptable epigenetic machinery allows global epigenetic reprogramming to proceed concurrently with local maintenance of critical epigenetic memory in the genome, and appears to regulate the tempo of global reprogramming in different cell lineages and species.

    Current opinion in genetics & development 2017;43;101-109

  • Disentangling Immediate Adaptive Introgression from Selection on Standing Introgressed Variation in Humans.

    Jagoda E, Lawson DJ, Wall JD, Lambert D, Muller C, Westaway M, Leavesley M, Capellini TD, Mirazón Lahr M, Gerbault P, Thomas MG, Migliano AB, Willerslev E, Metspalu M and Pagani L

    Human Evolutionary Biology, Harvard University 11 Divinity Avenue Cambridge MA 02138 USA.

    Recent studies have reported evidence suggesting that portions of contemporary human genomes introgressed from archaic hominin populations rose to high frequencies due to positive selection. However, no study to date has specifically addressed the post-introgression population dynamics of these putative cases of adaptive introgression. Here, for the first time, we specifically define cases of immediate adaptive introgression (iAI), in which archaic haplotypes rose to high frequencies in humans as a result of a selective sweep that occurred shortly after the introgression event. We define these cases as distinct from instances of selection on standing introgressed variation (SI), in which an introgressed haplotype initially segregated neutrally and subsequently underwent positive selection. Using a geographically diverse dataset, we report novel cases of selection on introgressed variation in living humans and, from among these, shortlist those whose selective sweeps are more consistent with iAI than with SI. Many of these novel inferred iAI haplotypes have potential biological relevance, including three introgressed haplotypes that contain immune-related genes in West Siberians, South Asians, and West Eurasians. Overall, our results suggest that iAI may not represent the full picture of positive selection on archaically introgressed haplotypes in humans and that more work is needed to analyze the role of SI in the archaic introgression landscape of living humans.

    Molecular biology and evolution 2017

  • Clonal Hematopoiesis and Risk of Atherosclerotic Cardiovascular Disease.

    Jaiswal S, Natarajan P, Silver AJ, Gibson CJ, Bick AG, Shvartz E, McConkey M, Gupta N, Gabriel S, Ardissino D, Baber U, Mehran R, Fuster V, Danesh J, Frossard P, Saleheen D, Melander O, Sukhova GK, Neuberg D, Libby P, Kathiresan S and Ebert BL

    From the Department of Medicine, Division of Hematology, Brigham and Women's Hospital (S.J., A.J.S., M.M.) and Harvard Medical School (B.L.E.), the Department of Medicine, Division of Cardiovascular Medicine, Brigham and Women's Hospital (E.S.) and Harvard Medical School (G.K.S., P.L.), the Department of Pathology (S.J.) and the Center for Genomic Medicine (P.N., S.K.), Massachusetts General Hospital, the Department of Medicine, Division of Cardiology, and Cardiovascular Research Center (P.N., S.K.), and the Department of Medicine (A.G.B.), Massachusetts General Hospital and Harvard Medical School, and the Departments of Medical Oncology (C.J.G.) and Biostatistics and Computational Biology (D.N.), Dana-Farber Cancer Institute, Boston, and the Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge (P.N., A.G.B., N.G., S.G., S.K.) - all in Massachusetts; the Department of Cardiology, University Hospital, Parma, Italy (D.A.); the Department of Medicine, Division of Cardiology, Mt. Sinai School of Medicine, New York (U.B., R.M., V.F.); Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid (V.F.); Medical Research Council-British Heart Foundation Cardiovascular Epidemiology Unit and National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, Department of Public Health and Primary Care, and the British Heart Foundation, Cambridge Centre of Excellence, Department of Medicine, University of Cambridge, Cambridge (J.D.), and the Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton (J.D.) - both in the United Kingdom; the Center for Non-Communicable Diseases, Karachi, Pakistan (P.F., D.S.); the Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia (D.S.); and the Department of Clinical Sciences Malmö, Lund University, Lund, Sweden (O.M.).

    Background: Clonal hematopoiesis of indeterminate potential (CHIP), which is defined as the presence of an expanded somatic blood-cell clone in persons without other hematologic abnormalities, is common among older persons and is associated with an increased risk of hematologic cancer. We previously found preliminary evidence for an association between CHIP and atherosclerotic cardiovascular disease, but the nature of this association was unclear.

    Methods: We used whole-exome sequencing to detect the presence of CHIP in peripheral-blood cells and associated such presence with coronary heart disease using samples from four case-control studies that together enrolled 4726 participants with coronary heart disease and 3529 controls. To assess causality, we perturbed the function of Tet2, the second most commonly mutated gene linked to clonal hematopoiesis, in the hematopoietic cells of atherosclerosis-prone mice.

    Results: In nested case-control analyses from two prospective cohorts, carriers of CHIP had a risk of coronary heart disease that was 1.9 times as great as in noncarriers (95% confidence interval [CI], 1.4 to 2.7). In two retrospective case-control cohorts for the evaluation of early-onset myocardial infarction, participants with CHIP had a risk of myocardial infarction that was 4.0 times as great as in noncarriers (95% CI, 2.4 to 6.7). Mutations in DNMT3A, TET2, ASXL1, and JAK2 were each individually associated with coronary heart disease. CHIP carriers with these mutations also had increased coronary-artery calcification, a marker of coronary atherosclerosis burden. Hypercholesterolemia-prone mice that were engrafted with bone marrow obtained from homozygous or heterozygous Tet2 knockout mice had larger atherosclerotic lesions in the aortic root and aorta than did mice that had received control bone marrow. Analyses of macrophages from Tet2 knockout mice showed elevated expression of several chemokine and cytokine genes that contribute to atherosclerosis.

    Conclusions: The presence of CHIP in peripheral-blood cells was associated with nearly a doubling in the risk of coronary heart disease in humans and with accelerated atherosclerosis in mice. (Funded by the National Institutes of Health and others.)

    Funded by: British Heart Foundation: RG/08/014/24067; FIC NIH HHS: RC1 TW008485; Medical Research Council: G0800270, MR/L003120/1; NHGRI NIH HHS: U54 HG003067; NHLBI NIH HHS: R01 HL080472, R01 HL082945, RC2 HL101834, T32 HL116324; Wellcome Trust

    The New England journal of medicine 2017;377;2;111-121

  • Tracking the Evolution of Non-Small-Cell Lung Cancer.

    Jamal-Hanjani M, Wilson GA, McGranahan N, Birkbak NJ, Watkins TBK, Veeriah S, Shafi S, Johnson DH, Mitter R, Rosenthal R, Salm M, Horswell S, Escudero M, Matthews N, Rowan A, Chambers T, Moore DA, Turajlic S, Xu H, Lee SM, Forster MD, Ahmad T, Hiley CT, Abbosh C, Falzon M, Borg E, Marafioti T, Lawrence D, Hayward M, Kolvekar S, Panagiotopoulos N, Janes SM, Thakrar R, Ahmed A, Blackhall F, Summers Y, Shah R, Joseph L, Quinn AM, Crosbie PA, Naidu B, Middleton G, Langman G, Trotter S, Nicolson M, Remmen H, Kerr K, Chetty M, Gomersall L, Fennell DA, Nakas A, Rathinam S, Anand G, Khan S, Russell P, Ezhil V, Ismail B, Irvin-Sellers M, Prakash V, Lester JF, Kornaszewska M, Attanoos R, Adams H, Davies H, Dentro S, Taniere P, O'Sullivan B, Lowe HL, Hartley JA, Iles N, Bell H, Ngai Y, Shaw JA, Herrero J, Szallasi Z, Schwarz RF, Stewart A, Quezada SA, Le Quesne J, Van Loo P, Dive C, Hackshaw A, Swanton C and TRACERx Consortium

    From the Cancer Research UK Lung Cancer Centre of Excellence (M.J.-H., G.A.W., N. McGranahan, N.J.B., S.V., S.S., D.H.J., R.R., S.-M.L., M.D.F., C.A., S.M.J., C.D., C.S.), London and Manchester, Good Clinical Laboratory Practice Facility, University College London (UCL) Experimental Cancer Medicine Centre (H.L.L., J.A.H.), Bill Lyons Informatics Centre (J.H.), and Cancer Immunology Unit (S.A.Q.), UCL Cancer Institute, the Translational Cancer Therapeutics Laboratory (G.A.W., N. McGranahan, N.J.B., T.B.K.W., A.R., T.C., S. Turajlic, H.X., C.T.H., C.S.), Department of Bioinformatics and Biostatistics (R.M., M.S., S.H., M.E., A.S.), Advanced Sequencing Facility (N. Matthews), and Cancer Genomics Laboratory (S.D., P.V.L.), Francis Crick Institute, the Renal and Skin Units, Royal Marsden Hospital (S. Turajlic), the Departments of Medical Oncology (M.J.-H., S.-M.L., M.D.F., T.A., C.A., C.S.), Pathology (M.F., E.B., T.M.), Cardiothoracic Surgery (D.L., M.H., S. Kolvekar, N.P.), Respiratory Medicine (S.M.J., R.T.), and Radiology (A.A.), UCL Hospitals, Lungs for Living, UCL Respiratory, UCL (S.M.J.), the Department of Radiotherapy, North Middlesex University Hospital (G.A.), the Department of Respiratory Medicine, Royal Free Hospital (S. Khan), and UCL Cancer Research UK and Cancer Trials Centre (N.I., H.B., Y.N., A.H.), London, Cancer Studies, University of Leicester (D.A.M., D.A.F., J.A.S., J.L.Q.), the Department of Thoracic Surgery, Glenfield Hospital (A.N., S.R.), and the Medical Research Council Toxicology Unit (J.L.Q.), Leicester, the Institute of Cancer Studies, University of Manchester (F.B.), the Christie Hospital (F.B., Y.S.), the Departments of Cardiothoracic Surgery (R.S.) and Pathology (L.J., A.M.Q.) and the North West Lung Centre (P.A.C.), University Hospital of South Manchester, and Cancer Research UK Manchester Institute (C.D.), Manchester, the Departments of Thoracic Surgery (B.N.) and Cellular Pathology (G.L., S. Trotter), Birmingham Heartlands Hospital, Molecular Pathology Diagnostic Services, Queen Elizabeth Hospital (P.T., B.O.), and Institute of Immunology and Immunotherapy, University of Birmingham (G.M.), Birmingham, the Departments of Medical Oncology (M.N.), Cardiothoracic Surgery (H.R.), Pathology (K.K.), Respiratory Medicine (M.C.), and Radiology (L.G.), Aberdeen University Medical School and Aberdeen Royal Infirmary, Aberdeen, the Department of Respiratory Medicine, Barnet and Chase Farm Hospitals, Barnet (S. Khan), the Department of Respiratory Medicine, Princess Alexandra Hospital, Harlow (P.R.), the Department of Clinical Oncology, St. Luke's Cancer Centre, Guildford (V.E.), the Departments of Pathology (B.I.), Respiratory Medicine (M.I.-S.), and Radiology (V.P.), Ashford and St. Peter's Hospitals, Surrey, the Department of Clinical Oncology, Velindre Hospital (J.F.L.), the Departments of Radiology (H.A.) and Respiratory Medicine (H.D.), University Hospital Llandough, the Departments of Pathology (R.A.) and Cardiothoracic Surgery (M.K.), University Hospital of Wales, and Cardiff University (R.A.), Cardiff, and Wellcome Trust Sanger Institute, Hinxton, and Big Data Institute, University of Oxford, Oxford (S.D.) - all in the United Kingdom; the Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby (Z.S.); the Computational Health Informatics Program, Boston Children's Hospital and Harvard Medical School, Boston (Z.S.); MTA-SE-NAP, Brain Metastasis Research Group, 2nd Department of Pathology, Semmelweis University, Budapest, Hungary (Z.S.); Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine, Berlin (R.F.S.); and the Department of Human Genetics, University of Leuven, Leuven, Belgium (P.V.L.).

    Background: Among patients with non-small-cell lung cancer (NSCLC), data on intratumor heterogeneity and cancer genome evolution have been limited to small retrospective cohorts. We wanted to prospectively investigate intratumor heterogeneity in relation to clinical outcome and to determine the clonal nature of driver events and evolutionary processes in early-stage NSCLC.

    Methods: In this prospective cohort study, we performed multiregion whole-exome sequencing on 100 early-stage NSCLC tumors that had been resected before systemic therapy. We sequenced and analyzed 327 tumor regions to define evolutionary histories, obtain a census of clonal and subclonal events, and assess the relationship between intratumor heterogeneity and recurrence-free survival.

    Results: We observed widespread intratumor heterogeneity for both somatic copy-number alterations and mutations. Driver mutations in EGFR, MET, BRAF, and TP53 were almost always clonal. However, heterogeneous driver alterations that occurred later in evolution were found in more than 75% of the tumors and were common in PIK3CA and NF1 and in genes that are involved in chromatin modification and DNA damage response and repair. Genome doubling and ongoing dynamic chromosomal instability were associated with intratumor heterogeneity and resulted in parallel evolution of driver somatic copy-number alterations, including amplifications in CDK4, FOXA1, and BCL11A. Elevated copy-number heterogeneity was associated with an increased risk of recurrence or death (hazard ratio, 4.9; P=4.4×10⁻⁴), which remained significant in multivariate analysis.

    Conclusions: Intratumor heterogeneity mediated through chromosome instability was associated with an increased risk of recurrence or death, a finding that supports the potential value of chromosome instability as a prognostic predictor. (Funded by Cancer Research UK and others; TRACERx number, NCT01888601.)

    Funded by: Medical Research Council: G108/596, MC_UP_1203/1

    The New England journal of medicine 2017;376;22;2109-2121

  • Evolution of mobile genetic element composition in an epidemic methicillin-resistant Staphylococcus aureus: temporal changes correlated with frequent loss and gain events.

    Jamrozy D, Coll F, Mather AE, Harris SR, Harrison EM, MacGowan A, Karas A, Elston T, Estée Török M, Parkhill J and Peacock SJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: Horizontal transfer of mobile genetic elements (MGEs) that carry virulence and antimicrobial resistance genes mediates the evolution of methicillin-resistant Staphylococcus aureus (MRSA) and the emergence of new MRSA clones. Most MRSA lineages show an association with specific MGEs, and the evolution of MGE composition following clonal expansion has not been widely studied.

    Results: We investigated the genomes of 1193 S. aureus bloodstream isolates, 1169 of which were MRSA, collected in the UK and the Republic of Ireland between 2001 and 2010. The majority of isolates belonged to clonal complex (CC)22 (n = 923), which contained diverse MGEs including elements that were found in other MRSA lineages. Several MGEs showed variable distribution across the CC22 phylogeny, including two antimicrobial resistance plasmids (pWBG751-like and SAP078A-like, carrying erythromycin and heavy metal resistance genes, respectively), a pathogenicity island carrying the enterotoxin C gene and two phage types Sa1int and Sa6int. Multiple gains and losses of these five MGEs were identified in the CC22 phylogeny using ancestral state reconstruction. Analysis of the temporal distribution of the five MGEs between 2001 and 2010 revealed an unexpected reduction in prevalence of the two plasmids and the pathogenicity island, and an increase in the two phage types. This occurred across the lineage and was not correlated with changes in the relative prevalence of CC22, or of any sub-lineages within it.

    Conclusions: Ancestral state reconstruction coupled with temporal trend analysis demonstrated that epidemic MRSA CC22 has an evolving MGE composition, indicating that this important MRSA lineage has continued to adapt to changing selective pressure since its emergence.

    Funded by: Medical Research Council: G1000803, MR/N029399/1; Wellcome Trust

    BMC genomics 2017;18;1;684

  • The secondary resistome of multidrug-resistant Klebsiella pneumoniae.

    Jana B, Cain AK, Doerrler WT, Boinett CJ, Fookes MC, Parkhill J and Guardabassi L

    Department of Veterinary Disease Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark.

    Klebsiella pneumoniae causes severe lung and bloodstream infections that are difficult to treat due to multidrug resistance. We hypothesized that antimicrobial resistance can be reversed by targeting chromosomal non-essential genes that are not responsible for acquired resistance but are essential for resistant bacteria under therapeutic concentrations of antimicrobials. Conditional essentiality of individual genes to antimicrobial resistance was evaluated in an epidemic multidrug-resistant clone of K. pneumoniae (ST258). We constructed a high-density transposon mutant library of >430,000 unique Tn5 insertions and measured mutant depletion upon exposure to three clinically relevant antimicrobials (colistin, imipenem or ciprofloxacin) by Transposon Directed Insertion-site Sequencing (TraDIS). Using this high-throughput approach, we defined three sets of chromosomal non-essential genes essential for growth during exposure to colistin (n = 35), imipenem (n = 1) or ciprofloxacin (n = 1) in addition to known resistance determinants, collectively termed the "secondary resistome". As proof of principle, we demonstrated that inactivation of a non-essential gene not previously linked to colistin resistance (dedA) restored colistin susceptibility by reducing the minimum inhibitory concentration from 8 to 0.5 μg/ml, 4-fold below the susceptibility breakpoint (S ≤ 2 μg/ml). This finding suggests that the secondary resistome is a potential target for developing antimicrobial "helper" drugs that restore the efficacy of existing antimicrobials.

    Funded by: Medical Research Council: G1100100; Wellcome Trust: 098051

    Scientific reports 2017;7;42483

  • Discovery and functional prioritization of Parkinson's disease candidate genes from large-scale whole exome sequencing.

    Jansen IE, Ye H, Heetveld S, Lechler MC, Michels H, Seinstra RI, Lubbe SJ, Drouet V, Lesage S, Majounie E, Gibbs JR, Nalls MA, Ryten M, Botia JA, Vandrovcova J, Simon-Sanchez J, Castillo-Lizardo M, Rizzu P, Blauwendraat C, Chouhan AK, Li Y, Yogi P, Amin N, van Duijn CM, International Parkinson’s Disease Genetics Consortium (IPGDC), Morris HR, Brice A, Singleton AB, David DC, Nollen EA, Jain S, Shulman JM and Heutink P

    German Center for Neurodegenerative Diseases (DZNE), Otfried-Müller-Str. 23, Tübingen, 72076, Germany.

    Background: Whole-exome sequencing (WES) has been successful in identifying genes that cause familial Parkinson's disease (PD). However, until now this approach has not been deployed to study large cohorts of unrelated participants. To discover rare PD susceptibility variants, we performed WES in 1148 unrelated cases and 503 control participants. Candidate genes were subsequently validated for functions relevant to PD based on parallel RNA-interference (RNAi) screens in human cell culture and Drosophila and C. elegans models.

    Results: Assuming autosomal recessive inheritance, we identify 27 genes that have homozygous or compound heterozygous loss-of-function variants in PD cases. Definitive replication and confirmation of these findings were hindered by potential heterogeneity and by the rarity of the implicated alleles. We therefore looked for potential genetic interactions with established PD mechanisms. Following RNAi-mediated knockdown, 15 of the genes modulated mitochondrial dynamics in human neuronal cultures and four candidates enhanced α-synuclein-induced neurodegeneration in Drosophila. Based on complementary analyses in independent human datasets, five functionally validated genes (GPATCH2L, UHRF1BP1L, PTPRH, ARSB, and VPS13C) also showed evidence consistent with genetic replication.

    Conclusions: By integrating human genetic and functional evidence, we identify several PD susceptibility gene candidates for further investigation. Our approach highlights a powerful experimental strategy with broad applicability for future studies of disorders with complex genetic etiologies.

    Funded by: Medical Research Council: G0700943, G0802462, G1100643, MR/K01417X/1; NCI NIH HHS: P30 CA125123, R01 CA141668; NCRR NIH HHS: C06 RR029965; NIA NIH HHS: K08 AG034290, R01 AG033193, R01 AG050631, U01 AG046161, Z01 AG000949, Z01 AG000957; NIEHS NIH HHS: Z01 ES101986; NIGMS NIH HHS: R01 GM084947; NINDS NIH HHS: P50 NS071674, R01 NS037167, R21 NS089854; Parkinson's UK: J-0901, K-1501; Wellcome Trust

    Genome biology 2017;18;1;22

  • Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate.

    Jasinska AJ, Zelaya I, Service SK, Peterson CB, Cantor RM, Choi OW, DeYoung J, Eskin E, Fairbanks LA, Fears S, Furterer AE, Huang YS, Ramensky V, Schmitt CA, Svardal H, Jorgensen MJ, Kaplan JR, Villar D, Aken BL, Flicek P, Nag R, Wong ES, Blangero J, Dyer TD, Bogomolov M, Benjamini Y, Weinstock GM, Dewar K, Sabatti C, Wilson RK, Jentsch JD, Warren W, Coppola G, Woods RP and Freimer NB

    Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA.

    By analyzing multitissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalog of expression quantitative trait loci (eQTLs) in a nonhuman primate model. This catalog contains more genome-wide significant eQTLs per sample than comparable human resources and identifies sex- and age-related expression patterns. Findings include a master regulatory locus that likely has a role in immune function and a locus regulating hippocampal long noncoding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders.

    Funded by: NCI NIH HHS: P30 CA016672; NCRR NIH HHS: P40 RR019963, R01 RR016300; NIA NIH HHS: P01 AG002219, P50 AG005138; NIDCR NIH HHS: UL1 DE019580; NIH HHS: P40 OD010965, R01 OD010980; NIMH NIH HHS: HHSN271201300031C, P50 MH066392, P50 MH084053, R01 MH075916, R01 MH080405, R01 MH085542, R01 MH093725, R01 MH097276, R01 MH101782, R37 MH057881, R37 MH060233, RL1 MH083270, T32 MH073526; NINDS NIH HHS: P30 NS062691, PL1 NS062410; Wellcome Trust: 108749

    Nature genetics 2017;49;12;1714-1721

  • Laser Capture and Deep Sequencing Reveals the Transcriptomic Programmes Regulating the Onset of Pancreas and Liver Differentiation in Human Embryos.

    Jennings RE, Berry AA, Gerrard DT, Wearne SJ, Strutt J, Withey S, Chhatriwala M, Piper Hanley K, Vallier L, Bobola N and Hanley NA

    Division of Diabetes, Endocrinology & Gastroenterology, Faculty of Biology, Medicine & Health, AV Hill Building, University of Manchester, Oxford Road, Manchester M13 9PT, UK; Endocrinology Department, Manchester University NHS Foundation Trust, Grafton Street, Manchester M13 9WU, UK.

    To interrogate the alternative fates of pancreas and liver in the earliest stages of human organogenesis, we developed laser capture, RNA amplification, and computational analysis of deep sequencing. Pancreas-enriched gene expression was less conserved between human and mouse than liver-enriched gene expression. The dorsal pancreatic bud was enriched for components of Notch, Wnt, BMP, and FGF signaling, almost all genes known to cause pancreatic agenesis or hypoplasia, and over 30 unexplored transcription factors. SOX9 and RORA were imputed as key regulators in pancreas compared with EP300, HNF4A, and FOXA family members in liver. Analyses implied that current in vitro human stem cell differentiation follows a dorsal rather than a ventral pancreatic program and pointed to additional factors for hepatic differentiation. In summary, we provide the transcriptional codes regulating the start of human liver and pancreas development to facilitate stem cell research and clinical interpretation without inter-species extrapolation.

    Funded by: Arthritis Research UK; British Heart Foundation; Medical Research Council: G1100420, MC_PC_12009, MR/J003352/1, MR/L009986/1, MR/P023541/1; Wellcome Trust: 088566, 105610/Z/14/Z

    Stem cell reports 2017;9;5;1387-1394

  • Crosstalk between PKA and PKG controls pH-dependent host cell egress of Toxoplasma gondii.

    Jia Y, Marq JB, Bisio H, Jacot D, Mueller C, Yu L, Choudhary J, Brochet M and Soldati-Favre D

    Department of Microbiology and Molecular Medicine, CMU, University of Geneva, Geneva 4, Switzerland.

    Toxoplasma gondii encodes three catalytic subunits of protein kinase A (PKAc1-3) and one regulatory subunit (PKAr) to integrate cAMP-dependent signals. Here, we show that inactive PKAc1 is maintained at the parasite pellicle by interacting with acylated PKAr. Either a conditional knockdown of PKAr or the overexpression of PKAc1 blocks parasite division. Conversely, down-regulation of PKAc1 or stabilisation of a dominant-negative PKAr isoform that does not bind cAMP triggers premature parasite egress from infected cells followed by serial invasion attempts leading to host cell lysis. This untimely egress depends on host cell acidification. A phosphoproteome analysis suggested an interplay between cAMP and cGMP signalling, as PKAc1 inactivation changes the phosphorylation profile of a putative cGMP-phosphodiesterase. Concordantly, inhibition of the cGMP-dependent protein kinase G (PKG) blocks egress induced by PKAc1 inactivation or environmental acidification, while a cGMP-phosphodiesterase inhibitor circumvents egress repression by PKAc1 or pH neutralisation. This indicates that pH and PKAc1 act as balancing regulators of cGMP metabolism to control egress. These results reveal a crosstalk between PKA and PKG pathways to govern egress in T. gondii.

    The EMBO journal 2017;36;21;3250-3267

  • Defined, serum/feeder-free conditions for expansion and drug screening of primary B-acute lymphoblastic leukemia.

    Jiang Z, Wu D, Ye W, Weng J, Lai P, Shi P, Guo X, Huang G, Deng Q, Tang Y, Zhao H, Cui S, Lin S, Wang S, Li B, Wu Q, Li Y, Liu P, Pei D, Du X, Yao Y and Li P

    Key Laboratory of Regenerative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.

    Functional screening for compounds represents a major hurdle in the development of rational therapeutics for B-acute lymphoblastic leukemia (B-ALL). In addition, using cell lines as valid models for evaluating responses to novel drug therapies raises serious concerns, as cell lines are prone to genotypic/phenotypic drift and loss of heterogeneity in vitro. Here, we report that OP9 cells, but not OP9-derived adipocytes (OP9TA), support the growth of primary B-ALL cells in vitro. To identify the factors from OP9 cells that support the growth of primary B-ALL cells, we performed RNA-Seq to analyze the gene expression profiles of OP9 and OP9TA cells. On this basis, we developed a defined, serum/feeder-free condition (FI76V) that can support the expansion of a range of clinically distinct primary B-ALL cells that still maintain their leukemia-initiating ability. We demonstrated that our B-ALL culture conditions are suitable for high-throughput drug screening. Upon screening 378 kinase inhibitors, we identified a cluster of 17 kinase inhibitors that can efficiently kill B-ALL cells in vitro. Importantly, we demonstrated the synergistic cytotoxicity of dinaciclib/BTG226 to B-ALL cells. Taken together, we developed a defined condition for the ex vivo expansion of primary B-ALL cells that is suitable for high-throughput screening of novel compounds.

    Oncotarget 2017;8;63;106382-106392

  • Human Y-chromosome variation in the genome-sequencing era.

    Jobling MA and Tyler-Smith C

    Department of Genetics & Genome Biology, University of Leicester, University Road, Leicester LE1 7RH, UK.

    The properties of the human Y chromosome - namely, male specificity, haploidy and escape from crossing over - make it an unusual component of the genome, and have led to its genetic variation becoming a key part of studies of human evolution, population history, genealogy, forensics and male medical genetics. Next-generation sequencing (NGS) technologies have driven recent progress in these areas. In particular, NGS has yielded direct estimates of mutation rates, and an unbiased and calibrated molecular phylogeny that has unprecedented detail. Moreover, the availability of direct-to-consumer NGS services is fuelling a rise of 'citizen scientists', whose interest in resequencing their own Y chromosomes is generating a wealth of new data.

    Funded by: Wellcome Trust: 098051

    Nature reviews. Genetics 2017;18;8;485-497

  • The Type III Secretion System Effector SptP of Salmonella enterica Serovar Typhi.

    Johnson R, Byrne A, Berger CN, Klemm E, Crepin VF, Dougan G and Frankel G

    MRC Centre for Molecular Bacteriology and Infection, Department of Life Sciences, Imperial College London, London, United Kingdom.

    Strains of the various Salmonella enterica serovars cause gastroenteritis or typhoid fever in humans, with virulence depending on the action of two type III secretion systems (Salmonella pathogenicity island 1 [SPI-1] and SPI-2). SptP is a Salmonella SPI-1 effector, involved in mediating recovery of the host cytoskeleton postinfection. SptP requires a chaperone, SicP, for stability and secretion. SptP has 94% identity between S. enterica serovar Typhimurium and S. Typhi; direct comparison of the protein sequences revealed that S. Typhi SptP has numerous amino acid changes within its chaperone-binding domain. Subsequent comparison of ΔsptP S. Typhi and S. Typhimurium strains demonstrated that, unlike SptP in S. Typhimurium, SptP in S. Typhi was not involved in invasion or cytoskeletal recovery postinfection. Investigation of whether the observed amino acid changes within SptP of S. Typhi affected its function revealed that S. Typhi SptP was unable to complement S. Typhimurium ΔsptP due to an absence of secretion. We further demonstrated that while S. Typhimurium SptP is stable intracellularly within S. Typhi, S. Typhi SptP is unstable, although stability could be recovered following replacement of the chaperone-binding domain with that of S. Typhimurium. Direct assessment of the strength of the interaction between SptP and SicP of both serovars via bacterial two-hybrid analysis demonstrated that S. Typhi SptP has a significantly weaker interaction with SicP than the equivalent proteins in S. Typhimurium. Taken together, our results suggest that changes within the chaperone-binding domain of SptP in S. Typhi hinder binding to its chaperone, resulting in instability, preventing translocation, and therefore restricting the intracellular activity of this effector.

    Importance: Studies investigating Salmonella pathogenesis typically rely on Salmonella Typhimurium, even though Salmonella Typhi causes the more severe disease in humans. As such, an understanding of S. Typhi pathogenesis is lacking. Differences within the type III secretion system effector SptP between typhoidal and nontyphoidal serovars led us to characterize this effector within S. Typhi. Our results suggest that SptP is not translocated from typhoidal serovars, even though the loss of sptP results in virulence defects in S. Typhimurium. Although SptP is just one effector, our results exemplify that the behavior of these serovars is significantly different and that genes identified as important for S. Typhimurium virulence may not translate to S. Typhi.

    Funded by: Medical Research Council: MR/J006874/1, MR/K019007/1

    Journal of bacteriology 2017;199;4

  • Meta-Analysis of Genome-Wide Association Studies for Abdominal Aortic Aneurysm Identifies Four New Disease-Specific Risk Loci.

    Jones GT, Tromp G, Kuivaniemi H, Gretarsdottir S, Baas AF, Giusti B, Strauss E, Van't Hof FN, Webb TR, Erdman R, Ritchie MD, Elmore JR, Verma A, Pendergrass S, Kullo IJ, Ye Z, Peissig PL, Gottesman O, Verma SS, Malinowski J, Rasmussen-Torvik LJ, Borthwick KM, Smelser DT, Crosslin DR, de Andrade M, Ryer EJ, McCarty CA, Böttinger EP, Pacheco JA, Crawford DC, Carrell DS, Gerhard GS, Franklin DP, Carey DJ, Phillips VL, Williams MJ, Wei W, Blair R, Hill AA, Vasudevan TM, Lewis DR, Thomson IA, Krysa J, Hill GB, Roake J, Merriman TR, Oszkinis G, Galora S, Saracini C, Abbate R, Pulli R, Pratesi C, Saratzis A, Verissimo AR, Bumpstead S, Badger SA, Clough RE, Cockerill G, Hafez H, Scott DJ, Futers TS, Romaine SP, Bridge K, Griffin KJ, Bailey MA, Smith A, Thompson MM, van Bockxmeer FM, Matthiasson SE, Thorleifsson G, Thorsteinsdottir U, Blankensteijn JD, Teijink JA, Wijmenga C, de Graaf J, Kiemeney LA, Lindholt JS, Hughes A, Bradley DT, Stirrups K, Golledge J, Norman PE, Powell JT, Humphries SE, Hamby SE, Goodall AH, Nelson CP, Sakalihasan N, Courtois A, Ferrell RE, Eriksson P, Folkersen L, Franco-Cereceda A, Eicher JD, Johnson AD, Betsholtz C, Ruusalepp A, Franzén O, Schadt EE, Björkegren JL, Lipovich L, Drolet AM, Verhoeven EL, Zeebregts CJ, Geelkerken RH, van Sambeek MR, van Sterkenburg SM, de Vries JP, Stefansson K, Thompson JR, de Bakker PI, Deloukas P, Sayers RD, Harrison SC, van Rij AM, Samani NJ and Bown MJ

    For the author affiliations, please see the Appendix.

    Rationale: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA.

    Objective: To identify additional AAA risk loci using data from all available genome-wide association studies.

    Methods and results: Through a meta-analysis of 6 genome-wide association study data sets and a validation study totaling 10 204 cases and 107 766 controls, we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches, we observed no new associations between the lead AAA single nucleotide polymorphisms and coronary artery disease, blood pressure, lipids, or diabetes mellitus. Network analyses identified ERG, IL6R, and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9.

    Conclusions: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits, suggesting that traditional cardiovascular risk factor management may have only limited value in preventing the progression of aneurysmal disease.

    Funded by: British Heart Foundation: FS/11/16/28696, RG/14/5/30893; NCATS NIH HHS: UL1 TR001422; NHGRI NIH HHS: U01 HG006382, U01 HG008657; NHLBI NIH HHS: R01 HL125863; Wellcome Trust: 084695

    Circulation research 2017;120;2;341-353

  • Somatic mutations reveal asymmetric cellular dynamics in the early human embryo.

    Ju YS, Martincorena I, Gerstung M, Petljak M, Alexandrov LB, Rahbari R, Wedge DC, Davies HR, Ramakrishna M, Fullam A, Martin S, Alder C, Patel N, Gamble S, O'Meara S, Giri DD, Sauer T, Pinder SE, Purdie CA, Borg Å, Stunnenberg H, van de Vijver M, Tan BK, Caldas C, Tutt A, Ueno NT, van 't Veer LJ, Martens JW, Sotiriou C, Knappskog S, Span PN, Lakhani SR, Eyfjörd JE, Børresen-Dale AL, Richardson A, Thompson AM, Viari A, Hurles ME, Nik-Zainal S, Campbell PJ and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Somatic cells acquire mutations throughout the course of an individual's life. Mutations occurring early in embryogenesis are often present in a substantial proportion of, but not all, cells in postnatal humans and thus have particular characteristics and effects. Depending on their location in the genome and the proportion of cells they are present in, these mosaic mutations can cause a wide range of genetic disease syndromes and predispose carriers to cancer. They have a high chance of being transmitted to offspring as de novo germline mutations and, in principle, can provide insights into early human embryonic cell lineages and their contributions to adult tissues. Although it is known that gross chromosomal abnormalities are remarkably common in early human embryos, our understanding of early embryonic somatic mutations is very limited. Here we use whole-genome sequences of normal blood from 241 adults to identify 163 early embryonic mutations. We estimate that approximately three base substitution mutations occur per cell per cell-doubling event in early human embryogenesis and these are mainly attributable to two known mutational signatures. We used the mutations to reconstruct developmental lineages of adult cells and demonstrate that the two daughter cells of many early embryonic cell-doubling events contribute asymmetrically to adult blood at an approximately 2:1 ratio. This study therefore provides insights into the mutation rates, mutational processes and developmental outcomes of cell dynamics that operate during early human embryogenesis.

    Funded by: Wellcome Trust: 077012/Z/05/Z

    Nature 2017;543;7647;714-718

  • Efficient gene targeting in mouse zygotes mediated by CRISPR/Cas9-protein.

    Jung CJ, Zhang J, Trenchard E, Lloyd KC, West DB, Rosen B and de Jong PJ

    University of California, San Francisco Benioff Children's Hospital Oakland Research Institute, Oakland, CA, 94609, USA.

    The CRISPR/Cas9 system has rapidly advanced targeted genome editing technologies. However, the efficiency of targeting constructs in mouse zygotes via homology-directed repair (HDR) remains low. Here, we systematically explored optimal parameters for targeting constructs in mouse zygotes via HDR using mouse embryonic stem cells as a model system. We characterized several parameters, including single guide RNA cleavage activity and the length and symmetry of homology arms in the construct, and we compared the targeting efficiency between Cas9, Cas9 nickase, and dCas9-FokI. We then applied the optimized conditions to zygotes, delivering Cas9 as either mRNA or protein. We found that Cas9 nucleo-protein complex promotes highly efficient, multiplexed targeting of circular constructs containing reporter genes and floxed exons. This approach allows for a one-step zygote injection procedure targeting multiple genes to generate conditional alleles via homologous recombination, and simultaneous knockout of corresponding genes in non-targeted alleles via non-homologous end joining.

    Funded by: NCI NIH HHS: P30 CA093373; NHGRI NIH HHS: U54 HG006364; NIDDK NIH HHS: U24 DK092993; NIH HHS: U42 OD011175, U42 OD012210, UM1 OD023221

    Transgenic research 2017;26;2;263-277

  • Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits.

    Justice AE, Winkler TW, Feitosa MF, Graff M, Fisher VA, Young K, Barata L, Deng X, Czajkowski J, Hadley D, Ngwa JS, Ahluwalia TS, Chu AY, Heard-Costa NL, Lim E, Perez J, Eicher JD, Kutalik Z, Xue L, Mahajan A, Renström F, Wu J, Qi Q, Ahmad S, Alfred T, Amin N, Bielak LF, Bonnefond A, Bragg J, Cadby G, Chittani M, Coggeshall S, Corre T, Direk N, Eriksson J, Fischer K, Gorski M, Neergaard Harder M, Horikoshi M, Huang T, Huffman JE, Jackson AU, Justesen JM, Kanoni S, Kinnunen L, Kleber ME, Komulainen P, Kumari M, Lim U, Luan J, Lyytikäinen LP, Mangino M, Manichaikul A, Marten J, Middelberg RPS, Müller-Nurasyid M, Navarro P, Pérusse L, Pervjakova N, Sarti C, Smith AV, Smith JA, Stančáková A, Strawbridge RJ, Stringham HM, Sung YJ, Tanaka T, Teumer A, Trompet S, van der Laan SW, van der Most PJ, Van Vliet-Ostaptchouk JV, Vedantam SL, Verweij N, Vink JM, Vitart V, Wu Y, Yengo L, Zhang W, Hua Zhao J, Zimmermann ME, Zubair N, Abecasis GR, Adair LS, Afaq S, Afzal U, Bakker SJL, Bartz TM, Beilby J, Bergman RN, Bergmann S, Biffar R, Blangero J, Boerwinkle E, Bonnycastle LL, Bottinger E, Braga D, Buckley BM, Buyske S, Campbell H, Chambers JC, Collins FS, Curran JE, de Borst GJ, de Craen AJM, de Geus EJC, Dedoussis G, Delgado GE, den Ruijter HM, Eiriksdottir G, Eriksson AL, Esko T, Faul JD, Ford I, Forrester T, Gertow K, Gigante B, Glorioso N, Gong J, Grallert H, Grammer TB, Grarup N, Haitjema S, Hallmans G, Hamsten A, Hansen T, Harris TB, Hartman CA, Hassinen M, Hastie ND, Heath AC, Hernandez D, Hindorff L, Hocking LJ, Hollensted M, Holmen OL, Homuth G, Jan Hottenga J, Huang J, Hung J, Hutri-Kähönen N, Ingelsson E, James AL, Jansson JO, Jarvelin MR, Jhun MA, Jørgensen ME, Juonala M, Kähönen M, Karlsson M, Koistinen HA, Kolcic I, Kolovou G, Kooperberg C, Krämer BK, Kuusisto J, Kvaløy K, Lakka TA, Langenberg C, Launer LJ, Leander K, Lee NR, Lind L, Lindgren CM, Linneberg A, Lobbens S, Loh M, Lorentzon M, Luben R, Lubke G, Ludolph-Donislawski A, Lupoli S, Madden PAF, Männikkö R, Marques-Vidal P, Martin NG, McKenzie CA, McKnight B, Mellström D, Menni C, Montgomery GW, Musk AB, Narisu N, Nauck M, Nolte IM, Oldehinkel AJ, Olden M, Ong KK, Padmanabhan S, Peyser PA, Pisinger C, Porteous DJ, Raitakari OT, Rankinen T, Rao DC, Rasmussen-Torvik LJ, Rawal R, Rice T, Ridker PM, Rose LM, Bien SA, Rudan I, Sanna S, Sarzynski MA, Sattar N, Savonen K, Schlessinger D, Scholtens S, Schurmann C, Scott RA, Sennblad B, Siemelink MA, Silbernagel G, Slagboom PE, Snieder H, Staessen JA, Stott DJ, Swertz MA, Swift AJ, Taylor KD, Tayo BO, Thorand B, Thuillier D, Tuomilehto J, Uitterlinden AG, Vandenput L, Vohl MC, Völzke H, Vonk JM, Waeber G, Waldenberger M, Westendorp RGJ, Wild S, Willemsen G, Wolffenbuttel BHR, Wong A, Wright AF, Zhao W, Zillikens MC, Baldassarre D, Balkau B, Bandinelli S, Böger CA, Boomsma DI, Bouchard C, Bruinenberg M, Chasman DI, Chen YD, Chines PS, Cooper RS, Cucca F, Cusi D, Faire U, Ferrucci L, Franks PW, Froguel P, Gordon-Larsen P, Grabe HJ, Gudnason V, Haiman CA, Hayward C, Hveem K, Johnson AD, Wouter Jukema J, Kardia SLR, Kivimaki M, Kooner JS, Kuh D, Laakso M, Lehtimäki T, Marchand LL, März W, McCarthy MI, Metspalu A, Morris AP, Ohlsson C, Palmer LJ, Pasterkamp G, Pedersen O, Peters A, Peters U, Polasek O, Psaty BM, Qi L, Rauramaa R, Smith BH, Sørensen TIA, Strauch K, Tiemeier H, Tremoli E, van der Harst P, Vestergaard H, Vollenweider P, Wareham NJ, Weir DR, Whitfield JB, Wilson JF, Tyrrell J, Frayling TM, Barroso I, Boehnke M, Deloukas P, Fox CS, Hirschhorn JN, Hunter DJ, Spector TD, Strachan DP, van Duijn CM, Heid IM, Mohlke KL, Marchini J, Loos RJF, Kilpeläinen TO, Liu CT, Borecki IB, North KE and Cupples LA

    Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina 27599, USA.

    Few genome-wide association studies (GWAS) account for environmental exposures, like smoking, potentially impacting the overall trait variance when investigating the genetic contribution to obesity-related traits. Here, we use GWAS data from 51,080 current smokers and 190,178 nonsmokers (87% European descent) to identify loci influencing BMI and central adiposity, measured as waist circumference and waist-to-hip ratio both adjusted for BMI. We identify 23 novel genetic loci, and 9 loci with convincing evidence of gene-smoking interaction (GxSMK) on obesity-related traits. We show consistent direction of effect for all identified loci and significance for 18 novel and for 5 interaction loci in an independent study sample. These loci highlight novel biological functions, including response to oxidative stress, addictive behaviour, and regulatory functions emphasizing the importance of accounting for environment in genetic analyses. Our results suggest that tobacco smoking may alter the genetic susceptibility to overall adiposity and body fat distribution.

    Funded by: British Heart Foundation: RG/10/12/28456, RG/14/5/30893; FIC NIH HHS: R01 TW005596, R01 TW008288; Medical Research Council: G1001799, MC_PC_U127561128, MC_UU_12015/1, MC_UU_12015/2, MC_UU_12019/1, MR/N01104X/1; NCATS NIH HHS: KL2 TR001109, UL1 TR000040, UL1 TR000124, UL1 TR001079, UL1 TR001881; NCI NIH HHS: P01 CA033619, R01 CA047988, R37 CA054281, U01 CA098758, U01 CA136792, UM1 CA182913; NCRR NIH HHS: P20 RR020649, UL1 RR025005; NEI NIH HHS: T32 EY022303; NHGRI NIH HHS: N01HG65403, R01 HG002651, U01 HG004402, U01 HG004790, U01 HG004802, U01 HG007376, Z01 HG000024; NHLBI NIH HHS: HHSN268200800007C, HHSN268201100001I, HHSN268201100002I, HHSN268201100004I, HHSN268201100046C, HHSN268201200036C, HHSN268201500001C, HHSN268201500001I, K99 HL130580, N01HC25195, N01HC55015, N01HC55016, N01HC55019, N01HC55020, N01HC55021, N01HC55022, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086, N01HC95159, N01HC95160, N01HC95161, N01HC95162, N01HC95163, N01HC95164, N01HC95165, N01HC95166, N01HC95167, N01HC95168, N01HC95169, R01 HL034594, R01 HL043851, R01 HL045670, R01 HL053353, R01 HL059367, R01 HL071981, R01 HL080467, R01 HL085144, R01 HL086694, R01 HL087641, R01 HL087652, R01 HL087660, R01 HL103612, R01 HL105756, R01 HL117078, R01 HL118305, R01 HL119443, R01 HL120393, R21 HL126024, T32 HL007055, U01 HL054457, U01 HL080295, U01 HL084729, U01 HL130114, U10 HL054457; NIA NIH HHS: HHSN271201100004C, N01AG12100, R01 AG013196, R01 AG023629, R03 AG046389, R37 AG013196, RC2 AG036495, RC4 AG039029, U01 AG009740; NIAAA NIH HHS: K05 AA017688, P50 AA011998, R01 AA007535, R01 AA013320, R01 AA013321, R01 AA013326, R01 AA014041; NICHD NIH HHS: P2C HD050924, R01 HD057194; NIDA NIH HHS: R01 DA012854, R01 DA018673, R56 DA012854; NIDDK NIH HHS: P30 DK020572, P30 DK046200, P30 DK056350, P30 DK063491, R01 DK062370, R01 DK072193, R01 DK075787, R01 DK078150, R01 DK078616, R01 DK089256, R01 DK091718, R01 DK093757, R01 DK100383, U01 DK062418, U01 DK078616; NIEHS NIH HHS: P30 ES010126; NIGMS NIH HHS: T32 GM074905; NIH HHS: S10 OD018522, S10 OD020069; NIMH NIH HHS: R01 MH066206, R01 MH081802, RC2 MH089951, U24 MH068457; NIMHD NIH HHS: R01 MD009164; NLM NIH HHS: R01 LM010098; WHI NIH HHS: HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, N01WH22110; Wellcome Trust

    Nature communications 2017;8;14977

  • The Human Phenotype Ontology in 2017.

    Köhler S, Vasilevsky NA, Engelstad M, Foster E, McMurry J, Aymé S, Baynam G, Bello SM, Boerkoel CF, Boycott KM, Brudno M, Buske OJ, Chinnery PF, Cipriani V, Connell LE, Dawkins HJ, DeMare LE, Devereau AD, de Vries BB, Firth HV, Freson K, Greene D, Hamosh A, Helbig I, Hum C, Jähn JA, James R, Krause R, F Laulederkind SJ, Lochmüller H, Lyon GJ, Ogishima S, Olry A, Ouwehand WH, Pontikos N, Rath A, Schaefer F, Scott RH, Segal M, Sergouniotis PI, Sever R, Smith CL, Straub V, Thompson R, Turner C, Turro E, Veltman MW, Vulliamy T, Yu J, von Ziegenweidt J, Zankl A, Züchner S, Zemojtel T, Jacobsen JO, Groza T, Smedley D, Mungall CJ, Haendel M and Robinson PN

    Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany

    Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.

    Funded by: British Heart Foundation: RG/09/012/28096; Department of Health: RP-PG-0310-1002; Medical Research Council: G1002274, MC_UP_1501/2; NHGRI NIH HHS: U01 HG009453, U41 HG000330; NIH HHS: R24 OD011883

    Nucleic acids research 2017;45;D1;D865-D876

  • De Novo Mutations in Protein Kinase Genes CAMK2A and CAMK2B Cause Intellectual Disability.

    Küry S, van Woerden GM, Besnard T, Proietti Onori M, Latypova X, Towne MC, Cho MT, Prescott TE, Ploeg MA, Sanders S, Stessman HAF, Pujol A, Distel B, Robak LA, Bernstein JA, Denommé-Pichon AS, Lesca G, Sellars EA, Berg J, Carré W, Busk ØL, van Bon BWM, Waugh JL, Deardorff M, Hoganson GE, Bosanko KB, Johnson DS, Dabir T, Holla ØL, Sarkar A, Tveten K, de Bellescize J, Braathen GJ, Terhal PA, Grange DK, van Haeringen A, Lam C, Mirzaa G, Burton J, Bhoj EJ, Douglas J, Santani AB, Nesbitt AI, Helbig KL, Andrews MV, Begtrup A, Tang S, van Gassen KLI, Juusola J, Foss K, Enns GM, Moog U, Hinderhofer K, Paramasivam N, Lincoln S, Kusako BH, Lindenbaum P, Charpentier E, Nowak CB, Cherot E, Simonet T, Ruivenkamp CAL, Hahn S, Brownstein CA, Xia F, Schmitt S, Deb W, Bonneau D, Nizon M, Quinquis D, Chelly J, Rudolf G, Sanlaville D, Parent P, Gilbert-Dussardier B, Toutain A, Sutton VR, Thies J, Peart-Vissers LELM, Boisseau P, Vincent M, Grabrucker AM, Dubourg C, Undiagnosed Diseases Network, Tan WH, Verbeek NE, Granzow M, Santen GWE, Shendure J, Isidor B, Pasquier L, Redon R, Yang Y, State MW, Kleefstra T, Cogné B, GEM HUGO, Deciphering Developmental Disorders Study, Petrovski S, Retterer K, Eichler EE, Rosenfeld JA, Agrawal PB, Bézieau S, Odent S, Elgersma Y and Mercier S

    CHU Nantes, Service de Génétique Médicale, 9 quai Moncousu, 44093 Nantes Cedex 1, France.

    Calcium/calmodulin-dependent protein kinase II (CAMK2) is one of the first proteins shown to be essential for normal learning and synaptic plasticity in mice, but its requirement for human brain development has not yet been established. Through a multi-center collaborative study based on a whole-exome sequencing approach, we identified 19 exceedingly rare de novo CAMK2A or CAMK2B variants in 24 unrelated individuals with intellectual disability. Variants were assessed for their effect on CAMK2 function and on neuronal migration. For both CAMK2A and CAMK2B, we identified mutations that decreased or increased CAMK2 auto-phosphorylation at Thr286/Thr287. We further found that all mutations affecting auto-phosphorylation also affected neuronal migration, highlighting the importance of tightly regulated CAMK2 auto-phosphorylation in neuronal function and neurodevelopment. Our data establish the importance of CAMK2A and CAMK2B and their auto-phosphorylation in human brain function and expand the phenotypic spectrum of the disorders caused by variants in key players of the glutamatergic signaling pathway.

    Funded by: NIAMS NIH HHS: R01 AR068429; NICHD NIH HHS: U19 HD077671; NIMH NIH HHS: R01 MH101221; NINDS NIH HHS: K08 NS092898

    American journal of human genetics 2017;101;5;768-788

  • Genetic Characterization of Vibrio cholerae O1 isolates from outbreaks between 2011 and 2015 in Tanzania.

    Kachwamba Y, Mohammed AA, Lukupulo H, Urio L, Majigo M, Mosha F, Matonya M, Kishimba R, Mghamba J, Lusekelo J, Nyanga S, Almeida M, Li S, Domman D, Massele SY and Stine OC

    Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania.

    Background: Cholera outbreaks have occurred in Tanzania since 1974. To date, the genetic epidemiology of these outbreaks has not been assessed.

    Methods: 96 Vibrio cholerae O1 isolates from five regions were characterized, and their genetic relatedness assessed using multi-locus variable-number tandem-repeat analysis (MLVA) and whole genome sequencing (WGS).

    Results: Of the 48 MLVA genotypes observed, 3 were genetically unrelated to any others, while the remaining 45 genotypes separated into three MLVA clonal complexes (CCs), each comprising genotypes differing by a single allelic change. In Kigoma, two separate outbreaks 4 months apart (January and May 2015) were each caused by strains that were genetically distinct by both MLVA and WGS. Remarkably, one MLVA CC contained isolates from both the May outbreak and the 2011/2012 outbreak in Dar-es-Salaam. However, WGS revealed the isolates from the two outbreaks to be distinct clades. The outbreak that started in August 2015 in Dar-es-Salaam and spread to Morogoro, Singida and Mara consisted of a single MLVA CC and WGS clade. Isolates from within an outbreak were closely related, differing at fewer than 5 nucleotides. All isolates were part of the 3rd wave of the 7th pandemic and were found in four clades related to isolates from Kenya and Asia.

    Conclusions: We conclude that genetically related V. cholerae cluster in outbreaks, and distinct strains circulate simultaneously.

    Funded by: NIAID NIH HHS: R01 AI039129

    BMC infectious diseases 2017;17;1;157

  • Impact of insecticide resistance in Anopheles arabiensis on malaria incidence and prevalence in Sudan and the costs of mitigation.

    Kafy HT, Ismail BA, Mnzava AP, Lines J, Abdin MSE, Eltaher JS, Banaga AO, West P, Bradley J, Cook J, Thomas B, Subramaniam K, Hemingway J, Knox TB, Malik EM, Yukich JO, Donnelly MJ and Kleinschmidt I

    Vector Unit, Ministry of Health, Khartoum, Sudan.

    Insecticide-based interventions have contributed to ∼78% of the reduction in the malaria burden in sub-Saharan Africa since 2000. Insecticide resistance in malaria vectors could presage a catastrophic rebound in disease incidence and mortality. A major impediment to the implementation of insecticide resistance management strategies is that evidence of the impact of resistance on malaria disease burden is limited. A cluster randomized trial was conducted in Sudan with pyrethroid-resistant and carbamate-susceptible malaria vectors. Clusters were randomly allocated to receive either long-lasting insecticidal nets (LLINs) alone or LLINs in combination with indoor residual spraying (IRS) with a pyrethroid (deltamethrin) insecticide in the first year and a carbamate (bendiocarb) insecticide in the two subsequent years. Malaria incidence was monitored for 3 y through active case detection in cohorts of children aged 1 to <10 y. When deltamethrin was used for IRS, incidence rates in the LLIN + IRS arm and the LLIN-only arm were similar, with the IRS providing no additional protection [incidence rate ratio (IRR) = 1.0 (95% confidence interval [CI]: 0.36-3.0; P = 0.96)]. When bendiocarb was used for IRS, there was some evidence of additional protection [interaction IRR = 0.55 (95% CI: 0.40-0.76; P < 0.001)]. In conclusion, pyrethroid resistance may have had an impact on pyrethroid-based IRS. The study was not designed to assess whether resistance had an impact on LLINs. These data alone should not be used as the basis for any policy change in vector control interventions.

    Funded by: Medical Research Council: MR/K012126/1; World Health Organization: 001

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;52;E11267-E11275

  • Tracking the embryonic stem cell transition from ground state pluripotency.

    Kalkan T, Olova N, Roode M, Mulas C, Lee HJ, Nett I, Marks H, Walker R, Stunnenberg HG, Lilley KS, Nichols J, Reik W, Bertone P and Smith A

    Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge CB2 1QR, UK

    Mouse embryonic stem (ES) cells are locked into self-renewal by shielding from inductive cues. Release from this ground state in minimal conditions offers a system for delineating developmental progression from naïve pluripotency. Here, we examine the initial transition process. The ES cell population behaves asynchronously. We therefore exploited a short-half-life Rex1::GFP reporter to isolate cells on either side of exit from naïve status. Extinction of ES cell identity in single cells is acute. It occurs only after near-complete elimination of naïve pluripotency factors, but precedes appearance of lineage specification markers. Cells newly departed from the ES cell state display features of early post-implantation epiblast and are distinct from primed epiblast. They also exhibit a genome-wide increase in DNA methylation, intermediate between early and late epiblast. These findings are consistent with the proposition that naïve cells transition to a distinct formative phase of pluripotency preparatory to lineage priming.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K010867/1, BB/M004023/1; Medical Research Council: G1100526, MC_PC_12009; Wellcome Trust: 095645/Z/11/Z, 091484/Z/10/Z

    Development (Cambridge, England) 2017;144;7;1221-1234

  • Systematic longitudinal survey of invasive Escherichia coli in England demonstrates a stable population structure only transiently disturbed by the emergence of ST131.

    Kallonen T, Brodrick HJ, Harris SR, Corander J, Brown NM, Martin V, Peacock SJ and Parkhill J

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Escherichia coli associated with urinary tract infections and bacteremia has been intensively investigated, including recent work focusing on the virulent, globally disseminated, multidrug-resistant lineage ST131. To contextualize ST131 within the broader E. coli population associated with disease, we used genomics to analyze a systematic 11-yr hospital-based survey of E. coli associated with bacteremia using isolates collected from across England by the British Society for Antimicrobial Chemotherapy and from the Cambridge University Hospitals NHS Foundation Trust. Population dynamics analysis of the most successful lineages identified the emergence of ST131 and ST69 and their establishment as two of the five most common lineages along with ST73, ST95, and ST12. The most frequently identified lineage was ST73. Compared to ST131, ST73 was susceptible to most antibiotics, indicating that multidrug resistance was not the dominant reason for prevalence of E. coli lineages in this population. Temporal phylogenetic analysis of the emergence of ST69 and ST131 identified differences in the dynamics of emergence and showed that expansion of ST131 in this population was not driven by sequential emergence of increasingly resistant subclades. We showed that over time, the E. coli population was only transiently disturbed by the introduction of new lineages before a new equilibrium was rapidly achieved. Together, these findings suggest that the frequency of E. coli lineages in invasive disease is driven by negative frequency-dependent selection occurring outside of the hospital, most probably in the commensal niche, and that drug resistance is not a primary determinant of success in this niche.

    Funded by: Wellcome Trust

    Genome research 2017

  • WD40-repeat 47, a microtubule-associated protein, is essential for brain development and autophagy.

    Kannan M, Bayam E, Wagner C, Rinaldi B, Kretz PF, Tilly P, Roos M, McGillewie L, Bär S, Minocha S, Chevalier C, Po C, Sanger Mouse Genetics Project, Chelly J, Mandel JL, Borgatti R, Piton A, Kinnear C, Loos B, Adams DJ, Hérault Y, Collins SC, Friant S, Godin JD and Yalcin B

    Department of Translational Medicine and Neurogenetics, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67404 Illkirch, France.

    The family of WD40-repeat (WDR) proteins is one of the largest in eukaryotes, but little is known about their function in brain development. Among 26 WDR genes assessed, we found 7 with a major impact on neuronal morphology when inactivated in mice. Remarkably, all seven genes showed corpus callosum defects, including thicker (Atg16l1, Coro1c, Dmxl2, and Herc1), thinner (Kif21b and Wdr89), or absent corpus callosum (Wdr47), revealing a common role for WDR genes in brain connectivity. We focused on the poorly studied WDR47 protein, which shares structural homology with LIS1, the gene whose mutation causes lissencephaly. In a dosage-dependent manner, mice lacking Wdr47 showed lethality, extensive fiber defects, microcephaly, thinner cortices, and sensory motor gating abnormalities. We showed that WDR47 shares functional characteristics with LIS1 and participates in key microtubule-mediated processes, including neural stem cell proliferation, radial migration, and growth cone dynamics. In the absence of WDR47, the exhaustion of late cortical progenitors and the consequent decrease of neurogenesis, together with the impaired survival of late-born neurons, likely lead to the worsening of the microcephaly phenotype postnatally. Interestingly, the WDR47-specific C-terminal to LisH (CTLH) domain was associated with functions in autophagy described in mammals. Silencing WDR47 in hypothalamic GT1-7 neuronal cells and yeast models independently recapitulated these findings, showing conserved mechanisms. Finally, our data identified superior cervical ganglion-10 (SCG10) as an interacting partner of WDR47. Taken together, these results provide a starting point for studying the implications of WDR proteins in neuronal regulation of microtubules and autophagy.

    Proceedings of the National Academy of Sciences of the United States of America 2017;114;44;E9308-E9317

  • TPL-2 restricts Ccl24-dependent immunity to Heligmosomoides polygyrus.

    Kannan Y, Entwistle LJ, Pelly VS, Perez-Lloret J, Walker AW, Ley SC and Wilson MS

    Allergy and Anti-helminth Immunity Laboratory, The Francis Crick Institute, London, United Kingdom.

    TPL-2 (COT, MAP3K8) kinase activates the MEK1/2-ERK1/2 MAPK signaling pathway in innate immune responses following TLR, TNFR1 and IL-1R stimulation. TPL-2 contributes to type-1/Th17-mediated autoimmunity and control of intracellular pathogens. We recently demonstrated that TPL-2 reduces severe airway allergy to house dust mite by negatively regulating type-2 responses. In the present study, we found that TPL-2 deficiency resulted in resistance to Heligmosomoides polygyrus infection, with accelerated worm expulsion, reduced fecal egg burden and reduced worm fitness. Using co-housing experiments, we found that resistance to infection in TPL-2-deficient mice (Map3k8-/-) was independent of microbiota alterations in H. polygyrus-infected WT and Map3k8-/- mice. Additionally, our data demonstrated that immunity to H. polygyrus infection in TPL-2-deficient mice was not due to dysregulated type-2 immune responses. Genome-wide analysis of intestinal tissue from infected TPL-2-deficient mice identified elevated expression of genes involved in chemotaxis and homing of leukocytes and cells, including Ccl24 and alternatively activated genes. Indeed, Map3k8-/- mice had a significant influx of eosinophils, neutrophils, monocytes and Il4GFP+ T cells. Conditional knockout experiments demonstrated that specific deletion of TPL-2 in CD11c+ cells, but not Villin+ epithelial cells, LysM+ myeloid cells or CD4+ T cells, led to accelerated resistance to H. polygyrus. In line with a central role of CD11c+ cells, CD11c+CD11b+ cells isolated from TPL-2-deficient mice had elevated Ccl24. Finally, Ccl24 neutralization in TPL-2-deficient mice significantly decreased the expression of Arg1, Retnla, Chil3 and Ear11, correlating with a loss of resistance to H. polygyrus. These observations suggest that TPL-2-regulated Ccl24 in CD11c+CD11b+ cells prevents accelerated type-2-mediated immunity to H. polygyrus. Collectively, this study identifies a previously unappreciated role for TPL-2 in controlling immune responses to H. polygyrus infection by restricting Ccl24 production.

    Funded by: Medical Research Council: MC_U117584209

    PLoS pathogens 2017;13;7;e1006536

  • Flipping between Polycomb repressed and active transcriptional states introduces noise in gene expression.

    Kar G, Kim JK, Kolodziejczyk AA, Natarajan KN, Torlai Triglia E, Mifsud B, Elderkin S, Marioni JC, Pombo A and Teichmann SA

    European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    Polycomb repressive complexes (PRCs) are important histone modifiers that silence gene expression; yet there exists a subset of PRC-bound genes actively transcribed by RNA polymerase II (RNAPII). It is likely that the role of the Polycomb repressive complex is to dampen expression of these PRC-active genes. However, it is unclear how this flipping between chromatin states alters the kinetics of transcription. Here, we integrate histone modifications and RNAPII states derived from bulk ChIP-seq data with single-cell RNA-sequencing data. We find that PRC-active genes have greater cell-to-cell variation in expression than active genes, and these results are validated by knockout experiments. We also show that PRC-active genes are clustered on chromosomes in both two and three dimensions, and that interactions with active enhancers promote a stabilization of gene expression noise. These findings provide new insights into how chromatin regulation modulates stochastic gene expression and transcriptional bursting, with implications for the regulation of pluripotency and development. Polycomb repressive complexes modify histones, but it is unclear how changes in chromatin states alter the kinetics of transcription. Here, the authors use single-cell RNA-seq and ChIP-seq to find that actively transcribed genes with Polycomb marks have greater cell-to-cell variation in expression.

    Nature communications 2017;8;1;36

  • Improving the Identification of Phenotypic Abnormalities and Sexual Dimorphism in Mice When Studying Rare Event Categorical Characteristics.

    Karp NA, Heller R, Yaacoby S, White JK and Benjamini Y

    Mouse Informatics Group, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, United Kingdom

    Biological research frequently involves the study of phenotyping data. Many of these studies focus on rare event categorical data, and functional genomics studies typically study the presence or absence of an abnormal phenotype. With the growing interest in the role of sex, there is a need to assess the phenotype for sexual dimorphism. The identification of abnormal phenotypes for downstream research is challenged by the small sample size, the rare event nature, and the multiple testing problem, as many variables are monitored simultaneously. Here, we develop a statistical pipeline to assess statistical and biological significance while managing the multiple testing problem. We propose a two-step pipeline that first assesses for a treatment effect (in our case example, genotype) and then tests for an interaction with sex. We compare multiple statistical methods and use simulations to investigate the control of the type-one error rate and power. To maximize the power while addressing the multiple testing issue, we implement filters to remove data sets where the hypotheses to be tested cannot achieve significance. A motivating case study utilizing a large-scale, high-throughput mouse phenotyping data set from the Wellcome Trust Sanger Institute Mouse Genetics Project, where the treatment is a gene ablation, demonstrates the benefits of the new pipeline on the downstream biological calls.

    Funded by: European Research Council: 294519; NHGRI NIH HHS: U54 HG006370; Wellcome Trust: WT098051

    Genetics 2017;205;2;491-501

  • Prevalence of sexual dimorphism in mammalian phenotypic traits.

    Karp NA, Mason J, Beaudet AL, Benjamini Y, Bower L, Braun RE, Brown SDM, Chesler EJ, Dickinson ME, Flenniken AM, Fuchs H, Angelis MH, Gao X, Guo S, Greenaway S, Heller R, Herault Y, Justice MJ, Kurbatova N, Lelliott CJ, Lloyd KCK, Mallon AM, Mank JE, Masuya H, McKerlie C, Meehan TF, Mott RF, Murray SA, Parkinson H, Ramirez-Solis R, Santos L, Seavitt JR, Smedley D, Sorg T, Speak AO, Steel KP, Svenson KL, International Mouse Phenotyping Consortium, Wakana S, West D, Wells S, Westerberg H, Yaacoby S and White JK

    Mouse Informatics Group, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The role of sex in biomedical studies has often been overlooked, despite evidence of sexually dimorphic effects in some biological studies. Here, we used high-throughput phenotype data from 14,250 wildtype and 40,192 mutant mice (representing 2,186 knockout lines), analysed for up to 234 traits, and found that a large proportion of mammalian traits, in both wildtype and mutant mice, are influenced by sex. This result has implications for interpreting disease phenotypes in animal models and humans.

    Funded by: Medical Research Council: G0300212, MC_QA137918, MC_U142684171, MC_U142684172, MC_UP_1502/3, MR/N012119/1; NCI NIH HHS: P30 CA034196; NHGRI NIH HHS: UM1 HG006348; NIH HHS: U42 OD011185, U42 OD012210, UM1 OD023221, UM1 OD023222

    Nature communications 2017;8;15475

  • Insertional mutagenesis identifies drivers of a novel oncogenic pathway in invasive lobular breast carcinoma.

    Kas SM, de Ruiter JR, Schipper K, Annunziato S, Schut E, Klarenbeek S, Drenth AP, van der Burg E, Klijn C, Ten Hoeve JJ, Adams DJ, Koudijs MJ, Wesseling J, Nethe M, Wessels LFA and Jonkers J

    Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam, the Netherlands.

    Invasive lobular carcinoma (ILC) is the second most common breast cancer subtype and accounts for 8-14% of all cases. Although the majority of human ILCs are characterized by the functional loss of E-cadherin (encoded by CDH1), inactivation of Cdh1 does not predispose mice to develop mammary tumors, implying that mutations in additional genes are required for ILC formation in mice. To identify these genes, we performed an insertional mutagenesis screen using the Sleeping Beauty transposon system in mice with mammary-specific inactivation of Cdh1. These mice developed multiple independent mammary tumors of which the majority resembled human ILC in terms of morphology and gene expression. Recurrent and mutually exclusive transposon insertions were identified in Myh9, Ppp1r12a, Ppp1r12b and Trp53bp2, whose products have been implicated in the regulation of the actin cytoskeleton. Notably, MYH9, PPP1R12B and TP53BP2 were also frequently aberrated in human ILC, highlighting these genes as drivers of a novel oncogenic pathway underlying ILC development.

    Funded by: Cancer Research UK: 13031

    Nature genetics 2017;49;8;1219-1230

  • Single-cell epigenomics: Recording the past and predicting the future.

    Kelsey G, Stegle O and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, UK.

    Single-cell multi-omics has recently emerged as a powerful technology by which different layers of genomic output (and hence cell identity and function) can be recorded simultaneously. Integrating various components of the epigenome into multi-omics measurements allows for studying cellular heterogeneity at different time scales and for discovering new layers of molecular connectivity between the genome and its functional output. Measurements that are increasingly available range from those that identify transcription factor occupancy and initiation of transcription to long-lasting and heritable epigenetic marks such as DNA methylation. Together with techniques in which cell lineage is recorded, this multilayered information will provide insights into a cell's past history and its future potential. This will allow new levels of understanding of cell fate decisions, identity, and function in normal development, physiology, and disease.

    Funded by: European Research Council; Medical Research Council: MR/K011332/1; Wellcome Trust

    Science (New York, N.Y.) 2017;358;6359;69-75

  • Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis.

    Kemp JP, Morris JA, Medina-Gomez C, Forgetta V, Warrington NM, Youlten SE, Zheng J, Gregson CL, Grundberg E, Trajanoska K, Logan JG, Pollard AS, Sparkes PC, Ghirardello EJ, Allen R, Leitch VD, Butterfield NC, Komla-Ebri D, Adoum AT, Curry KF, White JK, Kussy F, Greenlaw KM, Xu C, Harvey NC, Cooper C, Adams DJ, Greenwood CMT, Maurano MT, Kaptoge S, Rivadeneira F, Tobias JH, Croucher PI, Ackert-Bicknell CL, Bassett JHD, Williams GR, Richards JB and Evans DM

    University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Queensland, Australia.

    Osteoporosis is a common disease diagnosed primarily by measurement of bone mineral density (BMD). We undertook a genome-wide association study (GWAS) in 142,487 individuals from the UK Biobank to identify loci associated with BMD as estimated by quantitative ultrasound of the heel. We identified 307 conditionally independent single-nucleotide polymorphisms (SNPs) that attained genome-wide significance at 203 loci, explaining approximately 12% of the phenotypic variance. These included 153 previously unreported loci, and several rare variants with large effect sizes. To investigate the underlying mechanisms, we undertook (1) bioinformatic, functional genomic annotation and human osteoblast expression studies; (2) gene-function prediction; (3) skeletal phenotyping of 120 knockout mice with deletions of genes adjacent to lead independent SNPs; and (4) analysis of gene expression in mouse osteoblasts, osteocytes and osteoclasts. The results implicate GPC6 as a novel determinant of BMD, and also identify abnormal skeletal phenotypes in knockout mice associated with a further 100 prioritized genes.

    Funded by: Arthritis Research UK: 17702, 21231; British Heart Foundation: RG/08/014/24067, RG/13/13/30194; Department of Health: HTA/10/33/04; Medical Research Council: G0400491, MC_QA137853, MC_U147585819, MC_U147585824, MC_U147585827, MC_UP_A620_1014, MC_UU_12011/1, MC_UU_12013/4, MR/L003120/1; Wellcome Trust: 094134, 101123WILLIAMS

    Nature genetics 2017;49;10;1468-1475

  • Fine-Scale Genetic Structure in Finland.

    Kerminen S, Havulinna AS, Hellenthal G, Martin AR, Sarin AP, Perola M, Palotie A, Salomaa V, Daly MJ, Ripatti S and Pirinen M

    Institute for Molecular Medicine Finland, University of Helsinki, 00014, Finland.

    Coupling dense genotype data with new computational methods offers unprecedented opportunities for individual-level ancestry estimation once geographically precisely defined reference data sets become available. We study such a reference data set for Finland, containing 2376 individuals from the 1997 FINRISK Study survey whose parents were both born close to each other. This sampling strategy focuses on the population structure present in Finland before the 1950s. By using the recent haplotype-based methods ChromoPainter (CP) and FineSTRUCTURE (FS), we reveal a highly geographically clustered genetic structure in Finland and report its connections to the settlement history as well as to the current dialectal regions of the Finnish language. The main genetic division within Finland shows striking concordance with the 1323 borderline of the Treaty of Nöteborg. In general, we detect genetic substructure throughout the country, which reflects stronger regional genetic differences in Finland compared to, for example, the UK, which in a similar analysis was dominated by a single unstructured population. We expect that similar population genetic reference data sets will become available for many more populations in the near future, with important applications, for example, in forensic genetics and in genetic association studies. With this in mind, we report those extensions of the CP + FS approach that we found most useful in our analyses of the Finnish data.

    G3 (Bethesda, Md.) 2017;7;10;3459-3468

  • Clinical features associated with CTNNB1 de novo loss of function mutations in ten individuals.

    Kharbanda M, Pilz DT, Tomkins S, Chandler K, Saggar A, Fryer A, McKay V, Louro P, Smith JC, Burn J, Kini U, De Burca A, FitzPatrick DR, Kinning E and DDD Study

    West of Scotland Clinical Genetics Service, Level 2A Laboratory Medicine Building, Queen Elizabeth University Hospital, Glasgow, UK.

    Loss-of-function mutations in CTNNB1 have been reported in individuals with intellectual disability [MIM #615075] associated with peripheral spasticity, microcephaly and central hypotonia, suggesting a recognisable phenotype associated with haploinsufficiency for this gene. Trio-based whole-exome sequencing via the Deciphering Developmental Disorders (DDD) study has identified eleven further individuals with de novo loss-of-function mutations in CTNNB1. Here we report detailed phenotypic information on ten of these. We confirm the features that have been previously described and further delineate the skin and hair findings, including fair skin and fair and sparse hair with unusual patterning.

    Funded by: Medical Research Council: MC_PC_U127561093; Wellcome Trust

    European journal of medical genetics 2017;60;2;130-135

  • Adults with suspected central nervous system infection: A prospective study of diagnostic accuracy.

    Khatib U, van de Beek D, Lees JA and Brouwer MC

    Department of Neurology, Center of Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands.

    Objectives: To study the diagnostic accuracy of clinical and laboratory features in the diagnosis of central nervous system (CNS) infection and bacterial meningitis.

    Methods: We included consecutive episodes of suspected CNS infection in adults who underwent cerebrospinal fluid (CSF) examination. The reference standard was the diagnosis classified into five categories: 1) CNS infection; 2) CNS inflammation without infection; 3) other neurological disorder; 4) non-neurological infection; and 5) other systemic disorder.

    Results: Between 2012 and 2015, 363 episodes of suspected CNS infection were included. CSF examination showed a leucocyte count >5/mm³ in 47% of episodes. Overall, 89 of 363 episodes were categorized as CNS infection (25%; most commonly viral meningitis [7%], bacterial meningitis [7%], and viral encephalitis [4%]), 36 (10%) as CNS inflammatory disorder, 111 (31%) as systemic infection, 119 (33%) as other neurological disorder, and 8 (2%) as other systemic disorder. Diagnostic accuracy of individual clinical characteristics and blood tests for the diagnosis of CNS infection or bacterial meningitis was low. CSF leucocytosis differentiated best between bacterial meningitis and other diagnoses (area under the curve [AUC] 0.95) and between any neurological infection and other diagnoses (AUC 0.93).

    Conclusions: Clinical characteristics fail to differentiate between neurological infections and other diagnoses, and CSF analysis is the main contributor to the final diagnosis.

    Funded by: Medical Research Council: 1365620; Wellcome Trust: 098051

    The Journal of infection 2017;74;1;1-9

  • Association of Rare and Common Variation in the Lipoprotein Lipase Gene With Coronary Artery Disease.

    Khera AV, Won HH, Peloso GM, O'Dushlaine C, Liu D, Stitziel NO, Natarajan P, Nomura A, Emdin CA, Gupta N, Borecki IB, Asselta R, Duga S, Merlini PA, Correa A, Kessler T, Wilson JG, Bown MJ, Hall AS, Braund PS, Carey DJ, Murray MF, Kirchner HL, Leader JB, Lavage DR, Manus JN, Hartzel DN, Samani NJ, Schunkert H, Marrugat J, Elosua R, McPherson R, Farrall M, Watkins H, Lander ES, Rader DJ, Danesh J, Ardissino D, Gabriel S, Willer C, Abecasis GR, Saleheen D, Dewey FE, Kathiresan S and Myocardial Infarction Genetics Consortium, DiscovEHR Study Group, CARDIoGRAM Exome Consortium, and Global Lipids Genetics Consortium

    Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts; Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston; Cardiology Division, Massachusetts General Hospital, Harvard Medical School, Boston.

    Importance: The activity of lipoprotein lipase (LPL) is the rate-determining step in clearing triglyceride-rich lipoproteins from the circulation. Mutations that damage the LPL gene (LPL) lead to lifelong deficiency in enzymatic activity and can provide insight into the relationship of LPL to human disease.

    Objective: To determine whether rare and/or common variants in LPL are associated with early-onset coronary artery disease (CAD).

    Design, setting, and participants: In a cross-sectional study, LPL was sequenced in 10 CAD case-control cohorts of the multinational Myocardial Infarction Genetics Consortium and a nested CAD case-control cohort of the Geisinger Health System DiscovEHR cohort between 2010 and 2015. Common variants were genotyped in up to 305 699 individuals of the Global Lipids Genetics Consortium and up to 120 600 individuals of the CARDIoGRAM Exome Consortium between 2012 and 2014. Study-specific estimates were pooled via meta-analysis.

    Exposures: Rare damaging mutations in LPL included loss-of-function variants and missense variants annotated as pathogenic in a human genetics database or predicted to be damaging by computer prediction algorithms trained to identify mutations that impair protein function. Common variants in the LPL gene region included those independently associated with circulating triglyceride levels.

    Main outcomes and measures: Circulating lipid levels and CAD.

    Results: Among 46 891 individuals with LPL gene sequencing data available, the mean (SD) age was 50 (12.6) years and 51% were female. A total of 188 participants (0.40%; 95% CI, 0.35%-0.46%) carried a damaging mutation in LPL, including 105 of 32 646 control participants (0.32%) and 83 of 14 245 participants with early-onset CAD (0.58%). Compared with 46 703 noncarriers, the 188 heterozygous carriers of an LPL damaging mutation displayed higher plasma triglyceride levels (19.6 mg/dL; 95% CI, 4.6-34.6 mg/dL) and higher odds of CAD (odds ratio = 1.84; 95% CI, 1.35-2.51; P < .001). An analysis of 6 common LPL variants resulted in an odds ratio for CAD of 1.51 (95% CI, 1.39-1.64; P = 1.1 × 10⁻²²) per 1-SD increase in triglycerides.

    Conclusions and relevance: The presence of rare damaging mutations in LPL was significantly associated with higher triglyceride levels and presence of coronary artery disease. However, further research is needed to assess whether there are causal mechanisms by which heterozygous lipoprotein lipase deficiency could lead to coronary artery disease.

    Funded by: British Heart Foundation: CS/14/2/30841, MR/L003120/1, RE/13/1/30181, RG/08/014/24067, RG/13/13/30194; European Research Council: 268834; Medical Research Council: MR/L003120/1, MR/L01629X/1; NCATS NIH HHS: KL2 TR001100; NHGRI NIH HHS: U54 HG003067; NHLBI NIH HHS: HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, HHSN268201300050C, K01 HL125751, K08 HL114642, R01 HL109946, R01 HL127564, R01 HL131961, R35 HL135824, RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010; Wellcome Trust

    JAMA 2017;317;9;937-946

  • You are where you live.

    Kidman SE and Bryant JM

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    This month's Genome Watch discusses how whole-genome sequencing of bacteria from several body sites has provided insights into the spatial diversity of bacteria within patients.

    Nature reviews. Microbiology 2017;15;2;68

  • Common genetic variation drives molecular heterogeneity in human iPSCs.

    Kilpinen H, Goncalves A, Leha A, Afzal V, Alasoo K, Ashford S, Bala S, Bensaddek D, Casale FP, Culley OJ, Danecek P, Faulconbridge A, Harrison PW, Kathuria A, McCarthy D, McCarthy SA, Meleckyte R, Memari Y, Moens N, Soares F, Mann A, Streeter I, Agu CA, Alderton A, Nelson R, Harper S, Patel M, White A, Patel SR, Clarke L, Halai R, Kirton CM, Kolb-Kokocinski A, Beales P, Birney E, Danovi D, Lamond AI, Ouwehand WH, Vallier L, Watt FM, Durbin R, Stegle O and Gaffney DJ

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. Finally, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.

    Funded by: Medical Research Council: G0801843, MC_PC_12009, MC_PC_12026; Wellcome Trust: WT090851

    Nature 2017;546;7658;370-375

  • Detection of structural mosaicism from targeted and whole-genome sequencing data.

    King DA, Sifrim A, Fitzgerald TW, Rahbari R, Hobson E, Homfray T, Mansour S, Mehta SG, Shehla M, Tomkins SE, Vasudevan PC, Hurles ME and Deciphering Developmental Disorders Study

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Structural mosaic abnormalities are large post-zygotic mutations present in a subset of cells and have been implicated in developmental disorders and cancer. Such mutations have been conventionally assessed in clinical diagnostics using cytogenetic or microarray testing. Modern disease studies rely heavily on exome sequencing, yet an adequate method for the detection of structural mosaicism using targeted sequencing data is lacking. Here, we present a method, called MrMosaic, to detect structural mosaic abnormalities using deviations in allele fraction and read coverage from next-generation sequencing data. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) simulations were used to calculate detection performance across a range of mosaic event sizes, types, clonalities, and sequencing depths. The tool was applied to 4911 patients with undiagnosed developmental disorders, and 11 events among nine patients were detected. For eight of these 11 events, mosaicism was observed in saliva but not blood, suggesting that assaying blood alone would miss a large fraction, possibly >50%, of mosaic diagnostic chromosomal rearrangements.

    Funded by: Wellcome Trust: WT098051

    Genome research 2017;27;10;1704-1714

  • Proliferation Drives Aging-Related Functional Decline in a Subpopulation of the Hematopoietic Stem Cell Compartment.

    Kirschner K, Chandra T, Kiselev V, Flores-Santa Cruz D, Macaulay IC, Park HJ, Li J, Kent DG, Kumar R, Pask DC, Hamilton TL, Hemberg M, Reik W and Green AR

    Cambridge Institute for Medical Research, University of Cambridge, Cambridge, Cambridgeshire CB2 0XY, UK; Department of Haematology, University of Cambridge, Cambridge, Cambridgeshire CB2 0XY, UK; Stem Cell Institute, University of Cambridge, Cambridge, Cambridgeshire CB2 0XY, UK; Institute for Cancer Sciences, University of Glasgow, Glasgow, Lanarkshire G61 1BD, UK.

    Aging of the hematopoietic stem cell (HSC) compartment is characterized by lineage bias and reduced stem cell function, the molecular basis of which is largely unknown. Using single-cell transcriptomics, we identified a distinct subpopulation of old HSCs carrying a p53 signature indicative of stem cell decline alongside pro-proliferative JAK/STAT signaling. To investigate the relationship between JAK/STAT and p53 signaling, we challenged HSCs with a constitutively active form of JAK2 (V617F) and observed an expansion of the p53-positive subpopulation in old mice. Our results reveal cellular heterogeneity in the onset of HSC aging and implicate a role for JAK2V617F-driven proliferation in the p53-mediated functional decline of old HSCs.

    Funded by: Medical Research Council: MC_PC_12009

    Cell reports 2017;19;8;1503-1511

  • SC3: consensus clustering of single-cell RNA-seq data.

    Kiselev VY, Kirschn