Sanger Institute - Publications 2019

Number of papers published in 2019: 676

  • Terminal-Repeat Retrotransposons in Miniature (TRIMs) in bivalves.

    Šatović E, Luchetti A, Pasantes JJ, García-Souto D, Cedilak A, Mantovani B and Plohl M

    Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia.

    Terminal repeat retrotransposons in miniature (TRIMs) are small non-autonomous LTR retrotransposons consisting of two terminal direct repeats surrounding a short internal domain. The detection and characterization of these elements has been mainly limited to plants. Here we present the first finding of a TRIM element in bivalves, and among the first known in the kingdom Animalia. Class Bivalvia has high ecological and commercial importance in marine ecosystems and aquaculture, and, in recent years, an increasing number of genomic studies have addressed these organisms. We have identified biv-TRIM in several bivalve species: Donax trunculus, Ruditapes decussatus, R. philippinarum, Venerupis corrugata, Polititapes rhomboides, Venus verrucosa, Dosinia exoleta, Glycymeris glycymeris, Cerastoderma edule, Magallana gigas and Mytilus galloprovincialis. biv-TRIM has several characteristics typical for this group of elements, exhibiting different variations. In addition to canonically structured elements, solo-TDRs and tandem repeats were detected. The element accounts for <1% of the genome in each species. The phylogenetic analysis showed a complex clustering pattern of biv-TRIM elements, and indicates the involvement of horizontal transfer in the spreading of this element.

    Funded by: Ministarstvo Znanosti, Obrazovanja i Sporta (Ministry of Science, Education and Sports): 098-0982913-2756

    Scientific reports 2019;9;1;19962

  • Assays for functionally defined normal and malignant mammary stem cells.

    Aalam SMM, Beer PA and Kannan N

    Laboratory of Stem Cell and Cancer Biology, Division of Experimental Pathology and Laboratory Medicine, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States.

    The discovery of rare, heterogeneous self-renewing stem cells with shared developmental and molecular features within epithelial components of mammary gland and breast cancers has provided a conceptual framework to understand cellular composition of these tissues and mechanisms that control their number. These normal mammary epithelial stem cells (MaSCs) and breast cancer stem cells (BCSCs) were identified and analyzed using transplant assays (namely mammary repopulating unit (MRU) assay, mammary tumor-initiating cell (TIC) assay), which reveal their latent ability to regenerate respective normal and malignant epithelial tissues with self-renewing units displaying hierarchical cellular differentiation over multiple generations in recipient mice. "Next-generation" methods using "barcoded" normal and malignant mammary cells, with the help of next-generation sequencing (NGS) technology, have revealed hidden complexity and heterogeneous growth potential of MaSCs and BCSCs. Several single markers or combinations of markers have been reported to prospectively enrich MaSCs and BCSCs. Such markers and the extent to which they enrich for MaSCs and BCSCs activity require a critical appraisal. Also, knowledge of the functional assays and their limitations and harmonious reporting of results is a prerequisite to improve our understanding of MaSCs and BCSCs. This chapter describes evolution of the concept of MaSCs and BCSCs, and specific methodologies to investigate them.

    Advances in cancer research 2019;141;129-174

  • Integrated transcriptomic and proteomic analysis of pathogenic mycobacteria and their esx-1 mutants reveal secretion-dependent regulation of ESX-1 substrates and WhiB6 as a transcriptional regulator.

    Abdallah AM, Weerdenburg EM, Guan Q, Ummels R, Borggreve S, Adroub SA, Malas TB, Naeem R, Zhang H, Otto TD, Bitter W and Pain A

    Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal-Jeddah, Kingdom of Saudi Arabia.

    The mycobacterial type VII secretion system ESX-1 is responsible for the secretion of a number of proteins that play important roles during host infection. The regulation of the expression of secreted proteins is often essential to establish successful infection. Using transcriptome sequencing, we found that the abrogation of ESX-1 function in Mycobacterium marinum leads to a pronounced increase in gene expression levels of the espA operon during the infection of macrophages. In addition, the disruption of ESX-1-mediated protein secretion also leads to a specific down-regulation of the ESX-1 substrates, but not of the structural components of this system, during growth in culture medium. This effect is observed in both M. marinum and M. tuberculosis. We established that down-regulation of ESX-1 substrates is the result of a regulatory process that is influenced by the putative transcriptional regulator WhiB6, which is located adjacent to the esx-1 locus. In addition, the overexpression of the ESX-1-associated PE35/PPE68 protein pair resulted in a significantly increased secretion of the ESX-1 substrate EsxA, demonstrating a functional link between these proteins. Taken together, these data show that WhiB6 is required for the secretion-dependent regulation of ESX-1 substrates and that ESX-1 substrates are regulated independently from the structural components, both during infection and as a result of active secretion.

    PloS one 2019;14;1;e0211003

  • Genomic risk score offers predictive performance comparable to clinical risk factors for ischaemic stroke.

    Abraham G, Malik R, Yonova-Doing E, Salim A, Wang T, Danesh J, Butterworth AS, Howson JMM, Inouye M and Dichgans M

    Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia.

    Recent genome-wide association studies in stroke have enabled the generation of genomic risk scores (GRS) but their predictive power has been modest compared to established stroke risk factors. Here, using a meta-scoring approach, we develop a metaGRS for ischaemic stroke (IS) and analyse this score in the UK Biobank (n = 395,393; 3075 IS events by age 75). The metaGRS hazard ratio for IS (1.26, 95% CI 1.22-1.31 per metaGRS standard deviation) doubles that of a previous GRS, identifying a subset of individuals at monogenic levels of risk: the top 0.25% of metaGRS have three-fold risk of IS. The metaGRS is similarly or more predictive compared to several risk factors, such as family history, blood pressure, body mass index, and smoking. We estimate the reductions needed in modifiable risk factors for individuals with different levels of genomic risk and suggest that, for individuals with high metaGRS, achieving risk factor levels recommended by current guidelines may be insufficient to mitigate risk.

    Funded by: British Heart Foundation: RG/18/13/33946; British Heart Foundation (BHF): RG/13/13/30194; Medical Research Council: MC_PC_12028, MC_PC_17228, MC_QA137853; RCUK | Medical Research Council (MRC): MR/L003120/1

    Nature communications 2019;10;1;5819

  • Metastasis in the wild: investigating metastasis in non-laboratory animals.

    Abu-Helil B and van der Weyden L

    Experimental Cancer Genetics (T113), Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Humans are not the only species to spontaneously develop metastatic cancer as cases of metastasis have been reported in a wide range of animals, including dinosaurs. Mouse models have been an invaluable tool in experimental and clinical metastasis research, with the use of genetically-engineered mouse models that spontaneously develop metastasis or ectopic/orthotopic transplantation of tumour cells to wildtype or immunodeficient mice being responsible for many key advances in our understanding of metastasis. However, are there other species that can also be relevant models? Similarities to humans in terms of environmental exposures, life-span, genetics, histopathology and available therapeutics are all factors that can be considered when looking at species other than the laboratory mouse. This review will explore the occurrence of metastasis in multiple species from a variety of domestic, captive and free-living veterinary cases to assist in identifying potential alternative experimental and clinical research models relevant to humans.

    Clinical & experimental metastasis 2019;36;1;15-28

  • PANINI: Pangenome Neighbour Identification for Bacterial Populations.

    Abudahab K, Prada JM, Yang Z, Bentley SD, Croucher NJ, Corander J and Aanensen DM

    Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK.

    The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce panini (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. panini is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. panini is available at and code at

    Funded by: Wellcome Trust

    Microbial genomics 2019;5;4

  • ZRANB3 is an African-specific type 2 diabetes locus associated with beta-cell mass and insulin response.

    Adeyemo AA, Zaghloul NA, Chen G, Doumatey AP, Leitch CC, Hostelley TL, Nesmith JE, Zhou J, Bentley AR, Shriner D, Fasanmade O, Okafor G, Eghan B, Agyenim-Boateng K, Chandrasekharappa S, Adeleye J, Balogun W, Owusu S, Amoah A, Acheampong J, Johnson T, Oli J, Adebamowo C, South Africa Zulu Type 2 Diabetes Case-Control Study, Collins F, Dunston G and Rotimi CN

    Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, 20892, MD, USA.

    Genome analysis of diverse human populations has contributed to the identification of novel genomic loci for diseases of major clinical and public health impact. Here, we report a genome-wide analysis of type 2 diabetes (T2D) in sub-Saharan Africans, an understudied ancestral group. We analyze ~18 million autosomal SNPs in 5,231 individuals from Nigeria, Ghana and Kenya. We identify a previously-unreported genome-wide significant locus: ZRANB3 (Zinc Finger RANBP2-Type Containing 3, lead SNP p = 2.831 × 10^-9). Knockdown or genomic knockout of the zebrafish ortholog results in reduction in pancreatic β-cell number which we demonstrate to be due to increased apoptosis in islets. siRNA transfection of murine Zranb3 in MIN6 β-cells results in impaired insulin secretion in response to high glucose, implicating Zranb3 in β-cell functional response to high glucose conditions. We also show transferability in our study of 32 established T2D loci. Our findings advance understanding of the genetics of T2D in non-European ancestry populations.

    Funded by: FIC NIH HHS: T37 TW000041; NHGRI NIH HHS: ZIA HG200362; NIDDK NIH HHS: F31 DK115179, R01 DK102001, T32 DK098107; NIGMS NIH HHS: R25 GM055036

    Nature communications 2019;10;1;3195

  • Low RUNX3 expression alters dendritic cell function in patients with systemic sclerosis and contributes to enhanced fibrosis.

    Affandi AJ, Carvalheiro T, Ottria A, Broen JC, Bossini-Castillo L, Tieland RG, Bon LV, Chouri E, Rossato M, Mertens JS, Garcia S, Pandit A, de Kroon LM, Christmann RB, Martin J, van Roon JA, Radstake TR and Marut W

    Laboratory of Translational Immunology, University Medical Centre Utrecht, Utrecht University, Utrecht, The Netherlands.

    Objectives: Systemic sclerosis (SSc) is an autoimmune disease with unknown pathogenesis manifested by inflammation, vasculopathy and fibrosis in skin and internal organs. The type I interferon signature found in SSc propelled us to study plasmacytoid dendritic cells (pDCs) in this disease. We aimed to identify candidate pathways underlying pDC aberrancies in SSc and to validate their function in pDC biology.

    Methods: In total, 1193 patients with SSc were compared with 1387 healthy donors and 8 patients with localised scleroderma. PCR-based transcription factor profiling and methylation status analyses, single nucleotide polymorphism genotyping by sequencing and flow cytometry analysis were performed in pDCs isolated from the circulation of healthy controls or patients with SSc. pDCs were also cultured under hypoxia or with inhibitors of methylation and hypoxia-inducible factors, and runt-related transcription factor 3 (RUNX3) levels were determined. To study Runx3 function, Itgax-Cre:Runx3^f/f mice were used in an in vitro functional assay and in a bleomycin-induced SSc skin inflammation and fibrosis model.

    Results: Here, we show downregulation of transcription factor RUNX3 in SSc pDCs. A higher methylation status of the RUNX3 gene, which is associated with polymorphism rs6672420, correlates with lower RUNX3 expression and SSc susceptibility. Hypoxia is another factor that decreases RUNX3 level in pDCs. Mouse pDCs deficient in Runx3 show enhanced maturation markers on CpG stimulation. In vivo, deletion of Runx3 in dendritic cells leads to spontaneous induction of skin fibrosis in untreated mice and increased severity of bleomycin-induced skin fibrosis.

    Conclusions: We show at least two pathways potentially causing low RUNX3 levels in SSc pDCs, and we demonstrate the detrimental effect of loss of Runx3 in an SSc model, further underscoring the role of pDCs in this disease.

    Annals of the rheumatic diseases 2019;78;9;1249-1259

  • Genomic analysis of respiratory syncytial virus infections in households and utility in inferring who infects the infant.

    Agoti CN, Phan MVT, Munywoki PK, Githinji G, Medley GF, Cane PA, Kellam P, Cotten M and Nokes DJ

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Epidemiology and Demography Department, Kilifi, Kenya.

    Infants (under 1 year old) are at greatest risk of life-threatening respiratory syncytial virus (RSV) disease. RSV epidemiological data alone has been insufficient in defining who acquires infection from whom (WAIFW) within households. We investigated RSV genomic variation within and between infected individuals and assessed its potential utility in tracking transmission in households. Over an entire single RSV season in coastal Kenya, nasal swabs were collected from members of 20 households every 3-4 days regardless of symptom status and screened for RSV nucleic acid. Next generation sequencing was used to generate RSV genomes covering >90% of the full length for 51.1% of positive samples (191/374). Single nucleotide polymorphisms (SNPs) observed during household infection outbreaks ranged from 0-21 (median: 3) while SNPs observed during single-host infection episodes ranged from 0-17 (median: 1). Using the viral genomic data alone there was insufficient resolution to fully reconstruct within-household transmission chains. For households with clear index cases, the most likely source of infant infection was a toddler (aged 1 to <3 years) or school-aged (aged 6 to <12 years) co-occupant. However, for best resolution of WAIFW within households, we suggest an integrated analysis of RSV genomic and epidemiological data.

    Funded by: Wellcome Trust (Wellcome): 090853, 102975, 107769

    Scientific reports 2019;9;1;10076
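
    The SNP counts quoted in the abstract above are differences between aligned viral genomes. As a minimal sketch of that kind of comparison, and not the authors' pipeline, the snippet below counts differing positions between two aligned sequences. The function name `snp_distance`, the choice to skip gaps and ambiguous bases, and the example sequences are all assumptions made for illustration.

```python
def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence carries an ambiguous base or a gap
    (any character in `ignore`) are skipped rather than counted.
    This skipping rule is an illustrative assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


# Hypothetical aligned fragments, not real RSV data.
sample_1 = "ACGTACGT"
sample_2 = "ACGTTCGA"
print(snp_distance(sample_1, sample_2))
```

    Applied to every pair of genomes from one household, a count like this gives the kind of range and median the abstract reports.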

  • Fat storage-inducing transmembrane protein 2 (FIT2) is less abundant in type 2 diabetes, and regulates triglyceride accumulation and insulin sensitivity in adipocytes.

    Agrawal M, Yeo CR, Shabbir A, Chhay V, Silver DL, Magkos F, Vidal-Puig A and Toh SA

    Department of Pharmacology and Nutritional Sciences, University of Kentucky, Lexington, Kentucky, USA.

    Fat storage-inducing transmembrane protein 2 (FIT2) aids in partitioning of cellular triacylglycerol into lipid droplets. A genome-wide association study reported FITM2-R3H domain containing like-HNF4A locus to be associated with type 2 diabetes (T2DM) in East Asian populations. Mice with adipose tissue (AT)-specific FIT2 knockout exhibited lipodystrophic features, with reduced AT mass, insulin resistance, and greater inflammation in AT when fed a high-fat diet. The role of FIT2 in regulating human adipocyte function is not known. Here, we found FIT2 protein abundance is lower in subcutaneous and omental AT obtained from patients with T2DM compared with nondiabetic control subjects. Partial loss of FIT2 protein in primary human adipocytes attenuated their lipid storage capacity and induced insulin resistance. After palmitate treatment, triacylglycerol accumulation, insulin-induced Akt (Ser-473) phosphorylation, and insulin-stimulated glucose uptake were significantly reduced in FIT2 knockdown adipocytes compared with control cells. Gene expression of proinflammatory cytokines IL-18 and IL-6 and phosphorylation of the endoplasmic reticulum stress marker inositol-requiring enzyme 1α were greater in FIT2 knockdown adipocytes than in control cells. Our results show for the first time that FIT2 is associated with T2DM in humans and plays an integral role in maintaining metabolically healthy AT function.

    Funded by: Medical Research Council: G0400192, G0802051, MC_UU_12012/2

    FASEB journal : official publication of the Federation of American Societies for Experimental Biology 2019;33;1;430-440

  • Seq-Well: A Sample-Efficient, Portable Picowell Platform for Massively Parallel Single-Cell RNA Sequencing.

    Aicher TP, Carroll S, Raddi G, Gierahn T, Wadsworth MH, Hughes TK, Love C and Shalek AK

    Ragon Institute of MGH, Harvard, and MIT, Cambridge, MA, USA.

    Seq-Well is a low-cost picowell platform that can be used to simultaneously profile the transcriptomes of thousands of cells from diverse, low input clinical samples. In Seq-Well, uniquely barcoded mRNA capture beads and cells are co-confined in picowells that are sealed using a semipermeable membrane, enabling efficient cell lysis and mRNA capture. The beads are subsequently removed and processed in parallel for sequencing, with each transcript's cell of origin determined via the unique barcodes. Due to its simplicity and portability, Seq-Well can be performed almost anywhere.

    Funded by: NCI NIH HHS: R33 CA202820, U54 CA217377; NHGRI NIH HHS: RM1 HG006193; NHLBI NIH HHS: R01 HL095791; NIAID NIH HHS: P01 AI039671, P01 AI045757, R01 AI138546, R21 AI106025, R56 AI104274, U19 AI089992, U24 AI118672, Z01 AI000947; NIDA NIH HHS: R01 DA046277; NIGMS NIH HHS: T32 GM008042

    Methods in molecular biology (Clifton, N.J.) 2019;1979;111-132

  • Finding Diagnostically Useful Patterns in Quantitative Phenotypic Data.

    Aitken S, Firth HV, McRae J, Halachev M, Kini U, Parker MJ, Lees MM, Lachlan K, Sarkar A, Joss S, Splitt M, McKee S, Németh AH, Scott RH, Wright CF, Marsh JA, Hurles ME, FitzPatrick DR and DDD Study

    MRC Human Genetics Unit, Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.

    Trio-based whole-exome sequence (WES) data have established confident genetic diagnoses in ∼40% of previously undiagnosed individuals recruited to the Deciphering Developmental Disorders (DDD) study. Here we aim to use the breadth of phenotypic information recorded in DDD to augment diagnosis and disease variant discovery in probands. Median Euclidean distances (mEuD) were employed as a simple measure of similarity of quantitative phenotypic data within sets of ≥10 individuals with plausibly causative de novo mutations (DNM) in 28 different developmental disorder genes. 13/28 (46.4%) showed significant similarity for growth or developmental milestone metrics, 10/28 (35.7%) showed similarity in HPO term usage, and 12/28 (43%) showed no phenotypic similarity. Pairwise comparisons of individuals with high-impact inherited variants to the 32 individuals with causative DNM in ANKRD11 using only growth z-scores highlighted 5 likely causative inherited variants and two unrecognized DNM resulting in an 18% diagnostic uplift for this gene. Using an independent approach, naive Bayes classification of growth and developmental data produced reasonably discriminative models for the 24 DNM genes with sufficiently complete data. An unsupervised naive Bayes classification of 6,993 probands with WES data and sufficient phenotypic information defined 23 in silico syndromes (ISSs) and was used to test a "phenotype first" approach to the discovery of causative genotypes using WES variants strictly filtered on allele frequency, mutation consequence, and evidence of constraint in humans. This highlighted heterozygous de novo nonsynonymous variants in SPTBN2 as causative in three DDD probands.

    Funded by: Medical Research Council: MC_PC_16018, MC_UU_00007/3, MR/M02122X/1; Wellcome Trust

    American journal of human genetics 2019;105;5;933-946
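
    The median Euclidean distance (mEuD) mentioned in the abstract above can be illustrated with a short calculation. This is a minimal sketch assuming each individual is represented by a vector of quantitative measurements such as growth z-scores; the function name `median_euclidean_distance` and the example values are hypothetical, and the study's own handling of missing data and significance testing is not reproduced here.

```python
from itertools import combinations
from math import dist
from statistics import median


def median_euclidean_distance(profiles):
    """Median of all pairwise Euclidean distances between phenotype vectors."""
    if len(profiles) < 2:
        raise ValueError("need at least two individuals")
    return median(dist(a, b) for a, b in combinations(profiles, 2))


# Hypothetical z-scores (height, weight, head circumference) for four
# individuals; illustrative values only, not data from the study.
gene_group = [
    (-2.0, -1.5, -1.0),
    (-2.0, -1.5, -2.0),
    (-1.0, -1.5, -1.0),
    (-2.0, -0.5, -1.0),
]
print(median_euclidean_distance(gene_group))
```

    A smaller value for a gene group than for randomly drawn groups of the same size would suggest phenotypic similarity; how that comparison is made is left open in this sketch.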

  • A novel Ancestral Beijing sublineage of Mycobacterium tuberculosis suggests the transition site to Modern Beijing sublineages.

    Ajawatanawong P, Yanai H, Smittipat N, Disratthakit A, Yamada N, Miyahara R, Nedsuwan S, Imasanguan W, Kantipong P, Chaiyasirinroje B, Wongyai J, Plitphonganphim S, Tantivitayakul P, Phelan J, Parkhill J, Clark TG, Hibberd ML, Ruangchai W, Palittapongarnpim P, Juthayothin T, Thawornwattana Y, Viratyosin W, Tongsima S, Mahasirimongkol S, Tokunaga K and Palittapongarnpim P

    Department of Microbiology, Faculty of Science, Mahidol University, Rama 6 Road, Bangkok, Thailand.

    The global Mycobacterium tuberculosis population comprises 7 major lineages. The Beijing strains, particularly those classified as Modern groups, have been found worldwide, are frequently associated with drug resistance, younger ages and outbreaks, and appear to be expanding. Here, we report analysis of whole genome sequences of 1170 M. tuberculosis isolates together with their patient profiles. Our samples belonged to Lineages 1-4 (L1-L4), with L1 and L2 being equally dominant. Phylogenetic analysis revealed several new or rare sublineages. Differential associations between sublineages of M. tuberculosis and patient profiles, including ages, ethnicity, HIV (human immunodeficiency virus) infection and drug resistance, were demonstrated. The Ancestral Beijing strains and some sublineages of L4 were associated with ethnic minorities while L1 was more common in Thais. L2.2.1.Ancestral 4 surprisingly had a mutation that is typical of the Modern Beijing sublineages and was common in Akha and Lahu tribes who have migrated from Southern China in the last century. This may indicate that the evolutionary transition from the Ancestral to Modern Beijing sublineages might be gradual and occur in Southern China, where the presence of multiple ethnic groups might have allowed for the circulation of various co-evolving sublineages, which ultimately led to the emergence of the Modern Beijing strains.

    Scientific reports 2019;9;1;13718

  • Identification of functional long non-coding RNAs in C. elegans.

    Akay A, Jordan D, Navarro IC, Wrzesinski T, Ponting CP, Miska EA and Haerty W

    Wellcome CRUK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

    Background: Functional characterisation of the compact genome of the model organism Caenorhabditis elegans remains incomplete despite its sequencing 20 years ago. The last decade of research has seen a tremendous increase in the number of non-coding RNAs identified in various organisms. While we have mechanistic understandings of small non-coding RNA pathways, long non-coding RNAs represent a diverse class of active transcripts whose function remains less well characterised.

    Results: By analysing hundreds of published transcriptome datasets, we annotated 3392 potential lncRNAs including 143 multi-exonic loci that showed increased nucleotide conservation and GC content relative to other non-coding regions. Using CRISPR/Cas9 genome editing, we generated deletion mutants for ten long non-coding RNA loci. Using automated microscopy for in-depth phenotyping, we show that six of the long non-coding RNA loci are required for normal development and fertility. Using RNA interference-mediated gene knock-down, we provide evidence that for two of the long non-coding RNA loci, the observed phenotypes are dependent on the corresponding RNA transcripts.

    Conclusions: Our results highlight that a large section of the non-coding regions of the C. elegans genome remains unexplored. Based on our in vivo analysis of a selection of high-confidence lncRNA loci, we expect that a significant proportion of these high-confidence regions is likely to have a biological function at either the genomic or the transcript level.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/T/000PR9783, BBS/E/T/000PR9818; Biotechnology and Biological Sciences Research Council (GB): BB/P016774/1; Cancer Research UK: C13474/A18583, C6946/A14492; Medical Research Council: MR/P026028/1; Wellcome Trust: 092096/Z/10/Z, 104640/Z/14/Z

    BMC biology 2019;17;1;14

  • Human Antibodies that Slow Erythrocyte Invasion Potentiate Malaria-Neutralizing Antibodies.

    Alanine DGW, Quinkert D, Kumarasingha R, Mehmood S, Donnellan FR, Minkah NK, Dadonaite B, Diouf A, Galaway F, Silk SE, Jamwal A, Marshall JM, Miura K, Foquet L, Elias SC, Labbé GM, Douglas AD, Jin J, Payne RO, Illingworth JJ, Pattinson DJ, Pulido D, Williams BG, de Jongh WA, Wright GJ, Kappe SHI, Robinson CV, Long CA, Crabb BS, Gilson PR, Higgins MK and Draper SJ

    The Jenner Institute, University of Oxford, Old Road Campus Research Building, Oxford OX3 7DQ, UK; Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK.

    The Plasmodium falciparum reticulocyte-binding protein homolog 5 (PfRH5) is the leading target for next-generation vaccines against the disease-causing blood-stage of malaria. However, little is known about how human antibodies confer functional immunity against this antigen. We isolated a panel of human monoclonal antibodies (mAbs) against PfRH5 from peripheral blood B cells from vaccinees in the first clinical trial of a PfRH5-based vaccine. We identified a subset of mAbs with neutralizing activity that bind to three distinct sites and another subset of mAbs that are non-functional, or even antagonistic to neutralizing antibodies. We also identify the epitope of a novel group of non-neutralizing antibodies that significantly reduce the speed of red blood cell invasion by the merozoite, thereby potentiating the effect of all neutralizing PfRH5 antibodies as well as synergizing with antibodies targeting other malaria invasion proteins. Our results provide a roadmap for structure-guided vaccine development to maximize antibody efficacy against blood-stage malaria.

    Cell 2019;178;1;216-228.e21

  • Genetic effects on promoter usage are highly context-specific and contribute to complex traits.

    Alasoo K, Rodrigues J, Danesh J, Freitag DF, Paul DS and Gaffney DJ

    Institute of Computer Science, University of Tartu, Tartu, Estonia.

    Genetic variants regulating RNA splicing and transcript usage have been implicated in both common and rare diseases. Although transcript usage quantitative trait loci (tuQTLs) have been mapped across multiple cell types and contexts, it is challenging to distinguish between the main molecular mechanisms controlling transcript usage: promoter choice, splicing and 3' end choice. Here, we analysed RNA-seq data from human macrophages exposed to three inflammatory and one metabolic stimulus. In addition to conventional gene-level and transcript-level analyses, we also directly quantified promoter usage, splicing and 3' end usage. We found that promoters, splicing and 3' ends were predominantly controlled by independent genetic variants enriched in distinct genomic features. Promoter usage QTLs were also 50% more likely to be context-specific than other tuQTLs and constituted 25% of the transcript-level colocalisations with complex traits. Thus, promoter usage might be an underappreciated molecular mechanism mediating complex trait associations in a context-specific manner.

    Funded by: British Heart Foundation: RG/13/13/30194; British Heart Foundation Cambridge Centre of Excellence: RE/13/6/30180; Estonian Research Council: IUT34-4, MOBJD67; Medical Research Council: MR/L003120/1; Wellcome: WT09805, WT098503, WT099754/Z/12/Z; Wellcome Trust

    eLife 2019;8

  • Emergence of phylogenetically diverse and fluoroquinolone resistant Salmonella Enteritidis as a cause of invasive nontyphoidal Salmonella disease in Ghana.

    Aldrich C, Hartman H, Feasey N, Chattaway MA, Dekker D, Al-Emran HM, Larkin L, McCormick J, Sarpong N, Le Hello S, Adu-Sarkodie Y, Panzner U, Park SE, Im J, Marks F, May J, Dallman TJ and Eibach D

    Department of Infectious Disease Epidemiology, Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany.

    Background: Salmonella enterica serovar Enteritidis is a cause of both poultry- and egg-associated enterocolitis globally and bloodstream-invasive nontyphoidal Salmonella (iNTS) disease in sub-Saharan Africa (sSA). Distinct, multi-drug resistant genotypes associated with iNTS disease in sSA have recently been described, often requiring treatment with fluoroquinolone antibiotics. In industrialised countries, antimicrobial use in poultry production has led to frequent fluoroquinolone resistance amongst globally prevalent enterocolitis-associated lineages.

    Methodology/principal findings: Twenty-seven S. Enteritidis isolates from patients with iNTS disease and two poultry isolates, collected between 2007 and 2015 in the Ashanti region of Ghana, were whole-genome sequenced. These isolates, notable for a high rate of diminished ciprofloxacin susceptibility (DCS), were placed in the phyletic context of 1,067 sequences from the Public Health England (PHE) S. Enteritidis genome database to understand whether DCS was associated with African or globally-circulating clades of S. Enteritidis. Analysis showed four of the major S. Enteritidis clades were represented, two global and two African. All thirteen DCS isolates, containing a single gyrA mutation at codon 87, belonged to a global PT4-like clade responsible for epidemics of poultry-associated enterocolitis. Apart from two DCS isolates, which clustered with PHE isolates associated with travel to Spain and Brazil, the remaining DCS isolates, including one poultry isolate, belonged to two monophyletic clusters in which gyrA 87 mutations appear to have developed within the region.

    Conclusions/significance: Extensive phylogenetic diversity is evident amongst iNTS disease-associated S. Enteritidis in Ghana. Antimicrobial resistance profiles differed by clade, highlighting the challenges of devising empirical sepsis guidelines. The detection of fluoroquinolone resistance in phyletically-related poultry and human isolates is of major concern and surveillance and control measures within the region's burgeoning poultry industry are required to protect a human population at high risk of iNTS disease.

    PLoS neglected tropical diseases 2019;13;6;e0007485

  • Sociodemographic patterns of health insurance coverage in Namibia.

    Allcock SH, Young EH and Sandhu MS

    Department of Medicine, University of Cambridge, Cambridge, Cambridgeshire, UK.

    Introduction: Health insurance has been found to increase healthcare utilisation and reduce catastrophic health expenditures in a number of countries; however, coverage is often unequally distributed among populations. The sociodemographic patterns of health insurance in Namibia are not fully understood. We aimed to assess the prevalence of health insurance, the relation between health insurance and health service utilisation and to explore the sociodemographic factors associated with health insurance in Namibia. Such findings may help to inform health policy to improve financial access to healthcare in the country.

    Methods: Using data on 14,443 individuals, aged 15 to 64 years, from the 2013 Namibia Demographic and Health Survey, the association between health insurance and health service utilisation was investigated using multivariable mixed effects Poisson regression analyses, adjusted for sociodemographic covariates and regional, enumeration area and household clustering. Multivariable mixed effects Poisson regression analyses were also conducted to explore the association between key sociodemographic factors and health insurance, adjusted for covariates and clustering. Effect modification by sex, education level and wealth quintile was also explored.

    Results: Just 17.5% of this population were insured (men: 20.2%; women: 16.2%). In fully-adjusted analyses, education was significantly positively associated with health insurance, independent of other sociodemographic factors (higher education RR: 3.98; 95% CI: 3.11-5.10; p < 0.001). Female sex (RR: 0.83; 95% CI: 0.74-0.94; p = 0.003) and wealth (highest wealth quintile RR: 13.47; 95% CI: 9.06-20.04; p < 0.001) were also independently associated with insurance. There was a complex interaction between sex, education and wealth in the context of health insurance. With increasing education level, women were more likely to be insured (p for interaction < 0.001), and education had a greater impact on the likelihood of health insurance in lower wealth quintiles.

    Conclusions: In this population, health insurance was associated with health service utilisation but insurance coverage was low, and was independently associated with sex, education and wealth. Education may play a key role in health insurance coverage, especially for women and the less wealthy. These findings may help to inform the targeting of strategies to improve financial protection from healthcare-associated costs in Namibia.

    Funded by: Medical Research Council: MR/K013491/1; Wellcome Trust: 206194

    International journal for equity in health 2019;18;1;16

  • JACKS: joint analysis of CRISPR/Cas9 knockout screens.

    Allen F, Behan F, Khodak A, Iorio F, Yusa K, Garnett M and Parts L

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.

    Genome-wide CRISPR/Cas9 knockout screens are revolutionizing mammalian functional genomics. However, their range of applications remains limited by signal variability from different guide RNAs that target the same gene, which confounds gene effect estimation and dictates large experiment sizes. To address this problem, we report JACKS, a Bayesian method that jointly analyzes screens performed with the same guide RNA library. Modeling the variable guide efficacies greatly improves hit identification over processing a single screen at a time and outperforms existing methods. This more efficient analysis gives additional hits and allows designing libraries with a 2.5-fold reduction in required cell numbers without sacrificing performance compared to current analysis standards.

    Funded by: Wellcome Trust

    Genome research 2019;29;3;464-471

  • A new genomic blueprint of the human gut microbiota.

    Almeida A, Mitchell AL, Boland M, Forster SC, Gloor GB, Tarkowska A, Lawley TD and Finn RD

    European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.

    The composition of the human gut microbiota is linked to health and disease, but knowledge of individual microbial species is needed to decipher their biological roles. Despite extensive culturing and sequencing efforts, the complete bacterial repertoire of the human gut microbiota remains undefined. Here we identify 1,952 uncultured candidate bacterial species by reconstructing 92,143 metagenome-assembled genomes from 11,850 human gut microbiomes. These uncultured genomes substantially expand the known species repertoire of the collective human gut microbiota, with a 281% increase in phylogenetic diversity. Although the newly identified species are less prevalent in well-studied populations compared to reference isolate genomes, they improve classification of understudied African and South American samples by more than 200%. These candidate species encode hundreds of newly identified biosynthetic gene clusters and possess a distinctive functional capacity that might explain their elusive nature. Our work expands the known diversity of uncultured gut bacteria, which provides unprecedented resolution for taxonomic and functional characterization of the intestinal microbiota.

    Nature 2019;568;7753;499-504

  • Epigenetic remodelling licences adult cholangiocytes for organoid formation and liver regeneration.

    Aloia L, McKie MA, Vernaz G, Cordero-Espinoza L, Aleksieva N, van den Ameele J, Antonica F, Font-Cunill B, Raven A, Aiese Cigliano R, Belenguer G, Mort RL, Brand AH, Zernicka-Goetz M, Forbes SJ, Miska EA and Huch M

    The Wellcome Trust/CRUK Gurdon Institute, University of Cambridge, Cambridge, UK.

    Following severe or chronic liver injury, adult ductal cells (cholangiocytes) contribute to regeneration by restoring both hepatocytes and cholangiocytes. We recently showed that ductal cells clonally expand as self-renewing liver organoids that retain their differentiation capacity into both hepatocytes and ductal cells. However, the molecular mechanisms by which adult ductal-committed cells acquire cellular plasticity, initiate organoids and regenerate the damaged tissue remain largely unknown. Here, we describe that ductal cells undergo a transient, genome-wide, remodelling of their transcriptome and epigenome during organoid initiation and in vivo following tissue damage. TET1-mediated hydroxymethylation licences differentiated ductal cells to initiate organoids and activate the regenerative programme through the transcriptional regulation of stem-cell genes and regenerative pathways including the YAP-Hippo signalling. Our results argue in favour of the remodelling of genomic methylome/hydroxymethylome landscapes as a general mechanism by which differentiated cells exit a committed state in response to tissue damage.

    Funded by: Cancer Research UK: A14492, A18583; European Research Council: 669198; Medical Research Council: MC_PC_12009, MR/P016839/1; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/R001162/1; Wellcome Trust: 092096, 103792, 104151, 104640, 105839

    Nature cell biology 2019;21;11;1321-1333

  • Multilocus Analysis Resolves the European Finch Epidemic Strain of Trichomonas gallinae and Suggests Introgression from Divergent Trichomonads.

    Alrefaei AF, Low R, Hall N, Jardim R, Dávila A, Gerhold R, John S, Steinbiss S, Cunningham AA, Lawson B, Bell D and Tyler K

    School of Biological Sciences, University of East Anglia, Norwich, Norfolk, United Kingdom.

    In Europe, Trichomonas gallinae recently emerged as a cause of epidemic disease in songbirds. A clonal strain of the parasite, first found in the United Kingdom, has become the predominant strain there and spread to continental Europe. Discriminating this epidemic strain of T. gallinae from other strains necessitated development of multilocus sequence typing (MLST). Development of the MLST was facilitated by the assembly and annotation of a 54.7 Mb draft genome of a cloned stabilate of the A1 European finch epidemic strain (isolated from Greenfinch, Chloris chloris, XT-1081/07 in 2007) containing 21,924 protein coding genes. This enabled construction of a robust 19 locus MLST based on existing typing loci for Trichomonas vaginalis and T. gallinae. Our MLST has the sensitivity to discriminate strains within existing genotypes confidently, and resolves the American finch A1 genotype from the European finch epidemic A1 genotype. Interestingly, one isolate we obtained from a captive black-naped fruit dove, Ptilinopus melanospilus, was not truly T. gallinae but a hybrid of T. gallinae with a distant trichomonad lineage. Phylogenetic analysis of the individual loci in this fruit dove isolate provides evidence of gene flow between distant trichomonad lineages at 2 of the 19 loci examined and may provide a precedent for the emergence of other hybrid trichomonad genomes, including T. vaginalis.

    Genome biology and evolution 2019;11;8;2391-2402

  • Major subpopulations of Plasmodium falciparum in sub-Saharan Africa.

    Amambua-Ngwa A, Amenga-Etego L, Kamau E, Amato R, Ghansah A, Golassa L, Randrianarivelojosia M, Ishengoma D, Apinjoh T, Maïga-Ascofaré O, Andagalu B, Yavo W, Bouyou-Akotet M, Kolapo O, Mane K, Worwui A, Jeffries D, Simpson V, D'Alessandro U, Kwiatkowski D and Djimde AA

    Medical Research Council Unit The Gambia at LSHTM, Banjul, The Gambia.

    Understanding genomic variation and population structure of Plasmodium falciparum across Africa is necessary to sustain progress toward malaria elimination. Genome clustering of 2263 P. falciparum isolates from 24 malaria-endemic settings in 15 African countries identified major western, central, and eastern ancestries, plus a highly divergent Ethiopian population. Ancestry aligned to these regional blocs, overlapping with both the parasite's origin and with historical human migration. The parasite populations are interbred and share genomic haplotypes, especially across drug resistance loci, which showed the strongest recent identity-by-descent between populations. A recent signature of selection on chromosome 12 with candidate resistance loci against artemisinin derivatives was evident in Ghana and Malawi. Such selection and the emerging substructure may affect treatment-based intervention strategies against P. falciparum malaria.

    Funded by: Medical Research Council: G0600718

    Science (New York, N.Y.) 2019;365;6455;813-816

  • M3Drop: dropout-based feature selection for scRNASeq.

    Andrews TS and Hemberg M

    Department of Cellular Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Motivation: Most genomes contain thousands of genes, but for most functional responses, only a subset of those genes are relevant. To facilitate many single-cell RNASeq (scRNASeq) analyses the set of genes is often reduced through feature selection, i.e. by removing genes only subject to technical noise.

    Results: We present M3Drop, an R package that implements popular existing feature selection methods and two novel methods which take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show these new methods outperform existing methods on simulated and real datasets.

    Availability and implementation: M3Drop is freely available on github as an R package and is compatible with other popular scRNASeq tools:

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2019;35;16;2865-2867

  • Statistical Methods for Single‐Cell RNA‐Sequencing

    Andrews, T.S., Kiselev, V.Y. and Hemberg, M.

    In: Handbook of Statistical Genomics, 4th edition. Balding, D., Moltke, I. and Marioni, J. (eds.), Wiley, Hoboken, NJ, USA 2019;735-720

  • Voices in methods development.

    Anikeeva P, Boyden E, Brangwynne C, Cissé II, Fiehn O, Fromme P, Gingras AC, Greene CS, Heard E, Hell SW, Hillman E, Jensen GJ, Karchin R, Kiessling LL, Kleinstiver BP, Knight R, Kukura P, Lancaster MA, Loman N, Looger L, Lundberg E, Luo Q, Miyawaki A, Myers EW, Nolan GP, Picotti P, Reik W, Sauer M, Shalek AK, Shendure J, Slavov N, Tanay A, Troyanskaya O, van Valen D, Wang HW, Yi C, Yin P, Zernicka-Goetz M and Zhuang X

    Departments of Materials Science & Engineering and Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.

    Funded by: Medical Research Council: MC_UP_1201/9, MR/M501621/1; NCI NIH HHS: R00 CA218870; NIAID NIH HHS: U19 AI057229; NIGMS NIH HHS: DP2 GM119419, DP2 GM123497

    Nature methods 2019;16;10;945-951

  • Evaluating 18s-rRNA LAMP and selective whole genome amplification (sWGA) assay in detecting asymptomatic Plasmodium falciparum infections in blood donors.

    Aninagyei E, Smith-Graham S, Boye A, Egyir-Yawson A and Acheampong DO

    Department of Biomedical Sciences, School of Allied Health Sciences, University of Cape Coast, Cape Coast, Ghana.

    Background: Undesirable consequences of donor Plasmodium falciparum parasitaemia on stored donor blood have been reported. Therefore, it is imperative that all prospective blood donors are screened for P. falciparum infections using sensitive techniques. In this study, the sensitivities of microscopy, rapid diagnostic test (RDT), loop-mediated isothermal amplification (LAMP) assay and selective whole genome amplification (sWGA) technique in detecting P. falciparum infections in blood donors were assessed.

    Methods: Randomly selected blood donors from 5 districts in Greater Accra Region of Ghana were screened for asymptomatic P. falciparum infections. Each donor sample was screened with SD Bioline RDT kit for P. falciparum histidine rich protein 2 and Plasmodium lactate dehydrogenase antigens, sWGA and 18s-rRNA LAMP. Crude DNA LAMP (crDNA-LAMP) was compared to purified DNA LAMP (pDNA-LAMP).

    Results: A total of 771 blood donors were screened. The respective overall prevalence of P. falciparum in Ghana by microscopy, RDT, crDNA-LAMP, pDNA-LAMP and sWGA was 7.4%, 11.8%, 16.9%, 17.5% and 18.0%. Using sWGA as the reference test, the sensitivities of microscopy, RDT, crDNA-LAMP and pDNA-LAMP were 41.0% (95% CI 32.7-49.7), 65.5% (95% CI 56.9-73.3), 82.6% (95% CI 75.8-88.3) and 95.7% (95% CI 90.1-98.4), respectively. There was near perfect agreement between LAMP and sWGA (sWGA vs. crDNA-LAMP, κ = 0.87; sWGA vs. pDNA-LAMP, κ = 0.96), while crDNA-LAMP and pDNA-LAMP agreed perfectly (κ = 0.91). Goodness of fit test indicated non-significant difference between the performance of LAMP and sWGA (crDNA-LAMP vs. sWGA: χ² = 0.71, p = 0.399 and pDNA-LAMP vs. sWGA: χ² = 0.14, p = 0.707). Finally, compared to sWGA, the performance of LAMP did not differ in detecting sub-microscopic parasitaemia (sWGA vs. crDNA-LAMP: χ² = 1.12, p = 0.290 and sWGA vs. pDNA-LAMP: χ² = 0.22, p = 0.638).

    Conclusions: The LAMP assay agreed near perfectly with sWGA, with non-significant differences in their ability to detect asymptomatic P. falciparum parasitaemia in blood donors. Therefore, it is recommended that LAMP-based assays be employed to detect P. falciparum infections in blood donors owing to their high sensitivity, simplicity, cost-effectiveness and user-friendliness.

    Funded by: Wellcome Trust

    Malaria journal 2019;18;1;214

  • Enhanced β-adrenergic signalling underlies an age-dependent beneficial metabolic effect of PI3K p110α inactivation in adipose tissue.

    Araiz C, Yan A, Bettedi L, Samuelson I, Virtue S, McGavigan AK, Dani C, Vidal-Puig A and Foukas LC

    Institute of Healthy Ageing & Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK.

    The insulin/IGF-1 signalling pathway is a key regulator of metabolism and the rate of ageing. We previously documented that systemic inactivation of phosphoinositide 3-kinase (PI3K) p110α, the principal PI3K isoform that positively regulates insulin signalling, results in a beneficial metabolic effect in aged mice. Here we demonstrate that deletion of p110α specifically in the adipose tissue leads to less fat accumulation over a significant part of adult life and allows the maintenance of normal glucose tolerance despite insulin resistance. This effect of p110α inactivation is due to a potentiating effect on β-adrenergic signalling, which leads to increased catecholamine-induced energy expenditure in the adipose tissue. Our findings provide a paradigm of how partial inactivation of an essential component of the insulin signalling pathway can have an overall beneficial metabolic effect and suggest that PI3K inhibition could potentiate the effect of β-adrenergic agonists in the treatment of obesity and its associated comorbidities.

    Funded by: Medical Research Council: MC_UU_12012/5

    Nature communications 2019;10;1;1546

  • Molecular epidemiology of G12 rotavirus strains during eight consecutive epidemic seasons in the Basque Country (North of Spain), 2010-2018.

    Arana A, Jere KC, Chaguza C, Montes M, Alkorta M, Iturriza-Gomara M and Cilla G

    Microbiology Department, Donostia University Hospital - Biodonostia Health Research Institute, San Sebastián, Spain.

    G12 rotaviruses were first detected in Spain (Gipuzkoa province) in December 2004. After four years with no detections, G12 strains re-emerged in the 2010-2011 epidemic season, when the first European epidemic circulation of this genotype was observed in Gipuzkoa. G12 rotaviruses were also the dominant strains in the 2011-2012, 2014-2015 and 2015-2016 epidemic seasons and were sporadically detected in the remaining periods (2012-2014 and 2016-2018). The most frequently detected G-type between 2010 and 2018 was G12 (29.9%) rather than G1 rotavirus (17.8%), which historically had been the dominant genotype in our setting (1989-2009 period) and globally. Phylogenetic analysis of the VP4 and VP7 genome segments showed chronologically ordered clades, each of which spanned two to four consecutive seasons. Overall, the circulating G12 rotavirus strains in Gipuzkoa between 2010 and 2018 belonged to four clades, which emerged in early 2009 potentially due to at least four importations from other regions followed by local evolution. Whole genome analysis of 16 G12 strains detected from 2010 to 2018 revealed a Wa-like genotype constellation, G12-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1, and also showed that G12 strains from Gipuzkoa were similar to those identified in other countries. These findings suggest circulation of G12 rotavirus strains in different parts of the world leading to high genetic diversity.

    Funded by: Department of Health: HPRU 2012–10038; Wellcome Trust: 201945/Z/16/Z

    Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 2019;71;67-75

  • Multi-omics profiling of mouse gastrulation at single-cell resolution.

    Argelaguet R, Clark SJ, Mohammed H, Stapel LC, Krueger C, Kapourani CA, Imaz-Rosshandler I, Lohoff T, Xiang Y, Hanna CW, Smallwood S, Ibarra-Soria X, Buettner F, Sanguinetti G, Xie W, Krueger F, Göttgens B, Rugg-Gunn PJ, Kelsey G, Dean W, Nichols J, Stegle O, Marioni JC and Reik W

    European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

    Formation of the three primary germ layers during gastrulation is an essential step in the establishment of the vertebrate body plan and is associated with major transcriptional changes [1-5]. Global epigenetic reprogramming accompanies these changes [6-8], but the role of the epigenome in regulating early cell-fate choice remains unresolved, and the coordination between different molecular layers is unclear. Here we describe a single-cell multi-omics map of chromatin accessibility, DNA methylation and RNA expression during the onset of gastrulation in mouse embryos. The initial exit from pluripotency coincides with the establishment of a global repressive epigenetic landscape, followed by the emergence of lineage-specific epigenetic patterns during gastrulation. Notably, cells committed to mesoderm and endoderm undergo widespread coordinated epigenetic rearrangements at enhancer marks, driven by ten-eleven translocation (TET)-mediated demethylation and a concomitant increase of accessibility. By contrast, the methylation and accessibility landscape of ectodermal cells is already established in the early epiblast. Hence, regulatory elements associated with each germ layer are either epigenetically primed or remodelled before cell-fate decisions, providing the molecular framework for a hierarchical emergence of the primary germ layers.

    Funded by: European Research Council: 810296; Medical Research Council: MC_PC_12009, MC_UU_00009/1, MR/K011332/1, MR/M01536X/1; Wellcome Trust: 095645, 105031, 108438/E/15/Z, 210754/Z/18/Z

    Nature 2019;576;7787;487-491

  • Genome-wide by environment interaction studies of depressive symptoms and psychosocial stress in UK Biobank and Generation Scotland.

    Arnau-Soler A, Macdonald-Dunlop E, Adams MJ, Clarke TK, MacIntyre DJ, Milburn K, Navrady L, Generation Scotland, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Hayward C, McIntosh AM and Thomson PA

    Medical Genetics Section, University of Edinburgh, Centre for Genomic and Experimental Medicine and MRC Institute of Genetics and Molecular Medicine, Edinburgh, UK.

    Stress is associated with poorer physical and mental health. To improve our understanding of this link, we performed genome-wide association studies (GWAS) of depressive symptoms and genome-wide by environment interaction studies (GWEIS) of depressive symptoms and stressful life events (SLE) in two UK population-based cohorts (Generation Scotland and UK Biobank). No SNP was individually significant in either GWAS, but gene-based tests identified six genes associated with depressive symptoms in UK Biobank (DCC, ACSS3, DRD2, STAG1, FOXP2 and KYNU; p < 2.77 × 10⁻⁶). Two SNPs with genome-wide significant GxE effects were identified by GWEIS in Generation Scotland: rs12789145 (53 kb downstream of PIWIL4; p = 4.95 × 10⁻⁹; total SLE) and rs17070072 (intronic to ZCCHC2; p = 1.46 × 10⁻⁸; dependent SLE). A third locus upstream of CYLC2 (rs12000047 and rs12005200, p < 2.00 × 10⁻⁸; dependent SLE) was identified when the joint effect of the SNP main and GxE effects was considered. GWEIS gene-based tests identified: MTNR1B with GxE effect with dependent SLE in Generation Scotland; and PHF2 with the joint effect in UK Biobank (p < 2.77 × 10⁻⁶). Polygenic risk score (PRS) analyses incorporating GxE effects improved the prediction of depressive symptom scores, when using weights derived from either the UK Biobank GWAS of depressive symptoms (p = 0.01) or the PGC GWAS of major depressive disorder (p = 5.91 × 10⁻³). Using an independent sample, PRS derived using GWEIS GxE effects provided evidence of shared aetiologies between depressive symptoms and schizotypal personality, heart disease and COPD. Further such studies are required and may result in improved treatments for depression and other stress-related conditions.

    Funded by: Chief Scientist Office: CZD/16/6; Medical Research Council: MC_QA137853, MR/K026992/1; NIMH NIH HHS: U01 MH109528; Wellcome Trust: 104036/Z/14/Z

    Translational psychiatry 2019;9;1;14

  • Generation of gene-corrected human induced pluripotent stem cell lines derived from retinitis pigmentosa patient with Ser331Cysfs*5 mutation in MERTK.

    Artero Castro A, Long K, Bassett A, Machuca C, León M, Ávila-Fernandez A, Cortón M, Vidal-Puig T, Ayuso C, Lukovic D and Erceg S

    Stem Cells Therapies in Neurodegenerative Diseases Lab, Centro de Investigacion Principe Felipe (CIPF), Valencia, Spain.

    The human induced pluripotent stem cell (hiPSC) line RP1-FiPS4F1, generated from a patient with autosomal recessive retinitis pigmentosa (arRP) caused by a homozygous Ser331Cysfs*5 mutation in Mer tyrosine kinase receptor (MERTK), was genetically corrected using the CRISPR/Cas9 system. Two isogenic hiPSC lines, with heterozygous and homozygous correction of the c.992_993delCA mutation in the MERTK gene, were generated. These cell lines demonstrate normal karyotype, maintain a pluripotent state, and can differentiate toward three germ layers in vitro. These genetically corrected hiPSCs represent accurate controls to study the contribution of the specific genetic change to the disease, and potentially therapeutic material for cell-replacement therapy.

    Funded by: Medical Research Council: MC_UU_12012/2

    Stem cell research 2019;34;101341

  • Three phylogenetic groups have driven the recent population expansion of Cryptococcus neoformans.

    Ashton PM, Thanh LT, Trieu PH, Van Anh D, Trinh NM, Beardsley J, Kibengo F, Chierakul W, Dance DAB, Rattanavong S, Davong V, Hung LQ, Chau NVV, Tung NLN, Chan AK, Thwaites GE, Lalloo DG, Anscombe C, Nhat LTH, Perfect J, Dougan G, Baker S, Harris S and Day JN

    Wellcome Trust Asia Programme, Oxford University Clinical Research Unit, 764 Vo Van Kiet, Ho Chi Minh City, Vietnam.

    Cryptococcus neoformans (C. neoformans var. grubii) is an environmentally acquired pathogen causing 181,000 HIV-associated deaths each year. We sequenced 699 isolates, primarily C. neoformans from HIV-infected patients, from 5 countries in Asia and Africa. The phylogeny of C. neoformans reveals a recent exponential population expansion, consistent with the increase in the number of susceptible hosts. In our study population, this expansion has been driven by three sub-clades of the C. neoformans VNIa lineage; VNIa-4, VNIa-5 and VNIa-93. These three sub-clades account for 91% of clinical isolates sequenced in our study. Combining the genome data with clinical information, we find that the VNIa-93 sub-clade, the most common sub-clade in Uganda and Malawi, was associated with better outcomes than VNIa-4 and VNIa-5, which predominate in Southeast Asia. This study lays the foundation for further work investigating the dominance of VNIa-4, VNIa-5 and VNIa-93 and the association between lineage and clinical phenotype.

    Funded by: Medical Research Council: G1100684, MR/L015080/1

    Nature communications 2019;10;1;2035

  • Genomic analysis of Plasmodium vivax in southern Ethiopia reveals selective pressures in multiple parasite mechanisms.

    Auburn S, Getachew S, Pearson RD, Amato R, Miotto O, Trimarsanto H, Zhu SJ, Rumaseb A, Marfurt J, Noviyanti R, Grigg MJ, Barber B, William T, Goncalves SM, Drury E, Sriprawat K, Anstey NM, Nosten F, Petros B, Aseffa A, McVean G, Kwiatkowski DP and Price RN

    Global and Tropical Health Division, Menzies School of Health Research and Charles Darwin University, Darwin, Northern Territories, Australia.

    The Horn of Africa harbours the largest reservoir of Plasmodium vivax in the continent. Most of sub-Saharan Africa has remained relatively vivax-free due to a high prevalence of the human Duffy-negative trait, but the emergence of strains able to invade Duffy-negative reticulocytes poses a major public health threat. We undertook the first population genomic investigation of P. vivax from the region, comparing the genomes of 24 Ethiopian isolates against data from Southeast Asia to identify important local adaptations. The prevalence of the Duffy binding protein amplification in Ethiopia was 79%, potentially reflecting adaptation to Duffy-negativity. There was also evidence of selection in a region upstream of the chloroquine resistance transporter, a putative chloroquine-resistance determinant. Strong signals of selection were observed in genes involved in immune evasion and regulation of gene expression, highlighting the need for a multifaceted intervention approach to combat P. vivax in the region.

    The Journal of infectious diseases 2019

  • Identification of a regeneration-organizing cell in the Xenopus tail.

    Aztekin C, Hiscock TW, Marioni JC, Gurdon JB, Simons BD and Jullien J

    Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK.

    Unlike mammals, Xenopus laevis tadpoles have a high regenerative potential. To characterize this regenerative response, we performed single-cell RNA sequencing after tail amputation. By comparing naturally occurring regeneration-competent and -incompetent tadpoles, we identified a previously unrecognized cell type, which we term the regeneration-organizing cell (ROC). ROCs are present in the epidermis during normal tail development and specifically relocalize to the amputation plane of regeneration-competent tadpoles, forming the wound epidermis. Genetic ablation or manual removal of ROCs blocks regeneration, whereas transplantation of ROC-containing grafts induces ectopic outgrowths in early embryos. Transcriptional profiling revealed that ROCs secrete ligands associated with key regenerative pathways, signaling to progenitors to reconstitute lost tissue. These findings reveal the cellular mechanism through which ROCs form the wound epidermis and ensure successful regeneration.

    Funded by: Cancer Research UK: A14492; Medical Research Council: MC_PC_12009; Wellcome Trust: 092096, 098357, 101050, 105031

    Science (New York, N.Y.) 2019;364;6441;653-658

  • Whole genome sequence of Vibrio cholerae directly from dried spotted filter paper.

    Bénard AHM, Guenou E, Fookes M, Ateudjieu J, Kasambara W, Siever M, Rebaudet S, Boncy J, Adrien P, Piarroux R, Sack DA, Thomson N and Debes AK

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, United Kingdom.

    Background: Global estimates for cholera annually approximate 4 million cases worldwide with 95,000 deaths. Recent outbreaks, including Haiti and Yemen, are reminders that cholera is still a global health concern. Cholera outbreaks can rapidly induce high death tolls by overwhelming the capacity of health facilities, especially in remote areas or areas of civil unrest. Recent studies demonstrated that stool specimens preserved on filter paper facilitate molecular analysis of Vibrio cholerae in resource limited settings. Specimens preserved in a rapid, low-cost, safe and sustainable manner for sequencing provide previously unavailable data about circulating cholera strains. This may ultimately contribute new information to shape public policy response on cholera control and elimination.

    Methodology/principal findings: Whole genome sequencing (WGS) recovered close to a complete sequence of the V. cholerae O1 genome with satisfactory genome coverage from stool specimens enriched in alkaline peptone water (APW) and V. cholerae culture isolates, both spotted on filter paper. The minimum concentration of V. cholerae DNA sufficient to produce quality genomic information was 0.02 ng/μL. The genomic data confirmed the presence or absence of genes of epidemiological interest, including cholera toxin and pilus loci. WGS identified a variety of diarrheal pathogens from APW-enriched specimen spotted filter paper, highlighting the potential for this technique to explore the gut microbiome, potentially identifying co-infections, which may impact the severity of disease. WGS demonstrated that these specimens fit within the current global cholera phylogenetic tree, identifying the strains as the 7th pandemic El Tor.

    Conclusions: WGS results allowed for mapping of short reads from APW-enriched specimen and culture isolate spotted filter papers. This provided valuable molecular epidemiological sequence information on V. cholerae strains from remote, low-resource settings. These results identified the presence of co-infecting pathogens while providing rare insight into the specific V. cholerae strains causing outbreaks in cholera-endemic areas.

    PLoS neglected tropical diseases 2019;13;5;e0007330

  • Activation of Skeletal Stem and Progenitor Cells for Bone Regeneration Is Driven by PDGFRβ Signaling.

    Böhm AM, Dirckx N, Tower RJ, Peredo N, Vanuytven S, Theunis K, Nefyodova E, Cardoen R, Lindner V, Voet T, Van Hul M and Maes C

    Laboratory of Skeletal Cell Biology and Physiology (SCEBP), Skeletal Biology and Engineering Research Center (SBE), Department of Development and Regeneration, KU Leuven, 3000 Leuven, Belgium.

    Bone repair and regeneration critically depend on the activation and recruitment of osteogenesis-competent skeletal stem and progenitor cells (SSPCs). Yet, the origin and triggering cues for SSPC propagation and migration remain largely elusive. Through bulk and single-cell transcriptome profiling of fetal osterix (Osx)-expressing cells, followed by lineage mapping, cell tracing, and conditional mouse mutagenesis, we here identified PDGF-PDGFRβ signaling as a critical functional mediator of SSPC expansion, migration, and angiotropism during bone repair. Our data show that cells marked by a history of Osx expression, including those arising in fetal or early postnatal periods, represent or include SSPCs capable of delivering all the necessary differentiated progeny to repair acute skeletal injuries later in life, provided that they express functional PDGFRβ. Mechanistically, MMP-9 and VCAM-1 appear to be involved downstream of PDGF-PDGFRβ. Our results reveal considerable cellular dynamism in the skeletal system and show that activation and recruitment of SSPCs for bone repair require functional PDGFRβ signaling.

    Developmental cell 2019;51;2;236-254.e12

  • Progression of the canonical reference malaria parasite genome from 2002-2019.

    Böhme U, Otto TD, Sanders M, Newbold CI and Berriman M

    Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome have changed since its publication in 2002. As the malaria species responsible for the most deaths worldwide, the richness of annotation and accuracy of the sequence are important resources for the P. falciparum research community as well as the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002 over 60% of predicted genes had unknown functions. As of March 2019, this number has been significantly decreased to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been significantly refined; 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, addition or deletion of exons and the addition or deletion of genes. The sequence has also undergone significant improvements. In addition to the correction of a large number of single-base and insertion or deletion errors, a major misassembly between the subtelomeres of chromosomes 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpreting intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called Pfref1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will continue to be maintained, with ongoing curation ensuring continual improvements in annotation quality.

    Wellcome open research 2019;4;58

  • Truncation of GdpP mediates β-lactam resistance in clinical isolates of Staphylococcus aureus.

    Ba X, Kalmar L, Hadjirin NF, Kerschner H, Apfalter P, Morgan FJ, Paterson GK, Girvan SL, Zhou R, Harrison EM and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Objectives: High-level β-lactam resistance in MRSA is mediated in the majority of strains by a mecA or mecC gene. In this study, we identified 10 mec gene-negative MRSA human isolates from Austria and 11 bovine isolates from the UK showing high levels of β-lactam resistance and sought to understand the molecular basis of the resistance observed.

    Methods: Different antimicrobial resistance testing methods (disc diffusion, Etest and VITEK® 2) were used to establish the β-lactam resistance profiles for the isolates and the isolates were further investigated by WGS.

    Results: A number of mutations (including novel ones) in PBPs, AcrB, YjbH and the pbp4 promoter were identified in the resistant isolates, but not in closely related susceptible isolates. Importantly, a truncation in the cyclic diadenosine monophosphate phosphodiesterase enzyme, GdpP, was identified in 7 of the 10 Austrian isolates and 10 of the 11 UK isolates. Complementation of four representative isolates with an intact copy of the gdpP gene restored susceptibility to penicillins and abolished the growth defects caused by the truncation.

    Conclusions: This study reports naturally occurring inactivation of GdpP protein in Staphylococcus aureus of both human origin and animal origin, and demonstrates clinical relevance to a previously reported association between this truncation and increased β-lactam resistance and impaired bacterial growth in laboratory-generated mutants. It also highlights possible limitations of genomic determination of antibiotic susceptibility based on single gene presence or absence when choosing the appropriate antimicrobial treatment for patients.

    Funded by: Medical Research Council: G1001787, MR/N002660/1, MR/P007201/1, MR/S00291X/1

    The Journal of antimicrobial chemotherapy 2019;74;5;1182-1191

  • Somatic evolution and global expansion of an ancient transmissible cancer lineage.

    Baez-Ortega A, Gori K, Strakova A, Allen JL, Allum KM, Bansse-Issa L, Bhutia TN, Bisson JL, Briceño C, Castillo Domracheva A, Corrigan AM, Cran HR, Crawford JT, Davis E, de Castro KF, B de Nardi A, de Vos AP, Delgadillo Keenan L, Donelan EM, Espinoza Huerta AR, Faramade IA, Fazil M, Fotopoulou E, Fruean SN, Gallardo-Arrieta F, Glebova O, Gouletsou PG, Häfelin Manrique RF, Henriques JJGP, Horta RS, Ignatenko N, Kane Y, King C, Koenig D, Krupa A, Kruzeniski SJ, Kwon YM, Lanza-Perea M, Lazyan M, Lopez Quintana AM, Losfelt T, Marino G, Martínez Castañeda S, Martínez-López MF, Meyer M, Migneco EJ, Nakanwagi B, Neal KB, Neunzig W, Ní Leathlobhair M, Nixon SJ, Ortega-Pacheco A, Pedraza-Ordoñez F, Peleteiro MC, Polak K, Pye RJ, Reece JF, Rojas Gutierrez J, Sadia H, Schmeling SK, Shamanova O, Sherlock AG, Stammnitz M, Steenland-Smit AE, Svitich A, Tapia Martínez LJ, Thoya Ngoka I, Torres CG, Tudor EM, van der Wel MG, Viţălaru BA, Vural SA, Walkinton O, Wang J, Wehrle-Martinez AS, Widdowson SAE, Stratton MR, Alexandrov LB, Martincorena I and Murchison EP

    Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    The canine transmissible venereal tumor (CTVT) is a cancer lineage that arose several millennia ago and survives by "metastasizing" between hosts through cell transfer. The somatic mutations in this cancer record its phylogeography and evolutionary history. We constructed a time-resolved phylogeny from 546 CTVT exomes and describe the lineage's worldwide expansion. Examining variation in mutational exposure, we identify a highly context-specific mutational process that operated early in the cancer's evolution but subsequently vanished, correlate ultraviolet-light mutagenesis with tumor latitude, and describe tumors with heritable hyperactivity of an endogenous mutational process. CTVT displays little evidence of ongoing positive selection, and negative selection is detectable only in essential genes. We illustrate how long-lived clonal organisms capture changing mutagenic environments, and reveal that neutral genetic drift is the dominant feature of long-term cancer evolution.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 2019;365;6452

  • Single-Cell RNA Sequencing with Drop-Seq.

    Bageritz J and Raddi G

    Division Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany.

    Drop-Seq is a low-cost, high-throughput platform to profile thousands of cells by encapsulating them into individual droplets. Uniquely barcoded mRNA capture microparticles and cells are coconfined through a microfluidic device within the droplets where they undergo cell lysis and RNA hybridization. After breaking the droplets and pooling the hybridized particles, reverse transcription, PCR, and sequencing in single reactions allow data to be generated from thousands of single-cell transcriptomes while maintaining information on the cellular origin of each transcript.

    Funded by: NIAID NIH HHS: Z01 AI000947; NIGMS NIH HHS: T32 GM008042

    Methods in molecular biology (Clifton, N.J.) 2019;1979;73-85

  • MAN1B-CDG: Novel variants with a distinct phenotype and review of literature.

    Balasubramanian M, Johnson DS and DDD Study

    Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, UK; Highly Specialised Service for Severe, Complex and Atypical OI Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK; Academic Unit of Child Health, University of Sheffield, UK.

    Background: Congenital disorders of glycosylation (CDG) are a group of rare metabolic diseases due to impaired lipid and protein glycosylation. It comprises a characteristic high frequency of intellectual disability (ID) and a wide range of clinical phenotypes.

    Objective: To identify the underlying diagnosis in two families each with two siblings with variable level of ID through trio whole exome sequencing.

    Methods: Both the families were recruited to the Deciphering Developmental Disorders (DDD) study to identify the aetiology for their ID. Further work-up included isoelectric focusing (IEF) of serum transferrin done to add evidence to the molecular diagnosis.

    Results: These patients were found to have three novel variants in MAN1B1 inherited from their healthy parents. Serum transferrin IEF showed a type 2 pattern.

    Discussion: MAN1B1 variants were initially described in association with non-syndromic ID; subsequent literature suggested that variants in MAN1B1 resulted in a CDG-type II syndrome. However, there remains a paucity of literature on detailed clinical phenotyping and it still remains a rare form of CDG. The present patients showed the phenotype previously reported in MAN1B1-CDG: a characteristic facial dysmorphism, hypotonia, truncal obesity and in some, behavioural problems.

    Conclusions: In unexplained ID, serum transferrin should be included in the first-line screening. With advances in genomic medicine, it is important to diagnose CDG as this has implications for management and recurrence risk counselling.

    European journal of medical genetics 2019;62;2;109-114

  • FBXO7 sensitivity of phenotypic traits elucidated by a hypomorphic allele.

    Ballesteros Reviriego C, Clare S, Arends MJ, Cambridge EL, Swiatkowska A, Caetano S, Abu-Helil B, Kane L, Harcourt K, Goulding DA, Gleeson D, Ryder E, Doe B, White JK, van der Weyden L, Dougan G, Adams DJ and Speak AO

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    FBXO7 encodes an F box containing protein that interacts with multiple partners to facilitate numerous cellular processes and has a canonical role as part of an SCF E3 ubiquitin ligase complex. Mutation of FBXO7 is responsible for an early onset Parkinsonian pyramidal syndrome and genome-wide association studies have linked variants in FBXO7 to erythroid traits. A putative orthologue in Drosophila, nutcracker, has been shown to regulate the proteasome, and deficiency of nutcracker results in male infertility. Therefore, we reasoned that modulating Fbxo7 levels in a murine model could provide insights into the role of this protein in mammals. We used a targeted gene trap model which retained 4-16% residual gene expression and assessed the sensitivity of phenotypic traits to gene dosage. Fbxo7 hypomorphs showed regenerative anaemia associated with a shorter erythrocyte half-life, and male mice were infertile. Alterations to T cell phenotypes were also observed, which intriguingly were both T cell intrinsic and extrinsic. Hypomorphic mice were also sensitive to infection with Salmonella, succumbing to a normally sublethal challenge. Despite these phenotypes, Fbxo7 hypomorphs were produced at a normal Mendelian ratio with a normal lifespan and no evidence of neurological symptoms. These data suggest that erythrocyte survival, T cell development and spermatogenesis are particularly sensitive to Fbxo7 gene dosage.

    PloS one 2019;14;3;e0212481

  • ATM orchestrates the DNA-damage response to counter toxic non-homologous end-joining at broken replication forks.

    Balmus G, Pilger D, Coates J, Demir M, Sczaniecka-Clift M, Barros AC, Woods M, Fu B, Yang F, Chen E, Ostermaier M, Stankovic T, Ponstingl H, Herzog M, Yusa K, Martinez FM, Durant ST, Galanty Y, Beli P, Adams DJ, Bradley A, Metzakopian E, Forment JV and Jackson SP

    The Wellcome Trust and Cancer Research UK Gurdon Institute and Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

    Mutations in the ATM tumor suppressor gene confer hypersensitivity to DNA-damaging chemotherapeutic agents. To explore genetic resistance mechanisms, we performed genome-wide CRISPR-Cas9 screens in cells treated with the DNA topoisomerase I inhibitor topotecan. Thus, we here establish that inactivating terminal components of the non-homologous end-joining (NHEJ) machinery or of the BRCA1-A complex specifically confer topotecan resistance to ATM-deficient cells. We show that hypersensitivity of ATM-mutant cells to topotecan or the poly-(ADP-ribose) polymerase (PARP) inhibitor olaparib reflects delayed engagement of homologous recombination at DNA-replication-fork associated single-ended double-strand breaks (DSBs), allowing some to be subject to toxic NHEJ. Preventing DSB ligation by NHEJ, or enhancing homologous recombination by BRCA1-A complex disruption, suppresses this toxicity, highlighting a crucial role for ATM in preventing toxic LIG4-mediated chromosome fusions. Notably, suppressor mutations in ATM-mutant backgrounds are different to those in BRCA1-mutant scenarios, suggesting new opportunities for patient stratification and additional therapeutic vulnerabilities for clinical exploitation.

    Funded by: Wellcome Trust

    Nature communications 2019;10;1;87

  • The Genetic Basis of Metabolic Disease.

    Barroso I and McCarthy MI

    Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Recent developments in genetics and genomics are providing a detailed and systematic characterization of the genetic underpinnings of common metabolic diseases and traits, highlighting the inherent complexity within systems for homeostatic control and the many ways in which that control can fail. The genetic architecture underlying these common metabolic phenotypes is complex, with each trait influenced by hundreds of loci spanning a range of allele frequencies and effect sizes. Here, we review the growing appreciation of this complexity and how this has fostered the implementation of genome-scale approaches that deliver robust mechanistic inference and unveil new strategies for translational exploitation.

    Funded by: NIDDK NIH HHS: U01 DK105535

    Cell 2019;177;1;146-161

  • The Life Cycle of the Acropora Coral-Eating Flatworm (AEFW), Prosthiostomum acroporae; The Influence of Temperature and Management Guidelines.

    Barton JA, Hutson KS, Bourne DG, Humphrey C, Dybala C and Rawlinson KA

    As coral aquaculture is increasing around the world for reef restoration and trade, mitigating the impact of coral predators, pathogens and parasites is necessary for optimal growth. The Acropora coral-eating flatworm (AEFW), Prosthiostomum acroporae (Platyhelminthes: Polycladida: Prosthiostomidae) feeds on wild and cultivated Acropora species and its inadvertent introduction into reef tanks can lead to the rapid death of coral colonies. To guide the treatment of infested corals we investigated the flatworm's life cycle parameters at a range of temperatures that represent those found in reef tanks, coral aquaculture facilities and seasonal fluctuations in the wild. We utilized P. acroporae from a long-term in vivo culture on Acropora species to examine the effects of temperature (3 degrees C increments from 21 to 30 degrees C) on flatworm embryonation period, hatching success, hatchling longevity, and time to sexual maturity. Our findings show that warmer seawater shortened generation times; at 27 degrees C it took, on average, 11 days for eggs to hatch, and 35 days for flatworms to reach sexual maturity, giving a minimum generation time of 38 days, whereas at 24 degrees C the generation time was 64 days. Warmer seawater (24-30 degrees C) also increased egg hatching success compared to cooler conditions (21 degrees C). These results indicate that warmer temperatures lead to higher population densities of P. acroporae. Temperature significantly increased the growth rate of P. acroporae, with individuals reaching a larger size at sexual maturity in warmer temperatures, but it did not influence hatchling longevity. Hatchlings, which can swim as well as crawl, can survive between 0.25 and 9 days in the absence of Acropora, and could therefore disperse between coral colonies and inter-connected aquaria. We used our data to predict embryonation duration and time to sexual maturity at 21-30 degrees C, and discuss how to optimize current treatments to disrupt the flatworm's life cycle in captivity.

    Frontiers in Marine Science 2019;6;524

  • Contrasting requirements during disease evolution identify EZH2 as a therapeutic target in AML.

    Basheer F, Giotopoulos G, Meduri E, Yun H, Mazan M, Sasca D, Gallipoli P, Marando L, Gozdecka M, Asby R, Sheppard O, Dudek M, Bullinger L, Döhner H, Dillon R, Freeman S, Ottmann O, Burnett A, Russell N, Papaemmanuil E, Hills R, Campbell P, Vassiliou GS and Huntly BJP

    Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK.

    Epigenetic regulators, such as EZH2, are frequently mutated in cancer, and loss-of-function EZH2 mutations are common in myeloid malignancies. We have examined the importance of cellular context for Ezh2 loss during the evolution of acute myeloid leukemia (AML), where we observed stage-specific and diametrically opposite functions for Ezh2 at the early and late stages of disease. During disease maintenance, WT Ezh2 exerts an oncogenic function that may be therapeutically targeted. In contrast, Ezh2 acts as a tumor suppressor during AML induction. Transcriptional analysis explains this apparent paradox, demonstrating that loss of Ezh2 derepresses different expression programs during disease induction and maintenance. During disease induction, Ezh2 loss derepresses a subset of bivalent promoters that resolve toward gene activation, inducing a feto-oncogenic program that includes genes such as Plag1, whose overexpression phenocopies Ezh2 loss to accelerate AML induction in mouse models. Our data highlight the importance of cellular context and disease phase for the function of Ezh2 and its potential therapeutic implications.

    Funded by: Wellcome Trust

    The Journal of experimental medicine 2019;216;4;966-981

  • Genome-scale drop-out screens to identify cancer cell vulnerabilities in AML.

    Basheer FT and Vassiliou GS

    Wellcome-MRC Cambridge Stem Cell Institute, Department of Haematology, University of Cambridge, Hills Road, Cambridge, CB2 0AW, United Kingdom; Haematological Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Acute myeloid leukemia (AML) is an aggressive cancer that remains lethal to the majority of sufferers. Whilst the mainstay treatments for this condition have remained largely unchanged over the past five decades, progress in deciphering its pathogenesis has accelerated in recent years, propelled in part by advances in cancer genomics and mechanistic studies of leukemogenic mutations. Newer molecular therapies targeting aberrant biological pathways are currently under investigation with a few moving closer to clinical use. However, collectively, these new therapies are not predicted to have a major impact on clinical outcomes and the need for the identification of further therapeutic targets in AML remains critical. Recently the use of CRISPR-Cas9 systems for genome editing and their potential application in genome-wide screening has opened a new frontier for unbiased discovery of therapeutic vulnerabilities in cancer and AML was the first disease in which this technology was systematically applied. In this review we give an overview of recent advances in identifying novel therapeutic vulnerabilities of AML using CRISPR-Cas9 and discuss possible future applications of CRISPR technologies in this field.

    Funded by: Cancer Research UK: C22324/A23015; Department of Health; Medical Research Council: MC_PC_12009; Wellcome Trust

    Current opinion in genetics & development 2019;54;83-87

  • Comparative analysis of the chicken IFITM locus by targeted genome sequencing reveals evolution of the locus and positive selection in IFITM1 and IFITM3.

    Bassano I, Ong SH, Sanz-Hernandez M, Vinkler M, Kebede A, Hanotte O, Onuigbo E, Fife M and Kellam P

    Department of Medicine, Division of Infectious Diseases, Wright Fleming Wing, St Mary's Campus, Imperial College London, Norfolk Place, London, W2 1PG, UK.

    Background: The interferon-induced transmembrane (IFITM) protein family comprises a class of restriction factors widely characterised in humans for their potent antiviral activity. Their biological activity is well documented in several animal species, but their genetic variation and biological mechanism are less well understood, particularly in avian species.

    Results: Here we report the complete sequence of the domestic chicken Gallus gallus IFITM locus from a wide variety of chicken breeds to examine the detailed pattern of genetic variation of the locus on chromosome 5, including the flanking genes ATHL1 and B4GALNT4. We have generated chIFITM sequences from commercial breeds (supermarket-derived chicken breasts), indigenous chickens from Nigeria (Nsukka) and Ethiopia, European breeds and inbred chicken lines from the Pirbright Institute, totalling 206 chickens. Through mapping of genetic variants to the latest chIFITM consensus sequence our data reveal that the chIFITM locus does not show structural variation across the populations analysed, despite spanning diverse breeds from different geographic locations. However, single nucleotide variants (SNVs) in functionally important regions of the proteins within certain groups of chickens were detected, in particular the European breeds and indigenous birds from Ethiopia and Nigeria. In addition, we also found that two out of four SNVs located in the chIFITM1 (Ser36 and Arg77) and chIFITM3 (Val103) proteins were simultaneously under positive selection.

    Conclusions: Together these data suggest that IFITM genetic variation may contribute to the capacities of different chicken populations to resist virus infection.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/L003996/1, BBS/OS/GC/000015/2; Inter-COST: LTC18060

    BMC genomics 2019;20;1;272

  • PIGT-CDG, a disorder of the glycosylphosphatidylinositol anchor: description of 13 novel patients and expansion of the clinical characteristics.

    Bayat A, Knaus A, Juul AW, Dukic D, Gardella E, Charzewska A, Clement E, Hjalgrim H, Hoffman-Zacharska D, Horn D, Horton R, Hurst JA, Josifova D, Larsen LHG, Lascelles K, Obersztyn E, Pagnamenta A, Pal DK, Pendziwiat M, Ryten M, Taylor J, Vogt J, Weber Y, Krawitz PM, Helbig I, Kini U, Møller RS and DDD Study Group

    Department of Pediatrics, University Hospital of Hvidovre, Hvidovre, Denmark.

    Purpose: To provide a detailed electroclinical description and expand the phenotype of PIGT-CDG, to perform genotype-phenotype correlation, and to investigate the onset and severity of the epilepsy associated with the different genetic subtypes of this rare disorder. Furthermore, to use computer-assisted facial gestalt analysis in PIGT-CDG and to compare the findings with other glycosylphosphatidylinositol (GPI) anchor deficiencies.

    Methods: We evaluated 13 children from eight unrelated families with homozygous or compound heterozygous pathogenic variants in PIGT.

    Results: All patients had hypotonia, severe developmental delay, and epilepsy. Epilepsy onset ranged from first day of life to two years of age. Severity of the seizure disorder varied from treatable seizures to severe neonatal onset epileptic encephalopathies. The facial gestalt of patients resembled that of previously published PIGT patients as they were closest to the center of the PIGT cluster in the clinical face phenotype space and were distinguishable from other gene-specific phenotypes.

    Conclusion: We expand our knowledge of PIGT. Our cases reaffirm that the use of genetic testing is essential for diagnosis in this group of disorders. Finally, we show that computer-assisted facial gestalt analysis accurately assigned PIGT cases to the multiple congenital anomalies-hypotonia-seizures syndrome phenotypic series advocating the additional use of next-generation phenotyping technology.

    Funded by: Department of Health; Medical Research Council: MR/N008324/1; Wellcome Trust: WT098051

    Genetics in medicine : official journal of the American College of Medical Genetics 2019;21;10;2216-2223

  • Paternal exposure to benzo(a)pyrene induces genome-wide mutations in mouse offspring.

    Beal MA, Meier MJ, Williams A, Rowan-Carroll A, Gagné R, Lindsay SJ, Fitzgerald T, Hurles ME, Marchetti F and Yauk CL

    Carleton University, Ottawa, Ontario K1S 5B6 Canada.

    Understanding the effects of environmental exposures on germline mutation rates has been a decades-long pursuit in genetics. We used next-generation sequencing and comparative genomic hybridization arrays to investigate genome-wide mutations in the offspring of male mice exposed to benzo(a)pyrene (BaP), a common environmental pollutant. We demonstrate that offspring developing from sperm exposed during the mitotic or post-mitotic phases of spermatogenesis have significantly more de novo single nucleotide variants (1.8-fold; P < 0.01) than controls. Both phases of spermatogenesis are susceptible to the induction of heritable mutations, although mutations arising from post-fertilization events are more common after post-mitotic exposure. In addition, the mutation spectra in sperm and offspring of BaP-exposed males are consistent. Finally, we report a significant increase in transmitted copy number duplications (P = 0.001) in BaP-exposed sires. Our study demonstrates that germ cell mutagen exposures induce genome-wide mutations in the offspring that may be associated with adverse health outcomes.

    Communications biology 2019;2;228

  • Genomic epidemiology of syphilis reveals independent emergence of macrolide resistance across multiple circulating lineages.

    Beale MA, Marks M, Sahi SK, Tantalo LC, Nori AV, French P, Lukehart SA, Marra CM and Thomson NR

    Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.

    Syphilis is a sexually transmitted infection caused by Treponema pallidum subspecies pallidum and may lead to severe complications. Recent years have seen striking increases in syphilis in many countries. Previous analyses have suggested one lineage of syphilis, SS14, may have expanded recently, indicating emergence of a single pandemic azithromycin-resistant cluster. Here we use direct sequencing of T. pallidum combined with phylogenomic analyses to show that both SS14- and Nichols-lineages are simultaneously circulating in clinically relevant populations in multiple countries. We correlate the appearance of genotypic macrolide resistance with multiple independently evolved SS14 sub-lineages and show that genotypically resistant and sensitive sub-lineages are spreading contemporaneously. These findings inform our understanding of the current syphilis epidemic by demonstrating how macrolide resistance evolves in Treponema subspecies and provide a warning on broader issues of antimicrobial resistance.

    Funded by: NIAID NIH HHS: P01 AI034616, R01 AI042143; NINDS NIH HHS: R01 NS034235; Wellcome Trust

    Nature communications 2019;10;1;3255

  • Slc20a2, Encoding the Phosphate Transporter PiT2, Is an Important Genetic Determinant of Bone Quality and Strength.

    Beck-Cormier S, Lelliott CJ, Logan JG, Lafont DT, Merametdjian L, Leitch VD, Butterfield NC, Protheroe HJ, Croucher PI, Baldock PA, Gaultier-Lintia A, Maugars Y, Nicolas G, Banse C, Normant S, Magne N, Gérardin E, Bon N, Sourice S, Guicheux J, Beck L, Williams GR and Bassett JHD

    INSERM, UMR 1229, Regenerative Medicine and Skeleton (RMeS), Université de Nantes, École Nationale Vétérinaire, Agroalimentaire et de l'Alimentation, Nantes-Atlantique (ONIRIS), Nantes, France.

    Osteoporosis is characterized by low bone mineral density (BMD) and fragility fracture and affects over 200 million people worldwide. Bone quality describes the material properties that contribute to strength independently of BMD, and its quantitative analysis is a major priority in osteoporosis research. Tissue mineralization is a fundamental process requiring calcium and phosphate transporters. Here we identify impaired bone quality and strength in Slc20a2-/- mice lacking the phosphate transporter SLC20A2. Juveniles had abnormal endochondral and intramembranous ossification, decreased mineral accrual, and short stature. Adults exhibited only small reductions in bone mass and mineralization but a profound impairment of bone strength. Bone quality was severely impaired in Slc20a2-/- mice: yield load (-2.3 SD), maximum load (-1.7 SD), and stiffness (-2.7 SD) were all below values predicted from their bone mineral content as determined in a cohort of 320 wild-type controls. These studies identify Slc20a2 as a physiological regulator of tissue mineralization and highlight its critical role in the determination of bone quality and strength.

    Funded by: EC FP7 Capacities Specific Program-funded EMMA service project; INSERM, Université de Nantes, Région Pays de Loire; SC3M facility (SFR François Bonamy, University of Nantes); Séverine Battaglia (INSERM U1238, Nantes); UTE-IRS-UN Animal Facility; Wellcome Trust Joint Investigator Award: 110141/Z/15/Z and 110140/Z/15/Z; Wellcome Trust Strategic Award: 101123/Z/13/A

    Journal of bone and mineral research : the official journal of the American Society for Bone and Mineral Research 2019;34;6;1101-1114

  • Repeated clinical malaria episodes are associated with modification of the immune system in children.

    Bediako Y, Adams R, Reid AJ, Valletta JJ, Ndungu FM, Sodenkamp J, Mwacharo J, Ngoi JM, Kimani D, Kai O, Wambua J, Nyangweso G, de Villiers EP, Sanders M, Lotkowska ME, Lin JW, Manni S, Addy JWG, Recker M, Newbold C, Berriman M, Bejon P, Marsh K and Langhorne J

    Francis Crick Institute, London, UK.

    Background: There are over 200 million reported cases of malaria each year, and most children living in endemic areas will experience multiple episodes of clinical disease before puberty. We set out to understand how frequent clinical malaria, which elicits a strong inflammatory response, affects the immune system and whether these modifications are observable in the absence of detectable parasitaemia.

    Methods: We used a multi-dimensional approach comprising whole blood transcriptomic, cellular and plasma cytokine analyses on a cohort of children living with endemic malaria, but uninfected at sampling, who had been under active surveillance for malaria for 8 years. Children were categorised into two groups depending on the cumulative number of episodes experienced: high (≥ 8) or low (< 5).

    Results: We observe that multiple episodes of malaria are associated with modification of the immune system. Children who had experienced a large number of episodes demonstrated upregulation of interferon-inducible genes, a clear increase in circulating levels of the immunoregulatory cytokine IL-10 and enhanced activation of neutrophils, B cells and CD8+ T cells.

    Conclusion: Transcriptomic analysis together with cytokine and immune cell profiling of peripheral blood can robustly detect immune differences between children with different numbers of prior malaria episodes. Multiple episodes of malaria are associated with modification of the immune system in children. Such immune modifications may have implications for the initiation of subsequent immune responses and the induction of vaccine-mediated protection.

    Funded by: Medical Research Council: MR/M003906/1, MR/P020321/1; Wellcome Trust: WT 206194

    BMC medicine 2019;17;1;60

  • Reasons to be testing: the dawn of complex molecular profiling in routine oncology practice.

    Beer PA, Cooke SL, Chang DK and Biankin AV

    Sanger Institute, Wellcome Trust Genome Campus, Cambridge; Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Bearsden, Glasgow.

    Funded by: Cancer Research UK: A23526, C29717/A18484, C596/A18076, C596/A20921; Medical Research Council: C29717/A17263; Wellcome Trust: 103721/Z/14/Z

    Annals of oncology : official journal of the European Society for Medical Oncology 2019;30;11;1691-1694

  • Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens.

    Behan FM, Iorio F, Picco G, Gonçalves E, Beaver CM, Migliardi G, Santos R, Rao Y, Sassi F, Pinnelli M, Ansari R, Harper S, Jackson DA, McRae R, Pooley R, Wilkinson P, van der Meer D, Dow D, Buser-Doepner C, Bertotti A, Trusolino L, Stronach EA, Saez-Rodriguez J, Yusa K and Garnett MJ

    Wellcome Sanger Institute, Cambridge, UK.

    Functional genomics approaches can overcome limitations (such as the lack of identification of robust targets and poor clinical efficacy) that hamper cancer drug development. Here we performed genome-scale CRISPR-Cas9 screens in 324 human cancer cell lines from 30 cancer types and developed a data-driven framework to prioritize candidates for cancer therapeutics. We integrated cell fitness effects with genomic biomarkers and target tractability for drug development to systematically prioritize new targets in defined tissues and genotypes. We verified one of our most promising dependencies, the Werner syndrome ATP-dependent helicase, as a synthetic lethal target in tumours from multiple cancer types with microsatellite instability. Our analysis provides a resource of cancer dependencies, generates a framework to prioritize cancer drug targets and suggests specific new targets. The principles described in this study can inform the initial stages of drug development by contributing to a new, diverse and more effective portfolio of cancer drug targets.

    Nature 2019;568;7753;511-516

  • RRAD mutation causes electrical and cytoskeletal defects in cardiomyocytes derived from a familial case of Brugada syndrome.

    Belbachir N, Portero V, Al Sayed ZR, Gourraud JB, Dilasser F, Jesel L, Guo H, Wu H, Gaborit N, Guilluy C, Girardeau A, Bonnaud S, Simonet F, Karakachoff M, Pattier S, Scott C, Burel S, Marionneau C, Chariau C, Gaignerie A, David L, Genin E, Deleuze JF, Dina C, Sauzeau V, Loirand G, Baró I, Schott JJ, Probst V, Wu JC, Redon R, Charpentier F and Le Scouarnec S

    l'institut du thorax, INSERM, CNRS, UNIV Nantes, 8 quai Moncousu, 44007 Nantes cedex 1, France.

    Aims: The Brugada syndrome (BrS) is an inherited cardiac disorder predisposing to ventricular arrhythmias. Despite considerable efforts, its genetic basis and cellular mechanisms remain largely unknown. The objective of this study was to identify a new susceptibility gene for BrS through familial investigation.

    Methods and results: Whole-exome sequencing performed in a three-generation pedigree with five affected members allowed the identification of one rare non-synonymous substitution (p.R211H) in RRAD, the gene encoding the RAD GTPase, carried by all affected members of the family. Three additional rare missense variants were found in 3/186 unrelated index cases. We detected higher levels of RRAD transcripts in subepicardium than in subendocardium in human heart, and in the right ventricle outflow tract compared to the other cardiac compartments in mice. The p.R211H variant was then subjected to electrophysiological and structural investigations in human cardiomyocytes derived from induced pluripotent stem cells (iPSC-CMs). Cardiomyocytes derived from induced pluripotent stem cells from two affected family members exhibited reduced action potential upstroke velocity, prolonged action potentials and increased incidence of early afterdepolarizations, with decreased Na+ peak current amplitude and increased Na+ persistent current amplitude, as well as abnormal distribution of actin and fewer focal adhesions, compared with intra-familial control iPSC-CMs. Insertion of the p.R211H-RRAD variant in control iPSCs by genome editing confirmed these results. In addition, iPSC-CMs from affected patients exhibited a decreased L-type Ca2+ current amplitude.

    Conclusion: This study identified a potential new BrS-susceptibility gene, RRAD. Cardiomyocytes derived from induced pluripotent stem cells expressing RRAD variant recapitulated single-cell electrophysiological features of BrS, including altered Na+ current, as well as cytoskeleton disturbances.

    European heart journal 2019

  • DNA methylation aging clocks: challenges and recommendations.

    Bell CG, Lowe R, Adams PD, Baccarelli AA, Beck S, Bell JT, Christensen BC, Gladyshev VN, Heijmans BT, Horvath S, Ideker T, Issa JJ, Kelsey KT, Marioni RE, Reik W, Relton CL, Schalkwyk LC, Teschendorff AE, Wagner W, Zhang K and Rakyan VK

    William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    Epigenetic clocks comprise a set of CpG sites whose DNA methylation levels measure subject age. These clocks are acknowledged as a highly accurate molecular correlate of chronological age in humans and other vertebrates. Also, extensive research is aimed at their potential to quantify biological aging rates and test longevity or rejuvenating interventions. Here, we discuss key challenges to understand clock mechanisms and biomarker utility. This requires dissecting the drivers and regulators of age-related changes in single-cell, tissue- and disease-specific models, as well as exploring other epigenomic marks, longitudinal and diverse population studies, and non-human models. We also highlight important ethical issues in forensic age determination and predicting the trajectory of biological aging in an individual.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K010867/1, BB/PO28187, BB/R00675X/1, BB/S020845/1; Bundesministerium für Bildung und Forschung: VIP + Epi-Blood-Count; Cancer Research UK: C18281/A19169; Department of Health: BRC369/CN/SB/101310; Deutsche Forschungsgemeinschaft: WA 1706/8-1; Deutsche Krebshilfe: TRACK-AML; Interdisciplinary Center for Clinical Research, RWTH Aachen University: O3-3; Medical Research Council: MC_UU_00011/5, MC_UU_12013/5, MR/K013807/1, MR/R005176/1; NIA NIH HHS: AG031862-12, U34AG051425-01; NIEHS NIH HHS: R01ES014811; NIH HHS: AG021518, AG047200, AG047745, CA080946, CA100632, CA207110, CA207360, CA214005, CA216265, CA221705, P30 ES009089, R01 ES025225, R01 ES027747, RO1AI121226, RO1MDO1430401, RO1MH113930; National Science Foundation of China: 31571359, 31771464; Royal Society Newton Advanced Fellowship: 164914

    Genome biology 2019;20;1;249

  • Karyotype, evolution and phylogenetic reconstruction in Micronycterinae bats with implications for the ancestral karyotype of Phyllostomidae.

    Benathar TCM, Nagamachi CY, Rodrigues LRR, O'Brien PCM, Ferguson-Smith MA, Yang F and Pieczarka JC

    PPGBionorte, Belém, State of Para, Brazil.

    Background: The Micronycterinae form a subfamily of leaf-nosed bats (Phyllostomidae) that contains the genera Lampronycteris Sanborn, 1949, and Micronycteris Gray, 1866 (stricto sensu), and is characterized by marked karyotypic variability and discrepancies in the phylogenetic relationships suggested by the molecular versus morphological data. In the present study, we investigated the chromosomal evolution of the Micronycterinae using classical cytogenetics and multidirectional chromosome painting with whole-chromosome probes of Phyllostomus hastatus and Carollia brevicauda. Our goal was to perform comparative chromosome mapping between the genera of this subfamily and explore the potential for using chromosomal rearrangements as phylogenetic markers.

    Results: The Micronycterinae exhibit great inter- and intraspecific karyotype diversity, with large blocks of telomere-like sequences inserted within or adjacent to constitutive heterochromatin regions. The phylogenetic results generated from our chromosomal data revealed that the Micronycterinae hold a basal position in the phylogenetic tree of the Phyllostomidae. Molecular cytogenetic data confirmed that there is a low degree of karyotype similarity between Lampronycteris and Micronycteris specimens analyzed, indicating an absence of synapomorphic associations in Micronycterinae.

    Conclusions: We herein confirm that karyotypic variability is present in subfamily Micronycterinae. We further report intraspecific variation and describe a new cytotype in M. megalotis. The cytogenetic data show that this group typically has large blocks of interstitial telomeric sequences that do not appear to be correlated with chromosomal rearrangement events. Phylogenetic analysis using chromosome data recovered the basal position for Micronycterinae, but did not demonstrate that it is a monophyletic lineage, due to the absence of common chromosomal synapomorphy between the genera. These findings may be related to an increase in the rate of chromosomal evolution during the time period that separates Lampronycteris from Micronycteris.

    BMC evolutionary biology 2019;19;1;98

  • Multi-ancestry genome-wide gene-smoking interaction study of 387,272 individuals identifies new loci associated with serum lipids.

    Bentley AR, Sung YJ, Brown MR, Winkler TW, Kraja AT, Ntalla I, Schwander K, Chasman DI, Lim E, Deng X, Guo X, Liu J, Lu Y, Cheng CY, Sim X, Vojinovic D, Huffman JE, Musani SK, Li C, Feitosa MF, Richard MA, Noordam R, Baker J, Chen G, Aschard H, Bartz TM, Ding J, Dorajoo R, Manning AK, Rankinen T, Smith AV, Tajuddin SM, Zhao W, Graff M, Alver M, Boissel M, Chai JF, Chen X, Divers J, Evangelou E, Gao C, Goel A, Hagemeijer Y, Harris SE, Hartwig FP, He M, Horimoto ARVR, Hsu FC, Hung YJ, Jackson AU, Kasturiratne A, Komulainen P, Kühnel B, Leander K, Lin KH, Luan J, Lyytikäinen LP, Matoba N, Nolte IM, Pietzner M, Prins B, Riaz M, Robino A, Said MA, Schupf N, Scott RA, Sofer T, Stancáková A, Takeuchi F, Tayo BO, van der Most PJ, Varga TV, Wang TD, Wang Y, Ware EB, Wen W, Xiang YB, Yanek LR, Zhang W, Zhao JH, Adeyemo A, Afaq S, Amin N, Amini M, Arking DE, Arzumanyan Z, Aung T, Ballantyne C, Barr RG, Bielak LF, Boerwinkle E, Bottinger EP, Broeckel U, Brown M, Cade BE, Campbell A, Canouil M, Charumathi S, Chen YI, Christensen K, COGENT-Kidney Consortium, Concas MP, Connell JM, de Las Fuentes L, de Silva HJ, de Vries PS, Doumatey A, Duan Q, Eaton CB, Eppinga RN, Faul JD, Floyd JS, Forouhi NG, Forrester T, Friedlander Y, Gandin I, Gao H, Ghanbari M, Gharib SA, Gigante B, Giulianini F, Grabe HJ, Gu CC, Harris TB, Heikkinen S, Heng CK, Hirata M, Hixson JE, Ikram MA, EPIC-InterAct Consortium, Jia Y, Joehanes R, Johnson C, Jonas JB, Justice AE, Katsuya T, Khor CC, Kilpeläinen TO, Koh WP, Kolcic I, Kooperberg C, Krieger JE, Kritchevsky SB, Kubo M, Kuusisto J, Lakka TA, Langefeld CD, Langenberg C, Launer LJ, Lehne B, Lewis CE, Li Y, Liang J, Lin S, Liu CT, Liu J, Liu K, Loh M, Lohman KK, Louie T, Luzzi A, Mägi R, Mahajan A, Manichaikul AW, McKenzie CA, Meitinger T, Metspalu A, Milaneschi Y, Milani L, Mohlke KL, Momozawa Y, Morris AP, Murray AD, Nalls MA, Nauck M, Nelson CP, North KE, O'Connell JR, Palmer ND, Papanicolau GJ, Pedersen NL, Peters A, Peyser PA, Polasek O, Poulter N, Raitakari OT, Reiner AP, Renström F, Rice TK, Rich SS, Robinson JG, Rose LM, Rosendaal FR, Rudan I, Schmidt CO, Schreiner PJ, Scott WR, Sever P, Shi Y, Sidney S, Sims M, Smith JA, Snieder H, Starr JM, Strauch K, Stringham HM, Tan NYQ, Tang H, Taylor KD, Teo YY, Tham YC, Tiemeier H, Turner ST, Uitterlinden AG, Understanding Society Scientific Group, van Heemst D, Waldenberger M, Wang H, Wang L, Wang L, Wei WB, Williams CA, Wilson G, Wojczynski MK, Yao J, Young K, Yu C, Yuan JM, Zhou J, Zonderman AB, Becker DM, Boehnke M, Bowden DW, Chambers JC, Cooper RS, de Faire U, Deary IJ, Elliott P, Esko T, Farrall M, Franks PW, Freedman BI, Froguel P, Gasparini P, Gieger C, Horta BL, Juang JJ, Kamatani Y, Kammerer CM, Kato N, Kooner JS, Laakso M, Laurie CC, Lee IT, Lehtimäki T, Lifelines Cohort, Magnusson PKE, Oldehinkel AJ, Penninx BWJH, Pereira AC, Rauramaa R, Redline S, Samani NJ, Scott J, Shu XO, van der Harst P, Wagenknecht LE, Wang JS, Wang YX, Wareham NJ, Watkins H, Weir DR, Wickremasinghe AR, Wu T, Zeggini E, Zheng W, Bouchard C, Evans MK, Gudnason V, Kardia SLR, Liu Y, Psaty BM, Ridker PM, van Dam RM, Mook-Kanamori DO, Fornage M, Province MA, Kelly TN, Fox ER, Hayward C, van Duijn CM, Tai ES, Wong TY, Loos RJF, Franceschini N, Rotter JI, Zhu X, Bierut LJ, Gauderman WJ, Rice K, Munroe PB, Morrison AC, Rao DC, Rotimi CN and Cupples LA

    Center for Research on Genomics and Global Health, National Human Genome Research Institute, US National Institutes of Health, Bethesda, MD, USA.

    The concentrations of high- and low-density-lipoprotein cholesterol and triglycerides are influenced by smoking, but it is unknown whether genetic associations with lipids may be modified by smoking. We conducted a multi-ancestry genome-wide gene-smoking interaction study in 133,805 individuals with follow-up in an additional 253,467 individuals. Combined meta-analyses identified 13 new loci associated with lipids, some of which were detected only because association differed by smoking status. Additionally, we demonstrate the importance of including diverse populations, particularly in studies of interactions with lifestyle factors, where genomic and lifestyle differences by ancestry may contribute to novel findings.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Intramural NIH HHS: Z01 HG200362, Z99 HG999999, ZIA HG200362-02; Medical Research Council: MC_UP_1605/7, MC_UU_00007/10, MC_UU_12015/1, MR/K002414/1, MR/L01341X/1, MR/L01632X/1, MR/R023484/1; NHLBI NIH HHS: K01 HL135405, R01 HL118305, R01 HL142302, R21 HL123677; NIA NIH HHS: U01 AG009740; NICHD NIH HHS: P2C HD050924, R01 HD057194; NIDDK NIH HHS: P30 DK020572, R01 DK062370, R01 DK107786, R01 DK117445, U01 DK062370; NIEHS NIH HHS: P30 ES007048; NIMHD NIH HHS: R01 MD012765

    Nature genetics 2019;51;4;636-648

  • SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events.

    Bergstrom EN, Huang MN, Mahto U, Barnes M, Stratton MR, Rozen SG and Alexandrov LB

    Department of Cellular and Molecular Medicine and Department of Bioengineering and Moores Cancer Center, University of California, San Diego, La Jolla, CA, 92093, USA.

    Background: Cancer genomes are peppered with somatic mutations imprinted by different mutational processes. The mutational pattern of a cancer genome can be used to identify and understand the etiology of the underlying mutational processes. A plethora of prior research has focused on examining mutational signatures and mutational patterns from single base substitutions and their immediate sequencing context. We recently demonstrated that further classification of small mutational events (including substitutions, insertions, deletions, and doublet substitutions) can be used to provide a deeper understanding of the mutational processes that have molded a cancer genome. However, there has been no standard tool that allows fast, accurate, and comprehensive classification for all types of small mutational events.

    Results: Here, we present SigProfilerMatrixGenerator, a computational tool designed for optimized exploration and visualization of mutational patterns for all types of small mutational events. SigProfilerMatrixGenerator is written in Python with an R wrapper package provided for users that prefer working in an R environment. SigProfilerMatrixGenerator produces fourteen distinct matrices by considering transcriptional strand bias of individual events and by incorporating distinct classifications for single base substitutions, doublet base substitutions, and small insertions and deletions. While the tool provides a comprehensive classification of mutations, SigProfilerMatrixGenerator is also faster and more memory efficient than existing tools that generate only a single matrix.

    Conclusions: SigProfilerMatrixGenerator provides a standardized method for classifying small mutational events that is both efficient and scalable to large datasets. In addition to extending the classification of single base substitutions, the tool is the first to provide support for classifying doublet base substitutions and small insertions and deletions. SigProfilerMatrixGenerator is freely available at with an extensive documentation at .

    Funded by: Cancer Research UK Grand Challenge Award: C98/A24032; Singapore National Medical Research Council: MOH-000032/MOH-CIRG18may-0004, NMRC/CIRG/1422/2015

    BMC genomics 2019;20;1;685
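
    As an illustration of the kind of classification the abstract above describes for single base substitutions and their immediate sequence context, the following minimal sketch assigns a substitution to one of the 96 pyrimidine-normalised trinucleotide channels. It does not use SigProfilerMatrixGenerator or reproduce its code; the function names (sbs96_channel, empty_sbs96_counts, count_sbs96) and the input format are assumptions made only for this example.

```python
from itertools import product

# Hypothetical helper names; this is NOT the SigProfilerMatrixGenerator API.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Classify one substitution into a 96-channel label such as 'A[C>T]G'.

    context: trinucleotide on the reference strand, centred on the mutated base.
    alt:     the mutant base on the same strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and a 1-base alt over ACGT")
    if context[1] == alt:
        raise ValueError("reference and alternate bases are identical")
    # Convention: report the mutation from the pyrimidine (C or T) strand.
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = reverse_complement(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def empty_sbs96_counts() -> dict:
    """Build a zeroed count table covering all 96 channels."""
    counts = {}
    for ref in "CT":
        for alt in "ACGT":
            if alt == ref:
                continue
            for five, three in product("ACGT", repeat=2):
                counts[f"{five}[{ref}>{alt}]{three}"] = 0
    return counts


def count_sbs96(mutations) -> dict:
    """Tally an iterable of (context, alt) pairs into the 96-channel table."""
    counts = empty_sbs96_counts()
    for context, alt in mutations:
        counts[sbs96_channel(context, alt)] += 1
    return counts
```

    For example, a G>A change in the context TGT is reported from the opposite strand as A[C>T]A, so it is counted in the same channel as a C>T change in the context ACA.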

  • Computational Intelligence for Life Sciences.

    Besozzi D, Manzoni L, Nobile MS, Spolaor S, Castelli M, Vanneschi L, Cazzaniga P, Ruberto S, Rundo L and Tangherloni A

    Computational Intelligence (CI) is a computer science discipline encompassing the theory, design, development and application of biologically and linguistically derived computational paradigms. Traditionally, the main elements of CI are Evolutionary Computation, Swarm Intelligence, Fuzzy Logic, and Neural Networks. CI aims at proposing new algorithms able to solve complex computational problems by taking inspiration from natural phenomena. In an intriguing turn of events, these nature-inspired methods have been widely adopted to investigate a plethora of problems related to nature itself. In this paper we present a variety of CI methods applied to three problems in life sciences, highlighting their effectiveness: we describe how protein folding can be addressed by exploiting Genetic Programming, how the inference of haplotypes can be tackled using Genetic Algorithms, and how the estimation of biochemical kinetic parameters can be performed by means of Swarm Intelligence. We show that CI methods can generate very high quality solutions, providing a sound methodology to solve complex optimization problems in life sciences.

    Fundamenta Informaticae 2019;171;1-4;57-80

  • Tissue- and sex-specific small RNAomes reveal sex differences in response to the environment.

    Bezler A, Braukmann F, West SM, Duplan A, Conconi R, Schütz F, Gönczy P, Piano F, Gunsalus K, Miska EA and Keller L

    Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland.

    RNA interference (RNAi) related pathways are essential for germline development and fertility in metazoa and can contribute to inter- and trans-generational inheritance. In the nematode Caenorhabditis elegans, environmental double-stranded RNA provided by feeding can lead to heritable changes in phenotype and gene expression. Notably, transmission efficiency differs between the male and female germline, yet the underlying mechanisms remain elusive. Here we use high-throughput sequencing of dissected gonads to quantify sex-specific endogenous piRNAs, miRNAs and siRNAs in the C. elegans germline and the somatic gonad. We identify genes with exceptionally high levels of secondary 22G RNAs that are associated with low mRNA expression, a signature compatible with silencing. We further demonstrate that contrary to the hermaphrodite germline, the male germline, but not male soma, is resistant to environmental RNAi triggers provided by feeding, in line with previous work. This sex-difference in silencing efficacy is associated with lower levels of gonadal RNAi amplification products. Moreover, this tissue- and sex-specific RNAi resistance is regulated by the germline, since mutant males with a feminized germline are RNAi sensitive. This study provides important sex- and tissue-specific expression data of miRNA, piRNA and siRNA as well as mechanistic insights into sex-differences of gene regulation in response to environmental cues.

    Funded by: Cancer Research UK: C13474/A18583, C6946/A14492; NHGRI NIH HHS: U01 HG004276; NICHD NIH HHS: R01 HD046236; Wellcome Trust: 092096/Z/10/Z, 104640/Z/14/Z

    PLoS genetics 2019;15;2;e1007905

  • Nosocomial transmission of influenza: A retrospective cross-sectional study using next generation sequencing at a hospital in England (2012-2014).

    Blackburn RM, Frampton D, Smith CM, Fragaszy EB, Watson SJ, Ferns RB, Binter Š, Coen PG, Grant P, Shallcross LJ, Kozlakidis Z, Pillay D, Kellam P, Hué S, Nastouli E, Hayward AC and ICONIC group

    Institute of Health Informatics, UCL, London, UK.

    Background: The extent of transmission of influenza in hospital settings is poorly understood. Next generation sequencing may improve this by providing information on the genetic relatedness of viral strains.

    Objectives: We aimed to apply next generation sequencing to describe transmission in hospital and compare with methods based on routinely-collected data.

    Methods: All influenza samples taken through routine care from patients at University College London Hospitals NHS Foundation Trust (September 2012 to March 2014) were included. We conducted Illumina sequencing and identified genetic clusters. We compared nosocomial transmission estimates defined using classical methods (based on time from admission to sample) and genetic clustering. We identified pairs of cases with space-time links and assessed genetic relatedness.

    Results: We sequenced influenza sampled from 214 patients. There were 180 unique genetic strains, 16 (8.8%) of which seeded a new transmission chain. Nosocomial transmission was indicated for 32 (15.0%) cases using the classical definition and 34 (15.8%) based on genetic clustering. Of the 50 patients in a genetic cluster, 11 (22.0%) had known space-time links with other cases in the same cluster. Genetic distances between pairs of cases with space-time links were lower than for pairs without spatial links (P < .001).

    Conclusions: Genetic data confirmed that nosocomial transmission contributes significantly to the hospital burden of influenza and elucidated transmission chains. Prospective next generation sequencing could support outbreak investigations and monitor the impact of infection and control measures.

    Funded by: Department of Health: T5-344 (ICONIC); Medical Research Council: MR/S003797/1; Wellcome Trust: T5-344 (ICONIC)

    Influenza and other respiratory viruses 2019;13;6;556-563

  • Rapid sequencing of MRSA direct from clinical plates in a routine microbiology laboratory.

    Blane B, Raven KE, Leek D, Brown N, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Box 157 Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QQ, UK.

    Background: Routine sequencing of MRSA could bring about significant improvements to outbreak detection and investigation. Sequencing is commonly performed using DNA extracted from a pure culture, but overcoming the delay associated with this step could reduce the time to infection control interventions.

    Objectives: To develop and evaluate rapid sequencing of MRSA using primary clinical cultures.

    Methods: Patients with samples submitted to the clinical laboratory at the Cambridge University Hospitals NHS Foundation Trust from which MRSA was isolated were identified, the routine laboratory culture plates obtained and DNA extraction and sequencing performed.

    Results: An evaluation of routine MRSA cultures from 30 patients demonstrated that direct sequencing from bacterial colonies picked from four different culture media was feasible. The 30 clinical MRSA isolates were sequenced on the day of plate retrieval over five runs and passed quality control metrics for sequencing depth and coverage. The maximum contamination detected using Kraken was 1.09% fragments, which were identified as Prevotella dentalis. The most common contaminants were other staphylococcal species (25 isolate sequences) and Burkholderia dolosa (11 isolate sequences). Core genome pairwise SNP analysis to identify clusters based on isolates that were ≤50 SNPs different was used to triage cases for further investigation. This identified three clusters, but more detailed genomic and epidemiological evaluation excluded an acute outbreak.

    Conclusions: Rapid sequencing of MRSA from clinical culture plates is feasible and reduces the delay associated with purity culture prior to DNA extraction.

    The Journal of antimicrobial chemotherapy 2019;74;8;2153-2156
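
    The triage step in the abstract above (grouping isolates whose core genomes differ by ≤50 SNPs) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes pre-aligned, upper-case core-genome sequences of equal length, counts only positions where both isolates have an unambiguous base, and uses single-linkage grouping, which is an assumption because the abstract does not state the linkage rule. The names snp_distance and cluster_isolates are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap or ambiguous base are ignored
    (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


def cluster_isolates(core_alignments: dict, max_snps: int = 50) -> list:
    """Group isolates linked by pairwise distances of at most max_snps.

    Uses union-find for single-linkage grouping; isolates with no close
    neighbour are left out of the result.
    """
    parent = {name: name for name in core_alignments}

    def find(x):
        # Walk to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(core_alignments, 2):
        if snp_distance(core_alignments[a], core_alignments[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in core_alignments:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]
```

    Clusters produced this way are only candidates for follow-up; as the abstract notes, more detailed genomic and epidemiological evaluation is still needed to confirm or exclude an outbreak.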

  • Diagnostic host gene signature for distinguishing enteric fever from other febrile diseases.

    Blohmke CJ, Muller J, Gibani MM, Dobinson H, Shrestha S, Perinparajah S, Jin C, Hughes H, Blackwell L, Dongol S, Karkey A, Schreiber F, Pickard D, Basnyat B, Dougan G, Baker S, Pollard AJ and Darton TC

    Department of Paediatrics, Centre for Clinical Vaccinology and Tropical Medicine, Oxford Vaccine Group, Oxford, UK.

    Misdiagnosis of enteric fever is a major global health problem, resulting in patient mismanagement, antimicrobial misuse and inaccurate disease burden estimates. Applying a machine learning algorithm to host gene expression profiles, we identified a diagnostic signature, which could distinguish culture-confirmed enteric fever cases from other febrile illnesses (area under receiver operating characteristic curve > 95%). Applying this signature to a culture-negative suspected enteric fever cohort in Nepal identified a further 12.6% as likely true cases. Our analysis highlights the power of data-driven approaches to identify host response patterns for the diagnosis of febrile illnesses. Expression signatures were validated using qPCR, highlighting their utility as PCR-based diagnostics for use in endemic settings.

    Funded by: Bill and Melinda Gates Foundation (Bill & Melinda Gates Foundation): OPP1084259, OPP1089317; European Commission FP7 Grant "Advanced Immunization Technologies" (ADITEC); European Vaccine Initiative: PIM; Medical Research Council: MR/M00919X/1; NIHR Oxford Biomedical Research Centre; UK Research and Innovation | Medical Research Council (MRC): MR/M00919X/1; Wellcome Trust (WT): 092661, 100087/Z/12/Z

    EMBO molecular medicine 2019;11;10;e10431

  • Construction of an Escherichia coli genome with fewer codons sets records.

    Blount BA and Ellis T

    Nature 2019;569;7757;492-494

  • Clinical and laboratory-induced colistin-resistance mechanisms in Acinetobacter baumannii.

    Boinett CJ, Cain AK, Hawkey J, Do Hoang NT, Khanh NNT, Thanh DP, Dordel J, Campbell JI, Lan NPH, Mayho M, Langridge GC, Hadfield J, Chau NVV, Thwaites GE, Parkhill J, Thomson NR, Holt KE and Baker S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The increasing incidence and emergence of multi-drug resistant (MDR) Acinetobacter baumannii has become a major global health concern. Colistin is a historic antimicrobial that has become commonly used as a treatment for MDR A. baumannii infections. The increase in colistin usage has been mirrored by an increase in colistin resistance. We aimed to identify the mechanisms associated with colistin resistance in A. baumannii using multiple high-throughput-sequencing technologies, including transposon-directed insertion site sequencing (TraDIS), RNA sequencing (RNAseq) and whole-genome sequencing (WGS) to investigate the genotypic changes of colistin resistance in A. baumannii. Using TraDIS, we found that genes involved in drug efflux (adeIJK), and phospholipid (mlaC, mlaF and mlaD) and lipooligosaccharide synthesis (lpxC and lpsO) were required for survival in sub-inhibitory concentrations of colistin. Transcriptomic (RNAseq) analysis revealed that expression of genes encoding efflux proteins (adeI, adeC, emrB, mexB and macAB) was enhanced in in vitro generated colistin-resistant strains. WGS of these organisms identified disruptions in genes involved in lipid A (lpxC) and phospholipid synthesis (mlaA), and in the baeS/R two-component system (TCS). We additionally found that mutations in the pmrB TCS genes were the primary colistin-resistance-associated mechanisms in three Vietnamese clinical colistin-resistant A. baumannii strains. Our results outline the entire range of mechanisms employed in A. baumannii for resistance against colistin, including drug extrusion and the loss of lipid A moieties by gene disruption or modification.

    Funded by: Medical Research Council: G1100100; Wellcome Trust: 100087/Z/12/Z, WT098051

    Microbial genomics 2019;5;2

  • Crumble: reference free lossy compression of sequence quality values.

    Bonfield JK, McCarthy SA and Durbin R

    DNA Pipelines, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Motivation: The bulk of space taken up by NGS sequencing CRAM files consists of per-base quality values. Most of these are unnecessary for variant calling, offering an opportunity for space saving.

    Results: On the Syndip test set, a 17 fold reduction in the quality storage portion of a CRAM file can be achieved while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2 to 7.4 fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details).

    Availability and implementation: Crumble is OpenSource and can be obtained from

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 207492/Z/17/Z, WT098051

    Bioinformatics (Oxford, England) 2019;35;2;337-339

  • PAPSS2-related brachyolmia: Clinical and radiological phenotype in 18 new cases.

    Bownass L, Abbs S, Armstrong R, Baujat G, Behzadi G, Berentsen RD, Burren C, Calder A, Cormier-Daire V, Newbury-Ecob R, Foulds N, Juliusson PB, Kant SG, Lefroy H, Mehta SG, Merckoll E, Michot C, Monsell F, Offiah AC, Richards A, Rosendahl K, Rustad CF, Shears D, Tveten K, Wellesley D, Wordsworth P, Deciphering Developmental Disorders Study and Smithson S

    Clinical Genetics, St Michael's Hospital Bristol, University Hospitals Bristol NHS Foundation Trust, Bristol, UK.

    Brachyolmia is a skeletal dysplasia characterized by short spine-short stature, platyspondyly, and minor long bone abnormalities. We describe 18 patients, from different ethnic backgrounds and ages ranging from infancy to 19 years, with the autosomal recessive form, associated with PAPSS2. The main clinical features include disproportionate short stature with short spine associated with variable symptoms of pain, stiffness, and spinal deformity. Eight patients presented prenatally with short femora, whereas later in childhood their short-spine phenotype emerged. We observed the same pattern of changing skeletal proportion in other patients. The radiological findings included platyspondyly, irregular end plates of the elongated vertebral bodies, narrow disc spaces and short over-faced pedicles. In the limbs, there was mild shortening of femoral necks and tibiae in some patients, whereas others had minor epiphyseal or metaphyseal changes. In all patients, exome and Sanger sequencing identified homozygous or compound heterozygous PAPSS2 variants, including c.809G>A, common to white European patients. Bi-parental inheritance was established where possible. Low serum DHEAS, but not overt androgen excess, was identified. Our study indicates that autosomal recessive brachyolmia occurs across continents and may be under-recognized in infancy. This condition should be considered in the differential diagnosis of short femora presenting in the second trimester.

    American journal of medical genetics. Part A 2019;179;9;1884-1894

  • Essential role of inverted repeat in Epstein-Barr virus IR-1 in B cell transformation; geographical variation of the viral genome.

    Bridges R, Correia S, Wegner F, Venturini C, Palser A, White RE, Kellam P, Breuer J and Farrell PJ

    Section of Virology, Faculty of Medicine, Imperial College London, London W2 1PG, UK.

    Many regions of the Epstein-Barr virus (EBV) genome, repeated and unique sequences, contribute to the geographical variation observed between strains. Here we use a large alignment of curated EBV genome sequences to identify major sites of variation in the genome of type 1 EBV strains; the CAO deletion in latent membrane protein 1 (LMP1) is the most frequent major indel present in the unique regions of EBV strains from various parts of the world. Principal component analysis was used to identify patterns of sequence variation and nucleotide positions in the sequences that can distinguish EBV from some different geographical regions. Viral genome sequence variation also affects interpretation of genetic content; known genes, origins of replication and gene expression control regions explain most of the viral genome but there are still a few sections of unknown function. One of these EBV genome regions contains a large inverted repeat sequence (invR) within the IR-1 major internal repeat array. We deleted this invR sequence and showed that this abolished the ability of the virus to transform human B cells into lymphoblastoid cell lines. This article is part of the theme issue 'Silent cancer agents: multi-disciplinary modelling of human DNA oncoviruses'.

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2019;374;1773;20180299

  • Genes for Good: Engaging the Public in Genetics Research via Social Media.

    Brieger K, Zajac GJM, Pandit A, Foerster JR, Li KW, Annis AC, Schmidt EM, Clark CP, McMorrow K, Zhou W, Yang J, Kwong AM, Boughton AP, Wu J, Scheller C, Parikh T, de la Vega A, Brazel DM, Frieser M, Rea-Sandin G, Fritsche LG, Vrieze SI and Abecasis GR

    Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

    The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education. Health history and daily tracking surveys are administered through a Facebook application, and participants who complete a minimum number of surveys are mailed a saliva sample kit ("spit kit") to collect DNA for genotyping. As of March 2019, we engaged >80,000 individuals, sent spit kits to >32,000 individuals who met minimum participation requirements, and collected >27,000 spit kits. Participants come from all 50 states and include a diversity of ancestral backgrounds. Rates of important chronic health indicators are consistent with those estimated for the general U.S. population using more traditional study designs. However, our sample is younger and contains a greater percentage of females than the general population. As one means of verifying data quality, we have replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation. The flexible framework of the web application makes it relatively simple to add new questionnaires and for other researchers to collaborate. We anticipate that the study sample will continue to grow and that future analyses may further capitalize on the strengths of the longitudinal data in combination with genetic information.

    American journal of human genetics 2019;105;1;65-77

  • Partially methylated domains are hypervariable in breast cancer and fuel widespread CpG island hypermethylation.

    Brinkman AB, Nik-Zainal S, Simmer F, Rodríguez-González FG, Smid M, Alexandrov LB, Butler A, Martin S, Davies H, Glodzik D, Zou X, Ramakrishna M, Staaf J, Ringnér M, Sieuwerts A, Ferrari A, Morganella S, Fleischer T, Kristensen V, Gut M, van de Vijver MJ, Børresen-Dale AL, Richardson AL, Thomas G, Gut IG, Martens JWM, Foekens JA, Stratton MR and Stunnenberg HG

    Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, PO Box 9101, Nijmegen, 6500 HB, The Netherlands.

    Global loss of DNA methylation and CpG island (CGI) hypermethylation are key epigenomic aberrations in cancer. Global loss manifests itself in partially methylated domains (PMDs) which extend up to megabases. However, the distribution of PMDs within and between tumor types, and their effects on key functional genomic elements including CGIs are poorly defined. We comprehensively show that loss of methylation in PMDs occurs in a large fraction of the genome and represents the prime source of DNA methylation variation. PMDs are hypervariable in methylation level, size and distribution, and display elevated mutation rates. They impose intermediate DNA methylation levels incognizant of functional genomic elements including CGIs, underpinning a CGI methylator phenotype (CIMP). Repression effects on tumor suppressor genes are negligible as they are generally excluded from PMDs. The genomic distribution of PMDs reports tissue-of-origin and may represent tissue-specific silent regions which tolerate instability at the epigenetic, transcriptomic and genetic level.

    Funded by: NCI NIH HHS: P50 CA168504; Wellcome Trust

    Nature communications 2019;10;1;1749

  • Combined Genome and Transcriptome (G&T) Sequencing of Single Cells.

    Bronner IF and Lorenz S

    Single Cell Genomics Core Facility, Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    The simultaneous examination of a single cell's genome and transcriptome presents scientists with a powerful tool to study genetic variability and its effect on gene expression. In this chapter, we describe the library generation method for combined genome and transcriptome sequencing (G&T-seq) originally described by Macaulay et al. (Nat Protoc 11(11):2081-2103, 2016; Nat Methods 12(6):519-522, 2015). This includes some alterations we made to improve robustness of this process for both the novice user and laboratories that want to deploy this method at scale. Using this method, genomic DNA and full-length mRNA from single cells are separated, amplified, and converted into Illumina sequencer-compatible sequencing libraries.

    Funded by: Wellcome Trust

    Methods in molecular biology (Clifton, N.J.) 2019;1979;319-362

  • Best Practices for Illumina Library Preparation.

    Bronner IF and Quail MA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    In this unit, we describe a set of protocols and recommendations for Illumina library preparation. We review best practices in template quantitation methods; template fragmentation methodologies; solid-phase reverse-immobilization cleanup, including buffer exchange and size selection; end repair, A-tailing, and adapter ligation; indexing strategies; considerations regarding whether to use polymerase chain reaction; final library quantification methodologies; and normalization and pooling strategies. These workflows are applicable to both high-throughput and low-throughput Illumina library preparation and should help reduce bias, increase cost effectiveness, and produce high library yields. This is an extensive update of the previous version of this unit. © 2019 by John Wiley & Sons, Inc.

    Funded by: Wellcome Trust: 206194

    Current protocols in human genetics 2019;102;1;e86

  • Pilot Evaluation of a Fully Automated Bioinformatics System for Analysis of Methicillin-Resistant Staphylococcus aureus Genomes and Detection of Outbreaks.

    Brown NM, Blane B, Raven KE, Kumar N, Leek D, Bragin E, Rhodes PA, Enoch DA, Thaxter R, Parkhill J and Peacock SJ

    Clinical Microbiology and Public Health Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom.

    Genomic surveillance that combines bacterial sequencing and epidemiological information will become the gold standard for outbreak detection, but its clinical translation is hampered by the lack of automated interpretation tools. We performed a prospective pilot study to evaluate the analysis of methicillin-resistant Staphylococcus aureus (MRSA) genomes using the Next Gen Diagnostics (NGD) automated bioinformatics system. Seventeen unselected MRSA-positive patients were identified in a clinical microbiology laboratory in England over a period of 2 weeks in 2018, and 1 MRSA isolate per case was sequenced on the Illumina MiniSeq instrument. The NGD system automatically activated after sequencing and processed fastq folders to determine species, multilocus sequence type, the presence of a mec gene, antibiotic susceptibility predictions, and genetic relatedness based on mapping to a reference MRSA genome and detection of pairwise core genome single-nucleotide polymorphisms. The NGD system required 90 s per sample to automatically analyze data from each run, the results of which were automatically displayed. The same data were independently analyzed using a research-based approach. There was full concordance between the two analysis methods regarding species (S. aureus), detection of mecA, sequence type assignment, and detection of genetic determinants of resistance. Both analysis methods identified two MRSA clusters based on relatedness, one of which contained 3 cases that were involved in an outbreak linked to a clinic and ward associated with diabetic patient care. We conclude that, in this pilot study, the NGD system provided rapid and accurate data that could support infection control practices.

    Funded by: Wellcome Trust

    Journal of clinical microbiology 2019;57;11

  • Somatic mutations and clonal dynamics in healthy and cirrhotic human liver.

    Brunner SF, Roberts ND, Wylie LA, Moore L, Aitken SJ, Davies SE, Sanders MA, Ellis P, Alder C, Hooks Y, Abascal F, Stratton MR, Martincorena I, Hoare M and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    The most common causes of chronic liver disease are excess alcohol intake, viral hepatitis and non-alcoholic fatty liver disease, with the clinical spectrum ranging in severity from hepatic inflammation to cirrhosis, liver failure or hepatocellular carcinoma (HCC). The genome of HCC exhibits diverse mutational signatures, resulting in recurrent mutations across more than 30 cancer genes (refs. 1-7). Stem cells from normal livers have a low mutational burden and limited diversity of signatures (ref. 8), which suggests that the complexity of HCC arises during the progression to chronic liver disease and subsequent malignant transformation. Here, by sequencing whole genomes of 482 microdissections of 100-500 hepatocytes from 5 normal and 9 cirrhotic livers, we show that cirrhotic liver has a higher mutational burden than normal liver. Although rare in normal hepatocytes, structural variants, including chromothripsis, were prominent in cirrhosis. Driver mutations, such as point mutations and structural variants, affected 1-5% of clones. Clonal expansions of millimetres in diameter occurred in cirrhosis, with clones sequestered by the bands of fibrosis that surround regenerative nodules. Some mutational signatures were universal and equally active in both non-malignant hepatocytes and HCCs; some were substantially more active in HCCs than chronic liver disease; and others, arising from exogenous exposures, were present in a subset of patients. The activity of exogenous signatures between adjacent cirrhotic nodules varied by up to tenfold within each patient, as a result of clone-specific and microenvironmental forces. Synchronous HCCs exhibited the same mutational signatures as background cirrhotic liver, but with higher burden. Somatic mutations chronicle the exposures, toxicity, regeneration and clonal structure of liver tissue as it progresses from health to disease.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust: 103858, 206194, WT088340MA

    Nature 2019;574;7779;538-542

  • Editing the Genome of Human Induced Pluripotent Stem Cells Using CRISPR/Cas9 Ribonucleoprotein Complexes.

    Bruntraeger M, Byrne M, Long K and Bassett AR

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Genome editing using the CRISPR/Cas9 system has rapidly established itself as an essential tool in the genetic manipulation of many organisms, including human cell lines. Its application to human induced pluripotent stem cells (hiPSCs) allows for the generation of isogenic cell pairs that differ in a single genetic lesion, and therefore the identification and characterization of causal genetic variants. We describe a simple, effective approach to perform delicate manipulations of the genome of hiPSCs through delivery of Cas9 RNPs along with ssDNA oligonucleotide repair templates that can generate mutations in up to 98% of single cell clones and introduce single nucleotide changes at an efficiency of up to 40%. We describe our use of a T7 endonuclease assay to identify active guide RNAs, and a high-throughput sequencing genotyping strategy that allows the identification of correctly edited clones. We also present our experiences of generating single nucleotide changes at 15 sites, which show considerable variability between both guides and target sites in the efficiency at which such changes can be introduced.

    Methods in molecular biology (Clifton, N.J.) 2019;1961;153-183

  • Transcriptional responses of Biomphalaria pfeifferi and Schistosoma mansoni following exposure to niclosamide, with evidence for a synergistic effect on snails following exposure to both stressors.

    Buddenborg SK, Kamel B, Bu L, Zhang SM, Mkoji GM and Loker ES

    Department of Biology, Center for Evolutionary and Theoretical Immunology, University of New Mexico, Albuquerque, NM, United States of America.

    Background: Schistosomiasis is one of the world's most common NTDs. Successful control operations often target snail vectors with the molluscicide niclosamide. Little is known about how niclosamide affects snails, including for Biomphalaria pfeifferi, the most important vector for Schistosoma mansoni in Africa. We used Illumina technology to explore how field-derived B. pfeifferi, either uninfected or harboring cercariae-producing S. mansoni sporocysts, respond to a sublethal treatment of niclosamide. This study afforded the opportunity to determine if snails respond differently to biotic or abiotic stressors, and if they reserve unique responses for when presented with both stressors in combination. We also examined how sporocysts respond when their snail host is treated with niclosamide.

    Principal findings: Cercariae-producing sporocysts within snails treated with niclosamide express ~68% of the genes in the S. mansoni genome, as compared to 66% expressed by intramolluscan stages of S. mansoni in snails not treated with niclosamide. Niclosamide does not disable sporocysts nor does it seem to provoke from them distinctive responses associated with detoxifying a xenobiotic. For uninfected B. pfeifferi, niclosamide treatment alone increases expression of several features not up-regulated in infected snails including particular cytochrome p450s and heat shock proteins, glutathione-S-transferases, antimicrobial factors like LBP/BPI and protease inhibitors, and also provokes strong down regulation of proteases. Exposure of infected snails to niclosamide resulted in numerous up-regulated responses associated with apoptosis along with down-regulated ribosomal and defense functions, indicative of a distinctive, compromised state not achieved with either stimulus alone.

    Conclusions/significance: This study helps define the transcriptomic responses of an important and under-studied schistosome vector to S. mansoni sporocysts, to niclosamide, and to both in combination. It suggests the response of S. mansoni sporocysts to niclosamide is minimal and not reflective of a distinct repertoire of genes to handle xenobiotics while in the snail host. It also offers new insights for how niclosamide affects snails.

    Funded by: NIAID NIH HHS: R01 AI101438, R37 AI101438; NIGMS NIH HHS: P20 GM103451, P20 GM103452, P30 GM110907

    PLoS neglected tropical diseases 2019;13;12;e0006927

  • The in vivo transcriptome of Schistosoma mansoni in the prominent vector species Biomphalaria pfeifferi with supporting observations from Biomphalaria glabrata.

    Buddenborg SK, Kamel B, Hanelt B, Bu L, Zhang SM, Mkoji GM and Loker ES

    Department of Biology, Center for Evolutionary and Theoretical Immunology, University of New Mexico, Albuquerque, NM, United States of America.

    Background: The full scope of the genes expressed by schistosomes during intramolluscan development has yet to be characterized. Understanding the gene products deployed by larval schistosomes in their snail hosts will provide insights into their establishment, maintenance, asexual reproduction, ability to castrate their hosts, and their prolific production of human-infective cercariae. Using the Illumina platform, the intramolluscan transcriptome of Schistosoma mansoni was investigated in field-derived specimens of the prominent vector species Biomphalaria pfeifferi at 1 and 3 days post infection (d) and from snails shedding cercariae. These S. mansoni samples were derived from the same snails used in our complementary B. pfeifferi transcriptomic study. We supplemented this view with microarray analyses of S. mansoni from B. glabrata at 2d, 4d, 8d, 16d, and 32d to highlight robust features of S. mansoni transcription, even when a different technique and vector species was used.

    Principal findings: Transcripts representing at least 7,740 (66%) of known S. mansoni genes were expressed during intramolluscan development, with the greatest number expressed in snails shedding cercariae. Many transcripts were constitutively expressed throughout development featuring membrane transporters, and metabolic enzymes involved in protein and nucleic acid synthesis and cell division. Several proteases and protease inhibitors were expressed at all stages, including some proteases usually associated with cercariae. Transcripts associated with G-protein coupled receptors, germ cell perpetuation, and stress responses and defense were well represented. We noted transcripts homologous to planarian anti-bacterial factors, several neural development or neuropeptide transcripts including neuropeptide Y, and receptors that may be associated with schistosome germinal cell maintenance that could also impact host reproduction. In at least one snail the presence of larvae of another digenean species (an amphistome) was associated with repressed S. mansoni transcriptional activity.

    Conclusions/significance: This in vivo study, emphasizing field-derived snails and schistosomes, but supplemented with observations from a lab model, provides a distinct view from previous studies of development of cultured intramolluscan stages from lab-maintained organisms. We found many highly represented transcripts with suspected or unknown functions, with connection to intramolluscan development yet to be elucidated.

    Funded by: NIAID NIH HHS: R01 AI101438, R37 AI101438; NIGMS NIH HHS: P20 GM103451, P20 GM103452, P30 GM110907

    PLoS neglected tropical diseases 2019;13;9;e0007013

  • The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

    Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, McMahon A, Morales J, Mountjoy E, Sollis E, Suveges D, Vrousgou O, Whetzel PL, Amode R, Guillen JA, Riat HS, Trevanion SJ, Hall P, Junkins H, Flicek P, Burdett T, Hindorff LA, Cunningham F and Parkinson H

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10^9 individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability:

    Funded by: NHGRI NIH HHS: U41 HG007823

    Nucleic acids research 2019;47;D1;D1005-D1012

  • Next-generation sequencing for the diagnosis of MYH9-RD: Predicting pathogenic variants.

    Bury L, Megy K, Stephens JC, Grassi L, Greene D, Gleadall N, Althaus K, Allsup D, Bariana TK, Bonduel M, Butta NV, Collins P, Curry N, Deevi SVV, Downes K, Duarte D, Elliott K, Falcinelli E, Furie B, Keeling D, Lambert MP, Linger R, Mangles S, Mapeta R, Millar CM, Penkett C, Perry DJ, Stirrups KE, Turro E, Westbury SK, Wu J, BioResource N, Gomez K, Freson K, Ouwehand WH, Gresele P and Simeoni I

    Department of Internal Medicine, Section of Internal and Cardiovascular Medicine, University of Perugia, Perugia, Italy.

    The heterogeneous manifestations of MYH9-related disorder (MYH9-RD), characterized by macrothrombocytopenia, Döhle-like inclusion bodies in leukocytes, bleeding of variable severity with, in some cases, ear, eye, kidney, and liver involvement, make the diagnosis for these patients still challenging in clinical practice. We collected phenotypic data and analyzed the genetic variants in more than 3,000 patients with a bleeding or platelet disorder. Patients were enrolled in the BRIDGE-BPD and ThromboGenomics Projects and their samples processed by high throughput sequencing (HTS). We identified 50 patients with a rare variant in MYH9. All patients had macrothrombocytes and all except two had thrombocytopenia. Some degree of bleeding diathesis was reported in 41 of the 50 patients. Eleven patients presented hearing impairment, three renal failure and two elevated liver enzymes. Among the 28 rare variants identified in MYH9, 12 were novel. HTS was instrumental in diagnosing 23 patients (46%). Our results confirm the clinical heterogeneity of MYH9-RD and show that, in the presence of an unclassified platelet disorder with macrothrombocytes, MYH9-RD should always be considered. A HTS-based strategy is a reliable method to reach a conclusive diagnosis of MYH9-RD in clinical practice.

    Funded by: British Heart Foundation: 208, 226, RBAG/245; Department of Health: RBAG/181, RG65966; European Commission: RBAG/344; MRC: 295, RBAG/285; Medical Research Council: MR/J011711/1; NHS Blood and Transplant: RBAG/142; Wellcome Trust: RBAG/342

    Human mutation 2019;41;1;277-290

  • Circulating tumor DNA dynamics using patient-customized assays are associated with outcome in neoadjuvantly treated breast cancer.

    Butler TM, Boniface CT, Johnson-Camacho K, Tabatabaei S, Melendez D, Kelley T, Gray J, Corless CL and Spellman PT

    Department of Molecular and Medical Genetics, Oregon Health and Science University (OHSU) Portland, Oregon 97201, USA.

    Pathological complete response (pCR) is an accurate predictor of good outcome following neoadjuvant chemotherapy (NAC) for locally advanced breast cancer. The presence of circulating-tumor DNA (ctDNA) has recently been reported to be strongly predictive of poor outcome in similar patient groups. We monitored ctDNA levels from 10 women undergoing NAC for locally advanced breast cancer using a patient-specific, hybrid-capture sequencing technique sensitive to the level of one altered allele in 10,000. Plasma was collected prior to the start of NAC, prior to each infusion of NAC, and during follow-up for between 350 and 1150 d after the start of NAC. Prior to the start of NAC, ctDNA was detectable in 3/3 triple negative, 3/3 HER2+, and 2/4 HER2-, ER+ breast cancer patients. Total cell-free DNA levels were considerably higher when patients were on NAC than at other times. ctDNA dynamics during NAC showed that patients with pCR experienced rapid declines in ctDNA levels, whereas patients without pCR typically showed evidence of residual ctDNA after initiation of treatment. Intriguingly, two of three patients that showed marked increases in ctDNA while on NAC experienced rapid recurrences (<2 yr following start of NAC). The third patient that had increases in ctDNA levels while on NAC had low-grade ER+ disease and showed residual ctDNA after surgery, which became undetectable after local radiation. Taken together, these results demonstrate the ability of our approach to sensitively serially monitor ctDNA during NAC, and identify a need to further investigate the possibility of stratifying patients who need additional treatment or identifying therapies that are ineffective.

    Cold Spring Harbor molecular case studies 2019;5;2

  • Targeting MEK in vemurafenib-resistant hairy cell leukemia.

    Caeser R, Collord G, Yao WQ, Chen Z, Vassiliou GS, Beer PA, Du MQ, Scott MA, Follows GA and Hodson DJ

    Department of Haematology, University of Cambridge, Cambridge, UK.

    Funded by: Medical Research Council: MC_PC_12009, MR/M008584/1; Medical Research Council (MRC): MR/M008584/1; Wellcome Trust: WT098051

    Leukemia 2019;33;2;541-545

  • Genetic modification of primary human B cells to model high-grade lymphoma.

    Caeser R, Di Re M, Krupka JA, Gao J, Lara-Chica M, Dias JML, Cooke SL, Fenner R, Usheva Z, Runge HFP, Beer PA, Eldaly H, Pak HK, Park CS, Vassiliou GS, Huntly BJP, Mupo A, Bashford-Rogers RJM and Hodson DJ

    Wellcome MRC Cambridge Stem Cell Institute, Cambridge, CB2 0AW, UK.

    Sequencing studies of diffuse large B cell lymphoma (DLBCL) have identified hundreds of recurrently altered genes. However, it remains largely unknown whether and how these mutations may contribute to lymphomagenesis, either individually or in combination. Existing strategies to address this problem predominantly utilize cell lines, which are limited by their initial characteristics and subsequent adaptions to prolonged in vitro culture. Here, we describe a co-culture system that enables the ex vivo expansion and viral transduction of primary human germinal center B cells. Incorporation of CRISPR/Cas9 technology enables high-throughput functional interrogation of genes recurrently mutated in DLBCL. Using a backbone of BCL2 with either BCL6 or MYC, we identify co-operating genetic alterations that promote growth or even full transformation into synthetically engineered DLBCL models. The resulting tumors can be expanded and sequentially transplanted in vivo, providing a scalable platform to test putative cancer genes and to create mutation-directed, bespoke lymphoma models.

    Funded by: Medical Research Council: MC_PC_12009, MR/M008584/1, MR/R009708/1

    Nature communications 2019;10;1;4543

  • Complete Genome Sequence of Pseudomonas aeruginosa Reference Strain PAK.

    Cain AK, Nolan LM, Sullivan GJ, Whitchurch CB, Filloux A and Parkhill J

    Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    We report the complete genome of Pseudomonas aeruginosa strain PAK, a strain which has been instrumental in the study of a range of P. aeruginosa virulence and pathogenesis factors and has been used for over 50 years as a laboratory reference strain.

    Microbiology resource announcements 2019;8;41

  • De Novo Missense Substitutions in the Gene Encoding CDK8, a Regulator of the Mediator Complex, Cause a Syndromic Developmental Disorder.

    Calpena E, Hervieu A, Kaserer T, Swagemakers SMA, Goos JAC, Popoola O, Ortiz-Ruiz MJ, Barbaro-Dieber T, Bownass L, Brilstra EH, Brimble E, Foulds N, Grebe TA, Harder AVE, Lees MM, Monaghan KG, Newbury-Ecob RA, Ong KR, Osio D, Reynoso Santos FJ, Ruzhnikov MRZ, Telegrafi A, van Binsbergen E, van Dooren MF, Deciphering Developmental Disorders Study, van der Spek PJ, Blagg J, Twigg SRF, Mathijssen IMJ, Clarke PA and Wilkie AOM

    Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK.

    The Mediator is an evolutionarily conserved, multi-subunit complex that regulates multiple steps of transcription. Mediator activity is regulated by the reversible association of a four-subunit module comprising CDK8 or CDK19 kinases, together with cyclin C, MED12 or MED12L, and MED13 or MED13L. Mutations in MED12, MED13, and MED13L were previously identified in syndromic developmental disorders with overlapping phenotypes. Here, we report CDK8 mutations (located at 13q12.13) that cause a phenotypically related disorder. Using whole-exome or whole-genome sequencing, and by international collaboration, we identified eight different heterozygous missense CDK8 substitutions, including 10 shown to have arisen de novo, in 12 unrelated subjects; a recurrent mutation, c.185C>T (p.Ser62Leu), was present in five individuals. All predicted substitutions localize to the ATP-binding pocket of the kinase domain. Affected individuals have overlapping phenotypes characterized by hypotonia, mild to moderate intellectual disability, behavioral disorders, and variable facial dysmorphism. Congenital heart disease occurred in six subjects; additional features present in multiple individuals included agenesis of the corpus callosum, ano-rectal malformations, seizures, and hearing or visual impairments. To evaluate the functional impact of the mutations, we measured phosphorylation at STAT1-Ser727, a known CDK8 substrate, in a CDK8 and CDK19 CRISPR double-knockout cell line transfected with wild-type (WT) or mutant CDK8 constructs. These experiments demonstrated a reduction in STAT1 phosphorylation by all mutants, in most cases to a similar extent as in a kinase-dead control. We conclude that missense mutations in CDK8 cause a developmental disorder that has phenotypic similarity to syndromes associated with mutations in other subunits of the Mediator kinase module, indicating probable overlap in pathogenic mechanisms.

    Funded by: Cancer Research UK: C1362/A20263; Wellcome Trust: 102731/Z/13/Z, WT098051

    American journal of human genetics 2019;104;4;709-720

  • Endothelin receptor Aa regulates proliferation and differentiation of Erb-dependent pigment progenitors in zebrafish.

    Camargo-Sosa K, Colanesi S, Müller J, Schulte-Merker S, Stemple D, Patton EE and Kelsh RN

    Department of Biology and Biochemistry and Centre for Regenerative Medicine, University of Bath, Claverton Down, Bath, United Kingdom.

    Skin pigment patterns are important, being under strong selection for multiple roles including camouflage and UV protection. Pigment cells underlying these patterns form from adult pigment stem cells (APSCs). In zebrafish, APSCs derive from embryonic neural crest cells, but sit dormant until activated to produce pigment cells during metamorphosis. The APSCs are set-aside in an ErbB signaling dependent manner, but the mechanism maintaining quiescence until metamorphosis remains unknown. Mutants for a pigment pattern gene, parade, exhibit ectopic pigment cells localised to the ventral trunk, but also supernumerary cells restricted to the Ventral Stripe. Contrary to expectations, these melanocytes and iridophores are discrete cells, but closely apposed. We show that parade encodes Endothelin receptor Aa, expressed in the blood vessels, most prominently in the medial blood vessels, consistent with the ventral trunk phenotype. We provide evidence that neuronal fates are not affected in parade mutants, arguing against transdifferentiation of sympathetic neurons to pigment cells. We show that inhibition of BMP signaling prevents specification of sympathetic neurons, indicating conservation of this molecular mechanism with chick and mouse. However, inhibition of sympathetic neuron differentiation does not enhance the parade phenotype. Instead, we pinpoint ventral trunk-restricted proliferation of neural crest cells as an early feature of the parade phenotype. Importantly, using a chemical genetic screen for rescue of the ectopic pigment cell phenotype of parade mutants (whilst leaving the embryonic pattern untouched), we identify ErbB inhibitors as a key hit. The time-window of sensitivity to these inhibitors mirrors precisely the window defined previously as crucial for the setting aside of APSCs in the embryo, strongly implicating adult pigment stem cells as the source of the ectopic pigment cells. 
    We propose that a novel population of APSCs exists in association with medial blood vessels, and that their quiescence is dependent upon Endothelin-dependent factors expressed by the blood vessels.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/L00769X/1; Medical Research Council: MC_PC_U127585840, MC_U127585840, MC_UU_00007/9, MR/J001457/1

    PLoS genetics 2019;15;2;e1007941

  • Virus discovery reveals frequent infection by diverse novel members of the Flaviviridae in wild lemurs.

    Canuti M, Williams CV, Sagan SM, Oude Munnink BB, Gadi S, Verhoeven JTP, Kellam P, Cotten M, Lang AS, Junge RE, Cullen JM and van der Hoek L

    Department of Biology, Memorial University of Newfoundland, 232 Elizabeth Ave., St. John's, NL, A1B 3X9, Canada.

    Lemurs are highly endangered mammals inhabiting the forests of Madagascar. In this study, we performed virus discovery on serum samples collected from 84 wild lemurs and identified viral sequence fragments from 4 novel viruses within the family Flaviviridae, including members of the genera Hepacivirus and Pegivirus. The sifaka hepacivirus (SifHV, two genotypes) and pegivirus (SifPgV, two genotypes) were discovered in the diademed sifaka (Propithecus diadema), while other pegiviral fragments were detected in samples from the indri (Indri indri, IndPgV) and the weasel sportive lemur (Lepilemur mustelinus, LepPgV). Although data are preliminary, each viral species appeared host species-specific and frequent infection was detected (18 of 84 individuals were positive for at least one virus). The complete coding sequence and partial 5' and 3' untranslated regions (UTRs) were obtained for SifHV and its genomic organization was consistent with that of other hepaciviruses, with one unique polyprotein and highly structured UTRs. Phylogenetic analyses showed the SifHV belonged to a clade that includes several viral species identified in rodents from Asia and North America, while SifPgV and IndPgV were more closely related to pegiviral species A and C, that include viruses found in humans as well as New- and Old-World monkeys. Our results support the current proposed model of virus-host co-divergence with frequent occurrence of cross-species transmission for these genera and highlight how the discovery of more members of the Flaviviridae can help clarify the ecology and evolutionary history of these viruses. Furthermore, this knowledge is important for conservation and captive management of lemurs.

    Funded by: European Community: EC grant 223498

    Archives of virology 2019;164;2;509-522

  • The midbody interactome reveals unexpected roles for PP1 phosphatases in cytokinesis.

    Capalbo L, Bassi ZI, Geymonat M, Todesca S, Copoiu L, Enright AJ, Callaini G, Riparbelli MG, Yu L, Choudhary JS, Ferrero E, Wheatley S, Douglas ME, Mishima M and D'Avino PP

    Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK.

    The midbody is an organelle assembled at the intercellular bridge between the two daughter cells at the end of mitosis. It controls the final separation of the daughter cells and has been involved in cell fate, polarity, tissue organization, and cilium and lumen formation. Here, we report the characterization of the intricate midbody protein-protein interaction network (interactome), which identifies many previously unknown interactions and provides an extremely valuable resource for dissecting the multiple roles of the midbody. Initial analysis of this interactome revealed that PP1β-MYPT1 phosphatase regulates microtubule dynamics in late cytokinesis and de-phosphorylates the kinesin component MKLP1/KIF23 of the centralspindlin complex. This de-phosphorylation antagonizes Aurora B kinase to modify the functions and interactions of centralspindlin in late cytokinesis. Our findings expand the repertoire of PP1 functions during mitosis and indicate that spatiotemporal changes in the distribution of kinases and counteracting phosphatases finely tune the activity of cytokinesis proteins.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/R001227/1; Cancer Research UK: C309/A25144

    Nature communications 2019;10;1;4513

  • Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny.

    Cappellini E, Welker F, Pandolfi L, Ramos-Madrigal J, Samodova D, Rüther PL, Fotakis AK, Lyon D, Moreno-Mayar JV, Bukhsianidze M, Rakownikow Jersie-Christensen R, Mackie M, Ginolhac A, Ferring R, Tappen M, Palkopoulou E, Dickinson MR, Stafford TW, Chan YL, Götherström A, Nathan SKSS, Heintzman PD, Kapp JD, Kirillova I, Moodley Y, Agusti J, Kahlke RD, Kiladze G, Martínez-Navarro B, Liu S, Sandoval Velasco M, Sinding MS, Kelstrup CD, Allentoft ME, Orlando L, Penkman K, Shapiro B, Rook L, Dalén L, Gilbert MTP, Olsen JV, Lordkipanidze D and Willerslev E

    Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark.

    The sequencing of ancient DNA has enabled the reconstruction of speciation, migration and admixture events for extinct taxa [1]. However, the irreversible post-mortem degradation [2] of ancient DNA has so far limited its recovery - outside permafrost areas - to specimens that are not older than approximately 0.5 million years (Myr) [3]. By contrast, tandem mass spectrometry has enabled the sequencing of approximately 1.5-Myr-old collagen type I [4], and suggested the presence of protein residues in fossils of the Cretaceous period [5] - although with limited phylogenetic use [6]. In the absence of molecular evidence, the speciation of several extinct species of the Early and Middle Pleistocene epoch remains contentious. Here we address the phylogenetic relationships of the Eurasian Rhinocerotidae of the Pleistocene epoch [7-9], using the proteome of dental enamel from a Stephanorhinus tooth that is approximately 1.77-Myr old, recovered from the archaeological site of Dmanisi (South Caucasus, Georgia) [10]. Molecular phylogenetic analyses place this Stephanorhinus as a sister group to the clade formed by the woolly rhinoceros (Coelodonta antiquitatis) and Merck's rhinoceros (Stephanorhinus kirchbergensis). We show that Coelodonta evolved from an early Stephanorhinus lineage, and that this latter genus includes at least two distinct evolutionary lines. The genus Stephanorhinus is therefore currently paraphyletic, and its systematic revision is needed. We demonstrate that sequencing the proteome of Early Pleistocene dental enamel overcomes the limitations of phylogenetic inference based on ancient collagen or DNA. Our approach also provides additional information about the sex and taxonomic assignment of other specimens from Dmanisi. Our findings reveal that proteomic investigation of ancient dental enamel - which is the hardest tissue in vertebrates [11], and is highly abundant in the fossil record - can push the reconstruction of molecular evolution further back into the Early Pleistocene epoch, beyond the currently known limits of ancient DNA preservation.

    Funded by: Wellcome Trust

    Nature 2019;574;7776;103-107

  • ZMIZ1 Variants Cause a Syndromic Neurodevelopmental Disorder.

    Carapito R, Ivanova EL, Morlon A, Meng L, Molitor A, Erdmann E, Kieffer B, Pichot A, Naegely L, Kolmer A, Paul N, Hanauer A, Tran Mau-Them F, Jean-Marçais N, Hiatt SM, Cooper GM, Tvrdik T, Muir AM, Dimartino C, Chopra M, Amiel J, Gordon CT, Dutreux F, Garde A, Thauvin-Robinet C, Wang X, Leduc MS, Phillips M, Crawford HP, Kukolich MK, Hunt D, Harrison V, Kharbanda M, Deciphering Developmental Disorders Study, University of Washington Center for Mendelian Genomics, Smigiel R, Gold N, Hung CY, Viskochil DH, Dugan SL, Bayrak-Toydemir P, Joly-Helas G, Guerrot AM, Schluth-Bolard C, Rio M, Wentzensen IM, McWalter K, Schnur RE, Lewis AM, Lalani SR, Mensah-Bonsu N, Céraline J, Sun Z, Ploski R, Bacino CA, Mefford HC, Faivre L, Bodamer O, Chelly J, Isidor B and Bahram S

    Laboratoire d'ImmunoRhumatologie Moléculaire, plateforme GENOMAX, INSERM UMR_S 1109, Faculté de Médecine, Fédération Hospitalo-Universitaire OMICARE, Fédération de Médecine Translationnelle de Strasbourg (FMTS), LabEx TRANSPLANTEX, Université de Strasbourg, 4 rue Kirschleger, 67085 Strasbourg, France; Service d'Immunologie Biologique, Plateau Technique de Biologie, Pôle de Biologie, Nouvel Hôpital Civil, 1 place de l'Hôpital, 67091 Strasbourg, France.

    ZMIZ1 is a coactivator of several transcription factors, including p53, the androgen receptor, and NOTCH1. Here, we report 19 subjects with intellectual disability and developmental delay carrying variants in ZMIZ1. The associated features include growth failure, feeding difficulties, microcephaly, facial dysmorphism, and various other congenital malformations. Of these 19, 14 unrelated subjects carried de novo heterozygous single-nucleotide variants (SNVs) or single-base insertions/deletions, 3 siblings harbored a heterozygous single-base insertion, and 2 subjects had a balanced translocation disrupting ZMIZ1 or involving a regulatory region of ZMIZ1. In total, we identified 13 point mutations that affect key protein regions, including a SUMO acceptor site, a central disordered alanine-rich motif, a proline-rich domain, and a transactivation domain. All identified variants were absent from all available exome and genome databases. In vitro, ZMIZ1 showed impaired coactivation of the androgen receptor. In vivo, overexpression of ZMIZ1 mutant alleles in developing mouse brains using in utero electroporation resulted in abnormal pyramidal neuron morphology, polarization, and positioning, underscoring the importance of ZMIZ1 in neural development and supporting mutations in ZMIZ1 as the cause of a rare neurodevelopmental syndrome.

    Funded by: NCI NIH HHS: R01 CA070297, R01 CA151623; NHGRI NIH HHS: U01 HG007301, U24 HG008956, UM1 HG006493, UM1 HG007301; NICHD NIH HHS: U54 HD083091; NIDDK NIH HHS: R01 DK104941; NINDS NIH HHS: R01 NS069605, R56 NS069605; Wellcome Trust

    American journal of human genetics 2019;104;2;319-330

  • Brown and beige fat: From molecules to physiology and pathophysiology.

    Carobbio S, Guénantin AC, Samuelson I, Bahri M and Vidal-Puig A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK; Metabolic Research Laboratories, Addenbrooke's Treatment Centre, Institute of Metabolic Science, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK.

    The adipose organ portrays adipocytes of diverse tones: white, brown and beige, each type with distinct functions. Adipocytes orchestrate their adaptation and expansion to provide storage to excess nutrients, the quick mobilisation of fuel to supply peripheral functional demands, insulation, and, in their thermogenic form, heat generation to maintain core body temperature. Thermogenic adipocytes could be targets for anti-obesity and anti-diabetic therapeutic approaches aiming to restore adipose tissue functionality and increase energy dissipation. However, for thermogenic adipose tissue to become therapeutically relevant, a better understanding of its development and origins, its progenitors and their characteristics and the composition of its niche, is essential. Also crucial is the identification of stimuli and molecules promoting its specific differentiation and activation. Here we highlight the structural/cellular differences between human and rodent brown adipose tissue and discuss how obesity and metabolic complications affect brown and beige cells as well as how they could be targeted to improve their activation and improve global metabolic homeostasis. Finally, we describe the limitations of current research models and the advantages of new emerging approaches.

    Funded by: British Heart Foundation: PG/12/53/29714, RG/12/13/29853; Medical Research Council: MC_UU_12012/2

    Biochimica et biophysica acta. Molecular and cell biology of lipids 2019;1864;1;37-50

  • Reply to Bombard and Mighton.

    Carrieri D, Howard HC, Clarke AJ, Stefansdottir V, Cornel MC, van El CG and Forzano F

    Egenis, University of Exeter, Exeter, UK.

    European journal of human genetics : EJHG 2019;27;4;507-508

  • Open Targets Platform: new developments and updates two years on.

    Carvalho-Silva D, Pierleoni A, Pignatelli M, Ong C, Fumis L, Karamanis N, Carmona M, Faulconbridge A, Hercules A, McAuley E, Miranda A, Peat G, Spitzer M, Barrett J, Hulcoop DG, Papa E, Koscielny G and Dunham I

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    The Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, animal models and scientific literature to score and rank target-disease associations for drug target identification. The associations are displayed in an intuitive user interface, and are available through a REST-API and a bulk download. In addition to target-disease associations, we also aggregate and display data at the target and disease levels to aid target prioritisation. Since our first publication two years ago, we have made eight releases, added new data sources for target-disease associations, started including causal genetic variants from non genome-wide targeted arrays, added new target and disease annotations, launched new visualisations and improved existing ones and released a new web tool for batch search of up to 200 targets. We have a new URL for the Open Targets Platform REST-API, new REST endpoints and also removed the need for authorisation for API fair use. Here, we present the latest developments of the Open Targets Platform, expanding the evidence and target-disease associations with new and improved data sources, refining data quality, enhancing website usability, and increasing our user base with our training workshops, user support, social media and bioinformatics forum engagement.

    Nucleic acids research 2019;47;D1;D1056-D1065

  • IgG and Fcγ Receptors in Intestinal Immunity and Inflammation.

    Castro-Dopico T and Clatworthy MR

    Molecular Immunity Unit, MRC Laboratory of Molecular Biology, Department of Medicine, University of Cambridge, Cambridge, United Kingdom.

    Fcγ receptors (FcγR) are cell surface glycoproteins that mediate cellular effector functions of immunoglobulin G (IgG) antibodies. Genetic variation in FcγR genes can influence susceptibility to a variety of antibody-mediated autoimmune and inflammatory disorders, including systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA). More recently, however, genetic studies have implicated altered FcγR signaling in the pathogenesis of inflammatory bowel disease (IBD), a condition classically associated with dysregulated innate and T cell immunity. Specifically, a variant of the activating receptor, FcγRIIA, with low affinity for IgG, confers protection against the development of ulcerative colitis, a subset of IBD, leading to a re-evaluation of the role of IgG and FcγRs in gastrointestinal tract immunity, an organ system traditionally associated with IgA. In this review, we summarize our current understanding of IgG and FcγR function at this unique host-environment interface, from the pathogenesis of colitis and defense against enteropathogens, its contribution to maternal-fetal cross-talk and susceptibility to cancer. Finally, we discuss the therapeutic implications of this information, both in terms of how FcγR signaling pathways may be targeted for the treatment of IBD and how FcγR engagement may influence the efficacy of therapeutic monoclonal antibodies in IBD.

    Frontiers in immunology 2019;10;805

  • Mucosal IgG in inflammatory bowel disease - a question of (sub)class?

    Castro-Dopico T and Clatworthy MR

    Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK.

    Immunoglobulins (Igs) form a cornerstone of mucosal immunity. In the gastrointestinal tract, secretory IgA and IgM bind to commensal microorganisms within the intestinal lumen to prevent them from breaching the intestinal epithelium - a process known as immune exclusion. In recent years, there has been renewed interest in the role of IgG in intestinal immunity, driven in part by a genetic association of an affinity-lowering variant of an IgG receptor, FcγRIIA, with protection from ulcerative colitis (UC), a subclass of inflammatory bowel disease (IBD). We recently demonstrated a role for IgG and Fcγ receptor signalling in driving pathogenic IL-1β production by colonic mononuclear phagocytes and the subsequent induction of a local type 17 response in UC. Here, we discuss the potential relevance of our observations to the other major subclass of IBD - Crohn's disease (CD) - where the genetic association with FCGR variants is less robust and consider how this may impact therapeutic interventions in these disease subsets.

    Gut microbes 2019;1-9

  • Anti-commensal IgG Drives Intestinal Inflammation and Type 17 Immunity in Ulcerative Colitis.

    Castro-Dopico T, Dennison TW, Ferdinand JR, Mathews RJ, Fleming A, Clift D, Stewart BJ, Jing C, Strongili K, Labzin LI, Monk EJM, Saeb-Parsy K, Bryant CE, Clare S, Parkes M and Clatworthy MR

    Molecular Immunity Unit, University of Cambridge Department of Medicine, Cambridge CB2 0QH, UK.

    Inflammatory bowel disease is a chronic, relapsing condition with two subtypes, Crohn's disease (CD) and ulcerative colitis (UC). Genome-wide association studies (GWASs) in UC implicate a FCGR2A variant that alters the binding affinity of the antibody receptor it encodes, FcγRIIA, for immunoglobulin G (IgG). Here, we aimed to understand the mechanisms whereby changes in FcγRIIA affinity would affect inflammation in an IgA-dominated organ. We found a profound induction of anti-commensal IgG and a concomitant increase in activating FcγR signaling in the colonic mucosa of UC patients. Commensal-IgG immune complexes engaged gut-resident FcγR-expressing macrophages, inducing NLRP3- and reactive-oxygen-species-dependent production of interleukin-1β (IL-1β) and neutrophil-recruiting chemokines. These responses were modulated by the FCGR2A genotype. In vivo manipulation of macrophage FcγR signal strength in a mouse model of UC determined the magnitude of intestinal inflammation and IL-1β-dependent type 17 immunity. The identification of an important contribution of IgG-FcγR-dependent inflammation to UC has therapeutic implications.

    Funded by: Department of Health: BTRU-2014-10027, RP-2017-08-ST2-002; Medical Research Council: MR/N024907/1, U105181010; Wellcome Trust: 216366/Z/19/Z

    Immunity 2019;50;4;1099-1114.e10

  • New GJA8 variants and phenotypes highlight its critical role in a broad spectrum of eye anomalies.

    Ceroni F, Aguilera-Garcia D, Chassaing N, Bax DA, Blanco-Kelly F, Ramos P, Tarilonte M, Villaverde C, da Silva LRJ, Ballesta-Martínez MJ, Sanchez-Soler MJ, Holt RJ, Cooper-Charles L, Bruty J, Wallis Y, McMullan D, Hoffman J, Bunyan D, Stewart A, Stewart H, Lachlan K, DDD Study, Fryer A, McKay V, Roume J, Dureau P, Saggar A, Griffiths M, Calvas P, Ayuso C, Corton M and Ragge NK

    Faculty of Health and Life Sciences, Oxford Brookes University, Gipsy Lane, Oxford, OX3 0BP, UK.

    GJA8 encodes connexin 50 (Cx50), a transmembrane protein involved in the formation of lens gap junctions. GJA8 mutations have been linked to early onset cataracts in humans and animal models. In mice, missense mutations and homozygous Gja8 deletions lead to smaller lenses and microphthalmia in addition to cataract, suggesting that Gja8 may play a role in both lens development and ocular growth. Following screening of GJA8 in a cohort of 426 individuals with severe congenital eye anomalies, primarily anophthalmia, microphthalmia and coloboma, we identified four known [p.(Thr39Arg), p.(Trp45Leu), p.(Asp51Asn), and p.(Gly94Arg)] and two novel [p.(Phe70Leu) and p.(Val97Gly)] likely pathogenic variants in seven families. Five of these co-segregated with cataracts and microphthalmia, whereas the variant p.(Gly94Arg) was identified in an individual with congenital aphakia, sclerocornea, microphthalmia and coloboma. Four missense variants of unknown or unlikely clinical significance were also identified. Furthermore, the screening of GJA8 structural variants in a subgroup of 188 individuals identified heterozygous 1q21 microdeletions in five families with coloboma and other ocular and/or extraocular findings. However, the exact genotype-phenotype correlation of these structural variants remains to be established. Our data expand the spectrum of GJA8 variants and associated phenotypes, confirming the importance of this gene in early eye development.

    Funded by: Health Innovation Challenge Fund: HICF-1009-003; Spanish Institute of Health Carlos III: CP12/03256; Spanish Ministry of Economy and Competitiveness: SAF2013-46943-R

    Human genetics 2019;138;8-9;1027-1042

  • RTNduals: an R/Bioconductor package for analysis of co-regulation and inference of dual regulons.

    Chagas VS, Groeneveld CS, Oliveira KG, Trefflich S, de Almeida RC, Ponder BAJ, Meyer KB, Jones SJM, Robertson AG and Castro MAA

    Bioinformatics and Systems Biology Lab, Federal University of Paraná, Curitiba, Brazil.

    Motivation: Transcription factors (TFs) are key regulators of gene expression, and can activate or repress multiple target genes, forming regulatory units, or regulons. Understanding downstream effects of these regulators includes evaluating how TFs cooperate or compete within regulatory networks. Here we present RTNduals, an R/Bioconductor package that implements a general method for analyzing pairs of regulons.

    Results: RTNduals identifies a dual regulon when the number of targets shared between a pair of regulators is statistically significant. The package extends the RTN (Reconstruction of Transcriptional Networks) package, and uses RTN transcriptional networks to identify significant co-regulatory associations between regulons. The Supplementary Information reports two case studies for TFs using the METABRIC and TCGA breast cancer cohorts.

    Availability and implementation: RTNduals is written in the R language, and is available from the Bioconductor project.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2019;35;24;5357-5358

  • Mapping imported malaria in Bangladesh using parasite genetic and human mobility data.

    Chang HH, Wesolowski A, Sinha I, Jacob CG, Mahmud A, Uddin D, Zaman SI, Hossain MA, Faiz MA, Ghose A, Sayeed AA, Rahman MR, Islam A, Karim MJ, Rezwan MK, Shamsuzzaman AKM, Jhora ST, Aktaruzzaman MM, Drury E, Gonçalves S, Kekre M, Dhorda M, Vongpromek R, Miotto O, Engø-Monsen K, Kwiatkowski D, Maude RJ and Buckee C

    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States.

    For countries aiming for malaria elimination, travel of infected individuals between endemic areas undermines local interventions. Quantifying parasite importation has therefore become a priority for national control programs. We analyzed epidemiological surveillance data, travel surveys, parasite genetic data, and anonymized mobile phone data to measure the spatial spread of malaria parasites in southeast Bangladesh. We developed a genetic mixing index to estimate the likelihood of samples being local or imported from parasite genetic data and inferred the direction and intensity of parasite flow between locations using an epidemiological model integrating the travel survey and mobile phone calling data. Our approach indicates that, contrary to dogma, frequent mixing occurs in low transmission regions in the southwest, and elimination will require interventions in addition to reducing imported infections from forested regions. Unlike risk maps generated from clinical case counts alone, therefore, our approach distinguishes areas of frequent importation as well as high transmission.

    Funded by: Bill and Melinda Gates Foundation: CPT000390, OPP1118166, OPP1129596; Medical Research Council: G0600718; NIGMS NIH HHS: R35 GM124715, R35GM124715-02, U54 GM088558, U54GM088558; NLM NIH HHS: DP2 LM013102; Wellcome Trust; World Health Organization: 001

    eLife 2019;8

  • IL-7-dependent compositional changes within the γδ T cell pool in lymph nodes during ageing lead to an unbalanced anti-tumour response.

    Chen HC, Eling N, Martinez-Jimenez CP, O'Brien LM, Carbonaro V, Marioni JC, Odom DT and de la Roche M

    Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.

    How the age-associated decline of immune function leads to increased cancer incidence is poorly understood. Here, we have characterised the cellular composition of the γδ T-cell pool in peripheral lymph nodes (pLNs) upon ageing. We find that ageing has minimal cell-intrinsic effects on function and global gene expression of γδ T cells, and γδTCR diversity remains stable. However, ageing alters TCRδ chain usage and clonal structure of γδ T-cell subsets. Importantly, IL-17-producing γδ17 T cells dominate the γδ T-cell pool of aged mice, mainly due to the selective expansion of Vγ6⁺ γδ17 T cells and augmented γδ17 polarisation of Vγ4⁺ T cells. Expansion of the γδ17 T-cell compartment is mediated by increased IL-7 expression in the T-cell zone of old mice. In a Lewis lung cancer model, pro-tumourigenic Vγ6⁺ γδ17 T cells are exclusively activated in the tumour-draining LN and their infiltration into the tumour correlates with increased tumour size in aged mice. Thus, upon ageing, substantial compositional changes in γδ T-cell pool in the pLN lead to an unbalanced γδ T-cell response in the tumour that is associated with accelerated tumour growth.

    Funded by: Cancer Research UK (CRUK): A22231, A22257, A22398, C14303/A17197; EC | H2020 | H2020 Priority Excellent Science | H2020 European Research Council (ERC): 615584; European Bioinformatics Institute (EMBL-EBI); European Molecular Biology Organization (EMBO); Royal Society: WT107609, WT107609/Z/15/Z; Wellcome Sanger Institute: 105031/Z/14/Z; Wellcome Trust (Wellcome): WT098051, WT107609/Z/15/Z

    EMBO reports 2019;e47379

  • Gimpute: an efficient genetic data imputation pipeline.

    Chen J, Lippold D, Frank J, Rayner W, Meyer-Lindenberg A and Schwarz E

    Department of Psychiatry and Psychotherapy, Heidelberg University, Mannheim, Germany.

    Motivation: Genotype imputation is essential for genome-wide association studies (GWAS) to retrieve information of untyped variants and facilitate comparability across studies. However, there is a lack of automated pipelines that perform all required processing steps prior to and following imputation.

    Results: Based on widely used and freely available tools, we have developed Gimpute, an automated processing and imputation pipeline for genome-wide association data. Gimpute includes processing steps for genotype liftOver, quality control, population outlier detection, haplotype pre-phasing, imputation, post imputation, data management and the extension to other existing pipelines.

    Availability and implementation: The Gimpute package is an open source R package and is freely available.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2019;35;8;1433-1435

  • Genome-wide association study of type 2 diabetes in Africa.

    Chen J, Sun M, Adeyemo A, Pirie F, Carstensen T, Pomilla C, Doumatey AP, Chen G, Young EH, Sandhu M, Morris AP, Barroso I, McCarthy MI, Mahajan A, Wheeler E, Rotimi CN and Motala AA

    Wellcome Sanger Institute, Hinxton, Cambridge, UK.

    Aims/hypothesis: Genome-wide association studies (GWAS) for type 2 diabetes have uncovered >400 risk loci, primarily in populations of European and Asian ancestry. Here, we aimed to discover additional type 2 diabetes risk loci (including African-specific variants) and fine-map association signals by performing genetic analysis in African populations.

    Methods: We conducted two type 2 diabetes genome-wide association studies in 4347 Africans from South Africa, Nigeria, Ghana and Kenya and meta-analysed both studies together. Likely causal variants were identified using fine-mapping approaches.

    Results: The most significantly associated variants mapped to the widely replicated type 2 diabetes risk locus near TCF7L2 (p = 5.3 × 10⁻¹³). Fine-mapping of the TCF7L2 locus suggested one type 2 diabetes association signal shared between Europeans and Africans (indexed by rs7903146) and a distinct African-specific signal (indexed by rs17746147). We also detected one novel signal, rs73284431, near AGMO (p = 5.2 × 10⁻⁹, minor allele frequency [MAF] = 0.095; monomorphic in most non-African populations), distinct from previously reported signals in the region. In analyses focused on 100 published type 2 diabetes risk loci, we identified 21 with shared causal variants in African and non-African populations.

    Conclusions/interpretation: These results demonstrate the value of performing GWAS in Africans, provide a resource to larger consortia for further discovery and fine-mapping and indicate that additional large-scale efforts in Africa are warranted to gain further insight into the genetic architecture of type 2 diabetes.

    Funded by: CRGGH: 1ZIAHG200362; MRC: G0601261; NIH HHS: U01DK105535; Wellcome: WT206194

    Diabetologia 2019;62;7;1204-1211

  • Single-cell landscape in mammary epithelium reveals bipotent-like cells associated with breast cancer risk and outcome.

    Chen W, Morabito SJ, Kessenbrock K, Enver T, Meyer KB and Teschendorff AE

    CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031 China.

    Adult stem-cells may serve as the cell-of-origin for cancer, yet their unbiased identification in single cell RNA sequencing data is challenging due to the high dropout rate. In the case of breast, the existence of a bipotent stem-like state is also controversial. Here we apply a marker-free algorithm to scRNA-Seq data from the human mammary epithelium, revealing a high-potency cell-state enriched for an independent mammary stem-cell expression module. We validate this stem-like state in independent scRNA-Seq data. Our algorithm further predicts that the stem-like state is bipotent, a prediction we are able to validate using FACS sorted bulk expression data. The bipotent stem-like state correlates with clinical outcome in basal breast cancer and is characterized by overexpression of YBX1 and ENO1, two modulators of basal breast cancer risk. This study illustrates the power of a marker-free computational framework to identify a novel bipotent stem-like state in the mammary epithelium.

    Communications biology 2019;2;306

  • Missense variants in TAF1 and developmental phenotypes: challenges of determining pathogenicity.

    Cheng H, Capponi S, Wakeling E, Marchi E, Li Q, Zhao M, Weng C, Stefan PG, Ahlfors H, Kleyner R, Rope A, Lumaka A, Lukusa P, Devriendt K, Vermeesch J, Posey JE, Palmer EE, Murray L, Leon E, Diaz J, Worgan L, Mallawaarachchi A, Vogt J, de Munnik SA, Dreyer L, Baynam G, Ewans L, Stark Z, Lunke S, Gonçalves AR, Soares G, Oliveira J, Fassi E, Willing M, Waugh JL, Faivre L, Riviere JB, Moutton S, Mohammed S, Payne K, Walsh L, Begtrup A, Guillen Sacoto MJ, Douglas G, Alexander N, Buckley MF, Mark PR, Adès LC, Sandaradura SA, Lupski JR, Roscioli T, Agrawal PB, Kline AD, Deciphering Developmental Disorders Study, Wang K, Timmers HTM and Lyon GJ

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

    We recently described a new neurodevelopmental syndrome (TAF1/MRXS33 intellectual disability syndrome) (MIM# 300966) caused by pathogenic variants involving the X-linked gene TAF1, which participates in RNA polymerase II transcription. The initial study reported eleven families, and the syndrome was defined as presenting early in life with hypotonia, facial dysmorphia, and developmental delay that evolved into intellectual disability (ID) and/or autism spectrum disorder (ASD). We have now identified an additional 27 families through a genotype-first approach. Familial segregation analysis, clinical phenotyping, and bioinformatics were capitalized on to assess potential variant pathogenicity, and molecular modelling was performed for those variants falling within structurally characterized domains of TAF1. A novel phenotypic clustering approach was also applied, in which the phenotypes of affected individuals were classified using 51 standardized Human Phenotype Ontology (HPO) terms. Phenotypes associated with TAF1 variants show considerable pleiotropy and clinical variability, but prominent among previously unreported effects were brain morphological abnormalities, seizures, hearing loss, and heart malformations. Our allelic series broadens the phenotypic spectrum of TAF1/MRXS33 intellectual disability syndrome and the range of TAF1 molecular defects in humans. It also illustrates the challenges for determining the pathogenicity of inherited missense variants, particularly for genes mapping to chromosome X.

    Funded by: NHGRI NIH HHS: K08 HG008986, U01 HG008680, UM1 HG006542; NLM NIH HHS: R01 LM012895

    Human mutation 2019

  • Genetic variation associated with infection and the environment in the accidental pathogen Burkholderia pseudomallei.

    Chewapreecha C, Mather AE, Harris SR, Hunt M, Holden MTG, Chaichana C, Wuthiekanun V, Dougan G, Day NPJ, Limmathurotsakul D, Parkhill J and Peacock SJ

    Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, 10400 Thailand.

    The environmental bacterium Burkholderia pseudomallei causes melioidosis, an important endemic human disease in tropical and sub-tropical countries. This bacterium occupies broad ecological niches including soil, contaminated water, single-cell microbes, plants and infection in a range of animal species. Here, we performed genome-wide association studies for genetic determinants of environmental and human adaptation using a combined dataset of 1,010 whole genome sequences of B. pseudomallei from Northeast Thailand and Australia, representing two major disease hotspots. With these data, we identified 47 genes from 26 distinct loci associated with clinical or environmental isolates from Thailand and replicated 12 genes in an independent Australian cohort. We next outlined the selective pressures on the genetic loci (dN/dS) and the frequency at which they had been gained or lost throughout their evolutionary history, reflecting the bacterial adaptability to a wide range of ecological niches. Finally, we highlighted loci likely implicated in human disease.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/R012504/1, BBS/E/F/000PR10348, BBS/E/F/000PR10351; Department of Health: HICF-T5-342; Wellcome Trust: 089275/Z/09/Z, 098051, 106698, 216457/Z/19/Z, WT098600

    Communications biology 2019;2;428

  • GATA6 Cooperates with EOMES/SMAD2/3 to Deploy the Gene Regulatory Network Governing Human Definitive Endoderm and Pancreas Formation.

    Chia CY, Madrigal P, Denil SLIJ, Martinez I, Garcia-Bernardo J, El-Khairi R, Chhatriwala M, Shepherd MH, Hattersley AT, Dunn NR and Vallier L

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; Institute of Medical Biology, A*STAR (Agency for Science, Technology and Research), 8A Biomedical Grove, #06-06 Immunos, 138648, Singapore.

    Heterozygous de novo mutations in GATA6 are the most frequent cause of pancreatic agenesis in humans. In mice, however, a similar phenotype requires the biallelic loss of Gata6 and its paralog Gata4. To elaborate the human-specific requirements for GATA6, we chose to model GATA6 loss in vitro by combining both gene-edited and patient-derived pluripotent stem cells (hPSCs) and directed differentiation toward β-like cells. We find that GATA6 heterozygous hPSCs show a modest reduction in definitive endoderm (DE) formation, while GATA6-null hPSCs fail to enter the DE lineage. Consistent with these results, genome-wide studies show that GATA6 binds and cooperates with EOMES/SMAD2/3 to regulate the expression of cardinal endoderm genes. The early deficit in DE is accompanied by a significant reduction in PDX1+ pancreatic progenitors and C-PEPTIDE+ β-like cells. Taken together, our data position GATA6 as a gatekeeper to early human, but not murine, pancreatic ontogeny.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Stem cell reports 2019;12;1;57-70

  • Reasonable expectations of privacy in non-disclosure of familial genetic risk: What is it reasonable to expect?

    Chico V

    Society and Ethics Research Group, Connecting Science, Wellcome Genome Campus, Cambridge, UK.

    Where there is conflict between a patient's interests in non-disclosure of their genetic information to relatives and the relative's interest in knowing the information because it indicates their genetic risk, clinicians have customarily been able to protect themselves against legal action by maintaining confidence even if, professionally, they did not consider this to be the right thing to do. In ABC v St Georges Healthcare NHS Trust ([2017] EWCA Civ 336) the healthcare team recorded their concern about the wisdom of the patient's decision to withhold genetic risk information from his relative, but chose to respect what they considered to be an unwise choice. Even though professional guidance considers that clinicians have the discretion to breach confidence where they believe this to be justified, (Royal College of Physicians, Royal College of Pathologists and the British Society of Human Genetics, 2006; GMC, 2017) clinicians find it difficult to exercise this discretion in line with their convictions against the backdrop of the legal prioritisation of the duty to maintain confidence. Thus, the professional discretion is not being freely exercised because of doubts about the legal protection available in the event of disclosure. The reliance on consent as the legal basis for setting aside the duty of confidence often vetoes sharing information with relatives. This paper argues that an objective approach based on privacy, rather than a subjective consent-based approach, would give greater freedom to clinicians to exercise the discretion which their professional guidance affords.

    Funded by: Wellcome Trust

    European journal of medical genetics 2019;62;5;308-315

  • Genome-wide mutational biases fuel transcriptional diversity in the Mycobacterium tuberculosis complex.

    Chiner-Oms Á, Berney M, Boinett C, González-Candelas F, Young DB, Gagneux S, Jacobs WR, Parkhill J, Cortes T and Comas I

    Unidad Mixta "Infección y Salud Pública" FISABIO-CSISP/Universidad de Valencia, Instituto de Biología Integrativa de Sistemas-I2SysBio, Valencia, Spain.

    The Mycobacterium tuberculosis complex (MTBC) members display different host-specificities and virulence phenotypes. Here, we have performed a comprehensive RNAseq and methylome analysis of the main clades of the MTBC and discovered unique transcriptional profiles. The majority of genes differentially expressed between the clades encode proteins involved in host interaction and metabolic functions. A significant fraction of changes in gene expression can be explained by positive selection on single mutations that either create or disrupt transcriptional start sites (TSS). Furthermore, we show that clinical strains have different methyltransferases inactivated and thus different methylation patterns. Under the tested conditions, differential methylation has a minor direct role on transcriptomic differences between strains. However, disruption of a methyltransferase in one clinical strain revealed important expression differences suggesting indirect mechanisms of expression regulation. Our study demonstrates that variation in transcriptional profiles is mainly due to TSS mutations and has likely evolved due to differences in host characteristics.

    Funded by: NIAID NIH HHS: R01 AI026170, R01 AI139465, R21 AI119573, R37 AI026170; Wellcome Trust

    Nature communications 2019;10;1;3994

  • Genomic determinants of speciation and spread of the Mycobacterium tuberculosis complex.

    Chiner-Oms Á, Sánchez-Busó L, Corander J, Gagneux S, Harris SR, Young D, González-Candelas F and Comas I

    Unidad Mixta "Infección y Salud Pública" FISABIO-CSISP/Universidad de Valencia, Instituto de Biología Integrativa de Sistemas (ISysBio), Valencia, Spain.

    Models on how bacterial lineages differentiate increase our understanding of early bacterial speciation events and the genetic loci involved. Here, we analyze the population genomics events leading to the emergence of the tuberculosis pathogen. The emergence is characterized by a combination of recombination events involving core pathogenesis functions and purifying selection on early diverging loci. We identify the phoR gene, the sensor kinase of a two-component system involved in virulence, as a key functional player subject to pervasive positive selection after the divergence of the Mycobacterium tuberculosis complex from its ancestor. Previous evidence showed that phoR mutations played a central role in the adaptation of the pathogen to different host species. Now, we show that phoR mutations have been under selection during the early spread of human tuberculosis, during later expansions, and in ongoing transmission events. Our results show that linking pathogen evolution across evolutionary and epidemiological time scales points to past and present virulence determinants.

    Science advances 2019;5;6;eaaw3307

  • Dissecting the molecular evolution of fluoroquinolone-resistant Shigella sonnei.

    Chung The H, Boinett C, Pham Thanh D, Jenkins C, Weill FX, Howden BP, Valcanis M, De Lappe N, Cormican M, Wangchuk S, Bodhidatta L, Mason CJ, Nguyen TNT, Ha Thanh T, Voong VP, Duong VT, Nguyen PHL, Turner P, Wick R, Ceyssens PJ, Thwaites G, Holt KE, Thomson NR, Rabaa MA and Baker S

    The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Shigella sonnei increasingly dominates the international epidemiological landscape of shigellosis. Treatment options for S. sonnei are dwindling due to resistance to several key antimicrobials, including the fluoroquinolones. Here we analyse nearly 400 S. sonnei whole genome sequences from both endemic and non-endemic regions to delineate the evolutionary history of the recently emergent fluoroquinolone-resistant S. sonnei. We reaffirm that extant resistant organisms belong to a single clonal expansion event. Our results indicate that sequential accumulation of defining mutations (gyrA-S83L, parC-S80I, and gyrA-D87G) led to the emergence of the fluoroquinolone-resistant S. sonnei population around 2007 in South Asia. This clone was then transmitted globally, resulting in establishments in Southeast Asia and Europe. Mutation analysis suggests that the clone became dominant through enhanced adaptation to oxidative stress. Experimental evolution reveals that under fluoroquinolone exposure in vitro, resistant S. sonnei develops further intolerance to the antimicrobial while the susceptible counterpart fails to attain complete resistance.

    Funded by: Wellcome Trust

    Nature communications 2019;10;1;4828

  • Impact of carbohydrate substrate complexity on the diversity of the human colonic microbiota.

    Chung WSF, Walker AW, Vermeiren J, Sheridan PO, Bosscher D, Garcia-Campayo V, Parkhill J, Flint HJ and Duncan SH

    Gut Health Group, Rowett Institute, University of Aberdeen, Foresterhill, Aberdeen, Scotland, AB25 2ZD, UK.

    The diversity of the colonic microbial community has been linked with health in adults and diet composition is one possible determinant of diversity. We used carefully controlled conditions in vitro to determine how the complexity and multiplicity of growth substrates influence species diversity of the human colonic microbiota. In each experiment, five parallel anaerobic fermenters that received identical faecal inocula were supplied continuously with single carbohydrates (either arabinoxylan-oligosaccharides (AXOS), pectin or inulin) or with a '3-mix' of all three carbohydrates, or with a '6-mix' that additionally contained resistant starch, β-glucan and galactomannan as energy sources. Inulin supported less microbial diversity over the first 6 d than the other two single substrates or the 3- and 6-mixes, showing that substrate complexity is key to influencing microbiota diversity. The communities enriched in these fermenters did not differ greatly at the phylum and family level, but were markedly different at the species level. Certain species were promoted by single substrates, whilst others (such as Bacteroides ovatus, LEfSe P = 0.001) showed significantly greater success with the mixed substrate. The complex polysaccharides such as pectin and arabinoxylan-oligosaccharides promoted greater diversity than simple homopolymers, such as inulin. These findings suggest that dietary strategies intended to achieve health benefits by increasing gut microbiota diversity should employ complex non-digestible substrates and substrate mixtures.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/J012351/1; Wellcome Trust

    FEMS microbiology ecology 2019;95;1

  • How to Find a Resident Kidney Macrophage: the Single-Cell Sequencing Solution.

    Clatworthy MR

    Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK.

    Journal of the American Society of Nephrology : JASN 2019;30;5;715-716

  • Refining the Primrose syndrome phenotype: A study of five patients with ZBTB20 de novo variants and a review of the literature.

    Cleaver R, Berg J, Craft E, Foster A, Gibbons RJ, Hobson E, Lachlan K, Naik S, Sampson JR, Sharif S, Smithson S, Deciphering Developmental Disorders Study, Parker MJ and Tatton-Brown K

    South West Thames Regional Genetics Service, St. George's University Hospitals NHS Foundation Trust, London, United Kingdom.

    Primrose syndrome is a rare autosomal dominant condition caused by heterozygous missense variants within ZBTB20. Through an exome sequencing approach (as part of the Deciphering Developmental Disorders [DDD] study) we have identified five unrelated individuals with previously unreported, de novo ZBTB20 pathogenic missense variants. All five missense variants targeted the C2H2 zinc finger domains. This genotype-up approach has allowed further refinement of the Primrose syndrome phenotype. Major characteristics (>90% individuals) include an intellectual disability (most frequently in the moderate range), a recognizable facial appearance and brain MRI abnormalities, particularly abnormalities of the corpus callosum. Other frequent clinical associations (in 50-90% individuals) include sensorineural hearing loss (83%), hypotonia (78%), cryptorchidism in males (75%), macrocephaly (72%), behavioral issues (56%), and dysplastic/hypoplastic nails (57%). Based upon these clinical data we discuss our current management of patients with Primrose syndrome.

    Funded by: National Institute for Health Research; Wellcome Trust

    American journal of medical genetics. Part A 2019;179;3;344-349

  • A bird's-eye view of Italian genomic variation through whole-genome sequencing.

    Cocca M, Barbieri C, Concas MP, Robino A, Brumat M, Gandin I, Trudu M, Sala CF, Vuckovic D, Girotto G, Matullo G, Polasek O, Kolčić I, Gasparini P, Soranzo N, Toniolo D and Mezzavilla M

    Institute for Maternal and Child Health IRCCS Burlo Garofolo, Trieste, Italy.

    The genomic variation of the Italian peninsula populations is currently under-characterised: the only Italian whole-genome reference is represented by the Tuscans from the 1000 Genomes Project. To address this issue, we sequenced a total of 947 Italian samples from three different geographical areas. First, we defined a new Italian Genome Reference Panel (IGRP1.0) for imputation, which improved imputation accuracy, especially for rare variants, and we tested it by GWAS analysis on red blood traits. Furthermore, we extended the catalogue of genetic variation investigating the level of population structure, the pattern of natural selection, the distribution of deleterious variants and occurrence of human knockouts (HKOs). Overall the results demonstrate a high level of genomic differentiation between cohorts, different signatures of natural selection and a distinctive distribution of deleterious variants and HKOs, confirming the necessity of distinct genome references for the Italian population.

    Funded by: Wellcome Trust

    European journal of human genetics : EJHG 2019;28;4;435-444

  • The MITF-SOX10 regulated long non-coding RNA DIRC3 is a melanoma tumour suppressor.

    Coe EA, Tan JY, Shapiro M, Louphrasitthiphol P, Bassett AR, Marques AC, Goding CR and Vance KW

    Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom.

    The MITF and SOX10 transcription factors regulate the expression of genes important for melanoma proliferation, invasion and metastasis. Despite growing evidence of the contribution of long noncoding RNAs (lncRNAs) in cancer, including melanoma, their functions within MITF-SOX10 transcriptional programmes remain poorly investigated. Here we identify 245 candidate melanoma associated lncRNAs whose loci are co-occupied by MITF-SOX10 and that are enriched at active enhancer-like regions. Our work suggests that one of these, Disrupted In Renal Carcinoma 3 (DIRC3), may be a clinically important MITF-SOX10 regulated tumour suppressor. DIRC3 depletion in human melanoma cells leads to increased anchorage-independent growth, a hallmark of malignant transformation, whilst melanoma patients classified by low DIRC3 expression have decreased survival. DIRC3 is a nuclear lncRNA that activates expression of its neighbouring IGFBP5 tumour suppressor through modulating chromatin structure and suppressing SOX10 binding to putative regulatory elements within the DIRC3 locus. In turn, DIRC3 dependent regulation of IGFBP5 impacts the expression of genes involved in cancer associated processes and is needed for DIRC3 control of anchorage-independent growth. Our work indicates that lncRNA components of MITF-SOX10 networks are an important new class of melanoma regulators and candidate therapeutic targets that can act not only as downstream mediators of MITF-SOX10 function but as feedback regulators of MITF-SOX10 activity.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/N005856/1

    PLoS genetics 2019;15;12;e1008501

  • Missense Variants in the Histone Acetyltransferase Complex Component Gene TRRAP Cause Autism and Syndromic Intellectual Disability.

    Cogné B, Ehresmann S, Beauregard-Lacroix E, Rousseau J, Besnard T, Garcia T, Petrovski S, Avni S, McWalter K, Blackburn PR, Sanders SJ, Uguen K, Harris J, Cohen JS, Blyth M, Lehman A, Berg J, Li MH, Kini U, Joss S, von der Lippe C, Gordon CT, Humberson JB, Robak L, Scott DA, Sutton VR, Skraban CM, Johnston JJ, Poduri A, Nordenskjöld M, Shashi V, Gerkes EH, Bongers EMHF, Gilissen C, Zarate YA, Kvarnung M, Lally KP, Kulch PA, Daniels B, Hernandez-Garcia A, Stong N, McGaughran J, Retterer K, Tveten K, Sullivan J, Geisheker MR, Stray-Pedersen A, Tarpinian JM, Klee EW, Sapp JC, Zyskind J, Holla ØL, Bedoukian E, Filippini F, Guimier A, Picard A, Busk ØL, Punetha J, Pfundt R, Lindstrand A, Nordgren A, Kalb F, Desai M, Ebanks AH, Jhangiani SN, Dewan T, Coban Akdemir ZH, Telegrafi A, Zackai EH, Begtrup A, Song X, Toutain A, Wentzensen IM, Odent S, Bonneau D, Latypova X, Deb W, CAUSES Study, Redon S, Bilan F, Legendre M, Troyer C, Whitlock K, Caluseriu O, Murphree MI, Pichurin PN, Agre K, Gavrilova R, Rinne T, Park M, Shain C, Heinzen EL, Xiao R, Amiel J, Lyonnet S, Isidor B, Biesecker LG, Lowenstein D, Posey JE, Denommé-Pichon AS, Deciphering Developmental Disorders study, Férec C, Yang XJ, Rosenfeld JA, Gilbert-Dussardier B, Audebert-Bellanger S, Redon R, Stessman HAF, Nellaker C, Yang Y, Lupski JR, Goldstein DB, Eichler EE, Bolduc F, Bézieau S, Küry S and Campeau PM

    Centre Hospitalier Universitaire de Nantes, Service de Génétique Médicale, 9 quai Moncousu, 44093 Nantes, France; INSERM, CNRS, UNIV Nantes, l'institut du thorax, 44007 Nantes, France.

    Acetylation of the lysine residues in histones and other DNA-binding proteins plays a major role in regulation of eukaryotic gene expression. This process is controlled by histone acetyltransferases (HATs/KATs) found in multiprotein complexes that are recruited to chromatin by the scaffolding subunit transformation/transcription domain-associated protein (TRRAP). TRRAP is evolutionarily conserved and is among the top five genes intolerant to missense variation. Through an international collaboration, 17 distinct de novo or apparently de novo variants were identified in TRRAP in 24 individuals. A strong genotype-phenotype correlation was observed with two distinct clinical spectra. The first is a complex, multi-systemic syndrome associated with various malformations of the brain, heart, kidneys, and genitourinary system and characterized by a wide range of intellectual functioning; a number of affected individuals have intellectual disability (ID) and markedly impaired basic life functions. Individuals with this phenotype had missense variants clustering around the c.3127G>A p.(Ala1043Thr) variant identified in five individuals. The second spectrum manifested with autism spectrum disorder (ASD) and/or ID and epilepsy. Facial dysmorphism was seen in both groups and included upslanted palpebral fissures, epicanthus, telecanthus, a wide nasal bridge and ridge, a broad and smooth philtrum, and a thin upper lip. RNA sequencing analysis of skin fibroblasts derived from affected individuals showed significant changes in the expression of several genes implicated in neuronal function and ion transport. Thus, we describe here the clinical spectrum associated with TRRAP pathogenic missense variants, and we suggest a genotype-phenotype correlation useful for clinical evaluation of the pathogenicity of the variants.

    Funded by: CIHR; Intramural NIH HHS: Z01 HG200328; Medical Research Council: MR/M014568/1; NHGRI NIH HHS: K08 HG008986, UM1 HG006542; NICHD NIH HHS: R01 HD064667, U54 HD083091, U54 HD086984; NIGMS NIH HHS: T32 GM007266; NIMH NIH HHS: R01 MH101221; NINDS NIH HHS: R35 NS105078, U01 NS053998, U01 NS077274, U01 NS077276, U01 NS077303, U01 NS077364

    American journal of human genetics 2019;104;3;530-541

  • Detection of Cell Surface Ligands for Human Synovial γδ T Cells.

    Collins C, Lui Y, Santos AM, Ballif BA, Gogerly-Moragoda AM, Brouwer H, Ross R, Balagurunathan K, Sharma S, Wright GJ, Davis S and Budd RC

    Vermont Center for Immunology and Infectious Diseases, Department of Medicine, Larner College of Medicine, The University of Vermont, Burlington, VT 05405.

    Lack of understanding of the nature and physiological regulation of γδ T cell ligands has considerably hampered full understanding of the function of these cells. We developed an unbiased approach to identify human γδ T cells ligands by the production of a soluble TCR-γδ (sTCR-γδ) tetramer from a synovial Vδ1 γδ T cell clone from a Lyme arthritis patient. The sTCR-γδ was used in flow cytometry to initially define the spectrum of ligand expression by both human tumor cell lines and certain human primary cells. Analysis of diverse tumor cell lines revealed high ligand expression on several of epithelial or fibroblast origin, whereas those of hematopoietic origin were largely devoid of ligand. This allowed a bioinformatics-based identification of candidate ligands using RNAseq data from each tumor line. We further observed that whereas fresh monocytes and T cells expressed low to negligible levels of TCR-γδ ligands, activation of these cells resulted in upregulation of surface ligand expression. Ligand upregulation on monocytes was partly dependent upon IL-1β. The sTCR-γδ tetramer was then used to bind candidate ligands from lysates of activated monocytes and analyzed by mass spectrometry. Surface TCR-γδ ligand was eliminated by treatment with trypsin or removal of glycosaminoglycans, and also suppressed by inhibition of endoplasmic reticulum-Golgi transport. Of particular interest was that inhibition of glycolysis also blocked TCR-γδ ligand expression. These findings demonstrate the spectrum of ligand(s) expression for human synovial Vδ1 γδ T cells as well as the physiology that regulates their expression.

    Funded by: NHLBI NIH HHS: P01 HL107152; NIAID NIH HHS: R21 AI107298, R21 AI119979; NIGMS NIH HHS: P20 GM103449, P30 GM118228; Wellcome Trust

    Journal of immunology (Baltimore, Md. : 1950) 2019;203;9;2369-2376

  • Common and distinct transcriptional signatures of mammalian embryonic lethality.

    Collins JE, White RJ, Staudt N, Sealy IM, Packham I, Wali N, Tudor C, Mazzeo C, Green A, Siragher E, Ryder E, White JK, Papatheodoru I, Tang A, Füllgrabe A, Billis K, Geyer SH, Weninger WJ, Galli A, Hemberger M, Stemple DL, Robertson E, Smith JC, Mohun T, Adams DJ and Busch-Nentwich EM

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.

    The Deciphering the Mechanisms of Developmental Disorders programme has analysed the morphological and molecular phenotypes of embryonic and perinatal lethal mouse mutant lines in order to investigate the causes of embryonic lethality. Here we show that individual whole-embryo RNA-seq of 73 mouse mutant lines (>1000 transcriptomes) identifies transcriptional events underlying embryonic lethality and associates previously uncharacterised genes with specific pathways and tissues. For example, our data suggest that Hmgxb3 is involved in DNA-damage repair and cell-cycle regulation. Further, we separate embryonic delay signatures from mutant line-specific transcriptional changes by developing a baseline mRNA expression catalogue of wild-type mice during early embryogenesis (4-36 somites). Analysis of transcription outside coding sequence identifies deregulation of repetitive elements in Morc2a mutants and a gene involved in gene-specific splicing. Collectively, this work provides a large scale resource to further our understanding of early embryonic developmental disorders.

    Funded by: Cancer Research UK: 14356; Medical Research Council: MR/L007428/1

    Nature communications 2019;10;1;2792

  • Large-scale neuroanatomical study uncovers 198 gene associations in mouse brain morphogenesis.

    Collins SC, Mikhaleva A, Vrcelj K, Vancollie VE, Wagner C, Demeure N, Whitley H, Kannan M, Balz R, Anthony LFE, Edwards A, Moine H, White JK, Adams DJ, Reymond A, Lelliott CJ, Webber C and Yalcin B

    Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67404, Illkirch, France.

    Brain morphogenesis is an important process contributing to higher-order cognition; however, our knowledge about its biological basis is largely incomplete. Here we analyze 118 neuroanatomical parameters in 1,566 mutant mouse lines and identify 198 genes whose disruptions yield NeuroAnatomical Phenotypes (NAPs), mostly affecting structures implicated in brain connectivity. Groups of functionally similar NAP genes participate in pathways involving the cytoskeleton, the cell cycle and the synapse, display distinct fetal and postnatal brain expression dynamics and importantly, their disruption can yield convergent phenotypic patterns. 17% of human unique orthologues of mouse NAP genes are known loci for cognitive dysfunction. The remaining 83% constitute a vast pool of genes newly implicated in brain architecture, providing the largest study of mouse NAP genes and pathways. This offers a complementary resource to human genetic studies and predicts that many more genes could be involved in mammalian brain morphogenesis.

    Nature communications 2019;10;1;3465

  • Genomic diversity and novel genome-wide association with fruit morphology in Capsicum, from 746k polymorphic sites.

    Colonna V, D'Agostino N, Garrison E, Albrechtsen A, Meisner J, Facchiano A, Cardi T and Tripodi P

    Institute of Genetics and Biophysics, National Research Council (CNR), Naples, Italy.

    Capsicum is one of the major vegetable crops grown worldwide. Current subdivision in clades and species is based on morphological traits and coarse sets of genetic markers. Broad variability of fruits has been driven by breeding programs and has been mainly studied by linkage analysis. We discovered 746k variable sites by sequencing 1.8% of the genome in a collection of 373 accessions belonging to 11 Capsicum species from 51 countries. We describe genomic variation at population-level, confirm major subdivision in clades and species, and show that the known major subdivision of C. annuum separates large and bulky fruits from small ones. In C. annuum, we identify four novel loci associated with phenotypes determining the fruit shape, including a non-synonymous mutation in the gene Longifolia 1-like (CA03g16080). Our collection covers all the economically important species of Capsicum widely used in breeding programs and represents the widest and largest study so far in terms of the number of species and number of genetic variants analyzed. We identified a large set of markers that can be used for population genetic studies and genetic association analyses. Our results provide a comprehensive and precise perspective on genomic variability in Capsicum at population-level and suggest that future fine genetic association studies will yield useful results for breeding.

    Funded by: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020): 677379; Ministero dell'Istruzione, dell'Università e della Ricerca (Ministry of Education, University and Research): PON02_00395_3215002

    Scientific reports 2019;9;1;10067

  • PDX Finder: A portal for patient-derived tumor xenograft model discovery.

    Conte N, Mason JC, Halmagyi C, Neuhauser S, Mosaku A, Yordanova G, Chatzipli A, Begley DA, Krupke DM, Parkinson H, Meehan TF and Bult CC

    European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Patient-derived tumor xenograft (PDX) mouse models are a versatile oncology research platform for studying tumor biology and for testing chemotherapeutic approaches tailored to genomic characteristics of individual patients' tumors. PDX models are generated and distributed by a diverse group of academic labs, multi-institution consortia and contract research organizations. The distributed nature of PDX repositories and the use of different metadata standards for describing model characteristics presents a significant challenge to identifying PDX models relevant to specific cancer research questions. The Jackson Laboratory and EMBL-EBI are addressing these challenges by co-developing PDX Finder, a comprehensive open global catalog of PDX models and their associated datasets. Within PDX Finder, model attributes are harmonized and integrated using a previously developed community minimal information standard to support consistent searching across the originating resources. Links to repositories are provided from the PDX Finder search results to facilitate model acquisition and/or collaboration. The PDX Finder resource currently contains information for 1985 PDX models of diverse cancers including those from large resources such as the Patient-Derived Models Repository, PDXNet and EurOPDX. Individuals or organizations that generate and distribute PDXs are invited to increase the 'findability' of their models by participating in the PDX Finder initiative at

    Funded by: NCI NIH HHS: R01 CA089713, U24 CA204781

    Nucleic acids research 2019;47;D1;D1073-D1079

  • Chemokine receptor trafficking coordinates neutrophil clustering and dispersal at wounds in zebrafish.

    Coombs C, Georgantzoglou A, Walker HA, Patt J, Merten N, Poplimont H, Busch-Nentwich EM, Williams S, Kotsi C, Kostenis E and Sarris M

    Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Site, Cambridge, CB2 3DY, UK.

    Immune cells congregate at specific loci to fight infections during inflammatory responses, a process that must be transient and self-resolving. Cell dispersal promotes resolution, but it remains unclear how transition from clustering to dispersal is regulated. Here we show, using quantitative live imaging in zebrafish, that differential ligand-induced trafficking of chemokine receptors such as Cxcr1 and Cxcr2 orchestrates the state of neutrophil congregation at sites of tissue damage. Through receptor mutagenesis and biosensors, we show that Cxcr1 promotes clustering at wound sites, but is promptly desensitized and internalized, which prevents excess congregation. By contrast, Cxcr2 promotes bidirectional motility and is sustained at the plasma membrane. Persistent plasma membrane residence of Cxcr2 prolongs downstream signaling and is required for sustained exploratory motion conducive to dispersal. Thus, differential trafficking of two chemokine receptors allows coordination of antagonistic cell behaviors, promoting a self-resolving migratory response.

    Funded by: Medical Research Council: MR/L019523/1; Wellcome Trust; Wellcome Trust (Wellcome): 204845/Z/16/Z

    Nature communications 2019;10;1;5166

  • Embryonal precursors of Wilms tumor.

    Coorens THH, Treger TD, Al-Saadi R, Moore L, Tran MGB, Mitchell TJ, Tugnait S, Thevanesan C, Young MD, Oliver TRW, Oostveen M, Collord G, Tarpey PS, Cagan A, Hooks Y, Brougham M, Reynolds BC, Barone G, Anderson J, Jorgensen M, Burke GAA, Visser J, Nicholson JC, Smeulders N, Mushtaq I, Stewart GD, Campbell PJ, Wedge DC, Martincorena I, Rampling D, Hook L, Warren AY, Coleman N, Chowdhury T, Sebire N, Drost J, Saeb-Parsy K, Stratton MR, Straathof K, Pritchard-Jones K and Behjati S

    Wellcome Sanger Institute, Hinxton CB10 1SA, UK.

    Adult cancers often arise from premalignant clonal expansions. Whether the same is true of childhood tumors has been unclear. To investigate whether Wilms tumor (nephroblastoma; a childhood kidney cancer) develops from a premalignant background, we examined the phylogenetic relationship between tumors and corresponding normal tissues. In 14 of 23 cases studied (61%), we found premalignant clonal expansions in morphologically normal kidney tissues that preceded tumor development. These clonal expansions were defined by somatic mutations shared between tumor and normal tissues but absent from blood cells. We also found hypermethylation of the H19 locus, a known driver of Wilms tumor development, in 58% of the expansions. Phylogenetic analyses of bilateral tumors indicated that clonal expansions can evolve before the divergence of left and right kidney primordia. These findings reveal embryonal precursors from which unilateral and multifocal cancers develop.

    Funded by: Wellcome Trust: 110104

    Science (New York, N.Y.) 2019;366;6470;1247-1251

  • Bacterial Population Genomics

    Corander, J., Croucher, N.J., Harris, S.R., Lees, J.A. and Tonkin‐Hill, G.

    In: Handbook of Statistical Genomics, 4th edition. Balding, D., Moltke, I. and Marioni, J. (eds.), Wiley, Hoboken, NJ, USA 2019;997-1020

  • Identification and validation of two peptide markers for the recognition of Clostridioides difficile MLST-1 and MLST-11 by MALDI-MS.

    Corver J, Sen J, Hornung BVH, Mertens BJ, Berssenbrugge EKL, Harmanus C, Sanders IMJG, Kumar N, Lawley TD, Kuijper EJ, Hensbergen PJ and Nicolardi S

    Leiden University Medical Centre, Centre of Infectious Diseases, Department Medical Microbiology, Section Experimental Bacteriology, Leiden, the Netherlands; Centre for Microbiota Analysis and Therapeutics, Department Medical Microbiology, Leiden University, Leiden, the Netherlands.

    Objectives: Clostridioides difficile infection (CDI) has become the main cause of nosocomial infective diarrhoea. To survey and control the spread of different C. difficile strains, there is a need for suitable rapid tests. The aim of this study was to identify peptide/protein markers for the rapid recognition of C. difficile strains by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS).

    Methods: We analysed 44 well-characterized strains, belonging to eight different multi-locus sequence types (MLST), using ultrahigh-resolution Fourier transform ion cyclotron resonance (FTICR) MS. The amino acid sequence of two peptide markers specific for MLST-1 and MLST-11 strains was elucidated by MALDI-TOF-MS/MS. The investigation of 2689 C. difficile genomes allowed the determination of the sensitivity and specificity of these markers. C18-solid-phased extraction was used to enrich the MLST-1 marker.

    Results: Two peptide markers (m/z 4927.81 and m/z 5001.84) were identified and characterized for MLST-1 and MLST-11 strains, respectively. The MLST-1 marker was found in 786 genomes of which three did not belong to MLST-1. The MLST-11 marker was found in 319 genomes, of which 14 did not belong to MLST-11. Importantly, all MLST-1 and MLST-11 genomes were positive for their respective marker. Furthermore, a peptide marker (m/z 5015.86) specific for MLST-15 was found in 59 genomes. We translated our findings into a fast and simple method that allowed the unambiguous identification of the MLST-1 marker on a MALDI-TOF-MS platform.

    Conclusions: MALDI-FTICR MS-based peptide profiling resulted in the identification of peptide markers for C. difficile MLST-1 and MLST-11.
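    The genome counts reported in the Results above are enough to work out sensitivity, specificity and positive predictive value for each marker. The following is a minimal sketch of that arithmetic, not the authors' analysis code. It assumes that both markers were evaluated against the same 2689 genomes and that "all genomes were positive for their respective marker" means zero false negatives; the function name marker_metrics is hypothetical.

```python
def marker_metrics(total_genomes, marker_positive, false_positive, false_negative=0):
    """Return (sensitivity, specificity, ppv) from simple genome counts.

    total_genomes   -- all genomes screened (assumed 2689 for both markers)
    marker_positive -- genomes in which the marker was found
    false_positive  -- marker-positive genomes outside the target MLST
    false_negative  -- target-MLST genomes lacking the marker (assumed 0)
    """
    true_positive = marker_positive - false_positive
    true_negative = total_genomes - marker_positive - false_negative
    sensitivity = true_positive / (true_positive + false_negative)
    specificity = true_negative / (true_negative + false_positive)
    ppv = true_positive / marker_positive
    return sensitivity, specificity, ppv


# Counts taken from the abstract: 786 marker-positive genomes with 3 outside
# MLST-1, and 319 marker-positive genomes with 14 outside MLST-11.
mlst1 = marker_metrics(2689, 786, 3)
mlst11 = marker_metrics(2689, 319, 14)

for name, (sens, spec, ppv) in (("MLST-1", mlst1), ("MLST-11", mlst11)):
    print(f"{name}: sensitivity={sens:.4f} specificity={spec:.4f} ppv={ppv:.4f}")
```

    Under these assumptions the specificity denominators are 1906 non-MLST-1 genomes and 2384 non-MLST-11 genomes; a different reference set would change those figures.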

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2019;25;7;904.e1-904.e7

  • Pathogenomics of Emerging Campylobacter Species.

    Costa D and Iraola G

    Microbial Genomics Laboratory, Institut Pasteur de Montevideo, Montevideo, Uruguay.

    Campylobacter is among the four main causes of gastroenteritis worldwide and has increased in both developed and developing countries over the last 10 years. The vast majority of reported Campylobacter infections are caused by Campylobacter jejuni and, to a lesser extent, C. coli; however, the increasing recognition of other emerging Campylobacter pathogens is urgently demanding a better understanding of how these underestimated species cause disease, transmit, and evolve. In parallel to the enhanced clinical awareness of campylobacteriosis due to improved diagnostic protocols, the application of high-throughput sequencing has increased the number of whole-genome sequences available to dozens of strains of many emerging campylobacters. This has allowed for comprehensive comparative pathogenomic analyses for several species, such as C. fetus and C. concisus. These studies have started to reveal the evolutionary forces shaping their genomes and have brought to light many genomic features related to pathogenicity in these neglected species, promoting the development of new tools and approaches relevant for clinical microbiology. Despite the need for additional characterization of genomic diversity in emerging campylobacters, the increasing body of literature describing pathogenomic studies on these species deserves to be discussed from an integrative perspective. This review compiles the current knowledge and highlights future work toward deepening our understanding about genome dynamics and the mechanisms governing the evolution of pathogenicity in emerging Campylobacter species, which is urgently needed to develop strategies to prevent or control the spread of these pathogens.

    Clinical microbiology reviews 2019;32;4

  • The Contribution of Genetic Variation of Streptococcus pneumoniae to the Clinical Manifestation of Invasive Pneumococcal Disease.

    Cremers AJH, Mobegi FM, van der Gaast-de Jongh C, van Weert M, van Opzeeland FJ, Vehkala M, Knol MJ, Bootsma HJ, Välimäki N, Croucher NJ, Meis JF, Bentley S, van Hijum SAFT, Corander J, Zomer AL, Ferwerda G and de Jonge MI

    Section of Pediatric Infectious Diseases, Laboratory of Medical Immunology, Radboud Institute for Molecular Life Sciences, Nijmegen, The Netherlands.

    Background: Different clinical manifestations of invasive pneumococcal disease (IPD) have thus far mainly been explained by patient characteristics. Here we studied the contribution of pneumococcal genetic variation to IPD phenotype.

    Methods: The index cohort consisted of 349 patients admitted to 2 Dutch hospitals between 2000-2011 with pneumococcal bacteremia. We performed genome-wide association studies to identify pneumococcal lineages, genes, and allelic variants associated with 23 clinical IPD phenotypes. The identified associations were validated in a nationwide (n = 482) and a post-pneumococcal vaccination cohort (n = 121). The contribution of confirmed pneumococcal genotypes to the clinical IPD phenotype, relative to known clinical predictors, was tested by regression analysis.

    Results: Among IPD patients, the presence of pneumococcal gene slaA was a nationwide confirmed independent predictor of meningitis (odds ratio [OR], 10.5; P = .001), as was sequence cluster 9 (serotype 7F: OR, 3.68; P = .057). A set of 4 pneumococcal genes co-located on a prophage was a confirmed independent predictor of 30-day mortality (OR, 3.4; P = .003). We could detect the pneumococcal variants of concern in these patients' blood samples.

    Conclusions: In this study, knowledge of pneumococcal genotypic variants improved the clinical risk assessment for detrimental manifestations of IPD. This provides us with novel opportunities to target, anticipate, or avert the pathogenic effects related to particular pneumococcal variants, and indicates that information on pneumococcal genotype is important for the diagnostic and treatment strategy in IPD. Ongoing surveillance is warranted to monitor the clinical value of information on pneumococcal variants in dynamic microbial and susceptible host populations.

    Funded by: Medical Research Council: MR/R015600/1

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2019;68;1;61-69

  • The speciation and hybridization history of the genus Salmonella.

    Criscuolo A, Issenhuth-Jeanjean S, Didelot X, Thorell K, Hale J, Parkhill J, Thomson NR, Weill FX, Falush D and Brisse S

    Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France.

    Bacteria and archaea make up most of natural diversity, but the mechanisms that underlie the origin and maintenance of prokaryotic species are poorly understood. We investigated the speciation history of the genus Salmonella, an ecologically diverse bacterial lineage, within which S. enterica subsp. enterica is responsible for important human food-borne infections. We performed a survey of diversity across a large reference collection using multilocus sequence typing, followed by genome sequencing of distinct lineages. We identified 11 distinct phylogroups, 3 of which were previously undescribed. Strains assigned to S. enterica subsp. salamae are polyphyletic, with two distinct lineages that we designate Salamae A and B. Strains of the subspecies houtenae are subdivided into two groups, Houtenae A and B, and are both related to Selander's group VII. A phylogroup we designate VIII was previously unknown. A simple binary fission model of speciation cannot explain observed patterns of sequence diversity. In the recent past, there have been large-scale hybridization events involving an unsampled ancestral lineage and three distantly related lineages of the genus that have given rise to Houtenae A, Houtenae B and VII. We found no evidence for ongoing hybridization in the other eight lineages, but detected subtler signals of ancient recombination events. We are unable to fully resolve the speciation history of the genus, which might have involved additional speciation-by-hybridization or multi-way speciation events. Our results imply that traditional models of speciation by binary fission and divergence are not sufficient to account for Salmonella evolution.

    Microbial genomics 2019;5;8

  • Systematic screening of 96 Schistosoma mansoni cell-surface and secreted antigens does not identify any strongly protective vaccine candidates in a mouse model of infection.

    Crosnier C, Brandt C, Rinaldi G, McCarthy C, Barker C, Clare S, Berriman M and Wright GJ

    Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

    Background: Schistosomiasis is a major parasitic disease affecting people living in tropical and sub-tropical areas. Transmission of the parasite has been reported in 78 countries, causing significant morbidity and around 200,000 deaths per year in endemic regions. The disease is currently managed by the mass-administration of praziquantel to populations at risk of infection; however, the reliance on a single drug raises the prospect of parasite resistance to the only treatment widely available. The development of an effective vaccine would be a more powerful method of control, but none currently exists and the identification of new immunogens that can elicit protective immune responses therefore remains a priority. Because of the complex nature of the parasite life cycle, identification of new vaccine candidates has mostly relied on the use of animal models and on a limited set of recombinant proteins.

    Methods: In this study, we have established an infrastructure for testing a large number of vaccine candidates in mice and used it to screen 96 cell-surface and secreted recombinant proteins from Schistosoma mansoni. This approach, using standardised immunisation and percutaneous infection protocols, allowed us to compare an extensive set of antigens in a systematic manner.

    Results: Although some vaccine candidates were associated with a statistically significant reduction in the number of eggs in the initial screens, these observations could not be repeated in subsequent challenges and none of the proteins studied were associated with a strongly protective effect against infection.

    Conclusions: Although no antigens individually induced reproducible and strongly protective effects using our vaccination regime, we have established the experimental infrastructures to facilitate large-scale systematic subunit vaccine testing for schistosomiasis in a murine infection model.

    Funded by: Wellcome Trust

    Wellcome open research 2019;4;159

  • Oral delivery of the anti-tumor necrosis factor α domain antibody, V565, results in high intestinal and fecal concentrations with minimal systemic exposure in cynomolgus monkeys.

    Crowe JS, Roberts KJ, Carlton TM, Maggiore L, Cubitt MF, Ray KP, Donnelly MC, Wahlich JC, Humphreys JI, Robinson JR, Whale GA and West MR

    VHsquared Ltd., Babraham, UK.

    Objective: V565 is a novel oral anti-tumor necrosis factor (TNF)-α domain antibody being developed for topical treatment of inflammatory bowel disease (IBD) patients. Protein engineering rendered the molecule resistant to intestinal proteases. Here we investigate the formulation of V565 required to provide gastro-protection and enable optimal delivery to the lower intestinal tract in monkeys.

    Methods: Enteric-coated V565 mini-tablets were prepared and dissolution characteristics tested in vitro. Oral dosing of monkeys with enteric-coated mini-tablets containing V565 and methylene blue dye enabled in vivo localization of mini-tablet dissolution. V565 distribution in luminal contents and feces was measured by enzyme-linked immunosorbent assay (ELISA). To mimic transit across the damaged intestinal epithelium seen in IBD patients an intravenous (i.v.) bolus of V565 was given to monkeys and pharmacokinetic parameters of V565 measured in serum and urine by ELISA.

    Results: Enteric-coated mini-tablets resisted dissolution in 0.1 M HCl, before dissolving in a sustained release fashion at neutral pH. In orally dosed monkeys methylene blue intestinal staining indicated the jejunum and ileum as sites for mini-tablet dissolution. Measurements of V565 in monkey feces confirmed V565 survival through the intestinal tract. Systemic exposure after oral dosing was very low consistent with limited V565 mucosal penetration in healthy monkeys. The rapid clearance of V565 after i.v. dosing was consistent with renal excretion as the primary route for elimination of any V565 reaching the circulation.

    Conclusions: These results suggest that mini-tablets with a 24% Eudragit enteric coating are suitable for targeted release of orally delivered V565 in the intestine for topical treatment of IBD.

    Drug development and industrial pharmacy 2019;45;3;387-394

  • Cyt-Geist: Current and Future Challenges in Cytometry: Reports of the CYTO 2019 Conference Workshops.

    Czechowska K, Lannigan J, Aghaeepour N, Back JB, Begum J, Behbehani G, Bispo C, Bitoun D, Fernández AB, Boova ST, Brinkman RR, Ciccolella CO, Cotleur B, Davies D, Dela Cruz GV, Del Rio-Guerra R, Des Lauriers-Cox AM, Douagi I, Dumrese C, Bonilla Escobar DL, Estevam J, Ewald C, Fossum A, Gaudillière B, Green C, Groves C, Hall C, Haque Y, Hedrick MN, Hogg K, Hsieh EWY, Irish J, Lederer J, Leipold M, Lewis-Tuffin LJ, Litwin V, Lopez P, Nasdala I, Nedbal J, Ohlsson-Wilhelm BM, Price KM, Rahman AH, Rayanki R, Rieger AM, Robinson JP, Shapiro H, Sun YS, Tang VA, Tesfa L, Telford WG, Walker R, Welsh JA, Wheeler P and Tárnok A

    Altmattstrasse 1B, 6418, Rothenthurm, SZ, Switzerland.

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2019;95;12;1236-1274

  • Cyt-Geist: Current and Future Challenges in Cytometry: Reports of the CYTO 2018 Conference Workshops.

    Czechowska K, Lannigan J, Wang L, Arcidiacono J, Ashhurst TM, Barnard RM, Bauer S, Bispo C, Bonilla DL, Brinkman RR, Cabanski M, Chang HD, Chakrabarti L, Chojnowski G, Cotleur B, Degheidy H, Dela Cruz GV, Eck S, Elliott J, Errington R, Filby A, Gagnon D, Gardner R, Green C, Gregory M, Groves CJ, Hall C, Hammes F, Hedrick M, Hoffman R, Icha J, Ivaska J, Jenner DC, Jones D, Kerckhof FM, Kukat C, Lanham D, Leavesley S, Lee M, Lin-Gibson S, Litwin V, Liu Y, Molloy J, Moore JS, Müller S, Nedbal J, Niesner R, Nitta N, Ohlsson-Wilhelm B, Paul NE, Perfetto S, Portat Z, Props R, Radtke S, Rayanki R, Rieger A, Rogers S, Rubbens P, Salomon R, Schiemann M, Sharpe J, Sonder SU, Stewart JJ, Sun Y, Ulrich H, Van Isterdael G, Vitaliti A, van Vreden C, Weber M, Zimmermann J, Vacca G, Wallace P and Tárnok A

    Altmattstrasse 1B, 6418 Rothenthurm, Switzerland.

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2019;95;6;598-644

  • The male mosquito contribution towards malaria transmission: Mating influences the Anopheles female midgut transcriptome and increases female susceptibility to human malaria parasites.

    Dahalan FA, Churcher TS, Windbichler N and Lawniczak MKN

    Imperial College London, South Kensington, United Kingdom.

    Mating causes dramatic changes in female physiology, behaviour, and immunity in many insects, inducing oogenesis, oviposition, and refractoriness to further mating. Females from the Anopheles gambiae species complex typically mate only once in their lifetime during which they receive sperm and seminal fluid proteins as well as a mating plug that contains the steroid hormone 20-hydroxyecdysone. This hormone, which is also induced by blood-feeding, plays a major role in activating vitellogenesis for egg production. Here we show that female Anopheles coluzzii susceptibility to Plasmodium falciparum infection is significantly higher in mated females compared to virgins. We also find that mating status has a major impact on the midgut transcriptome, detectable only under sugar-fed conditions: once females have blood-fed, the transcriptional changes that are induced by mating are likely masked by the widespread effects of blood-feeding on gene expression. To determine whether increased susceptibility to parasites could be driven by the additional 20E that mated females receive from males, we mimicked mating by injecting virgin females with 20E, finding that these females are significantly more susceptible to human malaria parasites than virgin females injected with the control 20E carrier. Further RNAseq was carried out to examine whether the genes that change upon 20E injection in the midgut are similar to those that change upon mating. We find that 79 midgut-expressed genes are regulated in common by both mating and 20E, and 96% (n = 76) of these are regulated in the same direction (up vs down in 20E/mated). Together, these findings show that male Anopheles mosquitoes induce changes in the female midgut that can affect female susceptibility to P. falciparum. This implies that in nature, males might contribute to malaria transmission in previously unappreciated ways, and that vector control strategies that target males may have additional benefits towards reducing transmission.

    PLoS pathogens 2019;15;11;e1008063

  • Reverse GWAS: Using genetics to identify and model phenotypic subtypes.

    Dahl A, Cai N, Ko A, Laakso M, Pajukanta P, Flint J and Zaitlen N

    Department of Medicine, UCSF, San Francisco, California, United States of America.

    Recent and classical work has revealed biologically and medically significant subtypes in complex diseases and traits. However, relevant subtypes are often unknown, unmeasured, or actively debated, making automated statistical approaches to subtype definition valuable. We propose reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. Unlike existing approaches relying on off-the-shelf clustering methods, RGWAS uses a novel decomposition, MFMR, to model covariates, binary traits, and population structure. We use extensive simulations to show that modelling these features can be crucial for power and calibration. We validate RGWAS in practice by recovering a recently discovered stress subtype in major depression. We then show the utility of RGWAS by identifying three novel subtypes of metabolic traits. We biologically validate these metabolic subtypes with SNP-level tests and a novel polygenic test: the former recover known metabolic GxE SNPs; the latter suggests subtypes may explain substantial missing heritability. Crucially, statins, which are widely prescribed and theorized to increase diabetes risk, have opposing effects on blood glucose across metabolic subtypes, suggesting the subtypes have potential translational value.

    Funded by: NIDDK NIH HHS: U01 DK105561

    PLoS genetics 2019;15;4;e1008009

  • NOTCH1 Represses MCL-1 Levels in GSI-resistant T-ALL, Making them Susceptible to ABT-263.

    Dastur A, Choi A, Costa C, Yin X, Williams A, McClanaghan J, Greenberg M, Roderick J, Patel NU, Boisvert J, McDermott U, Garnett MJ, Almenara J, Grant S, Rizzo K, Engelman JA, Kelliher M, Faber AC and Benes CH

    Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, Massachusetts.

    Purpose: Effective targeted therapies are lacking for refractory and relapsed T-cell acute lymphoblastic leukemia (T-ALL). Suppression of the NOTCH pathway using gamma-secretase inhibitors (GSI) is toxic and clinically not effective. The goal of this study was to identify alternative therapeutic strategies for T-ALL.

    Experimental design: We performed a comprehensive analysis of our high-throughput drug screen across hundreds of human cell lines including 15 T-ALL models. We validated and further studied the top hit, navitoclax (ABT-263). We used multiple human T-ALL cell lines as well as primary patient samples, and performed both in vitro experiments and in vivo studies on patient-derived xenograft models.

    Results: We found that T-ALL are hypersensitive to navitoclax, an inhibitor of BCL2 family of antiapoptotic proteins. Importantly, GSI-resistant T-ALL are also susceptible to navitoclax. Sensitivity to navitoclax is due to low levels of MCL-1 in T-ALL. We identify an unsuspected regulation of mTORC1 by the NOTCH pathway, resulting in increased MCL-1 upon GSI treatment. Finally, we show that pharmacologic inhibition of mTORC1 lowers MCL-1 levels and further sensitizes cells to navitoclax in vitro and leads to tumor regressions in vivo.

    Conclusions: Our results support the development of navitoclax, as single agent and in combination with mTOR inhibitors, as a new therapeutic strategy for T-ALL, including in the setting of GSI resistance.

    Funded by: NCI NIH HHS: K22 CA175276, R01 CA096899, R01 CA140594, R01 CA167708, R01 CA205607, R01 CA215610; Wellcome Trust: 086357, 102696

    Clinical cancer research : an official journal of the American Association for Cancer Research 2019;25;1;312-324

  • Epidemic of carbapenem-resistant Klebsiella pneumoniae in Europe is driven by nosocomial spread.

    David S, Reuter S, Harris SR, Glasner C, Feltwell T, Argimon S, Abudahab K, Goater R, Giani T, Errico G, Aspbury M, Sjunnebo S, EuSCAPE Working Group, ESGEM Study Group, Feil EJ, Rossolini GM, Aanensen DM and Grundmann H

    Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Cambridge, UK.

    Public health interventions to control the current epidemic of carbapenem-resistant Klebsiella pneumoniae rely on a comprehensive understanding of its emergence and spread over a wide range of geographical scales. We analysed the genome sequences and epidemiological data of >1,700 K. pneumoniae samples isolated from patients in 244 hospitals in 32 countries during the European Survey of Carbapenemase-Producing Enterobacteriaceae. We demonstrate that carbapenemase acquisition is the main cause of carbapenem resistance and that it occurred across diverse phylogenetic backgrounds. However, 477 of 682 (69.9%) carbapenemase-positive isolates are concentrated in four clonal lineages, sequence types 11, 15, 101, 258/512 and their derivatives. Combined analysis of the genetic and geographic distances between isolates with different β-lactam resistance determinants suggests that the propensity of K. pneumoniae to spread in hospital environments correlates with the degree of resistance and that carbapenemase-positive isolates have the highest transmissibility. Indeed, we found that over half of the hospitals that contributed carbapenemase-positive isolates probably experienced within-hospital transmission, and interhospital spread is far more frequent within, rather than between, countries. Finally, we propose a value of 21 for the number of single nucleotide polymorphisms that optimizes the discrimination of hospital clusters and detail the international spread of the successful epidemic lineage, ST258/512.
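    The 21-SNP value proposed in the abstract can be read as a cut-off for grouping isolates into putative hospital clusters. The sketch below illustrates one way such a threshold might be applied, using single-linkage grouping over pairwise SNP distances. The clustering rule, the inclusive comparison (at most 21 SNPs), the function names and the toy sequences are all assumptions made for illustration; the abstract does not describe the authors' procedure.

```python
from itertools import combinations

SNP_THRESHOLD = 21  # value proposed in the abstract; inclusive use is an assumption


def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def cluster_isolates(sequences, threshold=SNP_THRESHOLD):
    """Single-linkage clusters: isolates joined by any chain of close pairs."""
    parent = {name: name for name in sequences}

    def find(name):
        # Walk to the cluster representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if snp_distance(sequences[a], sequences[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in sequences:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=sorted)


# Toy alignment (hypothetical isolates); a threshold of 1 is used only
# because the example sequences are four bases long.
toy = {"isolate_A": "AAAA", "isolate_B": "AAAT", "isolate_C": "TTTT"}
print(cluster_isolates(toy, threshold=1))
```

    Single linkage is a deliberately simple choice here; it can chain distant isolates together through intermediates, so a real analysis would also weigh epidemiological data such as hospital and sampling date.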

    Funded by: Medical Research Council: MR/R014922/1, MR/S004769/1

    Nature microbiology 2019;4;11;1919-1929

  • Epigenetic modifiers DNMT3A and BCOR are recurrently mutated in CYLD cutaneous syndrome.

    Davies HR, Hodgson K, Schwalbe E, Coxhead J, Sinclair N, Zou X, Cockell S, Husain A, Nik-Zainal S and Rajan N

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Patients with CYLD cutaneous syndrome (CCS; syn. Brooke-Spiegler syndrome) carry germline mutations in the tumor suppressor CYLD and develop multiple skin tumors with diverse histophenotypes. Here, we comprehensively profile the genomic landscape of 42 benign and malignant tumors across 13 individuals from four multigenerational families and discover recurrent mutations in epigenetic modifiers DNMT3A and BCOR in 29% of benign tumors. Multi-level and microdissected sampling strikingly reveal that many clones with different DNMT3A mutations exist in these benign tumors, suggesting that intra-tumor heterogeneity is common. Integrated genomic, methylation and transcriptomic profiling in selected tumors suggest that isoform-specific DNMT3A2 mutations are associated with dysregulated methylation. Phylogenetic and mutational signature analyses confirm cylindroma pulmonary metastases from primary skin tumors. These findings contribute to existing paradigms of cutaneous tumorigenesis and metastasis.

    Funded by: Cancer Research UK: C60100/A25274; Wellcome Trust (Wellcome): WT097163MA

    Nature communications 2019;10;1;4717

  • Atlas of group A streptococcal vaccine candidates compiled using large-scale comparative genomics.

    Davies MR, McIntyre L, Mutreja A, Lacey JA, Lees JA, Towers RJ, Duchêne S, Smeesters PR, Frost HR, Price DJ, Holden MTG, David S, Giffard PM, Worthing KA, Seale AC, Berkley JA, Harris SR, Rivera-Hernandez T, Berking O, Cork AJ, Torres RSLA, Lithgow T, Strugnell RA, Bergmann R, Nitsche-Schmitz P, Chhatwal GS, Bentley SD, Fraser JD, Moreland NJ, Carapetis JR, Steer AC, Parkhill J, Saul A, Williamson DA, Currie BJ, Tong SYC, Dougan G and Walker MJ

    Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne and The Royal Melbourne Hospital, Melbourne, Victoria, Australia.

    Group A Streptococcus (GAS; Streptococcus pyogenes) is a bacterial pathogen for which a commercial vaccine for humans is not available. Employing the advantages of high-throughput DNA sequencing technology to vaccine design, we have analyzed 2,083 globally sampled GAS genomes. The global GAS population structure reveals extensive genomic heterogeneity driven by homologous recombination and overlaid with high levels of accessory gene plasticity. We identified the existence of more than 290 clinically associated genomic phylogroups across 22 countries, highlighting challenges in designing vaccines of global utility. To determine vaccine candidate coverage, we investigated all of the previously described GAS candidate antigens for gene carriage and gene sequence heterogeneity. Only 15 of 28 vaccine antigen candidates were found to have both low naturally occurring sequence variation and high (>99%) coverage across this diverse GAS population. This technological platform for vaccine coverage determination is equally applicable to prospective GAS vaccine antigens identified in future studies.

    Funded by: Wellcome Trust: 098051, 206194

    Nature genetics 2019;51;6;1035-1043

  • Highly Diverse Hepatitis C Strains Detected in Sub-Saharan Africa Have Unknown Susceptibility to Direct-Acting Antiviral Treatments.

    Davis C, Mgomella GS, da Silva Filipe A, Frost EH, Giroux G, Hughes J, Hogan C, Kaleebu P, Asiki G, McLauchlan J, Niebel M, Ocama P, Pomila C, Pybus OG, Pépin J, Simmonds P, Singer JB, Sreenu VB, Wekesa C, Young EH, Murphy DG, Sandhu M and Thomson EC

    Medical Research Council - University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.

    The global plan to eradicate hepatitis C virus (HCV) led by the World Health Organization outlines the use of highly effective direct-acting antiviral drugs (DAAs) to achieve elimination by 2030. Identifying individuals with active disease and investigation of the breadth of diversity of the virus in sub-Saharan Africa (SSA) is essential as genotypes in this region (where very few clinical trials have been carried out) are distinct from those found in other parts of the world. We undertook a population-based, nested case-control study in Uganda and obtained additional samples from the Democratic Republic of Congo (DRC) to estimate the prevalence of HCV, assess strategies for disease detection using serological and molecular techniques, and characterize genetic diversity of the virus. Using next-generation and Sanger sequencing, we aimed to identify strains circulating in East and Central Africa. A total of 7,751 Ugandan patients were initially screened for HCV, and 20 PCR-positive samples were obtained for sequencing. Serological assays were found to vary significantly in specificity for HCV. HCV strains detected in Uganda included genotype (g) 4k, g4p, g4q, and g4s and a newly identified unassigned g7 HCV strain. Two additional unassigned g7 strains were identified in patients originating from DRC (one partial and one full open reading frame sequence). These g4 and g7 strains contain nonstructural (ns) protein 3 and 5A polymorphisms associated with resistance to DAAs in other genotypes. Clinical studies are therefore indicated to investigate treatment response in infected patients. Conclusion: Although HCV prevalence and genotypes have been well characterized in patients in well-resourced countries, clinical trials are urgently required in SSA, where highly diverse g4 and g7 strains circulate.

    Funded by: Gates Cambridge Trust; Medical Research Council: G0801566, G0901213, G0901213-92157, MC PC 16045, MC_PC_16045, MC_UU_12014/1, MC_UU_12014/12, MR/K013491/1; UK Department for International Development; Wellcome Trust: 102789/Z/13/A

    Hepatology (Baltimore, Md.) 2019;69;4;1426-1441

  • GFAKluge: A C++ library and command line utilities for the Graphical Fragment Assembly formats.

    Dawson ET and Durbin R

    Human Genetics, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Summary: GFA has emerged as a standard format for the exchange of genome assemblies and sequence graphs. To encourage further adoption in high-performance software we have developed an open-source C++ library for GFA and a set of utilities for summarizing and manipulating the format.

    Availability: The gfakluge source code is freely available under the MIT license. It has been tested on both Mac OS X and Linux.

    Funded by: Wellcome Trust: 207492/Z/17/Z

    Journal of open source software 2019;4;33

  • Viral coinfection analysis using a MinHash toolkit.

    Dawson ET, Wagner S, Roberson D, Yeager M, Boland J, Garrison E, Chanock S, Schiffman M, Raine-Bennett T, Lorey T, Castle PE, Mirabello L and Durbin R

    Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA.

    Background: Human papillomavirus (HPV) is a common sexually transmitted infection associated with cervical cancer that frequently occurs as a coinfection of types and subtypes. Highly similar sublineages that show over 100-fold differences in cancer risk are not distinguishable in coinfections with current typing methods.

    Results: We describe an efficient set of computational tools, rkmh, for analyzing complex mixed infections of related viruses based on sequence data. rkmh makes extensive use of MinHash similarity measures, and includes utilities for removing host DNA and classifying reads by type, lineage, and sublineage. We show that rkmh is capable of assigning reads to their HPV type as well as HPV16 lineage and sublineages.

    Conclusions: Accurate read classification enables estimates of percent composition when there are multiple infecting lineages or sublineages. While we demonstrate rkmh for HPV with multiple sequencing technologies, it is also applicable to other mixtures of related sequences.

    Funded by: National Cancer Institute: intramural research program of the Division of Cancer Epidemiology and Genetics; National Institutes of Health: HHSN261200800001E; Wellcome Trust: WT206194, WT207492

    BMC bioinformatics 2019;20;1;389

  • A Specific CNOT1 Mutation Results in a Novel Syndrome of Pancreatic Agenesis and Holoprosencephaly through Impaired Pancreatic and Neurological Development.

    De Franco E, Watson RA, Weninger WJ, Wong CC, Flanagan SE, Caswell R, Green A, Tudor C, Lelliott CJ, Geyer SH, Maurer-Gesek B, Reissig LF, Lango Allen H, Caliebe A, Siebert R, Holterhus PM, Deeb A, Prin F, Hilbrands R, Heimberg H, Ellard S, Hattersley AT and Barroso I

    Institute of Biomedical and Clinical Science, University of Exeter Medical School, EX2 5DW Exeter, UK.

    We report a recurrent CNOT1 de novo missense mutation, GenBank: NM_016284.4; c.1603C>T (p.Arg535Cys), resulting in a syndrome of pancreatic agenesis and abnormal forebrain development in three individuals and a similar phenotype in mice. CNOT1 is a transcriptional repressor that has been suggested as being critical for maintaining embryonic stem cells in a pluripotent state. These findings suggest that CNOT1 plays a critical role in pancreatic and neurological development and describe a novel genetic syndrome of pancreatic agenesis and holoprosencephaly.

    American journal of human genetics 2019;104;5;985-989

  • Human placenta has no microbiome but can contain potential pathogens.

    de Goffau MC, Lager S, Sovio U, Gaccioli F, Cook E, Peacock SJ, Parkhill J, Charnock-Jones DS and Smith GCS

    Wellcome Sanger Institute, Cambridge, UK.

    We sought to determine whether pre-eclampsia, spontaneous preterm birth or the delivery of infants who are small for gestational age were associated with the presence of bacterial DNA in the human placenta. Here we show that there was no evidence for the presence of bacteria in the large majority of placental samples, from both complicated and uncomplicated pregnancies. Almost all signals were related either to the acquisition of bacteria during labour and delivery, or to contamination of laboratory reagents with bacterial DNA. The exception was Streptococcus agalactiae (group B Streptococcus), for which non-contaminant signals were detected in approximately 5% of samples collected before the onset of labour. We conclude that bacterial infection of the placenta is not a common cause of adverse pregnancy outcome and that the human placenta does not have a microbiome, but it does represent a potential site of perinatal acquisition of S. agalactiae, a major cause of neonatal sepsis.

    Nature 2019;572;7769;329-334

  • The fog of genetics: what is known, unknown and unknowable in the genetics of complex traits and diseases.

    de Magalhães JP and Wang J

    Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK.

    A major task for genetics is searching for genetic variants associated with disease. But we may well be missing a large number of "unknown unknown" alleles in the "fog of genetics".

    EMBO reports 2019;20;11;e48054

  • The Genomic and Immune Landscapes of Lethal Metastatic Breast Cancer.

    De Mattos-Arruda L, Sammut SJ, Ross EM, Bashford-Rogers R, Greenstein E, Markus H, Morganella S, Teng Y, Maruvka Y, Pereira B, Rueda OM, Chin SF, Contente-Cuomo T, Mayor R, Arias A, Ali HR, Cope W, Tiezzi D, Dariush A, Dias Amarante T, Reshef D, Ciriaco N, Martinez-Saez E, Peg V, Ramon Y Cajal S, Cortes J, Vassiliou G, Getz G, Nik-Zainal S, Murtaza M, Friedman N, Markowetz F, Seoane J and Caldas C

    Department of Oncology and Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, UK; Vall d'Hebron Institute of Oncology (VHIO), Vall d'Hebron University Hospital, Barcelona 08035, Spain.

    The detailed molecular characterization of lethal cancers is a prerequisite to understanding resistance to therapy and escape from cancer immunoediting. We performed extensive multi-platform profiling of multi-regional metastases in autopsies from 10 patients with therapy-resistant breast cancer. The integrated genomic and immune landscapes show that metastases propagate and evolve as communities of clones, reveal their predicted neo-antigen landscapes, and show that they can accumulate HLA loss of heterozygosity (LOH). The data further identify variable tumor microenvironments and reveal, through analyses of T cell receptor repertoires, that adaptive immune responses appear to co-evolve with the metastatic genomes. These findings reveal in fine detail the landscapes of lethal metastatic breast cancer.

    Cell reports 2019;27;9;2690-2708.e10

  • Recombination of the Phase-Variable spnIII Locus Is Independent of All Known Pneumococcal Site-Specific Recombinases.

    De Ste Croix M, Chen KY, Vacca I, Manso AS, Johnston C, Polard P, Kwun MJ, Bentley SD, Croucher NJ, Bayliss CD, Haigh RD and Oggioni MR

    Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom.

    Streptococcus pneumoniae is one of the world's leading bacterial pathogens, causing pneumonia, septicemia, and meningitis. In recent years, it has been shown that genetic rearrangements in a type I restriction-modification system (SpnIII) can impact colony morphology and gene expression. By generating a large panel of mutant strains, we have confirmed a previously reported result that the CreX (also known as IvrR and PsrA) recombinase found within the locus is not essential for hsdS inversions. In addition, mutants of homologous recombination pathways also undergo hsdS inversions. In this work, we have shown that these genetic rearrangements, which result in different patterns of genome methylation, occur across a wide variety of serotypes and sequence types, including two strains (a 19F and a 6B strain) naturally lacking CreX. Our gene expression analysis, by transcriptome sequencing (RNAseq), confirms that the level of creX expression is impacted by these genomic rearrangements. In addition, we have shown that the frequency of hsdS recombination is temperature dependent. Most importantly, we have demonstrated that the other known pneumococcal site-specific recombinases XerD, XerS, and SPD_0921 are not involved in spnIII recombination, suggesting that a currently unknown mechanism is responsible for the recombination of these phase-variable type I systems.

    Importance: Streptococcus pneumoniae is a leading cause of pneumonia, septicemia, and meningitis. The discovery that genetic rearrangements in a type I restriction-modification locus can impact gene regulation and colony morphology led to a new understanding of how this pathogen switches from harmless colonizer to invasive pathogen. These rearrangements, which alter the DNA specificity of the type I restriction-modification enzyme, occur across many different pneumococcal serotypes and sequence types and in the absence of all known pneumococcal site-specific recombinases. This finding suggests that this is a truly global mechanism of pneumococcal gene regulation and highlights the need for further investigation of mechanisms of site-specific recombination.

    Funded by: Wellcome Trust

    Journal of bacteriology 2019;201;15

  • Complete Assembly of Escherichia coli Sequence Type 131 Genomes Using Long Reads Demonstrates Antibiotic Resistance Gene Variation within Diverse Plasmid and Chromosomal Contexts.

    Decano AG, Ludden C, Feltwell T, Judge K, Parkhill J and Downing T

    School of Biotechnology, Dublin City University, Dublin, Ireland.

    The incidence of infections caused by extraintestinal Escherichia coli (ExPEC) is rising globally, which is a major public health concern. ExPEC strains that are resistant to antimicrobials have been associated with excess mortality, prolonged hospital stays, and higher health care costs. E. coli sequence type 131 (ST131) is a major ExPEC clonal group worldwide, with variable plasmid composition, and has an array of genes enabling antimicrobial resistance (AMR). ST131 isolates frequently encode the AMR genes blaCTX-M-14, blaCTX-M-15, and blaCTX-M-27, which are often rearranged, amplified, and translocated by mobile genetic elements (MGEs). Short DNA reads do not fully resolve the architecture of repetitive elements on plasmids to allow MGE structures encoding blaCTX-M genes to be fully determined. Here, we performed long-read sequencing to decipher the genome structures of six E. coli ST131 isolates from six patients. Most long-read assemblies generated entire chromosomes and plasmids as single contigs, in contrast to more fragmented assemblies created with short reads alone. The long-read assemblies highlighted diverse accessory genomes with blaCTX-M-15, blaCTX-M-14, and blaCTX-M-27 genes identified in three, one, and one isolates, respectively. One sample had no blaCTX-M gene. Two samples had chromosomal blaCTX-M-14 and blaCTX-M-15 genes, and the latter was at three distinct locations, likely transposed by the adjacent MGEs: ISEcp1, IS903B, and Tn2. This study showed that AMR genes exist in multiple different chromosomal and plasmid contexts, even between closely related isolates within a clonal group such as E. coli ST131.

    Importance: Drug-resistant bacteria are a major cause of illness worldwide, and a specific subtype called Escherichia coli ST131 causes a significant number of these infections. ST131 bacteria become resistant to treatments by modifying their DNA and by transferring genes among one another via large packages of genes called plasmids, like a game of pass-the-parcel. Tackling infections more effectively requires a better understanding of what plasmids are being exchanged and their exact contents. To achieve this, we applied new high-resolution DNA sequencing technology to six ST131 samples from infected patients and compared the output to that of an existing approach. A combination of methods shows that drug resistance genes on plasmids are highly mobile because they can jump into ST131's chromosomes. We found that the plasmids are very elastic and undergo extensive rearrangements even in closely related samples. This application of DNA sequencing technologies illustrates at a new level the highly dynamic nature of ST131 genomes.

    mSphere 2019;4;3

  • Agreement between two large pan-cancer CRISPR-Cas9 gene dependency data sets.

    Dempster JM, Pacini C, Pantel S, Behan FM, Green T, Krill-Burger J, Beaver CM, Younger ST, Zhivich V, Najgebauer H, Allen F, Gonçalves E, Shepherd R, Doench JG, Yusa K, Vazquez F, Parts L, Boehm JS, Golub TR, Hahn WC, Root DE, Garnett MJ, Tsherniak A and Iorio F

    Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.

    Genome-scale CRISPR-Cas9 viability screens performed in cancer cell lines provide a systematic approach to identify cancer dependencies and new therapeutic targets. As multiple large-scale screens become available, a formal assessment of the reproducibility of these experiments becomes necessary. We analyze data from recently published pan-cancer CRISPR-Cas9 screens performed at the Broad and Sanger Institutes. Despite significant differences in experimental protocols and reagents, we find that the screen results are highly concordant across multiple metrics with both common and specific dependencies jointly identified across the two studies. Furthermore, robust biomarkers of gene dependency found in one data set are recovered in the other. Through further analysis and replication experiments at each institute, we show that batch effects are driven principally by two key experimental parameters: the reagent library and the assay length. These results indicate that the Broad and Sanger CRISPR-Cas9 viability screens yield robust and reproducible findings.

    Funded by: NCI NIH HHS: U01 CA176058

    Nature communications 2019;10;1;5817

  • In conversation with Sarah Teichmann.

    Dhillon P and Teichmann SA

    The FEBS Journal Editorial Office, Cambridge, UK.

    Sarah Teichmann is Head of Cellular Genetics at the Wellcome Sanger Institute and visiting research group leader at the European Bioinformatics Institute (EMBL-EBI). Sarah was appointed to the Sanger Institute and EMBL-EBI in 2013; prior to this she was a research group leader at the MRC Laboratory of Molecular Biology (LMB), where she first set up her group in 2001. The Teichmann lab is interested in global principles of protein interactions and gene expression, and in recent years has exploited cutting-edge single-cell genomics technologies to explore key questions relating to immune system function. In 2016, she co-founded the Human Cell Atlas initiative to map every cell type in the human body using single-cell transcriptomic technologies and spatial methods. Sarah has received many prestigious awards in recognition of her contributions to understanding protein complex assembly and gene regulatory networks. In this interview, she relays the story behind some of her research breakthroughs, discusses her career path and most influential mentors, and tells us why looking at biology at the level of a single cell can be so powerful and illuminating.

    The FEBS journal 2019;286;8;1445-1450

  • Genomic characterization of novel Neisseria species.

    Diallo K, MacLennan J, Harrison OB, Msefula C, Sow SO, Daugla DM, Johnson E, Trotter C, MacLennan CA, Parkhill J, Borrow R, Greenwood BM and Maiden MCJ

    Centre pour les Vaccins en Développement, Bamako, Mali.

    Of the ten human-restricted Neisseria species, two, Neisseria meningitidis and Neisseria gonorrhoeae, cause invasive disease; the other eight are carried asymptomatically in the pharynx, possibly modulating meningococcal and gonococcal infections. Consequently, characterizing their diversity is important for understanding the microbiome in health and disease. Whole genome sequences from 181 Neisseria isolates were examined, including those of three well-defined species (N. meningitidis; N. gonorrhoeae; and Neisseria polysaccharea) and genomes of isolates unassigned to any species (Nspp). Sequence analysis of ribosomal genes and a set of core (cgMLST) genes were used to infer phylogenetic relationships. Average Nucleotide Identity (ANI) and phenotypic data were used to define species clusters, and morphological and metabolic differences among them. Phylogenetic analyses identified two polyphyletic clusters (N. polysaccharea and Nspp), while cgMLST data grouped Nspp isolates into nine clusters and identified at least three N. polysaccharea clusters. ANI results classified Nspp into seven putative species, and also indicated at least three putative N. polysaccharea species. Electron microscopy identified morphological differences among these species. This genomic approach provided a consistent methodology for species characterization using distinct phylogenetic clusters. Seven putative novel Neisseria species were identified, confirming the importance of genomic studies in the characterization of the genus Neisseria.

    Funded by: Wellcome Trust (Wellcome): 103957/Z/14/Z

    Scientific reports 2019;9;1;13742

  • CFM-ID 3.0: Significantly Improved ESI-MS/MS Prediction and Compound Identification.

    Djoumbou-Feunang Y, Pon A, Karu N, Zheng J, Li C, Arndt D, Gautam M, Allen F and Wishart DS

    Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.

    Metabolite identification for untargeted metabolomics is often hampered by the lack of experimentally collected reference spectra from tandem mass spectrometry (MS/MS). To circumvent this problem, Competitive Fragmentation Modeling-ID (CFM-ID) was developed to accurately predict electrospray ionization-MS/MS (ESI-MS/MS) spectra from chemical structures and to aid in compound identification via MS/MS spectral matching. While earlier versions of CFM-ID performed very well, CFM-ID's performance for predicting the MS/MS spectra of certain classes of compounds, including many lipids, was quite poor. Furthermore, CFM-ID's compound identification capabilities were limited because it did not use experimentally available MS/MS spectra nor did it exploit metadata in its spectral matching algorithm. Here, we describe significant improvements to CFM-ID's performance and speed. These include (1) the implementation of a rule-based fragmentation approach for lipid MS/MS spectral prediction, which greatly improves the speed and accuracy of CFM-ID; (2) the inclusion of experimental MS/MS spectra and other metadata to enhance CFM-ID's compound identification abilities; (3) the development of new scoring functions that improve CFM-ID's accuracy by 21.1%; and (4) the implementation of a chemical classification algorithm that correctly classifies unknown chemicals (based on their MS/MS spectra) in >80% of the cases. This improved version, called CFM-ID 3.0, is freely available as a web server. Its source code is also accessible online.

    Funded by: Canadian Institutes of Health Research: 148461; Genome Alberta: 12103

    Metabolites 2019;9;4

  • Single-Cell Transcriptomics Uncovers Zonation of Function in the Mesenchyme during Liver Fibrosis.

    Dobie R, Wilson-Kanamori JR, Henderson BEP, Smith JR, Matchett KP, Portman JR, Wallenborg K, Picelli S, Zagorska A, Pendem SV, Hudson TE, Wu MM, Budas GR, Breckenridge DG, Harrison EM, Mole DJ, Wigmore SJ, Ramachandran P, Ponting CP, Teichmann SA, Marioni JC and Henderson NC

    Centre for Inflammation Research, The Queen's Medical Research Institute, Edinburgh BioQuarter, University of Edinburgh, Edinburgh EH16 4TJ, UK.

    Iterative liver injury results in progressive fibrosis disrupting hepatic architecture, regeneration potential, and liver function. Hepatic stellate cells (HSCs) are a major source of pathological matrix during fibrosis and are thought to be a functionally homogeneous population. Here, we use single-cell RNA sequencing to deconvolve the hepatic mesenchyme in healthy and fibrotic mouse liver, revealing spatial zonation of HSCs across the hepatic lobule. Furthermore, we show that HSCs partition into topographically diametric lobule regions, designated portal vein-associated HSCs (PaHSCs) and central vein-associated HSCs (CaHSCs). Importantly we uncover functional zonation, identifying CaHSCs as the dominant pathogenic collagen-producing cells in a mouse model of centrilobular fibrosis. Finally, we identify LPAR1 as a therapeutic target on collagen-producing CaHSCs, demonstrating that blockade of LPAR1 inhibits liver fibrosis in a rodent NASH model. Taken together, our work illustrates the power of single-cell transcriptomics to resolve the key collagen-producing cells driving liver fibrosis with high precision.

    Cell reports 2019;29;7;1832-1847.e8

  • Genomes of Leishmania parasites directly sequenced from patients with visceral leishmaniasis in the Indian subcontinent.

    Domagalska MA, Imamura H, Sanders M, Van den Broeck F, Bhattarai NR, Vanaerschot M, Maes I, D'Haenens E, Rai K, Rijal S, Berriman M, Cotton JA and Dujardin JC

    Institute of Tropical Medicine Antwerp, Molecular Parasitology Unit, Antwerp, Belgium.

    Whole genome sequencing (WGS) is increasingly used for molecular diagnosis and epidemiology of infectious diseases. Current Leishmania genomic studies rely on DNA extracted from cultured parasites, which might introduce sampling and biological biases into the subsequent analyses. Up to now, direct analysis of Leishmania genomes in clinical samples has been hampered by high levels of human DNA and large variation in parasite load. Here, we present a method, based on target enrichment of Leishmania donovani DNA with Agilent SureSelect technology, that allows the analysis of Leishmania genomes directly in clinical samples. We validated our protocol with a set of artificially mixed samples, followed by the analysis of 63 clinical samples (bone marrow or spleen aspirates) from visceral leishmaniasis patients in Nepal. We were able to identify genotypes using a set of diagnostic SNPs in almost all of these samples (97%) and access comprehensive genome-wide information in most (83%). This allowed us to perform phylogenomic analysis, assess chromosome copy number and identify large copy number variants (CNVs). Pairwise comparisons between the parasite genomes in clinical samples and derived in vitro cultured promastigotes showed a lower aneuploidy in amastigotes as well as genomic differences, suggesting polyclonal infections in patients. Altogether our results underline the need for sequencing parasite genomes directly in the host samples.

    Funded by: Wellcome Trust: 206194

    PLoS neglected tropical diseases 2019;13;12;e0007900

  • Novel Insights Into the Spread of Enteric Pathogens Using Genomics.

    Domman D, Ruis C, Dorman MJ, Shakya M and Chain PSG

    Bioscience Division, Los Alamos National Laboratory, New Mexico.

    The Journal of infectious diseases 2019

  • The gene regulatory basis of genetic compensation during neural crest induction.

    Dooley CM, Wali N, Sealy IM, White RJ, Stemple DL, Collins JE and Busch-Nentwich EM

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    The neural crest (NC) is a vertebrate-specific cell type that contributes to a wide range of different tissues across all three germ layers. The gene regulatory network (GRN) responsible for the formation of neural crest is conserved across vertebrates. Central to the induction of the NC GRN are AP-2 and SoxE transcription factors. NC induction robustness is ensured through the ability of some of these transcription factors to compensate for loss of function of gene family members. However, the gene regulatory events underlying compensation are poorly understood. We have used gene knockout and RNA sequencing strategies to dissect NC induction and compensation in zebrafish. We genetically ablate the NC using double mutants of tfap2a;tfap2c or remove specific subsets of the NC with sox10 and mitfa knockouts and characterise genome-wide gene expression levels across multiple time points. We find that compensation through a single wild-type allele of tfap2c is capable of maintaining early NC induction and differentiation in the absence of tfap2a function, but many target genes have abnormal expression levels and therefore show sensitivity to the reduced tfap2 dosage. This separation of morphological and molecular phenotypes identifies a core set of genes required for early NC development. We also identify the 15 somites stage as the peak of the molecular phenotype, which strongly diminishes at 24 hpf even as the morphological phenotype becomes more apparent. Using gene knockouts, we associate previously uncharacterised genes with pigment cell development and establish a role for maternal Hippo signalling in melanocyte differentiation. This work extends and refines the NC GRN while also uncovering the transcriptional basis of genetic compensation via paralogues.

    PLoS genetics 2019;15;6;e1008213

  • High quality reference genomes for toxigenic and non-toxigenic Vibrio cholerae serogroup O139.

    Dorman MJ, Domman D, Uddin MI, Sharmin S, Afrad MH, Begum YA, Qadri F and Thomson NR

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.

    Toxigenic Vibrio cholerae of the O139 serogroup have been responsible for several large cholera epidemics in South Asia, and continue to be of clinical and historical significance today. This serogroup was initially feared to represent a new, emerging V. cholerae clone that would lead to an eighth cholera pandemic. However, these concerns were ultimately unfounded. The majority of clinically relevant V. cholerae O139 isolates are closely related to serogroup O1, biotype El Tor V. cholerae, and comprise a single sublineage of the seventh pandemic El Tor lineage. Although related, these V. cholerae serogroups differ in several fundamental ways, in terms of their O-antigen, capsulation phenotype, and the genomic islands found on their chromosomes. Here, we present four complete, high-quality genomes for V. cholerae O139, obtained using long-read sequencing. Three of these sequences are from toxigenic V. cholerae, and one is from a bacterium which, although classified serologically as V. cholerae O139, lacks the CTXφ bacteriophage and the ability to produce cholera toxin. We highlight fundamental genomic differences between these isolates, the V. cholerae O1 reference strain N16961, and the prototypical O139 strain MO10. These sequences are an important resource for the scientific community, and will improve greatly our ability to perform genomic analyses of non-O1 V. cholerae in the future. These genomes also offer new insights into the biology of a V. cholerae serogroup that, from a genomic perspective, is poorly understood.

    Funded by: NIAID NIH HHS: R01 AI103055, R01 AI106878, R56 AI106878, U01 AI058935; Wellcome Trust: 098051, 206194

    Scientific reports 2019;9;1;5865

  • The history, genome and biology of NCTC 30: a non-pandemic Vibrio cholerae isolate from World War One.

    Dorman MJ, Kane L, Domman D, Turnbull JD, Cormie C, Fazal MA, Goulding DA, Russell JE, Alexander S and Thomson NR

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The sixth global cholera pandemic lasted from 1899 to 1923. However, despite widespread fear of the disease and of its negative effects on troop morale, very few soldiers in the British Expeditionary Forces contracted cholera between 1914 and 1918. Here, we have revived and sequenced the genome of NCTC 30, a 102-year-old Vibrio cholerae isolate, which we believe is the oldest publicly available live V. cholerae strain in existence. NCTC 30 was isolated in 1916 from a British soldier convalescent in Egypt. We found that this strain does not encode cholera toxin, thought to be necessary to cause cholera, and is not part of V. cholerae lineages responsible for the pandemic disease. We also show that NCTC 30, which predates the introduction of penicillin-based antibiotics, harbours a functional β-lactamase antibiotic resistance gene. Our data corroborate and provide molecular explanations for previous phenotypic studies of NCTC 30 and provide a new high-quality genome sequence for historical, non-pandemic V. cholerae.

    Funded by: Wellcome Trust: 206194

    Proceedings. Biological sciences 2019;286;1900;20182025

  • Meeting the discovery challenge of drug-resistant infections: progress and focusing resources.

    Dougan G, Dowson C, Overington J and Next Generation Antibiotic Discovery Symposium Participants

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK; The Department of Medicine, University of Cambridge, UK.

    Following multiple warnings from governments and health organisations, there has been renewed investment, led by the public sector, in the discovery of novel antimicrobials to meet the challenge of rising levels of drug-resistant infection, particularly in the case of resistance to antibiotics. Initiatives have also been announced to support and enable the antibiotic discovery process. In January 2018, the Medicines Discovery Catapult, UK, hosted a symposium: Next Generation Antibiotics Discovery, to consider the latest initiatives and any remaining challenges to inform and guide the international research community and better focus resources to yield a novel class of antibiotic.

    Funded by: Medical Research Council: MR/N002679/1, MR/P007503/1

    Drug discovery today 2019;24;2;452-461

  • A defined mechanistic correlate of protection against Plasmodium falciparum malaria in non-human primates.

    Douglas AD, Baldeviano GC, Jin J, Miura K, Diouf A, Zenonos ZA, Ventocilla JA, Silk SE, Marshall JM, Alanine DGW, Wang C, Edwards NJ, Leiva KP, Gomez-Puerta LA, Lucas CM, Wright GJ, Long CA, Royal JM and Draper SJ

    Jenner Institute, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK.

    Malaria vaccine design and prioritization has been hindered by the lack of a mechanistic correlate of protection. We previously demonstrated a strong association between protection and merozoite-neutralizing antibody responses following vaccination of non-human primates against Plasmodium falciparum reticulocyte binding protein homolog 5 (PfRH5). Here, we test the mechanism of protection. Using mutant human IgG1 Fc regions engineered not to engage complement or FcR-dependent effector mechanisms, we produce merozoite-neutralizing and non-neutralizing anti-PfRH5 chimeric monoclonal antibodies (mAbs) and perform a passive transfer-P. falciparum challenge study in Aotus nancymaae monkeys. At the highest dose tested, 6/6 animals given the neutralizing PfRH5-binding mAb c2AC7 survive the challenge without treatment, compared to 0/6 animals given non-neutralizing PfRH5-binding mAb c4BA7 and 0/6 animals given an isotype control mAb. Our results address the controversy regarding whether merozoite-neutralizing antibody can cause protection against P. falciparum blood-stage infections, and highlight the quantitative challenge of achieving such protection.

    Funded by: Wellcome Trust: 201477/Z/16/Z

    Nature communications 2019;10;1;1953

  • The clinical presentation caused by truncating CHD8 variants.

    Douzgou S, Liang HW, Metcalfe K, Somarathi S, Tischkowitz M, Mohamed W, Kini U, McKee S, Yates L, Bertoli M, Lynch SA, Holder S, Deciphering Developmental Disorders Study and Banka S

    Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre, Manchester, UK.

    Variants in the chromodomain helicase DNA-binding protein 8 (CHD8) have been associated with intellectual disability (ID), autism spectrum disorders (ASDs) and overgrowth, and CHD8 is one of the causative genes for OGID (overgrowth and ID). We investigated 25 individuals with CHD8 protein truncating variants (PTVs), including 10 previously unreported patients, and found a male to female ratio of 2.7:1 (19:7) and a pattern of common features: macrocephaly (62.5%), tall stature (47%), developmental delay and/or intellectual disability (81%), ASDs (84%), sleep difficulties (50%), gastrointestinal problems (40%), and distinct facial features. Most of the individuals in this cohort had moderate-to-severe ID, some had regression of speech (37%), seizures (27%) and hypotonia (27%), and two individuals were non-ambulant. Our study shows that haploinsufficiency of CHD8 is associated with a distinctive OGID syndrome with pronounced autistic traits and supports a sex-dependent penetrance of CHD8 PTVs in humans.

    Funded by: Department of Health: HICF-1009-003; Medical Research Council: MC_PC_16018; Wellcome Trust: WT098051

    Clinical genetics 2019;96;1;72-84

  • Diagnostic high-throughput sequencing of 2396 patients with bleeding, thrombotic, and platelet disorders.

    Downes K, Megy K, Duarte D, Vries M, Gebhart J, Hofer S, Shamardina O, Deevi SVV, Stephens J, Mapeta R, Tuna S, Al Hasso N, Besser MW, Cooper N, Daugherty L, Gleadall N, Greene D, Haimel M, Martin H, Papadia S, Revel-Vilk S, Sivapalaratnam S, Symington E, Thomas W, Thys C, Tolios A, Penkett CJ, NIHR BioResource, Ouwehand WH, Abbs S, Laffan MA, Turro E, Simeoni I, Mumford AD, Henskens YMC, Pabinger I, Gomez K and Freson K

    Department of Haematology, University of Cambridge.

    A targeted high-throughput sequencing (HTS) panel test for clinical diagnostics requires careful consideration of the inclusion of appropriate diagnostic-grade genes, the ability to detect multiple types of genomic variation with high levels of analytic sensitivity and reproducibility, and variant interpretation by a multidisciplinary team (MDT) in the context of the clinical phenotype. We have sequenced 2396 index patients using the ThromboGenomics HTS panel test of diagnostic-grade genes known to harbor variants associated with rare bleeding, thrombotic, or platelet disorders (BTPDs). The molecular diagnostic rate was determined by the clinical phenotype, with an overall rate of 49.2% for all thrombotic, coagulation, platelet count, and function disorder patients and a rate of 3.2% for patients with unexplained bleeding disorders characterized by normal hemostasis test results. The MDT classified 745 unique variants, including copy number variants (CNVs) and intronic variants, as pathogenic, likely pathogenic, or variants of uncertain significance. Half of these variants (50.9%) are novel and 41 unique variants were identified in 7 genes recently found to be implicated in BTPDs. Inspection of canonical hemostasis pathways identified 29 patients with evidence of oligogenic inheritance. A molecular diagnosis has been reported for 894 index patients providing evidence that introducing an HTS genetic test is a valuable addition to laboratory diagnostics in patients with a high likelihood of having an inherited BTPD.

    Funded by: British Heart Foundation; Medical Research Council

    Blood 2019;134;23;2082-2091

  • Genome-wide Approaches to Investigate Anthelmintic Resistance.

    Doyle SR and Cotton JA

    Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    The rapid evolution of anthelmintic resistance in a number of parasites of livestock and domesticated animals has occurred in response to widespread use of anthelmintics for parasite control, and threatens the success of parasite control of species that infect humans. The genetic basis of resistance to most anthelmintics remains poorly resolved. Genome-wide approaches are now accessible due to recent advances in high-throughput sequencing, and are increasingly applied to characterize traits including drug resistance. Here, we discuss why traditional candidate gene studies have largely failed to define the genetics of resistance, and why - and in what circumstances - we expect genome-wide approaches to shed new light on the modes of action and the evolution of resistance to anthelmintic compounds.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M003949/1; Wellcome Trust: 098051, 206194

    Trends in parasitology 2019;35;4;289-301

  • Population genomic and evolutionary modelling analyses reveal a single major QTL for ivermectin drug resistance in the pathogenic nematode, Haemonchus contortus.

    Doyle SR, Illingworth CJR, Laing R, Bartley DJ, Redman E, Martinelli A, Holroyd N, Morrison AA, Rezansoff A, Tracey A, Devaney E, Berriman M, Sargison N, Cotton JA and Gilleard JS

    Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: Infections with helminths cause an enormous disease burden in billions of animals and plants worldwide. Large scale use of anthelmintics has driven the evolution of resistance in a number of species that infect livestock and companion animals, and there are growing concerns regarding the reduced efficacy in some human-infective helminths. Understanding the mechanisms by which resistance evolves is the focus of increasing interest; robust genetic analysis of helminths is challenging, and although many candidate genes have been proposed, the genetic basis of resistance remains poorly resolved.

    Results: Here, we present a genome-wide analysis of two genetic crosses between ivermectin resistant and sensitive isolates of the parasitic nematode Haemonchus contortus, an economically important gastrointestinal parasite of small ruminants and a model for anthelmintic research. Whole genome sequencing of parental populations, and key stages throughout the crosses, identified extensive genomic diversity that differentiates populations, but after backcrossing and selection, a single genomic quantitative trait locus (QTL) localised on chromosome V was revealed to be associated with ivermectin resistance. This QTL was common between the two geographically and genetically divergent resistant populations and did not include any leading candidate genes, suggestive of a previously uncharacterised mechanism and/or driver of resistance. Despite limited resolution due to low recombination in this region, population genetic analyses and novel evolutionary models supported strong selection at this QTL, driven by at least partial dominance of the resistant allele, and that large resistance-associated haplotype blocks were enriched in response to selection.

    Conclusions: We have described the genetic architecture and mode of ivermectin selection, revealing a major genomic locus associated with ivermectin resistance, the most conclusive evidence to date in any parasitic nematode. This study highlights a novel genome-wide approach to the analysis of a genetic cross in non-model organisms with extreme genetic diversity, and the importance of a high-quality reference genome in interpreting the signals of selection so identified.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M003949; Rural and Environment Science and Analytical Services Division (GB); Wellcome Trust; Wellcome Trust (GB): 098051 & 206194; Wellcome and the Royal Society Sir Henry Dale Fellowship: 101239/Z/13/Z

    BMC genomics 2019;20;1;218

  • Evaluation of DNA Extraction Methods on Individual Helminth Egg and Larval Stages for Whole-Genome Sequencing.

    Doyle SR, Sankaranarayanan G, Allan F, Berger D, Jimenez Castro PD, Collins JB, Crellen T, Duque-Correa MA, Ellis P, Jaleta TG, Laing R, Maitland K, McCarthy C, Moundai T, Softley B, Thiele E, Ouakou PT, Tushabe JV, Webster JP, Weiss AJ, Lok J, Devaney E, Kaplan RM, Cotton JA, Berriman M and Holroyd N

    Parasites and Microbes, Wellcome Sanger Institute, Hinxton, United Kingdom.

    Whole-genome sequencing is being rapidly applied to the study of helminth genomes, including de novo genome assembly, population genetics, and diagnostic applications. Although late-stage juvenile and adult parasites typically produce sufficient DNA for molecular analyses, these parasitic stages are almost always inaccessible in the live host. Immature life stages found in the environment, for which samples can be collected non-invasively, offer a potential alternative; however, these samples typically yield very low quantities of DNA, can be environmentally resistant, and are susceptible to contamination, often from bacterial or host DNA. Here, we have tested five low-input DNA extraction protocols together with a low-input sequencing library protocol to assess the feasibility of whole-genome sequencing of individual immature helminth samples. These approaches do not use whole-genome amplification, a common but costly approach to increase the yield of low-input samples. We first tested individual parasites from two species spotted onto FTA cards (egg and L1 stages of Haemonchus contortus and miracidia of Schistosoma mansoni) before further testing on an additional five species (Ancylostoma caninum, Ascaridia dissimilis, Dirofilaria immitis, Strongyloides stercoralis, and Trichuris muris) with an optimal protocol. A sixth species, Dracunculus medinensis, was included for comparison. Whole-genome sequencing followed by analyses to determine the proportion of on- and off-target mapping revealed successful sample preparations for six of the eight species tested, with variation both between species and between different life stages from some species described. These results demonstrate the feasibility of whole-genome sequencing of individual parasites, and highlight a new avenue toward generating sensitive, specific, and information-rich data for the diagnosis and surveillance of helminths.

    Funded by: NIAID NIH HHS: R01 AI050668

    Frontiers in genetics 2019;10;826

  • Pancreatic cancer organoids recapitulate disease and allow personalized drug screening.

    Driehuis E, van Hoeck A, Moore K, Kolders S, Francies HE, Gulersonmez MC, Stigter ECA, Burgering B, Geurts V, Gracanin A, Bounova G, Morsink FH, Vries R, Boj S, van Es J, Offerhaus GJA, Kranenburg O, Garnett MJ, Wessels L, Cuppen E, Brosens LAA and Clevers H

    Oncode Institute, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands.

    We report the derivation of 30 patient-derived organoid lines (PDOs) from tumors arising in the pancreas and distal bile duct. PDOs recapitulate tumor histology and contain genetic alterations typical of pancreatic cancer. In vitro testing of a panel of 76 therapeutic agents revealed sensitivities currently not exploited in the clinic, and underscores the importance of personalized approaches for effective cancer treatment. The PRMT5 inhibitor EZP015556, shown to target tumors negative for MTAP (a gene commonly lost in pancreatic cancer), was validated as such, but also appeared to constitute an effective therapy for a subset of MTAP-positive tumors. Taken together, the work presented here provides a platform to identify novel therapeutics to target pancreatic tumor cells using PDOs.

    Proceedings of the National Academy of Sciences of the United States of America 2019

  • Bacteriophage targeting of gut bacterium attenuates alcoholic liver disease.

    Duan Y, Llorente C, Lang S, Brandl K, Chu H, Jiang L, White RC, Clarke TH, Nguyen K, Torralba M, Shao Y, Liu J, Hernandez-Morales A, Lessor L, Rahman IR, Miyamoto Y, Ly M, Gao B, Sun W, Kiesel R, Hutmacher F, Lee S, Ventura-Cots M, Bosques-Padilla F, Verna EC, Abraldes JG, Brown RS, Vargas V, Altamirano J, Caballería J, Shawcross DL, Ho SB, Louvet A, Lucey MR, Mathurin P, Garcia-Tsao G, Bataller R, Tu XM, Eckmann L, van der Donk WA, Young R, Lawley TD, Stärkel P, Pride D, Fouts DE and Schnabl B

    Department of Medicine, University of California San Diego, La Jolla, CA, USA.

    Chronic liver disease due to alcohol-use disorder contributes markedly to the global burden of disease and mortality [1-3]. Alcoholic hepatitis is a severe and life-threatening form of alcohol-associated liver disease. The gut microbiota promotes ethanol-induced liver disease in mice [4], but little is known about the microbial factors that are responsible for this process. Here we identify cytolysin, a two-subunit exotoxin that is secreted by Enterococcus faecalis [5,6], as a cause of hepatocyte death and liver injury. Compared with non-alcoholic individuals or patients with alcohol-use disorder, patients with alcoholic hepatitis have increased faecal numbers of E. faecalis. The presence of cytolysin-positive (cytolytic) E. faecalis correlated with the severity of liver disease and with mortality in patients with alcoholic hepatitis. Using humanized mice that were colonized with bacteria from the faeces of patients with alcoholic hepatitis, we investigated the therapeutic effects of bacteriophages that target cytolytic E. faecalis. We found that these bacteriophages decrease cytolysin in the liver and abolish ethanol-induced liver disease in humanized mice. Our findings link cytolytic E. faecalis with more severe clinical outcomes and increased mortality in patients with alcoholic hepatitis. We show that bacteriophages can specifically target cytolytic E. faecalis, which provides a method for precisely editing the intestinal microbiota. A clinical trial with a larger cohort is required to validate the relevance of our findings in humans, and to test whether this therapeutic approach is effective for patients with alcoholic hepatitis.

    Funded by: BLRD VA: I01 BX004594; NIAAA NIH HHS: P50 AA011999, R01 AA020703, R01 AA024726, U01 AA021908, U01 AA026939; NIDDK NIH HHS: P30 DK034989, P30 DK120515, P30 DK120531; NIGMS NIH HHS: T32 GM070421; Wellcome Trust

    Nature 2019;575;7783;505-511

  • Important Extracellular Interactions between Plasmodium Sporozoites and Host Cells Required for Infection.

    Dundas K, Shears MJ, Sinnis P and Wright GJ

    Cell Surface Signalling Laboratory and Parasites and Microbes Programme, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK.

    Malaria is an infectious disease, caused by Plasmodium parasites, that remains a major global health problem. Infection begins when salivary gland sporozoites are transmitted through the bite of an infected mosquito. Once within the host, sporozoites navigate through the dermis, into the bloodstream, and eventually invade hepatocytes. While we have an increasingly sophisticated cellular description of this journey, our molecular understanding of the extracellular interactions between the sporozoite and mammalian host that regulate migration and invasion remains comparatively poor. Here, we review the current state of our understanding, highlight the technical limitations that have frustrated progress, and outline how new approaches will help to address this knowledge gap with the ultimate aim of improving malaria treatments.

    Funded by: Medical Research Council: MR/J004111/1; NIAID NIH HHS: R01 AI056840, R01 AI132359; Wellcome Trust: 206194

    Trends in parasitology 2019;35;2;129-139

  • The flagellotropic bacteriophage YSD1 targets Salmonella Typhi with a Chi-like protein tail fibre.

    Dunstan RA, Pickard D, Dougan S, Goulding D, Cormie C, Hardy J, Li F, Grinter R, Harcourt K, Yu L, Song J, Schreiber F, Choudhary J, Clare S, Coulibaly F, Strugnell RA, Dougan G and Lithgow T

    Infection and Immunity Program, Department of Microbiology, Biomedicine Discovery Institute, Monash University, Clayton, 3800, Australia.

    The discovery of a Salmonella-targeting phage from the waterways of the United Kingdom provided an opportunity to address the mechanism by which Chi-like bacteriophage (phage) engages with bacterial flagellae. The long tail fibre seen on Chi-like phages has been proposed to assist the phage particle in docking to a host cell flagellum, but the identity of the protein that generates this fibre was unknown. We present the results from genome sequencing of this phage, YSD1, confirming its close relationship to the original Chi phage and suggesting candidate proteins to form the tail structure. Immunogold labelling in electron micrographs revealed that YSD1_22 forms the main shaft of the tail tube, while YSD1_25 forms the distal part contributing to the tail spike complex. The long curling tail fibre is formed by the protein YSD1_29, and treatment of phage with the antibodies that bind YSD1_29 inhibits phage infection of Salmonella. The host range for YSD1 across Salmonella serovars is broad, but not comprehensive, being limited by antigenic features of the flagellin subunits that make up the Salmonella flagellum, with which YSD1_29 engages to initiate infection.

    Molecular microbiology 2019;112;6;1831-1846

  • Exclusive dependence of IL-10Rα signalling on intestinal microbiota homeostasis and control of whipworm infection.

    Duque-Correa MA, Karp NA, McCarthy C, Forman S, Goulding D, Sankaranarayanan G, Jenkins TP, Reid AJ, Cambridge EL, Ballesteros Reviriego C, Sanger Mouse Genetics Project, 3i consortium, Müller W, Cantacessi C, Dougan G, Grencis RK and Berriman M

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    The whipworm Trichuris trichiura is a soil-transmitted helminth that dwells in the epithelium of the caecum and proximal colon of its hosts, causing the human disease trichuriasis. Trichuriasis is characterized by colitis attributed to the inflammatory response elicited by the parasite while tunnelling through intestinal epithelial cells (IECs). The IL-10 family of receptors, comprising combinations of subunits IL-10Rα, IL-10Rβ, IL-22Rα and IL-28Rα, modulates intestinal inflammatory responses. Here we carefully dissected the role of these subunits in the resistance of mice to infection with T. muris, a mouse model of the human whipworm T. trichiura. Our findings demonstrate that whilst IL-22Rα and IL-28Rα are dispensable in the host response to whipworms, IL-10 signalling through IL-10Rα and IL-10Rβ is essential to control caecal pathology, worm expulsion and survival during T. muris infections. We show that deficiency of IL-10, IL-10Rα and IL-10Rβ results in dysbiosis of the caecal microbiota characterised by expanded populations of opportunistic bacteria of the families Enterococcaceae and Enterobacteriaceae. Moreover, breakdown of the epithelial barrier after whipworm infection in IL-10, IL-10Rα and IL-10Rβ-deficient mice allows the translocation of these opportunistic pathogens or their excretory products to the liver, causing organ failure and lethal disease. Importantly, bone marrow chimera experiments indicate that signalling through IL-10Rα and IL-10Rβ in haematopoietic cells, but not IECs, is crucial to control worm expulsion and immunopathology. These findings are supported by worm expulsion upon infection of conditional mutant mice for the IL-10Rα on IECs. Our findings emphasize the pivotal and complex role of systemic IL-10Rα signalling on immune cells in promoting microbiota homeostasis and maintaining the intestinal epithelial barrier, thus preventing immunopathology during whipworm infections.

    PLoS pathogens 2019;15;1;e1007265

  • Antibiotic Resistance and Typhoid.

    Dyson ZA, Klemm EJ, Palmer S and Dougan G

    Department of Medicine, University of Cambridge, Hinxton, Cambridge, United Kingdom.

    Multiple drug (antibiotic) resistance (MDR) has become a major threat to the treatment of typhoid and other infectious diseases. Since the 1970s, this threat has increased in Salmonella enterica serovar Typhi, driven in part by the emergence of successful genetic clades, such as haplotype H58, associated with the MDR phenotype. H58 S. Typhi can express multiple antibiotic resistance determinants while retaining the ability to efficiently transmit and persist within the human population. The recent identification of extensively drug resistant S. Typhi only highlights the dangers of ignoring this threat. Here we discuss the evolution of the S. Typhi MDR phenotype and consider options for management.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2019;68;Supplement_2;S165-S170

  • Dppa2 and Dppa4 directly regulate the Dux-driven zygotic transcriptional program.

    Eckersley-Maslin M, Alda-Catalinas C, Blotenburg M, Kreibich E, Krueger C and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, United Kingdom.

    The molecular regulation of zygotic genome activation (ZGA) in mammals remains an exciting area of research. Primed mouse embryonic stem cells contain a rare subset of "2C-like" cells that are epigenetically and transcriptionally similar to the two-cell embryo and thus represent an in vitro approximation for studying ZGA transcription regulation. Recently, the transcription factor Dux, expressed in the minor wave of ZGA, was described to activate many downstream ZGA transcripts. However, it remains unknown what upstream maternal factors initiate ZGA in either a Dux-dependent or Dux-independent manner. Here we performed a candidate-based overexpression screen, identifying, among others, developmental pluripotency-associated 2 (Dppa2) and Dppa4 as positive regulators of 2C-like cells and transcription of ZGA genes. In the germline, promoter DNA demethylation coincides with expression of Dppa2 and Dppa4, which remain expressed until embryonic day 7.5 (E7.5), when their promoters are remethylated. Furthermore, Dppa2 and Dppa4 are also expressed during induced pluripotent stem cell (iPSC) reprogramming at the time that 2C-like transcription transiently peaks. Through a combination of overexpression, knockdown, knockout, and rescue experiments together with transcriptional analyses, we show that Dppa2 and Dppa4 directly regulate the 2C-like cell population and associated transcripts, including Dux and the Zscan4 cluster. Importantly, we teased apart the molecular hierarchy in which the 2C-like transcriptional program is initiated and stabilized. Dppa2 and Dppa4 require Dux to initiate 2C-like transcription, suggesting that they act upstream by directly regulating Dux. Supporting this, ChIP-seq (chromatin immunoprecipitation [ChIP] combined with high-throughput sequencing) analysis revealed that Dppa2 and Dppa4 bind to the Dux promoter and gene body and drive its expression. Zscan4c is also able to induce 2C-like cells in wild-type cells but, in contrast to Dux, can no longer do so in Dppa2/4 double-knockout cells, suggesting that it may act to stabilize rather than drive the transcriptional network. Our findings suggest a model in which Dppa2/4 binding to the Dux promoter leads to Dux up-regulation and activation of the 2C-like transcriptional program, which is subsequently reinforced by Zscan4c.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K010867/1; Wellcome Trust: 095645/Z/11/Z

    Genes & development 2019;33;3-4;194-208

  • Genomic architecture and introgression shape a butterfly radiation.

    Edelman NB, Frandsen PB, Miyagi M, Clavijo B, Davey J, Dikow RB, García-Accinelli G, Van Belleghem SM, Patterson N, Neafsey DE, Challis R, Kumar S, Moreira GRP, Salazar C, Chouteau M, Counterman BA, Papa R, Blaxter M, Reed RD, Dasmahapatra KK, Kronforst M, Joron M, Jiggins CD, McMillan WO, Di Palma F, Blumberg AJ, Wakeley J, Jaffe D and Mallet J

    Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.

    We used 20 de novo genome assemblies to probe the speciation history and architecture of gene flow in rapidly radiating Heliconius butterflies. Our tests to distinguish incomplete lineage sorting from introgression indicate that gene flow has obscured several ancient phylogenetic relationships in this group over large swathes of the genome. Introgressed loci are underrepresented in low-recombination and gene-rich regions, consistent with the purging of foreign alleles more tightly linked to incompatibility loci. Here, we identify a hitherto unknown inversion that traps a color pattern switch locus. We infer that this inversion was transferred between lineages by introgression and is convergent with a similar rearrangement in another part of the genus. These multiple de novo genome sequences enable improved understanding of the importance of introgression and selective processes in adaptive radiation.

    Funded by: NCI NIH HHS: U54 CA193313; NHGRI NIH HHS: R01 HG003474, U54 HG003067; NIAID NIH HHS: U19 AI110818; NIGMS NIH HHS: P20 GM103475, R01 GM108626

    Science (New York, N.Y.) 2019;366;6465;594-599

  • Challenges in measuring and understanding biological noise.

    Eling N, Morgan MD and Marioni JC

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK.

    Biochemical reactions are intrinsically stochastic, leading to variation in the production of mRNAs and proteins within cells. In the scientific literature, this source of variation is typically referred to as 'noise'. The observed variability in molecular phenotypes arises from a combination of processes that amplify and attenuate noise. Our ability to quantify cell-to-cell variability in numerous biological contexts has been revolutionized by recent advances in single-cell technology, from imaging approaches through to 'omics' strategies. However, defining, accurately measuring and disentangling the stochastic and deterministic components of cell-to-cell variability is challenging. In this Review, we discuss the sources, impact and function of molecular phenotypic variability and highlight future directions to understand its role.

    Funded by: Cancer Research UK: 17197; Wellcome Trust: 105045/Z/14/Z

    Nature reviews. Genetics 2019;20;9;536-548

  • Derivation and maintenance of mouse haploid embryonic stem cells.

    Elling U, Woods M, Forment JV, Fu B, Yang F, Ng BL, Vicente JR, Adams DJ, Doe B, Jackson SP, Penninger JM and Balmus G

    Institute of Molecular Biotechnology of the Austrian Academy of Science (IMBA), Vienna Biocenter (VBC), Vienna, Austria.

    Ploidy represents the number of chromosome sets in a cell. Although gametes have a haploid genome (n), most mammalian cells have diploid genomes (2n). The diploid status of most cells correlates with the number of probable alleles for each autosomal gene and makes it difficult to target these genes via mutagenesis techniques. Here, we describe a 7-week protocol for the derivation of mouse haploid embryonic stem cells (hESCs) from female gametes that also outlines how to maintain the cells once derived. We detail additional procedures that can be used with cell lines obtained from the mouse Haplobank, a biobank of >100,000 individual mouse hESC lines with targeted mutations in 16,970 genes. hESCs can spontaneously diploidize and can be maintained in both haploid and diploid states. Mouse hESCs are genomically and karyotypically stable, are innately immortal and isogenic, and can be derived in an array of differentiated cell types; they are thus highly amenable to genetic screens and to defining molecular connectivity pathways.

    Nature protocols 2019;14;7;1991-2014

  • Contrasting patterns of longitudinal population dynamics and antimicrobial resistance mechanisms in two priority bacterial pathogens over 7 years in a single center.

    Ellington MJ, Heinz E, Wailan AM, Dorman MJ, de Goffau M, Cain AK, Henson SP, Gleadall N, Boinett CJ, Dougan G, Brown NM, Woodford N, Parkhill J, Török ME, Peacock SJ and Thomson NR

    Public Health England, National Infection Service, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 0QW, UK.

    Background: Two of the most important pathogens contributing to the global rise in antimicrobial resistance (AMR) are Klebsiella pneumoniae and Enterobacter cloacae. Despite this, most of our knowledge about the changing patterns of disease caused by these two pathogens is based on studies with limited timeframes that provide few insights into their population dynamics or the dynamics in AMR elements that they can carry.

    Results: We investigate the population dynamics of two priority AMR pathogens over 7 years between 2006 and 2012 in a major UK hospital, spanning changes made to UK national antimicrobial prescribing policy in 2007. Between 2006 and 2012, K. pneumoniae showed epidemiological cycles of multi-drug-resistant (MDR) lineages being replaced approximately every 2 years. This contrasted with E. cloacae, where there was no temporally changing pattern but a continuous presence of the mixed population.

    Conclusions: The differing patterns of clonal replacement and acquisition of mobile elements show that the flux in the K. pneumoniae population was linked to the introduction of globally recognized MDR clones carrying drug resistance markers on mobile elements. However, E. cloacae carries a chromosomally encoded ampC conferring resistance to front-line treatments and shows that MDR plasmid acquisition in E. cloacae was not indicative of success in the hospital. This led to markedly different dynamics in the AMR populations of these two pathogens and shows that the mechanism of the resistance and its location in the genome or mobile elements is crucial to predict population dynamics of opportunistic pathogens in clinical settings.

    Funded by: Department of Health; Health Protection Agency: 108077; Medical Research Council: G1000803; Wellcome Trust: 098051

    Genome biology 2019;20;1;184

  • Studying immune to non-immune cell cross-talk using single-cell technologies.

    Elmentaite R, Teichmann SA and Madissoon E

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom.

    Single-cell RNA-sequencing has uncovered immune heterogeneity, including novel cell types, states and lineages that have expanded our understanding of the immune system as a whole. More recently, studies involving both immune and non-immune cells have demonstrated the importance of the immune microenvironment in development, homeostasis and disease. This review focuses on the single-cell studies mapping cell-cell interactions for a variety of tissues in development, health and disease. In addition, we address the need to generate a comprehensive interaction map to answer fundamental questions in immunology, as well as the experimental and computational strategies required for this purpose.

    Current opinion in systems biology 2019;18;87-94

  • A library of recombinant Babesia microti cell surface and secreted proteins for diagnostics discovery and reverse vaccinology.

    Elton CM, Rodriguez M, Ben Mamoun C, Lobo CA and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom.

    Human babesiosis is an emerging tick-borne parasitic disease and blood transfusion-transmitted infection primarily caused by the apicomplexan parasite, Babesia microti. There is no licensed vaccine for B. microti and the development of a reliable serological screening test would contribute to ensuring the safety of the donated blood supply. The recent sequencing of the B. microti genome has revealed many novel genes encoding proteins that can now be tested for their suitability as subunit vaccine candidates and diagnostic serological markers. Extracellular proteins are considered excellent vaccine candidates and serological markers because they are directly exposed to the host humoral immune system, but can be challenging to express as soluble recombinant proteins. We have recently developed an approach based on a mammalian expression system that can produce large panels of functional recombinant cell surface and secreted parasite proteins. Here, we use the B. microti genome sequence to identify 54 genes that are predicted to encode surface-displayed and secreted proteins expressed during the blood stages, and show that 41 (76%) are expressed using our method at detectable levels. We demonstrate that the proteins contain conformational, heat-labile epitopes and use them to serologically profile the kinetics of the humoral immune responses to two strains of B. microti in a murine infection model. Using sera from validated human infections, we show a concordance in the host antibody responses to B. microti infections in mouse and human hosts. Finally, we show that BmSA1 expressed in mammalian cells can elicit high antibody titres in vaccinated mice using a human-compatible adjuvant but these antibodies did not affect the pathology of infection in vivo. Our library of recombinant B. microti cell surface and secreted antigens constitutes a valuable resource that could contribute to the development of a serological diagnostic test and vaccines, and help elucidate the molecular basis of host-parasite interactions.

    Funded by: Wellcome Trust: 206194

    International journal for parasitology 2019;49;2;115-125

  • DNA Sequence Variation in ACVR1C Encoding the Activin Receptor-Like Kinase 7 Influences Body Fat Distribution and Protects Against Type 2 Diabetes.

    Emdin CA, Khera AV, Aragam K, Haas M, Chaffin M, Klarin D, Natarajan P, Bick A, Zekavat SM, Nomura A, Ardissino D, Wilson JG, Schunkert H, McPherson R, Watkins H, Elosua R, Bown MJ, Samani NJ, Baber U, Erdmann J, Gupta N, Danesh J, Saleheen D, Gabriel S and Kathiresan S

    Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA.

    A genetic predisposition to higher waist-to-hip ratio adjusted for BMI (WHRadjBMI), a measure of body fat distribution, associates with increased risk for type 2 diabetes. We conducted an exome-wide association study of coding variation in UK Biobank (405,569 individuals) to identify variants that lower WHRadjBMI and protect against type 2 diabetes. We identified four variants in the gene <i>ACVR1C</i> (encoding the activin receptor-like kinase 7 receptor expressed on adipocytes and pancreatic β-cells), which independently associated with reduced WHRadjBMI: Asn150His (-0.09 SD, <i>P</i> = 3.4 × 10<sup>-17</sup>), Ile195Thr (-0.15 SD, <i>P</i> = 1.0 × 10<sup>-9</sup>), Ile482Val (-0.019 SD, <i>P</i> = 1.6 × 10<sup>-5</sup>), and rs72927479 (-0.035 SD, <i>P</i> = 2.6 × 10<sup>-12</sup>). Carriers of these variants exhibited reduced percent abdominal fat in DEXA imaging. Pooling across all four variants, a 0.2 SD decrease in WHRadjBMI through <i>ACVR1C</i> was associated with a 30% lower risk of type 2 diabetes (odds ratio [OR] 0.70, 95% CI 0.63, 0.77; <i>P</i> = 5.6 × 10<sup>-13</sup>). In an analysis of exome sequences from 55,516 individuals, carriers of predicted damaging variants in <i>ACVR1C</i> were at 54% lower risk of type 2 diabetes (OR 0.46, 95% CI 0.27, 0.81; <i>P</i> = 0.006). These findings indicate that variants predicted to lead to loss of <i>ACVR1C</i> gene function influence body fat distribution and protect from type 2 diabetes.

    Funded by: British Heart Foundation: CS/14/2/30841, RG2000010; Medical Research Council: MC_QA137853; NHGRI NIH HHS: K08 HG010155, U54 HG003067; NHLBI NIH HHS: HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, HHSN268201300050C, R01 HL127564, RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010; NIGMS NIH HHS: U54 GM115428

    Diabetes 2019;68;1;226-234

  • Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis.

    Ernst C, Eling N, Martinez-Jimenez CP, Marioni JC and Odom DT

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    Male gametes are generated through a specialised differentiation pathway involving a series of developmental transitions that are poorly characterised at the molecular level. Here, we use droplet-based single-cell RNA-Sequencing to profile spermatogenesis in adult animals and at multiple stages during juvenile development. By exploiting the first wave of spermatogenesis, we both precisely stage germ cell development and enrich for rare somatic cell-types and spermatogonia. To capture the full complexity of spermatogenesis including cells that have low transcriptional activity, we apply a statistical tool that identifies previously uncharacterised populations of leptotene and zygotene spermatocytes. Focusing on post-meiotic events, we characterise the temporal dynamics of X chromosome re-activation and profile the associated chromatin state using CUT&RUN. This identifies a set of genes strongly repressed by H3K9me3 in spermatocytes, which then undergo extensive chromatin remodelling post-meiosis, thus acquiring an active chromatin state and spermatid-specific expression.

    Funded by: European Research Council: 615584; Wellcome Trust

    Nature communications 2019;10;1;1251

  • Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci.

    Erzurumluoglu AM, Liu M, Jackson VE, Barnes DR, Datta G, Melbourne CA, Young R, Batini C, Surendran P, Jiang T, Adnan SD, Afaq S, Agrawal A, Altmaier E, Antoniou AC, Asselbergs FW, Baumbach C, Bierut L, Bertelsen S, Boehnke M, Bots ML, Brazel DM, Chambers JC, Chang-Claude J, Chen C, Corley J, Chou YL, David SP, de Boer RA, de Leeuw CA, Dennis JG, Dominiczak AF, Dunning AM, Easton DF, Eaton C, Elliott P, Evangelou E, Faul JD, Foroud T, Goate A, Gong J, Grabe HJ, Haessler J, Haiman C, Hallmans G, Hammerschlag AR, Harris SE, Hattersley A, Heath A, Hsu C, Iacono WG, Kanoni S, Kapoor M, Kaprio J, Kardia SL, Karpe F, Kontto J, Kooner JS, Kooperberg C, Kuulasmaa K, Laakso M, Lai D, Langenberg C, Le N, Lettre G, Loukola A, Luan J, Madden PAF, Mangino M, Marioni RE, Marouli E, Marten J, Martin NG, McGue M, Michailidou K, Mihailov E, Moayyeri A, Moitry M, Müller-Nurasyid M, Naheed A, Nauck M, Neville MJ, Nielsen SF, North K, Perola M, Pharoah PDP, Pistis G, Polderman TJ, Posthuma D, Poulter N, Qaiser B, Rasheed A, Reiner A, Renström F, Rice J, Rohde R, Rolandsson O, Samani NJ, Samuel M, Schlessinger D, Scholte SH, Scott RA, Sever P, Shao Y, Shrine N, Smith JA, Starr JM, Stirrups K, Stram D, Stringham HM, Tachmazidou I, Tardif JC, Thompson DJ, Tindle HA, Tragante V, Trompet S, Turcot V, Tyrrell J, Vaartjes I, van der Leij AR, van der Meer P, Varga TV, Verweij N, Völzke H, Wareham NJ, Warren HR, Weir DR, Weiss S, Wetherill L, Yaghootkar H, Yavas E, Jiang Y, Chen F, Zhan X, Zhang W, Zhao W, Zhao W, Zhou K, Amouyel P, Blankenberg S, Caulfield MJ, Chowdhury R, Cucca F, Deary IJ, Deloukas P, Di Angelantonio E, Ferrario M, Ferrières J, Franks PW, Frayling TM, Frossard P, Hall IP, Hayward C, Jansson JH, Jukema JW, Kee F, Männistö S, Metspalu A, Munroe PB, Nordestgaard BG, Palmer CNA, Salomaa V, Sattar N, Spector T, Strachan DP, Understanding Society Scientific Group, EPIC-CVD, GSCAN, Consortium for Genetics of Smoking Behaviour, CHD Exome+ consortium, van der Harst P, Zeggini E, Saleheen D, Butterworth AS, Wain LV, Abecasis GR, Danesh J, Tobin MD, Vrieze S, Liu DJ and Howson JMM

    Department of Health Sciences, University of Leicester, Leicester, UK.

    Smoking is a major heritable and modifiable risk factor for many diseases, including cancer, common respiratory disorders and cardiovascular diseases. Fourteen genetic loci have previously been associated with smoking behaviour-related traits. We tested up to 235,116 single nucleotide variants (SNVs) on the exome-array for association with smoking initiation, cigarettes per day, pack-years, and smoking cessation in a fixed effects meta-analysis of up to 61 studies (up to 346,813 participants). In a subset of 112,811 participants, a further one million SNVs were also genotyped and tested for association with the four smoking behaviour traits. SNV-trait associations with P < 5 × 10<sup>-8</sup> in either analysis were taken forward for replication in up to 275,596 independent participants from UK Biobank. Lastly, a meta-analysis of the discovery and replication studies was performed. Sixteen SNVs were associated with at least one of the smoking behaviour traits (P < 5 × 10<sup>-8</sup>) in the discovery samples. Ten novel SNVs, including rs12616219 near TMEM182, were followed-up and five of them (rs462779 in REV3L, rs12780116 in CNNM2, rs1190736 in GPR101, rs11539157 in PJA1, and rs12616219 near TMEM182) replicated at a Bonferroni significance threshold (P < 4.5 × 10<sup>-3</sup>) with consistent direction of effect. A further 35 SNVs were associated with smoking behaviour traits in the discovery plus replication meta-analysis (up to 622,409 participants) including a rare SNV, rs150493199, in CCDC141 and two low-frequency SNVs in CEP350 and HDGFRP2. Functional follow-up implied that decreased expression of REV3L may lower the probability of smoking initiation. The novel loci will facilitate understanding the genetic aetiology of smoking behaviour and may lead to the identification of potential drug targets for smoking prevention and/or cessation.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; British Heart Foundation: RG/13/13/30194, RG/14/5/30893; British Heart Foundation (BHF): RG/13/13/30194, SP/09/002; Cancer Research UK: 10119, 10124; EC | European Research Council (ERC): 268834; Medical Research Council: G1000861, G1001799, MC_PC_12010, MC_PC_17228, MC_QA137853, MC_UU_00007/10, MC_UU_12015/1, MR/K023241/1, MR/K026992/1, MR/L003120/1, MR/L01341X/1, MR/L01632X/1, MR/N01104X/1, MR/N01104X/2, MR/N011317/1, MR/R023484/1, MR/S003762/1; NIDDK NIH HHS: R01 DK062370

    Molecular psychiatry 2019

  • Chromosome-level genome assembly for giant panda provides novel insights into Carnivora chromosome evolution.

    Fan H, Wu Q, Wei F, Yang F, Ng BL and Hu Y

    CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.

    Background: Chromosome evolution is an important driver of speciation and species evolution. Previous studies have detected chromosome rearrangement events among different Carnivora species using chromosome painting strategies. However, few of these studies have focused on chromosome evolution at a nucleotide resolution due to the limited availability of chromosome-level Carnivora genomes. Although the de novo genome assembly of the giant panda is available, current short read-based assemblies are limited to moderately sized scaffolds, making the study of chromosome evolution difficult.

    Results: Here, we present a chromosome-level giant panda draft genome with a total size of 2.29 Gb. Based on the giant panda genome and published chromosome-level dog and cat genomes, we conduct six large-scale pairwise synteny alignments and identify evolutionary breakpoint regions. Interestingly, gene functional enrichment analysis shows that for all of the three Carnivora genomes, some genes located in evolutionary breakpoint regions are significantly enriched in pathways or terms related to sensory perception of smell. In addition, we find that the sweet receptor gene TAS1R2, which has been proven to be a pseudogene in the cat genome, is located in an evolutionary breakpoint region of the giant panda, suggesting that interchromosomal rearrangement may play a role in the cat TAS1R2 pseudogenization.

    Conclusions: We show that the combined strategies employed in this study can be used to generate efficient chromosome-level genome assemblies. Moreover, our comparative genomics analyses provide novel insights into Carnivora chromosome evolution, linking chromosome evolution to functional gene evolution.

    Genome biology 2019;20;1;267

  • A Dietary Pattern with High Sugar Content Is Associated with Cardiometabolic Risk Factors in the Pomak Population.

    Farmaki AE, Rayner NW, Kafyra M, Matchan A, Ntaoutidou K, Feritoglou P, Athanasiadis A, Gilly A, Mamakou V, Zengini E, Karaleftheri M, Zeggini E and Dedoussis G

    Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, 17671 Athens, Greece.

    The present study describes the geographically isolated Pomak population and its particular dietary patterns in relationship to cardiovascular risk factors. We collected a population-based cohort in a cross-sectional study, with detailed anthropometric, biochemical, clinical, and lifestyle parameter information. Dietary patterns were derived through principal component analysis based on a validated food-frequency questionnaire, administered to 1702 adult inhabitants of the Pomak villages on the Rhodope mountain range in Greece. A total of 69.9% of the participants were female with a population mean age of 44.9 years; 67% of the population were overweight or obese with a significantly different prevalence for obesity between men and women (17.5% vs. 37.5%, respectively, <i>p</i> < 0.001). Smoking was more prevalent in men (45.8% vs. 2.2%, <i>p</i> < 0.001), as 97.3% of women had never smoked. Four dietary patterns emerged as characteristic of the population, and were termed "high in sugars", "quick choices", "balanced", and "homemade". Higher adherence to the "high in sugars" dietary pattern was associated with increased glucose levels (<i>p</i> < 0.001) and increased risk of hypertension (OR (95% CI) 2.61 (1.55, 4.39), <i>p</i> < 0.001) and nominally associated with high blood glucose levels (OR (95% CI) 1.85 (1.11, 3.08), <i>p</i> = 0.018), compared to lower adherence. Overall, we characterize the dietary patterns of the Pomak population and describe associations with cardiovascular risk factors.

    Funded by: European Research Council: ERC-2011-StG 280559-SEPI; Wellcome Trust: 098051

    Nutrients 2019;11;12

  • Complimentary Methods for Multivariate Genome-Wide Association Study Identify New Susceptibility Genes for Blood Cell Traits.

    Fatumo S, Carstensen T, Nashiru O, Gurdasani D, Sandhu M and Kaleebu P

    Uganda Medical Informatics Centre, MRC/UVRI and LSHTM Uganda Research Unit, Entebbe, Uganda.

    Genome-wide association studies (GWAS) have found hundreds of novel loci associated with full blood count (FBC) phenotypes. However, most of these studies were performed in a single phenotype framework without putting into consideration the clinical relatedness among traits. In this work, in addition to the standard univariate GWAS, we also use two different multivariate methods to perform the first multiple traits GWAS of FBC traits in ∼7000 individuals from the Ugandan General Population Cohort (GPC). We started by performing the standard univariate GWAS approach. We then performed our first multivariate method, in this approach, we tested for marker associations with 15 FBC traits simultaneously in a multivariate mixed model implemented in GEMMA while accounting for the relatedness of individuals and pedigree structures, as well as population substructure. In this analysis, we provide a framework for the combination of multiple phenotypes in multivariate GWAS analysis and show evidence of multi-collinearity whenever the correlation between traits exceeds the correlation coefficient threshold of <i>r</i><sup>2</sup> >=0.75. This approach identifies two known and one novel loci. In the second multivariate method, we applied principal component analysis (PCA) to the same 15 correlated FBC traits. We then tested for marker associations with each PC in univariate linear mixed models implemented in GEMMA. We show that the FBC composite phenotype as assessed by each PC expresses information that is not completely encapsulated by the individual FBC traits, as this approach identifies three known and five novel loci that were not identified using both the standard univariate and multivariate GWAS methods. Across both multivariate methods, we identified six novel loci. As a proof of concept, both multivariate methods also identified known loci, <i>HBB</i> and <i>ITFG3</i>. The two multivariate methods show that multivariate genotype-phenotype methods increase power and identify novel genotype-phenotype associations not found with the standard univariate GWAS in the same dataset.

    Funded by: NIMH NIH HHS: U01 MH115485; Wellcome Trust

    Frontiers in genetics 2019;10;334

  • Complete Whole-Genome Sequence of Haemophilus haemolyticus NCTC 10839.

    Fazal MA, Alexander S, Grayson NE, Deheer-Graham A, Oliver K, Holroyd N, Parkhill J and Russell JE

    Culture Collections, Public Health England, London, United Kingdom.

    <i>Haemophilus haemolyticus</i> is a Gram-negative bacterium that is a commensal of the respiratory tract in humans. Here, we report the complete genome sequence available for <i>Haemophilus haemolyticus</i> strain NCTC 10839, which was originally isolated from the nasopharynx of a child.

    Funded by: Wellcome Trust

    Microbiology resource announcements 2019;8;25

  • Complete Whole-Genome Sequences of Two Raoultella terrigena Strains, NCTC 13097 and NCTC 13098, Isolated from Human Cases.

    Fazal MA, Alexander S, Grayson NE, Deheer-Graham A, Oliver K, Holroyd N, Parkhill J and Russell JE

    Culture Collections, Public Health England, London, United Kingdom.

    <i>Raoultella terrigena</i> is a bacterial species associated with soil and aquatic environments; however, sporadic cases of opportunistic disease in humans have been reported. Here, we report the first two complete genome sequences from clinical strains isolated from human sources that have been deposited in the National Collection of Type Cultures (NCTC).

    Funded by: Wellcome Trust

    Microbiology resource announcements 2019;8;27

  • A cellular census of human lungs identifies novel cell states in health and in asthma

    Felipe A. Vieira Braga, Gozde Kar, Marijn Berg, Orestes A. Carpaij, Krzysztof Polanski, Lukas M. Simon, Sharon Brouwer, Tomás Gomes, Laura Hesse, Jian Jiang, Eirini S. Fasouli, Mirjana Efremova, Roser Vento-Tormo, Carlos Talavera-López, Marnix R. Jonker, Karen Affleck, Subarna Palit, Paulina M. Strzelecka, Helen V. Firth, Krishnaa T. Mahbubani, Ana Cvejic, Kerstin B. Meyer, Kourosh Saeb-Parsy, Marjan Luinge, Corry-Anke Brandsma, Wim Timens, Ilias Angelidis, Maximilian Strunz, Gerard H. Koppelman, Antoon J. van Oosterhout, Herbert B. Schiller, Fabian J. Theis, Maarten van den Berge, Martijn C. Nawijn and Sarah A. Teichmann

    Human lungs enable efficient gas exchange and form an interface with the environment, which depends on mucosal immunity for protection against infectious agents. Tightly controlled interactions between structural and immune cells are required to maintain lung homeostasis. Here, we use single-cell transcriptomics to chart the cellular landscape of upper and lower airways and lung parenchyma in healthy lungs, and lower airways in asthmatic lungs. We report location-dependent airway epithelial cell states and a novel subset of tissue-resident memory T cells. In the lower airways of patients with asthma, mucous cell hyperplasia is shown to stem from a novel mucous ciliated cell state, as well as goblet cell hyperplasia. We report the presence of pathogenic effector type 2 helper T cells (TH2) in asthmatic lungs and find evidence for type 2 cytokines in maintaining the altered epithelial cell states. Unbiased analysis of cell–cell interactions identifies a shift from airway structural cell communication in healthy lungs to a TH2-dominated interactome in asthmatic lungs. Single-cell transcriptomics reveals immune and stromal compartment remodeling, including the enrichment of unique populations of epithelial cells and CD4+ T cells, in asthmatic lungs.

    Nature Medicine 2019;25;7;1153

  • Separating Bacteria by Capsule Amount Using a Discontinuous Density Gradient.

    Feltwell T, Dorman MJ, Goulding DA, Parkhill J and Short FL

    Wellcome Sanger Institute, Wellcome Genome Campus.

    Capsule is a key virulence factor in many bacterial species, mediating immune evasion and resistance to various physical stresses. While many methods are available to quantify and compare capsule production between different strains or mutants, there is no widely used method for sorting bacteria based on how much capsule they produce. We have developed a method to separate bacteria by capsule amount, using a discontinuous density gradient. This method is used to compare capsule amounts semi-quantitatively between cultures, to isolate mutants with altered capsule production, and to purify capsulated bacteria from complex samples. This method can also be coupled with transposon-insertion sequencing to identify genes involved in capsule regulation. Here, the method is demonstrated in detail, including how to optimize the gradient conditions for a new bacterial species or strain, and how to construct and run the density gradient.

    Journal of visualized experiments : JoVE 2019;143

  • Outcompeting p53-Mutant Cells in the Normal Esophagus by Redox Manipulation.

    Fernandez-Antoran D, Piedrafita G, Murai K, Ong SH, Herms A, Frezza C and Jones PH

    Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    As humans age, normal tissues, such as the esophageal epithelium, become a patchwork of mutant clones. Some mutations are under positive selection, conferring a competitive advantage over wild-type cells. We speculated that altering the selective pressure on mutant cell populations may cause them to expand or contract. We tested this hypothesis by examining the effect of oxidative stress from low-dose ionizing radiation (LDIR) on wild-type and p53 mutant cells in the transgenic mouse esophagus. We found that LDIR drives wild-type cells to stop proliferating and differentiate. p53 mutant cells are insensitive to LDIR and outcompete wild-type cells following exposure. Remarkably, combining antioxidant treatment and LDIR reverses this effect, promoting wild-type cell proliferation and p53 mutant differentiation, reducing the p53 mutant population. Thus, p53-mutant cells can be depleted from the normal esophagus by redox manipulation, showing that external interventions may be used to alter the mutational landscape of an aging tissue.

    Funded by: Cancer Research UK: C609/A17257; Medical Research Council: MC_UU_12022/6; Wellcome Trust: 098051, 296194

    Cell stem cell 2019;25;3;329-341.e6

  • Residual Variation Intolerance Score Detects Loci Under Selection in Neuroinvasive Listeria monocytogenes.

    Ferwerda B, Maury MM, Brouwer MC, Hafner L, van der Ende A, Bentley S, Lecuit M and van de Beek D

    Department of Neurology, Amsterdam Neuroscience, Amsterdam UMC, University of Amsterdam, Amsterdam, Netherlands.

    <i>Listeria monocytogenes</i> is a Gram-positive bacterium that can be found in a broad range of environments, including soil, food, animals, and humans. <i>L. monocytogenes</i> can cause a foodborne disease manifesting as sepsis and meningo-encephalitis. To evaluate signals of selection within the core genome of neuroinvasive <i>L. monocytogenes</i> strains, we sequenced 122 <i>L. monocytogenes</i> strains from cerebrospinal fluid (CSF) of Dutch meningitis patients and performed a genome-wide analysis using Tajima's D and ω (dN/dS). We also evaluated the residual variation intolerance score (RVIS), a computationally less demanding methodology, to identify loci under selection. Results show that the large genetic distance between the listerial lineages influences the Tajima's D and ω (dN/dS) outcome. Within genetic lineages we detected signals of selection in 6 of 2327 loci (<1%), which were replicated in an external cohort of 105 listerial CSF isolates from France. Functions of identified loci under selection were within metabolism pathways (<i>lmo2476</i>, encoding aldose 1-epimerase), putative antimicrobial resistance mechanisms (<i>lmo1855</i>, encoding PBPD3), and virulence factors (<i>lmo0549</i>, internalin-like protein; <i>lmo1482</i>, encoding comEC). RVIS over the two genetic lineages showed signals of selection in internalin-like proteins loci potentially involved in pathogen-host interaction (<i>lmo0549</i>, <i>lmo0610</i>, and <i>lmo1290</i>). Our results show that RVIS can be used to detect bacterial loci under selection.

    Frontiers in microbiology 2019;10;2702

  • Malta (MYH9 Associated Elastin Aggregation) Syndrome: Germline Variants in MYH9 Cause Rare Sweat Duct Proliferations and Irregular Elastin Aggregations.

    Fewings E, Ziemer M, Hörtnagel K, Reicherter K, Larionov A, Redman J, Goldgraben MA, Pepler A, Hearn T, Firth H, Ha T, Schaller J, Adams DJ, Rytina E, van Steensel M and Tischkowitz M

    Academic Department of Medical Genetics, University of Cambridge, Cambridge, United Kingdom.

    The Journal of investigative dermatology 2019

  • The epilepsy-associated protein TBC1D24 is required for normal development, survival and vesicle trafficking in mammalian neurons.

    Finelli MJ, Aprile D, Castroflorio E, Jeans A, Moschetta M, Chessum L, Degiacomi MT, Grasegger J, Lupien-Meilleur A, Bassett A, Rossignol E, Campeau PM, Bowl MR, Benfenati F, Fassio A and Oliver PL

    Department of Physiology, Anatomy and Genetics, University of Oxford, Parks Road, Oxford, UK.

    Mutations in the Tre2/Bub2/Cdc16 (TBC)1 domain family member 24 (TBC1D24) gene are associated with a range of inherited neurological disorders, from drug-refractory lethal epileptic encephalopathy and DOORS syndrome (deafness, onychodystrophy, osteodystrophy, mental retardation, seizures) to non-syndromic hearing loss. TBC1D24 has been implicated in neuronal transmission and maturation, although the molecular function of the gene and the cause of the apparently complex disease spectrum remain unclear. Importantly, heterozygous TBC1D24 mutation carriers have also been reported with seizures, suggesting that haploinsufficiency for TBC1D24 is significant clinically. Here we have systematically investigated an allelic series of disease-associated mutations in neurons alongside a new mouse model to investigate the consequences of TBC1D24 haploinsufficiency to mammalian neurodevelopment and synaptic physiology. The cellular studies reveal that disease-causing mutations that disrupt either of the conserved protein domains in TBC1D24 are implicated in neuronal development and survival and are likely acting as loss-of-function alleles. We then further investigated TBC1D24 haploinsufficiency in vivo and demonstrate that TBC1D24 is also crucial for normal presynaptic function: genetic disruption of Tbc1d24 expression in the mouse leads to an impairment of endocytosis and an enlarged endosomal compartment in neurons with a decrease in spontaneous neurotransmission. These data reveal the essential role for TBC1D24 at the mammalian synapse and help to define common synaptic mechanisms that could underlie the varied effects of TBC1D24 mutations in neurological disease.

    Funded by: European Research Council: 311394; Medical Research Council: MC_EX_MR/P502005/1, MC_UP_1503/2

    Human molecular genetics 2019;28;4;584-597

  • Translating insights into tumor evolution to clinical practice: promises and challenges.

    Fittall MW and Van Loo P

    The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK.

    Accelerating technological advances have allowed the widespread genomic profiling of tumors. As yet, however, the vast catalogues of mutations that have been identified have made only a modest impact on clinical medicine. Massively parallel sequencing has informed our understanding of the genetic evolution and heterogeneity of cancers, allowing us to place these mutational catalogues into a meaningful context. Here, we review the methods used to measure tumor evolution and heterogeneity, and the potential and challenges for translating the insights gained to achieve clinical impact for cancer therapy, monitoring, early detection, risk stratification, and prevention. We discuss how tumor evolution can guide cancer therapy by targeting clonal and subclonal mutations both individually and in combination. Circulating tumor DNA and circulating tumor cells can be leveraged for monitoring the efficacy of therapy and for tracking the emergence of resistant subclones. The evolutionary history of tumors can be deduced for late-stage cancers, either directly by sampling precursor lesions or by leveraging computational approaches to infer the timing of driver events. This approach can identify recurrent early driver mutations that represent promising avenues for future early detection strategies. Emerging evidence suggests that mutational processes and complex clonal dynamics are active even in normal development and aging. This will make discriminating developing malignant neoplasms from normal aging cell lineages a challenge. Furthermore, insight into signatures of mutational processes that are active early in tumor evolution may allow the development of cancer-prevention approaches. Research and clinical studies that incorporate an appreciation of the complex evolutionary patterns in tumors will not only produce more meaningful genomic data, but also better exploit the vulnerabilities of cancer, resulting in improved treatment outcomes.

    Funded by: Cancer Research UK: C422/A21434, FC001202; Medical Research Foundation: FC001202; Wellcome Trust: FC001202

    Genome medicine 2019;11;1;20

  • Federated discovery and sharing of genomic data using Beacons.

    Fiume M, Cupak M, Keenan S, Rambla J, de la Torre S, Dyke SOM, Brookes AJ, Carey K, Lloyd D, Goodhand P, Haeussler M, Baudis M, Stockinger H, Dolman L, Lappalainen I, Törnroos J, Linden M, Spalding JD, Ur-Rehman S, Page A, Flicek P, Sherry S, Haussler D, Varma S, Saunders G and Scollen S

    DNAstack, Toronto, Ontario, Canada.

    Funded by: Wellcome Trust: 098051, 201535

    Nature biotechnology 2019;37;3;220-224

  • Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls.

    Flannick J, Mercader JM, Fuchsberger C, Udler MS, Mahajan A, Wessel J, Teslovich TM, Caulkins L, Koesterer R, Barajas-Olmos F, Blackwell TW, Boerwinkle E, Brody JA, Centeno-Cruz F, Chen L, Chen S, Contreras-Cubas C, Córdova E, Correa A, Cortes M, DeFronzo RA, Dolan L, Drews KL, Elliott A, Floyd JS, Gabriel S, Garay-Sevilla ME, García-Ortiz H, Gross M, Han S, Heard-Costa NL, Jackson AU, Jørgensen ME, Kang HM, Kelsey M, Kim BJ, Koistinen HA, Kuusisto J, Leader JB, Linneberg A, Liu CT, Liu J, Lyssenko V, Manning AK, Marcketta A, Malacara-Hernandez JM, Martínez-Hernández A, Matsuo K, Mayer-Davis E, Mendoza-Caamal E, Mohlke KL, Morrison AC, Ndungu A, Ng MCY, O'Dushlaine C, Payne AJ, Pihoker C, Broad Genomics Platform, Post WS, Preuss M, Psaty BM, Vasan RS, Rayner NW, Reiner AP, Revilla-Monsalve C, Robertson NR, Santoro N, Schurmann C, So WY, Soberón X, Stringham HM, Strom TM, Tam CHT, Thameem F, Tomlinson B, Torres JM, Tracy RP, van Dam RM, Vujkovic M, Wang S, Welch RP, Witte DR, Wong TY, Atzmon G, Barzilai N, Blangero J, Bonnycastle LL, Bowden DW, Chambers JC, Chan E, Cheng CY, Cho YS, Collins FS, de Vries PS, Duggirala R, Glaser B, Gonzalez C, Gonzalez ME, Groop L, Kooner JS, Kwak SH, Laakso M, Lehman DM, Nilsson P, Spector TD, Tai ES, Tuomi T, Tuomilehto J, Wilson JG, Aguilar-Salinas CA, Bottinger E, Burke B, Carey DJ, Chan JCN, Dupuis J, Frossard P, Heckbert SR, Hwang MY, Kim YJ, Kirchner HL, Lee JY, Lee J, Loos RJF, Ma RCW, Morris AD, O'Donnell CJ, Palmer CNA, Pankow J, Park KS, Rasheed A, Saleheen D, Sim X, Small KS, Teo YY, Haiman C, Hanis CL, Henderson BE, Orozco L, Tusié-Luna T, Dewey FE, Baras A, Gieger C, Meitinger T, Strauch K, Lange L, Grarup N, Hansen T, Pedersen O, Zeitler P, Dabelea D, Abecasis G, Bell GI, Cox NJ, Seielstad M, Sladek R, Meigs JB, Rich SS, Rotter JI, DiscovEHR Collaboration, CHARGE, LuCamp, ProDiGY, GoT2D, ESP, SIGMA-T2D, T2D-GENES, AMP-T2D-GENES, Altshuler D, Burtt NP, Scott LJ, Morris AP, Florez JC, McCarthy MI and Boehnke M

    Program in Metabolism, Broad Institute, Cambridge, MA, USA.

    Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10<sup>-3</sup>) and candidate genes from knockout mice (P = 5.2 × 10<sup>-3</sup>). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000-185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.

    Funded by: Medical Research Council: G0601261; NCATS NIH HHS: UL1 TR000040, UL1 TR001079, UL1 TR001420, UL1 TR001881, UL1 TR002345, UL1 TR002548; NCRR NIH HHS: M01 RR000036, M01 RR000043, M01 RR000069, M01 RR000084, M01 RR000125, M01 RR001066, M01 RR014467, UL1 RR024134, UL1 RR024139, UL1 RR024153, UL1 RR024989, UL1 RR024992, UL1 RR025758, UL1 RR025780; NHGRI NIH HHS: U01 HG007417, U54 HG003067, U54 HG003273; NHLBI NIH HHS: HHSN268200800007C, HHSN268201200036C, HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, HHSN268201300050C, HHSN268201500001C, HHSN268201500001I, HHSN268201500003C, HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, HHSN268201700005I, HHSN268201800001C, N01HC25195, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086, N01HC95159, N01HC95160, N01HC95161, N01HC95163, N01HC95164, N01HC95165, N01HC95166, N01HC95167, N01HC95168, N01HC95169, R01 HL086694, R01 HL092577, R01 HL105756, R01 HL113323, RC2 HL102419, RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010, U01 HL080295, U01 HL130114; NIA NIH HHS: P01 AG027734, P30 AG038072, R01 AG008122, R01 AG023629, R01 AG033193, R01 AG042188, R01 AG046949; NIDDK NIH HHS: K24 DK080140, K24 DK110550, P30 DK020595, P30 DK057521, P30 DK063491, R01 DK047482, R01 DK053889, R01 DK062370, R01 DK066358, R01 DK072193, R01 DK098032, R01 DK107786, R01 DK110113, R56 DK062370, RC2 DK088389, U01 DK057295, U01 DK061212, U01 DK061230, U01 DK061239, U01 DK061242, U01 DK061254, U01 DK062370, U01 DK078616, U01 DK085524, U01 DK085526, U01 DK085545, U01 DK105535, U01 DK105554; NIGMS NIH HHS: U54 GM115428; Wellcome Trust

    Nature 2019;570;7759;71-76

  • Evidence for increased genetic risk load for major depression in patients assigned to electroconvulsive therapy.

    Foo JC, Streit F, Frank J, Witt SH, Treutlein J, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Baune BT, Moebus S, Jöckel KH, Forstner AJ, Nöthen MM, Rietschel M, Sartorius A and Kranaster L

    Central Institute of Mental Health, Department of Genetic Epidemiology in Psychiatry, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany.

    Electroconvulsive therapy (ECT) is the treatment of choice for severe and treatment-resistant depression; disorder severity and unfavorable treatment outcomes are shown to be influenced by an increased genetic burden for major depression (MD). Here, we tested whether ECT assignment and response/nonresponse are associated with an increased genetic burden for MD using polygenic risk scores (PRS), which summarize the contribution of disease-related common risk variants. Fifty-one psychiatric inpatients suffering from a major depressive episode underwent ECT. MD-PRS were calculated for these inpatients and a separate population-based sample (n = 3,547 healthy; n = 426 self-reported depression) based on summary statistics from the Psychiatric Genomics Consortium MDD-working group (Cases: n = 59,851; Controls: n = 113,154). MD-PRS explained a significant proportion of disease status between ECT patients and healthy controls (p = .022, R² = 1.173%); patients showed higher MD-PRS. MD-PRS in population-based depression self-reporters were intermediate between ECT patients and controls (n.s.). Significant associations between MD-PRS and ECT response (50% reduction in Hamilton depression rating scale scores) were not observed. Our findings indicate that ECT cohorts show an increased genetic burden for MD and are consistent with the hypothesis that treatment-resistant MD patients represent a subgroup with an increased genetic risk for MD. Larger samples are needed to better substantiate these findings.

    Funded by: Bundesministerium für Bildung und Forschung: 01ZX1311A, 01ZX1314A, 01ZX1314G; Deutsche Forschungsgemeinschaft: NO246/10-1, RI 908/11-1; NIMH NIH HHS: U01 MH109528, U01 MH109532; National Institute of Mental Health: U01 MH109528; National Institute on Drug Abuse: U01 MH1095320

    American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics 2019;180;1;35-45

  • Derivation of Intestinal Organoids from Human Induced Pluripotent Stem Cells for Use as an Infection System.

    Forbester JL, Hannan N, Vallier L and Dougan G

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Intestinal human organoids (iHOs) provide an effective system for studying the intestinal epithelium and its interaction with various stimuli. By using combinations of different signaling factors, human induced pluripotent stem cells (hIPSCs) can be driven to differentiate down the intestinal lineage. Here, we describe the process for this differentiation, including the derivation of hindgut from hIPSCs, embedding hindgut into a pro-intestinal culture system and passaging the resulting iHOs. We then describe how to carry out microinjections to introduce bacteria to the apical side of the intestinal epithelial cells (IECs).

    Funded by: Medical Research Council; Wellcome Trust

    Methods in molecular biology (Clifton, N.J.) 2019;1576;157-169

  • A human gut bacterial genome and culture collection for improved metagenomic analyses.

    Forster SC, Kumar N, Anonye BO, Almeida A, Viciani E, Stares MD, Dunn M, Mkandawire TT, Zhu A, Shao Y, Pike LJ, Louie T, Browne HP, Mitchell AL, Neville BA, Finn RD and Lawley TD

    Host-Microbiota Interactions Laboratory, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Understanding gut microbiome functions requires cultivated bacteria for experimental validation and reference bacterial genome sequences to interpret metagenome datasets and guide functional analyses. We present the Human Gastrointestinal Bacteria Culture Collection (HBC), a comprehensive set of 737 whole-genome-sequenced bacterial isolates, representing 273 species (105 novel species) from 31 families found in the human gastrointestinal microbiota. The HBC increases the number of bacterial genomes derived from human gastrointestinal microbiota by 37%. The resulting global Human Gastrointestinal Bacteria Genome Collection (HGG) classifies 83% of genera by abundance across 13,490 shotgun-sequenced metagenomic samples, improves taxonomic classification by 61% compared to the Human Microbiome Project (HMP) genome collection and achieves subspecies-level classification for almost 50% of sequences. The improved resource of gastrointestinal bacterial reference sequences circumvents dependence on de novo assembly of metagenomes and enables accurate and cost-effective shotgun metagenomic analyses of human gastrointestinal microbiota.

    Funded by: Wellcome Trust

    Nature biotechnology 2019;37;2;186-192

  • Drug Sensitivity Assays of Human Cancer Organoid Cultures.

    Francies HE, Barthorpe A, McLaren-Douglas A, Barendt WJ and Garnett MJ

    Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK.

    Drug sensitivity testing utilizing preclinical disease models such as cancer cell lines is an important and widely used tool for drug development. Importantly, when combined with molecular data such as gene copy number variation or somatic coding mutations, associations between drug sensitivity and molecular data can be used to develop markers to guide patient therapies. The use of organoids as a preclinical cancer model has become possible following recent work demonstrating that organoid cultures can be derived from patient tumors with a high rate of success. A genetic analysis of colon cancer organoids found that these models encompassed the majority of the somatic variants present within the tumors from which they were derived, and capture much of the genetic diversity of colon cancer observed in patients. Importantly, the systematic sensitivity testing of organoid cultures to anticancer drugs identified clinical gene-drug interactions, suggestive of their potential as preclinical models for testing anticancer drug sensitivity. In this chapter, we describe how to perform medium/high-throughput drug sensitivity screens using 3D organoid cell cultures.

    Funded by: Cancer Research UK: A22536; Wellcome Trust: 102696

    Methods in molecular biology (Clifton, N.J.) 2019;1576;339-351

  • A maternally inherited frameshift CDKL5 variant in a male with global developmental delay and late-onset generalized epilepsy.

    Fraser H, Goldman A, Wright R, Deciphering Developmental Disorders Study and Banka S

    Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, United Kingdom.

    Pathogenic CDKL5 variants cause an X-linked dominant infantile epileptic encephalopathy, predominantly in females. This condition is characterized by an early-onset severe mixed seizure disorder. We present a maternally inherited frameshift CDKL5 c.2809_2810insA p.(Cys937Ter) variant in a 13-year-old male with severe intellectual disability and late-onset generalized epilepsy. Interestingly, the variant segregation in the family is consistent with an X-linked recessive inheritance pattern, which has not previously been described with this gene. This variant is expected to result in truncation of some CDKL5 transcripts, which could potentially account for the later seizure onset and atypical inheritance pattern. Though the possibility of this variant not being causal cannot be completely excluded, this case adds to the variability of the documented phenotypic profile and to the debate around the role of C-terminus variants in CDKL5-related disease.

    Funded by: Cambridge South REC: 10/H0305/83; Health Innovation Challenge Fund: HICF-1009-003; National Institute for Health Research; Republic of Ireland REC: GEN/284/12; Wellcome Sanger Institute: WT098051

    American journal of medical genetics. Part A 2019;179;3;507-511

  • PiggyBac Transposon-Based Insertional Mutagenesis in Mice.

    Friedrich MJ, Bronner IF, Liu P, Bradley A and Rad R

    The Wellcome Trust Sanger Institute, Hinxton, UK.

    While sequencing and array-based studies are creating catalogues of genetic alterations in cancer, discriminating cancer drivers among the large sets of epigenetically, transcriptionally or posttranslationally dysregulated genes remains a challenge. Transposon-based genetic screening in mice has proven to be a powerful approach to address this challenge. Insertional mutagenesis directly flags biologically relevant genes and, combined with the transposon's unique molecular fingerprint, facilitates the recovery of insertion sites. We have generated transgenic mouse lines harboring different versions of PiggyBac-based oncogenic transposons, which in conjunction with PiggyBac transposase mice can be used for whole-body or tissue-specific insertional mutagenesis screens. We have also developed QiSeq, a method for (semi-)quantitative transposon insertion site sequencing, which overcomes biasing limitations of previous library preparation methods. QiSeq can be used in multiplexed high-throughput formats for candidate cancer gene discovery and gives insights into the clonal distribution of insertions for the study of genetic tumor evolution.

    Methods in molecular biology (Clifton, N.J.) 2019;1907;171-183

  • High-resolution mapping reveals that microniches in the gastric glands control Helicobacter pylori colonization of the stomach.

    Fung C, Tan S, Nakajima M, Skoog EC, Camarillo-Guerrero LF, Klein JA, Lawley TD, Solnick JV, Fukami T and Amieva MR

    Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America.

    Lifelong infection of the gastric mucosa by Helicobacter pylori can lead to peptic ulcers and gastric cancer. However, how the bacteria maintain chronic colonization in the face of constant mucus and epithelial cell turnover in the stomach is unclear. Here, we present a new model of how H. pylori establish and persist in the stomach, which involves the colonization of a specialized microenvironment, or microniche, deep in the gastric glands. Using quantitative three-dimensional (3D) confocal microscopy and passive CLARITY technique (PACT), which renders tissues optically transparent, we analyzed intact stomachs from mice infected with a mixture of isogenic, fluorescent H. pylori strains with unprecedented spatial resolution. We discovered that a small number of bacterial founders initially establish colonies deep in the gastric glands and then expand to colonize adjacent glands, forming clonal population islands that persist over time. Gland-associated populations do not intermix with free-swimming bacteria in the surface mucus, and they compete for space and prevent newcomers from establishing in the stomach. Furthermore, bacterial mutants deficient in gland colonization are outcompeted by wild-type (WT) bacteria. Finally, we found that host factors such as the age at infection and T-cell responses control bacterial density within the glands. Collectively, our results demonstrate that microniches in the gastric glands house a persistent H. pylori reservoir, which we propose replenishes the more transient bacterial populations in the superficial mucosa.

    Funded by: NIAID NIH HHS: R21 AI137759

    PLoS biology 2019;17;5;e3000231

  • Methylation Warfare: Interaction of Pneumococcal Bacteriophages with Their Host.

    Furi L, Crawford LA, Rangel-Pineros G, Manso AS, De Ste Croix M, Haigh RD, Kwun MJ, Engelsen Fjelland K, Gilfillan GD, Bentley SD, Croucher NJ, Clokie MR and Oggioni MR

    Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom.

    Virus-host interactions are regulated by complex coevolutionary dynamics. In Streptococcus pneumoniae, phase-variable type I restriction-modification (R-M) systems are part of the core genome. We hypothesized that the ability of the R-M systems to switch between six target DNA specificities also has a key role in preventing the spread of bacteriophages. Using the streptococcal temperate bacteriophage SpSL1, we show that the variants of both the SpnIII and SpnIV R-M systems are able to restrict invading bacteriophage with an efficiency approximately proportional to the number of target sites in the bacteriophage genome. In addition to restriction of lytic replication, SpnIII also led to abortive infection in the majority of host cells. During lytic infection, transcriptional analysis found evidence of phage-host interaction through the strong upregulation of the nrdR nucleotide biosynthesis regulon. During lysogeny, the phage had less of an effect on host gene regulation. This research demonstrates a novel combined bacteriophage restriction and abortive infection mechanism, highlighting the importance that the phase-variable type I R-M systems have in the multifunctional defense against bacteriophage infection in the respiratory pathogen S. pneumoniae. IMPORTANCE: With antimicrobial drug resistance becoming an increasing burden on human health, much attention has been focused on the potential use of bacteriophages and their enzymes as therapeutics. However, the investigations into the physiology of the complex interactions of bacteriophages with their hosts have attracted far less attention, in comparison. This work describes the molecular characterization of the infectious cycle of a bacteriophage in the important human pathogen Streptococcus pneumoniae and explores the intricate relationship between phase-variable host defense mechanisms and the virus. This is the first report showing how a phase-variable type I restriction-modification system is involved in bacteriophage restriction while it also provides an additional level of infection control through abortive infection.

    Funded by: Medical Research Council: MR/M003078/1; Wellcome Trust

    Journal of bacteriology 2019;201;19

  • Mapping and predicting gene-enhancer interactions.

    Gaffney DJ

    Wellcome Sanger Institute, Hinxton, UK.

    Nature genetics 2019;51;12;1662-1663

  • Resurrection of the ancestral RH5 invasion ligand provides a molecular explanation for the origin of P. falciparum malaria in humans.

    Galaway F, Yu R, Constantinou A, Prugnolle F and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Sanger Institute, Cambridge, United Kingdom.

    Many important infectious diseases are the result of zoonoses, in which pathogens that normally infect animals acquire mutations that enable the breaching of species barriers to permit the infection of humans. Our understanding of the molecular events that enable host switching is often limited, and yet this is a fundamentally important question. Plasmodium falciparum, the etiological agent of severe human malaria, evolved following a zoonotic transfer of parasites from gorillas. One gene, rh5, which encodes an essential ligand for the invasion of host erythrocytes, is suspected to have played a critical role in this host switch. Genome comparisons revealed an introgressed sequence in the ancestor of P. falciparum containing rh5, which likely allowed the ancestral parasites to infect both gorilla and human erythrocytes. To test this hypothesis, we resurrected the ancestral introgressed reticulocyte-binding protein homologue 5 (RH5) sequence and used quantitative protein interaction assays to demonstrate that this ancestral protein could bind the basigin receptor from both humans and gorillas. We also showed that this promiscuous receptor binding phenotype of RH5 was shared with the parasite clade that transferred its genome segment to the ancestor of P. falciparum, while the other lineages exhibit host-specific receptor binding, confirming the central importance of this introgression event for Plasmodium host switching. Finally, since its transfer to humans, P. falciparum, and also the RH5 ligand, have evolved a strong human specificity. We show that this subsequent restriction to humans can be attributed to a single amino acid mutation in the RH5 sequence. Our findings reveal a molecular pathway for the origin and evolution of human P. falciparum malaria and may inform molecular surveillance to predict future zoonoses.

    Funded by: Wellcome Trust: 206194

    PLoS biology 2019;17;10;e3000490

  • Establishment of porcine and human expanded potential stem cells.

    Gao X, Nowak-Imialek M, Chen X, Chen D, Herrmann D, Ruan D, Chen ACH, Eckersley-Maslin MA, Ahmad S, Lee YL, Kobayashi T, Ryan D, Zhong J, Zhu J, Wu J, Lan G, Petkov S, Yang J, Antunes L, Campos LS, Fu B, Wang S, Yong Y, Wang X, Xue SG, Ge L, Liu Z, Huang Y, Nie T, Li P, Wu D, Pei D, Zhang Y, Lu L, Yang F, Kimber SJ, Reik W, Zou X, Shang Z, Lai L, Surani A, Tam PPL, Ahmed A, Yeung WSB, Teichmann SA, Niemann H and Liu P

    School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Stem Cell and Regenerative Medicine Consortium, Pokfulam, Hong Kong.

    We recently derived mouse expanded potential stem cells (EPSCs) from individual blastomeres by inhibiting the critical molecular pathways that predispose their differentiation. EPSCs had enriched molecular signatures of blastomeres and possessed developmental potency for all embryonic and extra-embryonic cell lineages. Here, we report the derivation of porcine EPSCs, which express key pluripotency genes, are genetically stable, permit genome editing, differentiate to derivatives of the three germ layers in chimeras and produce primordial germ cell-like cells in vitro. Under similar conditions, human embryonic stem cells and induced pluripotent stem cells can be converted, or somatic cells directly reprogrammed, to EPSCs that display the molecular and functional attributes reminiscent of porcine EPSCs. Importantly, trophoblast stem-cell-like cells can be generated from both human and porcine EPSCs. Our pathway-inhibition paradigm thus opens an avenue for generating mammalian pluripotent stem cells, and EPSCs present a unique cellular platform for translational research in biotechnology and regenerative medicine.

    Funded by: Cancer Research UK: A24843; Medical Research Council: G0801057, MR/M017354/1; Wellcome Trust: 095645, 098051, 203144

    Nature cell biology 2019;21;6;687-699

  • Unconventional forms of inheritance.

    Gapp K

    Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge, CB2 1QN, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK; Institute for Neuroscience, Swiss Federal Institute of Technology, Winterthurerstrasse 190, Zürich, CH-8057, Switzerland.

    Seminars in cell & developmental biology 2019;97;84-85

  • Infection by Brucella melitensis or Brucella papionis modifies essential physiological functions of human trophoblasts.

    García-Méndez KB, Hielpos SM, Soler-Llorens PF, Arce-Gorvel V, Hale C, Gorvel JP, O'Callaghan D and Keriel A

    VBMI, INSERM, Université de Montpellier, Nîmes, France.

    Brucellosis is a zoonosis caused by bacteria of the Brucella genus. In ruminants, brucellosis causes abortion, followed by chronic infection and secretion of bacteria in milk. In humans, it usually presents as flu-like symptoms, with serious complications if untreated. Epidemiological studies have only recently established that brucellosis can also cause pregnancy complications in women, but the pathogenic mechanisms are unknown. Pioneering studies in ruminants showed that Brucella infect trophoblasts and then colonise the placenta where they grow to high density. A recent study showed that the main zoonotic Brucella species can infect human cytotrophoblasts (CTB) and extravillous trophoblasts (EVT). In this work, we show that Brucella papionis (associated with stillbirth in primates) also infects human trophoblasts. However, it replicates actively in CTB, whereas its replication is very restricted within EVT. We also observed alteration of several trophoblastic functions upon infection by B. papionis or Brucella melitensis (the most prevalent species in human brucellosis). Infection altered the production of hormones, the ability of CTB to form syncytiotrophoblasts, and the invasion capacity of EVT. We also found that infection can spread between different types of trophoblasts. These findings constitute a new step in understanding how Brucella infection causes adverse pregnancy outcomes.

    Funded by: Agence Nationale de Santé Publique; Fondation Méditerranée Infection; Institut National de la Santé et de la Recherche Médicale; Université de Montpellier

    Cellular microbiology 2019;21;7;e13019

  • Contribution of retrotransposition to developmental disorders.

    Gardner EJ, Prigmore E, Gallone G, Danecek P, Samocha KE, Handsaker J, Gerety SS, Ironfield H, Short PJ, Sifrim A, Singh T, Chandler KE, Clement E, Lachlan KL, Prescott K, Rosser E, FitzPatrick DR, Firth HV and Hurles ME

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK.

    Mobile genetic elements (MEs) are segments of DNA which can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04%), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate genome-wide germline ME mutation rate and selective constraint and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein coding genes and a framework for future evolutionary and disease studies.

    Nature communications 2019;10;1;4630

  • Conservation of the Threatened Species, Pulsatilla vulgaris Mill. (Pasqueflower), is Aided by Reproductive System and Polyploidy.

    Gargiulo R, Worswick G, Arnold C, Pike LJ, Cowan RS, Hardwick KA, Chapman T and Fay MF

    Royal Botanic Gardens, Kew, Richmond, UK.

    Population loss due to habitat disturbance is a major concern in biodiversity conservation. Here we investigate the genetic causes of the demographic decline observed in English populations of Pulsatilla vulgaris and the consequences for conservation. Using 10 nuclear microsatellite markers, we compare genetic variation in wild populations with restored and seed-regenerated populations (674 samples). Emergence of genetic structure and loss of allelic variation in natural populations are not as evident as expected from demographic trends. Restored populations show genetic variation comparable to their source populations and, in general, to the wild ones. Genetic homogeneity is observed in regeneration trials, although some alleles not captured in source populations are detected. We infer that polyploidy, longevity, and clonal reproduction have provided P. vulgaris with the standing genetic variation necessary to make the species resilient to the effects of demographic decline, suggesting that the use of multiple sources for reintroduction may be beneficial to mimic natural gene flow and the availability of multiple allele copies typical of polyploid species.

    The Journal of heredity 2019;110;5;618-628

  • Clinical spectrum of POLR3-related leukodystrophy caused by biallelic POLR1C pathogenic variants.

    Gauquelin L, Cayami FK, Sztriha L, Yoon G, Tran LT, Guerrero K, Hocke F, van Spaendonk RML, Fung EL, D'Arrigo S, Vasco G, Thiffault I, Niyazov DM, Person R, Lewis KS, Wassmer E, Prescott T, Fallon P, McEntagart M, Rankin J, Webster R, Philippi H, van de Warrenburg B, Timmann D, Dixit A, Searle C, DDD Study, Thakur N, Kruer MC, Sharma S, Vanderver A, Tonduti D, van der Knaap MS, Bertini E, Goizet C, Fribourg S, Wolf NI and Bernard G

    Department of Neurology and Neurosurgery (L.G., L.T.T., K.G., G.B.), McGill University, Montreal, Canada; Department of Pediatrics (L.G., L.T.T., K.G., G.B.), McGill University, Montreal, Canada; Division of Clinical and Metabolic Genetics and Division of Neurology (L.G., G.Y.), The Hospital for Sick Children, University of Toronto, Toronto, Canada; Department of Child Neurology (F.K.C., M.S.V.D.K., N.I.W.), Emma Children's Hospital, Amsterdam University Medical Centers, Vrije Universiteit Amsterdam, and Amsterdam Neuroscience, Amsterdam, The Netherlands; Department of Clinical Genetics (F.K.C., R.M.V.S.), VU University Medical Center, Amsterdam, The Netherlands; Department of Human Genetics (F.K.C.), Center for Biomedical Research, Diponegoro University, Semarang, Indonesia; Department of Pediatrics (L.S.), Faculty of Medicine, University of Szeged, Szeged, Hungary; Child Health and Human Development Program (L.T.T., K.G., G.B.), Research Institute of the McGill University Health Center, Montreal, Canada; Division of Medical Genetics, Department of Specialized Medicine (L.T.T., K.G., G.B.), McGill University Health Center, Montreal, Canada; Centre de Référence Neurogénétique (F.H., C.G.), Service de Génétique, CHU Bordeaux, Bordeaux, France; Department of Pediatrics (E.L.F.), Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China; Developmental Neurology Department (S.D.A.), Fondazione IRCCS Istituto Neurologico C. Besta, Milan, Italy; Neuroscience and Neurorehabilitation Department (G.V.), Bambino Gesu Children's Hospital, Rome, Italy; Center for Pediatric Genomic Medicine (I.T.), Children's Mercy Hospitals and Clinics, Kansas City, MO; University of Missouri-Kansas City School of Medicine (I.T.), Kansas City, MO; Department of Pathology and Laboratory Medicine (I.T.), Children's Mercy Hospitals, Kansas City, MO; Department of Pediatrics (D.M.N.), Section of Medical Genetics, Ochsner for Children, New Orleans, LA; GeneDx (R.P.), Gaithersburg, MD; Division of Neurology (K.S.L.), Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ; Department of Pediatric Neurology (E.W.), Birmingham Children's Hospital, Birmingham, United Kingdom; Department of Medical Genetics (T.P.), Telemark Hospital, Skien, Norway; Department of Paediatric Neurology (P.F.), St Georges University Hospital NHS Foundation Trust, London, United Kingdom; Clinical Genetics Service (M.M.), St George's University Hospitals NHS Foundation Trust, London, United Kingdom; Clinical Genetics Department (J.R.), Royal Devon and Exeter Hospital NHS Trust, Exeter, United Kingdom; Department of Neurology and Neurosurgery (R.W.), The Children's Hospital at Westmead, Westmead, New South Wales, Australia; Center of Developmental Neurology (H.P.), Frankfurt, Germany; Department of Neurology (B.V.D.W.), Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands; Department of Neurology (D.T.), Essen University Hospital, University of Duisburg-Essen, Essen, Germany; Department of Clinical Genetics (A.D., C.S.), Nottingham University Hospitals NHS Trust, Nottingham, United Kingdom; Wellcome Sanger Institute (DDD Study), Wellcome Genome Campus, Cambridge, United Kingdom; Department of Pediatrics (N.T.), Division of Child Neurology, University of Texas Health Science Center, Houston, TX, United States of America; Movement Disorders Center and Neurogenetics Research Program (M.C.K.), Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ; Program in Neuroscience (M.C.K.), Arizona State University, Tempe, AZ, United States of America; Division of Neurology (S.S.), Department of Pediatrics, Lady Hardinge Medical College and Associated Kalawati Saran Children's Hospital, New Delhi, India; Division of Neurology (A.V.), Children's Hospital of Philadelphia, Philadelphia, PA; Department of Neurology (A.V.), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States of America; Department of Child Neurology (D.T.), Neurological Institute C. Besta Foundation IRCCS, Milan, Italy; Department of Functional Genomics (M.S.V.D.K.), VU University, Amsterdam, The Netherlands; Unit of Neuromuscular and Neurodegenerative Disorders (E.B.), Laboratory of Molecular Medicine, Bambino Gesu Children's Hospital, Rome, Italy; Laboratoire MRGM, INSERM U1211, University Bordeaux, Bordeaux, France; Université de Bordeaux (S.F.), INSERM U1212, CNRS 5320, Bordeaux, France; and Department of Human Genetics (G.B.), McGill University, Montreal, Canada.

    Objective: To determine the clinical, radiologic, and molecular characteristics of RNA polymerase III-related leukodystrophy (POLR3-HLD) caused by biallelic POLR1C pathogenic variants.

    Methods: A cross-sectional observational study involving 25 centers worldwide was conducted. Clinical and molecular information was collected on 23 unreported and previously reported patients with POLR3-HLD and biallelic pathogenic variants in POLR1C. Brain MRI studies were reviewed.

    Results: Fourteen female and 9 male patients aged 7 days to 23 years were included in the study. Most participants presented early in life (birth to 6 years), and motor deterioration was seen during childhood. A notable proportion of patients required a wheelchair before adolescence, suggesting a more severe phenotype than previously described in POLR3-HLD. Dental, ocular, and endocrine features were not invariably present (70%, 50%, and 50%, respectively). Five patients (22%) had a combination of hypomyelinating leukodystrophy and abnormal craniofacial development, including 1 individual with clear Treacher Collins syndrome (TCS) features. Brain MRI revealed hypomyelination in all cases, often with areas of pronounced T2 hyperintensity corresponding to T1 hypointensity of the white matter. Twenty-nine different pathogenic variants (including 12 new disease-causing variants) in POLR1C were identified.

    Conclusions: This study provides a comprehensive description of POLR3-HLD caused by biallelic POLR1C pathogenic variants based on the largest cohort of patients to date. These results suggest distinct characteristics of POLR1C-related disorder, with a spectrum of clinical involvement characterized by hypomyelinating leukodystrophy with or without abnormal craniofacial development reminiscent of TCS.

    Neurology. Genetics 2019;5;6;e369

  • Extending the clinical and genetic spectrum of ARID2 related intellectual disability. A case series of 7 patients.

    Gazdagh G, Blyth M, Scurr I, Turnpenny PD, Mehta SG, Armstrong R, McEntagart M, Newbury-Ecob R, Tobias ES, DDD Study and Joss S

    West of Scotland Regional Genetics Service, Laboratory Medicine Building, Queen Elizabeth University Hospital, Glasgow, United Kingdom.

    In the last 3 years de novo sequence variants in the ARID2 (AT-rich interaction domain 2) gene, a subunit of the SWI/SNF complex, have been linked to intellectual disabilities in 3 case reports including one which describes frameshift mutations in ARID2 in 2 patients with features resembling Coffin-Siris syndrome. Coffin-Siris syndrome (CSS) is a rare congenital syndrome characterized by intellectual deficit, coarse facial features and hypoplastic or absent fifth fingernails and/or toenails among other features. Mutations in a number of different genes encoding SWI/SNF chromatin remodelling complex proteins have been described but the underlying molecular cause remains unknown in approximately 40% of patients with CSS. Here we describe 7 unrelated individuals, 2 with deletions of the ARID2 region and 5 with de novo truncating mutations in the ARID2 gene. Similarities to CSS are evident. Although hypertrichosis and hypoplasia of the fifth finger nail and distal phalanx do not appear to be common in these patients, toenail hypoplasia and the presence of Wormian bones might support the involvement of ARID2.

    Funded by: Medical Research Council: MR/N005813/1

    European journal of medical genetics 2019;62;1;27-34

  • Cardiac disorders and structural brain abnormalities are commonly associated with hypospadias in children with neurodevelopmental disorders.

    Gazdagh GE, Wang C, McGowan R, Tobias ES, Ahmed SF and DDD Study

    Developmental Endocrinology Research Group.

    The objective of our study was to use an established cohort of boys to investigate common patterns of malformations in those with hypospadias. We performed a retrospective review of the phenotype of participants in the Deciphering Developmental Disorders Study with neurodevelopmental delay and an 'Abnormality of the genital system'. This group was divided into two subgroups: those with hypospadias and without hypospadias. Associated phenotypes of the two subgroups were compared and analysed. Of the 166 Deciphering Developmental Disorders participants with hypospadias and neurodevelopmental delay, 47 (28%) had cardiovascular and 40 (24%) had structural brain abnormalities. The rate of cardiovascular abnormalities in those with neurodevelopmental delay and genital abnormalities other than hypospadias (N = 645) was lower at 19% (P = 0.001). In addition, structural brain malformations were higher at 24% in the hypospadias group versus 15% in the group without hypospadias (P = 0.002). The constellation of these features occurred at a higher rate in the hypospadias group versus the no hypospadias group (P = 0.038). In summary, this is the first study to indicate that cardiovascular and brain abnormalities are frequently encountered in association with hypospadias in children with neurodevelopmental disorders. Not only do these associations provide insight into the underlying aetiology, but they also highlight the multisystem involvement in conditions with hypospadias.

    Funded by: Medical Research Council: MR/N005813/1

    Clinical dysmorphology 2019;28;3;114-119

  • Targeted Next Generation Sequencing for malaria research in Africa: current status and outlook.

    Ghansah A, Kamau E, Amambua-Ngwa A, Ishengoma DS, Maiga-Ascofare O, Amenga-Etego L, Deme A, Yavo W, Randrianarivelojosia M, Plasmodium Diversity Network Africa, Ochola-Oyier LI, Helegbe GK, Bailey J, Alifrangis M and Djimde A

    Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, P. O. Box LG 581, Legon, Ghana.

    Targeted Next Generation Sequencing (TNGS) is an efficient and economical Next Generation Sequencing (NGS) platform and the preferred choice when specific genomic regions are of interest. So far, only institutions located in middle and high-income countries have developed and implemented the technology; however, the efficiency and cost savings, as opposed to more traditional sequencing methodologies (e.g. Sanger sequencing), make the approach potentially well suited for resource-constrained regions as well. In April 2018, scientists from the Plasmodium Diversity Network Africa (PDNA) and collaborators met during the 7th Pan African Multilateral Initiative of Malaria (MIM) conference held in Dakar, Senegal to explore the feasibility of applying TNGS to genetic studies and malaria surveillance in Africa. The group of scientists reviewed the current experience with TNGS platforms in sub-Saharan Africa (SSA) and identified potential roles the technology might play to accelerate malaria research, scientific discoveries and improved public health in SSA. Research funding, infrastructure and human resources were highlighted as challenges that will have to be mitigated to enable African scientists to drive the implementation of TNGS in SSA. Current roles of important stakeholders and strategies to strengthen existing networks to effectively harness this powerful technology for malaria research of public health importance were discussed.

    Funded by: Wellcome Trust: 107740/Z/15/Z

    Malaria journal 2019;18;1;324

  • Investigation of the role of typhoid toxin in acute typhoid fever in a human challenge model.

    Gibani MM, Jones E, Barton A, Jin C, Meek J, Camara S, Galal U, Heinz E, Rosenberg-Hasson Y, Obermoser G, Jones C, Campbell D, Black C, Thomaides-Brears H, Darlow C, Dold C, Silva-Reyes L, Blackwell L, Lara-Tejero M, Jiao X, Stack G, Blohmke CJ, Hill J, Angus B, Dougan G, Galán J and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, University of Oxford and the NIHR Oxford Biomedical Research Centre, Oxford, UK.

    Salmonella Typhi is a human host-restricted pathogen that is responsible for typhoid fever in approximately 10.9 million people annually<sup>1</sup>. The typhoid toxin is postulated to have a central role in disease pathogenesis, the establishment of chronic infection and human host restriction<sup>2-6</sup>. However, its precise role in typhoid disease in humans is not fully defined. We studied the role of typhoid toxin in acute infection using a randomized, double-blind S. Typhi human challenge model<sup>7</sup>. Forty healthy volunteers were randomized (1:1) to oral challenge with 10<sup>4</sup> colony-forming units of wild-type or an isogenic typhoid toxin deletion mutant (TN) of S. Typhi. We observed no significant difference in the rate of typhoid infection (fever ≥38 °C for ≥12 h and/or S. Typhi bacteremia) between participants challenged with wild-type or TN S. Typhi (15 out of 21 (71%) versus 15 out of 19 (79%); P = 0.58). The duration of bacteremia was significantly longer in participants challenged with the TN strain compared with wild-type (47.6 hours (28.9-97.0) versus 30.3 (3.6-49.4); P ≤ 0.001). The clinical syndrome was otherwise indistinguishable between wild-type and TN groups. These data suggest that the typhoid toxin is not required for infection and the development of early typhoid fever symptoms within the context of a human challenge model. Further clinical data are required to assess the role of typhoid toxin in severe disease or the establishment of bacterial carriage.

    Nature medicine 2019;25;7;1082-1088
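
    The attack-rate comparison in the abstract above (15 of 21 versus 15 of 19; P = 0.58) can be checked by hand. The abstract does not say which test was used; the sketch below assumes an uncorrected Pearson chi-square test on the 2x2 table, which gives a value close to the one reported. The function name is illustrative only.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square statistic and P value (1 d.f.)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 d.f.:
    # P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts taken from the abstract: (infected, not infected) per arm.
wild_type = (15, 21 - 15)
toxin_negative = (15, 19 - 15)

chi2, p_value = chi_square_2x2(*wild_type, *toxin_negative)
print(f"chi2 = {chi2:.3f}, P = {p_value:.2f}")
```

    The choice of test is an assumption: with groups this small, an exact test would give a somewhat different P value, though the conclusion of no significant difference would be the same.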

  • Very low-depth whole-genome sequencing in complex trait association studies.

    Gilly A, Southam L, Suveges D, Kuchenbaecker K, Moore R, Melloni GEM, Hatzikotoulas K, Farmaki AE, Ritchie G, Schwartzentruber J, Danecek P, Kilian B, Pollard MO, Ge X, Tsafantakis E, Dedoussis G and Zeggini E

    Department of Human Genetics, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Motivation: Very low-depth sequencing has been proposed as a cost-effective approach to capture low-frequency and rare variation in complex trait association studies. However, a full characterization of the genotype quality and association power for very low-depth sequencing designs is still lacking.

    Results: We perform cohort-wide whole-genome sequencing (WGS) at low depth in 1239 individuals (990 at 1× depth and 249 at 4× depth) from an isolated population, and establish a robust pipeline for calling and imputing very low-depth WGS genotypes from standard bioinformatics tools. Using genotyping chip, whole-exome sequencing (75× depth) and high-depth (22×) WGS data in the same samples, we examine in detail the sensitivity of this approach, and show that imputed 1× WGS recapitulates 95.2% of variants found by imputed GWAS with an average minor allele concordance of 97% for common and low-frequency variants. In our study, 1× further allowed the discovery of 140 844 true low-frequency variants with 73% genotype concordance when compared to high-depth WGS data. Finally, using association results for 57 quantitative traits, we show that very low-depth WGS is an efficient alternative to imputed GWAS chip designs, allowing the discovery of up to twice as many true association signals as the classical imputed GWAS design.

    Availability and implementation: The HELIC genotype and WGS datasets have been deposited to the European Genome-phenome Archive (EGAD00010000518; EGAD00010000522; EGAD00010000610; EGAD00001001636; EGAD00001001637). The peakplotter software is available at, the transformPhenotype app can be downloaded at

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2019;35;15;2555-2561

  • A real-time PCR for specific detection of the Legionella pneumophila serogroup 1 ST1 complex.

    Ginevra C, Chastang J, David S, Mentasti M, Yakunin E, Chalker VJ, Chalifa-Caspi V, Valinsky L, Jarraud S, Moran-Gilad J and ESCMID Study Group for Legionella Infections (ESGLI)

    CIRI, Centre International de Recherche en Infectiologie, Legionella Pathogenesis Team, Univ Lyon, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007, Lyon, France; National Reference Centre of Legionella, Institute of Infectious Agents, Hospices Civils de Lyon, Lyon, France; ESCMID Study Group for Legionella Infections (ESGLI), Basel, Switzerland.

    Objective: Legionella pneumophila serogroup 1 (Lp1) sequence type (ST) 1 is globally widespread in the environment and accounts for a significant proportion of Legionella infections, including nosocomial Legionnaires' disease (LD). This study aimed to design a sensitive and specific detection method for Lp ST1 that will underpin epidemiological investigations and risk assessment.

    Methods: A total of 628 Lp genomes (126 ST1s) were analyzed by comparative genomics. Interrogation of more than 900 accessory genes revealed seven candidate targets for specific ST1 detection and specific primers and hydrolysis probes were designed and evaluated. The analytical sensitivity and specificity of the seven primer and probe sets were evaluated on serially diluted DNA extracted from the reference strain CIP107629 and via qPCR applied on 200 characterized isolates. The diagnostic performance of the assay was evaluated on 142 culture-proven clinical samples from LD cases and a real-life investigation of a case cluster.

    Results: Of seven qPCR assays that underwent analytical validation, one PCR target (lpp1868) showed higher sensitivity and specificity for ST1 and ST1-like strains. The diagnostic performance of the assay using respiratory samples corresponded to a sensitivity of 95% (19/20) (95% CI (75.1-99.9)) and specificity of 100% (122/122) (95% CI (97-100)). The ST1 PCR assay could link two out of three culture-negative hospitalized LD cases to ST1 during a real-time investigation.

    Conclusion: Using whole genome sequencing (WGS) data, we developed and validated a sensitive and specific qPCR assay for the detection of Lp1 belonging to the ST1 clonal complex by amplification of the lpp1868 gene. The ST1 qPCR is expected to deliver an added value for Lp control and prevention, in conjunction with other recently developed molecular assays.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2019
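
    The diagnostic performance figures in the abstract above (sensitivity 19/20, specificity 122/122, each with a 95% CI) can be worked through as follows. The abstract does not name the interval method; this sketch assumes the exact (Clopper-Pearson) binomial interval, which gives bounds close to those reported. All function names are illustrative.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_increasing(g, target, iters=100):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes out of n."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

# Counts taken from the abstract.
sens_ci = clopper_pearson(19, 20)    # true positives / ST1 culture-proven samples
spec_ci = clopper_pearson(122, 122)  # true negatives / non-ST1 samples

print(f"sensitivity {19 / 20:.1%} (95% CI {sens_ci[0]:.1%}-{sens_ci[1]:.1%})")
print(f"specificity {122 / 122:.1%} (95% CI {spec_ci[0]:.1%}-{spec_ci[1]:.1%})")
```

    Bisection on the binomial tail probabilities keeps the example free of third-party dependencies; a statistics library would normally obtain the same bounds from beta-distribution quantiles.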

  • International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact.

    Gladstone RA, Lo SW, Lees JA, Croucher NJ, van Tonder AJ, Corander J, Page AJ, Marttinen P, Bentley LJ, Ochoa TJ, Ho PL, du Plessis M, Cornick JE, Kwambana-Adams B, Benisty R, Nzenze SA, Madhi SA, Hawkins PA, Everett DB, Antonio M, Dagan R, Klugman KP, von Gottberg A, McGee L, Breiman RF, Bentley SD and Global Pneumococcal Sequencing Consortium

    Parasites and microbes, Wellcome Sanger Institute, Hinxton, UK.

    Background: Pneumococcal conjugate vaccines have reduced the incidence of invasive pneumococcal disease, caused by vaccine serotypes, but non-vaccine-serotypes remain a concern. We used whole genome sequencing to study pneumococcal serotype, antibiotic resistance and invasiveness, in the context of genetic background.

    Methods: Our dataset of 13,454 genomes, combined with four published genomic datasets, represented Africa (40%), Asia (25%), Europe (19%), North America (12%), and South America (5%). These 20,027 pneumococcal genomes were clustered into lineages using PopPUNK, and named Global Pneumococcal Sequence Clusters (GPSCs). From our dataset, we additionally derived serotype and sequence type, and predicted antibiotic sensitivity. We then measured invasiveness using odds ratios relating prevalence in invasive pneumococcal disease to carriage.

    Findings: The combined collections (n = 20,027) were clustered into 621 GPSCs. Thirty-five GPSCs observed in our dataset were represented by >100 isolates, and subsequently classed as dominant-GPSCs. In 22/35 (63%) of dominant-GPSCs both non-vaccine serotypes and vaccine serotypes were observed in the years up until, and including, the first year of pneumococcal conjugate vaccine introduction. Penicillin and multidrug resistance were higher (p < .05) in a subset of dominant-GPSCs (14/35 and 9/35, respectively), and resistance to an increasing number of antibiotic classes was associated with increased recombination (R<sup>2</sup> = 0.27, p < .0001). In 28/35 dominant-GPSCs, the country of isolation was a significant predictor (p < .05) of its antibiogram (mean misclassification error 0.28, SD ± 0.13). We detected increased invasiveness of six genetic backgrounds, when compared to other genetic backgrounds expressing the same serotype. Up to 1.6-fold changes in invasiveness odds ratio were observed.

    Interpretation: We define GPSCs that can be assigned to any pneumococcal genomic dataset, to aid international comparisons. Existing non-vaccine-serotypes in most GPSCs preclude the removal of these lineages by pneumococcal conjugate vaccines, leaving potential for serotype replacement. A subset of GPSCs have increased resistance and/or serotype-independent invasiveness.

    Funded by: Medical Research Council: MR/R015600/1; Wellcome Trust

    EBioMedicine 2019;43;338-346
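
    The Methods of the abstract above describe measuring invasiveness with odds ratios relating prevalence in invasive disease to carriage. A minimal sketch of that arithmetic follows. The counts are hypothetical, and the continuity correction and Wald confidence interval are assumptions that the abstract does not describe.

```python
import math

def invasiveness_odds_ratio(disease_in, carriage_in, disease_out, carriage_out):
    """Odds ratio of invasive disease versus carriage for one lineage
    compared with all other lineages, with a 95% Wald confidence interval.

    Adds 0.5 to every cell (Haldane-Anscombe correction) only when a cell
    is zero; this correction is an assumption of the sketch.
    """
    cells = [disease_in, carriage_in, disease_out, carriage_out]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return odds_ratio, ci

# Hypothetical counts, not taken from the study:
# lineage of interest: 60 disease isolates, 40 carriage isolates;
# all other lineages: 300 disease isolates, 600 carriage isolates.
odds_ratio, ci = invasiveness_odds_ratio(60, 40, 300, 600)
print(f"OR = {odds_ratio:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```

    An odds ratio above 1 indicates that the lineage is over-represented among disease isolates relative to carriage. Comparing lineages that express the same serotype, as the Findings describe, would mean restricting both rows of the table to isolates of that serotype.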

  • Immunology Driven by Large-Scale Single-Cell Sequencing.

    Gomes T, Teichmann SA and Talavera-López C

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    The immune system encompasses a large degree of phenotypic diversity and plasticity in its cell types, and more is to be uncovered. We argue that large, multiomic datasets of single-cell resolution, in conjunction with improved computational methods, will be essential to resolving immune cell identity. Existing datasets, combined with 'big data' methodologies, can serve as a platform to support future studies in immunology. Technical and analytical advances in multiomics and spatial integration can provide a reference for gene regulation and cellular interactions in spatially structured tissue contexts. We posit that these developments may allow guided functional studies of immune cell populations and lay the groundwork for informed cell engineering and precision medicine.

    Trends in immunology 2019;40;11;1011-1021

  • More than 18,000 effectors in the Legionella genus genome provide multiple, independent combinations for replication in human cells.

    Gomez-Valero L, Rusniok C, Carson D, Mondino S, Pérez-Cobas AE, Rolando M, Pasricha S, Reuter S, Demirtas J, Crumbach J, Descorps-Declere S, Hartland EL, Jarraud S, Dougan G, Schroeder GN, Frankel G and Buchrieser C

    Institut Pasteur, Biologie des Bactéries Intracellulaires, 75724 Paris, France;

    The genus <i>Legionella</i> comprises 65 species, among which <i>Legionella pneumophila</i> is a human pathogen causing severe pneumonia. To understand the evolution of an environmental to an accidental human pathogen, we have functionally analyzed 80 <i>Legionella</i> genomes spanning 58 species. Uniquely, an immense repository of 18,000 secreted proteins encoding 137 different eukaryotic-like domains and over 200 eukaryotic-like proteins is paired with a highly conserved type IV secretion system (T4SS). Specifically, we show that eukaryotic Rho- and Rab-GTPase domains are found nearly exclusively in eukaryotes and <i>Legionella</i>. Translocation assays for selected Rab-GTPase proteins revealed that they are indeed T4SS secreted substrates. Furthermore, F-box, U-box, and SET domains were present in >70% of all species, suggesting that manipulation of host signal transduction, protein turnover, and chromatin modification pathways are fundamental intracellular replication strategies for legionellae. In contrast, the Sec-7 domain was restricted to <i>L. pneumophila</i> and seven other species, indicating effector repertoire tailoring within different amoebae. Functional screening of 47 species revealed 60% were competent for intracellular replication in THP-1 cells, but interestingly, this phenotype was associated with diverse effector assemblages. These data, combined with evolutionary analysis, indicate that the capacity to infect eukaryotic cells has been acquired independently many times within the genus and that a highly conserved yet versatile T4SS secretes an exceptional number of different proteins shaped by interdomain gene transfer. Furthermore, we revealed the surprising extent to which legionellae have coopted genes and thus cellular functions from their eukaryotic hosts, providing an understanding of how dynamic reshuffling and gene acquisition have led to the emergence of major human pathogens.

    Proceedings of the National Academy of Sciences of the United States of America 2019

  • Structural rearrangements generate cell-specific, gene-independent CRISPR-Cas9 loss of fitness effects.

    Gonçalves E, Behan FM, Louzada S, Arnol D, Stronach EA, Yang F, Yusa K, Stegle O, Iorio F and Garnett MJ

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Background: CRISPR-Cas9 genome editing is widely used to study gene function, from basic biology to biomedical research. Structural rearrangements are a ubiquitous feature of cancer cells and their impact on the functional consequences of CRISPR-Cas9 gene-editing has not yet been assessed.

    Results: Utilizing CRISPR-Cas9 knockout screens for 250 cancer cell lines, we demonstrate that targeting structurally rearranged regions, in particular tandem or interspersed amplifications, is highly detrimental to cellular fitness in a gene-independent manner. In contrast, amplifications caused by whole chromosomal duplication have little to no impact on fitness. This effect is cell line specific and dependent on the ploidy status. We devise a copy-number ratio metric that substantially improves the detection of gene-independent cell fitness effects in CRISPR-Cas9 screens. Furthermore, we develop a computational tool, called Crispy, to account for these effects on a single sample basis and provide corrected gene fitness effects.

    Conclusion: Our analysis demonstrates the importance of structural rearrangements in mediating the effect of CRISPR-Cas9-induced DNA damage, with implications for the use of CRISPR-Cas9 gene-editing in cancer cells.

    Funded by: Cancer Research UK: C44943/A22536; Wellcome Trust: 098051, 206194

    Genome biology 2019;20;1;27

  • Factors Associated With Outcomes of Patients With Primary Sclerosing Cholangitis and Development and Validation of a Risk Scoring System.

    Goode EC, Clark AB, Mells GF, Srivastava B, Spiess K, Gelson WTH, Trivedi PJ, Lynch KD, Castren E, Vesterhus MN, Karlsen TH, Ji SG, Anderson CA, Thorburn D, Hudson M, Heneghan MA, Aldersley MA, Bathgate A, Sandford RN, Alexander GJ, Chapman RW, Walmsley M, UK-PSC Consortium, Hirschfield GM and Rushbrook SM

    Norfolk and Norwich University Hospital, Norwich, United Kingdom.

    We sought to identify factors that are predictive of liver transplantation or death in patients with primary sclerosing cholangitis (PSC), and to develop and validate a contemporaneous risk score for use in a real-world clinical setting. Analyzing data from 1,001 patients recruited to the UK-PSC research cohort, we evaluated clinical variables for their association with 2-year and 10-year outcome through Cox-proportional hazards and C-statistic analyses. We generated risk scores for short-term and long-term outcome prediction, validating their use in two independent cohorts totaling 451 patients. Thirty-six percent of the derivation cohort were transplanted or died over a cumulative follow-up of 7,904 years. Serum alkaline phosphatase of at least 2.4 × upper limit of normal at 1 year after diagnosis was predictive of 10-year outcome (hazard ratio [HR] = 3.05; C = 0.63; median transplant-free survival 63 versus 108 months; P < 0.0001), as was the presence of extrahepatic biliary disease (HR = 1.45; P = 0.01). We developed two risk scoring systems based on age, values of bilirubin, alkaline phosphatase, albumin, platelets, presence of extrahepatic biliary disease, and variceal hemorrhage, which predicted 2-year and 10-year outcomes with good discrimination (C statistic = 0.81 and 0.80, respectively). Both UK-PSC risk scores were well-validated in our external cohort and outperformed the Mayo Clinic and aspartate aminotransferase-to-platelet ratio index (APRI) scores (C statistic = 0.75 and 0.63, respectively). Although heterozygosity for the previously validated human leukocyte antigen (HLA)-DR*03:01 risk allele predicted increased risk of adverse outcome (HR = 1.33; P = 0.001), its addition did not improve the predictive accuracy of the UK-PSC risk scores. 
    Conclusion: Our analyses, based on a detailed clinical evaluation of a large representative cohort of participants with PSC, further our understanding of clinical risk markers and report the development and validation of a real-world scoring system to identify those patients most likely to die or require liver transplantation.

    Funded by: Addenbrooke's Charitable Trust, Cambridge University Hospitals; Department of Health; Health Research; Isaac Newton Trust; National Institute of Health Research; Norwegian PSC Research Center

    Hepatology (Baltimore, Md.) 2019;69;5;2120-2135

  • Bi-allelic Loss-of-Function CACNA1B Mutations in Progressive Epilepsy-Dyskinesia.

    Gorman KM, Meyer E, Grozeva D, Spinelli E, McTague A, Sanchis-Juan A, Carss KJ, Bryant E, Reich A, Schneider AL, Pressler RM, Simpson MA, Debelle GD, Wassmer E, Morton J, Sieciechowicz D, Jan-Kamsteeg E, Paciorkowski AR, King MD, Cross JH, Poduri A, Mefford HC, Scheffer IE, Haack TB, McCullagh G, Deciphering Developmental Disorders Study, UK10K Consortium, NIHR BioResource, Millichap JJ, Carvill GL, Clayton-Smith J, Maher ER, Raymond FL and Kurian MA

    Molecular Neurosciences, Developmental Neurosciences, UCL Great Ormond Street Institute of Child Health, London WC1N 1EH, UK; Department of Neurology, Great Ormond Street Hospital, London WC1N 3JH, UK.

    The occurrence of non-epileptic hyperkinetic movements in the context of developmental epileptic encephalopathies is an increasingly recognized phenomenon. Identification of causative mutations provides an important insight into common pathogenic mechanisms that cause both seizures and abnormal motor control. We report bi-allelic loss-of-function CACNA1B variants in six children from three unrelated families whose affected members present with a complex and progressive neurological syndrome. All affected individuals presented with epileptic encephalopathy, severe neurodevelopmental delay (often with regression), and a hyperkinetic movement disorder. Additional neurological features included postnatal microcephaly and hypotonia. Five children died in childhood or adolescence (mean age of death: 9 years), mainly as a result of secondary respiratory complications. CACNA1B encodes the pore-forming subunit of the pre-synaptic neuronal voltage-gated calcium channel Ca<sub>v</sub>2.2/N-type, crucial for SNARE-mediated neurotransmission, particularly in the early postnatal period. Bi-allelic loss-of-function variants in CACNA1B are predicted to cause disruption of Ca<sup>2+</sup> influx, leading to impaired synaptic neurotransmission. The resultant effect on neuronal function is likely to be important in the development of involuntary movements and epilepsy. Overall, our findings provide further evidence for the key role of Ca<sub>v</sub>2.2 in normal human neurodevelopment.

    Funded by: British Heart Foundation: RG/10/17/28553; Department of Health: RP-2016-07-019; Medical Research Council: MR/L010305/1; NINDS NIH HHS: R00 NS089858, R01 NS069605; Wellcome Trust

    American journal of human genetics 2019;104;5;948-956

  • Detection of vancomycin-resistant Enterococcus faecium hospital-adapted lineages in municipal wastewater treatment plants indicates widespread distribution and release into the environment.

    Gouliouris T, Raven KE, Moradigaravand D, Ludden C, Coll F, Blane B, Naydenova P, Horner C, Brown NM, Corander J, Limmathurotsakul D, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, United Kingdom.

    Vancomycin-resistant <i>Enterococcus faecium</i> (VREfm) is a leading cause of healthcare-associated infection. Reservoirs of VREfm are largely assumed to be nosocomial although there is a paucity of data on alternative sources. Here, we describe an integrated epidemiological and genomic analysis of <i>E. faecium</i> associated with bloodstream infection and isolated from wastewater. Treated and untreated wastewater from 20 municipal treatment plants in the East of England, United Kingdom was obtained and cultured to isolate <i>E. faecium</i>, ampicillin-resistant <i>E. faecium</i> (AREfm), and VREfm. VREfm was isolated from all 20 treatment plants and was released into the environment by 17/20 plants, the exceptions using terminal ultraviolet light disinfection. Median log<sub>10</sub> counts of AREfm and VREfm in untreated wastewater from 10 plants in direct receipt of hospital sewage were significantly higher than in 10 plants that were not. We sequenced and compared the genomes of 423 isolates from wastewater with 187 isolates associated with bloodstream infection at five hospitals in the East of England. Among 481 <i>E. faecium</i> isolates belonging to the hospital-adapted clade, we observed genetic intermixing between wastewater and bloodstream infection, with highly related isolates shared between a major teaching hospital in the East of England and 9/20 plants. We detected 28 antibiotic resistance genes in the hospital-adapted clade, of which 23 were represented in bloodstream, hospital sewage, and municipal wastewater isolates. We conclude that our findings are consistent with widespread distribution of hospital-adapted VREfm beyond acute healthcare settings with extensive release of VREfm into the environment in the East of England.

    Funded by: Department of Health: HICF-T5-342; Wellcome Trust: 103387/Z/13/Z, 110243/Z/15/Z, 201344/Z/16/Z, WT098600

    Genome research 2019;29;4;626-634

  • Infection with carcinogenic helminth parasites and its production of metabolites induces the formation of DNA-adducts.

    Gouveia MJ, Brindley PJ, Rinaldi G, Gärtner F, da Costa JMC and Vale N

    Center for the Study of Animal Science, CECA-ICETA, University of Porto, Praça Gomes Teixeira, Apartado 55142, 4051-401 Porto, Portugal.

    Background: Infections classified as group 1 biological carcinogens include the helminthiases caused by <i>Schistosoma haematobium</i> and <i>Opisthorchis viverrini</i>. The molecular mediators underlying the infection with these parasites and cancer remain unclear. Although carcinogenesis is a multistep process, we have postulated that these parasites release metabolites including oxysterols and estrogen-like metabolites that interact with host cell DNA. How and why the parasites produce/excrete these metabolites remains unclear. A gene encoding a CYP enzyme was identified in schistosomes and opisthorchiids. Therefore, it is reasonable to hypothesize that CYP 450 might play a role in the generation of pro-inflammatory and potentially carcinogenic compounds produced by helminth parasites, such as oxysterols and catechol estrogens. Here, we performed enzymatic assays using several isoforms of CYP 450, such as CYP1A1, 2E1 and 3A4, which are involved in the metabolism of chemical carcinogens that have been associated with several cancers. The main aim was the analysis of the role of these enzymes in the production of helminth-associated metabolites and DNA-adducts.

    Method: The effect of cytochrome P450 enzymes CYP 1A1, 2E1 and 3A4 during the interaction between DNA, glycocholic acid and taurochenodeoxycholate sodium on the formation of DNA-adducts and metabolites associated with urogenital schistosomiasis (UGS) and opisthorchiasis was investigated <i>in vitro</i>. Liquid chromatography/mass spectrometry was used to detect and identify metabolites.

    Main findings: Through the enzymatic assays we provide a deeper understanding of how metabolites derived from helminths are formed and the influence of CYP 450. The assays, using compounds similar to those previously observed in helminths such as glycocholic acid and taurochenodeoxycholate sodium, allowed the detection of metabolites in their oxidized form and their adducts with DNA. Remarkably, these metabolites were previously associated with schistosomiasis and opisthorchiasis. Thus, in the future, it may be possible to synthesize this type of metabolite through this methodology and use them in cell lines to clarify the carcinogenesis process associated with these diseases.

    Principal conclusions: Metabolites similar to those detected in helminths are able to interact with DNA in vitro, leading to the formation of DNA adducts. This evidence supports the previous postulate that implicates helminth-like metabolites as initiators of helminthiasis-associated carcinogenesis. Nonetheless, studies using these kinds of metabolites in cell lines to evaluate their carcinogenic potential are required.

    Infectious agents and cancer 2019;14;41

  • Wave 2 strains of atypical Vibrio cholerae El Tor caused the 2009-2011 cholera outbreak in Papua New Guinea.

    Greenhill AR, Mutreja A, Bulach D, Belousoff MJ, Jonduo MH, Collins DA, Kas MP, Wapling J, Seemann T, Lafana A, Dougan G, Brown MV and Horwood PF

    School of Health and Life Sciences, Federation University Australia, Churchill, Australia.

    Vibrio cholerae is the causative agent of cholera, a globally important human disease for at least 200 years. In 2009-2011, the first recorded cholera outbreak in Papua New Guinea (PNG) occurred. We conducted genetic and phenotypic characterization of 21 isolates of V. cholerae, with whole-genome sequencing conducted on 2 representative isolates. The PNG outbreak was caused by an atypical El Tor strain harbouring a tandem repeat of the CTX prophage on chromosome II. Whole-genome sequence data, prophage structural analysis and the absence of the SXT integrative conjugative element indicated that the PNG isolates were most closely related to strains previously isolated in South-East and East Asia with affiliations to global wave 2 strains. This finding suggests that the cholera outbreak in PNG was caused by an exotic (non-endemic) strain of V. cholerae that originated in South-East Asia.

    Microbial genomics 2019;5;3

  • RTNsurvival: an R/Bioconductor package for regulatory network survival analysis.

    Groeneveld CS, Chagas VS, Jones SJM, Robertson AG, Ponder BAJ, Meyer KB and Castro MAA

    Bioinformatics and Systems Biology Lab, Federal University of Paraná, Curitiba, Brazil.

    Motivation: Transcriptional networks are models that allow the biological state of cells or tumours to be described. Such networks consist of connected regulatory units known as regulons, each comprised of a regulator and its targets. Inferring a transcriptional network can be a helpful initial step in characterizing the different phenotypes within a cohort. While the network itself provides no information on molecular differences between samples, the per-sample state of each regulon, i.e. the regulon activity, can be used for describing subtypes in a cohort. Integrating regulon activities with clinical data and outcomes would extend this characterization of differences between subtypes.

    Results: We describe RTNsurvival, an R/Bioconductor package that calculates regulon activity profiles using transcriptional networks reconstructed by the RTN package, gene expression data, and a two-tailed Gene Set Enrichment Analysis. Given regulon activity profiles across a cohort, RTNsurvival can perform Kaplan-Meier analyses and Cox Proportional Hazards regressions, while also considering confounding variables. The Supplementary Information provides two case studies that use data from breast and liver cancer cohorts and features uni- and multivariate regulon survival analysis.

    Availability and implementation: RTNsurvival is written in the R language, and is available from the Bioconductor project at

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2019;35;21;4488-4489
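
    The Kaplan-Meier step named in the abstract above can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the RTNsurvival API; the function names, the zero threshold used to split samples by regulon activity, and the toy cohort are all assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time per sample
    events -- 1 if the event was observed, 0 if the sample was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every sample leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def km_by_regulon_activity(activity, times, events):
    """Split samples into 'high' (activity > 0) and 'low' groups
    (hypothetical threshold) and estimate one curve per group."""
    groups = {"high": ([], []), "low": ([], [])}
    for a, t, e in zip(activity, times, events):
        ts, es = groups["high" if a > 0 else "low"]
        ts.append(t)
        es.append(e)
    return {name: kaplan_meier(ts, es) for name, (ts, es) in groups.items()}


# Toy cohort (invented values): regulon activity, follow-up time, event flag.
activity = [0.9, 0.4, -0.2, -0.7, 0.1, -0.5]
times = [5, 8, 3, 2, 9, 4]
events = [1, 0, 1, 1, 0, 1]
for name, curve in km_by_regulon_activity(activity, times, events).items():
    print(name, curve)
```

    Each curve is a list of (time, survival probability) pairs, one per distinct event time. Adjusting for confounding variables, as the abstract describes for the Cox regressions, is outside the scope of this sketch.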

  • Identification of common genetic risk variants for autism spectrum disorder.

    Grove J, Ripke S, Als TD, Mattheisen M, Walters RK, Won H, Pallesen J, Agerbo E, Andreassen OA, Anney R, Awashti S, Belliveau R, Bettella F, Buxbaum JD, Bybjerg-Grauholm J, Bækvad-Hansen M, Cerrato F, Chambert K, Christensen JH, Churchhouse C, Dellenvall K, Demontis D, De Rubeis S, Devlin B, Djurovic S, Dumont AL, Goldstein JI, Hansen CS, Hauberg ME, Hollegaard MV, Hope S, Howrigan DP, Huang H, Hultman CM, Klei L, Maller J, Martin J, Martin AR, Moran JL, Nyegaard M, Nærland T, Palmer DS, Palotie A, Pedersen CB, Pedersen MG, dPoterba T, Poulsen JB, Pourcain BS, Qvist P, Rehnström K, Reichenberg A, Reichert J, Robinson EB, Roeder K, Roussos P, Saemundsen E, Sandin S, Satterstrom FK, Davey Smith G, Stefansson H, Steinberg S, Stevens CR, Sullivan PF, Turley P, Walters GB, Xu X, Autism Spectrum Disorder Working Group of the Psychiatric Genomics Consortium, BUPGEN, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, 23andMe Research Team, Stefansson K, Geschwind DH, Nordentoft M, Hougaard DM, Werge T, Mors O, Mortensen PB, Neale BM, Daly MJ and Børglum AD

    The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark.

    Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample-size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 individuals with ASD and 27,969 controls that identified five genome-wide-significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), we identified seven additional loci shared with other traits at equally strict significance levels. Dissecting the polygenic architecture, we found both quantitative and qualitative polygenic heterogeneity across ASD subtypes. These results highlight biological insights, particularly relating to neuronal function and corticogenesis, and establish that GWAS performed at scale will be much more productive in the near term in ASD.

    Funded by: NIDDK NIH HHS: K01 DK114379; NIH HHS: S10 OD018164; NIMH NIH HHS: R00 MH113823, R01 MH097849, R56 MH097849, U01 MH094432, U01 MH109514, U01 MH111661; Wellcome Trust

    Nature genetics 2019;51;3;431-444

  • RNAmut: robust identification of somatic mutations in acute myeloid leukemia using RNA-sequencing.

    Gu M, Zwiebel M, Ong SH, Boughton N, Nomdedeu J, Basheer F, Nannya Y, Quiros PM, Ogawa S, Cazzola M, Rad R, Butler AP, Vijayabaskar MS and Vassiliou GS

    Haematological Cancer Genetics, Wellcome Sanger Institute, Hinxton, Cambridge, UK.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Haematologica 2019;105;6;e290-e293

  • Quantitative Proteome Landscape of the NCI-60 Cancer Cell Lines.

    Guo T, Luna A, Rajapakse VN, Koh CC, Wu Z, Liu W, Sun Y, Gao H, Menden MP, Xu C, Calzone L, Martignetti L, Auwerx C, Buljan M, Banaei-Esfahani A, Ori A, Iskar M, Gillet L, Bi R, Zhang J, Zhang H, Yu C, Zhong Q, Varma S, Schmitt U, Qiu P, Zhang Q, Zhu Y, Wild PJ, Garnett MJ, Bork P, Beck M, Liu K, Saez-Rodriguez J, Elloumi F, Reinhold WC, Sander C, Pommier Y and Aebersold R

    Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China; Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.

    Here we describe a proteomic data resource for the NCI-60 cell lines generated by pressure cycling technology and SWATH mass spectrometry. We developed the DIA-expert software to curate and visualize the SWATH data, leading to reproducible detection of over 3,100 SwissProt proteotypic proteins and systematic quantification of pathway activities. Stoichiometric relationships of interacting proteins for DNA replication, repair, the chromatin remodeling NuRD complex, β-catenin, RNA metabolism, and prefoldins are more evident than at the mRNA level. The data are available in CellMiner, allowing casual users to test hypotheses and perform integrative, cross-database analyses of multi-omic drug response correlations for over 20,000 drugs. We demonstrate the value of proteome data in predicting drug response for over 240 clinically relevant chemotherapeutic and targeted therapies. In summary, we present a novel proteome resource for the NCI-60, together with relevant software tools, and demonstrate the benefit of proteome analyses.

    iScience 2019;21;664-680

  • Genomics of disease risk in globally diverse populations.

    Gurdasani D, Barroso I, Zeggini E and Sandhu MS

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Risk of disease is multifactorial and can be shaped by socio-economic, demographic, cultural, environmental and genetic factors. Our understanding of the genetic determinants of disease risk has greatly advanced with the advent of genome-wide association studies (GWAS), which detect associations between genetic variants and complex traits or diseases by comparing populations of cases and controls. However, much of this discovery has occurred through GWAS of individuals of European ancestry, with limited representation of other populations, including from Africa, The Americas, Asia and Oceania. Population demography, genetic drift and adaptation to environments over thousands of years have led globally to the diversification of populations. This global genomic diversity can provide new opportunities for discovery and translation into therapies, as well as a better understanding of population disease risk. Large-scale multi-ethnic and representative biobanks and population health resources provide unprecedented opportunities to understand the genetic determinants of disease on a global scale.

    Funded by: Medical Research Council: G0801566, G0901213, G0901213-92157, MR/K013491/1; Wellcome Trust: WT206194

    Nature reviews. Genetics 2019;20;9;520-535

  • Uganda Genome Resource Enables Insights into Population History and Genomic Discovery in Africa.

    Gurdasani D, Carstensen T, Fatumo S, Chen G, Franklin CS, Prado-Martinez J, Bouman H, Abascal F, Haber M, Tachmazidou I, Mathieson I, Ekoru K, DeGorter MK, Nsubuga RN, Finan C, Wheeler E, Chen L, Cooper DN, Schiffels S, Chen Y, Ritchie GRS, Pollard MO, Fortune MD, Mentzer AJ, Garrison E, Bergström A, Hatzikotoulas K, Adeyemo A, Doumatey A, Elding H, Wain LV, Ehret G, Auer PL, Kooperberg CL, Reiner AP, Franceschini N, Maher D, Montgomery SB, Kadie C, Widmer C, Xue Y, Seeley J, Asiki G, Kamali A, Young EH, Pomilla C, Soranzo N, Zeggini E, Pirie F, Morris AP, Heckerman D, Tyler-Smith C, Motala AA, Rotimi C, Kaleebu P, Barroso I and Sandhu MS

    William Harvey Research Institute, Queen Mary's University of London, London, UK.

    Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.

    Funded by: Intramural NIH HHS: ZIA HG200362; Medical Research Council: G0901213, MC_EX_MR/L016273/1, MR/K013491/1; NHGRI NIH HHS: U41 HG006941; NHLBI NIH HHS: R21 HL123677, R21 HL140385; NIDDK NIH HHS: R01 DK117445, R56 DK104806; NIMH NIH HHS: U01 MH115485; NIMHD NIH HHS: R01 MD012765; Wellcome Trust

    Cell 2019;179;4;984-1002.e36

  • Genetic and Phenotypic Characterization of the Etiological Agent of Canine Orchiepididymitis Smooth Brucella sp. BCCN84.3.

    Guzmán-Verri C, Suárez-Esquivel M, Ruíz-Villalobos N, Zygmunt MS, Gonnet M, Campos E, Víquez-Ruiz E, Chacón-Díaz C, Aragón-Aranda B, Conde-Álvarez R, Moriyón I, Blasco JM, Muñoz PM, Baker KS, Thomson NR, Cloeckaert A and Moreno E

    Programa de Investigación en Enfermedades Tropicales (PIET), Escuela de Medicina Veterinaria, Universidad Nacional, Heredia, Costa Rica.

    Members of the genus <i>Brucella</i> cluster in two phylogenetic groups: classical and non-classical species. The former group is composed of <i>Brucella</i> species that cause disease in mammals, including humans. A <i>Brucella</i> species, labeled as <i>Brucella</i> sp. BCCN84.3, was isolated from the testes of a Saint Bernard dog suffering from orchiepididymitis in Costa Rica. Following standard microbiological methods, the bacterium was first defined as "<i>Brucella melitensis</i> biovar 2." Further molecular typing identified the strain as an atypical "<i>Brucella suis</i>." Distinctive <i>Brucella</i> sp. BCCN84.3 markers, absent in other <i>Brucella</i> species and strains, were revealed by fatty acid methyl ester analysis, high resolution melting PCR and <i>omp25</i> and <i>omp2a/omp2b</i> gene diversity. Analysis of multiple loci variable number of tandem repeats and whole genome sequencing demonstrated that this isolate was different from the currently described <i>Brucella</i> species. The smooth <i>Brucella</i> sp. BCCN84.3 clusters together with the classical <i>Brucella</i> clade and displays all the genes required for virulence. <i>Brucella</i> sp. BCCN84.3 is a <i>species nova</i> taxonomical entity displaying pathogenicity and is therefore relevant for differential diagnoses in the context of brucellosis. Considering the debate on the <i>Brucella</i> species concept, there is a need to describe the extant taxonomical entities of these pathogens in order to understand their dispersion and evolution.

    Funded by: Wellcome Trust

    Frontiers in veterinary science 2019;6;175

  • Genome-wide association study in Finnish twins highlights the connection between nicotine addiction and neurotrophin signaling pathway.

    Hällfors J, Palviainen T, Surakka I, Gupta R, Buchwald J, Raevuori A, Ripatti S, Korhonen T, Jousilahti P, Madden PAF, Kaprio J and Loukola A

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland.

    The heritability of nicotine dependence based on family studies is substantial. Nevertheless, knowledge of the underlying genetic architecture remains meager. Our aim was to identify novel genetic variants responsible for interindividual differences in smoking behavior. We performed a genome-wide association study on 1715 ever smokers ascertained from the population-based Finnish Twin Cohort enriched for heavy smoking. Data imputation used the 1000 Genomes Phase I reference panel together with a whole genome sequence-based Finnish reference panel. We analyzed three measures of nicotine addiction: smoking quantity, nicotine dependence and nicotine withdrawal. We annotated all genome-wide significant SNPs for their functional potential. First, we detected genome-wide significant association on 16p12 with smoking quantity (P = 8.5 × 10<sup>-9</sup>), near CLEC19A. The lead SNP lies 22 kb from a binding site for NF-κB transcription factors, which play a role in the neurotrophin signaling pathway. However, the signal was not replicated in an independent Finnish population-based sample, FINRISK (n = 6763). Second, nicotine withdrawal showed association on 2q21 in an intron of TMEM163 (P = 2.1 × 10<sup>-9</sup>), and on 11p15 (P = 6.6 × 10<sup>-8</sup> in an intron of AP2A2, and P = 4.2 × 10<sup>-7</sup> for a missense variant in MUC6, both involved in the neurotrophin signaling pathway). Third, association was detected on 3p22.3 for maximum number of cigarettes smoked per day (P = 3.1 × 10<sup>-8</sup>) near STAC. Associating CLEC19A and TMEM163 SNPs were annotated to influence gene expression or methylation. The neurotrophin signaling pathway has previously been associated with smoking behavior. Our findings further support its role in nicotine addiction.

    Funded by: Academy of Finland: 100499, 118555, 141054, 205585; Biomedicum Helsinki Foundation, and Doctoral Program in Biomedicine, University of Helsinki (J.H.): 263278, 265240; ENGAGE-European Network for Genetic and Genomic Epidemiology: 201413, FP7-HEALTH-F4-2007; Global Research Awards for Nicotine Dependence (GRAND); NIAAA NIH HHS: K05 AA000145, R01 AA009203, R01 AA012502, R37 AA012502; NIDA NIH HHS: R01 DA012854, R56 DA012854; Sigrid Juselius Foundation

    Addiction biology 2019;24;3;549-561

  • TP53 mutation status divides myelodysplastic syndromes with complex karyotypes into distinct prognostic subgroups.

    Haase D, Stevenson KE, Neuberg D, Maciejewski JP, Nazha A, Sekeres MA, Ebert BL, Garcia-Manero G, Haferlach C, Haferlach T, Kern W, Ogawa S, Nagata Y, Yoshida K, Graubert TA, Walter MJ, List AF, Komrokji RS, Padron E, Sallman D, Papaemmanuil E, Campbell PJ, Savona MR, Seegmiller A, Adès L, Fenaux P, Shih LY, Bowen D, Groves MJ, Tauro S, Fontenay M, Kosmider O, Bar-Natan M, Steensma D, Stone R, Heuser M, Thol F, Cazzola M, Malcovati L, Karsan A, Ganster C, Hellström-Lindberg E, Boultwood J, Pellagatti A, Santini V, Quek L, Vyas P, Tüchler H, Greenberg PL, Bejar R and International Working Group for MDS Molecular Prognostic Committee

    University Medical Center, Georg-August-University, Goettingen, Germany.

    Risk stratification is critical in the care of patients with myelodysplastic syndromes (MDS). Approximately 10% have a complex karyotype (CK), defined as more than two cytogenetic abnormalities, which is a highly adverse prognostic marker. However, CK-MDS can carry a wide range of chromosomal abnormalities and somatic mutations. To refine risk stratification of CK-MDS patients, we examined data from 359 CK-MDS patients shared by the International Working Group for MDS. Mutations were underrepresented with the exception of TP53 mutations, identified in 55% of patients. TP53 mutated patients had even fewer co-mutated genes but were enriched for the del(5q) chromosomal abnormality (p < 0.005), monosomal karyotype (p < 0.001), and high complexity, defined as more than 4 cytogenetic abnormalities (p < 0.001). Monosomal karyotype, high complexity, and TP53 mutation were individually associated with shorter overall survival, but monosomal status was not significant in a multivariable model. Multivariable survival modeling identified severe anemia (hemoglobin < 8.0 g/dL), NRAS mutation, SF3B1 mutation, TP53 mutation, elevated blast percentage (>10%), abnormal 3q, abnormal 9, and monosomy 7 as having the greatest survival risk. The poor risk associated with CK-MDS is driven by its association with prognostically adverse TP53 mutations and can be refined by considering clinical and karyotype features.

    Funded by: Medical Research Council: G1000729, G1000729/94931, MC_U137961146, MC_UU_00016/11, MC_UU_12009/11, MR/L008963/1, MR/R007608/1; NIDDK NIH HHS: K08 DK091360

    Leukemia 2019;33;7;1747-1758

  • A Transient Pulse of Genetic Admixture from the Crusaders in the Near East Identified from Ancient Genome Sequences.

    Haber M, Doumet-Serhal C, Scheib CL, Xue Y, Mikulski R, Martiniano R, Fischer-Genz B, Schutkowski H, Kivisild T and Tyler-Smith C

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

    During the medieval period, hundreds of thousands of Europeans migrated to the Near East to take part in the Crusades, and many of them settled in the newly established Christian states along the Eastern Mediterranean coast. Here, we present a genetic snapshot of these events and their aftermath by sequencing the whole genomes of 13 individuals who lived in what is today known as Lebanon between the 3<sup>rd</sup> and 13<sup>th</sup> centuries CE. These include nine individuals from the "Crusaders' pit" in Sidon, a mass burial in South Lebanon identified from the archaeology as the grave of Crusaders killed during a battle in the 13<sup>th</sup> century CE. We show that all of the Crusaders' pit individuals were males; some were Western Europeans from diverse origins, some were locals (genetically indistinguishable from present-day Lebanese), and two individuals were a mixture of European and Near Eastern ancestries, providing direct evidence that the Crusaders admixed with the local population. However, these mixtures appear to have had limited genetic consequences since signals of admixture with Europeans are not significant in any Lebanese group today; in particular, Lebanese Christians are today genetically similar to local people who lived during the Roman period, which preceded the Crusades by more than four centuries.

    American journal of human genetics 2019;104;5;977-984

  • A Rare Deep-Rooting D0 African Y-Chromosomal Haplogroup and Its Implications for the Expansion of Modern Humans Out of Africa.

    Haber M, Jones AL, Connell BA, Asan, Arciero E, Yang H, Thomas MG, Xue Y and Tyler-Smith C

    The Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.

    Present-day humans outside Africa descend mainly from a single expansion out ∼50,000-70,000 years ago, but many details of this expansion remain unclear, including the history of the male-specific Y chromosome at this time. Here, we reinvestigate a rare deep-rooting African Y-chromosomal lineage by sequencing the whole genomes of three Nigerian men described in 2003 as carrying haplogroup DE* Y chromosomes, and analyzing them in the context of a calibrated worldwide Y-chromosomal phylogeny. We confirm that these three chromosomes do represent a deep-rooting DE lineage, branching close to the DE bifurcation, but place them on the D branch as an outgroup to all other known D chromosomes, and designate the new lineage D0. We consider three models for the expansion of Y lineages out of Africa ∼50,000-100,000 years ago, incorporating migration back to Africa where necessary to explain present-day Y-lineage distributions. Considering both the Y-chromosomal phylogenetic structure incorporating the D0 lineage, and published evidence for modern humans outside Africa, the most favored model involves an origin of the DE lineage within Africa with D0 and E remaining there, and migration out of the three lineages (C, D, and FT) that now form the vast majority of non-African Y chromosomes. The exit took place 50,300-81,000 years ago (latest date for FT lineage expansion outside Africa - earliest date for the D/D0 lineage split inside Africa), and most likely 50,300-59,400 years ago (considering Neanderthal admixture). This work resolves a long-running debate about Y-chromosomal out-of-Africa/back-to-Africa migrations, and provides insights into the out-of-Africa expansion more generally.

    Funded by: Wellcome Trust: 098051

    Genetics 2019;212;4;1421-1428

  • Identification of a new panel of reference genes to study pairing-dependent gene expression in Schistosoma mansoni.

    Haeberlein S, Angrisano A, Quack T, Lu Z, Kellershohn J, Blohm A, Grevelding CG and Hahnel SR

    Institute of Parasitology, BFS, Justus-Liebig-University, Giessen, Germany.

    Facilitated by the Schistosoma mansoni genome project, multiple transcriptomic studies were performed over the last decade to elucidate gene expression patterns among different developmental stages of the complex schistosome life cycle. While these analyses enable the identification of candidate genes with key functions in schistosome biology, a diverse molecular tool set is needed that allows comprehensive functional characterization at the single gene level. This includes the availability of reliable reference genes to confirm changes in the transcription of genes of interest over different biological samples and experimental conditions. In particular, the investigation of one key aspect of schistosome biology, the pairing-dependent gene expression in females and males, requires knowledge of reference genes that are expressed independently of both pairing and in vitro culture effects. Therefore, the present study focused on the identification of quantitative reverse transcription (qRT)-PCR reference genes suitable for the investigation of pairing-dependent gene expression in the S. mansoni male. The "pipeline" we present here is based on qRT-PCR analyses of high biological replication combined with three different statistical analysis tools, BestKeeper, geNorm, and NormFinder. Our approach resulted in a statistically robust ranking of 15 selected reference genes with respect to their transcription stability between pairing-unexperienced and -experienced males. We further tested the top seven candidate genes for their transcription stability during in vitro culture of adult S. mansoni. Of these, the two most suitable reference genes were used to investigate the influence of the pairing contact on the transcription of genes of interest, comprising a tyrosine decarboxylase gene Smtdc1, an ebony ortholog Smebony, and the follistatin ortholog Smfst in S. mansoni males. Performing pairing, separation and re-pairing experiments with adult S. mansoni in vitro, our results indicate for the first time that pairing can act as a molecular on/off-switch of specific genes to strictly control their expression in schistosome males.

    Funded by: Wellcome Trust: 107475/Z/15/Z

    International journal for parasitology 2019;49;8;615-624

  • High-Throughput Single-Cell Real-Time Quantitative PCR Analysis.

    Haim-Vilmovsky L

    EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Examining transcriptomics of populations at the single-cell level allows for higher resolution when studying functionality in development, differentiation, and physiology. Real-time quantitative PCR (qPCR) enables a sensitive detection of specific gene expression; however, processing a large number of samples for single-cell research involves a time-consuming process and high reagent costs. Here we describe a protocol for single-cell qPCR using nanofluidic chips. This method reduces the number of handling steps and volumes per reaction, allowing for more samples and genes to be measured.

    Methods in molecular biology (Clifton, N.J.) 2019;1979;177-183

  • Automating the Shared Resource Laboratory Using Computer Scripts: A Case Report.

    Hall C, Brown L, Graham J, Thompson S and Ng BL

    Cytometry Core Facility, Wellcome Sanger Institute, Genome Campus, Hinxton, CB10 1SA, UK.

    Shared resource laboratories (SRLs) offer instrumentation, training, and support to investigators and play an important role in the progress and development of science. To facilitate daily tasks and to provide an effective service, we have made use of computer scripts (lists of computer commands that are processed sequentially) to automate tasks in our flow cytometry facility. Using Python and an application programming interface (API), we automate user communication and produce a daily schedule display screen. We exploit the accessible nature of open standards to use R and Python to analyze and back up data from the BD Influx cell sorter. Finally, we show that through simple scripting, we can add value to an existing service by producing sort statistics from the Beckman Coulter XDP cell sorter. With these five examples, we demonstrate that the use of scripts helps to improve work efficiency, can solve problems, and can enhance the service provided by the SRL, and we hope to inspire other SRLs to adopt them. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.

    Funded by: Wellcome Trust: WT206194

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2019;95;7;797-802

  • Relating evolutionary selection and mutant clonal dynamics in normal epithelia.

    Hall MWJ, Jones PH and Hall BA

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Cancer develops from mutated cells in normal tissues. Whether somatic mutations alter normal cell dynamics is key to understanding cancer risk and guiding interventions to reduce it. An analysis of the first incomplete moment of size distributions of clones carrying cancer-associated mutations in normal human eyelid skin gives a good fit with neutral drift, arguing mutations do not affect cell fate. However, this suggestion conflicts with genetic evidence in the same dataset that argues for strong positive selection of a subset of mutations. This implies cells carrying these mutations have a competitive advantage over normal cells, leading to large clonal expansions within the tissue. In the normal epithelium, clone growth is constrained by the limited size of the proliferating compartment and competition with surrounding cells. We show that if these factors are taken into account, the first incomplete moment of the clone size distribution is unable to exclude non-neutral behaviour. Furthermore, experimental factors can make a non-neutral clone size distribution appear neutral. We validate these principles with a new experimental dataset showing that when experiments are appropriately designed, the first incomplete moment can be a useful indicator of non-neutral competition. Finally, we discuss the complex relationship between mutant clone sizes and genetic selection.

    Funded by: Wellcome Trust

    Journal of the Royal Society, Interface 2019;16;156;20190230
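
    The first incomplete moment discussed in the abstract above can be made concrete with a minimal sketch. It assumes the usual definition, mu_1(n) = (1/<n>) * sum over m >= n of m * P(m), where P(m) is the fraction of clones of size m and <n> is the mean clone size; the function name and the toy clone sizes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def first_incomplete_moment(clone_sizes):
    """Return {n: mu_1(n)} for every observed clone size n.

    mu_1(n) = sum_{m >= n} m * P(m) / <n>, which simplifies to
    (total size of clones with size >= n) / (total size of all clones).
    """
    counts = Counter(clone_sizes)
    total = sum(clone_sizes)
    running = 0
    moment = {}
    # Accumulate from the largest size downwards so each step adds
    # the contribution of all clones of exactly size n.
    for n in sorted(counts, reverse=True):
        running += n * counts[n]
        moment[n] = running / total
    return dict(sorted(moment.items()))


# Toy data (invented): four clones of sizes 1, 1, 2 and 4 cells.
print(first_incomplete_moment([1, 1, 2, 4]))
# -> {1: 1.0, 2: 0.75, 4: 0.5}
```

    A common way to use this quantity is to plot log mu_1(n) against n and check for a straight line, which is usually read as being consistent with neutral drift. As the abstract cautions, such a fit alone may not exclude non-neutral behaviour.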

  • Evolution and expansion of multidrug-resistant malaria in southeast Asia: a genomic epidemiology study.

    Hamilton WL, Amato R, van der Pluijm RW, Jacob CG, Quang HH, Thuy-Nhien NT, Hien TT, Hongvanthong B, Chindavongsa K, Mayxay M, Huy R, Leang R, Huch C, Dysoley L, Amaratunga C, Suon S, Fairhurst RM, Tripura R, Peto TJ, Sovann Y, Jittamala P, Hanboonkunupakarn B, Pukrittayakamee S, Chau NH, Imwong M, Dhorda M, Vongpromek R, Chan XHS, Maude RJ, Pearson RD, Nguyen T, Rockett K, Drury E, Gonçalves S, White NJ, Day NP, Kwiatkowski DP, Dondorp AM and Miotto O

    Wellcome Sanger Institute, Hinxton, UK; Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.

    Background: A multidrug-resistant co-lineage of Plasmodium falciparum malaria, named KEL1/PLA1, spread across Cambodia in 2008-13, causing high rates of treatment failure with the frontline combination therapy dihydroartemisinin-piperaquine. Here, we report on the evolution and spread of KEL1/PLA1 in subsequent years.

    Methods: For this genomic epidemiology study, we analysed whole genome sequencing data from P falciparum clinical samples collected from patients with malaria between 2007 and 2018 from Cambodia, Laos, northeastern Thailand, and Vietnam, through the MalariaGEN P falciparum Community Project. Previously unpublished samples were provided by two large-scale multisite projects: the Tracking Artemisinin Resistance Collaboration II (TRAC2) and the Genetic Reconnaissance in the Greater Mekong Subregion (GenRe-Mekong) project. By investigating genome-wide relatedness between parasites, we inferred patterns of shared ancestry in the KEL1/PLA1 population.

    Findings: We analysed 1673 whole genome sequences that passed quality filters, and determined KEL1/PLA1 status in 1615. Before 2009, KEL1/PLA1 was only found in western Cambodia; by 2016-17 its prevalence had risen to higher than 50% in all of the surveyed countries except for Laos. In northeastern Thailand and Vietnam, KEL1/PLA1 exceeded 80% of the most recent P falciparum parasites. KEL1/PLA1 parasites maintained high genetic relatedness and low diversity, reflecting a recent common origin. Several subgroups of highly related parasites have recently emerged within this co-lineage, with diverse geographical distributions. The three largest of these subgroups (n=84, n=79, and n=47) mostly emerged since 2016 and were all present in Cambodia, Laos, and Vietnam. These expanding subgroups carried new mutations in the crt gene, which arose on a specific genetic background comprising multiple genomic regions. Four newly emerging crt mutations were rare in the early period and became more prevalent by 2016-17 (Thr93Ser, rising to 19·8%; His97Tyr to 11·2%; Phe145Ile to 5·5%; and Ile218Phe to 11·1%).

    Interpretation: After emerging and circulating for several years within Cambodia, the P falciparum KEL1/PLA1 co-lineage diversified into multiple subgroups and acquired new genetic features, including novel crt mutations. These subgroups have rapidly spread into neighbouring countries, suggesting enhanced fitness. These findings highlight the urgent need for elimination of this increasingly drug-resistant parasite co-lineage, and the importance of genetic surveillance in accelerating malaria elimination efforts.

    Funding: Wellcome Trust, Bill & Melinda Gates Foundation, UK Medical Research Council, and UK Department for International Development.

    Funded by: Wellcome Trust

    The Lancet. Infectious diseases 2019;19;9;943-951

  • The ribosomal P-stalk couples amino acid starvation to GCN2 activation in mammalian cells.

    Harding HP, Ordonez A, Allen F, Parts L, Inglis AJ, Williams RL and Ron D

    Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom.

    The eukaryotic translation initiation factor 2α (eIF2α) kinase GCN2 is activated by amino acid starvation to elicit a rectifying physiological program known as the Integrated Stress Response (ISR). A role for uncharged tRNAs as activating ligands of yeast GCN2 is supported experimentally. However, mouse GCN2 activation has recently been observed in circumstances associated with ribosome stalling with no global increase in uncharged tRNAs. We report on a mammalian CHO cell-based CRISPR-Cas9 mutagenesis screen for genes that contribute to ISR activation by amino acid starvation. Disruption of genes encoding components of the ribosome P-stalk, uL10 and P1, selectively attenuated GCN2-mediated ISR activation by amino acid starvation or interference with tRNA charging without affecting the endoplasmic reticulum unfolded protein stress-induced ISR, mediated by the related eIF2α kinase PERK. Wildtype ribosomes isolated from CHO cells, but not those with P-stalk lesions, stimulated GCN2-dependent eIF2α phosphorylation in vitro. These observations support a model whereby lack of a cognate charged tRNA exposes a latent capacity of the ribosome P-stalk to activate GCN2 in cells and help explain the emerging link between ribosome stalling and ISR activation.

    Funded by: Cancer Research UK: C14801/A21211; Wellcome: Wellcome 100140, Wellcome 200848/Z/16/Z

    eLife 2019;8

  • Genomic identification of cryptic susceptibility to penicillins and β-lactamase inhibitors in methicillin-resistant Staphylococcus aureus.

    Harrison EM, Ba X, Coll F, Blane B, Restif O, Carvell H, Köser CU, Jamrozy D, Reuter S, Lovering A, Gleadall N, Bellis KL, Uhlemann AC, Lowy FD, Massey RC, Grilo IR, Sobral R, Larsen J, Rhod Larsen A, Vingsbo Lundberg C, Parkhill J, Paterson GK, Holden MTG, Peacock SJ and Holmes MA

    Wellcome Sanger Institute, Hinxton, UK.

    Antibiotic resistance in bacterial pathogens threatens the future of modern medicine. One such resistant pathogen is methicillin-resistant Staphylococcus aureus (MRSA), which is resistant to nearly all β-lactam antibiotics, limiting treatment options. Here, we show that a significant proportion of MRSA isolates from different lineages, including the epidemic USA300 lineage, are susceptible to penicillins when used in combination with β-lactamase inhibitors such as clavulanic acid. Susceptibility is mediated by a combination of two different mutations in the mecA promoter region that lowers mecA-encoded penicillin-binding protein 2a (PBP2a) expression, and in the majority of isolates by either one of two substitutions in PBP2a (E246G or M122I) that increase the affinity of PBP2a for penicillin in the presence of clavulanic acid. Treatment of S. aureus infections in wax moth and mouse models shows that penicillin/β-lactamase inhibitor susceptibility can be exploited as an effective therapeutic choice for 'susceptible' MRSA infection. Finally, we show that isolates with the PBP2a E246G substitution have a growth advantage in the presence of penicillin but the absence of clavulanic acid, which suggests that penicillin/β-lactamase susceptibility is an example of collateral sensitivity (resistance to one antibiotic increases sensitivity to another). Our findings suggest that widely available and currently disregarded antibiotics could be effective in a significant proportion of MRSA infections.

    Funded by: Medical Research Council: G1001787/1, MR/N002660/1, MR/S00291X/1; Wellcome Trust: 201344/Z/16/Z

    Nature microbiology 2019;4;10;1680-1691

  • Low-frequency variation in TP53 has large effects on head circumference and intracranial volume.

    Haworth S, Shapland CY, Hayward C, Prins BP, Felix JF, Medina-Gomez C, Rivadeneira F, Wang C, Ahluwalia TS, Vrijheid M, Guxens M, Sunyer J, Tachmazidou I, Walter K, Iotchkova V, Jackson A, Cleal L, Huffmann J, Min JL, Sass L, Timmers PRHJ, UK10K consortium, Davey Smith G, Fisher SE, Wilson JF, Cole TJ, Fernandez-Orth D, Bønnelykke K, Bisgaard H, Pennell CE, Jaddoe VWV, Dedoussis G, Timpson N, Zeggini E, Vitart V and St Pourcain B

    MRC Integrative Epidemiology Unit, Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK.

    Cranial growth and development is a complex process which affects the closely related traits of head circumference (HC) and intracranial volume (ICV). The underlying genetic influences shaping these traits during the transition from childhood to adulthood are little understood, but might include both age-specific genetic factors and low-frequency genetic variation. Here, we model the developmental genetic architecture of HC, showing this is genetically stable and correlated with genetic determinants of ICV. Investigating up to 46,000 children and adults of European descent, we identify association with final HC and/or final ICV + HC at 9 novel common and low-frequency loci, illustrating that genetic variation from a wide allele frequency spectrum contributes to cranial growth. The largest effects are reported for low-frequency variants within TP53, with 0.5 cm wider heads in increaser-allele carriers versus non-carriers during mid-childhood, suggesting a previously unrecognized role of TP53 transcripts in human cranial development.

    Funded by: British Heart Foundation: RG/10/17/28553; Medical Research Council: G9824984, MC_PC_U127580972, MC_UU_00007/10, MC_UU_00007/5, MC_UU_00011/1, MC_UU_12012/5, MC_UU_12013/3, MR/J012165/1, MR/R010692/1

    Nature communications 2019;10;1;357

  • Caribbean multi-centre study of Klebsiella pneumoniae: whole-genome sequencing, antimicrobial resistance and virulence factors.

    Heinz E, Brindle R, Morgan-McCalla A, Peters K and Thomson NR

    Wellcome Trust Sanger Institute, Hinxton, UK.

    The surveillance of antimicrobial-resistant isolates has proven to be one of the most valuable tools to understand the global rise of multidrug-resistant bacterial pathogens. We report the first insights into the current situation in the Caribbean, where a pilot project to monitor antimicrobial resistance (AMR) through phenotypic resistance measurements combined with whole-genome sequencing was set up in collaboration with the Caribbean Public Health Agency (CARPHA). Our first study focused on Klebsiella pneumoniae, a highly relevant organism amongst the Gram-negative opportunistic pathogens worldwide causing hospital- and community-acquired infections. Our results show that not only carbapenem resistance, but also hypervirulent strains, are circulating in patients in the Caribbean. Our current data does not allow us to infer their prevalence in the population. We argue for the urgent need to further support AMR surveillance and stewardship in this almost uncharted territory, which can make a significant impact on the reduction of antimicrobial usage. This article contains data hosted by Microreact.

    Microbial genomics 2019;5;5

  • Resistance mechanisms and population structure of highly drug resistant Klebsiella in Pakistan during the introduction of the carbapenemase NDM-1.

    Heinz E, Ejaz H, Bartholdson Scott J, Wang N, Gujaran S, Pickard D, Wilksch J, Cao H, Haq IU, Dougan G and Strugnell RA

    Parasites and Microbes, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Klebsiella pneumoniae is a major threat to public health with the emergence of isolates resistant to most, if not all, useful antibiotics. We present an in-depth analysis of 178 extended-spectrum beta-lactamase (ESBL)-producing K. pneumoniae collected from patients resident in a region of Pakistan, during the period 2010-2012, when the now globally-distributed carbapenemase bla-NDM-1 was being acquired by Klebsiella. We observed two dominant lineages, but neither the overall resistance profile nor virulence-associated factors explain their evolutionary success. Phenotypic analysis of resistance shows few differences between the acquisition of resistance genes and the phenotypic resistance profile, including beta-lactam antibiotics that were used to treat ESBL-positive strains. Resistance against these drugs could be explained by inhibitor-resistant beta-lactamase enzymes, carbapenemases or ampC type beta-lactamases, at least one of which was detected in most, but not all relevant strains analysed. Complete genomes for six selected strains are reported; these provide detailed insights into the mobile elements present in these isolates during the initial spread of NDM-1. The unexplained success of some lineages within this pool of highly resistant strains, and the discontinuity between phenotypic resistance and genotype at the macro level, indicate that intrinsic mechanisms contribute to competitive advantage and/or resistance.

    Funded by: Wellcome Trust: 206194

    Scientific reports 2019;9;1;2392

  • Genome-wide CRISPR Screens in T Helper Cells Reveal Pervasive Crosstalk between Activation and Differentiation.

    Henriksson J, Chen X, Gomes T, Ullah U, Meyer KB, Miragaia R, Duddy G, Pramanik J, Yusa K, Lahesmaa R and Teichmann SA

    Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; Department of Biosciences and Nutrition, Karolinska Institutet, Hälsovägen 7, Novum, SE-141 83, Huddinge, Sweden.

    T helper type 2 (Th2) cells are important regulators of mammalian adaptive immunity and have relevance for infection, autoimmunity, and tumor immunology. Using a newly developed, genome-wide retroviral CRISPR knockout (KO) library, combined with RNA-seq, ATAC-seq, and ChIP-seq, we have dissected the regulatory circuitry governing activation and differentiation of these cells. Our experiments distinguish cell activation versus differentiation in a quantitative framework. We demonstrate that these two processes are tightly coupled and are jointly controlled by many transcription factors, metabolic genes, and cytokine/receptor pairs. There are only a small number of genes regulating differentiation without any role in activation. By combining biochemical and genetic data, we provide an atlas for Th2 differentiation, validating known regulators and identifying factors, such as Pparg and Bhlhe40, as part of the core regulatory network governing Th2 helper cell fates.

    Funded by: Wellcome Trust: WT206194

    Cell 2019;176;4;882-896.e18

  • BRN2 suppresses apoptosis, reprograms DNA damage repair, and is associated with a high somatic mutation burden in melanoma.

    Herbert K, Binet R, Lambert JP, Louphrasitthiphol P, Kalkavan H, Sesma-Sanz L, Robles-Espinoza CD, Sarkar S, Suer E, Andrews S, Chauhan J, Roberts ND, Middleton MR, Gingras AC, Masson JY, Larue L, Falletta P and Goding CR

    Ludwig Institute for Cancer Research, Nuffield Department of Clinical Medicine, University of Oxford, Headington, Oxford OX3 7DQ, United Kingdom.

    Whether cell types exposed to a high level of environmental insults possess cell type-specific prosurvival mechanisms or enhanced DNA damage repair capacity is not well understood. BRN2 is a tissue-restricted POU domain transcription factor implicated in neural development and several cancers. In melanoma, BRN2 plays a key role in promoting invasion and regulating proliferation. Here we found, surprisingly, that rather than interacting with transcription cofactors, BRN2 is instead associated with DNA damage response proteins and directly binds PARP1 and Ku70/Ku80. Rapid PARP1-dependent BRN2 association with sites of DNA damage facilitates recruitment of Ku80 and reprograms DNA damage repair by promoting Ku-dependent nonhomologous end-joining (NHEJ) at the expense of homologous recombination. BRN2 also suppresses an apoptosis-associated gene expression program to protect against UVB-, chemotherapy- and vemurafenib-induced apoptosis. Remarkably, BRN2 expression also correlates with a high single-nucleotide variation prevalence in human melanomas. By promoting error-prone DNA damage repair via NHEJ and suppressing apoptosis of damaged cells, our results suggest that BRN2 contributes to the generation of melanomas with a high mutation burden. Our findings highlight a novel role for a key transcription factor in reprogramming DNA damage repair and suggest that BRN2 may impact the response to DNA-damaging agents in BRN2-expressing cancers.

    Funded by: CIHR: FDN 143301; NCI NIH HHS: P01 CA128814; Wellcome Trust: 106288/Z/14/Z, 204562/Z/16/Z

    Genes & development 2019;33;5-6;310-332

  • Peptides Derived of Kunitz-Type Serine Protease Inhibitor as Potential Vaccine Against Experimental Schistosomiasis.

    Hernández-Goenaga J, López-Abán J, Protasio AV, Vicente Santiago B, Del Olmo E, Vanegas M, Fernández-Soto P, Patarroyo MA and Muro A

    Infectious and Tropical Diseases Group (e-INTRO), IBSAL-CIETUS (Biomedical Research Institute of Salamanca-Research Centre for Tropical Diseases at the University of Salamanca), Faculty of Pharmacy, University of Salamanca, Salamanca, Spain.

    Schistosomiasis is a significant public health problem in sub-Saharan Africa, China, Southeast Asia, and regions of South and Central America affecting about 189 million people. Kunitz-type serine protease inhibitors have been identified as important players in the interaction of other flatworm parasites with their mammalian hosts. They are involved in host blood coagulation, fibrinolysis, inflammation, and ion channel blocking, all of them critical biological processes, which make them interesting targets to develop a vaccine. Here, we evaluate the protective efficacy of chemically synthesized T- and B-cell peptide epitopes derived from a kunitz protein from Schistosoma mansoni. Putative kunitz-type protease inhibitor proteins were identified in the S. mansoni genome, and their expression was analyzed by RNA-seq. Gene expression analyses showed that the kunitz protein Smp_147730 (Syn. Smp_311670) was dramatically and significantly up-regulated in schistosomula and adult worms when compared to the invading cercariae. T- and B-cell epitopes were predicted using bioinformatics tools, chemically synthesized, and formulated in the Adjuvant Adaptation (ADAD) vaccination system. BALB/c mice were vaccinated and challenged with S. mansoni cercariae. Kunitz peptides were highly protective in vaccinated BALB/c mice showing significant reductions in recovery of adult females (89-91%) and in the numbers of eggs trapped in the livers (77-81%) and guts (57-77%) of mice. Moreover, liver lesions were significantly reduced in vaccinated mice (64-65%) compared to infected control mice. The vaccination regime was well-tolerated with both peptides. We propose the use of these peptides, alone or in combination, as reliable candidates for vaccination against schistosomiasis.

    Frontiers in immunology 2019;10;2498

  • Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data.

    Hicks AL, Wheeler N, Sánchez-Busó L, Rakeman JL, Harris SR and Grad YH

    Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America.

    Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such models, how they might apply to and vary across clinical populations, and what the implications might be in the clinical setting. Here, we performed a meta-analysis of seven large Neisseria gonorrhoeae datasets, as well as Klebsiella pneumoniae and Acinetobacter baumannii datasets, with whole genome sequence data and antibiotic susceptibility phenotypes using set covering machine classification, random forest classification, and random forest regression models to predict resistance phenotypes from genotype. We demonstrate how model performance varies by drug, dataset, resistance metric, and species, reflecting the complexities of generating clinically relevant conclusions from machine learning-derived models. Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modeling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.

    Funded by: NCI NIH HHS: U01 CA207167; NIAID NIH HHS: R01 AI132606; Wellcome Trust: 098051

    PLoS computational biology 2019;15;9;e1007349

  • Antibiotics-induced monodominance of a novel gut bacterial order.

    Hildebrand F, Moitinho-Silva L, Blasche S, Jahn MT, Gossmann TI, Huerta-Cepas J, Hercog R, Luetge M, Bahram M, Pryszlak A, Alves RJ, Waszak SM, Zhu A, Ye L, Costea PI, Aalvink S, Belzer C, Forslund SK, Sunagawa S, Hentschel U, Merten C, Patil KR, Benes V and Bork P

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.

    Objective: The composition of the healthy human adult gut microbiome is relatively stable over prolonged periods, and representatives of the most highly abundant and prevalent species have been cultured and described. However, microbial abundances can change on perturbations, such as antibiotics intake, enabling the identification and characterisation of otherwise low abundant species.

    Design: Analysing gut microbial time-series data, we used shotgun metagenomics to create strain level taxonomic and functional profiles. Community dynamics were modelled postintervention with a focus on conditionally rare taxa and previously unknown bacteria.

    Results: In response to a commonly prescribed cephalosporin (ceftriaxone), we observe a strong compositional shift in one subject, in which a previously unknown species, ᵁBorkfalki ceftriaxensis, was identified, blooming to 92% relative abundance. The genome assembly reveals that this species (1) belongs to a so far undescribed order of Firmicutes, (2) is ubiquitously present at low abundances in at least one third of adults, (3) is opportunistically growing, being ecologically similar to typical probiotic species and (4) is stably associated to healthy hosts as determined by single nucleotide variation analysis. It was the first coloniser after the antibiotic intervention that led to a long-lasting microbial community shift and likely permanent loss of nine commensals.

    Conclusion: The bloom of ᵁB. ceftriaxensis and a subsequent one of Parabacteroides distasonis demonstrate the existence of monodominance community states in the gut. Our study points to an undiscovered wealth of low abundant but common taxa in the human gut and calls for more highly resolved longitudinal studies, in particular on ecosystem perturbations.

    Gut 2019;68;10;1781-1790

  • Landscape of the Plasmodium Interactome Reveals Both Conserved and Species-Specific Functionality.

    Hillier C, Pardo M, Yu L, Bushell E, Sanderson T, Metcalf T, Herd C, Anar B, Rayner JC, Billker O and Choudhary JS

    Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany.

    Malaria represents a major global health issue, and the identification of new intervention targets remains an urgent priority. This search is hampered by more than one-third of the genes of malaria-causing Plasmodium parasites being uncharacterized. We report a large-scale protein interaction network in Plasmodium schizonts, generated by combining blue native-polyacrylamide electrophoresis with quantitative mass spectrometry and machine learning. This integrative approach, spanning 3 species, identifies >20,000 putative protein interactions, organized into 600 protein clusters. We validate selected interactions, assigning functions in chromatin regulation to previously unannotated proteins and suggesting a role for an EELM2 domain-containing protein and a putative microrchidia protein as mechanistic links between AP2-domain transcription factors and epigenetic regulation. Our interactome represents a high-confidence map of the native organization of core cellular processes in Plasmodium parasites. The network reveals putative functions for uncharacterized proteins, provides mechanistic and structural insight, and uncovers potential alternative therapeutic targets.

    Cell reports 2019;28;6;1635-1647.e5

  • Refugia and anthelmintic resistance: Concepts and challenges.

    Hodgkinson JE, Kaplan RM, Kenyon F, Morgan ER, Park AW, Paterson S, Babayan SA, Beesley NJ, Britton C, Chaudhry U, Doyle SR, Ezenwa VO, Fenton A, Howell SB, Laing R, Mable BK, Matthews L, McIntyre J, Milne CE, Morrison TA, Prentice JC, Sargison ND, Williams DJL, Wolstenholme AJ and Devaney E

    Institute of Infection and Global Health, University of Liverpool, Liverpool, L69 7ZJ, UK.

    Anthelmintic resistance is a threat to global food security. In order to alleviate the selection pressure for resistance and maintain drug efficacy, management strategies increasingly aim to preserve a proportion of the parasite population in 'refugia', unexposed to treatment. While persuasive in its logic, and widely advocated as best practice, evidence for the ability of refugia-based approaches to slow the development of drug resistance in parasitic helminths is currently limited. Moreover, the conditions needed for refugia to work, or how transferable those are between parasite-host systems, are not known. This review, born of an international workshop, seeks to deconstruct the concept of refugia and examine its assumptions and applicability in different situations. We conclude that factors potentially important to refugia, such as the fitness cost of drug resistance, the degree of mixing between parasite sub-populations selected through treatment or not, and the impact of parasite life-history, genetics and environment on the population dynamics of resistance, vary widely between systems. The success of attempts to generate refugia to limit anthelmintic drug resistance is therefore likely to be highly dependent on the system in hand. Additional research is needed on the concept of refugia and the underlying principles for its application across systems, as well as empirical studies within systems that prove and optimise its usefulness.

    International journal for parasitology. Drugs and drug resistance 2019;10;51-57

  • DNA methylation profiling allows for characterization of atrial and ventricular cardiac tissues and hiPSC-CMs.

    Hoff K, Lemme M, Kahlert AK, Runde K, Audain E, Schuster D, Scheewe J, Attmann T, Pickardt T, Caliebe A, Siebert R, Kramer HH, Milting H, Hansen A, Ammerpohl O and Hitz MP

    Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.

    Background: Cardiac disease modelling using human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CM) requires thorough insight into cardiac cell type differentiation processes. However, current methods to discriminate different cardiac cell types are mostly time-consuming, are costly and often provide imprecise phenotypic evaluation. DNA methylation plays a critical role during early heart development and cardiac cellular specification. We therefore investigated the DNA methylation pattern in different cardiac tissues to identify CpG loci for further cardiac cell type characterization.

    Results: An array-based genome-wide DNA methylation analysis using Illumina Infinium HumanMethylation450 BeadChips led to the identification of 168 differentially methylated CpG loci in atrial and ventricular human heart tissue samples (n = 49) from different patients with congenital heart defects (CHD). Systematic evaluation of atrial-ventricular DNA methylation pattern in cardiac tissues in an independent sample cohort of non-failing donor hearts and cardiac patients using bisulfite pyrosequencing helped us to define a subset of 16 differentially methylated CpG loci enabling precise characterization of human atrial and ventricular cardiac tissue samples. This defined set of reproducible cardiac tissue-specific DNA methylation sites allowed us to consistently detect the cellular identity of hiPSC-CM subtypes.

    Conclusion: Testing DNA methylation of only a small set of defined CpG sites thus makes it possible to distinguish atrial and ventricular cardiac tissues and cardiac atrial and ventricular subtypes of hiPSC-CMs. This method represents a rapid and reliable system for phenotypic characterization of in vitro-generated cardiomyocytes and opens new opportunities for cardiovascular research and patient-specific therapy.

    Funded by: AFib-TrainNet: 675351; Deutsche Forschungsgemeinschaft: HA 3423/5-1; Deutsches Zentrum für Herz-Kreislaufforschung: 81Z2700202

    Clinical epigenetics 2019;11;1;89

  • Intratumoral Genetic and Functional Heterogeneity in Pediatric Glioblastoma.

    Hoffman M, Gillmor AH, Kunz DJ, Johnston MJ, Nikolic A, Narta K, Zarrei M, King J, Ellestad K, Dang NH, Cavalli FMG, Kushida MM, Coutinho FJ, Zhu Y, Luu B, Ma Y, Mungall AJ, Moore R, Marra MA, Taylor MD, Pugh TJ, Dirks PB, Strother D, Lafay-Cousin L, Resnick AC, Scherer S, Senger DL, Simons BD, Chan JA, Morrissy AS and Gallo M

    Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada.

    Pediatric glioblastoma (pGBM) is a lethal cancer with no effective therapies. To understand the mechanisms of tumor evolution in this cancer, we performed whole-genome sequencing with linked reads on longitudinally resected pGBM samples. Our analyses showed that all diagnostic and recurrent samples were collections of genetically diverse subclones. Clonal composition rapidly evolved at recurrence, with less than 8% of nonsynonymous single-nucleotide variants being shared in diagnostic-recurrent pairs. To track the origins of the mutational events observed in pGBM, we generated whole-genome datasets for two patients and their parents. These trios showed that genetic variants could be (i) somatic, (ii) inherited from a healthy parent, or (iii) de novo in the germlines of pGBM patients. Analysis of variant allele frequencies supported a model of tumor growth involving slow-cycling cancer stem cells that give rise to fast-proliferating progenitor-like cells and to nondividing cells. Interestingly, radiation and antimitotic chemotherapeutics did not increase overall tumor burden upon recurrence. These findings support an important role for slow-cycling stem cell populations in contributing to recurrences, because slow-cycling cell populations are expected to be less prone to genotoxic stress induced by these treatments and therefore would accumulate few mutations. Our results highlight the need for new targeted treatments that account for the complex functional hierarchies and genomic heterogeneity of pGBM. SIGNIFICANCE: This work challenges several assumptions regarding the genetic organization of pediatric GBM and highlights mutagenic programs that start during early prenatal development.

    Funded by: CIHR: ICT-156651, PJT-156278; Cancer Research UK: C6946/A14492; Medical Research Council: MC_PC_12009; Wellcome Trust: 092096, 098357, 203828/Z/16/A, 203828/Z/16/Z

    Cancer research 2019;79;9;2111-2123

  • mzTab-M: A Data Standard for Sharing Quantitative Results in Mass Spectrometry Metabolomics.

    Hoffmann N, Rein J, Sachsenberg T, Hartler J, Haug K, Mayer G, Alka O, Dayalan S, Pearce JTM, Rocca-Serra P, Qi D, Eisenacher M, Perez-Riverol Y, Vizcaíno JA, Salek RM, Neumann S and Jones AR

    Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany.

    Mass spectrometry (MS) is one of the primary techniques used for large-scale analysis of small molecules in metabolomics studies. To date, there has been little data format standardization in this field, as different software packages export results in different formats represented in XML or plain text, making data sharing, database deposition, and reanalysis highly challenging. Working within the consortia of the Metabolomics Standards Initiative, Proteomics Standards Initiative, and the Metabolomics Society, we have created mzTab-M to act as a common output format from analytical approaches using MS on small molecules. The format has been developed over several years, with input from a wide range of stakeholders. mzTab-M is a simple tab-separated text format, but importantly, the structure is highly standardized through the design of a detailed specification document, tightly coupled to validation software, and a mandatory controlled vocabulary of terms to populate it. The format is able to represent final quantification values from analyses, as well as the evidence trail in terms of features measured directly from MS (e.g., LC-MS, GC-MS, DIMS, etc.) and different types of approaches used to identify molecules. mzTab-M allows for ambiguity in the identification of molecules to be communicated clearly to readers of the files (both people and software). There are several implementations of the format available, and we anticipate widespread adoption in the field.

    Funded by: NIGMS NIH HHS: R24 GM127667

    Analytical chemistry 2019;91;5;3302-3310

  • BACTOME-a reference database to explore the sequence- and gene expression-variation landscape of Pseudomonas aeruginosa clinical isolates.

    Hornischer K, Khaledi A, Pohl S, Schniederjans M, Pezoldt L, Casilag F, Muthukumarasamy U, Bruchmann S, Thöming J, Kordes A and Häussler S

    Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany.

    Extensive use of next-generation sequencing (NGS) for pathogen profiling has the potential to transform our understanding of how genomic plasticity contributes to phenotypic versatility. However, the storage of large amounts of NGS data and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We introduce BACTOME as a database system that links aligned DNA- and RNA-sequencing reads of clinical Pseudomonas aeruginosa isolates with clinically relevant pathogen phenotypes. The database allows data extraction for any single isolate, gene or phenotype as well as data filtering and phenotypic grouping for specific research questions. With the integration of statistical tools we illustrate the usefulness of a relational database structure for the identification of phenotype-genotype correlations as an essential part of the discovery pipeline in genomic research. Furthermore, the database provides a compilation of DNA sequences and gene expression values of a plethora of clinical isolates to give a consensus DNA sequence and consensus gene expression signature. Deviations from the consensus thereby describe the genomic landscape and the transcriptional plasticity of the species P. aeruginosa. The database is available at

    Nucleic acids research 2019;47;D1;D716-D720

  • Human Herpesvirus Sequencing in the Genomic Era: The Growing Ranks of the Herpetic Legion.

    Houldcroft CJ

    Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambs CB2 0QQ UK.

    The nine human herpesviruses are some of the most ubiquitous pathogens worldwide, causing life-long latent infection in a variety of different tissues. Human herpesviruses range from mild childhood infections to known tumour viruses and 'trolls of transplantation'. Epstein-Barr virus was the first human herpesvirus to have its whole genome sequenced; GenBank now includes thousands of herpesvirus genomes. This review will cover some of the recent advances in our understanding of herpesvirus diversity and disease that have come about as a result of new sequencing technologies, such as target enrichment and long-read sequencing. It will also look at the problem of resolving mixed-genotype infections, whether with short or long-read sequencing methods; and conclude with some thoughts on the future of the field as herpesvirus population genomics becomes a reality.

    Funded by: Wellcome Trust: 204870/Z/16/Z

    Pathogens (Basel, Switzerland) 2019;8;4

  • Human biology and ancient DNA: exploring disease, domestication and movement.

    Houldcroft CJ, Rifkin RF and Underdown SJ

    Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK.

    Annals of human biology 2019;46;2;95-98

  • The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle.

    Howick VM, Russell AJC, Andrews T, Heaton H, Reid AJ, Natarajan K, Butungi H, Metcalf T, Verzier LH, Rayner JC, Berriman M, Herren JK, Billker O, Hemberg M, Talman AM and Lawniczak MKN

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.

    Malaria parasites adopt a remarkable variety of morphological life stages as they transition through multiple mammalian host and mosquito vector environments. We profiled the single-cell transcriptomes of thousands of individual parasites, deriving the first high-resolution transcriptional atlas of the entire Plasmodium berghei life cycle. We then used our atlas to precisely define developmental stages of single cells from three different human malaria parasite species, including parasites isolated directly from infected individuals. The Malaria Cell Atlas provides both a comprehensive view of gene usage in a eukaryotic parasite and an open-access reference dataset for the study of malaria parasites.

    Funded by: Medical Research Council: G1100339; Wellcome Trust: 206194

    Science (New York, N.Y.) 2019;365;6455

  • The human body at cellular resolution: the NIH Human Biomolecular Atlas Program.

    HuBMAP Consortium

    Transformative technologies are enabling the construction of three-dimensional maps of tissues with unprecedented spatial and molecular resolution. Over the next seven years, the NIH Common Fund Human Biomolecular Atlas Program (HuBMAP) intends to develop a widely accessible framework for comprehensively mapping the human body at single-cell resolution by supporting technology development, data acquisition, and detailed spatial mapping. HuBMAP will integrate its efforts with other funding agencies, programs, consortia, and the biomedical research community at large towards the shared vision of a comprehensive, accessible three-dimensional molecular and cellular atlas of the human body, in health and under various disease conditions.

    Funded by: NIH HHS: OT2 OD026675

    Nature 2019;574;7777;187-192

  • A proteomic time course through the differentiation of human induced pluripotent stem cells into hepatocyte-like cells.

    Hurrell T, Segeritz CP, Vallier L, Lilley KS and Cromarty AD

    Department of Pharmacology, Faculty of Health Sciences, School of Medicine, University of Pretoria, Private Bag X323, Arcadia, 0007, South Africa.

    Numerous in vitro models endeavour to mimic the characteristics of primary human hepatocytes for applications in regenerative medicine and pharmaceutical science. Mature hepatocyte-like cells (HLCs) derived from human induced pluripotent stem cells (hiPSCs) are one such in vitro model. Due to insufficiencies in transcriptome to proteome correlation, characterising the proteome of HLCs is essential to provide a suitable framework for their continual optimization. Here we interrogated the proteome during stepwise differentiation of hiPSCs into HLCs over 40 days. Whole cell protein lysates were collected and analysed using stable isotope labelled mass spectrometry-based proteomics. Quantitative proteomics identified over 6,000 proteins in duplicate multiplexed labelling experiments across two different time course series. Inductive cues in differentiation promoted sequential acquisition of hepatocyte specific markers. Analysis of proteins classically assigned as hepatic markers demonstrated trends towards maximum relative abundance between differentiation day 30 and 32. Characterisation of abundant proteins in whole cells provided evidence of the time dependent transition towards proteins corresponding with the functional repertoire of the liver. These data highlight how far the proteome of undifferentiated precursors has progressed to acquire a hepatic phenotype and construct a platform for optimisation and improved maturation of HLC differentiation.

    Funded by: Medical Research Council: MC_PC_12009; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/N001540/1; National Research Foundation (NRF): 87880

    Scientific reports 2019;9;1;3270

  • Adaptive Properties of the Genetically Encoded Amino Acid Alphabet Are Inherited from Its Subsets.

    Ilardo M, Bose R, Meringer M, Rasulev B, Grefenstette N, Stephenson J, Freeland S, Gillams RJ, Butch CJ and Cleaves HJ

    University of Utah Hematology, UC Berkeley Integrative Biology, George and Dolores Eccles Institute of Human Genetics, 15 N 2030 E, Room: 3240, Salt Lake City, UT, 84112, USA.

    Life uses a common set of 20 coded amino acids (CAAs) to construct proteins. This set was likely canonicalized during early evolution; before this, smaller amino acid sets were gradually expanded as new synthetic, proofreading and coding mechanisms became biologically available. Many possible subsets of the modern CAAs or other presently uncoded amino acids could have comprised the earlier sets. We explore the hypothesis that the CAAs were selectively fixed due to their unique adaptive chemical properties, which facilitate folding, catalysis, and solubility of proteins, and gave adaptive value to organisms able to encode them. Specifically, we studied in silico hypothetical CAA sets of 3-19 amino acids comprised of 1913 structurally diverse α-amino acids, exploring the adaptive value of their combined physicochemical properties relative to those of the modern CAA set. We find that even hypothetical sets containing modern CAA members are especially adaptive; it is difficult to find sets even among a large choice of alternatives that cover the chemical property space more amply. These results suggest that each time a CAA was discovered and embedded during evolution, it provided an adaptive value unusual among many alternatives, and each selective step may have helped bootstrap the developing set to include still more CAAs.

    Scientific reports 2019;9;1;12468

  • Whole genome sequencing of experimental hybrids supports meiosis-like sexual recombination in Leishmania.

    Inbar E, Shaik J, Iantorno SA, Romano A, Nzelu CO, Owens K, Sanders MJ, Dobson D, Cotton JA, Grigg ME, Beverley SM and Sacks D

    Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America.

    Hybrid genotypes have been repeatedly described among natural isolates of Leishmania, and the recovery of experimental hybrids from sand flies co-infected with different strains or species of Leishmania has formally demonstrated that members of the genus possess the machinery for genetic exchange. As neither gamete stages nor cell fusion events have been directly observed during parasite development in the vector, we have relied on a classical genetic analysis to determine if Leishmania has a true sexual cycle. Here, we used whole genome sequencing to follow the chromosomal inheritance patterns of experimental hybrids generated within and between different strains of L. major and L. infantum. We also generated and sequenced the first experimental hybrids in L. tropica. We found that in each case the parental somy and allele contributions matched the inheritance patterns expected under meiosis 97-99% of the time. The hybrids were equivalent to F1 progeny, heterozygous throughout most of the genome for the markers that were homozygous and different between the parents. Rare, non-Mendelian patterns of chromosomal inheritance were observed, including a gain or loss of somy, and loss of heterozygosity, that likely arose during meiosis or during mitotic divisions of the progeny clones in the fly or culture. While the interspecies hybrids appeared to be sterile, the intraspecies hybrids were able to produce backcross and outcross progeny. Analysis of 5 backcross and outcross progeny clones generated from an L. major F1 hybrid, as well as 17 progeny clones generated from backcrosses involving a natural hybrid of L. tropica, revealed genome wide patterns of recombination, demonstrating that classical crossing over occurs at meiosis, and allowed us to construct the first physical and genetic maps in Leishmania. Altogether, the findings provide strong evidence for meiosis-like sexual recombination in Leishmania, presenting clear opportunities for forward genetic analysis and positional cloning of important genes.

    Funded by: NIAID NIH HHS: R01 AI029646, R01 AI031078; Wellcome Trust: 206194

    PLoS genetics 2019;15;5;e1008042

  • Mouse screen reveals multiple new genes underlying mouse and human hearing loss.

    Ingham NJ, Pearson SA, Vancollie VE, Rook V, Lewis MA, Chen J, Buniello A, Martelletti E, Preite L, Lam CC, Weiss FD, Powis Z, Suwannarat P, Lelliott CJ, Dawson SJ, White JK and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Adult-onset hearing loss is very common, but we know little about the underlying molecular pathogenesis, impeding the development of therapies. We took a genetic approach to identify new molecules involved in hearing loss by screening a large cohort of newly generated mouse mutants using a sensitive electrophysiological test, the auditory brainstem response (ABR). We review here the findings from this screen. Thirty-eight unexpected genes associated with raised thresholds were detected from our unbiased sample of 1,211 genes tested, suggesting extreme genetic heterogeneity. A wide range of auditory pathophysiologies was found, and some mutant lines showed normal development followed by deterioration of responses, revealing new molecular pathways involved in progressive hearing loss. Several of the genes were associated with the range of hearing thresholds in the human population and one, SPNS2, was involved in childhood deafness. The new pathways required for maintenance of hearing discovered by this screen present new therapeutic opportunities.

    PLoS biology 2019;17;4;e3000194

  • Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility.

    International Multiple Sclerosis Genetics Consortium

    We analyzed genetic data of 47,429 multiple sclerosis (MS) and 68,374 control subjects and established a reference map of the genetic architecture of MS that includes 200 autosomal susceptibility variants outside the major histocompatibility complex (MHC), one chromosome X variant, and 32 variants within the extended MHC. We used an ensemble of methods to prioritize 551 putative susceptibility genes that implicate multiple innate and adaptive pathways distributed across the cellular components of the immune system. Using expression profiles from purified human microglia, we observed enrichment for MS genes in these brain-resident immune cells, suggesting that these may have a role in targeting an autoimmune process to the central nervous system, although MS is most likely initially triggered by perturbation of peripheral immune responses.

    Funded by: Medical Research Council; NIA NIH HHS: R01 AG036836; NIAID NIH HHS: R01 AI059829; NIGMS NIH HHS: RC2 GM093080; NINDS NIH HHS: R01 NS026799, R01 NS049477, R01 NS088155; Wellcome Trust

    Science (New York, N.Y.) 2019;365;6460

  • GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals.

    Iotchkova V, Ritchie GRS, Geihs M, Morganella S, Min JL, Walter K, Timpson NJ, UK10K Consortium, Dunham I, Birney E and Soranzo N

    Human Genetics, Wellcome Sanger Institute, Hinxton, UK.

    Loci discovered by genome-wide association studies predominantly map outside protein-coding genes. The interpretation of the functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking by which to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages genome-wide association studies' findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding not offered by current methods. We further assess enrichment of genome-wide association studies for 19 traits within Encyclopedia of DNA Elements- and Roadmap-derived regulatory regions. We characterize unique enrichment patterns for traits and annotations driving novel biological insights. The method is implemented in standalone software and an R package, to facilitate its application by the research community.

    Funded by: Medical Research Council: MC_UU_12013/2, MC_UU_12013/3

    Nature genetics 2019;51;2;343-353

  • Programmed genome editing of the omega-1 ribonuclease of the blood fluke, Schistosoma mansoni.

    Ittiprasert W, Mann VH, Karinshak SE, Coghlan A, Rinaldi G, Sankaranarayanan G, Chaidee A, Tanno T, Kumkhaek C, Prangtaworn P, Mentink-Kane MM, Cochran CJ, Driguez P, Holroyd N, Tracey A, Rodpai R, Everts B, Hokke CH, Hoffmann KF, Berriman M and Brindley PJ

    Department of Microbiology, Immunology and Tropical Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, United States.

    CRISPR/Cas9-based genome editing has yet to be reported in species of the Platyhelminthes. We tested this approach by targeting omega-1 (ω1) of Schistosoma mansoni as proof of principle. This secreted ribonuclease is crucial for Th2 polarization and granuloma formation. Schistosome eggs were exposed to Cas9 complexed with guide RNA complementary to ω1 by electroporation or by transduction with lentiviral particles. Some eggs were also transfected with a single stranded donor template. Sequences of amplicons from gene-edited parasites exhibited Cas9-catalyzed mutations including homology directed repaired alleles, and other analyses revealed depletion of ω1 transcripts and the ribonuclease. Gene-edited eggs failed to polarize Th2 cytokine responses in macrophage/T-cell co-cultures, while the volume of pulmonary granulomas surrounding ω1-mutated eggs following tail-vein injection into mice was vastly reduced. Knock-out of ω1 and the diminished levels of these cytokines following exposure showcase the novel application of programmed gene editing for functional genomics in schistosomes.

    Funded by: NIAID NIH HHS: HHSN272201000005C, HHSN272201000005I; National Institute of Allergy and Infectious Diseases: R21AI109532; Royal Golden Jubilee Ph.D Program, Thailand: PHD/0047/2556; Thailand Research Fund: PHD/0011/2555, PHD/0047/2556, PHD/00531/2556; Wellcome: 107475/Z/15/Z, WT 098051

    eLife 2019;8

  • Epithelial NOTCH Signaling Rewires the Tumor Microenvironment of Colorectal Cancer to Drive Poor-Prognosis Subtypes and Metastasis.

    Jackstadt R, van Hooff SR, Leach JD, Cortes-Lavaud X, Lohuis JO, Ridgway RA, Wouters VM, Roper J, Kendall TJ, Roxburgh CS, Horgan PG, Nixon C, Nourse C, Gunzer M, Clark W, Hedley A, Yilmaz OH, Rashid M, Bailey P, Biankin AV, Campbell AD, Adams DJ, Barry ST, Steele CW, Medema JP and Sansom OJ

    Cancer Research UK Beatson Institute, Glasgow, UK.

    The metastatic process of colorectal cancer (CRC) is not fully understood and effective therapies are lacking. We show that activation of NOTCH1 signaling in the murine intestinal epithelium leads to highly penetrant metastasis (100% metastasis; with >80% liver metastases) in Kras<sup>G12D</sup>-driven serrated cancer. Transcriptional profiling reveals that epithelial NOTCH1 signaling creates a tumor microenvironment (TME) reminiscent of poorly prognostic human CRC subtypes (CMS4 and CRIS-B), and drives metastasis through transforming growth factor (TGF) β-dependent neutrophil recruitment. Importantly, inhibition of this recruitment with clinically relevant therapeutic agents blocks metastasis. We propose that NOTCH1 signaling is key to CRC progression and should be exploited clinically.

    Funded by: Cancer Research UK: 14356

    Cancer cell 2019;36;3;319-336.e7

  • Novel Methicillin-Resistant Staphylococcus aureus CC8 Clone Identified in a Hospital Setting in Armenia.

    Jamrozy D, Misra R, Xu Z, Ter-Stepanyan MM, Kocharyan KS, Cave R, Hambardzumyan AD and Mkrtchyan HV

    Wellcome Sanger Institute, Saffron Walden, United Kingdom.

    Whole-genome sequencing (WGS) of methicillin-resistant Staphylococcus aureus (MRSA) has been sparse in low- and middle-income countries; therefore, its population structure is unknown for many regions. We conducted a pilot surveillance of MRSA in the maternity ward of a teaching hospital in Armenia, to characterize the genotypes of circulating MRSA clones. In total, 10 MRSA isolates from a hospital environment (n = 4) and patients (n = 6) were recovered between March and May 2015 and April and May 2016, respectively. WGS analysis showed that the isolates belonged to two clonal complexes (CCs): CC8 (n = 8) and CC30 (n = 2). MRSA CC30 isolates carried staphylococcal cassette chromosome mec (SCCmec) type IVa, whereas MRSA CC8 revealed a type-V<sub>T</sub>-related SCCmec, which contained a CRISPR/Cas array and showed a high similarity to SCCmec found in coagulase-negative staphylococci. All but one of the MRSA CC8 isolates carried a plasmid identical to pSK67 and four also carried a pathogenicity island similar to SaPI5. Phylogenetic analysis showed that the MRSA CC8 isolates formed a monophyletic cluster, which emerged around 1995 and was distinct from representatives of globally-distributed MRSA CC8 lineages. WGS characterization of MRSA in countries with no previous S. aureus genomic surveillance can therefore reveal an unrecognized diversity of MRSA lineages.

    Frontiers in microbiology 2019;10;1592

  • An unusual Burkholderia gladioli double chain-initiating nonribosomal peptide synthetase assembles 'fungal' icosalide antibiotics.

    Jenner M, Jian X, Dashti Y, Masschelein J, Hobson C, Roberts DM, Jones C, Harris S, Parkhill J, Raja HA, Oberlies NH, Pearce CJ, Mahenthiralingam E and Challis GL

    Department of Chemistry, University of Warwick, Coventry CV4 7AL, UK.

    Burkholderia is a multi-talented genus of Gram-negative bacteria, which in recent years has become increasingly recognised as a promising source of bioactive natural products. Metabolite profiling of Burkholderia gladioli BCC0238 showed that it produces the asymmetric lipopeptidiolide antibiotic icosalide A1, originally isolated from a fungus. Comparative bioinformatics analysis of several genome-sequenced B. gladioli isolates identified a gene encoding a nonribosomal peptide synthetase (NRPS) with an unusual architecture that was predicted to be responsible for icosalide biosynthesis. Inactivation of this gene in B. gladioli BCC0238 abolished icosalide production. PCR analysis and sequencing of total DNA from the original fungal icosalide A1 producer revealed it has a B. gladioli strain associated with it that harbours an NRPS with an identical architecture to that responsible for icosalide A1 assembly in B. gladioli BCC0238. Sequence analysis of the icosalide NRPS indicated that it contains two chain-initiating condensation (C<sub>I</sub>) domains. One of these is appended to the N-terminus of module 1 - a common architecture for NRPSs involved in lipopeptide assembly. The other is embedded in module 3, immediately downstream of a putative chain-elongating condensation domain. Analysis of the reactions catalysed by a tridomain construct from module 3 of the NRPS using intact protein mass spectrometry showed that the embedded C<sub>I</sub> domain initiates assembly of a second lipopeptide chain, providing key insights into the mechanism for asymmetric diolide assembly.

    Funded by: Medical Research Council: MR/N501839/1

    Chemical science 2019;10;21;5489-5494

  • Porcine antiviral activity is increased by CRISPRa-SAM system.

    Jiang J, Sun Y, Xiao R, Wai K, Ahmad MJ, Khan FA, Zhou H, Li Z, Zhang Y, Zhou A and Zhang S

    Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction Ministry of Education, Huazhong Agricultural University, Wuhan 430070, Hubei, People's Republic of China.

    The Clustered Regularly Interspaced Short Palindromic Repeat activation-synergistic activation mediator (CRISPRa-SAM) system has been used efficiently to up-regulate targeted genes in humans and mice. However, it is not known whether the CRISPRa-SAM system can be used against porcine disease, because its two important transcriptional activation domains (P65 and heat shock transcription factor 1 (HSF1)) are from mouse and human, respectively. Pigs are one of the most important meat sources, and porcine viral infectious diseases cause massive economic losses to the swine industry and threaten public health. We aimed to investigate whether the CRISPRa-SAM system could increase porcine antiviral activity by mediating two pig-specific target genes (Mx2 and β1,4 N-acetylgalactosaminyltransferase (B4galnt2)). First, we constructed PK-15 and IPEC-J2 cell lines stably expressing nuclease-deficient Cas9 (dCas9)-vp64 and MS2-P65-HSF1. Next, in these two cell models, we activated Mx2 and B4galnt2 expression through the CRISPRa-SAM system. Antiviral activity against PRV or H9N2 was improved in PK-15 cells where Mx2 or B4galnt2 was activated. Altogether, our results demonstrated the potential of the CRISPRa-SAM system as a powerful tool for activating pig genes and improving porcine antiviral activity.

    Bioscience reports 2019;39;8

  • Genomics: the power, potential and pitfalls of the new technologies and how they are transforming healthcare.

    Josephs KS, Berner A, George A, Scott RH, Health Education England's Genomic Education Programme, Firth HV and Tatton-Brown K

    South West Thames Regional Genetic Services, London, UK and St George's, University of London, London, UK.

    Powerful new genomic technologies are transforming healthcare. The faster, cheaper generation of genomic data is driving the integration of genomics into all healthcare specialties. Within the next decade, healthcare professionals will be using genomic data to diagnose and manage their patients. However, despite these exciting advances, few clinicians are aware of or prepared for this genomics-based future. Through five patient-focused scenarios with accompanying interviews, this article showcases new genomic technologies while highlighting the inherent challenges associated with complex genomic data.

    Clinical medicine (London, England) 2019;19;4;269-272

  • Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution.

    Justice AE, Karaderi T, Highland HM, Young KL, Graff M, Lu Y, Turcot V, Auer PL, Fine RS, Guo X, Schurmann C, Lempradl A, Marouli E, Mahajan A, Winkler TW, Locke AE, Medina-Gomez C, Esko T, Vedantam S, Giri A, Lo KS, Alfred T, Mudgal P, Ng MCY, Heard-Costa NL, Feitosa MF, Manning AK, Willems SM, Sivapalaratnam S, Abecasis G, Alam DS, Allison M, Amouyel P, Arzumanyan Z, Balkau B, Bastarache L, Bergmann S, Bielak LF, Blüher M, Boehnke M, Boeing H, Boerwinkle E, Böger CA, Bork-Jensen J, Bottinger EP, Bowden DW, Brandslund I, Broer L, Burt AA, Butterworth AS, Caulfield MJ, Cesana G, Chambers JC, Chasman DI, Chen YI, Chowdhury R, Christensen C, Chu AY, Collins FS, Cook JP, Cox AJ, Crosslin DS, Danesh J, de Bakker PIW, Denus S, Mutsert R, Dedoussis G, Demerath EW, Dennis JG, Denny JC, Di Angelantonio E, Dörr M, Drenos F, Dubé MP, Dunning AM, Easton DF, Elliott P, Evangelou E, Farmaki AE, Feng S, Ferrannini E, Ferrieres J, Florez JC, Fornage M, Fox CS, Franks PW, Friedrich N, Gan W, Gandin I, Gasparini P, Giedraitis V, Girotto G, Gorski M, Grallert H, Grarup N, Grove ML, Gustafsson S, Haessler J, Hansen T, Hattersley AT, Hayward C, Heid IM, Holmen OL, Hovingh GK, Howson JMM, Hu Y, Hung YJ, Hveem K, Ikram MA, Ingelsson E, Jackson AU, Jarvik GP, Jia Y, Jørgensen T, Jousilahti P, Justesen JM, Kahali B, Karaleftheri M, Kardia SLR, Karpe F, Kee F, Kitajima H, Komulainen P, Kooner JS, Kovacs P, Krämer BK, Kuulasmaa K, Kuusisto J, Laakso M, Lakka TA, Lamparter D, Lange LA, Langenberg C, Larson EB, Lee NR, Lee WJ, Lehtimäki T, Lewis CE, Li H, Li J, Li-Gao R, Lin LA, Lin X, Lind L, Lindström J, Linneberg A, Liu CT, Liu DJ, Luan J, Lyytikäinen LP, MacGregor S, Mägi R, Männistö S, Marenne G, Marten J, Masca NGD, McCarthy MI, Meidtner K, Mihailov E, Moilanen L, Moitry M, Mook-Kanamori DO, Morgan A, Morris AP, Müller-Nurasyid M, Munroe PB, Narisu N, Nelson CP, Neville M, Ntalla I, O'Connell JR, Owen KR, Pedersen O, Peloso GM, Pennell CE, Perola M, Perry JA, Perry JRB, Pers TH, Ewing A, Polasek O, Raitakari OT, Rasheed A, Raulerson CK, Rauramaa R, Reilly DF, Reiner AP, Ridker PM, Rivas MA, Robertson NR, Robino A, Rudan I, Ruth KS, Saleheen D, Salomaa V, Samani NJ, Schreiner PJ, Schulze MB, Scott RA, Segura-Lepe M, Sim X, Slater AJ, Small KS, Smith BH, Smith JA, Southam L, Spector TD, Speliotes EK, Stefansson K, Steinthorsdottir V, Stirrups KE, Strauch K, Stringham HM, Stumvoll M, Sun L, Surendran P, Swart KMA, Tardif JC, Taylor KD, Teumer A, Thompson DJ, Thorleifsson G, Thorsteinsdottir U, Thuesen BH, Tönjes A, Torres M, Tsafantakis E, Tuomilehto J, Uitterlinden AG, Uusitupa M, van Duijn CM, Vanhala M, Varma R, Vermeulen SH, Vestergaard H, Vitart V, Vogt TF, Vuckovic D, Wagenknecht LE, Walker M, Wallentin L, Wang F, Wang CA, Wang S, Wareham NJ, Warren HR, Waterworth DM, Wessel J, White HD, Willer CJ, Wilson JG, Wood AR, Wu Y, Yaghootkar H, Yao J, Yerges-Armstrong LM, Young R, Zeggini E, Zhan X, Zhang W, Zhao JH, Zhao W, Zheng H, Zhou W, Zillikens MC, Rivadeneira F, Borecki IB, Pospisilik JA, Deloukas P, Frayling TM, Lettre G, Mohlke KL, Rotter JI, Kutalik Z, Hirschhorn JN, Cupples LA, Loos RJF, North KE, Lindgren CM, CHD Exome+ Consortium, Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, EPIC-CVD Consortium, ExomeBP Consortium, Global Lipids Genetic Consortium, GoT2D Genes Consortium, InterAct, ReproGen Consortium, T2D-Genes Consortium and MAGIC Investigators

    Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA.

    Body-fat distribution is a risk factor for adverse cardiovascular health consequences. We analyzed the association of body-fat distribution, assessed by waist-to-hip ratio adjusted for body mass index, with 228,985 predicted coding and splice site variants available on exome arrays in up to 344,369 individuals from five major ancestries (discovery) and 132,177 European-ancestry individuals (validation). We identified 15 common (minor allele frequency, MAF ≥5%) and nine low-frequency or rare (MAF <5%) coding novel variants. Pathway/gene set enrichment analyses identified lipid particle, adiponectin, abnormal white adipose tissue physiology and bone development and morphology as important contributors to fat distribution, while cross-trait associations highlight cardiometabolic traits. In functional follow-up analyses, specifically in Drosophila RNAi-knockdowns, we observed a significant increase in the total body triglyceride levels for two genes (DNAH10 and PLXND1). We implicate novel genes in fat distribution, stressing the importance of interrogating low-frequency and protein-coding variants.

    Funded by: British Heart Foundation: FS/12/82/29736, RG/13/13/30194, RG/14/5/30893, RG/18/13/33946; Medical Research Council: G9521010, MC_EX_MR/M009203/1, MC_PC_14089, MC_UU_00007/10, MC_UU_12015/1, MR/L01341X/1, MR/L01632X/1, MR/M009203/1, MR/S003746/1; NCATS NIH HHS: KL2 TR001109; NHGRI NIH HHS: R01 HG008983, R56 HG010297, U01 HG007416, U01 HG007417; NHLBI NIH HHS: K99 HL130580, R00 HL130580, R21 HL121422, T32 HL007055; NICHD NIH HHS: R01 HD057194; NIDA NIH HHS: R21 DA040177; NIDDK NIH HHS: P30 DK020572, P30 DK056336, P30 DK079626, R01 DK062370, R01 DK072193, R01 DK075787, R01 DK089256, R01 DK093757, R01 DK106621, R01 DK107786, R01 DK107904, R01 DK110113, U01 DK062370; NIGMS NIH HHS: R01 GM126479

    Nature genetics 2019;51;3;452-469

  • The resistome and genomic reconnaissance in the age of malaria elimination.

    Kümpornsin K, Kochakarn T and Chookajorn T

    Parasites and Microbes Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

    Malaria is an infectious disease caused by parasitic protozoa in the Plasmodium genus. A complete understanding of the biology of these parasites is challenging in view of their need to switch between the vertebrate and insect hosts. The parasites are also capable of becoming highly motile and of remaining dormant for decades, depending on the stage of their life cycle. Malaria elimination efforts have been implemented in several endemic countries, but the parasites have proven to be resilient. One of the major obstacles for malaria elimination is the development of antimalarial drug resistance. Ineffective treatment regimens will fail to remove the circulating parasites and to prevent the local transmission of the disease. Genomic epidemiology of malaria parasites has become a powerful tool to track emerging drug-resistant parasite populations almost in real time. Population-scale genomic data are instrumental in tracking the hidden pockets of Plasmodium in nationwide elimination efforts. However, genomic surveillance data can be useful in determining the threat only when combined with a thorough understanding of the malarial resistome - the genetic repertoires responsible for causing and potentiating drug resistance evolution. Even though long-term selection has been a standard method for drug target identification in laboratories, its implementation in large-scale exploration of the druggable space in Plasmodium falciparum, along with genome-editing technologies, has enabled mapping of the genetic repertoires that drive drug resistance. This Review presents examples of practical use and describes the latest technology to show the power of real-time genomic epidemiology in achieving malaria elimination.

    Disease models & mechanisms 2019;12;12

  • PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations.

    Kamat MA, Blackshaw JA, Young R, Surendran P, Burgess S, Danesh J, Butterworth AS and Staley JR

    MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK.

    Summary: PhenoScanner is a curated database of publicly available results from large-scale genetic association studies in humans. This online tool facilitates 'phenome scans', where genetic variants are cross-referenced for association with many phenotypes of different types. Here we present a major update of PhenoScanner ('PhenoScanner V2'), including over 150 million genetic variants and more than 65 billion associations (compared to 350 million associations in PhenoScanner V1) with diseases and traits, gene expression, metabolite and protein levels, and epigenetic markers. The query options have been extended to include searches by genes, genomic regions and phenotypes, as well as for genetic variants. All variants are positionally annotated using the Variant Effect Predictor and the phenotypes are mapped to Experimental Factor Ontology terms. Linkage disequilibrium statistics from the 1000 Genomes project can be used to search for phenotype associations with proxy variants.

    Availability and implementation: PhenoScanner V2 is available at

    Funded by: Medical Research Council: G0800270, MC_UU_00002/7, MR/L003120/1, MR/S003746/1

    Bioinformatics (Oxford, England) 2019;35;22;4851-4853

  • Efficiently inferring the demographic history of many populations with allele count data.

    Kamm J, Terhorst J, Durbin R and Song YS

    Wellcome Sanger Institute, Hinxton, Cambridge, UK.

    The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed "basal Eurasian" admixture event in human history. We implement and release our method in a new open-source software package momi2.

    Funded by: NIGMS NIH HHS: R01 GM109454

    Journal of the American Statistical Association 2019;115;531;1472-1487
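
    As an illustration of the "histogram of allele counts" mentioned in the entry above, the following is a minimal sketch of an observed single-population SFS computed from a toy 0/1 haplotype matrix. The function name and the toy data are assumptions made for this example; it is not the momi2 API and does not compute the expected multipopulation SFS that the paper addresses.

```python
def sample_frequency_spectrum(sites):
    """Return sfs, where sfs[k] is the number of sites with k derived alleles.

    sites: list of sites, each a list of 0/1 alleles (one per sampled haplotype).
    This is a hypothetical helper for illustration only.
    """
    if not sites:
        return []
    n = len(sites[0])
    sfs = [0] * (n + 1)
    for site in sites:
        if len(site) != n:
            raise ValueError("all sites must cover the same number of haplotypes")
        # The derived-allele count at this site selects the histogram bin.
        sfs[sum(site)] += 1
    return sfs


# Toy data: 4 sites observed in 4 haplotypes (invented for this sketch).
toy_sites = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
toy_sfs = sample_frequency_spectrum(toy_sites)
print(toy_sfs)  # [0, 2, 1, 1, 0]
```

    Two sites carry one derived allele, one site carries two, and one carries three, which gives the histogram printed above.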

  • Dual diagnosis causing severe phenotype in a patient with Angelman syndrome.

    Kanani F, Mordekar S, Parker MJ, Balasubramanian M and DDD Study

    Sheffield Clinical Genetics Service.

    Clinical dysmorphology 2019;28;3;160-163

  • The Microbiome in Paediatric Crohn's Disease-A Longitudinal, Prospective, Single-Centre Study.

    Kansal S, Catto-Smith AG, Boniface K, Thomas S, Cameron DJ, Oliver M, Alex G, Kirkwood CD and Wagner J

    Department of Gastroenterology and Clinical Nutrition, The Royal Children's Hospital, Parkville, Victoria, Australia.

    Background and aims: The gut mucosa is the principal site where Crohn's disease [CD] inflammation occurs. Limited information is available about the gut mucosal microbiome during CD relapse and remission. The aim of our study was to characterize specific changes in the gut microbiome during relapse and remission in a large single-centre paediatric CD cohort.

    Methods: We analysed the microbiome of 345 biopsies from 204 patients, including 88 CD first diagnosis [CDFD] patients, 38 relapse [CDRL] patients, 12 remission [CDRM] patients, and 66 controls. Species identification was conducted using oligotyping in combination with ARB/SILVA taxonomic annotation.

    Results: We observed 45 bacteria to differ between CDFD samples and controls with statistical significance, with Fusobacterium being the most implicated species in CDFD patients. We also identified gender-specific differences in CD. Five species showed a strong association with CDRL patients and 10 species with CDRM patients. Three taxa showed a positive co-occurrence across the two groups. Hespellia porcina [closest taxonomic neighbour to Clostridium oroticum] was the most strongly associated with CDRL samples. Interestingly, Fusobacterium was not part of the CDRL-associated taxa group. Faecalibacterium prausnitzii was equally present in CDFD and control samples.

    Conclusion: This is the first study that has investigated the gut mucosal microbiome in a paediatric CD cohort with longitudinal sampling. Importantly, the microbiome of patients in CDRM did not return to a healthy control state. Neither did the microbiome of patients with CDRL return to the profile seen at CDFD.

    Journal of Crohn's & colitis 2019;13;8;1044-1054

  • Exome-wide assessment of the functional impact and pathogenicity of multinucleotide mutations.

    Kaplanis J, Akawi N, Gallone G, McRae JF, Prigmore E, Wright CF, Fitzpatrick DR, Firth HV, Barrett JC, Hurles ME and Deciphering Developmental Disorders study

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom.

    Approximately 2% of de novo single-nucleotide variants (SNVs) appear as part of clustered mutations that create multinucleotide variants (MNVs). MNVs are an important source of genomic variability as they are more likely to alter an encoded protein than a SNV, which has important implications in disease as well as evolution. Previous studies of MNVs have focused on their mutational origins and have not systematically evaluated their functional impact and contribution to disease. We identified 69,940 MNVs and 91 de novo MNVs in 6688 exome-sequenced parent-offspring trios from the Deciphering Developmental Disorders Study comprising families with severe developmental disorders. We replicated the previously described MNV mutational signatures associated with DNA polymerase zeta, an error-prone translesion polymerase, and the APOBEC family of DNA deaminases. We estimate the simultaneous MNV germline mutation rate to be 1.78 × 10<sup>-10</sup> mutations per base pair per generation. We found that most MNVs within a single codon create a missense change that could not have been created by a SNV. MNV-induced missense changes were, on average, more physicochemically divergent, were more depleted in highly constrained genes (pLI ≥ 0.9), and were under stronger purifying selection compared with SNV-induced missense changes. We found that de novo MNVs were significantly enriched in genes previously associated with developmental disorders in affected children. This shows that MNVs can be more damaging than SNVs even when both induce missense changes, and are an important variant type to consider in relation to human disease.

    Funded by: Department of Health; Wellcome Trust: WT098051

    Genome research 2019;29;7;1047-1056

  • Disentangling the genetics of lean mass.

    Karasik D, Zillikens MC, Hsu YH, Aghdassi A, Akesson K, Amin N, Barroso I, Bennett DA, Bertram L, Bochud M, Borecki IB, Broer L, Buchman AS, Byberg L, Campbell H, Campos-Obando N, Cauley JA, Cawthon PM, Chambers JC, Chen Z, Cho NH, Choi HJ, Chou WC, Cummings SR, de Groot LCPGM, De Jager PL, Demuth I, Diatchenko L, Econs MJ, Eiriksdottir G, Enneman AW, Eriksson J, Eriksson JG, Estrada K, Evans DS, Feitosa MF, Fu M, Gieger C, Grallert H, Gudnason V, Lenore LJ, Hayward C, Hofman A, Homuth G, Huffman KM, Husted LB, Illig T, Ingelsson E, Ittermann T, Jansson JO, Johnson T, Biffar R, Jordan JM, Jula A, Karlsson M, Khaw KT, Kilpeläinen TO, Klopp N, Kloth JSL, Koller DL, Kooner JS, Kraus WE, Kritchevsky S, Kutalik Z, Kuulasmaa T, Kuusisto J, Laakso M, Lahti J, Lang T, Langdahl BL, Lerch MM, Lewis JR, Lill C, Lind L, Lindgren C, Liu Y, Livshits G, Ljunggren Ö, Loos RJF, Lorentzon M, Luan J, Luben RN, Malkin I, McGuigan FE, Medina-Gomez C, Meitinger T, Melhus H, Mellström D, Michaëlsson K, Mitchell BD, Morris AP, Mosekilde L, Nethander M, Newman AB, O'Connell JR, Oostra BA, Orwoll ES, Palotie A, Peacock M, Perola M, Peters A, Prince RL, Psaty BM, Räikkönen K, Ralston SH, Ripatti S, Rivadeneira F, Robbins JA, Rotter JI, Rudan I, Salomaa V, Satterfield S, Schipf S, Shin CS, Smith AV, Smith SB, Soranzo N, Spector TD, Stancáková A, Stefansson K, Steinhagen-Thiessen E, Stolk L, Streeten EA, Styrkarsdottir U, Swart KMA, Thompson P, Thomson CA, Thorleifsson G, Thorsteinsdottir U, Tikkanen E, Tranah GJ, Uitterlinden AG, van Duijn CM, van Schoor NM, Vandenput L, Vollenweider P, Völzke H, Wactawski-Wende J, Walker M, J Wareham N, Waterworth D, Weedon MN, Wichmann HE, Widen E, Williams FMK, Wilson JF, Wright NC, Yerges-Armstrong LM, Yu L, Zhang W, Zhao JH, Zhou Y, Nielson CM, Harris TB, Demissie S, Kiel DP and Ohlsson C

    Hebrew SeniorLife Institute for Aging Research and Harvard Medical School, Boston, MA.

    Background: Lean body mass (LM) plays an important role in mobility and metabolic function. We previously identified five loci associated with LM adjusted for fat mass in kilograms. Such an adjustment may reduce the power to identify genetic signals having an association with both lean mass and fat mass.

    Objectives: To determine the impact of different fat mass adjustments on genetic architecture of LM and identify additional LM loci.

    Methods: We performed genome-wide association analyses for whole-body LM (20 cohorts of European ancestry with n = 38,292), measured using dual-energy X-ray absorptiometry or bioelectrical impedance analysis, adjusted for sex, age, age<sup>2</sup>, and height with or without fat mass adjustments (Model 1: no fat adjustment; Model 2: adjustment for fat mass as a percentage of body mass; Model 3: adjustment for fat mass in kilograms).

    Results: Seven single-nucleotide polymorphisms (SNPs) in separate loci, including one novel LM locus (TNRC6B), were successfully replicated in an additional 47,227 individuals from 29 cohorts. Based on the strengths of the associations in Model 1 vs Model 3, we divided the LM loci into those with an effect on both lean mass and fat mass in the same direction and refer to those as "sumo wrestler" loci (FTO and MC4R). In contrast, loci with an impact specifically on LM were termed "body builder" loci (VCAN and ADAMTSL3). Using existing available genome-wide association study databases, LM increasing alleles of SNPs in sumo wrestler loci were associated with an adverse metabolic profile, whereas LM increasing alleles of SNPs in "body builder" loci were associated with metabolic protection.

    Conclusions: We identified one novel LM locus (TNRC6B). Our results suggest that a genetically determined increase in lean mass might exert either harmful or protective effects on metabolic traits, depending on its relation to fat mass.

    Funded by: Cancer Research UK: 14136; Medical Research Council: G0401527, G1000143, MC_UU_00007/10, MC_UU_12015/1, MR/N003284/1; NCCDPHP CDC HHS: U01 DP006266; NIA NIH HHS: U24 AG051129; NIAMS NIH HHS: R01 AR041398, U01 AR066160; NIDDK NIH HHS: R01 DK107786, R01 DK110113

    The American journal of clinical nutrition 2019;109;2;276-287

  • CCL27/CCL28-CCR10 Chemokine Signaling Mediates Migration of Lymphatic Endothelial Cells.

    Karnezis T, Farnsworth RH, Harris NC, Williams SP, Caesar C, Byrne DJ, Herle P, Macheda ML, Shayan R, Zhang YF, Yazar S, Takouridis SJ, Gerard C, Fox SB, Achen MG and Stacker SA

    Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.

    Metastasis via the lymphatic vasculature is an important step in cancer progression. The formation of new lymphatic vessels (lymphangiogenesis), or remodeling of existing lymphatics, is thought to facilitate the entry and transport of tumor cells into lymphatic vessels and on to distant organs. The migration of lymphatic endothelial cells (LEC) toward guidance cues is critical for lymphangiogenesis. While chemokines are known to provide directional navigation for migrating immune cells, their role in mediating LEC migration during tumor-associated lymphangiogenesis is not well defined. Here, we undertook gene profiling studies to identify chemokine-chemokine receptor pairs that are involved in tumor lymphangiogenesis associated with lymph node metastasis. CCL27 and CCL28 were expressed in tumor cells with metastatic potential, while their cognate receptor, CCR10, was expressed by LECs and upregulated by the lymphangiogenic growth factor VEGFD and the proinflammatory cytokine TNFα. Migration assays demonstrated that LECs are attracted to both CCL27 and CCL28 in a CCR10-dependent manner, while abnormal lymphatic vessel patterning in CCR10-deficient mice confirmed the significant role of CCR10 in lymphatic patterning. <i>In vivo</i> analyses showed that LECs are recruited to a CCL27 or CCL28 source, while VEGFD was required in combination with these chemokines to enable formation of coherent lymphatic vessels. Moreover, tumor xenograft experiments demonstrated that even though CCL27 expression by tumors enhanced LEC recruitment, the ability to metastasize was dependent on the expression of VEGFD. These studies demonstrate that CCL27 and CCL28 signaling through CCR10 may cooperate with inflammatory mediators and VEGFD during tumor lymphangiogenesis. SIGNIFICANCE: The study shows that the remodeling of lymphatic vessels in cancer is influenced by CCL27 and CCL28 chemokines, which may provide a future target to modulate metastatic spread.

    Cancer research 2019;79;7;1558-1572

  • Nedd8 hydrolysis by UCH proteases in Plasmodium parasites.

    Karpiyevich M, Adjalley S, Mol M, Ascher DB, Mason B, van der Heden van Noort GJ, Laman H, Ovaa H, Lee MCS and Artavanis-Tsakonas K

    Department of Pathology, University of Cambridge, Cambridge, United Kingdom.

    Plasmodium parasites are the causative agents of malaria, a disease with wide public health repercussions. Increasing drug resistance and the absence of a vaccine make finding new chemotherapeutic strategies imperative. Components of the ubiquitin and ubiquitin-like pathways have garnered increased attention as novel targets given their necessity to parasite survival. Understanding how these pathways are regulated in Plasmodium and identifying differences to the host is paramount to selectively interfering with parasites. Here, we focus on Nedd8 modification in Plasmodium falciparum, given its central role to cell division and DNA repair, processes critical to Plasmodium parasites given their unusual cell cycle and requirement for refined repair mechanisms. By applying a functional chemical approach, we show that deNeddylation is controlled by a different set of enzymes in the parasite versus the human host. We elucidate the molecular determinants of the unusual dual ubiquitin/Nedd8 recognition by the essential PfUCH37 enzyme and, through parasite transgenics and drug assays, determine that only its ubiquitin activity is critical to parasite survival. Our experiments reveal interesting evolutionary differences in how neddylation is controlled in higher versus lower eukaryotes, and highlight the Nedd8 pathway as worthy of further exploration for therapeutic targeting in antimalarial drug design.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/R001642/1; Medical Research Council: MR/M026302/1; Wellcome Trust: 085054/Z/08/Z, 206194

    PLoS pathogens 2019;15;10;e1008086

  • htsget: a protocol for securely streaming genomic data.

    Kelleher J, Lin M, Albach CH, Birney E, Davies R, Gourtovaia M, Glazer D, Gonzalez CY, Jackson DK, Kemp A, Marshall J, Nowak A, Senf A, Tovar-Corona JM, Vikhorev A, Keane TM and GA4GH Streaming Task Team

    Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.

    Summary: Standardized interfaces for efficiently accessing high-throughput sequencing data are a fundamental requirement for large-scale genomic data sharing. We have developed htsget, a protocol for secure, efficient and reliable access to sequencing read and variation data. We demonstrate four independent client and server implementations, and the results of a comprehensive interoperability demonstration.

    Availability and implementation:

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 100956/Z/13/Z, 201535/Z/16/Z

    Bioinformatics (Oxford, England) 2019;35;1;119-121
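
    To make the streaming idea in the htsget entry above concrete, here is a minimal client-side sketch. It assumes the ticket layout of the GA4GH htsget specification: a JSON object with a top-level `htsget` key holding a `urls` list whose entries have a `url` and optional `headers`, where a `url` may be an inline `data:` URI. The helper names, the example URL and the stub fetcher are hypothetical, and no real server is contacted.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Decode the payload of a data: URI (base64 or percent-encoded)."""
    header, _, payload = uri.partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


def assemble_from_ticket(ticket, fetch):
    """Concatenate the data blocks listed in a ticket, in order.

    fetch(url, headers) -> bytes is supplied by the caller (assumption),
    so the transport and authentication stay outside this sketch.
    """
    chunks = []
    for block in ticket["htsget"]["urls"]:
        url = block["url"]
        if url.startswith("data:"):
            chunks.append(decode_data_uri(url))
        else:
            chunks.append(fetch(url, block.get("headers", {})))
    return b"".join(chunks)


# Invented ticket: one inline block followed by one remote block.
inline = base64.b64encode(b"HEAD").decode("ascii")
example_ticket = {
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "data:application/octet-stream;base64," + inline},
            {"url": "https://example.org/block1", "headers": {"Range": "bytes=0-3"}},
        ],
    }
}


def stub_fetch(url, headers):
    # Stand-in for an HTTP GET; returns fixed bytes for illustration.
    return b"BODY"


assembled = assemble_from_ticket(example_ticket, stub_fetch)
print(assembled)  # b'HEADBODY'
```

    Passing the fetcher in as a parameter keeps the sketch testable offline; a real client would replace `stub_fetch` with an HTTP request that forwards the per-block headers.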

  • KAT6A Syndrome: genotype-phenotype correlation in 76 patients with pathogenic KAT6A variants.

    Kennedy J, Goudie D, Blair E, Chandler K, Joss S, McKay V, Green A, Armstrong R, Lees M, Kamien B, Hopper B, Tan TY, Yap P, Stark Z, Okamoto N, Miyake N, Matsumoto N, Macnamara E, Murphy JL, McCormick E, Hakonarson H, Falk MJ, Li D, Blackburn P, Klee E, Babovic-Vuksanovic D, Schelley S, Hudgins L, Kant S, Isidor B, Cogne B, Bradbury K, Williams M, Patel C, Heussler H, Duff-Farrier C, Lakeman P, Scurr I, Kini U, Elting M, Reijnders M, Schuurs-Hoeijmakers J, Wafik M, Blomhoff A, Ruivenkamp CAL, Nibbeling E, Dingemans AJM, Douine ED, Nelson SF, DDD Study, Hempel M, Bierhals T, Lessel D, Johannsen J, Arboleda VA and Newbury-Ecob R

    Clinical Genetics, University Hospitals Bristol, Southwell St, Bristol, UK.

    Purpose: Pathogenic variants in KAT6A have recently been identified as a cause of syndromic developmental delay. Within 2 years, the number of patients identified with pathogenic KAT6A variants has rapidly expanded and the full extent and variability of the clinical phenotype has not been reported.

    Methods: We obtained data for patients with KAT6A pathogenic variants through three sources: treating clinicians, an online family survey distributed through social media, and a literature review.

    Results: We identified 52 unreported cases, bringing the total number of published cases to 76. Our results expand the genotypic spectrum of pathogenic variants to include missense and splicing mutations. We functionally validated a pathogenic splice-site variant and identified a likely hotspot location for de novo missense variants. The majority of clinical features in KAT6A syndrome have highly variable penetrance. For core features such as intellectual disability, speech delay, microcephaly, cardiac anomalies, and gastrointestinal complications, genotype-phenotype correlations show that late-truncating pathogenic variants (exons 16-17) are significantly more prevalent. We highlight novel associations, including an increased risk of gastrointestinal obstruction.

    Conclusion: Our data expand the genotypic and phenotypic spectrum for individuals with genetic pathogenic variants in KAT6A and we outline appropriate clinical management.

    Funded by: Department of Health; NICHD NIH HHS: U54 HD086984; NIH HHS: DP5 OD024579; Wellcome Trust: WT098051

    Genetics in medicine : official journal of the American College of Medical Genetics 2019;21;4;850-860

  • Lipoprotein signatures of cholesteryl ester transfer protein and HMG-CoA reductase inhibition.

    Kettunen J, Holmes MV, Allara E, Anufrieva O, Ohukainen P, Oliver-Williams C, Wang Q, Tillin T, Hughes AD, Kähönen M, Lehtimäki T, Viikari J, Raitakari OT, Salomaa V, Järvelin MR, Perola M, Davey Smith G, Chaturvedi N, Danesh J, Di Angelantonio E, Butterworth AS and Ala-Korpela M

    Computational Medicine, Faculty of Medicine, University of Oulu and Biocenter Oulu, Oulu, Finland.

    Cholesteryl ester transfer protein (CETP) inhibition reduces vascular event risk, but confusion surrounds its effects on low-density lipoprotein (LDL) cholesterol. Here, we clarify associations of genetic inhibition of CETP on detailed lipoprotein measures and compare those to genetic inhibition of 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR). We used an allele associated with lower CETP expression (rs247617) to mimic CETP inhibition and an allele associated with lower HMGCR expression (rs12916) to mimic the well-known effects of statins for comparison. The study consists of 65,427 participants of European ancestries with detailed lipoprotein subclass profiling from nuclear magnetic resonance spectroscopy. Genetic associations were scaled to 10% reduction in relative risk of coronary heart disease (CHD). We also examined observational associations of the lipoprotein subclass measures with risk of incident CHD in 3 population-based cohorts totalling 616 incident cases and 13,564 controls during 8-year follow-up. Genetic inhibition of CETP and HMGCR resulted in near-identical associations with LDL cholesterol concentration estimated by the Friedewald equation. Inhibition of HMGCR had relatively consistent associations on lower cholesterol concentrations across all apolipoprotein B-containing lipoproteins. In contrast, the associations of the inhibition of CETP were stronger on lower remnant and very-low-density lipoprotein (VLDL) cholesterol, but there were no associations on cholesterol concentrations in LDL defined by particle size (diameter 18-26 nm) (-0.02 SD LDL defined by particle size; 95% CI: -0.10 to 0.05 for CETP versus -0.24 SD, 95% CI -0.30 to -0.18 for HMGCR). Inhibition of CETP was strongly associated with lower proportion of triglycerides in all high-density lipoprotein (HDL) particles. In observational analyses, a higher triglyceride composition within HDL subclasses was associated with higher risk of CHD, independently of total cholesterol and triglycerides (strongest hazard ratio per 1 SD higher triglyceride composition in very large HDL 1.35; 95% CI: 1.18-1.54). In conclusion, CETP inhibition does not appear to affect size-specific LDL cholesterol but is likely to lower CHD risk by lowering concentrations of other atherogenic, apolipoprotein B-containing lipoproteins (such as remnant and VLDLs). Inhibition of CETP also lowers triglyceride composition in HDL particles, a phenomenon reflecting combined effects of circulating HDL, triglycerides, and apolipoprotein B-containing particles and is associated with a lower CHD risk in observational analyses. Our results reveal that conventional composite lipid assays may mask heterogeneous effects of emerging lipid-altering therapies.

    PLoS biology 2019;17;12;e3000572

  • Massively parallel sequencing of autosomal STRs and identity-informative SNPs highlights consanguinity in Saudi Arabia.

    Khubrani YM, Hallast P, Jobling MA and Wetton JH

    Department of Genetics & Genome Biology, University of Leicester, Leicester, UK; Forensic Genetics Laboratory, General Administration of Criminal Evidence, Public Security, Ministry of Interior, Saudi Arabia.

    While many studies have been undertaken of Middle Eastern populations using autosomal STR profiling by capillary electrophoresis, little has so far been published from this region on the forensic use of massively parallel sequencing (MPS). Here, we carried out MPS of 27 autosomal STRs and 91 identity-informative SNPs (iiSNPs) with the Verogen ForenSeq™ DNA Signature Prep Kit on a representative sample of 89 Saudi Arabian males, and analysed the resulting sequence data using Verogen's ForenSeq Universal Analysis Software (UAS) v1.3 and STRait Razor v3.0. This revealed sequence variation in the composition of complex STR arrays, and SNPs in their flanking regions, which raised the number of STR alleles from 238 distinct length variants to 357 sequence sub-variants. Similarly, between one and three additional polymorphic sites were observed within the amplicons of 37 of the 91 iiSNPs, forming up to six microhaplotypes per locus. These further enhance discrimination compared to the biallelic target SNP data presented by the primary UAS interface. In total, we observed twenty-two STR alleles previously unrecognised in the STRait Razor v3.0 default allele list, along with nine SNPs flanking target iiSNPs that were not highlighted by the UAS. Sequencing reduced the STR-based random match probability (RMP) from 2.62E-30 to 3.49E-34, and analysis of the iiSNP microhaplotypes reduced RMP from 9.97E-37 to 6.83E-40. The lack of significant linkage disequilibrium between STRs and target iiSNPs allowed the two marker types to be combined using the product rule, yielding a RMP of 2.39E-73. Evidence of consanguinity was apparent from both marker types. While TPOX was the only locus displaying a significant deviation from Hardy-Weinberg equilibrium, 23 out of 27 STRs and 63 out of 91 iiSNPs showed fewer than expected heterozygotes, demonstrating an overall homozygote excess probably reflecting the high frequency of first-cousin marriages in Saudi Arabia. We placed our data in a global context by considering the same markers in the Human Genome Diversity Panel (HGDP), revealing that the Saudi sample was typical of Middle Eastern populations, with a higher level of inbreeding than is seen in most European, African and Central/South Asian populations, correlating with known patterns of endogamy. Given reduced levels of diversity within endogamous groups, the ability to combine the discrimination power of both STRs and SNPs offers significant benefits in the analysis of forensic evidence in Saudi Arabia and the Middle East region more generally.

    Forensic science international. Genetics 2019;43;102164
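
    The product rule mentioned in the entry above multiplies the match probabilities of independent marker sets. A minimal sketch of that arithmetic, using the two sequence-based RMP values quoted in the abstract, follows; the function name is an assumption made for this example. Because the inputs are the rounded published figures, the product comes out at roughly 2.38E-73, close to the reported 2.39E-73, and the small gap presumably reflects rounding of the per-marker values.

```python
import math


def combined_rmp(*rmps):
    """Multiply random match probabilities of independent marker sets.

    Valid only under independence (no linkage disequilibrium between sets).
    Hypothetical helper for illustration only.
    """
    for p in rmps:
        if not 0.0 < p <= 1.0:
            raise ValueError("each RMP must lie in (0, 1]")
    return math.prod(rmps)


str_rmp = 3.49e-34  # sequence-based STR RMP quoted in the abstract
snp_rmp = 6.83e-40  # iiSNP microhaplotype RMP quoted in the abstract
combined = combined_rmp(str_rmp, snp_rmp)
print(f"{combined:.2e}")
```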

  • A new patient-derived iPSC model for dystroglycanopathies validates a compound that increases glycosylation of α-dystroglycan.

    Kim J, Lana B, Torelli S, Ryan D, Catapano F, Ala P, Luft C, Stevens E, Konstantinidis E, Louzada S, Fu B, Paredes-Redondo A, Chan AE, Yang F, Stemple DL, Liu P, Ketteler R, Selwood DL, Muntoni F and Lin YY

    Centre for Genomics and Child Health, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    Dystroglycan, an extracellular matrix receptor, has essential functions in various tissues. Loss of α-dystroglycan-laminin interaction due to defective glycosylation of α-dystroglycan underlies a group of congenital muscular dystrophies often associated with brain malformations, referred to as dystroglycanopathies. The lack of isogenic human dystroglycanopathy cell models has limited our ability to test potential drugs in a human- and neural-specific context. Here, we generated induced pluripotent stem cells (iPSCs) from a severe dystroglycanopathy patient with homozygous FKRP (fukutin-related protein gene) mutation. We showed that CRISPR/Cas9-mediated gene correction of FKRP restored glycosylation of α-dystroglycan in iPSC-derived cortical neurons, whereas targeted gene mutation of FKRP in wild-type cells disrupted this glycosylation. In parallel, we screened 31,954 small molecule compounds using a mouse myoblast line for increased glycosylation of α-dystroglycan. Using human FKRP-iPSC-derived neural cells for hit validation, we demonstrated that compound 4-(4-bromophenyl)-6-ethylsulfanyl-2-oxo-3,4-dihydro-1H-pyridine-5-carbonitrile (4BPPNit) significantly augmented glycosylation of α-dystroglycan, in part through upregulation of LARGE1 glycosyltransferase gene expression. Together, isogenic human iPSC-derived cells represent a valuable platform for facilitating dystroglycanopathy drug discovery and therapeutic development.

    Funded by: EC|Seventh Framework Programme (FP7): 115582, 2012-305121; Newlife - The Charity for Disabled Children: SG/14-15/14; Royal Society: RG130417; UK Research and Innovation|Medical Research Council (MRC): 92-963, MC_U12266B; Wellcome Trust: 098051

    EMBO reports 2019;20;11;e47967

  • A Report from a Workshop of the International Stem Cell Banking Initiative, Held in Collaboration of Global Alliance for iPSC Therapies and the Harvard Stem Cell Institute, Boston, 2017.

    Kim JH, Alderton A, Crook JM, Benvenisty N, Brandsten C, Firpo M, Harrison PW, Kawamata S, Kawase E, Kurtz A, Loring JF, Ludwig T, Man J, Mountford JC, Turner ML, Oh S, da Veiga Pereira L, Pranke P, Sheldon M, Steeg R, Sullivan S, Yaffe M, Zhou Q and Stacey GN

    Division of Intractable Diseases, Korea National Stem Cell Bank, Center for Biomedical Sciences, Korea National Institute of Health, Cheongju, Korea.

    This report summarizes the recent activity of the International Stem Cell Banking Initiative held at Harvard Stem Cell Institute, Boston, MA, USA, on June 18, 2017. In this meeting, we aimed to find consensus on ongoing issues of quality control (QC), safety, and efficacy of human pluripotent stem cell banks and their derivative cell therapy products for global harmonization. In particular, assays for QC testing, such as pluripotency assays, and general QC testing criteria were intensively discussed. Moreover, the recent activities of global stem cell banking centers and the regulatory bodies were briefly summarized to provide an overview on global developments and issues.

    Funded by: BBSRC; MRC

    Stem cells (Dayton, Ohio) 2019;37;9;1130-1135

  • A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing.

    Kingan SB, Heaton H, Cudini J, Lambert CC, Baybayan P, Galvin BD, Durbin R, Korlach J and Lawniczak MKN

    Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025, USA.

    A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for <i>de novo</i> genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality <i>de novo</i> genome assembly from a single <i>Anopheles coluzzii</i> mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid <i>de novo</i> genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.

    Funded by: Medical Research Council: G1100339; Wellcome Trust: 206194/Z/17/Z, 207492/Z/17/Z, WT207492

    Genes 2019;10;1

  • Challenges in unsupervised clustering of single-cell RNA-seq data.

    Kiselev VY, Andrews TS and Hemberg M

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Single-cell RNA sequencing (scRNA-seq) allows researchers to collect large catalogues detailing the transcriptomes of individual cells. Unsupervised clustering is of central importance for the analysis of these data, as it is used to identify putative cell types. However, there are many challenges involved. We discuss why clustering is a challenging problem from a computational point of view and what aspects of the data make it challenging. We also consider the difficulties related to the biological interpretation and annotation of the identified clusters.

    Nature reviews. Genetics 2019;20;5;273-282

  • Exploring Plasmodium falciparum Var Gene Expression to Assess Host Selection Pressure on Parasites During Infancy.

    Kivisi CA, Muthui M, Hunt M, Fegan G, Otto TD, Githinji G, Warimwe GM, Rance R, Marsh K, Bull PC and Abdi AI

    KEMRI Wellcome Trust Research Programme, Kilifi, Kenya.

    In sub-Saharan Africa, children below 5 years bear the greatest burden of severe malaria because they lack naturally acquired immunity that develops following repeated exposure to infections by <i>Plasmodium falciparum</i>. Antibodies to the surface of <i>P. falciparum</i> infected erythrocytes (IE) play an important role in this immunity. In children under the age of 6 months, relative protection from severe malaria is observed and this is thought to be partly due to trans-placental acquired protective maternal antibodies. However, the protective effect of maternal antibodies has not been fully established, especially the role of antibodies to variant surface antigens (VSA) expressed on IE. Here, we assessed the immune pressure on parasites infecting infants using markers associated with the acquisition of naturally acquired immunity to surface antigens. We hypothesized that, if maternal antibodies to VSA imposed a selection pressure on parasites, then the expression of a relatively conserved subset of <i>var</i> genes called group A <i>var</i> genes in infants should change with waning maternal antibodies. To test this, we compared their expression in parasites from children between 0 and 12 months and above 12 months of age. The transcript quantity and the proportional expression of group A <i>var</i> subgroup, including those containing domain cassette 13, were positively associated with age during the first year of life, which contrasts with above 12 months. This was accompanied by a decline in infected erythrocyte surface antibodies and an increase in parasitemia during this period. The observed increase in group A <i>var</i> gene expression with age in the first year of life, when the maternal antibodies are waning and before acquisition of naturally acquired antibodies with repeated exposure, is consistent with the idea that maternally acquired antibodies impose a selection pressure on parasites that infect infants and may play a role in protecting these infants against severe malaria.

    Frontiers in immunology 2019;10;2328

  • An Orphan CpG Island Drives Expression of a let-7 miRNA Precursor with an Important Role in Mouse Development.

    Koerner MV, Chhatbar K, Webb S, Cholewa-Waclaw J, Selfridge J, De Sousa D, Skarnes B, Rosen B, Thomas M, Bottomley J, Ramires-Solis R, Lelliott C, Adams DJ and Bird A

    Wellcome Centre for Cell Biology, University of Edinburgh, Michael Swann Building, Max Born Crescent, Mayfield Road, Edinburgh EH9 3BF, UK.

    Most human genes are associated with promoters embedded in non-methylated, G + C-rich CpG islands (CGIs). Not all CGIs are found at annotated promoters, however, raising the possibility that many serve as promoters for transcripts that do not code for proteins. To test this hypothesis, we searched for novel transcripts in embryonic stem cells (ESCs) that originate within orphan CGIs. Among several candidates, we detected a transcript that included three members of the <i>let-7</i> micro-RNA family: <i>let-7a-1</i>, <i>let-7f-1</i>, and <i>let-7d</i>. Deletion of the CGI prevented expression of the precursor RNA and depleted the included miRNAs. Mice homozygous for this mutation were sub-viable and showed growth and other defects. The results suggest that despite the identity of their seed sequences, members of the <i>let-7</i> miRNA family exert distinct functions that cannot be complemented by other members.

    Funded by: Wellcome Trust: 098051, 107930

    Epigenomes 2019;3;1;7

  • PYLFIRE: Python implementation of likelihood-free inference by ratio estimation.

    Kokko J, Remes U, Thomas O, Pesonen H and Corander J

    Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.

    Likelihood-free inference for simulator-based models is an emerging methodological branch of statistics which has attracted considerable attention in applications across diverse fields such as population genetics, astronomy and economics. Recently, the power of statistical classifiers has been harnessed in likelihood-free inference to obtain either point estimates or even posterior distributions of model parameters. Here we introduce PYLFIRE, an open-source Python implementation of the inference method LFIRE (likelihood-free inference by ratio estimation) that uses penalised logistic regression. PYLFIRE is made available as part of the general ELFI inference software to benefit both the user and developer communities for likelihood-free inference.

    Wellcome open research 2019;4;197

  • CTCF variants in 39 individuals with a variable neurodevelopmental disorder broaden the mutational and clinical spectrum.

    Konrad EDH, Nardini N, Caliebe A, Nagel I, Young D, Horvath G, Santoro SL, Shuss C, Ziegler A, Bonneau D, Kempers M, Pfundt R, Legius E, Bouman A, Stuurman KE, Õunap K, Pajusalu S, Wojcik MH, Vasileiou G, Le Guyader G, Schnelle HM, Berland S, Zonneveld-Huijssoon E, Kersten S, Gupta A, Blackburn PR, Ellingson MS, Ferber MJ, Dhamija R, Klee EW, McEntagart M, Lichtenbelt KD, Kenney A, Vergano SA, Abou Jamra R, Platzer K, Ella Pierpont M, Khattar D, Hopkin RJ, Martin RJ, Jongmans MCJ, Chang VY, Martinez-Agosto JA, Kuismin O, Kurki MI, Pietiläinen O, Palotie A, Maarup TJ, Johnson DS, Venborg Pedersen K, Laulund LW, Lynch SA, Blyth M, Prescott K, Canham N, Ibitoye R, Brilstra EH, Shinawi M, Fassi E, DDD Study, Sticht H, Gregor A, Van Esch H and Zweier C

    Institute of Human Genetics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.

    Purpose: Pathogenic variants in the chromatin organizer CTCF were previously reported in seven individuals with a neurodevelopmental disorder (NDD).

    Methods: Through international collaboration we collected data from 39 subjects with variants in CTCF. We performed transcriptome analysis on RNA from blood samples and utilized Drosophila melanogaster to investigate the impact of Ctcf dosage alteration on nervous system development and function.

    Results: The individuals in our cohort carried 2 deletions, 8 likely gene-disruptive, 2 splice-site, and 20 different missense variants, most of them de novo. Two cases were familial. The associated phenotype was of variable severity extending from mild developmental delay or normal IQ to severe intellectual disability. Feeding difficulties and behavioral abnormalities were common, and variable other findings including growth restriction and cardiac defects were observed. RNA-sequencing in five individuals identified 3828 deregulated genes enriched for known NDD genes and biological processes such as transcriptional regulation. Ctcf dosage alteration in Drosophila resulted in impaired gross neurological functioning and learning and memory deficits.

    Conclusion: We significantly broaden the mutational and clinical spectrum of CTCF-associated NDDs. Our data shed light onto the functional role of CTCF by identifying deregulated genes and show that Ctcf alterations result in nervous system defects in Drosophila.

    Funded by: Department of Health: HICF-1009-003; NHGRI NIH HHS: UM1 HG008900; NIH HHS: T32GM007748; Wellcome Trust: WT098051

    Genetics in medicine : official journal of the American College of Medical Genetics 2019;21;12;2723-2733

  • DNA methylation defines regional identity of human intestinal epithelial organoids and undergoes dynamic changes during development.

    Kraiczy J, Nayak KM, Howell KJ, Ross A, Forbester J, Salvestrini C, Mustata R, Perkins S, Andersson-Rolf A, Leenen E, Liebert A, Vallier L, Rosenstiel PC, Stegle O, Dougan G, Heuschkel R, Koo BK and Zilbauer M

    Department of Paediatrics, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.

    Objective: Human intestinal epithelial organoids (IEOs) are increasingly being recognised as a highly promising translational research tool. However, our understanding of their epigenetic molecular characteristics and behaviour in culture remains limited.

    Design: We performed genome-wide DNA methylation and transcriptomic profiling of human IEOs derived from paediatric/adult and fetal small and large bowel as well as matching purified human gut epithelium. Furthermore, organoids were subjected to in vitro differentiation and genome editing using CRISPR/Cas9 technology.

    Results: We discovered stable epigenetic signatures which define regional differences in gut epithelial function, including induction of segment-specific genes during cellular differentiation. Established DNA methylation profiles were independent of cellular environment since organoids retained their regional DNA methylation over prolonged culture periods. In contrast to paediatric and adult organoids, fetal gut-derived organoids showed distinct dynamic changes of DNA methylation and gene expression in culture, indicative of an in vitro maturation. By applying CRISPR/Cas9 genome editing to fetal organoids, we demonstrate that this process is partly regulated by TET1, an enzyme involved in the DNA demethylation process. Lastly, generating IEOs from a child diagnosed with gastric heterotopia revealed persistent and distinct disease-associated DNA methylation differences, highlighting the use of organoids as disease-specific research models.

    Conclusions: Our study demonstrates striking similarities of epigenetic signatures in mucosa-derived IEOs with matching primary epithelium. Moreover, these results suggest that intestinal stem cell-intrinsic DNA methylation patterns establish and maintain regional gut specification and are involved in early epithelial development and disease.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust: 101241/Z/13/Z

    Gut 2019;68;1;49-61

  • Genome-Wide Epigenetic and Transcriptomic Characterization of Human-Induced Pluripotent Stem Cell-Derived Intestinal Epithelial Organoids.

    Kraiczy J, Ross ADB, Forbester JL, Dougan G, Vallier L and Zilbauer M

    Department of Pediatrics, University of Cambridge, Cambridge, United Kingdom.

    Funded by: Medical Research Council: MC_PC_12009; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/N001540/1; Wellcome Trust: 100138/B/12/Z, 206194

    Cellular and molecular gastroenterology and hepatology 2019;7;2;285-288

  • A natural WNT signaling variant potently synergizes with Cdkn2ab loss in skin carcinogenesis.

    Krimpenfort P, Snoek M, Lambooij JP, Song JY, van der Weide R, Bhaskaran R, Teunissen H, Adams DJ, de Wit E and Berns A

    Oncode Institute, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.

    Cdkn2ab knockout mice generated from 129P2 ES cells develop skin carcinomas. Here we show that the incidence of these carcinomas drops gradually in the course of backcrossing to the FVB/N background. Microsatellite analyses indicate that this cancer phenotype is linked to a 20 Mb region of 129P2 chromosome 15 harboring the Wnt7b gene, which is preferentially expressed from the 129P2 allele in skin carcinomas and derived cell lines. ChIPseq analysis shows enrichment of H3K27-Ac, a mark for active enhancers, in the 5' region of the Wnt7b 129P2 gene. The Wnt7b 129P2 allele appears sufficient to cause in vitro transformation of Cdkn2ab-deficient cell lines primarily through CDK6 activation. These results point to a critical role of the Cdkn2ab locus in keeping the oncogenic potential of physiological levels of WNT signaling in check and illustrate that GWAS-based searches for cancer predisposing allelic variants can be enhanced by including defined somatically acquired lesions as an additional input.

    Funded by: Cancer Research UK: 14356

    Nature communications 2019;10;1;1425

  • Nasal carriage of Staphylococcus pseudintermedius in patients with granulomatosis with polyangiitis.

    Kronbichler A, Blane B, Holmes MA, Wagner J, Parkhill J, Peacock SJ, Jayne DRW and Harrison EM

    Vasculitis and Lupus Clinic, Addenbrooke's Hospital, Cambridge, UK.

    Funded by: Medical Research Council: MR/S00291X/1; Wellcome Trust

    Rheumatology (Oxford, England) 2019;58;3;548-550

  • A Compendium of Mutational Signatures of Environmental Agents.

    Kucab JE, Zou X, Morganella S, Joel M, Nanda AS, Nagy E, Gomez C, Degasperi A, Harris R, Jackson SP, Arlt VM, Phillips DH and Nik-Zainal S

    Department of Analytical, Environmental and Forensic Sciences, MRC-PHE Centre for Environment and Health, King's College London, 150 Stamford Street, London SE1 9NH, UK.

    Whole-genome-sequencing (WGS) of human tumors has revealed distinct mutation patterns that hint at the causative origins of cancer. We examined mutational signatures in 324 WGS human-induced pluripotent stem cells exposed to 79 known or suspected environmental carcinogens. Forty-one yielded characteristic substitution mutational signatures. Some were similar to signatures found in human tumors. Additionally, six agents produced double-substitution signatures and eight produced indel signatures. Investigating mutation asymmetries across genome topography revealed fully functional mismatch and transcription-coupled repair pathways. DNA damage induced by environmental mutagens can be resolved by disparate repair and/or replicative pathways, resulting in an assortment of signature outcomes even for a single agent. This compendium of experimentally induced mutational signatures permits further exploration of roles of environmental agents in cancer etiology and underscores how human stem cell DNA is directly vulnerable to environmental agents.

    Funded by: Cancer Research UK: C313/A14329, C60100/A23916; Wellcome Trust: 101126/B/13/Z, WT100183MA

    Cell 2019;177;4;821-836.e16

  • The transferability of lipid loci across African, Asian and European cohorts.

    Kuchenbaecker K, Telkar N, Reiker T, Walters RG, Lin K, Eriksson A, Gurdasani D, Gilly A, Southam L, Tsafantakis E, Karaleftheri M, Seeley J, Kamali A, Asiki G, Millwood IY, Holmes M, Du H, Guo Y, Kumari M, Dedoussis G, Li L, Chen Z, Sandhu MS, Zeggini E and Understanding Society Scientific Group

    Division of Psychiatry, University College of London, London, W1T 7NF, UK.

    Most genome-wide association studies are based on samples of European descent. We assess whether the genetic determinants of blood lipids, a major cardiovascular risk factor, are shared across populations. Genetic correlations for lipids between European-ancestry and Asian cohorts are not significantly different from 1. A genetic risk score based on LDL-cholesterol-associated loci has consistent effects on serum levels in samples from the UK, Uganda and Greece (r = 0.23-0.28, p < 1.9 × 10<sup>-14</sup>). Overall, there is evidence of reproducibility for ~75% of the major lipid loci from European discovery studies, except triglyceride loci in the Ugandan samples (10% of loci). Individual transferable loci are identified using trans-ethnic colocalization. Ten of fourteen loci not transferable to the Ugandan population have pleiotropic associations with BMI in Europeans; none of the transferable loci do. The non-transferable loci might affect lipids by modifying food intake in environments rich in certain nutrients, which suggests a potential role for gene-environment interactions.

    Funded by: Wellcome Trust; Wellcome Trust (Wellcome): 212360/Z/18/Z

    Nature communications 2019;10;1;4330

  • Adaptation of host transmission cycle during Clostridium difficile speciation.

    Kumar N, Browne HP, Viciani E, Forster SC, Clare S, Harcourt K, Stares MD, Dougan G, Fairley DJ, Roberts P, Pirmohamed M, Clokie MRJ, Jensen MBF, Hargreaves KR, Ip M, Wieler LH, Seyboldt C, Norén T, Riley TV, Kuijper EJ, Wren BW and Lawley TD

    Host-Microbiota Interactions Laboratory, Wellcome Sanger Institute, Hinxton, UK.

    Bacterial speciation is a fundamental evolutionary process characterized by diverging genotypic and phenotypic properties. However, the selective forces that affect genetic adaptations and how they relate to the biological changes that underpin the formation of a new bacterial species remain poorly understood. Here, we show that the spore-forming, healthcare-associated enteropathogen Clostridium difficile is actively undergoing speciation. Through large-scale genomic analysis of 906 strains, we demonstrate that the ongoing speciation process is linked to positive selection on core genes in the newly forming species that are involved in sporulation and the metabolism of simple dietary sugars. Functional validation shows that the new C. difficile produces spores that are more resistant and have increased sporulation and host colonization capacity when glucose or fructose is available for metabolism. Thus, we report the formation of an emerging C. difficile species, selected for metabolizing simple dietary sugars and producing high levels of resistant spores, that is adapted for healthcare-mediated transmission.

    Funded by: Medical Research Council: MR/K000551/1, MR/L006758/1; Wellcome Trust: 102979/Z/13/Z

    Nature genetics 2019;51;9;1315-1320

  • Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing.

    Kunkle BW, Grenier-Boley B, Sims R, Bis JC, Damotte V, Naj AC, Boland A, Vronskaya M, van der Lee SJ, Amlie-Wolf A, Bellenguez C, Frizatti A, Chouraki V, Martin ER, Sleegers K, Badarinarayan N, Jakobsdottir J, Hamilton-Nelson KL, Moreno-Grau S, Olaso R, Raybould R, Chen Y, Kuzma AB, Hiltunen M, Morgan T, Ahmad S, Vardarajan BN, Epelbaum J, Hoffmann P, Boada M, Beecham GW, Garnier JG, Harold D, Fitzpatrick AL, Valladares O, Moutet ML, Gerrish A, Smith AV, Qu L, Bacq D, Denning N, Jian X, Zhao Y, Del Zompo M, Fox NC, Choi SH, Mateo I, Hughes JT, Adams HH, Malamon J, Sanchez-Garcia F, Patel Y, Brody JA, Dombroski BA, Naranjo MCD, Daniilidou M, Eiriksdottir G, Mukherjee S, Wallon D, Uphill J, Aspelund T, Cantwell LB, Garzia F, Galimberti D, Hofer E, Butkiewicz M, Fin B, Scarpini E, Sarnowski C, Bush WS, Meslage S, Kornhuber J, White CC, Song Y, Barber RC, Engelborghs S, Sordon S, Voijnovic D, Adams PM, Vandenberghe R, Mayhaus M, Cupples LA, Albert MS, De Deyn PP, Gu W, Himali JJ, Beekly D, Squassina A, Hartmann AM, Orellana A, Blacker D, Rodriguez-Rodriguez E, Lovestone S, Garcia ME, Doody RS, Munoz-Fernadez C, Sussams R, Lin H, Fairchild TJ, Benito YA, Holmes C, Karamujić-Čomić H, Frosch MP, Thonberg H, Maier W, Roshchupkin G, Ghetti B, Giedraitis V, Kawalia A, Li S, Huebinger RM, Kilander L, Moebus S, Hernández I, Kamboh MI, Brundin R, Turton J, Yang Q, Katz MJ, Concari L, Lord J, Beiser AS, Keene CD, Helisalmi S, Kloszewska I, Kukull WA, Koivisto AM, Lynch A, Tarraga L, Larson EB, Haapasalo A, Lawlor B, Mosley TH, Lipton RB, Solfrizzi V, Gill M, Longstreth WT, Montine TJ, Frisardi V, Diez-Fairen M, Rivadeneira F, Petersen RC, Deramecourt V, Alvarez I, Salani F, Ciaramella A, Boerwinkle E, Reiman EM, Fievet N, Rotter JI, Reisch JS, Hanon O, Cupidi C, Andre Uitterlinden AG, Royall DR, Dufouil C, Maletta RG, de Rojas I, Sano M, Brice A, Cecchetti R, George-Hyslop PS, Ritchie K, Tsolaki M, Tsuang DW, Dubois B, Craig D, Wu CK, Soininen H, Avramidou D, Albin RL, Fratiglioni L, Germanou A, Apostolova LG, Keller L, Koutroumani M, Arnold SE, Panza F, Gkatzima O, Asthana S, Hannequin D, Whitehead P, Atwood CS, Caffarra P, Hampel H, Quintela I, Carracedo Á, Lannfelt L, Rubinsztein DC, Barnes LL, Pasquier F, Frölich L, Barral S, McGuinness B, Beach TG, Johnston JA, Becker JT, Passmore P, Bigio EH, Schott JM, Bird TD, Warren JD, Boeve BF, Lupton MK, Bowen JD, Proitsi P, Boxer A, Powell JF, Burke JR, Kauwe JSK, Burns JM, Mancuso M, Buxbaum JD, Bonuccelli U, Cairns NJ, McQuillin A, Cao C, Livingston G, Carlson CS, Bass NJ, Carlsson CM, Hardy J, Carney RM, Bras J, Carrasquillo MM, Guerreiro R, Allen M, Chui HC, Fisher E, Masullo C, Crocco EA, DeCarli C, Bisceglio G, Dick M, Ma L, Duara R, Graff-Radford NR, Evans DA, Hodges A, Faber KM, Scherer M, Fallon KB, Riemenschneider M, Fardo DW, Heun R, Farlow MR, Kölsch H, Ferris S, Leber M, Foroud TM, Heuser I, Galasko DR, Giegling I, Gearing M, Hüll M, Geschwind DH, Gilbert JR, Morris J, Green RC, Mayo K, Growdon JH, Feulner T, Hamilton RL, Harrell LE, Drichel D, Honig LS, Cushion TD, Huentelman MJ, Hollingworth P, Hulette CM, Hyman BT, Marshall R, Jarvik GP, Meggy A, Abner E, Menzies GE, Jin LW, Leonenko G, Real LM, Jun GR, Baldwin CT, Grozeva D, Karydas A, Russo G, Kaye JA, Kim R, Jessen F, Kowall NW, Vellas B, Kramer JH, Vardy E, LaFerla FM, Jöckel KH, Lah JJ, Dichgans M, Leverenz JB, Mann D, Levey AI, Pickering-Brown S, Lieberman AP, Klopp N, Lunetta KL, Wichmann HE, Lyketsos CG, Morgan K, Marson DC, Brown K, Martiniuk F, Medway C, Mash DC, Nöthen MM, Masliah E, Hooper NM, McCormick WC, Daniele A, McCurry SM, Bayer A, McDavid AN, Gallacher J, McKee AC, van den Bussche H, Mesulam M, Brayne C, Miller BL, Riedel-Heller S, Miller CA, Miller JW, Al-Chalabi A, Morris JC, Shaw CE, Myers AJ, Wiltfang J, O'Bryant S, Olichney JM, Alvarez V, Parisi JE, Singleton AB, Paulson HL, Collinge J, Perry WR, Mead S, Peskind E, Cribbs DH, Rossor M, Pierce A, Ryan NS, Poon WW, Nacmias B, Potter H, Sorbi S, Quinn JF, Sacchinelli E, Raj A, Spalletta G, Raskind M, Caltagirone C, Bossù P, Orfei MD, Reisberg B, Clarke R, Reitz C, Smith AD, Ringman JM, Warden D, Roberson ED, Wilcock G, Rogaeva E, Bruni AC, Rosen HJ, Gallo M, Rosenberg RN, Ben-Shlomo Y, Sager MA, Mecocci P, Saykin AJ, Pastor P, Cuccaro ML, Vance JM, Schneider JA, Schneider LS, Slifer S, Seeley WW, Smith AG, Sonnen JA, Spina S, Stern RA, Swerdlow RH, Tang M, Tanzi RE, Trojanowski JQ, Troncoso JC, Van Deerlin VM, Van Eldik LJ, Vinters HV, Vonsattel JP, Weintraub S, Welsh-Bohmer KA, Wilhelmsen KC, Williamson J, Wingo TS, Woltjer RL, Wright CB, Yu CE, Yu L, Saba Y, Pilotto A, Bullido MJ, Peters O, Crane PK, Bennett D, Bosco P, Coto E, Boccardi V, De Jager PL, Lleo A, Warner N, Lopez OL, Ingelsson M, Deloukas P, Cruchaga C, Graff C, Gwilliam R, Fornage M, Goate AM, Sanchez-Juan P, Kehoe PG, Amin N, Ertekin-Taner N, Berr C, Debette S, Love S, Launer LJ, Younkin SG, Dartigues JF, Corcoran C, Ikram MA, Dickson DW, Nicolas G, Campion D, Tschanz J, Schmidt H, Hakonarson H, Clarimon J, Munger R, Schmidt R, Farrer LA, Van Broeckhoven C, C O'Donovan M, DeStefano AL, Jones L, Haines JL, Deleuze JF, Owen MJ, Gudnason V, Mayeux R, Escott-Price V, Psaty BM, Ramirez A, Wang LS, Ruiz A, van Duijn CM, Holmans PA, Seshadri S, Williams J, Amouyel P, Schellenberg GD, Lambert JC, Pericak-Vance MA, Alzheimer Disease Genetics Consortium (ADGC), European Alzheimer’s Disease Initiative (EADI), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE) and Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer’s Disease Consortium (GERAD/PERADES)

    John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.

    Risk for late-onset Alzheimer's disease (LOAD), the most prevalent dementia, is partially driven by genetics. To identify LOAD risk loci, we performed a large genome-wide association meta-analysis of clinically diagnosed LOAD (94,437 individuals). We confirm 20 previous LOAD risk loci and identify five new genome-wide loci (IQCK, ACE, ADAM10, ADAMTS1, and WWOX), two of which (ADAM10, ACE) were identified in a recent genome-wide association (GWAS)-by-familial-proxy of Alzheimer's or dementia. Fine-mapping of the human leukocyte antigen (HLA) region confirms the neurological and immune-mediated disease haplotype HLA-DR15 as a risk factor for LOAD. Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer's disease but also with LOAD. Analyses of risk genes and pathways show enrichment for rare variants (P = 1.32 × 10<sup>-7</sup>), indicating that additional rare variants remain to be identified. We also identify important genetic correlations between LOAD and traits such as family history of dementia and education.

    Funded by: Medical Research Council: G0300429, G0500289, G0600237, G0801418, G0900421, G0900688, G0902227, G9810900, MC_U123160657, MR/K013041/1, MR/L021803/1, MR/L501517/1, MR/L501529/1; NIA NIH HHS: K25 AG055620, P01 AG003991, P01 AG019724, P01 AG026276, P30 AG010124, P30 AG010129, P30 AG010161, P30 AG013854, P30 AG017266, P30 AG053760, P50 AG005681, R01 AG018454, R01 AG054076, U01 AG016976, U01 AG032438

    Nature genetics 2019;51;3;414-430

  • Studying DNA Methylation in Single-Cell Format with scBS-seq.

    Kunowska N

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    DNA methylation at cytosine is a major epigenetic mark, heavily implicated in controlling key cellular processes such as development and differentiation, cellular memory, or carcinogenesis. Bisulfite treatment in conjunction with next generation sequencing has been a powerful tool for studying this modification in a quantitative manner in the context of the whole genome and with a single nucleotide resolution. This chapter describes a protocol for bisulfite sequencing adapted to a single-cell format that allows for capturing the methylation signal from up to 50% of CpG nucleotides in each cell.

    Methods in molecular biology (Clifton, N.J.) 2019;1979;235-250

  • ChIPmentation for Low-Input Profiling of In Vivo Protein-DNA Interactions.

    Kunowska N and Chen X

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    Many of the key cellular processes, including establishing the cell's identity, are governed by chromatin proteins. Mapping their binding at the level of a single cell would give us important insights into a new dimension of cellular heterogeneity. However, ChIP-seq, the main method to study protein-DNA interaction in the chromatin context, has proven very challenging to scale down. ChIPmentation is a modification of ChIP-seq in which the Tn5 transposase is used to introduce sequencing adapters in one step. This significantly reduces the required input material. ChIPmentation is a robust and versatile approach and, even though it has not yet achieved single-cell resolution, we believe that it is a very promising starting point for further downscaling.

    Methods in molecular biology (Clifton, N.J.) 2019;1979;269-282

  • Synergistic Activity of Mobile Genetic Element Defences in Streptococcus pneumoniae.

    Kwun MJ, Oggioni MR, Bentley SD, Fraser C and Croucher NJ

    MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London W2 1PG, UK.

    A diverse set of mobile genetic elements (MGEs) transmit between <i>Streptococcus pneumoniae</i> cells, but many isolates remain uninfected. The best-characterised defences against horizontal transmission of MGEs are restriction-modification systems (RMSs), of which there are two phase-variable examples in <i>S. pneumoniae</i>. Additionally, the transformation machinery has been proposed to limit vertical transmission of chromosomally integrated MGEs. This work describes how these mechanisms can act in concert. Experimental data demonstrate RMS phase variation occurs at a sub-maximal rate. Simulations suggest this may be optimal if MGEs are sometimes vertically inherited, as it reduces the probability that an infected cell will switch between RMS variants while the MGE is invading the population, and thereby undermine the restriction barrier. Such vertically inherited MGEs can be deleted by transformation. The lack of between-strain transformation hotspots at known prophage <i>att</i> sites suggests transformation cannot remove an MGE from a strain in which it is fixed. However, simulations confirmed that transformation was nevertheless effective at preventing the spread of MGEs into a previously uninfected cell population, if a recombination barrier existed between co-colonising strains. Further simulations combining these effects of phase variable RMSs and transformation found they synergistically inhibited MGEs spreading, through limiting both vertical and horizontal transmission.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/N002903/1; Medical Research Council: MR/R015600/1; Wellcome Trust: 104169/Z/14/Z

    Genes 2019;10;9

  • GWAS for systemic sclerosis identifies multiple risk loci and highlights fibrotic and vasculopathy pathways.

    López-Isac E, Acosta-Herrera M, Kerick M, Assassi S, Satpathy AT, Granja J, Mumbach MR, Beretta L, Simeón CP, Carreira P, Ortego-Centeno N, Castellvi I, Bossini-Castillo L, Carmona FD, Orozco G, Hunzelmann N, Distler JHW, Franke A, Lunardi C, Moroncini G, Gabrielli A, de Vries-Bouwstra J, Wijmenga C, Koeleman BPC, Nordin A, Padyukov L, Hoffmann-Vold AM, Lie B, European Scleroderma Group, Proudman S, Stevens W, Nikpour M, Australian Scleroderma Interest Group (ASIG), Vyse T, Herrick AL, Worthington J, Denton CP, Allanore Y, Brown MA, Radstake TRDJ, Fonseca C, Chang HY, Mayes MD and Martin J

    Institute of Parasitology and Biomedicine López-Neyra, IPBLN-CSIC, Granada, Spain.

    Systemic sclerosis (SSc) is an autoimmune disease that shows one of the highest mortality rates among rheumatic diseases. We perform a large genome-wide association study (GWAS), and meta-analysis with previous GWASs, in 26,679 individuals and identify 27 independent genome-wide associated signals, including 13 new risk loci. The novel associations nearly double the number of genome-wide hits reported for SSc thus far. We define 95% credible sets of less than 5 likely causal variants in 12 loci. Additionally, we identify specific SSc subtype-associated signals. Functional analysis of high-priority variants shows the potential function of SSc signals, with the identification of 43 robust target genes through HiChIP. Our results point towards molecular pathways potentially involved in vasculopathy and fibrosis, two main hallmarks in SSc, and highlight the spectrum of critical cell types for the disease. This work supports a better understanding of the genetic basis of SSc and provides directions for future functional experiments.

    Nature communications 2019;10;1;4955

  • Mechanisms of Progression of Myeloid Preleukemia to Transformed Myeloid Leukemia in Children with Down Syndrome.

    Labuhn M, Perkins K, Matzk S, Varghese L, Garnett C, Papaemmanuil E, Metzner M, Kennedy A, Amstislavskiy V, Risch T, Bhayadia R, Samulowski D, Hernandez DC, Stoilova B, Iotchkova V, Oppermann U, Scheer C, Yoshida K, Schwarzer A, Taub JW, Crispino JD, Weiss MJ, Hayashi Y, Taga T, Ito E, Ogawa S, Reinhardt D, Yaspo ML, Campbell PJ, Roberts I, Constantinescu SN, Vyas P, Heckl D and Klusmann JH

    Pediatric Hematology and Oncology, Hannover Medical School, 30625 Hannover, Germany.

    Myeloid leukemia in Down syndrome (ML-DS) clonally evolves from transient abnormal myelopoiesis (TAM), a preleukemic condition in DS newborns. To define mechanisms of leukemic transformation, we combined exome and targeted resequencing of 111 TAM and 141 ML-DS samples with functional analyses. TAM requires trisomy 21 and truncating mutations in GATA1; additional TAM variants are usually not pathogenic. By contrast, in ML-DS, clonal and subclonal variants are functionally required. We identified a recurrent and oncogenic hotspot gain-of-function mutation in myeloid cytokine receptor CSF2RB. By a multiplex CRISPR/Cas9 screen in an in vivo murine TAM model, we tested loss-of-function of 22 recurrently mutated ML-DS genes. Loss of 18 different genes produced leukemias that phenotypically, genetically, and transcriptionally mirrored ML-DS.

    Funded by: Medical Research Council: G1000729, MC_U137961146, MC_UU_00016/11, MC_UU_12009/11; NCI NIH HHS: P30 CA022453; Wellcome Trust

    Cancer cell 2019;36;2;123-138.e10

  • Pleiotropic Meta-Analysis of Cognition, Education, and Schizophrenia Differentiates Roles of Early Neurodevelopmental and Adult Synaptic Pathways.

    Lam M, Hill WD, Trampush JW, Yu J, Knowles E, Davies G, Stahl E, Huckins L, Liewald DC, Djurovic S, Melle I, Sundet K, Christoforou A, Reinvang I, DeRosse P, Lundervold AJ, Steen VM, Espeseth T, Räikkönen K, Widen E, Palotie A, Eriksson JG, Giegling I, Konte B, Hartmann AM, Roussos P, Giakoumaki S, Burdick KE, Payton A, Ollier W, Chiba-Falek O, Attix DK, Need AC, Cirulli ET, Voineskos AN, Stefanis NC, Avramopoulos D, Hatzimanolis A, Arking DE, Smyrnis N, Bilder RM, Freimer NA, Cannon TD, London E, Poldrack RA, Sabb FW, Congdon E, Conley ED, Scult MA, Dickinson D, Straub RE, Donohoe G, Morris D, Corvin A, Gill M, Hariri AR, Weinberger DR, Pendleton N, Bitsios P, Rujescu D, Lahti J, Le Hellard S, Keller MC, Andreassen OA, Deary IJ, Glahn DC, Malhotra AK and Lencz T

    Institute of Mental Health, Singapore, 539747, Singapore; Division of Psychiatry Research, The Zucker Hillside Hospital, Glen Oaks, NY 11004, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

    Susceptibility to schizophrenia is inversely correlated with general cognitive ability at both the phenotypic and the genetic level. Paradoxically, a modest but consistent positive genetic correlation has been reported between schizophrenia and educational attainment, despite the strong positive genetic correlation between cognitive ability and educational attainment. Here we leverage published genome-wide association studies (GWASs) in cognitive ability, education, and schizophrenia to parse biological mechanisms underlying these results. Association analysis based on subsets (ASSET), a pleiotropic meta-analytic technique, allowed jointly associated loci to be identified and characterized. Specifically, we identified subsets of variants associated in the expected ("concordant") direction across all three phenotypes (i.e., greater risk for schizophrenia, lower cognitive ability, and lower educational attainment); these were contrasted with variants that demonstrated the counterintuitive ("discordant") relationship between education and schizophrenia (i.e., greater risk for schizophrenia and higher educational attainment). ASSET analysis revealed 235 independent loci associated with cognitive ability, education, and/or schizophrenia at p < 5 × 10<sup>-8</sup>. Pleiotropic analysis successfully identified more than 100 loci that were not significant in the input GWASs. Many of these have been validated by larger, more recent single-phenotype GWASs. Leveraging the joint genetic correlations of cognitive ability, education, and schizophrenia, we were able to dissociate two distinct biological mechanisms (early neurodevelopmental pathways that characterize concordant allelic variation and adulthood synaptic pruning pathways) that were linked to the paradoxical positive genetic association between education and schizophrenia. Furthermore, genetic correlation analyses revealed that these mechanisms contribute not only to the etiopathogenesis of schizophrenia but also to the broader biological dimensions implicated in both general health outcomes and psychiatric illness.

    Funded by: NIMH NIH HHS: R01 MH085018

    American journal of human genetics 2019;105;2;334-350

  • Method for culturing Candidatus Ornithobacterium hominis.

    Lawrence KA, Harris TM, Salter SJ, Hall RW, Smith-Vaughan HC, Chang AB and Marsh RL

    Menzies School of Health Research, Charles Darwin University, Darwin, Australia.

    Candidatus Ornithobacterium hominis has been detected in nasopharyngeal microbiota sequence data from around the world. This report provides the first description of culture conditions for isolating this bacterium. The availability of an easily reproducible culture method is expected to facilitate deeper understanding of the clinical significance of this species.

    Journal of microbiological methods 2019;159;157-160

  • ZMYM2 inhibits NANOG-mediated reprogramming.

    Lawrence M, Theunissen TW, Lombard P, Adams DJ and Silva JCR

    Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, Cambridgeshire, CB2 1QR, UK.

    Background: NANOG is a homeodomain-containing transcription factor which forms one of the hubs in the pluripotency network and plays a key role in the reprogramming of somatic cells and epiblast stem cells to naïve pluripotency. Studies have found that NANOG has many interacting partners and some of these were shown to play a role in its ability to mediate reprogramming. In this study, we set out to analyse the effect of NANOG interactors on the reprogramming process. Methods: Epiblast stem cells and somatic cells were reprogrammed to naïve pluripotency using MEK/ERK inhibitor PD0325901, GSK3β inhibitor CHIR99021 and Leukaemia Inhibitory Factor (together termed 2i Plus LIF). Zmym2 was knocked out using the CRISPR/Cas9 system or overexpressed using the PiggyBac system. Reprogramming was quantified after ZMYM2 deletion or overexpression, in diverse reprogramming systems. In addition, embryonic stem cell self renewal was quantified in differentiation assays after ZMYM2 removal or overexpression. Results: In this work, we identified ZMYM2/ZFP198, which physically associates with NANOG, as a key negative regulator of NANOG-mediated reprogramming of both epiblast stem cells and somatic cells. In addition, ZMYM2 impairs the self renewal of embryonic stem cells and its overexpression promotes differentiation. Conclusions: We propose that ZMYM2 curtails NANOG's actions during the reprogramming of both somatic cells and epiblast stem cells and impedes embryonic stem cell self renewal, promoting differentiation.

    Wellcome open research 2019;4;88

  • Patient-derived xenografts and matched cell lines identify pharmacogenomic vulnerabilities in colorectal cancer.

    Lazzari L, Corti G, Picco G, Isella C, Montone M, Arcella P, Durinikova E, Zanella ER, Novara L, Barbosa F, Cassingena A, Cancelliere C, Medico E, Sartore-Bianchi A, Siena S, Garnett MJ, Bertotti A, Trusolino L, Di Nicolantonio F, Linnebacher M, Bardelli A and Arena S

    Precision Oncology, IFOM - The FIRC Institute of Molecular Oncology.

    Purpose: Patient-derived xenograft (PDX) models accurately recapitulate the tumor of origin in terms of histopathology, genomic landscape, and therapeutic response, but some limitations due to costs associated with their maintenance and restricted amenability for large-scale screenings still exist. To overcome these issues, we established a platform of 2D cell lines (xeno-cell lines, XLs), derived from PDXs of colorectal cancer (CRC) with matched patient germline gDNA available.

    Experimental design: Whole exome and transcriptome sequencing analyses were performed. Biomarkers of response and resistance to anti-HER therapy were annotated. Dependency on the WRN helicase gene was assessed in MSS, MSI-H and MSI-like XLs using a reverse genetics functional approach.

    Results: XLs recapitulated the entire spectrum of CRC transcriptional subtypes. Exome and RNA-seq analyses delineated several molecular biomarkers of response and resistance to EGFR and HER2 blockade. Genotype-driven responses observed in vitro in XLs were confirmed in vivo in the matched PDXs. MSI-H models were dependent upon WRN gene expression, while loss of WRN did not affect MSS XLs growth. Interestingly, one MSS XL with transcriptional MSI-like traits was sensitive to WRN depletion.

    Conclusion: The XL platform represents a preclinical tool for functional gene validation and proof of concept studies to identify novel druggable vulnerabilities in CRC.

    Clinical cancer research : an official journal of the American Association for Cancer Research 2019

  • Supervised clustering for single-cell analysis.

    Lee JTH and Hemberg M

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Nature methods 2019;16;10;965-966

  • Cutting back malaria: CRISPR/Cas9 genome editing of Plasmodium.

    Lee MCS, Lindner SE, Lopez-Rubio JJ and Llinás M

    Parasites and Microbes Programme, Wellcome Sanger Institute, Hinxton, UK.

    CRISPR/Cas9 approaches are revolutionizing our ability to perform functional genomics across a wide range of organisms, including the Plasmodium parasites that cause malaria. The ability to deliver single point mutations, epitope tags and gene deletions at increased speed and scale is enabling our understanding of the biology of these complex parasites, and pointing to potential new therapeutic targets. In this review, we describe some of the biological and technical considerations for designing CRISPR-based experiments, and discuss potential future developments that broaden the applications for CRISPR/Cas9 interrogation of the malaria parasite genome.

    Funded by: NIAID NIH HHS: R01 AI125565, R21 AI130692; Wellcome Trust: 206194

    Briefings in functional genomics 2019;18;5;281-289

  • The landscape of somatic mutation in normal colorectal epithelial cells.

    Lee-Six H, Olafsson S, Ellis P, Osborne RJ, Sanders MA, Moore L, Georgakopoulos N, Torrente F, Noorani A, Goddard M, Robinson P, Coorens THH, O'Neill L, Alder C, Wang J, Fitzgerald RC, Zilbauer M, Coleman N, Saeb-Parsy K, Martincorena I, Campbell PJ and Stratton MR

    Wellcome Sanger Institute, Hinxton, UK.

    The colorectal adenoma-carcinoma sequence has provided a paradigmatic framework for understanding the successive somatic genetic changes and consequent clonal expansions that lead to cancer. However, our understanding of the earliest phases of colorectal neoplastic changes, which may occur in morphologically normal tissue, is comparatively limited, as for most cancer types. Here we use whole-genome sequencing to analyse hundreds of normal crypts from 42 individuals. Signatures of multiple mutational processes were revealed; some of these were ubiquitous and continuous, whereas others were only found in some individuals, in some crypts or during certain periods of life. Probable driver mutations were present in around 1% of normal colorectal crypts in middle-aged individuals, indicating that adenomas and carcinomas are rare outcomes of a pervasive process of neoplastic change across morphologically normal colorectal epithelium. Colorectal cancers exhibit substantially increased mutational burdens relative to normal cells. Sequencing normal colorectal cells provides quantitative insights into the genomic and clonal evolution of cancer.

    Funded by: Wellcome Trust

    Nature 2019;574;7779;532-537

  • Using Human Induced Pluripotent Stem Cell-derived Intestinal Organoids to Study and Modify Epithelial Cell Protection Against Salmonella and Other Pathogens.

    Lees EA, Forbester JL, Forrest S, Kane L, Goulding D and Dougan G

    Wellcome Trust Sanger Institute; Department of Medicine, University of Cambridge.

    The intestinal 'organoid' (iHO) system, wherein 3-D structures representative of the epithelial lining of the human gut can be produced from human induced pluripotent stem cells (hiPSCs) and maintained in culture, provides an exciting opportunity to facilitate the modeling of the epithelial response to enteric infections. In vivo, intestinal epithelial cells (IECs) play a key role in regulating intestinal homeostasis and may directly inhibit pathogens, although the mechanisms by which this occurs are not fully elucidated. The cytokine interleukin-22 (IL-22) has been shown to play a role in the maintenance and defense of the gut epithelial barrier, including inducing a release of antimicrobial peptides and chemokines in response to infection. We describe the differentiation of healthy control hiPSCs into iHOs via the addition of specific cytokine combinations to their culture medium before embedding them into a basement membrane matrix-based prointestinal culture system. Once embedded, the iHOs are grown in media supplemented with Noggin, R-spondin-1, epidermal growth factor (EGF), CHIR99021, prostaglandin E2, and Y-27632 dihydrochloride monohydrate. Weekly passages by manual disruption of the iHO ultrastructure lead to the formation of budded iHOs, with some exhibiting a crypt/villus structure. All iHOs demonstrate a differentiated epithelium consisting of goblet cells, enteroendocrine cells, Paneth cells, and polarized enterocytes, which can be confirmed via immunostaining for specific markers of each cell subset, transmission electron microscopy (TEM), and quantitative PCR (qPCR). To model infection, Salmonella enterica serovar Typhimurium SL1344 are microinjected into the lumen of the iHOs and incubated for 90 min at 37 °C, and a modified gentamicin protection assay is performed to identify the levels of intracellular bacterial invasion. Some iHOs are also pretreated with recombinant human IL-22 (rhIL-22) prior to infection to establish whether this cytokine is protective against Salmonella infection.

    Journal of visualized experiments : JoVE 2019;147

  • Joint sequencing of human and pathogen genomes reveals the genetics of pneumococcal meningitis.

    Lees JA, Ferwerda B, Kremer PHC, Wheeler NE, Serón MV, Croucher NJ, Gladstone RA, Bootsma HJ, Rots NY, Wijmega-Monsuur AJ, Sanders EAM, Trzciński K, Wyllie AL, Zwinderman AH, van den Berg LH, van Rheenen W, Veldink JH, Harboe ZB, Lundbo LF, de Groot LCPGM, van Schoor NM, van der Velde N, Ängquist LH, Sørensen TIA, Nohr EA, Mentzer AJ, Mills TC, Knight JC, du Plessis M, Nzenze S, Weiser JN, Parkhill J, Madhi S, Benfield T, von Gottberg A, van der Ende A, Brouwer MC, Barrett JC, Bentley SD and van de Beek D

    Department of Microbiology, New York University School of Medicine, New York, NY, 10016, USA.

    Streptococcus pneumoniae is a common nasopharyngeal colonizer, but can also cause life-threatening invasive diseases such as empyema, bacteremia and meningitis. Genetic variation of host and pathogen is known to play a role in invasive pneumococcal disease, though to what extent is unknown. In a genome-wide association study of human and pathogen we show that human variation explains almost half of variation in susceptibility to pneumococcal meningitis and one-third of variation in severity, identifying variants in CCDC33 associated with susceptibility. Pneumococcal genetic variation explains a large amount of invasive potential (70%), but has no effect on severity. Serotype alone is insufficient to explain invasiveness, suggesting other pneumococcal factors are involved in progression to invasive disease. We identify pneumococcal genes involved in invasiveness including pspC and zmpD, and perform a human-bacteria interaction analysis. These genes are potential candidates for the development of more broadly-acting pneumococcal vaccines.

    Funded by: NIAID NIH HHS: R01 AI044231, R01 AI105168; RCUK | Medical Research Council (MRC): 1365620; U.S. Department of Health & Human Services | U.S. Public Health Service (United States Public Health Service): AI038446; Wellcome Trust (Wellcome): 098051, 104169/Z/14/Z, 106289/Z/14/Z, 204969/Z/16/Z; ZonMw (Netherlands Organisation for Health Research and Development): 016.116.358, 916.13.078

    Nature communications 2019;10;1;2176

  • Fast and flexible bacterial genomic epidemiology with PopPUNK.

    Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD and Croucher NJ

    Department of Microbiology, New York University School of Medicine, New York, New York 10016, USA.

    The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates' divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species' diverse evolutionary patterns. PopPUNK can process 10<sup>3</sup> to 10<sup>4</sup> genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, which is achieved without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication are streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.

    Funded by: Medical Research Council: MR/R003076/1, MR/R015600/1; NIAID NIH HHS: R01 AI038446, R01 AI105168; Wellcome Trust: 098051, 104169/Z/14/Z

    Genome research 2019;29;2;304-316

  • Germline mutations in the transcription factor IKZF5 cause thrombocytopenia.

    Lentaigne C, Greene D, Sivapalaratnam S, Favier R, Seyres D, Thys C, Grassi L, Mangles S, Sibson K, Stubbs M, Burden F, Bordet JC, Armari-Alla C, Erber W, Farrow S, Gleadall N, Gomez K, Megy K, Papadia S, Penkett CJ, Sims MC, Stefanucci L, Stephens JC, Read RJ, Stirrups KE, Ouwehand WH, Laffan MA, NIHR BioResource, Frontini M, Freson K and Turro E

    Centre for Haematology, Hammersmith Campus, Imperial College Academic Health Sciences Centre, Imperial College London, London, United Kingdom.

    To identify novel causes of hereditary thrombocytopenia, we performed a genetic association analysis of whole-genome sequencing data from 13 037 individuals enrolled in the National Institute for Health Research (NIHR) BioResource, including 233 cases with isolated thrombocytopenia. We found an association between rare variants in the transcription factor-encoding gene IKZF5 and thrombocytopenia. We report 5 causal missense variants in or near IKZF5 zinc fingers, of which 2 occurred de novo and 3 co-segregated in 3 pedigrees. A canonical DNA-zinc finger binding model predicts that 3 of the variants alter DNA recognition. Expression studies showed that chromatin binding was disrupted in mutant compared with wild-type IKZF5, and electron microscopy revealed a reduced quantity of α granules in normally sized platelets. Proplatelet formation was reduced in megakaryocytes from 7 cases relative to 6 controls. Comparison of RNA-sequencing data from platelets, monocytes, neutrophils, and CD4+ T cells from 3 cases and 14 healthy controls showed 1194 differentially expressed genes in platelets but only 4 differentially expressed genes in each of the other blood cell types. In conclusion, IKZF5 is a novel transcriptional regulator of megakaryopoiesis and the eighth transcription factor associated with dominant thrombocytopenia in humans.

    Funded by: British Heart Foundation: FS/18/53/33863; Medical Research Council: MR/J011711/1, MR/R002363/1

    Blood 2019;134;23;2070-2081

  • Mendelian Randomization Analysis of Hemoglobin A1c as a Risk Factor for Coronary Artery Disease.

    Leong A, Chen J, Wheeler E, Hivert MF, Liu CT, Merino J, Dupuis J, Tai ES, Rotter JI, Florez JC, Barroso I and Meigs JB

    Massachusetts General Hospital, Boston, MA asleong@mgh.h