Sanger Institute - Publications 2020

Number of papers published in 2020: 322

  • N6-methyladenosine regulates the stability of RNA:DNA hybrids in human cells.

    Abakir A, Giles TC, Cristini A, Foster JM, Dai N, Starczak M, Rubio-Roldan A, Li M, Eleftheriou M, Crutchley J, Flatt L, Young L, Gaffney DJ, Denning C, Dalhus B, Emes RD, Gackowski D, Corrêa IR, Garcia-Perez JL, Klungland A, Gromak N and Ruzov A

    Department of Stem Cell Biology, University of Nottingham, Nottingham, UK.

    R-loops are nucleic acid structures formed by an RNA:DNA hybrid and unpaired single-stranded DNA that represent a source of genomic instability in mammalian cells<sup>1-4</sup>. Here we show that N<sup>6</sup>-methyladenosine (m<sup>6</sup>A) modification, contributing to different aspects of messenger RNA metabolism<sup>5,6</sup>, is detectable on the majority of RNA:DNA hybrids in human pluripotent stem cells. We demonstrate that m<sup>6</sup>A-containing R-loops accumulate during G<sub>2</sub>/M and are depleted at G<sub>0</sub>/G<sub>1</sub> phases of the cell cycle, and that the m<sup>6</sup>A reader promoting mRNA degradation, YTHDF2 (ref. <sup>7</sup>), interacts with R-loop-enriched loci in dividing cells. Consequently, YTHDF2 knockout leads to increased R-loop levels, cell growth retardation and accumulation of γH2AX, a marker for DNA double-strand breaks, in mammalian cells. Our results suggest that m<sup>6</sup>A regulates accumulation of R-loops, implying a role for this modification in safeguarding genomic stability.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/N005759/1; British Heart Foundation: PG/14/59/31000, SP/15/9/31605; British Heart Foundation (BHF): SP/15/9/31605, PG/14/59/31000; EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council): ERC-STG-2012-233764; Heart Research UK: TRP01/12; Medical Research Council: 1792340; National Centre for the Replacement Refinement and Reduction of Animals in Research (NC3Rs): CRACK-IT:35911-259146; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/C013105/1, NC/C013202/1, NC/K000225/1; Oxford University | John Fell Fund, University of Oxford (John Fell OUP Research Fund): BVD07340; RCUK | Biotechnology and Biological Sciences Research Council (BBSRC): BB/N005759/1; RCUK | MRC | Medical Research Foundation: MR/M017354/1, MR/N013913/1; RCUK | Medical Research Council (MRC): MR/M017354/1; Royal Society: University Research Fellowship; Wellcome Trust; Wellcome Trust (Wellcome): WT206194

    Nature genetics 2020;52;1;48-55

  • High-throughput phenotyping reveals expansive genetic and structural underpinnings of immune variation.

    Abeler-Dörner L, Laing AG, Lorenc A, Ushakov DS, Clare S, Speak AO, Duque-Correa MA, White JK, Ramirez-Solis R, Saran N, Bull KR, Morón B, Iwasaki J, Barton PR, Caetano S, Hng KI, Cambridge E, Forman S, Crockford TL, Griffiths M, Kane L, Harcourt K, Brandt C, Notley G, Babalola KO, Warren J, Mason JC, Meeniga A, Karp NA, Melvin D, Cawthorne E, Weinrick B, Rahim A, Drissler S, Meskas J, Yue A, Lux M, Song-Zhao GX, Chan A, Ballesteros Reviriego C, Abeler J, Wilson H, Przemska-Kosicka A, Edmans M, Strevens N, Pasztorek M, Meehan TF, Powrie F, Brinkman R, Dougan G, Jacobs W, Lloyd CM, Cornall RJ, Maloy KJ, Grencis RK, Griffiths GM, Adams DJ and Hayday AC

    Department of Immunobiology, King's College London, London, UK.

    By developing a high-density murine immunophenotyping platform compatible with high-throughput genetic screening, we have established profound contributions of genetics and structure to immune variation. Specifically, high-throughput phenotyping of 530 unique mouse gene knockouts identified 140 monogenic 'hits', of which most had no previous immunologic association. Furthermore, hits were collectively enriched in genes for which humans show poor tolerance to loss of function. The immunophenotyping platform also exposed dense correlation networks linking immune parameters with each other and with specific physiologic traits. Such linkages limit freedom of movement for individual immune parameters, thereby imposing genetically regulated 'immunologic structures', the integrity of which was associated with immunocompetence. Hence, we provide an expanded genetic resource and structural perspective for understanding and monitoring immune variation in health and disease.

    Funded by: Medical Research Council: MC_UU_00008/6; Wellcome Trust (Wellcome): 100156/Z/12/Z

    Nature immunology 2020;21;1;86-100

  • Interferon-gamma polymorphisms and risk of iron deficiency and anaemia in Gambian children.

    Abuga KM, Rockett KA, Muriuki JM, Koch O, Nairz M, Sirugo G, Bejon P, Kwiatkowski DP, Prentice AM and Atkinson SH

    Kenya Medical Research Institute (KEMRI) Centre for Geographic Medicine Coast, KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya.

    <b>Background:</b> Anaemia is a major public health concern especially in African children living in malaria-endemic regions. Interferon-gamma (IFN-γ) is elevated during malaria infection and is thought to influence erythropoiesis and iron status. Genetic variants in the IFN-γ gene (<i>IFNG</i>) are associated with increased IFN-γ production. We investigated putative functional single nucleotide polymorphisms (SNPs) and haplotypes of <i>IFNG</i> in relation to nutritional iron status and anaemia in Gambian children over a malaria season. <b>Methods:</b> We used previously available data from Gambian family trios to determine informative SNPs and then used the Agena Bioscience MassArray platform to type five SNPs from the <i>IFNG</i> gene in a cohort of 780 Gambian children. We also measured haemoglobin and biomarkers of iron status and inflammation at the start and end of a malaria season. <b>Results:</b> We identified five <i>IFNG</i> haplotype-tagging SNPs (<i>IFNG</i>-1616 [rs2069705], <i>IFNG</i>+874 [rs2430561], <i>IFNG</i>+2200 [rs1861493], <i>IFNG</i>+3234 [rs2069718] and <i>IFNG</i>+5612 [rs2069728]). The <i>IFNG</i>+2200C [rs1861493] allele was associated with reduced haemoglobin concentrations (adjusted β -0.44 [95% CI -0.75, -0.12]; Bonferroni adjusted P = 0.03) and a trend towards iron deficiency compared to wild-type at the end of the malaria season in multivariable models adjusted for potential confounders. A haplotype uniquely identified by <i>IFNG</i>+2200C was similarly associated with reduced haemoglobin levels and trends towards iron deficiency, anaemia and iron deficiency anaemia at the end of the malaria season in models adjusted for age, sex, village, inflammation and malaria parasitaemia. <b>Conclusion:</b> We found limited statistical evidence linking <i>IFNG</i> polymorphisms with a risk of developing iron deficiency and anaemia in Gambian children. More definitive studies are needed to investigate the effects of genetically influenced IFN-γ levels on the risk of iron deficiency and anaemia in children living in malaria-endemic areas.

    Wellcome open research 2020;5;40

  • Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer.

    Akdemir KC, Le VT, Chandran S, Li Y, Verhaak RG, Beroukhim R, Campbell PJ, Chin L, Dixon JR, Futreal PA, PCAWG Structural Variation Working Group and PCAWG Consortium

    Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, TX, USA.

    Chromatin is folded into successive layers to organize linear DNA. Genes within the same topologically associating domains (TADs) demonstrate similar expression and histone-modification profiles, and boundaries separating different domains have important roles in reinforcing the stability of these features. Indeed, domain disruptions in human cancers can lead to misregulation of gene expression. However, the frequency of domain disruptions in human cancers remains unclear. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we analyzed 288,457 somatic structural variations (SVs) to understand the distributions and effects of SVs across TADs. SVs can lead to the fusion of discrete TADs, and complex rearrangements markedly change chromatin folding maps in the cancer genomes. Notably, only 14% of the boundary deletions resulted in a change in expression in nearby genes of more than twofold.

    Funded by: Cancer Prevention and Research Institute of Texas (Cancer Prevention Research Institute of Texas): R1205; NIH HHS: DP5 OD023071; Welch Foundation: G-0040

    Nature genetics 2020;52;3;294-305

  • A single-nucleotide polymorphism in a Plasmodium berghei ApiAP2 transcription factor alters the development of host immunity.

    Akkaya M, Bansal A, Sheehan PW, Pena M, Molina-Cruz A, Orchard LM, Cimperman CK, Qi CF, Ross P, Yazew T, Sturdevant D, Anzick SL, Thiruvengadam G, Otto TD, Billker O, Llinás M, Miller LH and Pierce SK

    Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, USA.

    The acquisition of malaria immunity is both remarkably slow and unpredictable. At present, we know little about the malaria parasite genes that influence the host's ability to mount a protective immune response. Here, we show that a single-nucleotide polymorphism (SNP) resulting in a single amino acid change (S to F) in an ApiAP2 transcription factor in the rodent malaria parasite <i>Plasmodium berghei</i> (<i>Pb</i>) NK65 allowed infected mice to mount a T helper cell 1 (T<sub>H</sub>1)-type immune response that controlled subsequent infections. As compared to <i>Pb</i>NK65<sup>S</sup>, <i>Pb</i>NK65<sup>F</sup> parasites differentially expressed 46 genes, most of which are predicted to play roles in immune evasion. <i>Pb</i>NK65<sup>F</sup> infections resulted in an early interferon-γ response and a later expansion of germinal centers, resulting in high levels of infected red blood cell-specific T<sub>H</sub>1-type immunoglobulin G2b (IgG2b) and IgG2c antibodies. Thus, the <i>Pb</i> ApiAP2 transcription factor functions as a critical parasite virulence factor in malaria infections.

    Funded by: NIAID NIH HHS: R01 AI125565

    Science advances 2020;6;6;eaaw6957

  • The repertoire of mutational signatures in human cancer.

    Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, Islam SMA, Lopez-Bigas N, Klimczak LJ, McPherson JR, Morganella S, Sabarinathan R, Wheeler DA, Mustonen V, PCAWG Mutational Signatures Working Group, Getz G, Rozen SG, Stratton MR and PCAWG Consortium

    Department of Cellular and Molecular Medicine, Department of Bioengineering, Moores Cancer Center, University of California, San Diego, CA, USA.

    Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature<sup>1</sup>. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium<sup>2</sup> of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses<sup>3-15</sup>, enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated, but distinct, DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures to exogenous or endogenous exposures, as well as to defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer.

    Funded by: European Research Council; NCI NIH HHS: U24 CA143843, U24 CA143845, U24 CA210999; NIGMS NIH HHS: T32 GM008313; Wellcome Trust: 206194

    Nature 2020;578;7793;94-101

  • NtrBC Regulates Invasiveness and Virulence of Pseudomonas aeruginosa During High-Density Infection.

    Alford MA, Baghela A, Yeung ATY, Pletzer D and Hancock REW

    Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, BC, Canada.

    <i>Pseudomonas aeruginosa</i> is an opportunistic pathogen that is a major cause of nosocomial and chronic infections contributing to morbidity and mortality in cystic fibrosis patients. One of the reasons for its success as a pathogen is its ability to adapt to a broad range of circumstances. Here, we show the involvement of the general nitrogen regulator NtrBC, which is structurally conserved but functionally diverse across species, in pathogenic and adaptive states of <i>P. aeruginosa</i>. The role of NtrB and NtrC was examined in progressive or chronic infections, which revealed that mutants (Δ<i>ntrB</i>, Δ<i>ntrC</i>, and Δ<i>ntrBC</i>) were reduced in their ability to invade and cause damage in a high-density abscess model <i>in vivo.</i> Progressive infections were established with mutants in the highly virulent PA14 genetic background, whereas chronic infections were established with mutants in the less virulent clinical isolate LESB58 genetic background. Characterization of adaptive lifestyles <i>in vitro</i> confirmed that the double Δ<i>ntrBC</i> mutant demonstrated >40% inhibition of biofilm formation, a nearly complete inhibition of swarming motility, and a modest decrease and altered surfing motility colony appearance; with the exception of swarming, single mutants generally had more subtle or no changes. Transcriptional profiles of deletion mutants under swarming conditions were defined using RNA-Seq and unveiled dysregulated expression of hundreds of genes implicated in virulence in PA14 and LESB58 chronic lung infections, as well as carbon and nitrogen metabolism. Thus, transcriptional profiles were validated by testing responsiveness of mutants to several key intermediates of central metabolic pathways. These results indicate that NtrBC is a global regulatory system involved in both pathological and physiological processes relevant to the success of <i>Pseudomonas</i> in high-density infection.

    Frontiers in microbiology 2020;11;773

  • Population Structure, Stratification, and Introgression of Human Structural Variation.

    Almarri MA, Bergström A, Prado-Martinez J, Yang F, Fu B, Dunham AS, Chen Y, Hurles ME, Tyler-Smith C and Xue Y

    Wellcome Sanger Institute, Hinxton CB10 1SA, UK.

    Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.

    Cell 2020

  • Molecular characterization of Brucella ovis in Argentina.

    Alvarez LP, Ruiz-Villalobos N, Suárez-Esquivel M, Thomson NR, Marcellino R, Víquez-Ruiz E, Robles CA and Guzmán-Verri C

    Centro de Referencia en Levaduras y Tecnología Cervecera (CRELTEC), Instituto Andino Patagónico de Tecnologías Biológicas y Geoambientales (IPATEC), Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional del Comahue, San Carlos de Bariloche, Río Negro, Argentina.

    Brucellosis in rams is caused by Brucella ovis or Brucella melitensis and it is considered one of the most important infectious diseases of males in sheep-raising countries. Molecular characterization of Brucella spp. achieved by multi-locus variable number of tandem repeats analysis (MLVA) is a powerful tool to genotype Brucella spp. However, data regarding B. ovis genotyping is scarce. Thus, the aim of this study was to characterize the molecular diversity of B. ovis field-strains in Argentina. A total of 115 isolates of B. ovis from Argentina and Uruguay were genotyped using MLVA-16 and analyzed altogether with 14 publicly available B. ovis genotypes from Brazil. The Discriminatory Power (D) was 0.996 for MLVA-16 and 0.0998 for MLVA-8 and MLVA-11. Analysis of MLVA-16 revealed 100 different genotypes, all of them novel, including 90 unique ones. There was no correlation between geographical distribution and genotype and results showed a higher diversity within provinces than between provinces. Clustering analysis of the strains from Argentina, Uruguay and Brazil revealed that the 129 isolates were grouped into two clades. Whole Genome Sequencing analysis of the 19 B. ovis genomes available in public databases, and including some of the Argentinian strains used in this study, revealed clustering of the Argentinian isolates and closer relationship with B. ovis from New Zealand and Australia. This work adds new data to the poorly understood distribution map of genotypes regionally and worldwide for B. ovis and it constitutes the largest study of B. ovis molecular genotyping until now.

    Veterinary microbiology 2020;245;108703
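
    The abstract above reports a Discriminatory Power (D) for each MLVA panel without saying how it was computed. A minimal sketch, assuming D is the Hunter-Gaston discriminatory index (an assumption, not stated in the abstract), shows how such a value could be derived from a list of genotype labels. The function name and the toy data are hypothetical.

```python
from collections import Counter


def discriminatory_power(genotypes):
    """Hunter-Gaston index: probability that two isolates drawn at random
    (without replacement) have different genotypes.

    Assumption: this is the D referred to in the abstract.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(genotypes).values()
    # Sum over genotypes of n_j * (n_j - 1), normalised by N * (N - 1).
    same_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_pairs / (n * (n - 1))


# Hypothetical toy data: six isolates, genotype "A" observed three times.
toy = ["A", "A", "A", "B", "C", "D"]
print(round(discriminatory_power(toy), 3))
```

    A value near 1 means almost every isolate has its own genotype; a value near 0 means the panel barely separates the isolates.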

  • Schistosoma species detection by environmental DNA assays in African freshwaters.

    Alzaylaee H, Collins RA, Rinaldi G, Shechonge A, Ngatunga B, Morgan ER and Genner MJ

    School of Biological Sciences, University of Bristol, Life Sciences Building, Bristol, United Kingdom.

    Background: Schistosomiasis is a neglected tropical parasitic disease associated with severe pathology, mortality and economic loss worldwide. Programs for disease control may benefit from specific and sensitive diagnostic methods to detect Schistosoma trematodes in aquatic environments. Here we report the development of novel environmental DNA (eDNA) qPCR assays for the presence of the human-infecting species Schistosoma mansoni, S. haematobium and S. japonicum.

    Methodology/principal findings: We first tested the specificity of the assays across the three species using genomic DNA preparations which showed successful amplification of target sequences with no cross amplification between the three focal species. In addition, we evaluated the specificity of the assays using synthetic DNA of multiple Schistosoma species, and demonstrated a high overall specificity; however, S. japonicum and S. haematobium assays showed cross-species amplification with very closely-related species. We next tested the effectiveness of the S. mansoni assay using eDNA samples from aquaria containing infected host gastropods, with the target species revealed as present in all infected aquaria. Finally, we evaluated the effectiveness of the S. mansoni and S. haematobium assays using eDNA samples from eight discrete natural freshwater sites in Tanzania, and demonstrated strong correspondence between infection status established using eDNA and conventional assays of parasite prevalence in host snails.

    Conclusions/significance: Collectively, our results suggest that eDNA monitoring is able to detect schistosomes in freshwater bodies, but refinement of the field sampling, storage and assay methods are likely to optimise its performance. We anticipate that environmental DNA-based approaches will help to inform epidemiological studies and contribute to efforts to control and eliminate schistosomiasis in endemic areas.

    PLoS neglected tropical diseases 2020;14;3;e0008129

  • Professional duties are now considered legal duties of care within genomic medicine.

    Anna M, Christine P, Jonathan R, Richard M, Alessia C, Lauren R and Jerome A

    Society and Ethics Research, Connecting Science, Wellcome Genome Campus, Cambridge, UK.

    The legal duty to protect patient confidentiality is common knowledge amongst healthcare professionals. However, what may not be widely known is that this duty is not always absolute. In the United Kingdom, the General Medical Council, which governs the practice of all doctors, and many other professional codes of practice recognise that, under certain circumstances, it may be appropriate to break confidentiality. This arises when there is a wider duty to protect the health of others, and when the risk of non-disclosure outweighs the potential harm from breaking confidentiality. We discuss this situation specifically in relation to genomic medicine where relatives in a family may have differing views on the sharing of familial genetic information. Overruling a patient's wishes is predicated on balancing the duty of care towards the patient versus protecting their relative from serious harm. We discuss the practice implications of a pivotal legal case that concluded recently in the High Court of Justice in England and Wales, ABC v St Georges Healthcare NHS Trust & Ors. Professional guidance is already clear that genetic healthcare professionals must undertake a balancing exercise to weigh up contradictory duties of care. However, the judge has provided a new legal weighting to these professional duties: 'The scope of the duty extends not only to conducting the necessary balancing exercise but also to acting in accordance with its outcome' [1: 189]. In the context of genomic medicine, this has important consequences for clinical practice.

    Funded by: Wellcome Trust (Wellcome): 206194

    European journal of human genetics : EJHG 2020

  • In situ CRISPR-Cas9 base editing for the development of genetically engineered mouse models of breast cancer.

    Annunziato S, Lutz C, Henneman L, Bhin J, Wong K, Siteur B, van Gerwen B, de Korte-Grimmerink R, Zafra MP, Schatoff EM, Drenth AP, van der Burg E, Eijkman T, Mukherjee S, Boroviak K, Wessels LF, van de Ven M, Huijbers IJ, Adams DJ, Dow LE and Jonkers J

    Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    Genetically engineered mouse models (GEMMs) of cancer have proven to be of great value for basic and translational research. Although CRISPR-based gene disruption offers a fast-track approach for perturbing gene function and circumvents certain limitations of standard GEMM development, it does not provide a flexible platform for recapitulating clinically relevant missense mutations in vivo. To this end, we generated knock-in mice with Cre-conditional expression of a cytidine base editor and tested their utility for precise somatic engineering of missense mutations in key cancer drivers. Upon intraductal delivery of sgRNA-encoding vectors, we could install point mutations with high efficiency in one or multiple endogenous genes in situ and assess the effect of defined allelic variants on mammary tumorigenesis. While the system also produces bystander insertions and deletions that can stochastically be selected for when targeting a tumor suppressor gene, we could effectively recapitulate oncogenic nonsense mutations. We successfully applied this system in a model of triple-negative breast cancer, providing the proof of concept for extending this flexible somatic base editing platform to other tissues and tumor types.

    Funded by: Cancer Genomics Netherlands (CGCNL): 024001028; Cancer Systems Biology Center (CSBC): 85300120; EC | FP7 | FP7 Ideas: European Research Council (FP7 Ideas); ERC Synergy project CombatCancer: 319661; National Roadmap grant for Large-Scale Research Facilities: 184032303; Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO): VICI 91814643; Netherlands Genomics Initiative (NGI) Zenith: 93512009; Oncode Institute

    The EMBO journal 2020;39;5;e102169

  • Development of copy number assays for detection and surveillance of piperaquine resistance associated plasmepsin 2/3 copy number variation in Plasmodium falciparum.

    Ansbro MR, Jacob CG, Amato R, Kekre M, Amaratunga C, Sreng S, Suon S, Miotto O, Fairhurst RM, Wellems TE and Kwiatkowski DP

    Wellcome Sanger Institute, Hinxton, UK.

    Background: Long regarded as an epicenter of drug-resistant malaria, Southeast Asia continues to provide new challenges to the control of Plasmodium falciparum malaria. Recently, resistance to the artemisinin combination therapy partner drug piperaquine has been observed in multiple locations across Southeast Asia. Genetic studies have identified single nucleotide polymorphisms as well as copy number variations in the plasmepsin 2 and plasmepsin 3 genes, which encode haemoglobin-degrading proteases that associate with clinical and in vitro piperaquine resistance.

    Results: To accurately and quickly determine the presence of copy number variations in the plasmepsin 2/3 genes in field isolates, this study developed a quantitative PCR assay using TaqMan probes. Copy number estimates were validated using a separate SYBR green-based quantitative PCR assay as well as a novel PCR-based breakpoint assay to detect the hybrid gene product. Field samples from 2012 to 2015 across three sites in Cambodia were tested using DNA extracted from dried blood spots and whole blood to monitor the extent of plasmepsin 2/3 gene amplifications, as well as amplifications in the multidrug resistance transporter 1 gene (pfmdr1), a marker of mefloquine resistance. This study found high concordance across all methods of copy number detection. For samples derived from dried blood spots, a success rate greater than 80% was found in each assay, with more recent samples performing better. Evidence of extensive plasmepsin 2/3 copy number amplifications was observed in Pursat (94%, 2015) (Western Cambodia) and Preah Vihear (87%, 2014) (Northern Cambodia), and lower levels in Ratanakiri (16%, 2014) (Eastern Cambodia). A shift was observed from two copies of plasmepsin 2 in Pursat in 2013 to three copies in 2014-2015 (25% to 64%). Pfmdr1 amplifications were absent in all samples from Preah Vihear and Ratanakiri in 2014 and absent in Pursat in 2015.

    Conclusions: The multiplex TaqMan assay is a robust tool for monitoring both plasmepsin 2/3 and pfmdr1 copy number variations in field isolates, and the SYBR-green and breakpoint assays are useful for monitoring plasmepsin 2/3 amplifications. This study shows increasing levels of plasmepsin 2 copy numbers across Cambodia from 2012 to 2015 and a complete reversion of multicopy pfmdr1 parasites to single copy parasites in all study locations.

    Funded by: Bill and Melinda Gates Foundation: OPP1118166; Department for International Development: MR/M005212/1; Medical Research Council UK: G0600718; Wellcome Trust: 090770/Z/09/Z, 098051, 206194

    Malaria journal 2020;19;1;181
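
    The abstract above describes estimating gene copy number by quantitative PCR but does not give the calculation. A minimal sketch of one common approach, the comparative Ct (ΔΔCt) method, is shown below. This is an assumption for illustration and not necessarily the calculation used in the study. It assumes roughly 100% amplification efficiency, a single-copy reference gene, and a calibrator sample of known copy number. All names and Ct values are hypothetical.

```python
def copy_number(ct_target, ct_reference, cal_target, cal_reference,
                calibrator_copies=1):
    """Estimate target-gene copy number with the comparative Ct method.

    ct_target / ct_reference: Ct values for the test sample.
    cal_target / cal_reference: Ct values for a calibrator sample whose
    target copy number is known (calibrator_copies).
    Assumes each PCR cycle doubles the product (100% efficiency).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = cal_target - cal_reference
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * 2.0 ** -delta_delta


# Hypothetical Ct values: the target amplifies one cycle earlier in the
# sample than in a single-copy calibrator, relative to the reference gene.
print(copy_number(24.0, 25.0, 25.0, 25.0))
```

    In practice, estimates would be rounded to the nearest integer and replicates compared before calling an amplification.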

  • Tet3 ablation in adult brain neurons increases anxiety-like behavior and regulates cognitive function in mice.

    Antunes C, Da Silva JD, Guerra-Gomes S, Alves ND, Ferreira F, Loureiro-Campos E, Branco MR, Sousa N, Reik W, Pinto L and Marques CJ

    Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, 4710-057, Braga, Portugal.

    TET3 is a member of the ten-eleven translocation (TET) family of enzymes which oxidize 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC). Tet3 is highly expressed in the brain, where 5hmC levels are most abundant. In adult mice, we observed that TET3 is present in mature neurons and oligodendrocytes but is absent in astrocytes. To investigate the function of TET3 in adult postmitotic neurons, we crossed Tet3 floxed mice with a neuronal Cre-expressing mouse line, Camk2a-CreERT2, obtaining a Tet3 conditional KO (cKO) mouse line. Ablation of Tet3 in adult mature neurons resulted in increased anxiety-like behavior with concomitant hypercorticalism, and impaired hippocampal-dependent spatial orientation. Transcriptome and gene-specific expression analysis of the hippocampus showed dysregulation of genes involved in glucocorticoid signaling pathway (HPA axis) in the ventral hippocampus, whereas upregulation of immediate early genes was observed in both dorsal and ventral hippocampal areas. In addition, Tet3 cKO mice exhibit increased dendritic spine maturation in the ventral CA1 hippocampal subregion. Based on these observations, we suggest that TET3 is involved in molecular alterations that govern hippocampal-dependent functions. These results reveal a critical role for epigenetic modifications in modulating brain functions, opening new insights into the molecular basis of neurological disorders.

    Molecular psychiatry 2020

  • Type 1 Interferon Responses Underlie Tumor-Selective Replication of Oncolytic Measles Virus.

    Aref S, Castleton AZ, Bailey K, Burt R, Dey A, Leongamornlert D, Mitchell RJ, Okasha D and Fielding AK

    UCL Cancer Institute, London WC1E 6DD, UK.

    The mechanism of tumor-selective replication of oncolytic measles virus (MV) is poorly understood. Using a stepwise model of cellular transformation, in which oncogenic hits were additively expressed in human bone marrow-derived mesenchymal stromal cells, we show that MV-induced oncolysis increased progressively with transformation. The type 1 interferon (IFN) response to MV infection was significantly reduced and delayed, in accordance with the level of transformation. Consistently, we observed delayed and reduced signal transducer and activator of transcription (STAT1) phosphorylation in the fully transformed cells. Pre-treatment with IFNβ restored resistance to MV-mediated oncolysis. Gene expression profiling to identify the genetic correlates of susceptibility to MV oncolysis revealed a dampened basal level of immune-related genes in the fully transformed cells compared to their normal counterparts. IFN-induced transmembrane protein 1 (IFITM1) was the foremost basally downregulated immune gene. Stable IFITM1 overexpression in MV-susceptible cells resulted in a 50% increase in cell viability and a significant reduction in viral replication at 24 h after MV infection. Overall, our data indicate that the basal reduction in functions of the type 1 IFN pathway is a major contributor to the oncolytic selectivity of MV. In particular, we have identified IFITM1 as a restriction factor for oncolytic MV, acting at early stages of infection.

    Molecular therapy : the journal of the American Society of Gene Therapy 2020

  • MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data.

    Argelaguet R, Arnol D, Bredikhin D, Deloro Y, Velten B, Marioni JC and Stegle O

    European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.

    Technological advances have enabled the profiling of multiple molecular layers at single-cell resolution, assaying cells from multiple samples or conditions. Consequently, there is a growing need for computational strategies to analyze data from complex experimental designs that include multiple data modalities and multiple groups of samples. We present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of single-cell multi-modal data. MOFA+ reconstructs a low-dimensional representation of the data using computationally efficient variational inference and supports flexible sparsity constraints, allowing joint modeling of variation across multiple sample groups and data modalities.

    Genome biology 2020;21;1;111

  • Integrating whole-genome sequencing within the National Antimicrobial Resistance Surveillance Program in the Philippines.

    Argimón S, Masim MAL, Gayeta JM, Lagrada ML, Macaranas PKV, Cohen V, Limas MT, Espiritu HO, Palarca JC, Chilam J, Jamoralin MC, Villamin AS, Borlasa JB, Olorosa AM, Hernandez LFT, Boehme KD, Jeffrey B, Abudahab K, Hufano CM, Sia SB, Stelling J, Holden MTG, Aanensen DM and Carlos CC

    Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK.

    National networks of laboratory-based surveillance of antimicrobial resistance (AMR) monitor resistance trends and disseminate these data to AMR stakeholders. Whole-genome sequencing (WGS) can support surveillance by pinpointing resistance mechanisms and uncovering transmission patterns. However, genomic surveillance is rare in low- and middle-income countries. Here, we implement WGS within the established Antimicrobial Resistance Surveillance Program of the Philippines via a binational collaboration. In parallel, we characterize bacterial populations of key bug-drug combinations via a retrospective sequencing survey. By linking the resistance phenotypes to genomic data, we reveal the interplay of genetic lineages (strains), AMR mechanisms, and AMR vehicles underlying the expansion of specific resistance phenotypes that coincide with the growing carbapenem resistance rates observed since 2010. Our results enhance our understanding of the drivers of carbapenem resistance in the Philippines, while also serving as the genetic background to contextualize ongoing local prospective surveillance.

    Nature communications 2020;11;1;2719

  • gplas: a comprehensive tool for plasmid analysis using short-read graphs.

    Arredondo-Alonso S, Bootsma M, Hein Y, Rogers MRC, Corander J, Willems RJL and Schürch AC

    Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands.

    Summary: Plasmids can horizontally transmit genetic traits, enabling rapid bacterial adaptation to new environments and hosts. Short-read whole-genome sequencing data is often applied to large-scale bacterial comparative genomics projects but the reconstruction of plasmids from these data is facing severe limitations, such as the inability to distinguish plasmids from each other in a bacterial genome. We developed gplas, a new approach to reliably separate plasmid contigs into discrete components using sequence composition, coverage, assembly graph information and network partitioning based on a pruned network of plasmid unitigs. Gplas facilitates the analysis of large numbers of bacterial isolates and allows a detailed analysis of plasmid epidemiology based solely on short read sequence data.

    Availability and implementation: Gplas is written in R, Bash and uses a Snakemake pipeline as a workflow management system. Gplas is available under the GNU General Public License v3.0 at

    Bioinformatics (Oxford, England) 2020

  • Plasmids Shaped the Recent Emergence of the Major Nosocomial Pathogen Enterococcus faecium.

    Arredondo-Alonso S, Top J, McNally A, Puranen S, Pesonen M, Pensar J, Marttinen P, Braat JC, Rogers MRC, van Schaik W, Kaski S, Willems RJL, Corander J and Schürch AC

    Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands.

    <i>Enterococcus faecium</i> is a gut commensal of humans and animals but is also listed on the WHO global priority list of multidrug-resistant pathogens. Many of its antibiotic resistance traits reside on plasmids and have the potential to be disseminated by horizontal gene transfer. Here, we present the first comprehensive population-wide analysis of the pan-plasmidome of a clinically important bacterium, by whole-genome sequence analysis of 1,644 isolates from hospital, commensal, and animal sources of <i>E. faecium</i>. Long-read sequencing on a selection of isolates resulted in the completion of 305 plasmids that exhibited high levels of sequence modularity. We further investigated the entirety of all plasmids of each isolate (plasmidome) using a combination of short-read sequencing and machine-learning classifiers. Clustering of the plasmid sequences unraveled different <i>E. faecium</i> populations with a clear association with hospitalized patient isolates, suggesting different optimal configurations of plasmids in the hospital environment. The characterization of these populations allowed us to identify common mechanisms of plasmid stabilization such as toxin-antitoxin systems and genes exclusively present in particular plasmidome populations exemplified by copper resistance, phosphotransferase systems, or bacteriocin genes potentially involved in niche adaptation. Based on the distribution of k-mer distances between isolates, we concluded that plasmidomes rather than chromosomes are most informative for source specificity of <i>E. faecium</i>.

    <b>IMPORTANCE</b> <i>Enterococcus faecium</i> is one of the most frequent nosocomial pathogens of hospital-acquired infections. <i>E. faecium</i> has gained resistance against most commonly available antibiotics, most notably, against ampicillin, gentamicin, and vancomycin, which renders infections difficult to treat. Many antibiotic resistance traits, in particular, vancomycin resistance, can be encoded in autonomous and extrachromosomal elements called plasmids. These sequences can be disseminated to other isolates by horizontal gene transfer and confer novel mechanisms to source specificity. In our study, we elucidated the total plasmid content, referred to as the plasmidome, of 1,644 <i>E. faecium</i> isolates by using short- and long-read whole-genome technologies with the combination of a machine-learning classifier. This was fundamental to investigate the full collection of plasmid sequences present in our collection (pan-plasmidome) and to observe the potential transfer of plasmid sequences between <i>E. faecium</i> hosts. We observed that <i>E. faecium</i> isolates from hospitalized patients carried a larger number of plasmid sequences compared to that from other sources, and they elucidated different configurations of plasmidome populations in the hospital environment. We assessed the contribution of different genomic components and observed that plasmid sequences have the highest contribution to source specificity. Our study suggests that <i>E. faecium</i> plasmids are regulated by complex ecological constraints rather than physical interaction between hosts.

    mBio 2020;11;1

  • The secreted protease Adamts18 links hormone action to activation of the mammary stem cell niche.

    Ataca D, Aouad P, Constantin C, Laszlo C, Beleut M, Shamseddin M, Rajaram RD, Jeitziner R, Mead TJ, Caikovski M, Bucher P, Ambrosini G, Apte SS and Brisken C

    Ecole Polytechnique Fédérale de Lausanne, Station 19, CH-1015, Lausanne, Switzerland.

    Estrogens and progesterone control breast development and carcinogenesis via their cognate receptors expressed in a subset of luminal cells in the mammary epithelium. How they control the extracellular matrix, important to breast physiology and tumorigenesis, remains unclear. Here we report that both hormones induce the secreted protease Adamts18 in myoepithelial cells by controlling Wnt4 expression with consequent paracrine canonical Wnt signaling activation. Adamts18 is required for stem cell activation, has multiple binding partners in the basement membrane and interacts genetically with the basal membrane-specific proteoglycan, Col18a1, pointing to the basement membrane as part of the stem cell niche. In vitro, ADAMTS18 cleaves fibronectin; in vivo, Adamts18 deletion causes increased collagen deposition during puberty, which results in impaired Hippo signaling and reduced Fgfr2 expression both of which control stem cell function. Thus, Adamts18 links luminal hormone receptor signaling to basement membrane remodeling and stem cell activation.

    Funded by: Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Swiss National Science Foundation): 31003A_141248

    Nature communications 2020;11;1;1571

  • Adipose tissue-liver cross talk in the control of whole-body metabolism: implications in non-alcoholic fatty liver disease.

    Azzu V, Vacca M, Virtue S, Allison M and Vidal-Puig A

    Wellcome Trust-MRC Institute of Metabolic Science-Metabolic Research Laboratories, Level 4, Box 289, Addenbrooke's Hospital, Cambridge, CB2 0QQ, United Kingdom; The Liver Unit, Department of Medicine, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Hills Road, Cambridge, CB2 0QQ, United Kingdom.

    Adipose tissue and the liver play a significant role in the regulation of whole body energy homeostasis, but they have not evolved to cope with the continuous, chronic, nutrient surplus seen in obesity. In this review, we detail how prolonged metabolic stress leads to adipose tissue dysfunction, inflammation and adipokine release that results in increased lipid flux to the liver. Overall, the upshot of hepatic fat accumulation alongside an insulin resistant state is that hepatic lipid enzymatic pathways are modulated and overwhelmed, resulting in the selective build-up of toxic lipid species, which worsens the pro-inflammatory and pro-fibrotic shift observed in NASH.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H002731/1; Medical Research Council: G0400192, G0802051, MC_UU_12012/2

    Gastroenterology 2020

  • Microarray analyses reveal strain-specific antibody responses to Plasmodium falciparum apical membrane antigen 1 variants following natural infection and vaccination.

    Bailey JA, Berry AA, Travassos MA, Ouattara A, Boudova S, Dotsey EY, Pike A, Jacob CG, Adams M, Tan JC, Bannen RM, Patel JJ, Pablo J, Nakajima R, Jasinskas A, Dutta S, Takala-Harrison S, Lyke KE, Laurens MB, Niangaly A, Coulibaly D, Kouriba B, Doumbo OK, Thera MA, Felgner PL and Plowe CV

    Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA.

    Vaccines based on Plasmodium falciparum apical membrane antigen 1 (AMA1) have failed due to extensive polymorphism in AMA1. To assess the strain-specificity of antibody responses to malaria infection and AMA1 vaccination, we designed protein and peptide microarrays representing hundreds of unique AMA1 variants. Following clinical malaria episodes, children had short-lived, sequence-independent increases in average whole-protein seroreactivity, as well as strain-specific responses to peptides representing diverse epitopes. Vaccination resulted in dramatically increased seroreactivity to all 263 AMA1 whole-protein variants. High-density peptide analysis revealed that vaccinated children had increases in seroreactivity to four distinct epitopes that exceeded responses to natural infection. A single amino acid change was critical to seroreactivity to peptides in a region of AMA1 associated with strain-specific vaccine efficacy. Antibody measurements using whole antigens may be biased towards conserved, immunodominant epitopes. Peptide microarrays may help to identify immunogenic epitopes, define correlates of vaccine protection, and measure strain-specific vaccine-induced antibodies.

    Funded by: Division of Intramural Research, National Institute of Allergy and Infectious Diseases (Division of Intramural Research of the NIAID): R21AI119733; NIAID NIH HHS: K23 AI125720, R01 AI093635, R21 AI119733, T32 AI007524, U01 AI065683, U19 AI129386; U.S. Department of Health & Human Services | NIH | Fogarty International Center (FIC): D43TW001589; U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI): R01HL130750; U.S. Department of Health & Human Services | National Institutes of Health (NIH): R01AI093635, U01AI065683

    Scientific reports 2020;10;1;3952

  • Expert curation of the human and mouse olfactory receptor gene repertoires identifies conserved coding regions split across two exons.

    Barnes IHA, Ibarra-Soria X, Fitzgerald S, Gonzalez JM, Davidson C, Hardy MP, Manthravadi D, Van Gerven L, Jorissen M, Zeng Z, Khan M, Mombaerts P, Harrow J, Logan DW and Frankish A

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    Background: Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 in human and 1483 loci in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences.

    Results: Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon.

    Conclusions: This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.

    Funded by: NHGRI NIH HHS: 2U41HG007234

    BMC genomics 2020;21;1;196

  • Sociodemographic inequities associated with participation in leisure-time physical activity in sub-Saharan Africa: an individual participant data meta-analysis.

    Barr AL, Partap U, Young EH, Agoudavi K, Balde N, Kagaruki GB, Mayige MT, Longo-Mbenza B, Mutungi G, Mwalim O, Wesseh CS, Bahendeka SK, Guwatudde D, Jørgensen JMA, Bovet P, Motala AA and Sandhu MS

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Background: Leisure-time physical activity (LTPA) is an important contributor to total physical activity and the focus of many interventions promoting activity in high-income populations. Little is known about LTPA in sub-Saharan Africa (SSA), and with expected declines in physical activity due to rapid urbanisation and lifestyle changes we aimed to assess the sociodemographic differences in the prevalence of LTPA in the adult populations of this region to identify potential barriers for equitable participation.

    Methods: A two-step individual participant data meta-analysis was conducted using data collected in SSA through 10 population health surveys that included the Global Physical Activity Questionnaire. For each sociodemographic characteristic, the pooled adjusted prevalence and risk ratios (RRs) for participation in LTPA were calculated using the random effects method. Between-study heterogeneity was explored through meta-regression analyses and tests for interaction.

    Results: Across the 10 populations (N = 26,022), 18.9% (95%CI: 14.3, 24.1; I<sup>2</sup> = 99.0%) of adults (≥ 18 years) participated in LTPA. Men were more likely to participate in LTPA compared with women (RR for women: 0.43; 95%CI: 0.32, 0.60; P < 0.001; I<sup>2</sup> = 97.5%), while age was inversely associated with participation. Higher levels of education were associated with increased LTPA participation (RR: 1.30; 95%CI: 1.09, 1.55; P = 0.004; I<sup>2</sup> = 98.1%), with those living in rural areas or self-employed less likely to participate in LTPA. These associations remained after adjusting for time spent physically active at work or through active travel.

    Conclusions: In these populations, participation in LTPA was low, and strongly associated with sex, age, education, self-employment and urban residence. Identifying the potential barriers that reduce participation in these groups is necessary to enable equitable access to the health and social benefits associated with LTPA.

    Funded by: Medical Research Foundation: MR/K013491/1; Wellcome Trust: WT206194

    BMC public health 2020;20;1;927

  • Nosocomial outbreak of the Middle East Respiratory Syndrome coronavirus: A phylogenetic, epidemiological, clinical and infection control analysis.

    Barry M, Phan MV, Akkielah L, Al-Majed F, Alhetheel A, Somily A, Alsubaie SS, McNabb SJ, Cotten M, Zumla A and Memish ZA

    Infectious Diseases Division, Faculty of Medicine, King Khalid University Hospital, King Saud University, Riyadh, Saudi Arabia.

    Background: Middle East Respiratory Syndrome coronavirus (MERS-CoV) continues to cause intermittent community and nosocomial outbreaks. Obtaining data on specific source(s) and transmission dynamics of MERS-CoV during nosocomial outbreaks has been challenging. We performed a clinical, epidemiological and phylogenetic investigation of an outbreak of MERS-CoV at a University Hospital in Riyadh, Kingdom of Saudi Arabia.

    Methods: Clinical, epidemiological and infection control data were obtained from patients and healthcare workers (HCWs). Full genome sequencing was conducted on nucleic acid extracted directly from MERS-CoV PCR-confirmed clinical samples, and phylogenetic analysis was performed in combination with published MERS-CoV genomes. HCW compliance with infection control practices was also assessed.

    Results: Of 235 persons investigated, 23 were laboratory-confirmed MERS cases: 10 inpatients and 13 HCWs. Eight of 10 MERS inpatients died (80% mortality). There were no deaths among HCWs. The primary index case assumed from epidemiological investigation was not substantiated phylogenetically. 17/18 of MERS cases were linked both phylogenetically and epidemiologically. One asymptomatic HCW yielded a MERS-CoV genome not directly linked to any other case in the investigation. Five HCWs with mild symptoms yielded >75% full MERS-CoV genome sequences. HCW compliance with use of gowns was 62.1%, gloves 69.7%, and masks 57.6%.

    Conclusions: Several factors and sources, including an HCW MERS-CoV 'carrier phenomenon', occur during nosocomial MERS-CoV outbreaks. Phylogenetic analysis of MERS-CoV linked to clinical and epidemiological information is essential for outbreak investigation. The specific role of apparently healthy HCWs in causing nosocomial outbreaks requires further definition.

    Travel medicine and infectious disease 2020;101807

  • Mouse Models of Myeloid Malignancies.

    Basheer F and Vassiliou G

    Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Department of Haematology, University of Cambridge, Cambridge CB2 0AW, United Kingdom.

    Mouse models of human myeloid malignancies support the detailed and focused investigation of selected driver mutations and represent powerful tools in the study of these diseases. Carefully developed murine models can closely recapitulate human myeloid malignancies in vivo, enabling the interrogation of a number of aspects of these diseases including their preclinical course, interactions with the microenvironment, effects of pharmacological agents, and the role of non-cell-autonomous factors, as well as the synergy between co-occurring mutations. Importantly, advances in gene-editing technologies, particularly CRISPR-Cas9, have opened new avenues for the development and study of genetically modified mice and also enable the direct modification of mouse and human hematopoietic cells. In this review we provide a concise overview of some of the important mouse models that have advanced our understanding of myeloid leukemogenesis with an emphasis on models relevant to clonal hematopoiesis, myelodysplastic syndromes, and acute myeloid leukemia with a normal karyotype.

    Cold Spring Harbor perspectives in medicine 2020

  • Dissecting the early steps of MLL induced leukaemogenic transformation using a mouse model of AML.

    Basilico S, Wang X, Kennedy A, Tzelepis K, Giotopoulos G, Kinston SJ, Quiros PM, Wong K, Adams DJ, Carnevalli LS, Huntly BJP, Vassiliou GS, Calero-Nieto FJ and Göttgens B

    Wellcome and MRC Cambridge Stem Cell Institute and University of Cambridge Department of Haematology, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge, CB2 0AW, UK.

    Leukaemogenic mutations commonly disrupt cellular differentiation and/or enhance proliferation, thus perturbing the regulatory programs that control self-renewal and differentiation of stem and progenitor cells. Translocations involving the Mll1 (Kmt2a) gene generate powerful oncogenic fusion proteins, predominantly affecting infant and paediatric AML and ALL patients. The early stages of leukaemogenic transformation are typically inaccessible from human patients and conventional mouse models. Here, we take advantage of cells conditionally blocked at the multipotent haematopoietic progenitor stage to develop a MLL-r model capturing early cellular and molecular consequences of MLL-ENL expression based on a clear clonal relationship between parental and leukaemic cells. Through a combination of scRNA-seq, ATAC-seq and genome-scale CRISPR-Cas9 screening, we identify pathways and genes likely to drive the early phases of leukaemogenesis. Finally, we demonstrate the broad utility of using matched parental and transformed cells for small molecule inhibitor studies by validating both previously known and other potential therapeutic targets.

    Nature communications 2020;11;1;1407

  • Acral lentiginous melanoma: Basic facts, biological characteristics and research perspectives of an understudied disease.

    Basurto-Lozada P, Molina-Aguilar C, Castañeda-Garcia C, Vázquez-Cruz ME, Garcia-Salinas OI, Álvarez-Cano A, Martínez-Said H, Roldán-Marín R, Adams DJ, Possik PA and Robles-Espinoza CD

    Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro, 76230, México.

    Acral lentiginous melanoma is a histological subtype of cutaneous melanoma that occurs in the glabrous skin of the palms, soles and in the nail unit. Although in some countries, particularly in Latin America, Africa and Asia, it represents the most frequently diagnosed subtype of the disease, it only represents a small proportion of melanoma cases in European-descent populations, which is partially why it has not been studied to the same extent as other forms of melanoma. As a result, its unique genomic drivers remain comparatively poorly explored, as well as its causes, with current evidence supporting a UV-independent path to tumorigenesis. In this Review, we discuss current knowledge of the aetiology and diagnostic criteria of acral lentiginous melanoma, as well as its epidemiological and histopathological characteristics. We also describe what is known about the genomic landscape of this disease and review the available biological models to explore potential therapeutic targets.

    Pigment cell & melanoma research 2020

  • Identification of region-specific astrocyte subtypes at single cell resolution.

    Batiuk MY, Martirosyan A, Wahis J, de Vin F, Marneffe C, Kusserow C, Koeppen J, Viana JF, Oliveira JF, Voet T, Ponting CP, Belgard TG and Holt MG

    Laboratory of Glia Biology, VIB-KU Leuven Center for Brain and Disease Research, Leuven, Belgium.

    Astrocytes, a major cell type found throughout the central nervous system, have general roles in the modulation of synapse formation and synaptic transmission, blood-brain barrier formation, and regulation of blood flow, as well as metabolic support of other brain resident cells. Crucially, emerging evidence shows specific adaptations and astrocyte-encoded functions in regions, such as the spinal cord and cerebellum. To investigate the true extent of astrocyte molecular diversity across forebrain regions, we used single-cell RNA sequencing. Our analysis identifies five transcriptomically distinct astrocyte subtypes in adult mouse cortex and hippocampus. Validation of our data in situ reveals distinct spatial positioning of defined subtypes, reflecting the distribution of morphologically and physiologically distinct astrocyte populations. Our findings are evidence for specialized astrocyte subtypes between and within brain regions. The data are available through an online database, providing a resource on which to base explorations of local astrocyte diversity and function in the brain.

    Nature communications 2020;11;1;1220

  • Evolution of Salmonella enterica serotype Typhimurium driven by anthropogenic selection and niche adaptation.

    Bawn M, Alikhan NF, Thilliez G, Kirkwood M, Wheeler NE, Petrovska L, Dallman TJ, Adriaenssens EM, Hall N and Kingsley RA

    Quadram Institute Biosciences, Norwich Research Park, Norwich, United Kingdom.

    Salmonella enterica serotype Typhimurium (S. Typhimurium) is a leading cause of gastroenteritis and bacteraemia worldwide, and a model organism for the study of host-pathogen interactions. Two S. Typhimurium strains (SL1344 and ATCC14028) are widely used to study host-pathogen interactions, yet genotypic variation results in strains with diverse host range, pathogenicity and risk to food safety. The population structure of diverse strains of S. Typhimurium revealed a major phylogroup of predominantly sequence type 19 (ST19) and minor of ST36. The major phylogroup had a population structure with two high order clades (α and β) and multiple subclades on extended internal branches, that exhibited distinct signatures of host adaptation and anthropogenic selection. Clade α contained a number of subclades composed of strains from well characterized epidemics in domesticated animals, while clade β contained multiple subclades associated with wild avian species. The contrasting epidemiology of strains in clade α and β was reflected by the distinct distribution of antimicrobial resistance (AMR) genes, accumulation of hypothetically disrupted coding sequences (HDCS), and signatures of functional diversification. These observations were consistent with elevated anthropogenic selection of clade α lineages from adaptation to circulation in populations of domesticated livestock, and the predisposition of clade β lineages to undergo adaptation to an invasive lifestyle by a process of convergent evolution with host adapted Salmonella serotypes. Gene flux was predominantly driven by acquisition and recombination of prophage and associated cargo genes, with only occasional loss of these elements. The acquisition of large chromosomally-encoded genetic islands was limited, but notably, a feature of two recent pandemic clones (DT104 and monophasic S. Typhimurium ST34) of clade α (SGI-1 and SGI-4).

    PLoS genetics 2020;16;6;e1008850

  • Astrocyte layers in the mammalian cerebral cortex revealed by a single-cell in situ transcriptomic map.

    Bayraktar OA, Bartels T, Holmqvist S, Kleshchevnikov V, Martirosyan A, Polioudakis D, Ben Haim L, Young AMH, Batiuk MY, Prakash K, Brown A, Roberts K, Paredes MF, Kawaguchi R, Stockley JH, Sabeur K, Chang SM, Huang E, Hutchinson P, Ullian EM, Hemberg M, Coppola G, Holt MG, Geschwind DH and Rowitch DH

    Department of Paediatrics, Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK.

    Although the cerebral cortex is organized into six excitatory neuronal layers, it is unclear whether glial cells show distinct layering. In the present study, we developed a high-content pipeline, the large-area spatial transcriptomic (LaST) map, which can quantify single-cell gene expression in situ. Screening 46 candidate genes for astrocyte diversity across the mouse cortex, we identified superficial, mid and deep astrocyte identities in gradient layer patterns that were distinct from those of neurons. Astrocyte layer features, established in the early postnatal cortex, mostly persisted in adult mouse and human cortex. Single-cell RNA sequencing and spatial reconstruction analysis further confirmed the presence of astrocyte layers in the adult cortex. Satb2 and Reeler mutations that shifted neuronal post-mitotic development were sufficient to alter glial layering, indicating an instructive role for neuronal cues. Finally, astrocyte layer patterns diverged between mouse cortical regions. These findings indicate that excitatory neurons and astrocytes are organized into distinct lineage-associated laminae.

    Funded by: Fonds Wetenschappelijk Onderzoek (Research Foundation Flanders): 1523014N, G066715N; Internationale Stichting Alzheimer Onderzoek (ISAO): S#16025; NINDS NIH HHS: P30 NS062691; U.S. Department of Health & Human Services | National Institutes of Health (NIH): 1R01 MH109912, 1U01 MH105991, P01NS08351

    Nature neuroscience 2020;23;4;500-509

  • Reticular Fibroblasts Expressing the Transcription Factor WT1 Define a Stromal Niche That Maintains and Replenishes Splenic Red Pulp Macrophages.

    Bellomo A, Mondor I, Spinelli L, Lagueyrie M, Stewart BJ, Brouilly N, Malissen B, Clatworthy MR and Bajénoff M

    Aix Marseille Univ, CNRS, INSERM, CIML, Marseille, France.

    Located within red pulp cords, splenic red pulp macrophages (RPMs) are constantly exposed to the blood flow, clearing senescent red blood cells (RBCs) and recycling iron from hemoglobin. Here, we studied the mechanisms underlying RPM homeostasis, focusing on the involvement of stromal cells as these cells perform anchoring and nurturing macrophage niche functions in lymph nodes and liver. Microscopy revealed that RPMs are embedded in a reticular meshwork of red pulp fibroblasts characterized by the expression of the transcription factor Wilms' Tumor 1 (WT1) and colony stimulating factor 1 (CSF1). Conditional deletion of Csf1 in WT1<sup>+</sup> red pulp fibroblasts, but not white pulp fibroblasts, drastically altered the RPM network without altering circulating CSF1 levels. Upon RPM depletion, red pulp fibroblasts transiently produced the monocyte chemoattractants CCL2 and CCL7, thereby contributing to the replenishment of the RPM network. Thus, red pulp fibroblasts anchor and nurture RPM, a function likely conserved in humans.

    Immunity 2020

  • The fix is in.

    Bentley S

    Parasites and Microbes, Wellcome Sanger Institute, Hinxton, UK.

    Nature microbiology 2020;5;3;393-394

  • Insights into human genetic variation and population history from 929 diverse genomes.

    Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J, Blanché H, Deleuze JF, Cann H, Mallick S, Reich D, Sandhu MS, Skoglund P, Scally A, Xue Y, Durbin R and Tyler-Smith C

    Wellcome Sanger Institute, Hinxton CB10 1SA, UK.

    Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the past 10,000 years, and a contrast between single Neanderthal but multiple Denisovan source populations contributing to present-day human populations.

    Funded by: Cancer Research UK; European Research Council; Howard Hughes Medical Institute; Medical Research Council; Wellcome Trust

    Science (New York, N.Y.) 2020;367;6484

  • High-Resolution mRNA and Secretome Atlas of Human Enteroendocrine Cells.

    Beumer J, Puschhof J, Bauzá-Martinez J, Martínez-Silgado A, Elmentaite R, James KR, Ross A, Hendriks D, Artegiani B, Busslinger GA, Ponsioen B, Andersson-Rolf A, Saftien A, Boot C, Kretzschmar K, Geurts MH, Bar-Ephraim YE, Pleguezuelos-Manzano C, Post Y, Begthel H, van der Linden F, Lopez-Iglesias C, van de Wetering WJ, van der Linden R, Peters PJ, Heck AJR, Goedhart J, Snippert H, Zilbauer M, Teichmann SA, Wu W and Clevers H

    Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW) and UMC Utrecht, 3584 CT Utrecht, the Netherlands; Oncode Institute, Hubrecht Institute, 3584 CT Utrecht, the Netherlands.

    Enteroendocrine cells (EECs) sense intestinal content and release hormones to regulate gastrointestinal activity, systemic metabolism, and food intake. Little is known about the molecular make-up of human EEC subtypes and the regulated secretion of individual hormones. Here, we describe an organoid-based platform for functional studies of human EECs. EEC formation is induced in vitro by transient expression of NEUROG3. A set of gut organoids was engineered in which the major hormones are fluorescently tagged. A single-cell mRNA atlas was generated for the different EEC subtypes, and their secreted products were recorded by mass-spectrometry. We note key differences to murine EECs, including hormones, sensory receptors, and transcription factors. Notably, several hormone-like molecules were identified. Inter-EEC communication is exemplified by secretin-induced GLP-1 secretion. Indeed, individual EEC subtypes carry receptors for various EEC hormones. This study provides a rich resource to study human EEC development and function.

    Cell 2020

  • Unsupervised generative and graph representation learning for modelling cell differentiation.

    Bica I, Andrés-Terré H, Cvejic A and Liò P

    Department of Engineering Science, University of Oxford, Oxford, OX1 3PJ, United Kingdom.

    Using machine learning techniques to build representations from biomedical data can help us understand the latent biological mechanism of action and lead to important discoveries. Recent developments in single-cell RNA-sequencing protocols have allowed measuring gene expression for individual cells in a population, thus opening up the possibility of finding answers to biomedical questions about cell differentiation. In this paper, we explore unsupervised generative neural methods, based on the variational autoencoder, that can model cell differentiation by building meaningful representations from the high dimensional and complex gene expression data. We use disentanglement methods based on information theory to improve the data representation and achieve better separation of the biological factors of variation in the gene expression data. In addition, we use a graph autoencoder consisting of graph convolutional layers to predict relationships between single-cells. Based on these models, we develop a computational framework that consists of methods for identifying the cell types in the dataset, finding driver genes for the differentiation process and obtaining a better understanding of relationships between cells. We illustrate our methods on datasets from multiple species and also from different sequencing technologies.

    Funded by: Alan Turing Institute: EP/N510129/1

    Scientific reports 2020;10;1;9790

  • Leupaxin Expression Is Dispensable for B Cell Immune Responses.

    Bonaud A, Clare S, Bisio V, Sowerby JM, Yao S, Ostergaard H, Balabanian K, Smith KGC and Espéli M

    Inflammation Chemokines and Immunopathology, Institut National de la Santé et de la Recherche Medicale (INSERM), Faculté de Médecine, Université Paris-Sud, Université Paris-Saclay, Clamart, France.

    The generation of a potent humoral immune response by B cells relies on the integration of signals induced by the B cell receptor, toll-like receptors and both negative and positive co-receptors. Several reports also suggest that integrin signaling plays an important role in this process. How integrin signaling is regulated in B cells is, however, still only partially understood. Integrin activity and function are controlled by several mechanisms including regulation by molecular adaptors of the paxillin family. In B cells, Leupaxin (Lpxn) is the most expressed member of the family and <i>in vitro</i> studies suggest that it could dampen BCR signaling. Here, we report that <i>Lpxn</i> expression is increased in germinal center B cells compared to naïve B cells. Moreover, <i>Lpxn</i> deficiency leads to decreased B cell differentiation into plasma cells <i>in vitro</i>. However, Lpxn seems dispensable for the generation of a potent B cell immune response <i>in vivo</i>. Altogether our results suggest that Lpxn is dispensable for T-dependent and T-independent B cell immune responses.

    Frontiers in immunology 2020;11;466

  • Draft Genome Sequences of the Type Strains of Actinobacillus indolicus (46K2C) and Actinobacillus porcinus (NM319), Two NAD-Dependent Bacterial Species Found in the Respiratory Tract of Pigs.

    Bossé JT, Li Y, Fernandez Crespo R, Angen Ø, Holden MTG, Weinert LA, Maskell DJ, Tucker AW, Wren BW, Rycroft AN, Langford PR and BRaDP1T consortium

    Section of Paediatric Infectious Disease, Department of Infectious Disease, Imperial College London, London, United Kingdom.

    We report here the draft genome sequences of the type strains of <i>Actinobacillus indolicus</i> (46K2C) and <i>Actinobacillus porcinus</i> (NM319). These NAD-dependent bacterial species are frequently found in the upper respiratory tract of pigs and are occasionally associated with lung pathology.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G018553/1; Wellcome Trust

    Microbiology resource announcements 2020;9;1

  • The immunological network in the developing human skin.

    Botting RA and Haniffa M

    Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK.

    Establishment of a well-functioning immune network in skin is crucial for its barrier function. This begins in utero alongside the structural differentiation and maturation of skin, and continues to expand and diversify across the human lifespan. The microenvironment of the developing human skin supports immune cell differentiation and has an overall anti-inflammatory profile. Immunologically inert and skewed immune populations found in developing human skin promote wound healing, and as such may play a crucial role in the structural changes occurring during skin development.

    Immunology 2020

  • Persistent circulation of a fluoroquinolone-resistant Salmonella enterica Typhi clone in the Indian subcontinent.

    Britto CD, Dyson ZA, Mathias S, Bosco A, Dougan G, Jose S, Nagaraj S, Holt KE and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford, OX3 7LE, UK.

    Background: The molecular structure of circulating enteric fever pathogens was studied using hospital-based genomic surveillance in a tertiary care referral centre in South India as a first genomic surveillance study, to our knowledge, of blood culture-confirmed enteric fever in the region.

    Methods: Blood culture surveillance was conducted at St John's Medical College Hospital, Bengaluru, between July 2016 and June 2017. The bacterial isolates collected were linked to demographic variables of patients and subjected to WGS. The resulting pathogen genomic data were also globally contextualized to gauge possible phylogeographical patterns.

    Results: Hospital-based genomic surveillance for enteric fever in Bengaluru, India, identified 101 Salmonella enterica Typhi and 14 S. Paratyphi A in a 1 year period. Ninety-six percent of isolates displayed non-susceptibility to fluoroquinolones. WGS showed the dominant pathogen was S. Typhi genotype (H58 lineage II). A fluoroquinolone-resistant triple-mutant clone of S. Typhi previously associated with gatifloxacin treatment failure in Nepal was implicated in 18% of enteric fever cases, indicating ongoing inter-regional circulation.

    Conclusions: Enteric fever in South India continues to be a major public health issue and is strongly associated with antimicrobial resistance. Robust microbiological surveillance is necessary to direct appropriate treatment and preventive strategies. Of particular concern is the emergence and expansion of the highly fluoroquinolone-resistant triple-mutant S. Typhi clone and its ongoing inter- and intra-country transmission in South Asia, which highlights the need for regional coordination of intervention strategies, including vaccination and longer-term strategies such as improvements to support hygiene and sanitation.

    The Journal of antimicrobial chemotherapy 2020;75;2;337-341

  • MYC-induced human acute myeloid leukemia requires a continuing IL3/GM-CSF co-stimulus.

    Bulaeva E, Pellacani D, Nakamichi N, Hammond CA, Beer P, Lorzadeh A, Moksa M, Carles A, Bilenky M, Lefort S, Shu JY, Wilhelm B, Weng A, Hirst M and Eaves CJ

    BC Cancer Agency and University of British Columbia, Vancouver, Canada.

    Hematopoietic clones with leukemogenic mutations arise in healthy people as they age, but progression to acute myeloid leukemia (AML) is rare. Recent evidence suggests that the microenvironment may play an important role in modulating human AML population dynamics. To investigate this concept further, we examined the combined and separate effects of an oncogene (c-MYC) and exposure to IL3, GM-CSF and SCF on the experimental genesis of a human AML in xenografted immunodeficient mice. Initial experiments showed that normal human CD34+ blood cells transduced with a lentiviral MYC vector and then transplanted into immunodeficient mice produced a hierarchically organized, rapidly fatal and serially transplantable blast population, phenotypically and transcriptionally similar to human AML cells, but only in mice producing IL3, GM-CSF and SCF transgenically, or in regular mice in which the cells were exposed to IL3 or GM-CSF delivered using a co-transduction strategy. In their absence, the MYC+ human cells produced a normal repertoire of lymphoid and myeloid progeny in transplanted mice for many months but, upon transfer to secondary mice producing the human cytokines, the MYC+ cells rapidly generated AML. Indistinguishable diseases were also obtained efficiently from both primitive (CD34+CD38-) and late (GMPs) cells. These findings underscore the critical role that these cytokines can play in activating a malignant state in normally differentiating human hematopoietic cells in which MYC expression has been deregulated. They also introduce a robust experimental model of human leukemogenesis to further elucidate key mechanisms involved and test strategies to suppress them.

    Blood 2020

  • Cysteine synthases CYSL-1 and CYSL-2 mediate C. elegans heritable adaptation to P. vranovensis infection.

    Burton NO, Riccio C, Dallaire A, Price J, Jenkins B, Koulman A and Miska EA

    Centre for Trophoblast Research, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3EG, UK.

    Parental exposure to pathogens can prime offspring immunity in diverse organisms. The mechanisms by which this heritable priming occurs are largely unknown. Here we report that the soil bacterium Pseudomonas vranovensis is a natural pathogen of the nematode Caenorhabditis elegans and that parental exposure of animals to P. vranovensis promotes offspring resistance to infection. Furthermore, we demonstrate a multigenerational enhancement of progeny survival when three consecutive generations of animals are exposed to P. vranovensis. By investigating the mechanisms by which animals heritably adapt to P. vranovensis infection, we found that parental infection by P. vranovensis results in increased expression of the cysteine synthases cysl-1 and cysl-2 and the regulator of hypoxia inducible factor rhy-1 in progeny, and that these three genes are required for adaptation to P. vranovensis. These observations establish a CYSL-1, CYSL-2, and RHY-1 dependent mechanism by which animals heritably adapt to infection.

    Funded by: Cancer Research UK (CRUK): C13474/A18583, C6946/A14492; RCUK | Biotechnology and Biological Sciences Research Council (BBSRC): BB/M027252/1; Wellcome Trust (Wellcome): 092096/Z/10/Z, 104640/Z/14/Z

    Nature communications 2020;11;1;1741

  • Human and mouse essentiality screens as a resource for disease gene discovery.

    Cacheiro P, Muñoz-Fuentes V, Murray SA, Dickinson ME, Bucan M, Nutter LMJ, Peterson KA, Haselimashhadi H, Flenniken AM, Morgan H, Westerberg H, Konopka T, Hsu CW, Christiansen A, Lanza DG, Beaudet AL, Heaney JD, Fuchs H, Gailus-Durner V, Sorg T, Prochazka J, Novosadova V, Lelliott CJ, Wardle-Jones H, Wells S, Teboul L, Cater H, Stewart M, Hough T, Wurst W, Sedlacek R, Adams DJ, Seavitt JR, Tocchini-Valentini G, Mammano F, Braun RE, McKerlie C, Herault Y, de Angelis MH, Mallon AM, Lloyd KCK, Brown SDM, Parkinson H, Meehan TF, Smedley D, Genomics England Research Consortium and International Mouse Phenotyping Consortium

    Clinical Pharmacology, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK.

    The identification of causal variants in sequencing studies remains a considerable challenge that can be partially addressed by new gene-specific knowledge. Here, we integrate measures of how essential a gene is to supporting life, as inferred from viability and phenotyping screens performed on knockout mice by the International Mouse Phenotyping Consortium and essentiality screens carried out on human cell lines. We propose a cross-species gene classification across the Full Spectrum of Intolerance to Loss-of-function (FUSIL) and demonstrate that genes in five mutually exclusive FUSIL categories have differing biological properties. Most notably, Mendelian disease genes, particularly those associated with developmental disorders, are highly overrepresented among genes non-essential for cell survival but required for organism development. After screening developmental disorder cases from three independent disease sequencing consortia, we identify potentially pathogenic variants in genes not previously associated with rare diseases. We therefore propose FUSIL as an efficient approach for disease gene discovery.

    Funded by: Medical Research Council: MC_U142684171, MC_U142684172, MR/S006753/1; NHGRI NIH HHS: UM1 HG006348; NIH HHS: UM1 OD023221

    Nature communications 2020;11;1;655

  • FAMIN Is a Multifunctional Purine Enzyme Enabling the Purine Nucleotide Cycle.

    Cader MZ, de Almeida Rodrigues RP, West JA, Sewell GW, Md-Ibrahim MN, Reikine S, Sirago G, Unger LW, Iglesias-Romero AB, Ramshorn K, Haag LM, Saveljeva S, Ebel JF, Rosenstiel P, Kaneider NC, Lee JC, Lawley TD, Bradley A, Dougan G, Modis Y, Griffin JL and Kaser A

    Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge CB2 0AW, UK; Division of Gastroenterology and Hepatology, Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    Mutations in FAMIN cause arthritis and inflammatory bowel disease in early childhood, and a common genetic variant increases the risk for Crohn's disease and leprosy. We developed an unbiased liquid chromatography-mass spectrometry screen for enzymatic activity of this orphan protein. We report that FAMIN phosphorolytically cleaves adenosine into adenine and ribose-1-phosphate. Such activity was considered absent from eukaryotic metabolism. FAMIN and its prokaryotic orthologs additionally have adenosine deaminase, purine nucleoside phosphorylase, and S-methyl-5'-thioadenosine phosphorylase activity, hence, combine activities of the namesake enzymes of central purine metabolism. FAMIN enables in macrophages a purine nucleotide cycle (PNC) between adenosine and inosine monophosphate and adenylosuccinate, which consumes aspartate and releases fumarate in a manner involving fatty acid oxidation and ATP-citrate lyase activity. This macrophage PNC synchronizes mitochondrial activity with glycolysis by balancing electron transfer to mitochondria, thereby supporting glycolytic activity and promoting oxidative phosphorylation and mitochondrial H<sup>+</sup> and phosphate recycling.

    Funded by: Wellcome Trust

    Cell 2020;180;2;278-295.e23

  • Minimal phenotyping yields genome-wide association signals of low specificity for major depression.

    Cai N, Revez JA, Adams MJ, Andlauer TFM, Breen G, Byrne EM, Clarke TK, Forstner AJ, Grabe HJ, Hamilton SP, Levinson DF, Lewis CM, Lewis G, Martin NG, Milaneschi Y, Mors O, Müller-Myhsok B, Penninx BWJH, Perlis RH, Pistis G, Potash JB, Preisig M, Shi J, Smoller JW, Streit F, Tiemeier H, Uher R, Van der Auwera S, Viktorin A, Weissman MM, MDD Working Group of the Psychiatric Genomics Consortium, Kendler KS and Flint J

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Minimal phenotyping refers to the reliance on the use of a small number of self-reported items for disease case identification, increasingly used in genome-wide association studies (GWAS). Here we report differences in genetic architecture between depression defined by minimal phenotyping and strictly defined major depressive disorder (MDD): the former has a lower genotype-derived heritability that cannot be explained by inclusion of milder cases and a higher proportion of the genome contributing to this shared genetic liability with other conditions than for strictly defined MDD. GWAS based on minimal phenotyping definitions preferentially identifies loci that are not specific to MDD, and, although it generates highly predictive polygenic risk scores, the predictive power can be explained entirely by large sample sizes rather than by specificity for MDD. Our results show that reliance on results from minimal phenotyping may bias views of the genetic architecture of MDD and impede the ability to identify pathways specific to MDD.

    Nature genetics 2020

  • Evolution in interacting species alters predator life-history traits, behaviour and morphology in experimental microbial communities.

    Cairns J, Moerman F, Fronhofer EA, Altermatt F and Hiltunen T

    Wellcome Sanger Institute, Cambridge CB10 1SA, UK.

    Predator-prey interactions heavily influence the dynamics of many ecosystems. An increasing body of evidence suggests that rapid evolution and coevolution can alter these interactions, with important ecological implications, by acting on traits determining fitness, including reproduction, anti-predatory defence and foraging efficiency. However, most studies to date have focused only on evolution in the prey species, and the predator traits in (co)evolving systems remain poorly understood. Here, we investigated changes in predator traits after approximately 600 generations in a predator-prey (ciliate-bacteria) evolutionary experiment. Predators independently evolved on seven different prey species, allowing generalization of the predator's evolutionary response. We used highly resolved automated image analysis to quantify changes in predator life history, morphology and behaviour. Consistent with previous studies, we found that prey evolution impaired growth of the predator, although the effect depended on the prey species. By contrast, predator evolution did not cause a clear increase in predator growth when feeding on ancestral prey. However, predator evolution affected morphology and behaviour, increasing size, speed and directionality of movement, which have all been linked to higher prey search efficiency. These results show that in (co)evolving systems, predator adaptation can occur in traits relevant to foraging efficiency without translating into an increased ability of the predator to grow on the ancestral prey type.

    Proceedings. Biological sciences 2020;287;1928;20200652

  • Interstitial Cell Remodeling Promotes Aberrant Adipogenesis in Dystrophic Muscles.

    Camps J, Breuls N, Sifrim A, Giarratana N, Corvelyn M, Danti L, Grosemans H, Vanuytven S, Thiry I, Belicchi M, Meregalli M, Platko K, MacDonald ME, Austin RC, Gijsbers R, Cossu G, Torrente Y, Voet T and Sampaolesi M

    Laboratory of Translational Cardiomyology, Department of Development and Regeneration, Stem Cell Research Institute, KU Leuven, 3000 Leuven, Belgium; Bayer AG, Research & Development, Pharmaceuticals, 13353 Berlin, Germany.

    Fibrosis and fat replacement in skeletal muscle are major complications that lead to a loss of mobility in chronic muscle disorders, such as muscular dystrophy. However, the in vivo properties of adipogenic stem and precursor cells remain unclear, mainly due to the high cell heterogeneity in skeletal muscles. Here, we use single-cell RNA sequencing to decomplexify interstitial cell populations in healthy and dystrophic skeletal muscles. We identify an interstitial CD142-positive cell population in mice and humans that is responsible for the inhibition of adipogenesis through GDF10 secretion. Furthermore, we show that the interstitial cell composition is completely altered in muscular dystrophy, with a near absence of CD142-positive cells. The identification of these adipo-regulatory cells in the skeletal muscle aids our understanding of the aberrant fat deposition in muscular dystrophy, paving the way for treatments that could counteract degeneration in patients with muscular dystrophy.

    Cell reports 2020;31;5;107597

  • From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases.

    Cano-Gamez E and Trynka G

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    Genome-wide association studies (GWAS) have successfully mapped thousands of loci associated with complex traits. These associations could reveal the molecular mechanisms altered in common complex diseases and result in the identification of novel drug targets. However, GWAS have also left a number of outstanding questions. In particular, the majority of disease-associated loci lie in non-coding regions of the genome and, even though they are thought to play a role in gene expression regulation, it is unclear which genes they regulate and in which cell types or physiological contexts this regulation occurs. This has hindered the translation of GWAS findings into clinical interventions. In this review we summarize how these challenges have been addressed over the last decade, with a particular focus on the integration of GWAS results with functional genomics datasets. Firstly, we investigate how the tissues and cell types involved in diseases can be identified using methods that test for enrichment of GWAS variants in genomic annotations. Secondly, we explore how to find the genes regulated by GWAS loci using methods that test for colocalization of GWAS signals with molecular phenotypes such as quantitative trait loci (QTLs). Finally, we highlight potential future research avenues such as integrating GWAS results with single-cell sequencing read-outs, designing functionally informed polygenic risk scores (PRS), and validating disease associated genes using genetic engineering. These tools will be crucial to identify new drug targets for common complex diseases.

    Frontiers in genetics 2020;11;424

  • Single-cell transcriptomics identifies an effectorness gradient shaping the response of CD4+ T cells to cytokines.

    Cano-Gamez E, Soskic B, Roumeliotis TI, So E, Smyth DJ, Baldrighi M, Willé D, Nakic N, Esparza-Gordillo J, Larminie CGC, Bronson PG, Tough DF, Rowan WC, Choudhary JS and Trynka G

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    Naïve CD4<sup>+</sup> T cells coordinate the immune response by acquiring an effector phenotype in response to cytokines. However, the cytokine responses in memory T cells remain largely understudied. Here we use quantitative proteomics, bulk RNA-seq, and single-cell RNA-seq of over 40,000 human naïve and memory CD4<sup>+</sup> T cells to show that responses to cytokines differ substantially between these cell types. Memory T cells are unable to differentiate into the Th2 phenotype, and acquire a Th17-like phenotype in response to iTreg polarization. Single-cell analyses show that T cells constitute a transcriptional continuum that progresses from naïve to central and effector memory T cells, forming an effectorness gradient accompanied by an increase in the expression of chemokines and cytokines. Finally, we show that T cell activation and cytokine responses are influenced by the effectorness gradient. Our results illustrate the heterogeneity of T cell responses, furthering our understanding of inflammation.

    Funded by: Wellcome Trust (Wellcome): WT206194

    Nature communications 2020;11;1;1801

  • The concerted action of two B3-like prophage genes exclude superinfecting bacteriophages by blocking DNA entry into Pseudomonas aeruginosa.

    Carballo-Ontiveros MA, Cazares A, Vinuesa P, Kameyama L and Guarneros G

    Departamento de Genética y Biología Molecular, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Mexico City, Mexico.

    In this study, we describe seven vegetative phage genomes homologous to the historic phage B3 that infect <i>Pseudomonas aeruginosa</i>. Like other phage groups, the B3-like group contains conserved (core) and variable (accessory) Open Reading Frames (ORFs) grouped at fixed regions in their genomes; however, in either case, many ORFs remain without assigned functions. We constructed lysogens of the seven B3-like phages in strain Ps33 of <i>P. aeruginosa</i>, a novel clinical isolate, and assayed the exclusion phenotype against a variety of temperate and virulent superinfecting phages. In addition to the classic exclusion conferred by the phage immunity repressor, the phenotype observed in B3-like lysogens suggested the presence of other exclusion genes. We set out to identify the genes responsible for this exclusion phenotype. Phage Ps56 was chosen as the study subject since it excluded numerous temperate and virulent phages. Restriction of the Ps56 genome, cloning of several fragments, and resection of the fragments that retained the exclusion phenotype allowed us to identify two core ORFs, so far without any assigned function, as responsible for a type of exclusion. Neither gene showed activity when expressed separately from plasmids; concurrent expression of both ORFs is needed for exclusion. Our data suggest that phage adsorption occurs, but phage genome translocation to the host's cytoplasm is defective. To our knowledge, this is the first report on this type of exclusion mediated by a prophage in <i>P. aeruginosa</i>.

    Importance: <i>Pseudomonas aeruginosa</i> is a Gram-negative bacterium, frequently isolated from infected immunocompromised patients, and the strains are resistant to a broad spectrum of antibiotics. Recently, the use of phages has been proposed as an alternative therapy against multidrug-resistant bacteria. However, this approach may present various hurdles. This work addresses the problem that pathogenic bacteria may be lysogenized by phages carrying genes encoding resistance against secondary infections, such as those used in phage therapy. Discovering phage genes that exclude superinfecting phages not only assigns novel functions to orphan genes in databases but also provides insight into the selection of proper phages for use in phage therapy.

    Journal of virology 2020

  • Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis.

    Carlevaro-Fita J, Lanzós A, Feuerbach L, Hong C, Mas-Ponte D, Pedersen JS, PCAWG Drivers and Functional Interpretation Group, Johnson R and PCAWG Consortium

    Department of Medical Oncology, Inselspital, University Hospital and University of Bern, 3010, Bern, Switzerland.

    Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.

    Communications biology 2020;3;1;56

  • Bacterial survival: evolve and adapt or perish.

    Chaguza C

    Genomics of Pneumonia and Meningitis, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Nature reviews. Microbiology 2020;18;1;5

  • BlobToolKit - Interactive Quality Assessment of Genome Assemblies.

    Challis R, Richards E, Rajan J, Cochrane G and Blaxter M

    University of Edinburgh; Wellcome Trust Sanger Institute

    Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Data Collaboration and are making the results available through a public instance of the Viewer at . We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.

    G3 (Bethesda, Md.) 2020

  • Transcription phenotypes of pancreatic cancer are driven by genomic events during tumor evolution.

    Chan-Seng-Yue M, Kim JC, Wilson GW, Ng K, Figueroa EF, O'Kane GM, Connor AA, Denroche RE, Grant RC, McLeod J, Wilson JM, Jang GH, Zhang A, Dodd A, Liang SB, Borgida A, Chadwick D, Kalimuthu S, Lungu I, Bartlett JMS, Krzyzanowski PM, Sandhu V, Tiriac H, Froeling FEM, Karasinska JM, Topham JT, Renouf DJ, Schaeffer DF, Jones SJM, Marra MA, Laskin J, Chetty R, Stein LD, Zogopoulos G, Haibe-Kains B, Campbell PJ, Tuveson DA, Knox JJ, Fischer SE, Gallinger S and Notta F

    Princess Margaret Cancer Centre, Toronto, Ontario, Canada.

    Pancreatic adenocarcinoma presents as a spectrum of a highly aggressive disease in patients. The basis of this disease heterogeneity has proved difficult to resolve due to poor tumor cellularity and extensive genomic instability. To address this, a dataset of whole genomes and transcriptomes was generated from purified epithelium of primary and metastatic tumors. Transcriptome analysis demonstrated that molecular subtypes are a product of a gene expression continuum driven by a mixture of intratumoral subpopulations, which was confirmed by single-cell analysis. Integrated whole-genome analysis uncovered that molecular subtypes are linked to specific copy number aberrations in genes such as mutant KRAS and GATA6. By mapping tumor genetic histories, tetraploidization emerged as a key mutational process behind these events. Taken together, these data support the premise that the constellation of genomic aberrations in the tumor gives rise to the molecular subtype, and that disease heterogeneity is due to ongoing genomic instability during progression.

    Nature genetics 2020;52;2;231-240

  • Refining the transcriptome of the human malaria parasite Plasmodium falciparum using amplification-free RNA-seq.

    Chappell L, Ross P, Orchard L, Russell TJ, Otto TD, Berriman M, Rayner JC and Llinás M

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.

    Background: Plasmodium parasites undergo several major developmental transitions during their complex lifecycle, which are enabled by precisely ordered gene expression programs. Transcriptomes from the 48-h blood stages of the major human malaria parasite Plasmodium falciparum have been described using cDNA microarrays and RNA-seq, but these assays have not always performed well within non-coding regions, where the AT-content is often 90-95%.

    Results: We developed a directional, amplification-free RNA-seq protocol (DAFT-seq) to reduce bias against AT-rich cDNA, which we have applied to three strains of P. falciparum (3D7, HB3 and IT). While strain-specific differences were detected, overall there is strong conservation between the transcriptional profiles. For the 3D7 reference strain, transcription was detected from 89% of the genome, with over 78% of the genome transcribed into mRNAs. We also find that transcription from bidirectional promoters frequently results in non-coding, antisense transcripts. These datasets allowed us to refine the 5' and 3' untranslated regions (UTRs), which can be variable, long (> 1000 nt), and often overlap those of adjacent transcripts.

    Conclusions: The approaches applied in this study allow a refined description of the transcriptional landscape of P. falciparum and demonstrate that very little of the densely packed P. falciparum genome is inactive or redundant. By capturing the 5' and 3' ends of mRNAs, we reveal both constant and dynamic use of transcriptional start sites across the intraerythrocytic developmental cycle that will be useful in guiding the definition of regulatory regions for use in future experimental gene expression studies.

    Funded by: Burroughs Wellcome Fund: 1007041.02; NIGMS NIH HHS: P50 GM071508; NIH HHS: 1DP2OD001315; Wellcome Trust: 206194

    BMC genomics 2020;21;1;395

  • High-Throughput Quantitative RT-PCR in Single and Bulk C. elegans Samples Using Nanofluidic Technology.

    Chauve L, Le Pen J, Hodge F, Todtenhaupt P, Biggins L, Miska EA, Andrews S and Casanueva O

    Babraham Institute.

    This paper presents a high-throughput reverse transcription quantitative PCR (RT-qPCR) assay for Caenorhabditis elegans that is fast, robust, and highly sensitive. This protocol obtains precise measurements of gene expression from single worms or from bulk samples. The protocol presented here provides a novel adaptation of existing methods for complementary DNA (cDNA) preparation coupled to a nanofluidic RT-qPCR platform. The first part of this protocol, named 'Worm-to-CT', allows cDNA production directly from nematodes without the need for prior mRNA isolation. It increases experimental throughput by allowing the preparation of cDNA from 96 worms in 3.5 h. The second part of the protocol uses existing nanofluidic technology to run high-throughput RT-qPCR on the cDNA. This paper evaluates two different nanofluidic chips: the first runs 96 samples and 96 targets, resulting in 9,216 reactions in approximately 1.5 days of benchwork. The second chip type consists of six 12 x 12 arrays, resulting in 864 reactions. Here, the Worm-to-CT method is demonstrated by quantifying mRNA levels of genes encoding heat shock proteins from single worms and from bulk samples. Provided is an extensive list of primers designed to amplify processed RNA for the majority of coding genes within the C. elegans genome.

    Journal of visualized experiments : JoVE 2020;159

  • Re-evaluation of human BDCA-2+ DC during acute sterile skin inflammation.

    Chen YL, Gomes T, Hardman CS, Vieira Braga FA, Gutowska-Owsiak D, Salimi M, Gray N, Duncan DA, Reynolds G, Johnson D, Salio M, Cerundolo V, Barlow JL, McKenzie ANJ, Teichmann SA, Haniffa M and Ogg G

    Medical Research Council Human Immunology Unit, Radcliffe Department of Medicine, Oxford National Institute for Health Research Biomedical Research Centre, Medical Research Council Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.

    Plasmacytoid dendritic cells (pDCs) produce type I interferon (IFN-I) and are traditionally defined as being BDCA-2+CD123+. pDCs are not readily detectable in healthy human skin, but have been suggested to accumulate in wounds. Here, we describe a CD1a-bearing BDCA-2+CD123int DC subset that rapidly infiltrates human skin wounds and comprises a major DC population. Using single-cell RNA sequencing, we show that these cells are largely activated DCs acquiring features compatible with lymph node homing and antigen presentation, but unexpectedly express both BDCA-2 and CD123, potentially mimicking pDCs. Furthermore, a third BDCA-2-expressing population, Axl+Siglec-6+ DCs (ASDC), was also found to infiltrate human skin during wounding. These data demonstrate early skin infiltration of a previously unrecognized CD123intBDCA-2+CD1a+ DC subset during acute sterile inflammation, and prompt a re-evaluation of previously ascribed pDC involvement in skin disease.

    Funded by: Cancer Research UK: 11331; Medical Research Council: G0501975, G0701693, G0800158, G1000800, G116/150, MC_EX_MR/R022550/1, MC_PC_14103, MC_PC_14131, MC_PC_15002, MC_U105178805, MC_U137881017, MC_U137884181, MC_UU_00008/1, MC_UU_00008/5, MC_UU_12010/1, MC_UU_12010/5, MR/K01577X/1, MR/K021222/1; Wellcome Trust: 209222/Z/17/Z

    The Journal of experimental medicine 2020;217;3

  • Investigating Cellular Recognition Using CRISPR/Cas9 Genetic Screening.

    Chong ZS, Wright GJ and Sharma S

    Cell Surface Signalling Laboratory, Wellcome Sanger Institute, Cambridge CB10 1SA, UK.

    Neighbouring cells can recognise and communicate with each other by direct binding between cell surface receptor and ligand pairs. Examples of cellular recognition events include pathogen entry into a host cell, sperm-egg fusion, and self/nonself discrimination by the immune system. Despite growing appreciation of cell surface recognition molecules as potential therapeutic targets, identifying key factors contributing to cellular recognition remains technically challenging to perform on a genome-wide scale. Recently, genome-scale clustered regularly interspaced short palindromic repeats (CRISPR) knockout or activation (CRISPR-KO/CRISPRa) screens have been applied to identify the molecular determinants of cellular recognition. In this review, we discuss how CRISPR-KO/CRISPRa screening has contributed to our understanding of cellular recognition processes, and how it can be applied to investigate these important interactions in a range of biological contexts.

    Trends in cell biology 2020

  • Targeting NLRP3 and staphylococcal pore-forming toxin receptors in human-induced pluripotent stem cell-derived macrophages.

    Chow SH, Deo P, Yeung ATY, Kostoulias XP, Jeon Y, Gao ML, Seidi A, Olivier FAB, Sridhar S, Nethercott C, Cameron D, Robertson AAB, Robert R, Mackay CR, Traven A, Jin ZB, Hale C, Dougan G, Peleg AY and Naderer T

    Infection & Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia.

    Staphylococcus aureus causes necrotizing pneumonia by secreting toxins such as leukocidins that target front-line immune cells. The mechanism by which leukocidins kill innate immune cells and trigger inflammation during S. aureus lung infection, however, remains unresolved. Here, we explored human-induced pluripotent stem cell-derived macrophages (hiPSC-dMs) to study the interaction of the leukocidins Panton-Valentine leukocidin (PVL) and LukAB with lung macrophages, which are the initial leukocidin targets during S. aureus lung invasion. hiPSC-dMs were susceptible to the leukocidins PVL and LukAB and both leukocidins triggered NLRP3 inflammasome activation resulting in IL-1β secretion. hiPSC-dM cell death after LukAB exposure, however, was only temporarily dependent on NLRP3, although NLRP3 triggered marked cell death after PVL treatment. CRISPR/Cas9-mediated deletion of the PVL receptor, C5aR1, protected hiPSC-dMs from PVL cytotoxicity, despite the expression of other leukocidin receptors, such as CD45. PVL-deficient S. aureus had reduced ability to induce lung IL-1β levels in human C5aR1 knock-in mice. Unexpectedly, inhibiting NLRP3 activity resulted in increased wild-type S. aureus lung burdens. Our findings suggest that NLRP3 induces macrophage death and IL-1β secretion after PVL exposure and controls S. aureus lung burdens.

    Funded by: Australian National Health and Medical Research Council Practitioner Fellowship: APP1117940; Australian Research Council Future Fellows: FT170100313, FT190100733; National Health and Medical Research Council: 1163556; National Key R&D Program of China: 2017YFA0105300

    Journal of leukocyte biology 2020

  • Outbreak of Dirkmeia churashimaensis Fungemia in a Neonatal Intensive Care Unit, India.

    Chowdhary A, Sharada K, Singh PK, Bhagwani DK, Kumar N, de Groot T and Meis JF

    Bloodstream infections caused by uncommon or novel fungal species are challenging to identify and treat. We report a series of cases of fungemia due to a rare basidiomycete yeast, Dirkmeia churashimaensis, in neonatal patients in India. Whole-genome sequence typing demonstrated that the patient isolates were genetically indistinguishable, indicating a single-source infection.

    Emerging infectious diseases 2020;26;4;764-768

  • A brief history of human disease genetics.

    Claussnitzer M, Cho JH, Collins R, Cox NJ, Dermitzakis ET, Hurles ME, Kathiresan S, Kenny EE, Lindgren CM, MacArthur DG, North KN, Plon SE, Rehm HL, Risch N, Rotimi CN, Shendure J, Soranzo N and McCarthy MI

    Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.

    A primary goal of human genetics is to identify DNA sequence variants that influence biomedical traits, particularly those related to the onset and progression of human disease. Over the past 25 years, progress in realizing this objective has been transformed by advances in technology, foundational genomic resources and analytical tools, and by access to vast amounts of genotype and phenotype data. Genetic discoveries have substantially improved our understanding of the mechanisms responsible for many rare and common diseases and driven development of novel preventative and therapeutic strategies. Medical innovation will increasingly focus on delivering care tailored to individual patterns of genetic predisposition.

    Funded by: Howard Hughes Medical Institute; NIDDK NIH HHS: R01 DK106593, U01 DK062422, U01 DK062429; Wellcome Trust: 090532, 098381, 106130, 203141, 212259

    Nature 2020;577;7789;179-189

  • Genome-wide gene-environment analyses of major depressive disorder and reported lifetime traumatic experiences in UK Biobank.

    Coleman JRI, Peyrot WJ, Purves KL, Davis KAS, Rayner C, Choi SW, Hübel C, Gaspar HA, Kan C, Van der Auwera S, Adams MJ, Lyall DM, Choi KW, on the behalf of Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Dunn EC, Vassos E, Danese A, Maughan B, Grabe HJ, Lewis CM, O'Reilly PF, McIntosh AM, Smith DJ, Wray NR, Hotopf M, Eley TC and Breen G

    Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK.

    Depression is more frequent among individuals exposed to traumatic events. Both trauma exposure and depression are heritable. However, the relationship between these traits, including the role of genetic risk factors, is complex and poorly understood. When modelling trauma exposure as an environmental influence on depression, both gene-environment correlations and gene-environment interactions have been observed. The UK Biobank concurrently assessed Major Depressive Disorder (MDD) and self-reported lifetime exposure to traumatic events in 126,522 genotyped individuals of European ancestry. We contrasted genetic influences on MDD stratified by reported trauma exposure (final sample size range: 24,094-92,957). The SNP-based heritability of MDD with reported trauma exposure (24%) was greater than MDD without reported trauma exposure (12%). Simulations showed that this is not confounded by the strong, positive genetic correlation observed between MDD and reported trauma exposure. We also observed that the genetic correlation between MDD and waist circumference was only significant in individuals reporting trauma exposure (r<sub>g</sub> = 0.24, p = 1.8 × 10<sup>-7</sup> versus r<sub>g</sub> = -0.05, p = 0.39 in individuals not reporting trauma exposure, difference p = 2.3 × 10<sup>-4</sup>). Our results suggest that the genetic contribution to MDD is greater when reported trauma is present, and that a complex relationship exists between reported trauma exposure, body composition, and MDD.

    Funded by: Department of Health | National Health and Medical Research Council (NHMRC): 1078901, 1087889; Medical Research Council: G0200243, MC_PC_12028, MC_PC_17228, MC_QA137853; NIMH NIH HHS: U01 MH109528; Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Netherlands Organisation for Scientific Research): 91619152

    Molecular psychiatry 2020

  • Designing ecologically optimized pneumococcal vaccines using population genomics.

    Colijn C, Corander J and Croucher NJ

    Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada.

    Streptococcus pneumoniae (the pneumococcus) is a common nasopharyngeal commensal that can cause invasive pneumococcal disease (IPD). Each component of current protein-polysaccharide conjugate vaccines (PCVs) generally induces immunity specific to one of the approximately 100 pneumococcal serotypes, and typically eliminates it from carriage and IPD through herd immunity. Overall carriage rates remain stable owing to replacement by non-PCV serotypes. Consequently, the net change in IPD incidence is determined by the relative invasiveness of the pre- and post-PCV-carried pneumococcal populations. In the present study, we identified PCVs expected to minimize the post-vaccine IPD burden by applying Bayesian optimization to an ecological model of serotype replacement that integrated epidemiological and genomic data. We compared optimal formulations for reducing infant-only or population-wide IPD, and identified potential benefits to including non-conserved pneumococcal carrier proteins. Vaccines were also devised to minimize IPD resistant to antibiotic treatment, despite the ecological model assuming that resistance levels in the carried population would be preserved. We found that expanding infant-administered PCV valency is likely to result in diminishing returns, and that complementary pairs of infant- and adult-administered vaccines could be a superior strategy. PCV performance was highly dependent on the circulating pneumococcal population, further highlighting the advantages of a diversity of anti-pneumococcal vaccination strategies.

    Funded by: EC | EC Seventh Framework Programme | FP7 Ideas: European Research Council (FP7-IDEAS-ERC - Specific Programme: "Ideas" Implementing the Seventh Framework Programme of the European Community for Research, Technological Development and Demonstration Activities (2007 to 2013)): 742158; RCUK | Engineering and Physical Sciences Research Council (EPSRC): EP/K026003/1, EP/N014529/1; Wellcome Trust (Wellcome): 104169/Z/14/A

    Nature microbiology 2020;5;3;473-485

  • Spatial competition shapes the dynamic mutational landscape of normal esophageal epithelium.

    Colom B, Alcolea MP, Piedrafita G, Hall MWJ, Wabik A, Dentro SC, Fowler JC, Herms A, King C, Ong SH, Sood RK, Gerstung M, Martincorena I, Hall BA and Jones PH

    Wellcome Sanger Institute, Hinxton, UK.

    During aging, progenitor cells acquire mutations, which may generate clones that colonize the surrounding tissue. By middle age, normal human tissues, including the esophageal epithelium (EE), become a patchwork of mutant clones. Despite their relevance for understanding aging and cancer, the processes that underpin mutational selection in normal tissues remain poorly understood. Here, we investigated this issue in the esophageal epithelium of mutagen-treated mice. Deep sequencing identified numerous mutant clones with multiple genes under positive selection, including Notch1, Notch2 and Trp53, which are also selected in human esophageal epithelium. Transgenic lineage tracing revealed strong clonal competition that evolved over time. Clone dynamics were consistent with a simple model in which the proliferative advantage conferred by positively selected mutations depends on the nature of the neighboring cells. When clones with similar competitive fitness collide, mutant cell fate reverts towards homeostasis, a constraint that explains how selection operates in normal-appearing epithelium.

    Funded by: Cancer Research UK (CRUK): C57387/A21777, C609/A17257, C609/A27326; Royal Society: UF130039; Wellcome Trust (Wellcome): 098051, 296194

    Nature genetics 2020

  • Refinement of the Clinical and Mutational Spectrum of UBE2A Deficiency Syndrome.

    Cordeddu V, Macke EL, Radio FC, Cicero SL, Pantaleoni F, Tatti M, Bellacchio E, Ciolfi A, Agolini E, Bruselles A, Brunetti-Pierri N, Suri M, Josephs KS, McEntagart M, Lanpher B, Nickels KC, Haworth A, Reed L, Cappuccio G, Mammi I, Tarnowski JM, Novelli A, Deciphering Developmental Disorders Study, Melis D, Callewaert B, Dallapiccola B, Klee E and Tartaglia M

    National Center for Drug Research and Evaluation, Istituto Superiore di Sanità, Rome, Italy.

    Clinical genetics 2020

  • Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing.

    Cortés-Ciriano I, Lee JJ, Xi R, Jain D, Jung YL, Yang L, Gordenin D, Klimczak LJ, Zhang CZ, Pellman DS, PCAWG Structural Variation Working Group, Park PJ and PCAWG Consortium

    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

    Chromothripsis is a mutational phenomenon characterized by massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in selected cancer types have suggested that chromothripsis may be more common than initially inferred from low-resolution copy-number data. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we analyze patterns of chromothripsis across 2,658 tumors from 38 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of more than 50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy-number states, a considerable fraction of events involve multiple chromosomes and additional structural alterations. In addition to non-homologous end joining, we detect signatures of replication-associated processes and templated insertions. Chromothripsis contributes to oncogene amplification and to inactivation of genes such as mismatch-repair-related genes. These findings show that chromothripsis is a major process that drives genome evolution in human cancer.

    Funded by: EC | EU Framework Programme for Research and Innovation H2020 | H2020 European Institute of Innovation and Technology (H2020 The European Institute of Innovation and Technology): 703543

    Nature genetics 2020;52;3;331-341

  • Polyclonal Campylobacter fetus Infections Among Unrelated Patients, Montevideo, Uruguay, 2013-2018.

    Costa D, Betancor L, Gadea P, Cabezas L, Caiata L, Palacio R, Seija V, Galiana A, Vieytes M, Cristophersen I, Calleros L and Iraola G

    Microbial Genomics Laboratory, Institut Pasteur de Montevideo, Montevideo, Uruguay.

    In Montevideo (2013-2018), 8 Campylobacter fetus extraintestinal infections were reported. The polyclonal nature of strains revealed by whole-genome sequencing and the apparent lack of epidemiological links was incompatible with a single contamination source, supporting alternative routes of transmission.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2020;70;6;1236-1239

  • Genomic analysis of natural intra-specific hybrids among Ethiopian isolates of Leishmania donovani.

    Cotton JA, Durrant C, Franssen SU, Gelanew T, Hailu A, Mateus D, Sanders MJ, Berriman M, Volf P, Miles MA and Yeo M

    Wellcome Sanger Institute, Hinxton, United Kingdom.

    Parasites of the genus Leishmania (Kinetoplastida: Trypanosomatidae) cause widespread and devastating human diseases. Visceral leishmaniasis due to Leishmania donovani is endemic in Ethiopia where it has also been responsible for major epidemics. The presence of hybrid genotypes has been widely reported in surveys of natural populations, genetic variation reported in a number of Leishmania species, and the extant capacity for genetic exchange demonstrated in laboratory experiments. However, patterns of recombination and the evolutionary history of admixture that produced these hybrid populations remain unclear. Here, we use whole-genome sequence data to investigate Ethiopian L. donovani isolates previously characterized as hybrids by microsatellite and multi-locus sequencing. To date there is only one previous study on a natural population of Leishmania hybrids based on whole-genome sequences. We propose that these hybrids originate from recombination between two different lineages of Ethiopian L. donovani occurring in the same region. Patterns of inheritance are more complex than previously reported with multiple, apparently independent, origins from similar parents that include backcrossing with parental types. Analysis indicates that hybrids are representative of at least three different histories. Furthermore, isolates were highly polysomic at the level of chromosomes with differences between parasites recovered from a recrudescent infection from a previously treated individual. The results demonstrate that recombination is a significant feature of natural populations and contributes to the growing body of data that shows how recombination, and gene flow, shape natural populations of Leishmania.

    PLoS neglected tropical diseases 2020;14;4;e0007143

  • Screening of a library of recombinant Schistosoma mansoni proteins with sera from murine and human controlled infections identifies early serological markers.

    Crosnier C, Hokke CH, Protasio AV, Brandt C, Rinaldi G, Langenberg MCC, Clare S, Janse JJ, Wilson S, Berriman M, Roestenberg M and Wright GJ

    Wellcome Sanger Institute, Cambridge, UK.

    Schistosomiasis is a major global health problem caused by blood-dwelling parasitic worms, which is currently tackled primarily by mass administration of the drug praziquantel. Appropriate drug treatment strategies are informed by diagnostics that establish the prevalence and intensity of infection, which, in regions of low transmission, should be highly sensitive. To identify sensitive new serological markers of Schistosoma mansoni infections, we have compiled a recombinant protein library of parasite cell-surface and secreted proteins expressed in mammalian cells. Together with a time series of sera samples from volunteers experimentally infected with a defined number of male parasites, we probed this protein library to identify several markers that can detect primary infections with as low as ten parasites and as early as five weeks post infection. These new markers could be further explored as valuable tools to detect ongoing and previous S. mansoni infections, including in endemic regions where transmission is low.

    The Journal of infectious diseases 2020

  • Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression.

    Cuomo ASE, Seaton DD, McCarthy DJ, Martinez I, Bonder MJ, Garcia-Bernardo J, Amatya S, Madrigal P, Isaacson A, Buettner F, Knights A, Natarajan KN, HipSci Consortium, Vallier L, Marioni JC, Chhatriwala M and Stegle O

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, UK.

    Recent developments in stem cell biology have enabled the study of cell fate decisions in early human development that are impossible to study in vivo. However, understanding how development varies across individuals and, in particular, the influence of common genetic variants during this process has not been characterised. Here, we exploit human iPS cell lines from 125 donors, a pooled experimental design, and single-cell RNA-sequencing to study population variation of endoderm differentiation. We identify molecular markers that are predictive of differentiation efficiency of individual lines, and utilise heterogeneity in the genetic background across individuals to map hundreds of expression quantitative trait loci that influence expression dynamically during differentiation and across cellular contexts.

    Funded by: Wellcome Trust

    Nature communications 2020;11;1;810

  • Lung function and microbiota diversity in cystic fibrosis.

    Cuthbertson L, Walker AW, Oliver AE, Rogers GB, Rivett DW, Hampton TH, Ashare A, Elborn JS, De Soyza A, Carroll MP, Hoffman LR, Lanyon C, Moskowitz SM, O'Toole GA, Parkhill J, Planet PJ, Teneback CC, Tunney MM, Zuckerman JB, Bruce KD and van der Gast CJ

    National Heart and Lung Institute, Imperial College London, London, UK.

    Background: Chronic infection and concomitant airway inflammation is the leading cause of morbidity and mortality for people living with cystic fibrosis (CF). Although chronic infection in CF is undeniably polymicrobial, involving a lung microbiota, infection surveillance and control approaches remain underpinned by classical aerobic culture-based microbiology. How to use microbiomics to direct clinical management of CF airway infections remains a crucial challenge. A pivotal step towards leveraging microbiome approaches in CF clinical care is to understand the ecology of the CF lung microbiome and identify ecological patterns of CF microbiota across a wide spectrum of lung disease. Assessing sputum samples from 299 patients attending 13 CF centres in Europe and the USA, we determined whether the emerging relationship of decreasing microbiota diversity with worsening lung function could be considered a generalised pattern of CF lung microbiota and explored its potential as an informative indicator of lung disease state in CF.

    Results: We tested and found decreasing microbiota diversity with a reduction in lung function to be a significant ecological pattern. Moreover, the loss of diversity was accompanied by an increase in microbiota dominance. Subsequently, we stratified patients into lung disease categories of increasing disease severity to further investigate relationships between microbiota characteristics and lung function, and the factors contributing to microbiota variance. Core taxa group composition became highly conserved within the severe disease category, while the rarer satellite taxa underpinned the high variability observed in the microbiota diversity. Further, the lung microbiota of individual patients were increasingly dominated by recognised CF pathogens as lung function decreased. Conversely, other bacteria, especially obligate anaerobes, increasingly dominated in those with better lung function. Ordination analyses revealed lung function and antibiotics to be main explanators of compositional variance in the microbiota and the core and satellite taxa. Biogeography was found to influence acquisition of the rarer satellite taxa.

    Conclusions: Our findings demonstrate that microbiota diversity and dominance, as well as the identity of the dominant bacterial species, in combination with measures of lung function, can be used as informative indicators of disease state in CF.

    Funded by: Natural Environment Research Council: NE/H019456/1

    Microbiome 2020;8;1;45

  • A restricted spectrum of missense KMT2D variants cause a multiple malformations disorder distinct from Kabuki syndrome.

    Cuvertino S, Hartill V, Colyer A, Garner T, Nair N, Al-Gazali L, Canham N, Faundes V, Flinter F, Hertecant J, Holder-Espinasse M, Jackson B, Lynch SA, Nadat F, Narasimhan VM, Peckham M, Sellers R, Seri M, Montanari F, Southgate L, Squeo GM, Trembath R, van Heel D, Venuto S, Weisberg D, Stals K, Ellard S, Genomics England Research Consortium, Barton A, Kimber SJ, Sheridan E, Merla G, Stevens A, Johnson CA and Banka S

    Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine, and Health, The University of Manchester, Manchester, UK.

    Purpose: To investigate if specific exon 38 or 39 KMT2D missense variants (MVs) cause a condition distinct from Kabuki syndrome type 1 (KS1).

    Methods: Multiple individuals, with MVs in exons 38 or 39 of KMT2D that encode a highly conserved region of 54 amino acids flanked by Val3527 and Lys3583, were identified and phenotyped. Functional tests were performed to study their pathogenicity and understand the disease mechanism.

    Results: The consistent clinical features of the affected individuals, from seven unrelated families, included choanal atresia, athelia or hypoplastic nipples, branchial sinus abnormalities, neck pits, lacrimal duct anomalies, hearing loss, external ear malformations, and thyroid abnormalities. None of the individuals had intellectual disability. The frequency of clinical features, objective software-based facial analysis metrics, and genome-wide peripheral blood DNA methylation patterns in these patients were significantly different from that of KS1. Circular dichroism spectroscopy indicated that these MVs perturb KMT2D secondary structure through an increased disordered to α-helical transition.

    Conclusion: KMT2D MVs located in a specific region spanning exons 38 and 39 and affecting highly conserved residues cause a novel multiple malformations syndrome distinct from KS1. Unlike KMT2D haploinsufficiency in KS1, these MVs likely result in disease through a dominant negative mechanism.

    Funded by: Chile's National Commission for Scientific and Technological Research: 72160007; Newlife - The Charity for Disabled Children: 16-17/10; Wellcome Trust

    Genetics in medicine : official journal of the American College of Medical Genetics 2020;22;5;867-877

  • Comparative genomics of Salmonella enterica serovar Enteritidis ST-11 isolated in Uruguay reveals lineages associated with particular epidemiological traits.

    D'Alessandro B, Escanda VP, Balestrazzi L, Grattarola F, Iriarte A, Pickard D, Yim L, Chabalgoity JA and Betancor L

    Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Av. Alfredo Navarro 3051, CP, 11600, Montevideo, Uruguay.

    Salmonella enterica serovar Enteritidis has been a major cause of foodborne disease in Uruguay since 1995. We used a genomic approach to study a set of isolates from different sources and years. Whole genome phylogeny showed that most of the strains are distributed in two major lineages (E1 and E2), both belonging to MLST sequence type 11, the major ST among serovar Enteritidis. Strikingly, E2 isolates are over-represented in periods of outbreak abundance in Uruguay, while E1 isolates span all epidemic periods. Both lineages circulate in neighboring countries at the same timescale as in Uruguay, and are present in minor numbers in distant countries. We identified allelic variants associated with each lineage. Three genes, ycdX, pduD and hsdM, have distinctive variants in E1 that may result in defective products. Another four genes (ybiO, yiaN, aas, aceA) present variants specific for the E2 lineage. Overall, this work shows that S. enterica serovar Enteritidis strains circulating in Uruguay have the same phylogenetic profile as strains circulating in the region, as well as in more distant countries. Based on these results we hypothesize that the E2 lineage, which is more prevalent during epidemics, exhibits a combination of allelic variants that could be associated with its epidemic ability.

    Scientific reports 2020;10;1;3638

  • A Robust Method Uncovers Significant Context-Specific Heritability in Diverse Complex Traits.

    Dahl A, Nguyen K, Cai N, Gandal MJ, Flint J and Zaitlen N

    Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA.

    Gene-environment interactions (GxE) can be fundamental in applications ranging from functional genomics to precision medicine and are a conjectured source of substantial heritability. However, unbiased methods to profile GxE genome-wide are nascent and, as we show, cannot accommodate general environment variables, modest sample sizes, heterogeneous noise, and binary traits. To address this gap, we propose a simple, unifying mixed model for gene-environment interaction (GxEMM). In simulations and theory, we show that GxEMM can dramatically improve estimates and eliminate false positives when the assumptions of existing methods fail. We apply GxEMM to a range of human and model organism datasets and find broad evidence of context-specific genetic effects, including GxSex, GxAdversity, and GxDisease interactions across thousands of clinical and molecular phenotypes. Overall, GxEMM is broadly applicable for testing and quantifying polygenic interactions, which can be useful for explaining heritability and invaluable for determining biologically relevant environments.

    Funded by: NCI NIH HHS: R01 CA227237; NHGRI NIH HHS: R01 HG006399, U01 HG009080; NHLBI NIH HHS: K25 HL121295; NIDCR NIH HHS: R03 DE025665; NIEHS NIH HHS: R01 ES029929

    American journal of human genetics 2020;106;1;71-91

  • Single-Cell RNA Sequencing Reveals a Dynamic Stromal Niche That Supports Tumor Growth.

    Davidson S, Efremova M, Riedel A, Mahata B, Pramanik J, Huuhtanen J, Kar G, Vento-Tormo R, Hagai T, Chen X, Haniffa MA, Shields JD and Teichmann SA

    Medical Research Council Cancer Unit, University of Cambridge, Hutchison/Medical Research Council Research Centre, Box 197 Cambridge Biomedical Campus, Cambridge, CB2 0XZ, UK.

    Here, using single-cell RNA sequencing, we examine the stromal compartment in murine melanoma and draining lymph nodes (LNs) at points across tumor development. Naive lymphocytes from LNs undergo activation and clonal expansion within the tumor, before PD1 and Lag3 expression, while tumor-associated myeloid cells promote the formation of a suppressive niche. We identify three temporally distinct stromal populations displaying unique functional signatures, conserved across mouse and human tumors. Whereas "immune" stromal cells are observed in early tumors, "contractile" cells become more prevalent at later time points. Complement component C3 is specifically expressed in the immune population. Its cleavage product C3a supports the recruitment of C3aR<sup>+</sup> macrophages, and perturbation of C3a and C3aR disrupts immune infiltration, slowing tumor growth. Our results highlight the power of scRNA-seq to identify complex interplays and increase stromal diversity as a tumor develops, revealing that stromal cells acquire the capacity to modulate immune landscapes from early disease.

    Cell reports 2020;31;7;107628

  • Formin, an opinion.

    Davison A, McDowell GS, Holden JM, Johnson HF, Wade CM, Chiba S, Jackson DJ, Levin M and Blaxter ML

    University of Nottingham, Nottingham, NG7 2RD, UK

    Development (Cambridge, England) 2020;147;1

  • A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies.

    Degasperi A, Amarante TD, Czarnecki J, Shooter S, Zou X, Glodzik D, Morganella S, Nanda AS, Badja C, Koh G, Momen SE, Georgakopoulos-Soares I, Dias JML, Young J, Memari Y, Davies H and Nik-Zainal S

    MRC Cancer Unit, University of Cambridge, Hutchison/MRC Research Centre, Cambridge Biomedical Campus, Cambridge, UK CB2 0XZ.

    <i>Mutational signatures</i> are patterns of mutations that arise during tumorigenesis. We present an enhanced, practical framework for mutational signature analyses. Applying these methods on 3,107 whole genome sequenced (WGS) primary cancers of 21 organs reveals known signatures and nine previously undescribed rearrangement signatures. We highlight inter-organ variability of signatures and present a way of visualizing that diversity, reinforcing our findings in an independent analysis of 3,096 WGS metastatic cancers. Signatures with a high level of genomic instability are dependent on <i>TP53</i> dysregulation. We illustrate how uncertainty in mutational signature identification and assignment to samples affects tumor classification, reinforcing that using multiple orthogonal mutational signature data is not only beneficial, it is essential for accurate tumor stratification. Finally, we present a reference web-based tool for cancer and experimentally-generated mutational signatures, called Signal, that also supports performing mutational signature analyses.

    Funded by: Cancer Research UK: A22932, A23433, A23916; Wellcome Trust: 100183

    Nature cancer 2020;1;2;249-263

  • Defining the Design Principles of Skin Epidermis Postnatal Growth.

    Dekoninck S, Hannezo E, Sifrim A, Miroshnikova YA, Aragona M, Malfait M, Gargouri S, de Neunheuser C, Dubois C, Voet T, Wickström SA, Simons BD and Blanpain C

    Université Libre de Bruxelles, Laboratory of Stem Cells and Cancer, Brussels 1070, Belgium.

    During embryonic and postnatal development, organs and tissues grow steadily to achieve their final size at the end of puberty. However, little is known about the cellular dynamics that mediate postnatal growth. By combining in vivo clonal lineage tracing, proliferation kinetics, single-cell transcriptomics, and in vitro micro-pattern experiments, we resolved the cellular dynamics taking place during postnatal skin epidermis expansion. Our data revealed that harmonious growth is engineered by a single population of developmental progenitors presenting a fixed fate imbalance of self-renewing divisions with an ever-decreasing proliferation rate. Single-cell RNA sequencing revealed that epidermal developmental progenitors form a more uniform population compared with adult stem and progenitor cells. Finally, we found that the spatial pattern of cell division orientation is dictated locally by the underlying collagen fiber orientation. Our results uncover a simple design principle of organ growth where progenitors and differentiated cells expand in harmony with their surrounding tissues.

    Cell 2020

  • Biologically indeterminate yet ordered promiscuous gene expression in single medullary thymic epithelial cells.

    Dhalla F, Baran-Gale J, Maio S, Chappell L, Holländer GA and Ponting CP

    Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.

    To induce central T-cell tolerance, medullary thymic epithelial cells (mTEC) collectively express most protein-coding genes, thereby presenting an extensive library of tissue-restricted antigens (TRAs). To resolve mTEC diversity and whether promiscuous gene expression (PGE) is stochastic or coordinated, we sequenced transcriptomes of 6,894 single mTEC, enriching for 1,795 rare cells expressing either of two TRAs, TSPAN8 or GP2. Transcriptional heterogeneity allowed partitioning of mTEC into 15 reproducible subpopulations representing distinct maturational trajectories, stages and subtypes, including novel mTEC subsets, such as chemokine-expressing and ciliated TEC, which warrant further characterisation. Unexpectedly, 50 modules of genes were robustly defined each showing patterns of co-expression within individual cells, which were mainly not explicable by chromosomal location, biological pathway or tissue specificity. Further, TSPAN8<sup>+</sup> and GP2<sup>+</sup> mTEC were randomly dispersed within thymic medullary islands. Consequently, these data support observations that PGE exhibits ordered co-expression, although mechanisms underlying this instruction remain biologically indeterminate. Ordered co-expression and random spatial distribution of a diverse range of TRAs likely enhance their presentation and encounter with passing thymocytes, while maintaining mTEC identity.

    Funded by: UK Research and Innovation | Medical Research Council (MRC): MC_UU_00007/15; Wellcome Trust; Wellcome Trust (WT): 105045/Z/14/Z, 109032/Z/15/Z

    The EMBO journal 2020;39;1;e101828

  • Identification of a conserved var gene in different Plasmodium falciparum strains.

    Dimonte S, Bruske EI, Enderes C, Otto TD, Turner L, Kremsner P and Frank M

    Institute of Tropical Medicine, University of Tuebingen, Wilhelmstr. 27, 72074, Tuebingen, Germany.

    Background: The multicopy var gene family of Plasmodium falciparum is of crucial importance for pathogenesis and antigenic variation. So far only var2csa, the var gene responsible for placental malaria, was found to be highly conserved among all P. falciparum strains. Here, a new conserved 3D7 var gene (PF3D7_0617400) is identified in several field isolates.

    Methods: DNA sequencing, transcriptional analysis, Cluster of Differentiation (CD) 36-receptor binding, indirect immunofluorescence with PF3D7_0617400-antibodies and quantification of surface reactivity against semi-immune sera were used to characterize an NF54 clone and a Gabonese field isolate clone (MOA C3) transcribing the gene. A population of 714 whole genome sequenced parasites was analysed to characterize the conservation of the locus in African and Asian isolates. The genetic diversity of two var2csa fragments was compared with the genetic diversity of 57 microsatellite fragments in field isolates.

    Results: PFGA01_060022400 was identified in a Gabonese parasite isolate (MOA) from a chronic infection and found to be 99% identical with PF3D7_0617400 of the 3D7 genome strain. Transcriptional analysis and immunofluorescence showed expression of the gene in an NF54 and a MOA clone, but CD36 binding assays and surface reactivity to semi-immune sera differed markedly in the two clones. Long-read Pacific Biosciences whole genome sequencing showed that PFGA01_060022400 is located in the internal cluster of chromosome 6. The full length PFGA01_060022400 was detected in 36 of 714 P. falciparum isolates and 500 bp fragments were identified in more than 100 isolates. var2csa was in parts highly conserved (H<sub>e</sub> = 0) but in other parts as variable (H<sub>e</sub> = 0.86) as the 57 microsatellite markers (H<sub>e</sub> = 0.8).

    Conclusions: Individual var gene sequences exhibit conservation in the global parasite population suggesting that purifying selection may limit overall genetic diversity of some var genes. Notably, field and laboratory isolates expressing the same var gene exhibit markedly different phenotypes.

    Malaria journal 2020;19;1;194
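
    Note on H<sub>e</sub>: the abstract above reports expected heterozygosity values but does not state the estimator used. As a minimal sketch, assuming the standard definition H<sub>e</sub> = 1 - sum(p<sub>i</sub><sup>2</sup>) over allele frequencies p<sub>i</sub> at a locus, the calculation could look like the following. The function name, the optional small-sample correction and the example allele calls are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter


def expected_heterozygosity(alleles, unbiased=False):
    """Return H_e = 1 - sum(p_i^2) for a list of allele calls at one locus.

    If unbiased is True, apply the n / (n - 1) small-sample correction.
    Whether the cited study applied such a correction is not stated in
    the abstract; it is offered here only as an option.
    """
    n = len(alleles)
    if n == 0:
        raise ValueError("need at least one allele call")
    counts = Counter(alleles)
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    if unbiased:
        if n < 2:
            raise ValueError("unbiased estimate needs at least two calls")
        he *= n / (n - 1)
    return he


# Hypothetical allele calls, for illustration only.
conserved = ["A"] * 10            # one allele in every isolate
variable = ["A", "B", "C", "D"]   # four equally frequent alleles

print(expected_heterozygosity(conserved))  # fully conserved locus
print(expected_heterozygosity(variable))   # highly variable locus
```

    Under this definition a locus with a single allele in every isolate gives H<sub>e</sub> = 0, which is how a value of 0 for the conserved var2csa fragment would be read, while values approaching 1 indicate many alleles at similar frequencies.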

  • Deciphering immunity at high plexity and resolution.

    Domínguez Conde C and Teichmann SA

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Nature reviews. Immunology 2020;20;2;77-78

  • SpeS: A Novel Superantigen and Its Potential as a Vaccine Adjuvant against Strangles.

    Dominguez-Medina CC, Rash NL, Robillard S, Robinson C, Efstratiou A, Broughton K, Parkhill J, Holden MTG, Lopez-Alvarez MR, Paillot R and Waller AS

    Animal Health Trust, Lanwades Park, Kentford, Newmarket CB8 7UU, UK.

    Bacterial superantigens (sAgs) are powerful activators of the immune response that trigger unspecific T cell responses accompanied by the release of proinflammatory cytokines. <i>Streptococcus equi</i> (<i>S. equi</i>) and <i>Streptococcus zooepidemicus</i> (<i>S. zooepidemicus</i>) produce sAgs that play an important role in their ability to cause disease. Strangles, caused by <i>S. equi</i>, is one of the most common infectious diseases of horses worldwide. Here, we report the identification of a new sAg of <i>S. zooepidemicus</i>, SpeS, and show that mutation of the putative T cell receptor (TCR)-binding motif (YAY to IAY) abrogated TCR-binding, whilst maintaining interaction with major histocompatibility complex (MHC) class II molecules. The fusion of SpeS and SpeS<sup>Y39I</sup> to six <i>S. equi</i> surface proteins using two different peptide linkers was conducted to determine if MHC class II-binding properties were maintained. Proliferation assays, qPCR and flow cytometry analysis showed that SpeS<sup>Y39I</sup> and its fusion proteins induced less mitogenic activity and interferon gamma expression when compared to SpeS, whilst retaining Antigen-Presenting Cell (APC)-binding properties. Our data suggest that SpeS<sup>Y39I</sup>-surface protein fusions could be used to direct vaccine antigens towards antigen-presenting cells in vivo with the potential to enhance antigen presentation and improve immune responses.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/P002757/1

    International journal of molecular sciences 2020;21;12

  • 'Community evolution' - laboratory strains and pedigrees in the age of genomics.

    Dorman MJ and Thomson NR

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Molecular microbiologists depend heavily on laboratory strains of bacteria, which are ubiquitous across the community of research groups working on a common organism. However, this presumes that strains present in different laboratories are in fact identical. Work on a culture of <i>Vibrio cholerae</i> preserved from 1916 provoked us to consider recent studies, which have used both classical genetics and next-generation sequencing to study the heterogeneity of laboratory strains. Here, we review and discuss mutations and phenotypic variation in supposedly isogenic reference strains of <i>V. cholerae</i> and <i>Escherichia coli</i>, and we propose that by virtue of the dissemination of laboratory strains across the world, a large 'community evolution' experiment is currently ongoing.

    Microbiology (Reading, England) 2020

  • Discordant bioinformatic predictions of antimicrobial resistance from whole-genome sequencing data of bacterial isolates: an inter-laboratory study.

    Doyle RM, O'Sullivan DM, Aller SD, Bruchmann S, Clark T, Coello Pelegrin A, Cormican M, Diez Benavente E, Ellington MJ, McGrath E, Motro Y, Phuong Thuy Nguyen T, Phelan J, Shaw LP, Stabler RA, van Belkum A, van Dorp L, Woodford N, Moran-Gilad J, Huggett JF and Harris KA

    Clinical Research Department, London School of Hygiene and Tropical Medicine, London, UK.

    Antimicrobial resistance (AMR) poses a threat to public health. Clinical microbiology laboratories typically rely on culturing bacteria for antimicrobial-susceptibility testing (AST). As the implementation costs and technical barriers fall, whole-genome sequencing (WGS) has emerged as a 'one-stop' test for epidemiological and predictive AST results. Few published comparisons exist for the myriad analytical pipelines used for predicting AMR. To address this, we performed an inter-laboratory study providing sets of participating researchers with identical short-read WGS data from clinical isolates, allowing us to assess the reproducibility of the bioinformatic prediction of AMR between participants, and identify problem cases and factors that lead to discordant results. We produced ten WGS datasets of varying quality from cultured carbapenem-resistant organisms obtained from clinical samples sequenced on either an Illumina NextSeq or HiSeq instrument. Nine participating teams ('participants') were provided these sequence data without any other contextual information. Each participant used their choice of pipeline to determine the species, the presence of resistance-associated genes, and to predict susceptibility or resistance to amikacin, gentamicin, ciprofloxacin and cefotaxime. We found participants predicted different numbers of AMR-associated genes and different gene variants from the same clinical samples. The quality of the sequence data, choice of bioinformatic pipeline and interpretation of the results all contributed to discordance between participants. Although much of the inaccurate gene variant annotation did not affect genotypic resistance predictions, we observed low specificity when compared to phenotypic AST results, but this improved in samples with higher read depths. Had the results been used to predict AST and guide treatment, a different antibiotic would have been recommended for each isolate by at least one participant. These challenges, at the final analytical stage of using WGS to predict AMR, suggest the need for refinements when using this technology in clinical settings. Comprehensive public resistance sequence databases, full recommendations on sequence data quality and standardization in the comparisons between genotype and resistance phenotypes will all play a fundamental role in the successful implementation of AST prediction using WGS in clinical microbiology laboratories.

    Microbial genomics 2020;6;2

  • Molecular Evolution of IDH Wild-Type Glioblastomas Treated With Standard of Care Affects Survival and Design of Precision Medicine Trials: A Report From the EORTC 1542 Study.

    Draaisma K, Chatzipli A, Taphoorn M, Kerkhof M, Weyerbrock A, Sanson M, Hoeben A, Lukacova S, Lombardi G, Leenstra S, Hanse M, Fleischeuer R, Watts C, McAbee J, Angelopoulos N, Gorlia T, Golfinopoulos V, Kros JM, Verhaak RGW, Bours V, van den Bent MJ, McDermott U, Robe PA and French PJ

    Erasmus University Medical Center, Rotterdam, the Netherlands.

    Purpose: Precision medicine trials in glioblastoma (GBM) are often conducted at tumor recurrence. However, second surgeries for recurrent GBM are not routinely performed, and therefore, molecular data for trial inclusion are predominantly derived from the primary sample. This study aims to establish whether molecular targets change during tumor progression and, if so, whether this affects precision medicine trial design.

    Materials and methods: We collected 186 pairs of primary-recurrent GBM samples from patients receiving chemoradiotherapy with temozolomide and sequenced approximately 300 cancer genes. <i>MGMT</i>, <i>TERT</i>, and <i>EGFRvIII</i> status was individually determined.

    Results: The molecular profile of our cohort was identical to that of other GBM cohorts (<i>IDH</i> wild-type [WT], 95%; <i>EGFR</i> amplified, approximately 50%), indicating that patients amenable to second surgery do not represent a specific molecular subtype. Molecular events in <i>IDH</i> WT GBMs were stable in approximately 80% of events, but changes in mutation status were observed for all examined genes (range, approximately 90% and 60% for <i>TERT</i> and <i>EGFR</i> mutations, respectively), and such changes strongly affected targeted trial size and design. A similar pattern of GBM driver instability was observed within <i>MGMT</i> promoter-methylated tumors. <i>MGMT</i> promoter methylation status remained prognostic at tumor recurrence. The observation that hypermutation at GBM recurrence was rare (8%) and not correlated with outcome was relevant for immunotherapy-based treatments.

    Conclusion: This large cohort of matched primary and recurrent <i>IDH</i> WT tumors establishes the frequency of GBM driver instability after chemoradiotherapy with temozolomide. This allows per gene or pathway calculation of trial size at tumor recurrence, using molecular data of the primary tumor only. We also identify genes for which repeat surgery is necessary because of low mutation retention rate.

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2020;38;1;81-99

  • Organoids - New Models for Host-Helminth Interactions.

    Duque-Correa MA, Maizels RM, Grencis RK and Berriman M

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Organoids are multicellular culture systems that replicate tissue architecture and function, and are increasingly used as models of viral, bacterial, and protozoan infections. Organoids have great potential to improve our current understanding of helminth interactions with their hosts and to replace or reduce the dependence on using animal models. In this review, we discuss the applicability of this technology to helminth infection research, including strategies of co-culture of helminths or their products with organoids and the challenges, advantages, and drawbacks of the use of organoids for these studies. We also explore how complementing organoid systems with other cell types and components may allow more complex models to be generated in the future to further investigate helminth-host interactions.

    Funded by: National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/P001521/1

    Trends in parasitology 2020;36;2;170-181

  • Synergistic Targeting of FLT3 Mutations in AML via Combined Menin-MLL and FLT3 Inhibition.

    Dzama MM, Steiner M, Rausch J, Sasca D, Schönfeld J, Kunz K, Taubert MC, McGeehan GM, Chen CW, Mupo A, Hähnel PS, Theobald M, Kindler T, Koche RP, Vassiliou GS, Armstrong SA and Kühn MWM

    University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany.

    The interaction of Menin (MEN1) and MLL (MLL1, KMT2A) is a dependency and potential therapeutic opportunity against NPM1-mutant (NPM1mut) and MLL-rearranged (MLL-r) leukemias. Concomitant activating driver mutations in the gene encoding the tyrosine kinase FLT3 occur in both leukemias and are particularly common in the NPM1mut subtype. Transcriptional profiling upon pharmacological inhibition of the Menin-MLL complex revealed specific changes in gene expression with downregulation of the MEIS1 transcription-factor and its transcriptional target gene FLT3 being most pronounced. Combining Menin-MLL-inhibition with specific small-molecule kinase inhibitors of FLT3-phosphorylation resulted in a significantly superior reduction of phosphorylated FLT3 and transcriptional suppression of genes downstream to FLT3 signaling. The drug combination induced synergistic inhibition of proliferation as well as enhanced apoptosis and differentiation compared to single-drug treatment in models of human and murine NPM1mut and MLL-r leukemias harboring an FLT3 mutation. Primary AML cells harvested from patients with NPM1mutFLT3mut AML showed significantly better responses to combined Menin and FLT3-inhibition than to single-drug or vehicle control treatment, while AML cells with wildtype NPM1, MLL, and FLT3 were not affected by either of the two drugs. In vivo treatment of leukemic animals with MLL-r FLT3mut leukemia reduced leukemia burden significantly and prolonged survival compared to the single-drug and vehicle control groups. Our data suggest that combined Menin-MLL and FLT3-inhibition represents a novel and promising therapeutic strategy for patients with NPM1mut or MLL-r leukemia and concurrent FLT3 mutation.

    Blood 2020

  • Epigenetic priming by Dppa2 and 4 in pluripotency facilitates multi-lineage commitment.

    Eckersley-Maslin MA, Parry A, Blotenburg M, Krueger C, Ito Y, Franklin VNR, Narita M, D'Santos CS and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge, UK.

    How the epigenetic landscape is established in development is still being elucidated. Here, we uncover developmental pluripotency associated 2 and 4 (DPPA2/4) as epigenetic priming factors that establish a permissive epigenetic landscape at a subset of developmentally important bivalent promoters characterized by low expression and poised RNA-polymerase. Differentiation assays reveal that Dppa2/4 double knockout mouse embryonic stem cells fail to exit pluripotency and differentiate efficiently. DPPA2/4 bind both H3K4me3-marked and bivalent gene promoters and associate with COMPASS- and Polycomb-bound chromatin. Comparing knockout and inducible knockdown systems, we find that acute depletion of DPPA2/4 results in rapid loss of H3K4me3 from key bivalent genes, while H3K27me3 is initially more stable but lost following extended culture. Consequently, upon DPPA2/4 depletion, these promoters gain DNA methylation and are unable to be activated upon differentiation. Our findings uncover a novel epigenetic priming mechanism at developmental promoters, poising them for future lineage-specific activation.

    Nature structural & molecular biology 2020

  • Patient-specific logic models of signaling pathways from screenings on cancer biopsies to prioritize personalized combination therapies.

    Eduati F, Jaaks P, Wappler J, Cramer T, Merten CA, Garnett MJ and Saez-Rodriguez J

    European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany.

    Mechanistic modeling of signaling pathways mediating patient-specific response to therapy can help to unveil resistance mechanisms and improve therapeutic strategies. Yet, creating such models for patients, in particular for solid malignancies, is challenging. A major hurdle to build these models is the limited material available that precludes the generation of large-scale perturbation data. Here, we present an approach that couples ex vivo high-throughput screenings of cancer biopsies using microfluidics with logic-based modeling to generate patient-specific dynamic models of extrinsic and intrinsic apoptosis signaling pathways. We used the resulting models to investigate heterogeneity in pancreatic cancer patients, showing dissimilarities especially in the PI3K-Akt pathway. Variation in model parameters reflected well the different tumor stages. Finally, we used our dynamic models to efficaciously predict new personalized combinatorial treatments. Our results suggest that our combination of microfluidic experiments and mathematical model can be a novel tool toward cancer precision medicine.

    Funded by: European Molecular Biology Laboratory Interdisciplinary postdoc (EMBL EIPOD) and Marie Curie Action (COFUND); JRC for Computational Biomedicine was partially funded by Bayer AG

    Molecular systems biology 2020;16;2;e8664

  • Computational methods for single-cell omics across modalities.

    Efremova M and Teichmann SA

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    Nature methods 2020;17;1;14-17

  • CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes.

    Efremova M, Vento-Tormo M, Teichmann SA and Vento-Tormo R

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    Cell-cell communication mediated by ligand-receptor complexes is critical to coordinating diverse biological processes, such as development, differentiation and inflammation. To investigate how the context-dependent crosstalk of different cell types enables physiological processes to proceed, we developed CellPhoneDB, a novel repository of ligands, receptors and their interactions. In contrast to other repositories, our database takes into account the subunit architecture of both ligands and receptors, representing heteromeric complexes accurately. We integrated our resource with a statistical framework that predicts enriched cellular interactions between two cell types from single-cell transcriptomics data. Here, we outline the structure and content of our repository, provide procedures for inferring cell-cell communication networks from single-cell RNA sequencing data and present a practical step-by-step guide to help implement the protocol. CellPhoneDB v.2.0 is an updated version of our resource that incorporates additional functionalities to enable users to introduce new interacting molecules and reduces the time and resources needed to interrogate large datasets. CellPhoneDB v.2.0 is publicly available, both as code and as a user-friendly web interface; it can be used by both experts and researchers with little experience in computational genomics. In our protocol, we demonstrate how to evaluate meaningful biological interactions with CellPhoneDB v.2.0 using published datasets. This protocol typically takes ~2 h to complete, from installation to statistical analysis and visualization, for a dataset of ~10 GB, 10,000 cells and 19 cell types, and using five threads.

    Funded by: Wellcome Trust (Wellcome): 211276/Z/18/Z, WT206194

    Nature protocols 2020

  • Immunology in the Era of Single-Cell Technologies.

    Efremova M, Vento-Tormo R, Park JE, Teichmann SA and James KR

    Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Immune cells are characterized by diversity, specificity, plasticity, and adaptability, properties that enable them to contribute to homeostasis and respond specifically and dynamically to the many threats encountered by the body. Single-cell technologies, including the assessment of transcriptomics, genomics, and proteomics at the level of individual cells, are ideally suited to studying these properties of immune cells. In this review we discuss the benefits of adopting single-cell approaches in studying underappreciated qualities of immune cells and highlight examples where these technologies have been critical to advancing our understanding of the immune system in health and disease. Expected final online publication date for the <i>Annual Review of Immunology</i>, Volume 38 is April 26, 2020.

    Annual review of immunology 2020

  • An amphipathic peptide with antibiotic activity against multidrug-resistant Gram-negative bacteria.

    Elliott AG, Huang JX, Neve S, Zuegg J, Edwards IA, Cain AK, Boinett CJ, Barquist L, Lundberg CV, Steen J, Butler MS, Mobli M, Porter KM, Blaskovich MAT, Lociuro S, Strandh M and Cooper MA

    Centre for Superbug Solutions, Institute for Molecular Bioscience, The University of Queensland, Queensland, QLD, 4072, Australia.

    Peptide antibiotics are an abundant and synthetically tractable source of molecular diversity, but they are often cationic and can be cytotoxic, nephrotoxic and/or ototoxic, which has limited their clinical development. Here we report structure-guided optimization of an amphipathic peptide, arenicin-3, originally isolated from the marine lugworm Arenicola marina. The peptide induces bacterial membrane permeability and ATP release, with serial passaging resulting in a mutation in mlaC, a phospholipid transport gene. Structure-based design led to AA139, an antibiotic with broad-spectrum in vitro activity against multidrug-resistant and extensively drug-resistant bacteria, including ESBL, carbapenem- and colistin-resistant clinical isolates. The antibiotic induces a 3-4 log reduction in bacterial burden in mouse models of peritonitis, pneumonia and urinary tract infection. Cytotoxicity and haemolysis of the progenitor peptide are ameliorated with AA139, and the 'no observable adverse effect level' (NOAEL) dose in mice is ~10-fold greater than the dose generally required for efficacy in the infection models.

    Funded by: Department of Health | National Health and Medical Research Council (NHMRC): AP1106590, APP1059354; RCUK | Medical Research Council (MRC): G1100100/1; Wellcome Trust (Wellcome): WT098051, WT104797/Z/14/Z

    Nature communications 2020;11;1;3184

  • A missense variant in Mitochondrial Amidoxime Reducing Component 1 gene and protection against liver disease.

    Emdin CA, Haas ME, Khera AV, Aragam K, Chaffin M, Klarin D, Hindy G, Jiang L, Wei WQ, Feng Q, Karjalainen J, Havulinna A, Kiiskinen T, Bick A, Ardissino D, Wilson JG, Schunkert H, McPherson R, Watkins H, Elosua R, Bown MJ, Samani NJ, Baber U, Erdmann J, Gupta N, Danesh J, Saleheen D, Chang KM, Vujkovic M, Voight B, Damrauer S, Lynch J, Kaplan D, Serper M, Tsao P, Million Veteran Program, Mercader J, Hanis C, Daly M, Denny J, Gabriel S and Kathiresan S

    Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America.

    Analyzing 12,361 all-cause cirrhosis cases and 790,095 controls from eight cohorts, we identify a common missense variant in the Mitochondrial Amidoxime Reducing Component 1 gene (MARC1 p.A165T) that associates with protection from all-cause cirrhosis (OR 0.91, p = 2.3 × 10<sup>-11</sup>). This same variant also associates with lower levels of hepatic fat on computed tomographic imaging and lower odds of physician-diagnosed fatty liver as well as lower blood levels of alanine transaminase (-0.025 SD, 3.7 × 10<sup>-43</sup>), alkaline phosphatase (-0.025 SD, 1.2 × 10<sup>-37</sup>), total cholesterol (-0.030 SD, p = 1.9 × 10<sup>-36</sup>) and LDL cholesterol (-0.027 SD, p = 5.1 × 10<sup>-30</sup>) levels. We identified a series of additional MARC1 alleles (low-frequency missense p.M187K and rare protein-truncating p.R200Ter) that also associated with lower cholesterol levels, liver enzyme levels and reduced risk of cirrhosis (0 cirrhosis cases for 238 R200Ter carriers versus 17,046 cases of cirrhosis among 759,027 non-carriers, p = 0.04) suggesting that deficiency of the MARC1 enzyme may lower blood cholesterol levels and protect against cirrhosis.

    PLoS genetics 2020;16;4;e1008629

  • Single-cell transcriptomics of allo-reactive CD4+ T cells over time reveals divergent fates during gut GVHD.

    Engel JA, Lee HJ, Williams CG, Kuns RD, Olver S, Lansink LI, Soon MS, Andersen SB, Powell JE, Svensson V, Teichmann SA, Hill GR, Varelias A, Koyama M and Haque A

    Department of Immunology, QIMR Berghofer Medical Research Institute, Brisbane, Australia.

    Acute gastrointestinal Graft-versus-Host-Disease (GVHD) is a primary determinant of mortality after allogeneic hematopoietic stem-cell transplantation (alloSCT). It is mediated by alloreactive donor CD4+ T cells that differentiate into pathogenic subsets expressing IFNγ, IL-17A or GM-CSF, and is regulated by subsets expressing IL-10 and/or Foxp3. Developmental relationships between T-helper states during priming in mesenteric lymph nodes (mLN) and effector function in the GI tract remain undefined at genome-scale. We applied scRNA-seq and computational modelling to a mouse model of donor DC-mediated GVHD exacerbation, creating an atlas of putative CD4+ T-cell differentiation pathways in vivo. Computational trajectory inference suggested emergence of pathogenic and regulatory states along a single developmental trajectory in mLN. Importantly, we inferred an unexpected second trajectory, categorised by little proliferation or cytokine expression, reduced glycolysis, and high tcf7 expression. TCF1hi cells upregulated α4β7 prior to gut migration and failed to express cytokines therein. Nevertheless, they exhibited recall potential and plasticity following secondary transplantation, including cytokine or Foxp3 expression, but reduced TCF1. Thus, scRNA-seq suggested divergence of allo-reactive CD4+ T cells into quiescent and effector states during gut GVHD exacerbation by donor DC, reflecting putative heterogeneous priming in vivo. These findings, the first at a single-cell level during GVHD over time, may assist in examination of T cell differentiation in patients undergoing alloSCT.

    JCI insight 2020

  • Concordance for clonal hematopoiesis is limited in elderly twins.

    Fabre MA, McKerrell T, Zwiebel M, Vijayabaskar MS, Park N, Wells PM, Rad R, Deloukas P, Small K, Steves CJ and Vassiliou GS

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

    Although acquisition of leukemia-associated somatic mutations by 1 or more hematopoietic stem cells is inevitable with advancing age, its consequences are highly variable, ranging from clinically silent clonal hematopoiesis (CH) to leukemic progression. To investigate the influence of heritable factors on CH, we performed deep targeted sequencing of blood DNA from 52 monozygotic (MZ) and 27 dizygotic (DZ) twin pairs (aged 70-99 years). Using this highly sensitive approach, we identified CH (variant allele frequency ≥0.5%) in 62% of individuals. We did not observe higher concordance for CH within MZ twin pairs as compared with that within DZ twin pairs, or to that expected by chance. However, we did identify 2 MZ pairs in which both twins harbored identical rare somatic mutations, suggesting a shared cell of origin. Finally, in 3 MZ twin pairs harboring mutations in the same driver genes, serial blood samples taken 4 to 5 years apart showed substantial twin-to-twin variability in clonal trajectories. Our findings suggest that the inherited genome does not exert a dominant influence on the behavior of adult CH and provide evidence that CH mutations may be acquired in utero.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Blood 2020;135;4;269-273

  • FGFR1 Oncogenic Activation Reveals an Alternative Cell of Origin of SCLC in Rb1/p53 Mice.

    Ferone G, Song JY, Krijgsman O, van der Vliet J, Cozijnsen M, Semenova EA, Adams DJ, Peeper D and Berns A

    Oncode Institute, Division of Molecular Genetics, the Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands.

    Fibroblast growth factor receptor 1 (FGFR1) is frequently amplified in human small-cell lung cancer (SCLC), but its contribution to SCLC and other lung tumors has remained elusive. Here, we assess the tumorigenic capacity of constitutive-active FGFR1 (FGFR1<sup>K656E</sup>) with concomitant RB and P53 depletion in mouse lung. Our results reveal a context-dependent effect of FGFR1<sup>K656E</sup>: it impairs SCLC development from CGRP<sup>POS</sup> neuroendocrine (NE) cells, which are considered the major cell of origin of SCLC, whereas it promotes SCLC and low-grade NE bronchial lesions from tracheobronchial-basal cells. Moreover, FGFR1<sup>K656E</sup> induces lung adenocarcinoma (LADC) from most lung cell compartments. However, its expression is not sustained in LADC originating from CGRP<sup>POS</sup> cells. Therefore, cell context and tumor stage should be taken into account when considering FGFR1 inhibition as a therapeutic option.

    Cell reports 2020;30;11;3837-3850.e3

  • simurg: simulate bacterial pangenomes in R.

    Ferrés I, Fresia P and Iraola G

    Microbial Genomics Laboratory, Institut Pasteur Montevideo, Uruguay.

    Motivation: The pangenome concept describes genetic variability as the union of genes shared in a set of genomes and constitutes the current paradigm for comparative analysis of bacterial populations. However, there is a lack of tools to simulate pangenome variability and structure using defined evolutionary models.

    Results: We developed simurg, an R package that allows users to simulate bacterial pangenomes using different combinations of evolutionary constraints such as gene gain, gene loss and mutation rates. Our tool allows the straightforward and reproducible simulation of bacterial pangenomes using real sequence data, providing a valuable tool for benchmarking of pangenome software or comparing evolutionary hypotheses.

    Availability and implementation: The simurg package is released under the GPL-3 license, and is freely available for download from GitHub.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2020;36;4;1273-1274

  • Update on the pathology, genetics and somatic landscape of sebaceous tumours.

    Ferreira I, Wiedemeyer K, Demetter P, Adams DJ, Arends MJ and Brenn T

    Université Libre de Bruxelles, Brussels, Belgium.

    Cutaneous sebaceous neoplasms show a predilection for the head and neck area of adults and include tumours with benign behaviour, sebaceous adenoma and sebaceoma, and sebaceous carcinoma with potential for an aggressive disease course at the malignant end of the spectrum. The majority of tumours are solitary and sporadic, but a subset of tumours may be associated with Lynch syndrome, also known as hereditary non-polyposis colon cancer (HNPCC) and previously referred to as Muir-Torre syndrome (now known to be part of Lynch syndrome). This review provides an overview of the clinical and histological features of cutaneous sebaceous neoplasia with an emphasis on differentiating features and differential diagnosis. It also offers insights into the recently described molecular pathways involved in the development of sebaceous tumours and their association with Lynch syndrome.

    Funded by: Cancer Research UK: 14356

    Histopathology 2020;76;5;640-649

  • Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people.

    Finer S, Martin HC, Khan A, Hunt KA, MacLaughlin B, Ahmed Z, Ashcroft R, Durham C, MacArthur DG, McCarthy MI, Robson J, Trivedi B, Griffiths C, Wright J, Trembath RC and van Heel DA

    Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    Funded by: Department of Health; Medical Research Council: MR/M009017/1; Wellcome Trust: 102627, 210561, 210561/Z/18/Z

    International journal of epidemiology 2020;49;1;20-21i

  • Genomic signatures of domestication in Old World camels.

    Fitak RR, Mohandesan E, Corander J, Yadamsuren A, Chuluunbat B, Abdelhadi O, Raziq A, Nagy P, Walzer C, Faye B and Burger PA

    Institute of Population Genetics, Vetmeduni Vienna, Veterinärplatz 1, 1210, Vienna, Austria.

    Domestication begins with the selection of animals showing less fear of humans. In most domesticates, selection signals for tameness have been superimposed by intensive breeding for economical or other desirable traits. Old World camels, conversely, have maintained high genetic variation and lack secondary bottlenecks associated with breed development. By re-sequencing multiple genomes from dromedaries, Bactrian camels, and their endangered wild relatives, here we show that positive selection for candidate genes underlying traits collectively referred to as 'domestication syndrome' is consistent with neural crest deficiencies and altered thyroid hormone-based signaling. Comparing our results with other domestic species, we postulate that the core set of domestication genes is considerably smaller than the pan-domestication set - and overlapping genes are likely a result of chance and redundancy. These results, along with the extensive genomic resources provided, are an important contribution to understanding the evolutionary history of camels and the genomic features of their domestication.

    Funded by: Austrian Science Fund (Fonds zur Förderung der Wissenschaftlichen Forschung): P24706-B25, P29623-B25

    Communications biology 2020;3;1;316

  • Genomically Aided Diagnosis of Severe Developmental Disorders.

    FitzPatrick DR and Firth HV

    MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom.

    Our ability to make accurate and specific genetic diagnoses in individuals with severe developmental disorders has been transformed by data derived from genomic sequencing technologies. These data reveal both the patterns and rates of different mutational mechanisms and identify regions of the human genome with fewer mutations than would be expected. In outbred populations, the most common identifiable cause of severe developmental disorders is de novo mutation affecting the coding region in one of approximately 500 different genes, almost universally showing constraint. Simply combining the location of a de novo genomic event with its predicted consequence on the gene product gives significant diagnostic power. Our knowledge of the diversity of phenotypic consequences associated with comparable diagnostic genotypes at each locus is improving. Computationally useful phenotype data will improve diagnostic interpretation of ultrarare genetic variants and, in the long run, indicate which specific embryonic processes have been perturbed. Expected final online publication date for the <i>Annual Review of Genomics and Human Genetics</i>, Volume 21 is August 31, 2020.

    Annual review of genomics and human genetics 2020

  • IRF5 Promotes Influenza Virus-Induced Inflammatory Responses in Human Induced Pluripotent Stem Cell-Derived Myeloid Cells and Murine Models.

    Forbester JL, Clement M, Wellington D, Yeung A, Dimonte S, Marsden M, Chapman L, Coomber EL, Tolley C, Lees E, Hale C, Clare S, Udalova I, Dong T, Dougan G and Humphreys IR

    Division of Infection and Immunity/Systems Immunity, University Research Institute, Cardiff, United Kingdom.

    Recognition of influenza A virus (IAV) by the innate immune system triggers pathways that restrict viral replication, activate innate immune cells, and regulate adaptive immunity. However, excessive innate immune activation can exaggerate disease. The pathways promoting excessive activation are incompletely understood, with limited experimental models to investigate the mechanisms driving influenza virus-induced inflammation in humans. Interferon regulatory factor 5 (IRF5) is a transcription factor that plays important roles in the induction of cytokines after viral sensing. In an <i>in vivo</i> model of IAV infection, IRF5 deficiency reduced IAV-driven immune pathology and associated inflammatory cytokine production, specifically reducing cytokine-producing myeloid cell populations in <i>Irf5</i><sup>-/-</sup> mice but not impacting type 1 interferon (IFN) production or virus replication. Using cytometry by time of flight (CyTOF), we identified that human lung IRF5 expression was highest in cells of the myeloid lineage. To investigate the role of IRF5 in mediating human inflammatory responses by myeloid cells to IAV, we employed human-induced pluripotent stem cells (hIPSCs) with biallelic mutations in <i>IRF5</i>, demonstrating for the first time that induced pluripotent stem cell-derived dendritic cells (iPS-DCs) with biallelic mutations can be used to investigate the regulation of human virus-induced immune responses. Using this technology, we reveal that IRF5 deficiency in human DCs, or macrophages, corresponded with reduced virus-induced inflammatory cytokine production, with IRF5 acting downstream of Toll-like receptor 7 (TLR7) and, possibly, retinoic acid-inducible gene I (RIG-I) after viral sensing. Thus, IRF5 acts as a regulator of myeloid cell inflammatory cytokine production during IAV infection in mice and humans and drives immune-mediated viral pathogenesis independently of type 1 IFN and virus replication.

    <b>IMPORTANCE</b> The inflammatory response to influenza A virus (IAV) participates in infection control but contributes to disease severity. After viral detection, intracellular pathways are activated, initiating cytokine production, but these pathways are incompletely understood. We show that interferon regulatory factor 5 (IRF5) mediates IAV-induced inflammation and, in mice, drives pathology. This was independent of antiviral type 1 IFN and virus replication, implying that IRF5 could be specifically targeted to treat influenza virus-induced inflammation. We show for the first time that human iPSC technology can be exploited in genetic studies of virus-induced immune responses. Using this technology, we deleted IRF5 in human myeloid cells. These IRF5-deficient cells exhibited impaired influenza virus-induced cytokine production and revealed that IRF5 acts downstream of Toll-like receptor 7 and possibly retinoic acid-inducible gene I. Our data demonstrate the importance of IRF5 in influenza virus-induced inflammation, suggesting that genetic variation in the IRF5 gene may influence host susceptibility to viral diseases.

    Journal of virology 2020;94;9

  • Sex chromosome evolution in parasitic nematodes of humans.

    Foster JM, Grote A, Mattick J, Tracey A, Tsai YC, Chung M, Cotton JA, Clark TA, Geber A, Holroyd N, Korlach J, Li Y, Libro S, Lustigman S, Michalski ML, Paulini M, Rogers MB, Teigen L, Twaddle A, Welch L, Berriman M, Dunning Hotopp JC and Ghedin E

    Division of Protein Expression & Modification, New England Biolabs, Ipswich, MA, 01938, USA.

    Sex determination mechanisms often differ even between related species yet the evolution of sex chromosomes remains poorly understood in all but a few model organisms. Some nematodes such as Caenorhabditis elegans have an XO sex determination system while others, such as the filarial parasite Brugia malayi, have an XY mechanism. We present a complete B. malayi genome assembly and define Nigon elements shared with C. elegans, which we then map to the genomes of other filarial species and more distantly related nematodes. We find a remarkable plasticity in sex chromosome evolution with several distinct cases of neo-X and neo-Y formation, X-added regions, and conversion of autosomes to sex chromosomes from which we propose a model of chromosome evolution across different nematode clades. The phylum Nematoda offers a new and innovative system for gaining a deeper understanding of sex chromosome evolution.

    Funded by: NIAID NIH HHS: U19 AI110820; Wellcome Trust (Wellcome): 098051

    Nature communications 2020;11;1;1964

  • Global genome diversity of the Leishmania donovani complex.

    Franssen SU, Durrant C, Stark O, Moser B, Downing T, Imamura H, Dujardin JC, Sanders MJ, Mauricio I, Miles MA, Schnur LF, Jaffe CL, Nasereddin A, Schallig H, Yeo M, Bhattacharyya T, Alam MZ, Berriman M, Wirth T, Schönian G and Cotton JA

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    Protozoan parasites of the <i>Leishmania donovani</i> complex - <i>L. donovani</i> and <i>L. infantum</i> - cause the fatal disease visceral leishmaniasis. We present the first comprehensive genome-wide global study, with 151 cultured field isolates representing most of the geographical distribution. <i>L. donovani</i> isolates separated into five groups that largely coincide with geographical origin but vary greatly in diversity. In contrast, the majority of <i>L. infantum</i> samples fell into one globally-distributed group with little diversity. This picture is complicated by several hybrid lineages. Identified genetic groups vary in heterozygosity and levels of linkage, suggesting different recombination histories. We characterise chromosome-specific patterns of aneuploidy and identified extensive structural variation, including known and suspected drug resistance loci. This study reveals greater genetic diversity than suggested by geographically-focused studies, provides a resource of genomic variation for future work and sets the scene for a new understanding of the evolution and genetics of the <i>Leishmania donovani</i> complex.

    Funded by: EU Framework Programme for Research and Innovation: FP7-222895; Wellcome Trust: Wellcome Sanger Institute core funding, WT098051, Wellcome Sanger Institute core funding, WT206194

    eLife 2020;9

  • Rapid and sensitive large-scale screening of low affinity extracellular receptor protein interactions by using reaction induced inhibition of Gaussia luciferase.

    Galaway F and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Sanger Institute, Cambridge, UK.

    Extracellular protein interactions mediated by cell surface receptors are essential for intercellular communication in multicellular organisms. Assays to detect extracellular interactions must account for their often weak binding affinities and also the biochemical challenges in solubilising membrane-embedded receptors in an active form. Methods based on detecting direct binding of soluble recombinant receptor ectodomains have been successful, but genome-scale screening is limited by the usual requirement of producing sufficient amounts of each protein in two different forms, usually a "bait" and "prey". Here, we show that oligomeric receptor ectodomains coupled to concatenated units of the light-generating Gaussia luciferase enzyme robustly detected low affinity interactions and reduced the amount of protein required by several orders of magnitude compared to other reporter enzymes. Importantly, we discovered that this flash-type luciferase exhibited a reaction-induced inhibition that permitted the use of a single protein preparation as both bait and prey thereby halving the number of expression plasmids and recombinant proteins required for screening. This approach was tested against a benchmarked set of quantified extracellular interactions and shown to detect extremely weak interactions (K<sub>D</sub>s ≥ μM). This method will facilitate large-scale receptor interaction screening and contribute to the goal of mapping networks of cellular communication.

    Scientific reports 2020;10;1;10522

  • A New Pneumococcal Capsule Type, 10D, is the 100th Serotype and Has a Large cps Fragment from an Oral Streptococcus.

    Ganaie F, Saad JS, McGee L, van Tonder AJ, Bentley SD, Lo SW, Gladstone RA, Turner P, Keenan JD, Breiman RF and Nahm MH

    Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA.

    <i>Streptococcus pneumoniae</i> (pneumococcus) is a major human pathogen producing structurally diverse capsular polysaccharides. Widespread use of highly successful pneumococcal conjugate vaccines (PCVs) targeting pneumococcal capsules has greatly reduced infections by the vaccine types but increased infections by nonvaccine serotypes. Herein, we report a new and the 100th capsule type, named serotype 10D, by determining its unique chemical structure and biosynthetic roles of all capsule synthesis locus (<i>cps</i>) genes. The name 10D reflects its serologic cross-reaction with serotype 10A and appearance of cross-opsonic antibodies in response to immunization with 10A polysaccharide in a 23-valent pneumococcal vaccine. Genetic analysis showed that 10D <i>cps</i> has three large regions syntenic to and highly homologous with <i>cps</i> loci from serotype 6C, serotype 39, and an oral streptococcus strain (<i>S. mitis</i> SK145). The 10D <i>cps</i> region syntenic to SK145 is about 6 kb and has a short gene fragment of <i>wciN</i>α at the 5' end. The presence of this nonfunctional <i>wciN</i>α fragment provides compelling evidence for a recent interspecies genetic transfer from oral streptococcus to pneumococcus. Since oral streptococci have a large repertoire of <i>cps</i> loci, widespread PCV usage could facilitate the appearance of novel serotypes through interspecies recombination.

    <b>IMPORTANCE</b> The polysaccharide capsule is essential for the pathogenicity of pneumococcus, which is responsible for millions of deaths worldwide each year. Currently available pneumococcal vaccines are designed to elicit antibodies to the capsule polysaccharides of the pneumococcal isolates commonly causing diseases, and the antibodies provide protection only against the pneumococcus expressing the vaccine-targeted capsules. Since pneumococci can produce different capsule polysaccharides and therefore reduce vaccine effectiveness, it is important to track the appearance of novel pneumococcal capsule types and how these new capsules are created. Herein, we describe a new and the 100th pneumococcal capsule type with unique chemical and serological properties. The capsule type was named 10D for its serologic similarity to 10A. Genetic studies provide strong evidence that pneumococcus created 10D capsule polysaccharide by capturing a large genetic fragment from an oral streptococcus. Such interspecies genetic exchanges could greatly increase diversity of pneumococcal capsules and complicate serotype shifts.

    mBio 2020;11;3

  • Identification of slit3 as a locus affecting nicotine preference in zebrafish and human smoking behaviour.

    García-González J, Brock AJ, Parker MO, Riley RJ, Joliffe D, Sudwarts A, Teh MT, Busch-Nentwich EM, Stemple DL, Martineau AR, Kaprio J, Palviainen T, Kuan V, Walton RT and Brennan CH

    School of Biological and Chemical Sciences, Queen Mary, University of London, London, United Kingdom.

    To facilitate smoking genetics research we determined whether a screen of mutagenized zebrafish for nicotine preference could predict loci affecting smoking behaviour. From 30 screened F<sub>3</sub> sibling groups, where each was derived from an individual ethyl-nitrosourea mutagenized F<sub>0</sub> fish, two showed increased or decreased nicotine preference. Out of 25 inactivating mutations carried by the F<sub>3</sub> fish, one in the <i>slit3</i> gene segregated with increased nicotine preference in heterozygous individuals. Focussed SNP analysis of the human <i>SLIT3</i> locus in cohorts from UK (n=863) and Finland (n=1715) identified two variants associated with cigarette consumption and likelihood of cessation. Characterisation of <i>slit3</i> mutant larvae and adult fish revealed decreased sensitivity to the dopaminergic and serotonergic antagonist amisulpride, known to affect startle reflex that is correlated with addiction in humans, and increased <i>htr1aa</i> mRNA expression in mutant larvae. No effect on neuronal pathfinding was detected. These findings reveal a role for SLIT3 in development of pathways affecting responses to nicotine in zebrafish and smoking in humans.

    Funded by: Academy of Finland: 308248, 312073; Biotechnology and Biological Sciences Research Council: BB/M007863; Medical Research Council: G1000403; NIDA NIH HHS: U01 DA044400; NIH HHS: Project grant, U01 DA 044400-03; National Centre for the Replacement, Refinement and Reduction of Animals in Research: G1000053; National Institute for Health Research: NF-SI-0515-10076, NIHR PGfAR RP-PG-0407-10398, PGfAR RP-PG-0609-10181; Royal Society: Industry Fellows College; Wellcome Trust: Clinical research fellowship WT 110284/Z/15/Z

    eLife 2020;9

  • Long-term expansion, genomic stability and in vivo safety of adult human pancreas organoids.

    Georgakopoulos N, Prior N, Angres B, Mastrogiovanni G, Cagan A, Harrison D, Hindley CJ, Arnes-Benito R, Liau SS, Curd A, Ivory N, Simons BD, Martincorena I, Wurst H, Saeb-Parsy K and Huch M

    The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

    Background: Pancreatic organoid systems have recently been described for the in vitro culture of pancreatic ductal cells from mouse and human. Mouse pancreatic organoids exhibit unlimited expansion potential, while previously reported human pancreas organoid (hPO) cultures do not expand efficiently long-term in a chemically defined, serum-free medium. We sought to generate a 3D culture system for long-term expansion of human pancreas ductal cells as hPOs to serve as the basis for studies of human pancreas ductal epithelium, exocrine pancreatic diseases and the development of a genomically stable replacement cell therapy for diabetes mellitus.

    Results: Our chemically defined, serum-free, human pancreas organoid culture medium supports the generation and expansion of hPOs with high efficiency from both fresh and cryopreserved primary tissue. hPOs can be expanded from a single cell, enabling their genetic manipulation and generation of clonal cultures. hPOs expanded for months in vitro maintain their ductal morphology, biomarker expression and chromosomal integrity. Xenografts of hPOs survive long-term in vivo when transplanted into the pancreas of immunodeficient mice. Notably, mouse orthotopic transplants show no signs of tumorigenicity. Crucially, our medium also supports the establishment and expansion of hPOs in a chemically defined, modifiable and scalable, biomimetic hydrogel.

    Conclusions: hPOs can be expanded long-term, from both fresh and cryopreserved human pancreas tissue in a chemically defined, serum-free medium with no detectable tumorigenicity. hPOs can be clonally expanded, genetically manipulated and are amenable to culture in a chemically defined hydrogel. hPOs therefore represent an abundant source of pancreas ductal cells that retain the characteristics of the tissue-of-origin, which opens up avenues for modelling diseases of the ductal epithelium and increasing understanding of human pancreas exocrine biology as well as for potentially producing insulin-secreting cells for the treatment of diabetes.

    Funded by: Cancer Research UK: C6946/A14492; Horizon 2020: ECH2020-668350; Wellcome Trust: 092096, 104151/Z/14/

    BMC developmental biology 2020;20;1;4

  • Transcription-coupled repair and mismatch repair contribute towards preserving genome integrity at mononucleotide repeat tracts.

    Georgakopoulos-Soares I, Koh G, Momen SE, Jiricny J, Hemberg M and Nik-Zainal S

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    The mechanisms that underpin how insertions or deletions (indels) become fixed in DNA have primarily been ascribed to replication-related and/or double-strand break (DSB)-related processes. Here, we introduce a method to evaluate indels, orientating them relative to gene transcription. In so doing, we reveal a number of surprising findings: First, there is a transcriptional strand asymmetry in the distribution of mononucleotide repeat tracts in the reference human genome. Second, there is a strong transcriptional strand asymmetry of indels across 2,575 whole genome sequenced human cancers. We suggest that this is due to the activity of transcription-coupled nucleotide excision repair (TC-NER). Furthermore, TC-NER interacts with mismatch repair (MMR) under physiological conditions to produce strand bias. Finally, we show how insertions and deletions differ in their dependencies on these repair pathways. Our analytical approach reveals insights into the contribution of DNA repair towards indel mutagenesis in human cells.

    Funded by: Cancer Research UK (CRUK): C60100/A23916, C60100/A25274

    Nature communications 2020;11;1;1980

  • The evolutionary history of 2,658 cancers.

    Gerstung M, Jolly C, Leshchiner I, Dentro SC, Gonzalez S, Rosebrock D, Mitchell TJ, Rubanova Y, Anur P, Yu K, Tarabichi M, Deshwar A, Wintersinger J, Kleinheinz K, Vázquez-García I, Haase K, Jerman L, Sengupta S, Macintyre G, Malikic S, Donmez N, Livitz DG, Cmero M, Demeulemeester J, Schumacher S, Fan Y, Yao X, Lee J, Schlesner M, Boutros PC, Bowtell DD, Zhu H, Getz G, Imielinski M, Beroukhim R, Sahinalp SC, Ji Y, Peifer M, Markowetz F, Mustonen V, Yuan K, Wang W, Morris QD, PCAWG Evolution & Heterogeneity Working Group, Spellman PT, Wedge DC, Van Loo P and PCAWG Consortium

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

    Cancer develops through a process of somatic evolution<sup>1,2</sup>. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes<sup>3</sup>. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA)<sup>4</sup>, we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection.

    Funded by: Medical Research Council: MR/L016311; NCI NIH HHS: 1U24CA143799; NIH HHS: GM108308; NIMH NIH HHS: MH086633; Wellcome Trust: FC001202

    Nature 2020;578;7793;122-128

  • Visualizing variation within Global Pneumococcal Sequence Clusters (GPSCs) and country population snapshots to contextualize pneumococcal isolates.

    Gladstone RA, Lo SW, Goater R, Yeats C, Taylor B, Hadfield J, Lees JA, Croucher NJ, van Tonder AJ, Bentley LJ, Quah FX, Blaschke AJ, Pershing NL, Byington CL, Balaji V, Hryniewicz W, Sigauque B, Ravikumar KL, Almeida SCG, Ochoa TJ, Ho PL, du Plessis M, Ndlangisa KM, Cornick JE, Kwambana-Adams B, Benisty R, Nzenze SA, Madhi SA, Hawkins PA, Pollard AJ, Everett DB, Antonio M, Dagan R, Klugman KP, von Gottberg A, Metcalf BJ, Li Y, Beall BW, McGee L, Breiman RF, Aanensen DM, Bentley SD and The Global Pneumococcal Sequencing Consortium

    Parasites and microbes, Wellcome Sanger Institute, Hinxton, UK.

    Knowledge of pneumococcal lineages, their geographic distribution and antibiotic resistance patterns, can give insights into global pneumococcal disease. We provide interactive bioinformatic outputs to explore such topics, aiming to increase dissemination of genomic insights to the wider community, without the need for specialist training. We prepared 12 country-specific phylogenetic snapshots, and international phylogenetic snapshots of 73 common Global Pneumococcal Sequence Clusters (GPSCs) previously defined using PopPUNK, and present them in Microreact. Gene presence and absence defined using Roary, and recombination profiles derived from Gubbins are presented in Phandango for each GPSC. Temporal phylogenetic signal was assessed for each GPSC using BactDating. We provide examples of how such resources can be used. In our example use of a country-specific phylogenetic snapshot we determined that serotype 14 was observed in nine unrelated genetic backgrounds in South Africa. The international phylogenetic snapshot of GPSC9, in which most serotype 14 isolates from South Africa were observed, highlights that there were three independent sub-clusters represented by South African serotype 14 isolates. We estimated from the GPSC9-dated tree that the sub-clusters were each established in South Africa during the 1980s. We show how recombination plots allowed the identification of a 20 kb recombination spanning the capsular polysaccharide locus within GPSC97. This was consistent with a switch from serotype 6A to 19A estimated to have occurred in the 1990s from the GPSC97-dated tree. Plots of gene presence/absence of resistance genes (<i>tet</i>, <i>erm</i>, <i>cat</i>) across the GPSC23 phylogeny were consistent with acquisition of a composite transposon. We estimated from the GPSC23-dated tree that the acquisition occurred between 1953 and 1975.
    Finally, we demonstrate the assignment of GPSC31 to 17 externally generated pneumococcal serotype 1 assemblies from Utah via Pathogenwatch. Most of the Utah isolates clustered within GPSC31 in a USA-specific clade with the most recent common ancestor estimated between 1958 and 1981. The resources we have provided can be used to explore data, test hypotheses and generate new hypotheses. The accessible assignment of GPSCs allows others to contextualize their own collections beyond the data presented here.

    Microbial genomics 2020

  • Insights into the intracellular localization, protein associations and artemisinin resistance properties of Plasmodium falciparum K13.

    Gnädig NF, Stokes BH, Edwards RL, Kalantarov GF, Heimsch KC, Kuderjavy M, Crane A, Lee MCS, Straimer J, Becker K, Trakht IN, Odom John AR, Mok S and Fidock DA

    Department of Microbiology & Immunology, Columbia University Irving Medical Center, New York, NY, United States of America.

    The emergence of artemisinin (ART) resistance in Plasmodium falciparum intra-erythrocytic parasites has led to increasing treatment failure rates with first-line ART-based combination therapies in Southeast Asia. Decreased parasite susceptibility is caused by K13 mutations, which are associated clinically with delayed parasite clearance in patients and in vitro with an enhanced ability of ring-stage parasites to survive brief exposure to the active ART metabolite dihydroartemisinin. Herein, we describe a panel of K13-specific monoclonal antibodies and gene-edited parasite lines co-expressing epitope-tagged versions of K13 in trans. By applying an analytical quantitative imaging pipeline, we localize K13 to the parasite endoplasmic reticulum, Rab-positive vesicles, and sites adjacent to cytostomes. These latter structures form at the parasite plasma membrane and traffic hemoglobin to the digestive vacuole wherein artemisinin-activating heme moieties are released. We also provide evidence of K13 partially localizing near the parasite mitochondria upon treatment with dihydroartemisinin. Immunoprecipitation data generated with K13-specific monoclonal antibodies identify multiple putative K13-associated proteins, including endoplasmic reticulum-resident molecules, mitochondrial proteins, and Rab GTPases, in both K13 mutant and wild-type isogenic lines. We also find that mutant K13-mediated resistance is reversed upon co-expression of wild-type or mutant K13. These data help define the biological properties of K13 and its role in mediating P. falciparum resistance to ART treatment.

    Funded by: NIAID NIH HHS: R01 AI103280, R01 AI109023, R21 AI144472

    PLoS pathogens 2020;16;4;e1008482

  • Epstein-Barr virus reactivation in sepsis due to community-acquired pneumonia is associated with increased morbidity and an immunosuppressed host transcriptomic endotype.

    Goh C, Burnham KL, Ansari MA, de Cesare M, Golubchik T, Hutton P, Overend LE, Davenport EE, Hinds CJ, Bowden R and Knight JC

    Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.

    Epstein-Barr virus (EBV) reactivation is common in sepsis patients but the extent and nature of this remains unresolved. We sought to determine the incidence and correlates of EBV-positivity in a large sepsis cohort. We also hypothesised that EBV reactivation would be increased in patients in whom relative immunosuppression was the major feature of their sepsis response. To identify such patients we aimed to use knowledge of sepsis response subphenotypes based on transcriptomic studies of circulating leukocytes, specifically patients with a Sepsis Response Signature endotype (SRS1) that we have previously shown to be associated with increased mortality and features of immunosuppression. We assayed EBV from the plasma of intensive care unit (ICU) patients with sepsis due to community-acquired pneumonia. In total 730 patients were evaluated by targeted metagenomics (n = 573 patients), digital droplet PCR (n = 565), or both (n = 408). We had previously analysed gene expression in peripheral blood leukocytes for a subset of individuals (n = 390). We observed a 37% incidence of EBV-positivity. EBV reactivation was associated with longer ICU stay (12.9 vs 9.2 days; p = 0.004) and increased organ failure (day 1 SOFA score 6.9 vs 5.9; p = 0.00011). EBV reactivation was associated with the relatively immunosuppressed SRS1 endotype (p = 0.014) and differential expression of a small number of biologically relevant genes. These findings are consistent with the hypothesis that viral reactivation in sepsis is a consequence of immune compromise and is associated with increasing severity of illness although further mechanistic studies are required to definitively illustrate cause and effect.

    Scientific reports 2020;10;1;9838

  • Genomic evolution of Neisseria gonorrhoeae since the preantibiotic era (1928-2013): antimicrobial use/misuse selects for resistance and drives evolution.

    Golparian D, Harris SR, Sánchez-Busó L, Hoffmann S, Shafer WM, Bentley SD, Jensen JS and Unemo M

    WHO Collaborating Centre for Gonorrhoea and other Sexually Transmitted Infections, Department of Laboratory Medicine, Microbiology, Faculty of Medicine and Health, Örebro University, SE-710 85, Örebro, Sweden.

    Background: Multidrug-resistant Neisseria gonorrhoeae strains are prevalent, threatening gonorrhoea treatment globally, and understanding of emergence, evolution, and spread of antimicrobial resistance (AMR) in gonococci remains limited. We describe the genomic evolution of gonococci and their AMR, related to the introduction of antimicrobial therapies, examining isolates from 1928 (preantibiotic era) to 2013 in Denmark. This is, to our knowledge, the oldest gonococcal collection globally.

    Methods: Lyophilised isolates were revived and examined using Etest (18 antimicrobials) and whole-genome sequencing (WGS). Quality-assured genome sequences were obtained for 191 viable and 40 non-viable isolates and analysed with multiple phylogenomic approaches.

    Results: Gonococcal AMR, including an accumulation of multiple AMR determinants, started to emerge particularly in the 1950s-1970s. By the twenty-first century, resistance to most antimicrobials was common. Although some AMR determinants affect many physiological functions and fitness, AMR determinants were mainly selected by the use/misuse of gonorrhoea therapeutic antimicrobials. Most AMR developed in strains belonging to one multidrug-resistant (MDR) clade with a close to threefold higher genomic mutation rate. Modern N. gonorrhoeae was inferred to have emerged in the late-1500s and its genome became increasingly conserved over time.

    Conclusions: WGS of gonococci from 1928 to 2013 showed that no AMR determinants, except penB, were in detectable frequency before the introduction of gonorrhoea therapeutic antimicrobials. The modern gonococcus is substantially younger than previously hypothesized and has been evolving into a more clonal species, driven by the use/misuse of antimicrobials. The MDR gonococcal clade should be further investigated for early detection of strains with predispositions to develop and maintain MDR and for initiation of public health interventions.

    Funded by: BLRD VA: IK6 BX004470; Foundation for Medical Research at Örebro University Hospital: 2012; Wellcome Trust: 098051

    BMC genomics 2020;21;1;116

  • Reply.

    Goode EC, Hirschfield GM and Rushbrook SM

    Norfolk and Norwich University Hospital, Norwich, United Kingdom.

    Hepatology (Baltimore, Md.) 2020;71;1;399-400

  • Association between bacterial homoplastic variants and radiological pathology in tuberculosis.

    Grandjean L, Monteserin J, Gilman R, Pauschardt J, Rokadiya S, Bonilla C, Ritacco V, Vidal JR, Parkhill J, Peacock S, Moore DA and Balloux F

    Department of Medicine, Imperial College London, London, UK.

    Background: Understanding how pathogen genetic factors contribute to pathology in TB could enable tailored treatments to the most pathogenic and infectious strains. New strategies are needed to control drug-resistant TB, which requires longer and costlier treatment. We hypothesised that the severity of radiological pathology on the chest radiograph in TB disease was associated with variants arising independently, multiple times (homoplasies) in the <i>Mycobacterium tuberculosis</i> genome.

    Methods: We performed whole genome sequencing (Illumina HiSeq2000 platform) on <i>M. tuberculosis</i> isolates from 103 patients with drug-resistant TB in Lima between 2010 and 2013. Variables including age, sex, HIV status, previous TB disease and the percentage of lung involvement on the pretreatment chest radiograph were collected from health posts of the national TB programme. Genomic variants were identified using standard pipelines.

    Results: Two mutations were significantly associated with more widespread radiological pathology in a multivariable regression model controlling for confounding variables (Rv2828c.141, RR 1.3, 95% CI 1.21 to 1.39, p<0.01; rpoC.1040, RR 1.9, 95% CI 1.77 to 2.16, p<0.01). The rpoB.450 mutation was associated with less extensive radiological pathology (RR 0.81, 95% CI 0.69 to 0.94, p=0.03), suggestive of a bacterial fitness cost for this mutation in vivo. Patients with a previous episode of TB disease and those between 10 and 30 years of age also had significantly increased radiological pathology.

    Conclusions: This study is the first to compare the <i>M. tuberculosis</i> genome to radiological pathology on the chest radiograph. We identified two variants significantly positively associated with more widespread radiological pathology and one with reduced pathology. Prospective studies are warranted to determine whether mutations associated with increased pathology also predict the spread of drug-resistant TB.

    Thorax 2020

  • Evolution of the insecticide target Rdl in African Anopheles is driven by interspecific and interkaryotypic introgression.

    Grau-Bové X, Tomlinson S, O'Reilly AO, Harding NJ, Miles A, Kwiatkowski D, Donnelly MJ, Weetman D and Anopheles gambiae 1000 Genomes Consortium

    Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom.

    The evolution of insecticide resistance mechanisms in natural populations of Anopheles malaria vectors is a major public health concern across Africa. Using genome sequence data, we study the evolution of resistance mutations in the resistance to dieldrin locus (Rdl), a GABA receptor targeted by several insecticides, but most notably by the long-discontinued cyclodiene, dieldrin. The two Rdl resistance mutations (296G and 296S) spread across West and Central African Anopheles via two independent hard selective sweeps that included likely compensatory nearby mutations, and were followed by a rare combination of introgression across species (from A. gambiae and A. arabiensis to A. coluzzii) and across non-concordant karyotypes of the 2La chromosomal inversion. Rdl resistance evolved in the 1950s as the first known adaptation to a large-scale insecticide-based intervention, but the evolutionary lessons from this system highlight contemporary and future dangers for management strategies designed to combat development of resistance in malaria vectors.

    Molecular biology and evolution 2020

  • Personalized and graph genomes reveal missing signal in epigenomic data.

    Groza C, Kwan T, Soranzo N, Pastinen T and Bourque G

    Human Genetics, McGill University, Montreal, QC, Canada.

    Background: Epigenomic studies that use next generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results.

    Results: We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the amount of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks.

    Conclusions: Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.

    Genome biology 2020;21;1;124

  • Identifying and removing haplotypic duplication in primary genome assemblies.

    Guan D, McCarthy SA, Wood J, Howe K, Wang Y and Durbin R

    Center for Bioinformatics, Harbin Institute of Technology, Harbin, China.

    Motivation: Rapid development in long read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either only focus on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors.

    Results: Here we present a novel tool "purge_dups" that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines.

    Availability: The source code is written in C and is available at

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 207492/Z/17/Z

    Bioinformatics (Oxford, England) 2020

  • A Genetic History of the Near East from an aDNA Time Course Sampling Eight Points in the Past 4,000 Years.

    Haber M, Nassar J, Almarri MA, Saupe T, Saag L, Griffith SJ, Doumet-Serhal C, Chanteau J, Saghieh-Beydoun M, Xue Y, Scheib CL and Tyler-Smith C

    Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK; Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

    The Iron and Classical Ages in the Near East were marked by population expansions carrying cultural transformations that shaped human history, but the genetic impact of these events on the people who lived through them is little-known. Here, we sequenced the whole genomes of 19 individuals who each lived during one of four time periods between 800 BCE and 200 CE in Beirut on the Eastern Mediterranean coast at the center of the ancient world's great civilizations. We combined these data with published data to traverse eight archaeological periods and observed any genetic changes as they arose. During the Iron Age (∼1000 BCE), people with Anatolian and South-East European ancestry admixed with people in the Near East. The region was then conquered by the Persians (539 BCE), who facilitated movement exemplified in Beirut by an ancient family with Egyptian-Lebanese admixed members. But the genetic impact at a population level does not appear until the time of Alexander the Great (beginning 330 BCE), when a fusion of Asian and Near Easterner ancestry can be seen, paralleling the cultural fusion that appears in the archaeological records from this period. The Romans then conquered the region (31 BCE) but had little genetic impact over their 600 years of rule. Finally, during the Ottoman rule (beginning 1516 CE), Caucasus-related ancestry penetrated the Near East. Thus, in the past 4,000 years, three limited admixture events detectably impacted the population, complementing the historical records of this culturally complex region dominated by the elite with genetic insights from the general population.

    American journal of human genetics 2020

  • Muzlifah Haniffa-a new era for collaborative and supportive medical research.

    Haniffa M

    Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK.

    Nature medicine 2020;26;2;155

  • WormBase: a modern Model Organism Information Resource.

    Harris TW, Arnaboldi V, Cain S, Chan J, Chen WJ, Cho J, Davis P, Gao S, Grove CA, Kishore R, Lee RYN, Muller HM, Nakamura C, Nuin P, Paulini M, Raciti D, Rodgers FH, Russell M, Schindelman G, Auken KV, Wang Q, Williams G, Wright AJ, Yook K, Howe KL, Schedl T, Stein L and Sternberg PW

    Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada.

    WormBase is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.

    Funded by: Medical Research Council: MR/S000453/1

    Nucleic acids research 2020;48;D1;D762-D767

  • The role of haematological traits in risk of ischaemic stroke and its subtypes.

    Harshfield EL, Sims MC, Traylor M, Ouwehand WH and Markus HS

    Stroke Research Group, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK.

    Thrombosis and platelet activation play a central role in stroke pathogenesis, and antiplatelet and anticoagulant therapies are central to stroke prevention. However, whether haematological traits contribute equally to all ischaemic stroke subtypes is uncertain. Furthermore, identification of associations with new traits may offer novel treatment opportunities. The aim of this research was to ascertain causal relationships between a wide range of haematological traits and ischaemic stroke and its subtypes. We obtained summary statistics from 27 published genome-wide association studies of haematological traits involving over 375 000 individuals, and genetic associations with stroke from the MEGASTROKE Consortium (n = 67 000 stroke cases). Using two-sample Mendelian randomization we analysed the association of genetically elevated levels of 36 blood cell traits (platelets, mature/immature red cells, and myeloid/lymphoid/compound white cells) and 49 haemostasis traits (including clotting cascade factors and markers of platelet function) with risk of developing ischaemic (AIS), cardioembolic (CES), large artery (LAS), and small vessel stroke (SVS). Several factors on the intrinsic clotting pathway were significantly associated (P < 3.85 × 10<sup>-4</sup>) with CES and LAS, but not with SVS (e.g. reduced factor VIII activity with AIS/CES/LAS; raised factor VIII antigen with AIS/CES; and increased factor XI activity with AIS/CES). On the common pathway, increased gamma (γ') fibrinogen was significantly associated with AIS/CES. Furthermore, elevated plateletcrit was significantly associated with AIS/CES, eosinophil percentage of white cells with LAS, and thrombin-activatable fibrinolysis inhibitor activation peptide antigen with AIS. We also conducted a follow-up analysis in UK Biobank, which showed that amongst individuals with atrial fibrillation, those with genetically lower levels of factor XI are at reduced risk of AIS compared to those with normal levels of factor XI.
    These results implicate components of the intrinsic and common pathways of the clotting cascade, as well as several other haematological traits, in the pathogenesis of CES and possibly LAS, but not SVS. The lack of associations with SVS suggests thrombosis may be less important for this stroke subtype. Plateletcrit and factor XI are potentially tractable new targets for secondary prevention of ischaemic stroke, while factor VIII and γ' fibrinogen require further population-based studies to ascertain their possible aetiological roles.

    Funded by: Medical Research Council: MC_PC_17228, MC_QA137853

    Brain : a journal of neurology 2020;143;1;210-221

  • Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes.

    Heaton H, Talman AM, Knights A, Imaz M, Gaffney DJ, Durbin R, Hemberg M and Lawniczak MKN

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.

    Funded by: British Heart Foundation (BHF): RG/13/13/30194, RG/18/13/33946; Wellcome Trust (Wellcome): 206194/Z/17/Z

    Nature methods 2020

  • A microsporidian impairs Plasmodium falciparum transmission in Anopheles arabiensis mosquitoes.

    Herren JK, Mbaisi L, Mararo E, Makhulu EE, Mobegi VA, Butungi H, Mancini MV, Oundo JW, Teal ET, Pinaud S, Lawniczak MKN, Jabara J, Nattoh G and Sinkins SP

    International Centre of Insect Physiology and Ecology (ICIPE), Kasarani, Nairobi, Kenya.

    A possible malaria control approach involves the dissemination in mosquitoes of inherited symbiotic microbes to block Plasmodium transmission. However, in the Anopheles gambiae complex, the primary African vectors of malaria, there are limited reports of inherited symbionts that impair transmission. We show that a vertically transmitted microsporidian symbiont (Microsporidia MB) in the An. gambiae complex can impair Plasmodium transmission. Microsporidia MB is present at moderate prevalence in geographically dispersed populations of An. arabiensis in Kenya, localized to the mosquito midgut and ovaries, and is not associated with significant reductions in adult host fecundity or survival. Field-collected Microsporidia MB-infected An. arabiensis tested negative for P. falciparum gametocytes and, on experimental infection with P. falciparum, sporozoites were not detected in Microsporidia MB-infected mosquitoes. As a non-virulent, vertically transmitted microbe that impairs Plasmodium transmission, Microsporidia MB could be investigated as a strategy to limit malaria transmission.

    Funded by: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC): BB/R005338/1, sub-grant AV/PP015/1; Wellcome Trust (Wellcome): 107372, 200274

    Nature communications 2020;11;1;2187

  • Malat1 Suppresses Immunity to Infection through Promoting Expression of Maf and IL-10 in Th Cells.

    Hewitson JP, West KA, James KR, Rani GF, Dey N, Romano A, Brown N, Teichmann SA, Kaye PM and Lagos D

    York Biomedical Research Institute, University of York, York, YO10 5DD Yorkshire, United Kingdom.

    Despite extensive mapping of long noncoding RNAs in immune cells, their function in vivo remains poorly understood. In this study, we identify over 100 long noncoding RNAs that are differentially expressed within 24 h of Th1 cell activation. Among those, we show that suppression of <i>Malat1</i> is a hallmark of CD4<sup>+</sup> T cell activation, but its complete deletion results in more potent immune responses to infection. This is because <i>Malat1<sup>-/-</sup></i> Th1 and Th2 cells express lower levels of the immunosuppressive cytokine IL-10. In vivo, the reduced CD4<sup>+</sup> T cell IL-10 expression in <i>Malat1<sup>-/-</sup></i> mice underpins enhanced immunity and pathogen clearance in experimental visceral leishmaniasis (<i>Leishmania donovani</i>) but more severe disease in a model of malaria (<i>Plasmodium chabaudi chabaudi</i> AS). Mechanistically, <i>Malat1</i> regulates IL-10 through enhancing expression of Maf, a key transcriptional regulator of <i>IL-10</i>. Maf expression correlates with <i>Malat1</i> in single Ag-specific Th cells from <i>P. chabaudi chabaudi</i> AS-infected mice and is downregulated in <i>Malat1<sup>-/-</sup></i> Th1 and Th2 cells. The <i>Malat1</i> RNA is responsible for these effects, as antisense oligonucleotide-mediated inhibition of <i>Malat1</i> also suppresses Maf and IL-10 levels. Our results reveal that through promoting expression of the Maf/IL-10 axis in effector Th cells, <i>Malat1</i> is a nonredundant regulator of mammalian immunity.

    Journal of immunology (Baltimore, Md. : 1950) 2020

  • Type II and type IV toxin-antitoxin systems show different evolutionary patterns in the global Klebsiella pneumoniae population.

    Horesh G, Fino C, Harms A, Dorman MJ, Parts L, Gerdes K, Heinz E and Thomson NR

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1RQ, UK.

    The Klebsiella pneumoniae species complex includes important opportunistic pathogens which have become public health priorities linked to major hospital outbreaks and the recent emergence of multidrug-resistant hypervirulent strains. Bacterial virulence and the spread of multidrug resistance have previously been linked to toxin-antitoxin (TA) systems. TA systems encode a toxin that disrupts essential cellular processes, and a cognate antitoxin which counteracts this activity. Whilst associated with the maintenance of plasmids, they also act in bacterial immunity and antibiotic tolerance. However, the evolutionary dynamics and distribution of TA systems in clinical pathogens are not well understood. Here, we present a comprehensive survey and description of the diversity of TA systems in 259 clinically relevant genomes of K. pneumoniae. We show that TA systems are highly prevalent with a median of 20 loci per strain. Importantly, these toxins differ substantially in their distribution patterns and in their range of cognate antitoxins. Classification along these properties suggests different roles of TA systems and highlights the association and co-evolution of toxins and antitoxins.

    Nucleic acids research 2020

  • Reply to Jensen and Kowalik: Consideration of mixed infections is central to understanding HCMV intrahost diversity.

    Houldcroft CJ, Cudini J, Goldstein RA and Breuer J

    Department of Medicine, Addenbrookes Hospital, Cambridge University, Cambridge CB2 0QQ, United Kingdom.

    Proceedings of the National Academy of Sciences of the United States of America 2020;117;2;818-819

  • Differential regulation of the immune system in a brain-liver-fats organ network during short term fasting.

    Huang SSY, Makhlouf M, AbouMoussa EH, Ruiz Tejada Segura ML, Mathew LS, Wang K, Leung MC, Chaussabel D, Logan DW, Scialdone A, Garand M and Saraiva LR

    Sidra Medicine, PO Box 26999, Doha, Qatar.

    Background: Different fasting regimens are known to promote health, mitigate chronic immunological disorders, and improve age-related pathophysiological parameters in animals and humans. Indeed, several clinical trials are currently ongoing using fasting as a potential therapy for a wide range of conditions. Fasting alters metabolism by acting as a reset for energy homeostasis, but the molecular mechanisms underlying the beneficial effects of short-term fasting (STF) are still not well understood, particularly at the systems or multi-organ level.

    Methods: We performed RNA-sequencing in nine different organs from mice fed ad libitum (0 hours), or subjected to five different times of fasting (2-22 hours). We applied a combination of multivariate analysis, differential expression analysis, gene ontology and network analysis for an in-depth understanding of the multi-organ transcriptome. We utilized literature mining solutions, LitLab™ and Gene Retriever™, to identify the biological and biochemical terms significantly associated with our experimental gene set which provide additional support and meaning to the experimentally derived gene and protein data.

    Results: We cataloged the transcriptional dynamics within and between organs during STF and discovered differential temporal effects of STF among organs. Using gene ontology enrichment analysis, we identified an organ network sharing 37 common biological pathways perturbed by STF. This network incorporates the brain, liver, interscapular brown adipose tissue, and posterior-subcutaneous white adipose tissue, hence we named it the brain-liver-fats organ network. Using Reactome pathways analysis, we identified the immune system, dominated by T cell regulation processes, as a central and prominent target of systemic modulations during STF in this organ network. The changes we identified in specific immune components point to the priming of adaptive immunity and parallel the fine-tuning of innate immune signaling.

    Conclusions: Our study provides a comprehensive multi-organ transcriptomic profiling of mice subjected to multiple periods of STF and adds new insights into the molecular modulators involved in the systemic immuno-transcriptomic changes that occur during short-term energy loss.

    Molecular metabolism 2020;101038

  • Pan-cancer analysis of whole genomes.

    ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale<sup>1-3</sup>. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter<sup>4</sup>; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation<sup>5,6</sup>; analyses timings and patterns of tumour evolution<sup>7</sup>; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity<sup>8,9</sup>; and evaluates a range of more-specialized features of cancer genomes<sup>8,10-18</sup>.

    Funded by: Medical Research Council: MC_UU_00007/16, MC_UU_12022/2; NCI NIH HHS: U24 CA211000; NHGRI NIH HHS: R01 HG007069; Wellcome Trust: 088177

    Nature 2020;578;7793;82-93

  • Functional analysis of candidate genes from genome-wide association studies of hearing.

    Ingham NJ, Rook V, Di Domenico F, James E, Lewis MA, Girotto G, Buniello A and Steel KP

    Wolfson Centre for Age-Related Diseases, King's College London, London, SE1 1UL, UK; Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    The underlying causes of age-related hearing loss (ARHL) are not well understood, but it is clear from heritability estimates that genetics plays a role in addition to environmental factors. Genome-wide association studies (GWAS) in human populations can point to candidate genes that may be involved in ARHL, but follow-up analysis is needed to assess the role of these genes in the disease process. Some genetic variants may contribute a small amount to a disease, while other variants may have a large effect size, but the genetic architecture of ARHL is not yet well-defined. In this study, we asked if a set of 17 candidate genes highlighted by early GWAS reports of ARHL have detectable effects on hearing by knocking down expression levels of each gene in the mouse and analysing auditory function. We found two of the genes have an impact on hearing. Mutation of Dclk1 led to late-onset progressive increase in ABR thresholds and the A430005L14Rik (C1orf174) mutants showed worse recovery from noise-induced damage than controls. We did not detect any abnormal responses in the remaining 15 mutant lines either in thresholds or from our battery of suprathreshold ABR tests, and we discuss the possible reasons for this.

    Hearing research 2020;387;107879

  • Aberrant cell migration contributes to defective airway epithelial repair in childhood wheeze.

    Iosifidis T, Sutanto EN, Buckley AG, Coleman L, Gill EE, Lee AH, Ling KM, Hillas J, Looi K, Garratt LW, Martinovich KM, Shaw NC, Montgomery ST, Kicic-Starcevich E, Karpievitch YV, Le Souëf P, Laing IA, Vijayasekaran S, Lannigan FJ, Rigby PJ, Hancock RE, Knight DA, Stick SM, Kicic A, Western Australian Epithelial Research Program (WAERP) and Australian Respiratory Epithelium Consortium (AusREC)

    Division of Pediatrics and.

    Abnormal wound repair has been observed in the airway epithelium of patients with chronic respiratory diseases, including asthma. Therapies focusing on repairing vulnerable airways, particularly in early life, present a potentially novel treatment strategy. We report that defective lower airway epithelial cell repair strongly associates with common pre-school-aged and school-aged wheezing phenotypes and is characterized by aberrant migration patterns and reduced integrin α5β1 expression. Next-generation sequencing identified the PI3K/Akt pathway as the top upstream transcriptional regulator of integrin α5β1, where Akt activation enhanced repair and integrin α5β1 expression in primary cultures from children with wheeze. Conversely, inhibition of PI3K/Akt signaling in primary cultures from children without wheeze reduced α5β1 expression and attenuated repair. Importantly, the FDA-approved drug celecoxib - and its non-COX2-inhibiting analogue, dimethyl-celecoxib - stimulated the PI3K/Akt-integrin α5β1 axis and restored airway epithelial repair in cells from children with wheeze. When compared with published clinical data sets, the identified transcriptomic signature was also associated with viral-induced wheeze exacerbations, highlighting the clinical potential of such therapy. Collectively, these results identify airway epithelial restitution via targeting the PI3K-integrin α5β1 axis as a potentially novel therapeutic avenue for childhood wheeze and asthma. We propose that the next step in the therapeutic development process should be a proof-of-concept clinical trial, since relevant animal models to test the crucial underlying premise are unavailable.

    JCI insight 2020;5;7

  • Germs and germlines: how "public" B-cell clones evolve in the gut.

    James KR and King HW

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    Chen et al. describe how B-cell clones observed in the gut of many different individuals (recurrent or "public" clonotypes) are shaped by the combined influences of common microbial antigens and underlying genomic recombination biases.

    Funded by: European Research Council: 646794; Sir Henry Wellcome Postdoctoral Fellowship: 213555/Z/18/Z

    Immunology and cell biology 2020

  • Increasing incidence of group B streptococcus neonatal infections in the Netherlands is associated with clonal expansion of CC17 and CC23.

    Jamrozy D, Bijlsma MW, de Goffau MC, van de Beek D, Kuijpers TW, Parkhill J, van der Ende A and Bentley SD

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Group B streptococcus (GBS) is the leading cause of neonatal invasive disease worldwide. In the Netherlands, incidence of the disease increased despite implementation of preventive guidelines. We describe a genomic analysis of 1345 GBS isolates from neonatal (age 0-89 days) invasive infections in the Netherlands reported between 1987 and 2016. Most isolates clustered into one of five major lineages: CC17 (39%), CC19 (25%), CC23 (18%), CC10 (9%) and CC1 (7%). There was a significant rise in the number of infections due to isolates from CC17 and CC23. Phylogenetic clustering analysis revealed that this was caused by expansion of specific sub-lineages, designated CC17-A1, CC17-A2 and CC23-A1. Dating of phylogenetic trees estimated that these clones diverged in the 1960s/1970s, representing historical rather than recently emerged clones. For CC17-A1, the expansion correlated with acquisition of a new phage carrying a gene encoding a putative cell-surface protein. Representatives of CC17-A1, CC17-A2 and CC23-A1 clones were identified in datasets from other countries, demonstrating their global distribution.

    Funded by: Wellcome Trust (Wellcome): 098051; ZonMw (Netherlands Organisation for Health Research and Development): 016.116.358

    Scientific reports 2020;10;1;9539

  • Reconstructing human DC, monocyte and macrophage development in utero using single cell technologies.

    Jardine L and Haniffa M

    Biosciences Institute, Newcastle University, Faculty of Medical Sciences, Newcastle upon Tyne, NE2 4HH, UK; Department of Haematology, Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne NE2 4LP, UK.

    The repertoire of dendritic cells (DCs), monocytes and macrophages in adult humans is diverse, and we are appreciating this to a greater extent as high-throughput methods, such as single-cell RNA sequencing, become widely adopted and scalable. This powerful lens of analysis is also beginning to shed light on prenatal immunology, allowing us to chart the emergence, tissue distribution and developmental regulation of DCs, monocytes and macrophages during early human life. In this review, we will integrate recent insights from studies of the developing immune system into our understanding of adult DC, monocyte and macrophage organization, illustrating where insights from early life both affirm and challenge current understanding.

    Molecular immunology 2020;123;1-6

  • A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns.

    Jiao W, Atwal G, Polak P, Karlic R, Cuppen E, PCAWG Tumor Subtypes and Clinical Translation Working Group, Danyi A, de Ridder J, van Herpen C, Lolkema MP, Steeghs N, Getz G, Morris Q, Stein LD and PCAWG Consortium

    Ontario Institute for Cancer Research, Toronto, ON, M5G0A3, Canada.

    In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumour samples and 88% and 83% respectively on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.

    Nature communications 2020;11;1;728

  • Macrophage metabolic reprogramming presents a therapeutic target in lupus nephritis.

    Jing C, Castro-Dopico T, Richoz N, Tuong ZK, Ferdinand JR, Lok LSC, Loudon KW, Banham GD, Mathews RJ, Cader Z, Fitzpatrick S, Bashant KR, Kaplan MJ, Kaser A, Johnson RS, Murphy MP, Siegel RM and Clatworthy MR

    Molecular Immunity Unit, Department of Medicine, Medical Research Council Laboratory of Molecular Biology, University of Cambridge, Cambridge CB2 0QH, United Kingdom.

    IgG antibodies cause inflammation and organ damage in autoimmune diseases such as systemic lupus erythematosus (SLE). We investigated the metabolic profile of macrophages isolated from inflamed tissues in immune complex (IC)-associated diseases, including SLE and rheumatoid arthritis, and following IgG Fcγ receptor cross-linking. We found that human and mouse macrophages undergo a switch to glycolysis in response to IgG IC stimulation, mirroring macrophage metabolic changes in inflamed tissue in vivo. This metabolic reprogramming was required to generate a number of proinflammatory mediators, including IL-1β, and was dependent on mTOR and hypoxia-inducible factor (HIF)1α. Inhibition of glycolysis, or genetic depletion of HIF1α, attenuated IgG IC-induced activation of macrophages in vitro, including primary human kidney macrophages. In vivo, glycolysis inhibition led to a reduction in kidney macrophage IL-1β and reduced neutrophil recruitment in a murine model of antibody-mediated nephritis. Together, our data reveal the molecular mechanisms underpinning FcγR-mediated metabolic reprogramming in macrophages and suggest a therapeutic strategy for autoantibody-induced inflammation, including lupus nephritis.

    Proceedings of the National Academy of Sciences of the United States of America 2020

  • The Nature and Extent of Plasmid Variation in Chlamydia trachomatis.

    Jones CA, Hadfield J, Thomson NR, Cleary DW, Marsh P, Clarke IN and O'Neill CE

    Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton General Hospital, Southampton SO166YD, UK.

    <i>Chlamydia trachomatis</i> is an obligate intracellular pathogen of humans, causing both the sexually transmitted infection, chlamydia, and the most common cause of infectious blindness, trachoma. The majority of sequenced <i>C. trachomatis</i> clinical isolates carry a 7.5-Kb plasmid, and it is becoming increasingly evident that this is a key determinant of pathogenicity. The discovery of the Swedish New Variant and the more recent Finnish variant highlight the importance of understanding the natural extent of variation in the plasmid. In this study we analysed 524 plasmid sequences from publicly available whole-genome sequence data. Single nucleotide polymorphisms (SNP) in each of the eight coding sequences (CDS) were identified and analysed. There were 224 base positions out of a total 7550 bp that carried a SNP, which equates to a SNP rate of 2.97%, nearly three times what was previously calculated. After normalising for CDS size, CDS8 had the highest SNP rate at 3.97% (i.e., number of SNPs per total number of nucleotides), whilst CDS6 had the lowest at 1.94%. CDS5 had the highest total number of SNPs across the 524 sequences analysed (2267 SNPs), whereas CDS6 had the fewest, with only 85 SNPs. Calculation of the genetic distances identified CDS6 as the least variable gene at the nucleotide level (d = 0.001), and CDS5 as the most variable (d = 0.007); however, at the amino acid level CDS2 was the least variable (d = 0.001), whilst CDS5 remained the most variable (d = 0.013). This study describes the largest in-depth analysis of the <i>C. trachomatis</i> plasmid to date, through the analysis of plasmid sequence data mined from whole genome sequences spanning 50 years and from a worldwide distribution, providing insights into the nature and extent of existing variation within the plasmid as well as guidance for the design of future diagnostic assays. This is crucial at a time when single-target diagnostic assays are failing to detect natural mutants, putting those infected at risk of a serious long-term and life-changing illness.
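
    The whole-plasmid SNP rate quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes the rate is simply the number of variable positions divided by the plasmid length, as the figures in the abstract suggest.

```python
def snp_rate_percent(snp_positions: int, total_bases: int) -> float:
    """Return the percentage of base positions that carry at least one SNP."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return 100.0 * snp_positions / total_bases


# Figures quoted in the abstract: 224 variable positions in a 7550 bp plasmid.
plasmid_rate = snp_rate_percent(224, 7550)
print(f"{plasmid_rate:.2f}%")  # prints 2.97%
```

    The abstract describes the per-CDS rates as normalised for CDS size, so the same function would presumably apply to each CDS using its own length; those lengths are not given in the abstract and are not guessed here.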

    Microorganisms 2020;8;3

  • A cell atlas of human thymic development defines T cell repertoire formation

    Jong-Eun Park, Rachel A. Botting, Cecilia Domínguez Conde, Dorin-Mirel Popescu, Marieke Lavaert, Daniel J. Kunz, Issac Goh, Emily Stephenson, Roberta Ragazzini, Elizabeth Tuck, Anna Wilbrey-Clark, Kenny Roberts, Veronika R. Kedlian, John R. Ferdinand, Xiaoling He, Simone Webb, Daniel Maunder, Niels Vandamme, Krishnaa T. Mahbubani, Krzysztof Polanski, Lira Mamanova, Liam Bolt, David Crossland, Fabrizio de Rita, Andrew Fuller, Andrew Filby, Gary Reynolds, David Dixon, Kourosh Saeb-Parsy, Steven Lisgo, Deborah Henderson, Roser Vento-Tormo, Omer A. Bayraktar, Roger A. Barker, Kerstin B. Meyer, Yvan Saeys, Paola Bonfanti, Sam Behjati, Menna R. Clatworthy, Tom Taghon, Muzlifah Haniffa and Sarah A. Teichmann

    The human thymus is the organ responsible for the maturation of many types of T cells, which are immune cells that protect us from infection. However, it is not well known how these cells develop with a full immune complement that contains the necessary variation to protect us from a variety of pathogens. By performing single-cell RNA sequencing on more than 250,000 cells, Park et al. examined the changes that occur in the thymus over the course of a human life. They found that development occurs in a coordinated manner among immune cells and with their developmental microenvironment. These data allowed for the creation of models of how T cells with different specific immune functions develop in humans. Science, this issue p. eaay3224 (doi: 10.1126/science.aay3224).

    Introduction: The thymus is the critical organ for T cell development and T cell receptor (TCR) repertoire formation, which shapes the landscape of adaptive immunity. T cell development in the thymus is spatially coordinated, and this process is orchestrated by diverse cell types constituting the thymic microenvironment. Although the thymus has been extensively studied using diverse animal models, human immunity cannot be understood without a detailed atlas of the human thymus.

    Rationale: To provide a comprehensive atlas of thymic cells across human life, we performed single-cell RNA sequencing (scRNA-seq) using dissociated cells from human thymus during development, childhood, and adult life. We sampled 15 embryonic and fetal thymi spanning thymic developmental stages between 7 and 17 post-conception weeks, as well as nine postnatal thymi from pediatric and adult individuals. Diverse sorting schemes were applied to increase the coverage on underrepresented cell populations. Using the marker genes obtained from single-cell transcriptomes, we spatially localized cell states by single-molecule fluorescence in situ hybridization (smFISH). To provide a systematic comparison between human and mouse, we also generated single-cell data on postnatal mouse thymi and combined this with preexisting mouse datasets. Finally, to investigate the bias in the recombination and selection of human TCR repertoires, we enriched the TCR sequences for single-cell library generation.

    Results: We identified more than 50 different cell states in the human thymus. Human thymus cell states dynamically change in abundance and gene expression profiles across development and during pediatric and adult life. We identified novel subpopulations of human thymic fibroblasts and epithelial cells and located them in situ. We computationally predicted the trajectory of human T cell development from early progenitors in the hematopoietic fetal liver into diverse mature T cell types. Using this trajectory, we constructed a framework of putative transcription factors driving T cell fate determination. Among thymic unconventional T cells, we noted a distinct subset of CD8αα+ T cells, which is marked by GNG4 expression and located in the perimedullary region of the thymus. This subset expressed high levels of XCL1 and colocalized with XCR1+ dendritic cells. Comparison of human and mouse thymic cells revealed divergent gene expression profiles of these unconventional T cell types. Finally, we identified a strong bias in human VDJ usage shaped by recombination and multiple rounds of selection, including a TCRα V-J bias for CD8+ T cells.

    Conclusion: Our single-cell transcriptome profile of the thymus across the human lifetime and across species provides a high-resolution census of T cell development within the native tissue microenvironment. Systematic comparison between the human and mouse thymus highlights human-specific cell states and gene expression signatures. Our detailed cellular network of the thymic niche for T cell development will aid the establishment of in vitro organoid culture models that faithfully recapitulate human in vivo thymic tissue.

    Figure: Constructing the human thymus cell atlas. We analyzed human thymic cells across development and postnatal life using scRNA-seq and spatial methods to delineate the diversity of thymic-derived T cells and the localization of cells constituting the thymus microenvironment. With T cell development trajectory reconstituted at single-cell resolution combined with TCR sequence, we investigated the bias in the VDJ recombination and selection of human TCR repertoires. Finally, we provide a systematic comparison between human and mouse thymic cell atlases.

    The thymus provides a nurturing environment for the differentiation and selection of T cells, a process orchestrated by their interaction with multiple thymic cell types. We used single-cell RNA sequencing to create a cell census of the human thymus across the life span and to reconstruct T cell differentiation trajectories and T cell receptor (TCR) recombination kinetics. Using this approach, we identified and located in situ CD8αα+ T cell populations, thymic fibroblast subtypes, and activated dendritic cell states. In addition, we reveal a bias in TCR recombination and selection, which is attributed to genomic position and the kinetics of lineage commitment. Taken together, our data provide a comprehensive atlas of the human thymus across the life span with new insights into human T cell development.

    Science 2020;367;6480

  • Expanding the genotype-phenotype correlation of de novo heterozygous missense variants in YWHAG as a cause of developmental and epileptic encephalopathy.

    Kanani F, Titheradge H, Cooper N, Elmslie F, Lees MM, Juusola J, Pisani L, McKenna C, Mignot C, Valence S, Keren B, Lachlan K, DDD Study and Balasubramanian M

    Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK.

    Developmental and epileptic encephalopathies (DEE) describe heterogeneous epilepsy syndromes, characterized by early-onset, refractory seizures and developmental delay (DD). Several DEE-associated genes have been reported. With increased access to whole exome sequencing (WES), new candidate genes are being identified, although there are fewer large cohort papers describing the clinical phenotype in such patients. We describe 6 unreported individuals and provide updated information on an additional previously reported individual with heterozygous de novo missense variants in YWHAG. We describe a syndromal phenotype, report 5 novel variants and a recurrent p.Arg132Cys YWHAG variant, and compare developmental trajectory and treatment strategies in this cohort. We provide further evidence of causality in YWHAG variants. WES was performed in five patients via Deciphering Developmental Disorders Study and the remaining two were identified via Genematcher and AnnEX databases. De novo variants identified from exome data were validated using Sanger sequencing. Seven out of seven patients in the cohort have de novo, heterozygous missense variants in YWHAG including 2/7 patients with a recurrent c.394C > T, p.Arg132Cys variant; 1/7 has a second, pathogenic variant in STAG1. Characteristic features included: early-onset seizures, predominantly generalized tonic-clonic and absence type (7/7) with good response to standard anti-epileptic medications; moderate DD; Intellectual Disability (ID) (5/7) and Autism Spectrum Disorder (3/7). De novo YWHAG missense variants cause EE, characterized by early-onset epilepsy, ID and DD, supporting the hypothesis that YWHAG loss-of-function causes a neurological phenotype. Although the exact mechanism of disease resulting from alterations in YWHAG is not fully known, it is possible that haploinsufficiency of YWHAG in developing cerebral cortex may lead to abnormal neuronal migration resulting in DEE.

    American journal of medical genetics. Part A 2020;182;4;713-720

  • ChemBioServer 2.0: An advanced web server for filtering, clustering and networking of chemical compounds facilitating both drug discovery and repurposing.

    Karatzas E, Zamora JE, Athanasiadis E, Dellis D, Cournia Z and Spyrou GM

    Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, Athens, Greece.

    ChemBioServer 2.0 is the advanced sequel of a web-server for filtering, clustering and networking of chemical compound libraries facilitating both drug discovery and repurposing. It provides researchers the ability to (i) browse and visualize compounds along with their physicochemical and toxicity properties, (ii) perform property-based filtering of chemical compounds, (iii) explore compound libraries for lead optimization based on perfect match substructure search, (iv) re-rank virtual screening results to achieve selectivity for a protein of interest against different protein members of the same family, selecting only those compounds that score high for the protein of interest, (v) perform clustering among the compounds based on their physicochemical properties providing representative compounds for each cluster, (vi) construct and visualize a structural similarity network of compounds providing a set of network analysis metrics, (vii) combine a given set of compounds with a reference set of compounds into a single structural similarity network providing the opportunity to infer drug repurposing due to transitivity, (viii) remove compounds from a network based on their similarity with unwanted substances (e.g. failed drugs) and (ix) build custom compound mining pipelines.


    Bioinformatics (Oxford, England) 2020

  • The mutational constraint spectrum quantified from variation in 141,456 humans.

    Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O'Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Genome Aggregation Database Consortium, Neale BM, Daly MJ and MacArthur DG

    Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

    Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes<sup>1</sup>. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

    Nature 2020;581;7809;434-443

  • The gene-rich genome of the scallop Pecten maximus.

    Kenny NJ, McCarthy SA, Dudchenko O, James K, Betteridge E, Corton C, Dolucan J, Mead D, Oliver K, Omer AD, Pelan S, Ryan Y, Sims Y, Skelton J, Smith M, Torrance J, Weisz D, Wipat A, Aiden EL, Howe K and Williams ST

    Natural History Museum, Department of Life Sciences, Cromwell Road, London SW7 5BD, UK.

    Background: The king scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and plays a key role in coastal ecosystems and food webs. Like other filter feeding bivalves it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies, and fisheries management.

    Findings: Here we report the genome assembly of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10X Chromium and Hi-C data. Its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species.

    Conclusions: The genome assembly of P. maximus and its annotated gene set provide a high-quality platform for studies on such disparate topics as shell biomineralization, pigmentation, vision, and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known to confer resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid.
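
    For readers unfamiliar with the scaffold N50 statistic reported above, the following minimal sketch shows how it is conventionally computed: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the toy scaffold lengths are hypothetical and are not taken from the P. maximus assembly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings coverage to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Toy example with made-up lengths (in Mb): total 150, half 75.
toy_scaffolds = [60, 45, 30, 10, 5]
print(n50(toy_scaffolds))  # prints 45, since 60 + 45 = 105 >= 75
```

    A high N50 relative to the total assembly size, as reported in the Findings, indicates that most of the sequence sits in a small number of long scaffolds.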

    GigaScience 2020;9;5

  • Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains.

    Kentepozidou E, Aitken SJ, Feig C, Stefflova K, Ibarra-Soria X, Odom DT, Roller M and Flicek P

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD, UK.

    Background: CTCF binding contributes to the establishment of a higher-order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). However, despite the importance and conservation of TADs, the role of CTCF binding in their evolution and stability remains elusive.

    Results: We carry out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We perform CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associate them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by a balance of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints compared to other CTCF sites. TAD boundaries frequently harbor dynamically evolving clusters containing both evolutionarily old and young CTCF sites as a result of the repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation.

    Conclusions: Dynamic conservation of CTCF site clusters is an apparently important feature of CTCF binding evolution that is critical to the functional stability of a higher-order chromatin structure.

    Funded by: Cancer Research UK: 20412; European Research Council: 615584; Wellcome Trust: WT106563/Z/14, WT108749/Z/15/Z, WT202878/B/16/Z, WT202878/Z/16/Z

    Genome biology 2020;21;1;5

  • Spatio-temporal dynamics of Plasmodium falciparum transmission within a spatial unit on the Colombian Pacific Coast.

    Knudson A, González-Casabianca F, Feged-Rivadeneira A, Pedreros MF, Aponte S, Olaya A, Castillo CF, Mancilla E, Piamba-Dorado A, Sanchez-Pedraza R, Salazar-Terreros MJ, Lucchi N, Udhayakumar V, Jacob C, Pance A, Carrasquilla M, Apráez G, Angel JA, Rayner JC and Corredor V

    Departamento de Microbiología, Facultad de Medicina, Universidad Nacional de Colombia, Bogotá, Colombia.

    As malaria control programmes concentrate their efforts towards malaria elimination a better understanding of malaria transmission patterns at fine spatial resolution units becomes necessary. Defining spatial units that consider transmission heterogeneity, human movement and migration will help to set up achievable malaria elimination milestones and guide the creation of efficient operational administrative control units. Using a combination of genetic and epidemiological data we defined a malaria transmission unit as the area contributing 95% of malaria cases diagnosed at the catchment facility located in the town of Guapi in the South Pacific Coast of Colombia. We provide data showing that P. falciparum malaria transmission is heterogeneous in time and space and analysed, using topological data analysis, the spatial connectivity, at the micro epidemiological level, between parasite populations circulating within the unit. To illustrate the necessity to evaluate the efficacy of malaria control measures within the transmission unit in order to increase the efficiency of the malaria control effort, we provide information on the size of the asymptomatic reservoir, the nature of parasite genotypes associated with drug resistance as well as the frequency of the Pfhrp2/3 deletion associated with false negatives when using Rapid Diagnostic Tests.

    Funded by: U.S. Department of Health & Human Services | Centers for Disease Control and Prevention (CDC): 2017-503; Wellcome Trust (Wellcome): 206194/Z/17/Z

    Scientific reports 2020;10;1;3756

  • Exome Sequencing for Prenatal Detection of Genetic Abnormalities in Fetal Ultrasound Anomalies: An Economic Evaluation.

    Kodabuckus SS, Quinlan-Jones E, McMullan DJ, Maher ER, Hurles ME, Barton PM and Kilby MD

    Health Economics Unit, Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom.

    Introduction: In light of the prospective Prenatal Assessment of Genomes and Exomes (PAGE) study, this paper aimed to determine the additional costs of using exome sequencing (ES) alongside or in place of chromosomal microarray (CMA) in a fetus with an identified congenital anomaly.

    Methods: A decision tree was populated using data from a prospective cohort of women undergoing invasive diagnostic testing. Four testing strategies were evaluated: CMA alone, ES alone, CMA followed by ES ("stepwise"), and CMA and ES combined.

    Results: When ES is priced at GBP 2,100 (EUR 2,407/USD 2,694), performing ES alone prenatally would cost a further GBP 31,410 (EUR 36,001/USD 40,289) per additional genetic diagnosis, whereas the stepwise strategy would cost a further GBP 24,657 (EUR 28,261/USD 31,627) per additional genetic diagnosis. When ES is priced at GBP 966 (EUR 1,107/USD 1,239), performing ES alone prenatally would cost a further GBP 11,532 (EUR 13,217/USD 14,792) per additional genetic diagnosis, whereas the stepwise strategy would cost a further GBP 11,639 (EUR 13,340/USD 14,929) per additional genetic diagnosis. The sub-group analysis suggests that performing the stepwise strategy on cases indicative of multiple anomalies at ultrasound scan (USS), compared to cases indicative of a single anomaly, is more cost-effective than using ES alone.

    Discussion/conclusion: Performing ES alongside CMA is more cost-effective than ES alone, which can potentially lead to improvements in pregnancy management. The direct effects of test results on pregnancy outcomes were not examined; therefore, further research is recommended to examine changes on the projected incremental cost-effectiveness ratios.

    Fetal diagnosis and therapy 2020;1-11

  • Mutational signatures: experimental design and analytical framework.

    Koh G, Zou X and Nik-Zainal S

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    Mutational signatures provide a powerful alternative for understanding the pathophysiology of cancer. Currently, experimental efforts aimed at validating and understanding the etiologies of cancer-derived mutational signatures are underway. In this review, we highlight key aspects of mutational signature experimental design and describe the analytical framework. We suggest guidelines and quality control measures for handling whole-genome sequencing data for mutational signature analyses and discuss pitfalls in interpretation. We envision that improved next-generation sequencing technologies and molecular cell biology approaches will usher in the next generation of studies into the etiologies and mechanisms of mutational patterns uncovered in cancers.

    Funded by: Cancer Research UK: C60100/A23916; Medical Research Council: Grant-in-Aid MRC Cancer Unit; Wellcome Trust: 4-year PhD Studentship (Sanger Institute)

    Genome biology 2020;21;1;37

  • halSynteny: a fast, easy-to-use conserved synteny block construction method for multiple whole-genome alignments.

    Krasheninnikova K, Diekhans M, Armstrong J, Dievskii A, Paten B and O'Brien S

    Computer Technologies Laboratory, School of Translational Information Technologies, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, St. Petersburg, Russian Federation.

    Background: Large-scale sequencing projects provide high-quality full-genome data that can be used for reconstruction of chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained on the basis of whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequence stored in a binary format demand an accurate and efficient computational approach for synteny block production.

    Findings: halSynteny performs efficient processing of pairwise alignment blocks for any pair of genomes in the alignment. The tool is part of the HAL comparative genomics suite and is targeted to build synteny blocks for multi-hundred-way, reference-free vertebrate alignments built with the Cactus system.

    Conclusions: halSynteny enables an accurate and rapid identification of synteny in multiple full-genome alignments. The method is implemented in C++11 as a component of the halTools software and released under MIT license. The package is available at

    GigaScience 2020;9;6

  • The prevalence and implications of single nucleotide polymorphisms in genes encoding the RNA polymerase of clinical isolates of Staphylococcus aureus.

    Krishna A, Liu B, Peacock SJ and Wigneshweraraj S

    MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK.

    Central to the regulation of bacterial gene expression is the multisubunit enzyme RNA polymerase (RNAP), which is responsible for catalyzing transcription. As all adaptive processes are underpinned by changes in gene expression, the RNAP can be considered the major mediator of any adaptive response in the bacterial cell. In bacterial pathogens, theoretically, single nucleotide polymorphisms (SNPs) in genes that encode subunits of the RNAP and associated factors could mediate adaptation and confer a selective advantage to cope with biotic and abiotic stresses. We investigated this possibility by undertaking a systematic survey of SNPs in genes encoding the RNAP and associated factors in a collection of 1,429 methicillin-resistant Staphylococcus aureus (MRSA) clinical isolates. We present evidence for the existence of several, hitherto unreported, nonsynonymous SNPs in genes encoding the RNAP and associated factors of MRSA ST22 clinical isolates and propose that the acquisition of amino acid substitutions in the RNAP could represent an adaptive strategy that contributes to the pathogenic success of MRSA.

    Funded by: Imperial College President's Scholarship; UKCRC Translational Infection Research Initiative and the Medical Research Council: G1000803; Wellcome Trust: WT100958MA

    MicrobiologyOpen 2020;e1058

  • Distinct microbial and immune niches of the human colon

    Kylie R. James, Tomas Gomes, Rasa Elmentaite, Nitin Kumar, Emily L. Gulliver, Hamish W. King, Mark D. Stares, Bethany R. Bareham, John R. Ferdinand, Velislava N. Petrova, Krzysztof Polański, Samuel C. Forster, Lorna B. Jarvis, Ondrej Suchanek, Sarah Howlett, Louisa K. James, Joanne L. Jones, Kerstin B. Meyer, Menna R. Clatworthy, Kourosh Saeb-Parsy, Trevor D. Lawley and Sarah A. Teichmann

    Gastrointestinal microbiota and immune cells interact closely and display regional specificity; however, little is known about how these communities differ with location. Here, we simultaneously assess microbiota and single immune cells across the healthy, adult human colon, with paired characterization of immune cells in the mesenteric lymph nodes, to delineate colonic immune niches at steady state. We describe distinct helper T cell activation and migration profiles along the colon and characterize the transcriptional adaptation trajectory of regulatory T cells between lymphoid tissue and colon. Finally, we show increasing B cell accumulation, clonal expansion and mutational frequency from the cecum to the sigmoid colon and link this to the increasing number of reactive bacterial species. The gut microbiota and their proximate immune cells engage in a dialog of reciprocal regulation. James and colleagues describe how immune cell and microbiotal populations vary along the length of the human colon.

    Nature Immunology 2020;21;3;343

  • Eleven grand challenges in single-cell data science.

    Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CS, Aparicio S, Baaijens J, Balvert M, Barbanson B, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BPF, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Raczkowski L, Reinders M, Ridder J, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP and Schönhuth A

    Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany.

    The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.

    Funded by: NHGRI NIH HHS: R01 HG007069

    Genome biology 2020;21;1;31

  • Targeted sequencing in DLBCL, molecular subtypes, and outcomes: a Haematological Malignancy Research Network report.

    Lacy SE, Barrans SL, Beer P, Painter D, Smith A, Roman E, Cooke SL, Ruiz C, Glover P, van Hoppe SJ, Webster N, Campbell PJ, Tooze RM, Patmore R, Burton C, Crouch S and Hodson DJ

    University of York, York, United Kingdom.

    Based on the profile of genetic alterations occurring in tumor samples from selected diffuse-large-B-cell-lymphoma (DLBCL) patients, two recent whole exome sequencing studies proposed partially overlapping classification systems. Using clustering techniques applied to targeted sequencing data derived from a large unselected population-based patient cohort with full clinical follow-up (n=928), we investigated whether molecular subtypes can be robustly identified using methods potentially applicable in routine clinical practice. DNA extracted from DLBCL tumors diagnosed in patients residing in a catchment population of ~4 million (14 centers) was sequenced with a targeted 293-gene hematological-malignancy panel. Bernoulli mixture-model clustering was applied, and the resulting subtypes analyzed in relation to their clinical characteristics and outcomes. Five molecular subtypes were resolved, termed MYD88, BCL2, SOCS1/SGK1, TET2/SGK1 and NOTCH2, along with an unclassified group. The subtypes characterized by genetic alterations of BCL2, NOTCH2 and MYD88 recapitulated recent studies, showing good, intermediate and poor prognosis respectively. The SOCS1/SGK1 subtype showed biological overlap with primary mediastinal B-cell lymphoma and conferred excellent prognosis. Although not identified as a distinct cluster, NOTCH1 mutation was associated with poor prognosis. The impact of TP53 mutation varied with genomic subtypes, conferring no effect in the NOTCH2 subtype and poor prognosis in the MYD88 subtype. Our findings confirm the existence of molecular subtypes of DLBCL, providing evidence that genomic tests have prognostic significance in non-selected DLBCL patients. The identification of both good and poor risk subtypes in R-CHOP treated patients clearly demonstrates the clinical value of the approach, confirming the need for a consensus classification.

    Blood 2020

  • Gene family information facilitates variant interpretation and identification of disease-associated genes in neurodevelopmental disorders.

    Lal D, May P, Perez-Palma E, Samocha KE, Kosmicki JA, Robinson EB, Møller RS, Krause R, Nürnberg P, Weckhuysen S, De Jonghe P, Guerrini R, Niestroj LM, Du J, Marini C, EuroEPINOMICS-RES Consortium, Ware JS, Kurki M, Gormley P, Tang S, Wu S, Biskup S, Poduri A, Neubauer BA, Koeleman BPC, Helbig KL, Weber YG, Helbig I, Majithia AR, Palotie A and Daly MJ

    Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA.

    Background: Classifying pathogenicity of missense variants represents a major challenge in clinical practice during the diagnoses of rare and genetic heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs.

    Methods: Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments; 10,068 NDD patients and 2078 controls were statistically evaluated for de novo variant burden in gene families.

    Results: We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families including 98 de novo variant carrying genes in NDD patients of which 28 represent novel candidate genes for NDD which are brain expressed and under evolutionary constraint.

    Conclusion: This study represents the first method to incorporate gene family information into a statistical framework to interpret variant data for NDDs and to discover new NDD-associated genes.

    Genome medicine 2020;12;1;28

  • Atypical, milder presentation in a child with CC2D2A and KIDINS220 variants.

    Lam Z, Albaba S, Study D and Balasubramanian M

    Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Leeds.

    With the increasing availability and clinical use of exome and whole-genome sequencing, reverse phenotyping is now becoming common practice in clinical genetics. Here, we report a patient identified through the Wellcome Trust Deciphering Developmental Disorders study who has homozygous pathogenic variants in CC2D2A and a de-novo heterozygous pathogenic variant in KIDINS220. He presents with developmental delay, intellectual disability, and oculomotor apraxia. Reverse phenotyping has demonstrated that he likely has a composite phenotype with contributions from both variants. The patient is much more mildly affected than those with Joubert Syndrome or Spastic paraplegia, intellectual disability, nystagmus, and obesity, the conditions associated with CC2D2A and KIDINS220 respectively, and therefore, contributes to the phenotypic variability associated with the two conditions.

    Clinical dysmorphology 2020;29;1;10-16

  • TMEM95 is a sperm membrane protein essential for mammalian fertilization.

    Lamas-Toranzo I, Hamze JG, Bianchi E, Fernández-Fuertes B, Pérez-Cerezales S, Laguna-Barraza R, Fernández-González R, Lonergan P, Gutiérrez-Adán A, Wright GJ, Jiménez-Movilla M and Bermejo-Álvarez P

    Animal Reproduction, INIA, Madrid, Spain.

    The fusion of gamete membranes during fertilization is an essential process for sexual reproduction. Despite its importance, only three proteins are known to be indispensable for sperm-egg membrane fusion: the sperm proteins IZUMO1 and SPACA6, and the egg protein JUNO. Here we demonstrate that another sperm protein, TMEM95, is necessary for sperm-egg interaction. TMEM95 ablation in mice caused complete male-specific infertility. Sperm lacking this protein were morphologically normal, exhibited normal motility, and could penetrate the zona pellucida and bind to the oolemma. However, once bound to the oolemma, TMEM95-deficient sperm were unable to fuse with the egg membrane or penetrate into the ooplasm, and fertilization could only be achieved by mechanical injection of one sperm into the ooplasm, thereby bypassing membrane fusion. These data demonstrate that TMEM95 is essential for mammalian fertilization.

    Funded by: Department of Agriculture, Food and the Marine: 11/S/104; European Union Seventh Framework Programme: Marie Curie fellowship; Fundación Séneca-Agencia de Ciencia y Tecnología de Murcia: 20887/PI/18; H2020 European Research Council: StG 757886-ELONGAN; Medical Research Council: MR/M012468/1; Ministerio de Economía y Competitividad: AGL2014-58739-R, AGL2015-70159-P, AGL2016-71890-REDT, AGL2017-84908-R, FPI fellowship, RTI2018-093548-B-I00, RYC-2012-10193, Ramón y Cajal contract

    eLife 2020;9

  • Pan-active imidazolopiperazine antimalarials target the Plasmodium falciparum intracellular secretory pathway.

    LaMonte GM, Rocamora F, Marapana DS, Gnädig NF, Ottilie S, Luth MR, Worgall TS, Goldgof GM, Mohunlal R, Santha Kumar TR, Thompson JK, Vigil E, Yang J, Hutson D, Johnson T, Huang J, Williams RM, Zou BY, Cheung AL, Kumar P, Egan TJ, Lee MCS, Siegel D, Cowman AF, Fidock DA and Winzeler EA

    Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA.

    A promising new compound class for treating human malaria is the imidazolopiperazines (IZP) class. IZP compounds KAF156 (Ganaplacide) and GNF179 are effective against Plasmodium symptomatic asexual blood-stage infections, and are able to prevent transmission and block infection in animal models. But despite the identification of resistance mechanisms in P. falciparum, the mode of action of IZPs remains unknown. To investigate, we here combine in vitro evolution and genome analysis in Saccharomyces cerevisiae with molecular, metabolomic, and chemogenomic methods in P. falciparum. Our findings reveal that IZP-resistant S. cerevisiae clones carry mutations in genes involved in Endoplasmic Reticulum (ER)-based lipid homeostasis and autophagy. In Plasmodium, IZPs inhibit protein trafficking, block the establishment of new permeation pathways, and cause ER expansion. Our data highlight a mechanism for blocking parasite development that is distinct from those of standard compounds used to treat malaria, and demonstrate the potential of IZPs for studying ER-dependent protein processing.

    Funded by: Bill and Melinda Gates Foundation (Bill & Melinda Gates Foundation): OPP1054480, OPP1171497

    Nature communications 2020;11;1;1780

  • Multiple GYPB gene deletions associated with the U- phenotype in those of African ancestry.

    Lane WJ, Gleadall NS, Aeschlimann J, Vege S, Sanchis-Juan A, Stephens J, Sullivan JC, Mah HH, Aguad M, Smeland-Wagman R, Lebo MS, Vijay Kumar PK, Kaufman RM, Green RC, Ouwehand WH and Westhoff CM

    Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts.

    Background: The MNS blood group system is defined by three homologous genes: GYPA, GYPB, and GYPE. GYPB encodes for glycophorin B (GPB) carrying S/s and the "universal" antigen U. RBCs of approximately 1% of individuals of African ancestry are U- due to absence of GPB. The U- phenotype has long been attributed to a deletion encompassing GYPB exons 2 to 5 and GYPE exon 1 (GYPB*01N).

    Study design and methods: Samples from two U- individuals underwent Illumina short read whole genome sequencing (WGS) and Nanopore long read WGS. In addition, two existing WGS datasets, MedSeq (n = 110) and 1000 Genomes (1000G, n = 2535), were analyzed for GYPB deletions. Deletions were confirmed by Sanger sequencing. Twenty known U- donor samples were tested by a PCR assay to determine the specific deletion alleles present in African Americans.

    Results: Two large GYPB deletions in U- samples of African ancestry were identified: a 110 kb deletion extending left of GYPB (DEL_B_LEFT) and a 103 kb deletion extending right (DEL_B_RIGHT). DEL_B_LEFT and DEL_B_RIGHT were the most common GYPB deletions in the 1000 Genomes Project 669 African genomes (allele frequencies 0.04 and 0.02). Seven additional deletions involving GYPB were seen in African, Admixed American, and South Asian samples. No samples analyzed had GYPB*01N.

    Conclusions: The U- phenotype in those of African ancestry is primarily associated with two different complete deletions of GYPB (with intact GYPE). Seven additional less common GYPB deletion backgrounds were found. GYPB*01N, long assumed to be the allele commonly encoding U- phenotypes, appears to be rare.

    Funded by: Department of Defense; Doris Duke Charitable Foundation; NHGRI NIH HHS: U01-HG006500; NHS Blood and Transplant; NIH HHS; National Institute for Health Research

    Transfusion 2020

  • Analysis pipelines for cancer genome sequencing in mice.

    Lange S, Engleitner T, Mueller S, Maresch R, Zwiebel M, González-Silva L, Schneider G, Banerjee R, Yang F, Vassiliou GS, Friedrich MJ, Saur D, Varela I and Rad R

    Institute of Molecular Oncology and Functional Genomics, School of Medicine, Technische Universität München, Munich, Germany.

    Mouse models of human cancer have transformed our ability to link genetics, molecular mechanisms and phenotypes. Both reverse and forward genetics in mice are currently gaining momentum through advances in next-generation sequencing (NGS). Methodologies to analyze sequencing data were, however, developed for humans and hence do not account for species-specific differences in genome structures and experimental setups. Here, we describe standardized computational pipelines specifically tailored to the analysis of mouse genomic data. We present novel tools and workflows for the detection of different alteration types, including single-nucleotide variants (SNVs), small insertions and deletions (indels), copy-number variations (CNVs), loss of heterozygosity (LOH) and complex rearrangements, such as in chromothripsis. Workflows have been extensively validated and cross-compared using multiple methodologies. We also give step-by-step guidance on the execution of individual analysis types, provide advice on data interpretation and make the complete code available online. The protocol takes 2-7 d, depending on the desired analyses.

    Funded by: Deutsche Forschungsgemeinschaft (German Research Foundation): RA1629/2-1, SFB1243, SFB1321, SFB1335; Deutsche Krebshilfe (German Cancer Aid): 70112480; EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 Marie Skłodowska-Curie Actions (H2020 Excellent Science - Marie Skłodowska-Curie Actions): PRECODE

    Nature protocols 2020;15;2;266-315

  • VarSite: Disease variants and protein structure.

    Laskowski RA, Stephenson JD, Sillitoe I, Orengo CA and Thornton JM

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

    VarSite is a web server mapping known disease-associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image-based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease-associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at:

    Protein science : a publication of the Protein Society 2020;29;1;111-119

  • Whole genome sequencing of Herpes Simplex Virus 1 directly from human cerebrospinal fluid reveals selective constraints in neurotropic viruses.

    Lassalle F, Beale MA, Bharucha T, Williams CA, Williams RJ, Cudini J, Goldstein R, Haque T, Depledge DP and Breuer J

    Department of Infectious Disease Epidemiology, Imperial College London, St-Mary's Hospital campus, Praed Street, London W2 1NY, UK.

    Herpes Simplex Virus type 1 (HSV-1) chronically infects over 70 per cent of the global population. Clinical manifestations are largely restricted to recurrent epidermal vesicles. However, HSV-1 also leads to encephalitis, the infection of the brain parenchyma, with high associated rates of mortality and morbidity. In this study, we performed target enrichment followed by direct sequencing of HSV-1 genomes from the cerebrospinal fluid (CSF) of clinical encephalitis patients and from skin swabs of epidermal vesicles on non-encephalopathic patients. Phylogenetic analysis revealed high inter-host diversity and little population structure. In contrast, samples from different lesions in the same patient clustered with similar patterns of allelic variants. Comparison of consensus genome sequences shows HSV-1 has been freely recombining, except for distinct islands of linkage disequilibrium (LD). This suggests functional constraints prevent recombination between certain genes, notably those encoding pairs of interacting proteins. Distinct LD patterns characterised subsets of viruses recovered from CSF and skin lesions, which may reflect different evolutionary constraints in different body compartments. Functions of genes under differential constraint related to immunity or tropism and provide new hypotheses on tissue-specific mechanisms of viral infection and latency.

    Virus evolution 2020;6;1;veaa012

  • Detecting extra-ocular Chlamydia trachomatis in a trachoma-endemic community in Ethiopia: Identifying potential routes of transmission.

    Last A, Versteeg B, Shafi Abdurahman O, Robinson A, Dumessa G, Abraham Aga M, Shumi Bejiga G, Negussu N, Greenland K, Czerniewska A, Thomson N, Cairncross S, Sarah V, Macleod D, Solomon AW, Logan J and Burton MJ

    Clinical Research Department, London School of Hygiene & Tropical Medicine, London, United Kingdom.

    Background: Trachoma elimination efforts are hampered by limited understanding of Chlamydia trachomatis (Ct) transmission routes. Here we aimed to detect Ct DNA at non-ocular sites and on eye-seeking flies.

    Methods: A population-based household survey was conducted in Oromia Region, Ethiopia. Ocular and non-ocular (faces, hands, clothing, water containers and sleeping surfaces) swabs were collected from all individuals. Flies were caught from faces of children. Flies, ocular swabs and non-ocular swabs were tested for Ct by quantitative PCR.

    Results: In total, 1220 individuals in 247 households were assessed. Active trachoma (trachomatous inflammation-follicular) and ocular Ct were detected in 10% and 2% of all-ages, and 21% and 3% of 1-9-year-olds, respectively. Ct was detected in 12% (95% CI:8-15%) of tested non-ocular swabs from ocular-positive households, but in none of the non-ocular swabs from ocular-negative households. Ct was detected on 24% (95% CI:18-32%) of flies from ocular-positive households and 3% (95% CI:1-6%) of flies from ocular-negative households.

    Conclusion: Ct DNA was detected on hands, faces and clothing of individuals living in ocular-positive households suggesting that this might be a route of transmission within Ct infected households. In addition, we detected Ct on flies from ocular-positive households and occasionally in ocular-negative households suggesting that flies might be a vector for transmission within and between Ct infected and uninfected households. These potential transmission routes may need to be simultaneously addressed to suppress transmission.

    PLoS neglected tropical diseases 2020;14;3;e0008120

  • Integrated scRNA-Seq Identifies Human Postnatal Thymus Seeding Progenitors and Regulatory Dynamics of Differentiating Immature Thymocytes.

    Lavaert M, Liang KL, Vandamme N, Park JE, Roels J, Kowalczyk MS, Li B, Ashenberg O, Tabaka M, Dionne D, Tickle TL, Slyper M, Rozenblatt-Rosen O, Vandekerckhove B, Leclercq G, Regev A, Van Vlierberghe P, Guilliams M, Teichmann SA, Saeys Y and Taghon T

    Faculty of Medicine and Health Sciences, Department of Diagnostic Sciences, Ghent University, C. Heymanslaan 10, MRB2, Entrance 38, 9000 Ghent, Belgium.

    During postnatal life, thymopoiesis depends on the continuous colonization of the thymus by bone-marrow-derived hematopoietic progenitors that migrate through the bloodstream. The current understanding of the nature of thymic immigrants is largely based on data from pre-clinical models. Here, we employed single-cell RNA sequencing (scRNA-seq) to examine the immature postnatal thymocyte population in humans. Integration of bone marrow and peripheral blood precursor datasets identified two putative thymus seeding progenitors that varied in expression of CD7; CD10; and the homing receptors CCR7, CCR9, and ITGB7. Whereas both precursors supported T cell development, only one contributed to intrathymic dendritic cell (DC) differentiation, predominantly of plasmacytoid dendritic cells. Trajectory inference delineated the transcriptional dynamics underlying early human T lineage development, enabling prediction of transcription factor (TF) modules that drive stage-specific steps of human T cell development. This comprehensive dataset defines the expression signature of immature human thymocytes and provides a resource for the further study of human thymopoiesis.

    Immunity 2020

  • Genomic heterogeneity in myeloproliferative neoplasms and applications to clinical practice.

    Lee J, Godfrey AL and Nangalia J

    Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK; Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Puddicombe Way, Cambridge, UK; Department of Haematology, University of Cambridge, Cambridge, UK.

    The myeloproliferative neoplasms (MPN) polycythaemia vera, essential thrombocythaemia and primary myelofibrosis are chronic myeloid disorders associated most often with mutations in JAK2, MPL and CALR, and in some patients with additional acquired genomic lesions. Whilst the molecular mechanisms downstream of these mutations are now clearer, it is apparent that clinical phenotype in MPN is a product of complex interactions, acting between individual mutations, between disease subclones, and between the tumour and background host factors. In this review we first discuss MPN phenotypic driver mutations and the factors that interact with them to influence phenotype. We consider the importance of ongoing studies of clonal haematopoiesis, which may inform a better understanding of why MPN develop in specific individuals. We then consider how best to deploy genomic testing in a clinical environment and the challenges as well as opportunities that may arise from more routine, comprehensive genomic analysis of patients with MPN.

    Blood reviews 2020;100708

  • Tracking hematopoietic stem cells and their progeny using whole-genome sequencing.

    Lee-Six H and Kent DG

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    Despite decades of progress in our understanding of hematopoiesis through the study of animal models and transplantation in humans, investigating physiological human hematopoiesis directly has remained challenging. Questions on the clonal structure of the human hematopoietic stem cell (HSC) pool, such as "how many HSCs are there?" and "do all HSC clones actively produce all blood cell types in equal proportions?" remain open. These questions have inherent value for understanding normal human physiology, but also directly inform our comprehension of the process by which the system is subverted to drive diseases of the blood, in particular blood cancers and bone marrow failure syndromes. The critical link between normal and abnormal hematopoiesis is perhaps best illustrated by the recent discovery of clonal hematopoiesis in healthy people with no abnormal blood parameters. In such individuals, large clones derived from single cells are present and are dominant relative to their normal counterparts, but their presence does not necessitate abnormal blood cell production. Intriguingly, however, these individuals are also at a significantly greater risk of developing leukemias and of cardiovascular events, underscoring the importance of understanding how blood stem cell clones compete against each other.

    Experimental hematology 2020

  • Horizontal gene transfer rate is not the primary determinant of observed antibiotic resistance frequencies in Streptococcus pneumoniae.

    Lehtinen S, Chewapreecha C, Lees J, Hanage WP, Lipsitch M, Croucher NJ, Bentley SD, Turner P, Fraser C and Mostowy RJ

    The extent to which evolution is constrained by the rate at which horizontal gene transfer (HGT) allows DNA to move between genetic lineages is an open question, which we address in the context of antibiotic resistance in Streptococcus pneumoniae. We analyze microbiological, genomic, and epidemiological data from the largest-to-date sequenced pneumococcal carriage study in 955 infants from a refugee camp on the Thailand-Myanmar border. Using a unified framework, we simultaneously test prior hypotheses on rates of HGT and a key evolutionary covariate (duration of carriage) as determinants of resistance frequencies. We conclude that in this setting, there is little evidence of HGT playing a major role in determining resistance frequencies. Instead, observed resistance frequencies are best explained as the outcome of selection acting on a pool of variants, irrespective of the rate at which resistance determinants move between genetic lineages.

    Science advances 2020;6;21;eaaz6137

  • Genome-wide Association Analysis in Humans Links Nucleotide Metabolism to Leukocyte Telomere Length.

    Li C, Stoma S, Lotta LA, Warner S, Albrecht E, Allione A, Arp PP, Broer L, Buxton JL, Da Silva Couto Alves A, Deelen J, Fedko IO, Gordon SD, Jiang T, Karlsson R, Kerrison N, Loe TK, Mangino M, Milaneschi Y, Miraglio B, Pervjakova N, Russo A, Surakka I, van der Spek A, Verhoeven JE, Amin N, Beekman M, Blakemore AI, Canzian F, Hamby SE, Hottenga JJ, Jones PD, Jousilahti P, Mägi R, Medland SE, Montgomery GW, Nyholt DR, Perola M, Pietiläinen KH, Salomaa V, Sillanpää E, Suchiman HE, van Heemst D, Willemsen G, Agudo A, Boeing H, Boomsma DI, Chirlaque MD, Fagherazzi G, Ferrari P, Franks P, Gieger C, Eriksson JG, Gunter M, Hägg S, Hovatta I, Imaz L, Kaprio J, Kaaks R, Key T, Krogh V, Martin NG, Melander O, Metspalu A, Moreno C, Onland-Moret NC, Nilsson P, Ong KK, Overvad K, Palli D, Panico S, Pedersen NL, Penninx BWJH, Quirós JR, Jarvelin MR, Rodríguez-Barranco M, Scott RA, Severi G, Slagboom PE, Spector TD, Tjonneland A, Trichopoulou A, Tumino R, Uitterlinden AG, van der Schouw YT, van Duijn CM, Weiderpass E, Denchi EL, Matullo G, Butterworth AS, Danesh J, Samani NJ, Wareham NJ, Nelson CP, Langenberg C and Codd V

    MRC Epidemiology Unit, University of Cambridge, CB2 0SL, United Kingdom; NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester, LE3 9QP, United Kingdom.

    Leukocyte telomere length (LTL) is a heritable biomarker of genomic aging. In this study, we perform a genome-wide meta-analysis of LTL by pooling densely genotyped and imputed association results across large-scale European-descent studies including up to 78,592 individuals. We identify 49 genomic regions at a false discovery rate (FDR) < 0.05 threshold and prioritize genes at 31, with five highlighting nucleotide metabolism as an important regulator of LTL. We report six genome-wide significant loci in or near SENP7, MOB1B, CARMIL1, PRRC2A, TERF2, and RFWD3, and our results support recently identified PARP1, POT1, ATM, and MPHOSPH6 loci. Phenome-wide analyses in >350,000 UK Biobank participants suggest that genetically shorter telomere length increases the risk of hypothyroidism and decreases the risk of thyroid cancer, lymphoma, and a range of proliferative conditions. Our results replicate previously reported associations with increased risk of coronary artery disease and lower risk for multiple cancer types. Our findings substantially expand current knowledge on genes that regulate LTL and their impact on human health and disease.

    American journal of human genetics 2020;106;3;389-404

  • Patterns of somatic structural variation in human cancer genomes.

    Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, Khurana E, Waszak S, Korbel JO, Haber JE, Imielinski M, PCAWG Structural Variation Working Group, Weischenfeldt J, Beroukhim R, Campbell PJ and PCAWG Consortium

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes<sup>1-7</sup>. Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types<sup>8</sup>. Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions, as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2-7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications, and, in liver cancer, frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act.

    Funded by: NCI NIH HHS: R01 CA095175, R01 CA217991, R01 CA218668; NIGMS NIH HHS: R35 GM127029; Wellcome Trust: 088340, 206194, WT088340MA

    Nature 2020;578;7793;112-121

  • The Deep Genome Project.

    Lloyd KCK, Adams DJ, Baynam G, Beaudet AL, Bosch F, Boycott KM, Braun RE, Caulfield M, Cohn R, Dickinson ME, Dobbie MS, Flenniken AM, Flicek P, Galande S, Gao X, Grobler A, Heaney JD, Herault Y, de Angelis MH, Lupski JR, Lyonnet S, Mallon AM, Mammano F, MacRae CA, McInnes R, McKerlie C, Meehan TF, Murray SA, Nutter LMJ, Obata Y, Parkinson H, Pepper MS, Sedlacek R, Seong JK, Shiroishi T, Smedley D, Tocchini-Valentini G, Valle D, Wang CL, Wells S, White J, Wurst W, Xu Y and Brown SDM

    Department of Surgery, School of Medicine, and Mouse Biology Program, University of California, Davis, CA, 95618, USA.

    Funded by: British Heart Foundation: FS/12/82/29736; Medical Research Council: G9521010, MC_EX_MR/M009203/1, MC_PC_14089, MC_U142684172, MR/M009203/1; NHGRI NIH HHS: UM1 HG006348

    Genome biology 2020;21;1;18

  • A mosaic tetracycline resistance gene tet(S/M) detected in an MDR pneumococcal CC230 lineage that underwent capsular switching in South Africa.

    Lo SW, Gladstone RA, van Tonder AJ, Du Plessis M, Cornick JE, Hawkins PA, Madhi SA, Nzenze SA, Kandasamy R, Ravikumar KL, Elmdaghri N, Kwambana-Adams B, Almeida SCG, Skoczynska A, Egorova E, Titov L, Saha SK, Paragi M, Everett DB, Antonio M, Klugman KP, Li Y, Metcalf BJ, Beall B, McGee L, Breiman RF, Bentley SD, von Gottberg A and Global Pneumococcal Sequencing Consortium

    Parasites and Microbes Programme, The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Objectives: We reported tet(S/M) in Streptococcus pneumoniae and investigated its temporal spread in relation to nationwide clinical interventions.

    Methods: We whole-genome sequenced 12 254 pneumococcal isolates from 29 countries on an Illumina HiSeq sequencer. Serotype, multilocus ST and antibiotic resistance were inferred from genomes. An SNP tree was built using Gubbins. Temporal spread was reconstructed using a birth-death model.

    Results: We identified tet(S/M) in 131 pneumococcal isolates and none carried other known tet genes. Tetracycline susceptibility testing results were available for 121 tet(S/M)-positive isolates and all were resistant. A majority (74%) of tet(S/M)-positive isolates were from South Africa and caused invasive diseases among young children (59% HIV positive, where HIV status was available). All but two tet(S/M)-positive isolates belonged to clonal complex (CC) 230. A global phylogeny of CC230 (n=389) revealed that tet(S/M)-positive isolates formed a sublineage predicted to exhibit resistance to penicillin, co-trimoxazole, erythromycin and tetracycline. The birth-death model detected an unrecognized outbreak of this sublineage in South Africa between 2000 and 2004 with expected secondary infections (effective reproductive number, R) of ∼2.5. R declined to ∼1.0 in 2005 and <1.0 in 2012. The declining epidemic could be related to improved access to ART in 2004 and introduction of pneumococcal conjugate vaccine (PCV) in 2009. Capsular switching from vaccine serotype 14 to non-vaccine serotype 23A was observed within the sublineage.

    Conclusions: The prevalence of tet(S/M) in pneumococci was low and its dissemination was due to an unrecognized outbreak of CC230 in South Africa. Capsular switching in this MDR sublineage highlighted its potential to continue to cause disease in the post-PCV13 era.

    Funded by: Wellcome Trust

    The Journal of antimicrobial chemotherapy 2020;75;3;512-520

  • Genomic and Phenotypic Analyses of Acinetobacter baumannii Isolates From Three Tertiary Care Hospitals in Thailand.

    Loraine J, Heinz E, Soontarach R, Blackwell GA, Stabler RA, Voravuthikunchai SP, Srimanote P, Kiratisin P, Thomson NR and Taylor PW

    School of Pharmacy, University College London, London, United Kingdom.

    Antibiotic resistant strains of <i>Acinetobacter baumannii</i> are responsible for a large and increasing burden of nosocomial infections in Thailand and other countries of Southeast Asia. New approaches to their control and treatment are urgently needed and an attractive strategy is to remove the bacterial polysaccharide capsule, and thus the protection from the host's immune system. To examine phylogenetic relationships, distribution of capsule chemotypes, acquired antibiotic resistance determinants, susceptibility to complement and other traits associated with systemic infection, we sequenced 191 isolates from three tertiary referral hospitals in Thailand and used phenotypic assays to characterize key aspects of infectivity. Several distinct lineages were circulating in three hospitals and the majority belonged to global clonal group 2 (GC2). Very high levels of resistance to carbapenems and other front-line antibiotics were found, as were a number of widespread plasmid replicons. A high diversity of capsule genotypes was encountered, with only three of these (KL6, KL10, and KL47) showing more than 10% frequency. Almost 90% of GC2 isolates belonged to the most common capsule genotypes and were fully resistant to the bactericidal action of human serum complement, most likely protected by their polysaccharide capsule, which represents a key determinant of virulence for systemic infection. Our study further highlights the importance of developing therapeutic strategies to remove the polysaccharide capsule from extensively drug-resistant <i>A. baumannii</i> during the course of systemic infection.

    Frontiers in microbiology 2020;11;548

  • Influence of past climate change on phylogeography and demographic history of narwhals, Monodon monoceros.

    Louis M, Skovrind M, Samaniego Castruita JA, Garilao C, Kaschner K, Gopalakrishnan S, Haile JS, Lydersen C, Kovacs KM, Garde E, Heide-Jørgensen MP, Postma L, Ferguson SH, Willerslev E and Lorenzen ED

    Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark.

    The Arctic is warming at an unprecedented rate, with unknown consequences for endemic fauna. However, Earth has experienced severe climatic oscillations in the past, and understanding how species responded to them might provide insight into their resilience to near-future climatic predictions. Little is known about the responses of Arctic marine mammals to past climatic shifts, but narwhals (<i>Monodon monoceros</i>) are considered one of the endemic Arctic species most vulnerable to environmental change. Here, we analyse 121 complete mitochondrial genomes from narwhals sampled across their range and use them in combination with species distribution models to elucidate the influence of past and ongoing climatic shifts on their population structure and demographic history. We find low levels of genetic diversity and limited geographic structuring of genetic clades. We show that narwhals experienced a long-term low effective population size, which increased after the Last Glacial Maximum, when the amount of suitable habitat expanded. Similar post-glacial habitat release has been a key driver of population size expansion of other polar marine predators. Our analyses indicate that habitat availability has been critical to the success of narwhals, raising concerns for their fate in an increasingly warming Arctic.

    Proceedings. Biological sciences 2020;287;1925;20192964

  • Structural variation of the malaria-associated human glycophorin A-B-E region.

    Louzada S, Algady W, Weyell E, Zuccherato LW, Brajer P, Almalki F, Scliar MO, Naslavsky MS, Yamamoto GL, Duarte YAO, Passos-Bueno MR, Zatz M, Yang F and Hollox EJ

    Wellcome Sanger Institute, Hinxton, Cambridge, UK.

    Background: Approximately 5% of the human genome shows common structural variation, which is enriched for genes involved in the immune response and cell-cell interactions. A well-established region of extensive structural variation is the glycophorin gene cluster, comprising three tandemly-repeated regions about 120 kb in length and carrying the highly homologous genes GYPA, GYPB and GYPE. Glycophorin A (encoded by GYPA) and glycophorin B (encoded by GYPB) are glycoproteins present at high levels on the surface of erythrocytes, and they have been suggested to act as decoy receptors for viral pathogens. They are receptors for the invasion of the protist parasite Plasmodium falciparum, a causative agent of malaria. A particular complex structural variant, called DUP4, creates a GYPB-GYPA fusion gene known to confer resistance to malaria. Many other structural variants exist across the glycophorin gene cluster, and they remain poorly characterised.

    Results: Here, we analyse sequences from 3234 diploid genomes from across the world for structural variation at the glycophorin locus, confirming 15 variants in the 1000 Genomes project cohort, discovering 9 new variants, and characterising a selection of these variants using fibre-FISH and breakpoint mapping at the sequence level. We identify variants predicted to create novel fusion genes and a common inversion duplication variant at appreciable frequencies in West Africans. We show that almost all variants can be explained by non-allelic homologous recombination and, by comparing the structural variant breakpoints with recombination hotspot maps, confirm the importance of a particular meiotic recombination hotspot on structural variant formation in this region.

    Conclusions: We identify and validate large structural variants in the human glycophorin A-B-E gene cluster which may be associated with different clinical aspects of malaria.

    Funded by: Saudi Arabia Cultural Bureau in London: n/a; Wellcome Trust: WT098051

    BMC genomics 2020;21;1;446

  • Tumor necrosis factor receptor family costimulation increases regulatory T-cell activation and function via NF-κB.

    Lubrano di Ricco M, Ronin E, Collares D, Divoux J, Grégoire S, Wajant H, Gomes T, Grinberg-Bleyer Y, Baud V, Marodon G and Salomon BL

    Sorbonne Université, INSERM, CNRS, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), Paris, France.

    Several drugs targeting members of the TNF superfamily or TNF receptor superfamily (TNFRSF) are widely used in medicine or are currently being tested in therapeutic trials. However, their mechanism of action remains poorly understood. Here, we explored the effects of TNFRSF co-stimulation on murine Foxp3<sup>+</sup> regulatory T cell (Treg) biology, as they are pivotal modulators of immune responses. We show that engagement of TNFR2, 4-1BB, GITR, and DR3, but not OX40, increases Treg proliferation and survival. Triggering these TNFRSF in Tregs induces similar changes in gene expression patterns, suggesting that they engage common signal transduction pathways. Among them, we identified a major role of canonical NF-κB. Importantly, TNFRSF co-stimulation improves the ability of Tregs to suppress colitis. Our data demonstrate that stimulation of discrete TNFRSF members enhances Treg activation and function through a shared mechanism. Consequently, therapeutic effects of drugs targeting TNFRSF or their ligands may be mediated by their effect on Tregs.

    Funded by: Agence Nationale de la Recherche: ANR-15-CE15-0015-01, ANR-17-CE15-0030-01; Deutsche Forschungsgemeinschaft: 324392634; ENLIGHT-TEN: 675395; ENLIGHT-TEN program; European Commission: 675395; European Union's: H2020; Fondation pour la Recherche Médicale

    European journal of immunology 2020

  • Genomic surveillance of Escherichia coli ST131 identifies local expansion and serial replacement of subclones.

    Ludden C, Decano AG, Jamrozy D, Pickard D, Morris D, Parkhill J, Peacock SJ, Cormican M and Downing T

    London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK.

    <i>Escherichia coli</i> sequence type 131 (ST131) is a pandemic clone that is evolving rapidly with increasing levels of antimicrobial resistance. Here, we investigated an outbreak of <i>E. coli</i> ST131 producing extended spectrum β-lactamases (ESBLs) in a long-term care facility (LTCF) in Ireland by combining data from this LTCF (<i>n</i>=69) with other Irish (<i>n</i>=35) and global (<i>n</i>=690) ST131 genomes to reconstruct the evolutionary history and understand changes in population structure and genome architecture over time. This required a combination of short- and long-read genome sequencing, <i>de novo</i> assembly, read mapping, ESBL gene screening, plasmid alignment and temporal phylogenetics. We found that Clade C was the most prevalent (686 out of 794 isolates, 86 %) of the three major ST131 clades circulating worldwide (A with <i>fimH41</i>, B with <i>fimH22</i>, C with <i>fimH30</i>), and was associated with the presence of different ESBL alleles, diverse plasmids and transposable elements. Clade C was estimated to have emerged in <i>c</i>. 1985 and subsequently acquired different ESBL gene variants (<i>bla</i><sub>CTX-M-14</sub> vs <i>bla</i><sub>CTX-M-15</sub>). An IS<i>Ecp1</i>-mediated transposition of the <i>bla</i><sub>CTX-M-15</sub> gene further increased the diversity within Clade C. We discovered a local clonal expansion of a rare C2 lineage (C2_8) with a chromosomal insertion of <i>bla</i><sub>CTX-M-15</sub> at the <i>mppA</i> gene. This was acquired from an IncFIA plasmid. The C2_8 lineage clonally expanded in the Irish LTCF from 2006, displacing the existing C1 strain (C1_10), highlighting the potential for novel ESBL-producing ST131 with a distinct genetic profile to cause outbreaks strongly associated with specific healthcare environments.

    Microbial genomics 2020

  • A One Health Study of the Genetic Relatedness of Klebsiella pneumoniae and Their Mobile Elements in the East of England.

    Ludden C, Moradigaravand D, Jamrozy D, Gouliouris T, Blane B, Naydenova P, Hernandez-Garcia J, Wood P, Hadjirin N, Radakovic M, Crawley C, Brown NM, Holmes M, Parkhill J and Peacock SJ

    Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, Hinxton.

    Background: Klebsiella pneumoniae is a human, animal, and environmental commensal and a leading cause of nosocomial infections, which are often caused by multiresistant strains. We evaluate putative sources of K. pneumoniae that are carried by and infect hospital patients.

    Methods: We conducted a 6-month survey on 2 hematology wards at Addenbrooke's Hospital, Cambridge, United Kingdom, in 2015 to isolate K. pneumoniae from stool, blood, and the environment. We conducted cross-sectional surveys of K. pneumoniae from 29 livestock farms, 97 meat products, the hospital sewer, and 20 municipal wastewater treatment plants in the East of England between 2014 and 2015. Isolates were sequenced and their genomes compared.

    Results: Klebsiella pneumoniae was isolated from stool of 17/149 (11%) patients and 18/922 swabs of their environment, together with 1 bloodstream infection during the study and 4 others over a 24-month period. Each patient carried 1 or more lineages that was unique to them, but 2 broad environmental contamination events and patient-environment transmission were identified. Klebsiella pneumoniae was isolated from cattle, poultry, hospital sewage, and 12/20 wastewater treatment plants. There was low genetic relatedness between isolates from patients/their hospital environment vs isolates from elsewhere. Identical genes encoding cephalosporin resistance were carried by isolates from humans/environment and elsewhere but were carried on different plasmids.

    Conclusion: We identified no patient-to-patient transmission and no evidence for livestock as a source of K. pneumoniae infecting humans. However, our findings reaffirm the importance of the hospital environment as a source of K. pneumoniae associated with serious human infection.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2020;70;2;219-226

  • A panel of recombinant proteins from human-infective Plasmodium species for serological surveillance.

    Müller-Sienerth N, Shilts J, Kadir KA, Yman V, Homann MV, Asghar M, Ngasala B, Singh B, Färnert A and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Sanger Institute, Cambridge, UK.

    Background: Malaria remains a global health problem and accurate surveillance of Plasmodium parasites that are responsible for this disease is required to guide the most effective distribution of control measures. Serological surveillance will be particularly important in areas of low or periodic transmission because patient antibody responses can provide a measure of historical exposure. While methods for detecting host antibody responses to Plasmodium falciparum and Plasmodium vivax are well established, development of serological assays for Plasmodium knowlesi, Plasmodium ovale and Plasmodium malariae has been inhibited by a lack of immunodiagnostic candidates due to the limited availability of genomic information.

    Methods: Using the recently completed genome sequences from P. malariae, P. ovale and P. knowlesi, a set of 33 candidate cell surface and secreted blood-stage antigens was selected and expressed in a recombinant form using a mammalian expression system. These proteins were added to an existing panel of antigens from P. falciparum and P. vivax and the immunoreactivity of IgG, IgM and IgA immunoglobulins from individuals diagnosed with infections to each of the five different Plasmodium species was evaluated by ELISA. Logistic regression modelling was used to quantify the ability of the responses to determine prior exposure to the different Plasmodium species.

    Results: Using sera from European travellers with diagnosed Plasmodium infections, antigens showing species-specific immunoreactivity were identified to select a panel of 22 proteins from five Plasmodium species for serological profiling. The immunoreactivity to the antigens in the panel of sera taken from travellers and individuals living in malaria-endemic regions with diagnosed infections showed moderate power to predict infections by each species, including P. ovale, P. malariae and P. knowlesi. Using a larger set of patient samples and logistic regression modelling it was shown that exposure to P. knowlesi could be accurately detected (AUC = 91%) using an antigen panel consisting of the P. knowlesi orthologues of MSP10, P12 and P38.

    Conclusions: Using the recent availability of genome sequences to all human-infective Plasmodium spp. parasites and a method of expressing Plasmodium proteins in a secreted functional form, an antigen panel has been compiled that will be useful to determine exposure to these parasites.

    Funded by: Universiti Malaysia Sarawak: FA052000-0706-0002; Wellcome Trust: 206194

    Malaria journal 2020;19;1;31

  • Stimulation strength controls the rate of initiation but not the molecular organization of TCR-induced signalling.

    Ma CY, Marioni JC, Griffiths GM and Richard AC

    Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom.

    Millions of naïve T cells with different TCRs may interact with a peptide-MHC ligand, but very few will activate. Remarkably, this fine control is orchestrated using a limited set of intracellular machinery. It remains unclear whether changes in stimulation strength alter the programme of signalling events leading to T cell activation. Using mass cytometry to simultaneously measure multiple signalling pathways during activation of murine CD8<sup>+</sup> T cells, we found a programme of distal signalling events that is shared, regardless of the strength of TCR stimulation. Moreover, the relationship between transcription of early response genes <i>Nr4a1</i> and <i>Irf8</i> and activation of the ribosomal protein S6 is also conserved across stimuli. Instead, we found that stimulation strength dictates the rate with which cells initiate signalling through this network. These data suggest that TCR-induced signalling results in a coordinated activation program, modulated in rate but not organization by stimulation strength.

    Funded by: Addenbrooke's Charitable Trust, Cambridge University Hospitals: 23/17 A (ii); Cancer Research UK: A17197; Medical Research Council: MR/P014178/1; Wellcome: 103930, 100140, 217100, 204017/Z/16/Z

    eLife 2020;9

  • Pervasive Strong Selection at the Level of Codon Usage Bias in Drosophila melanogaster.

    Machado HE, Lawrie DS and Petrov DA

    Cancer, Ageing, and Somatic Mutation, Wellcome Sanger Institute, Hinxton CB10 1SA, UK.

    Codon usage bias (CUB), where certain codons are used more frequently than expected by chance, is a ubiquitous phenomenon and occurs across the tree of life. The dominant paradigm is that the proportion of preferred codons is set by weak selection. While experimental changes in codon usage have at times shown large phenotypic effects in contrast to this paradigm, genome-wide population genetic estimates have supported the weak selection model. Here we use deep genomic population sequencing of two <i>Drosophila melanogaster</i> populations to measure selection on synonymous sites in a way that allowed us to estimate the prevalence of both weak and strong purifying selection. We find that selection in favor of preferred codons ranges from weak (<i>|N<sub>e</sub>s| ∼</i> 1) to strong (<i>|N<sub>e</sub>s| ></i> 10), with strong selection acting on 10-20% of synonymous sites in preferred codons. While previous studies indicated that selection at synonymous sites could be strong, this is the first study to detect and quantify strong selection specifically at the level of CUB. Further, we find that CUB-associated polymorphism accounts for the majority of strong selection on synonymous sites, with secondary contributions of splicing (selection on alternatively spliced genes, splice junctions, and spliceosome-bound sites) and transcription factor binding. Our findings support a new model of CUB and indicate that the functional importance of CUB, as well as synonymous sites in general, have been underestimated.

    Genetics 2020;214;2;511-528

  • HBO1 is required for the maintenance of leukaemia stem cells.

    MacPherson L, Anokye J, Yeung MM, Lam EYN, Chan YC, Weng CF, Yeh P, Knezevic K, Butler MS, Hoegl A, Chan KL, Burr ML, Gearing LJ, Willson T, Liu J, Choi J, Yang Y, Bilardi RA, Falk H, Nguyen N, Stupple PA, Peat TS, Zhang M, de Silva M, Carrasco-Pozo C, Avery VM, Khoo PS, Dolezal O, Dennis ML, Nuttall S, Surjadi R, Newman J, Ren B, Leaver DJ, Sun Y, Baell JB, Dovey O, Vassiliou GS, Grebien F, Dawson SJ, Street IP, Monahan BJ, Burns CJ, Choudhary C, Blewitt ME, Voss AK, Thomas T and Dawson MA

    Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.

    Acute myeloid leukaemia (AML) is a heterogeneous disease characterized by transcriptional dysregulation that results in a block in differentiation and increased malignant self-renewal. Various epigenetic therapies aimed at reversing these hallmarks of AML have progressed into clinical trials, but most show only modest efficacy owing to an inability to effectively eradicate leukaemia stem cells (LSCs)<sup>1</sup>. Here, to specifically identify novel dependencies in LSCs, we screened a bespoke library of small hairpin RNAs that target chromatin regulators in a unique ex vivo mouse model of LSCs. We identify the MYST acetyltransferase HBO1 (also known as KAT7 or MYST2) and several known members of the HBO1 protein complex as critical regulators of LSC maintenance. Using CRISPR domain screening and quantitative mass spectrometry, we identified the histone acetyltransferase domain of HBO1 as being essential in the acetylation of histone H3 at K14. H3 acetylated at K14 (H3K14ac) facilitates the processivity of RNA polymerase II to maintain the high expression of key genes (including Hoxa9 and Hoxa10) that help to sustain the functional properties of LSCs. To leverage this dependency therapeutically, we developed a highly potent small-molecule inhibitor of HBO1 and demonstrate its mode of activity as a competitive analogue of acetyl-CoA. Inhibition of HBO1 phenocopied our genetic data and showed efficacy in a broad range of human cell lines and primary AML cells from patients. These biological, structural and chemical insights into a therapeutic target in AML will enable the clinical translation of these findings.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: MC_PC_12009

    Nature 2020;577;7789;266-270

  • SF3B1-mutant myelodysplastic syndrome as a distinct disease subtype - A Proposal of the International Working Group for the Prognosis of Myelodysplastic Syndromes (IWG-PM).

    Malcovati L, Stevenson K, Papaemmanuil E, Neuberg D, Bejar R, Boultwood J, Bowen DT, Campbell PJ, Ebert BL, Fenaux P, Haferlach T, Heuser M, Jansen JH, Komrokji RS, Maciejewski JP, Walter MJ, Fontenay M, Garcia-Manero G, Graubert TA, Karsan A, Meggendorfer M, Pellagatti A, Sallman DA, Savona MR, Sekeres M, Steensma DP, Tauro S, Thol F, Vyas P, Van de Loosdrecht AA, Haase DT, Tuechler H, Greenberg PL, Ogawa S, Hellstrom-Lindberg ES and Cazzola M

    University of Pavia & S. Matteo Hospital, Pavia, Italy.

    The 2016 revision of the World Health Organization (WHO) classification of tumors of hematopoietic and lymphoid tissues is characterized by a closer integration of morphology and molecular genetics. Notwithstanding, the myelodysplastic syndrome (MDS) with isolated del(5q) remains so far the only MDS subtype defined by a genetic abnormality. About half of MDS patients carry somatic mutations in spliceosome genes, with SF3B1 being the most commonly mutated one. SF3B1 mutation identifies a condition characterized by ring sideroblasts, ineffective erythropoiesis, and indolent clinical course. A large body of evidence supports recognition of SF3B1-mutant MDS as a distinct nosologic entity. To further validate this notion, we interrogated the dataset of the International Working Group for the Prognosis of MDS (IWG-PM). Based on the findings of our analyses, we propose the following diagnostic criteria for SF3B1-mutant MDS: (i) cytopenia as defined by standard hematologic values; (ii) somatic SF3B1 mutation; (iii) morphologic dysplasia (with or without ring sideroblasts); (iv) bone marrow blasts <5% and peripheral blood blasts <1%. Selected concomitant genetic lesions represent exclusion criteria for the proposed entity. In patients with clonal cytopenia of undetermined significance, SF3B1 mutation is almost invariably associated with subsequent development of overt MDS with ring sideroblasts, suggesting that this genetic lesion provides presumptive evidence of MDS in the setting of persistent unexplained cytopenia. Diagnosis of SF3B1-mutant MDS has considerable clinical implications in terms of risk stratification and therapeutic decision making. In fact, this condition has a relatively good prognosis and may respond to luspatercept with abolishment of transfusion requirement.

    Blood 2020

  • Profiling immunoglobulin repertoires across multiple human tissues using RNA sequencing.

    Mandric I, Rotman J, Yang HT, Strauli N, Montoya DJ, Van Der Wey W, Ronas JR, Statz B, Yao D, Petrova V, Zelikovsky A, Spreafico R, Shifman S, Zaitlen N, Rossetti M, Ansel KM, Eskin E and Mangul S

    Department of Computer Science, University of California, Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.

    Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementarity-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissue types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often do not have reported Ig repertoire information. Moreover, the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues are also evaluated. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.

    Nature communications 2020;11;1;3126

  • Exome Sequencing Identifies Genes and Gene Sets Contributing to Severe Childhood Obesity, Linking PHIP Variants to Repressed POMC Transcription.

    Marenne G, Hendricks AE, Perdikari A, Bounds R, Payne F, Keogh JM, Lelliott CJ, Henning E, Pathan S, Ashford S, Bochukova EG, Mistry V, Daly A, Hayward C, INTERVAL, UK10K Consortium, Wareham NJ, O'Rahilly S, Langenberg C, Wheeler E, Zeggini E, Farooqi IS and Barroso I

    Wellcome Sanger Institute, Cambridge, UK; Inserm, Univ Brest, EFS, UMR 1078, GGB, 29200 Brest, France.

    Obesity is genetically heterogeneous with monogenic and complex polygenic forms. Using exome and targeted sequencing in 2,737 severely obese cases and 6,704 controls, we identified three genes (PHIP, DGKI, and ZMYM4) with an excess burden of very rare predicted deleterious variants in cases. In cells, we found that nuclear PHIP (pleckstrin homology domain interacting protein) directly enhances transcription of pro-opiomelanocortin (POMC), a neuropeptide that suppresses appetite. Obesity-associated PHIP variants repressed POMC transcription. Our demonstration that PHIP is involved in human energy homeostasis through transcriptional regulation of central melanocortin signaling has potential diagnostic and therapeutic implications for patients with obesity and developmental delay. Additionally, we found an excess burden of predicted deleterious variants involving genes nearest to loci from obesity genome-wide association studies. Genes and gene sets influencing obesity with variable penetrance provide compelling evidence for a continuum of causality in the genetic architecture of obesity, and explain some of its missing heritability.

    Cell metabolism 2020;31;6;1107-1119.e12

  • An enhanced toolkit for the generation of knockout and marker-free fluorescent Plasmodium chabaudi.

    Marr EJ, Milne RM, Anar B, Girling G, Schwach F, Mooney JP, Nahrendorf W, Spence PJ, Cunningham D, Baker DA, Langhorne J, Rayner JC, Billker O, Bushell ES and Thompson J

    Institute of Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, Ashworth Laboratories, The King's Buildings, Edinburgh, EH9 3FL, UK.

    The rodent parasite <i>Plasmodium chabaudi</i> is an important <i>in vivo</i> model of malaria. The ability to produce chronic infections makes it particularly useful for investigating the development of anti-<i>Plasmodium</i> immunity, as well as features associated with parasite virulence during both the acute and chronic phases of infection. <i>P. chabaudi</i> also undergoes asexual maturation (schizogony) and erythrocyte invasion in culture, so offers an experimentally-amenable <i>in vivo</i> to <i>in vitro</i> model for studying gene function and drug activity during parasite replication. To extend the usefulness of this model, we have further optimised transfection protocols and plasmids for <i>P. chabaudi</i> and generated stable, fluorescent lines that are free from drug-selectable marker genes. These mother-lines show the same infection dynamics as wild-type parasites throughout the lifecycle in mice and mosquitoes; furthermore, their virulence can be increased by serial blood passage and reset by mosquito transmission. We have also adapted the large-insert, linear <i>Plasmo</i>GEM vectors that have revolutionised the scale of experimental genetics in another rodent malaria parasite and used these to generate barcoded <i>P. chabaudi</i> gene-deletion and -tagging vectors for transfection in our fluorescent <i>P. chabaudi</i> mother-lines. This produces a tool-kit of <i>P. chabaudi</i> lines, vectors and transfection approaches that will be of broad utility to the research community.

    Wellcome open research 2020;5;71

  • Nature via Nurture, the Martin Way.

    Martin HC

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    I recount early formative experiences with my father, Nick Martin.

    Twin research and human genetics : the official journal of the International Society for Twin Studies 2020;1-2

  • Few SINEs of life: Alu elements have little evidence for biological relevance despite elevated translation.

    Martinez-Gomez L, Abascal F, Jungreis I, Pozo F, Kellis M, Mudge JM and Tress ML

    Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain.

    Transposable elements colonize genomes and with time may end up being incorporated into functional regions. SINE Alu elements, which appeared in the primate lineage, are ubiquitous in the human genome and more than a thousand overlap annotated coding exons. Although almost all Alu-derived coding exons appear to be in alternative transcripts, they have been incorporated into the main coding transcript in at least 11 genes. The extent to which Alu regions are incorporated into functional proteins is unclear, but we detected reliable peptide evidence to support the translation to protein of 33 Alu-derived exons. All but one of the Alu elements for which we detected peptides were frame-preserving and there was proportionally seven times more peptide evidence for Alu elements as for other primate exons. Despite this strong evidence for translation to protein we found no evidence of selection, either from cross species alignments or human population variation data, among these Alu-derived exons. Overall, our results confirm that SINE Alu elements have contributed to the expansion of the human proteome, and this contribution appears to be stronger than might be expected over such a relatively short evolutionary timeframe. Despite this, the biological relevance of these modifications remains open to question.

    Funded by: NHGRI NIH HHS: U41 HG007234

    NAR genomics and bioinformatics 2020;2;1;lqz023

  • An ancient baboon genome demonstrates long-term population continuity in southern Africa.

    Mathieson I, Abascal F, Vinner L, Skoglund P, Pomilla C, Mitchell P, Arthur C, Gurdasani D, Willerslev E, Sandhu MS and Dewar G

    Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, USA.

    Baboons are one of the most abundant large nonhuman primates and are widely studied in biomedical, behavioral and anthropological research. Despite this, our knowledge of their evolutionary and demographic history remains incomplete. Here, we report a 0.9-fold coverage genome sequence from a 5800-year-old baboon from the site of Ha Makotoko in Lesotho. The ancient baboon is closely related to present-day Papio ursinus individuals from southern Africa, indicating a high degree of continuity in the southern African baboon population. This level of population continuity is rare in recent human populations, but may provide a good model for the evolution of Homo and other large primates over similar timespans in structured populations throughout Africa.

    Genome biology and evolution 2020

  • The Evolutionary Genomics of Host Specificity in Staphylococcus aureus.

    Matuszewska M, Murray GGR, Harrison EM, Holmes MA and Weinert LA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, CB3 0ES, UK.

    Staphylococcus aureus is an important human bacterial pathogen that has a cosmopolitan host range, including livestock, companion and wild animal species. Genomic and epidemiological studies show that S. aureus has jumped between host species many times over its evolutionary history. These jumps have involved the dynamic gain and loss of host-specific adaptive genes, usually located on mobile genetic elements. The same functional elements are often consistently gained in jumps into a particular species. Further sampling of diverse animal species is likely to uncover an even broader host range and greater genetic diversity of S. aureus than is already known, and understanding S. aureus host specificity in these hosts will mitigate the risks of emergent human and livestock strains.

    Funded by: Medical Research Council: MR/N002660/1, MR/P007201/1, MR/S00291X/1

    Trends in microbiology 2020

  • Factors Affecting Sentinel Node Metastasis in Thin (T1) Cutaneous Melanomas: Development and External Validation of a Predictive Nomogram.

    Maurichi A, Miceli R, Eriksson H, Newton-Bishop J, Nsengimana J, Chan M, Hayes AJ, Heelan K, Adams D, Patuzzo R, Barretta F, Gallino G, Harwood C, Bergamaschi D, Bennett D, Lasithiotakis K, Ghiorzo P, Dalmasso B, Manganoni A, Consoli F, Mattavelli I, Barbieri C, Leva A, Cortinovis U, Espeli V, Mangas C, Quaglino P, Ribero S, Broganelli P, Pellacani G, Longo C, Del Forno C, Borgognoni L, Sestini S, Pimpinelli N, Fortunato S, Chiarugi A, Nardini P, Morittu E, Florita A, Cossa M, Valeri B, Milione M, Pruneri G, Zoras O, Anichini A, Mortarini R and Santinami M

    Melanoma and Sarcoma Unit, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Istituto Nazionale dei Tumori di Milano, Milan, Italy.

    Purpose: Thin melanomas (T1; ≤ 1 mm) constitute 70% of newly diagnosed cutaneous melanomas. Regional node metastasis determined by sentinel node biopsy (SNB) is an important prognostic factor for T1 melanoma. However, current melanoma guidelines do not provide clear indications on when to perform SNB in T1 disease and stress an individualized approach to SNB that considers all clinicopathologic risk factors. We aimed to identify determinants of sentinel node (SN) status for incorporation into an externally validated nomogram to better select patients with T1 disease for SNB.

    Patients and methods: The development cohort comprised 3,666 patients with T1 disease consecutively treated at the Istituto Nazionale Tumori (Milan, Italy) between 2001 and 2018; 4,227 patients with T1 disease treated at 13 other European centers over the same period formed the validation cohort. A random forest procedure was applied to the development data set to select characteristics associated with SN status for inclusion in a multiple binary logistic model from which a nomogram was elaborated. Decision curve analyses assessed the clinical utility of the nomogram.

    Results: Of patients in the development cohort, 1,635 underwent SNB; 108 patients (6.6%) were SN positive. By univariable analysis, age, growth phase, Breslow thickness, ulceration, mitotic rate, regression, and lymphovascular invasion were significantly associated with SN status. The random forest procedure selected 6 variables (not growth phase) for inclusion in the logistic model and nomogram. The nomogram proved well calibrated and had good discriminative ability in both cohorts. Decision curve analyses revealed the superior net benefit of the nomogram compared with each individual variable included in it as well as with variables suggested by current guidelines.

    Conclusion: We propose the nomogram as a decision aid in all patients with T1 melanoma being considered for SNB.

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2020;JCO1901902

  • Factors associated with occurrence of salmonellosis among children living in Mukuru slum, an urban informal settlement in Kenya.

    Mbae C, Mwangi M, Gitau N, Irungu T, Muendo F, Wakio Z, Wambui R, Kavai S, Onsare R, Wairimu C, Ngetich R, Njeru F, Van Puyvelde S, Clemens J, Dougan G and Kariuki S

    Centre for Microbiology Research, Kenya Medical Research Institute, Off Mbagathi Road, PO Box 54840-00200, Nairobi, Kenya.

    Background: In Kenya, typhoid fever and invasive non-typhoidal salmonellosis present a huge burden of disease, especially in poor-resource settings where clean water supply and sanitation conditions are inadequate. The epidemiology of both diseases is poorly understood in terms of severity and risk factors. The aim of the study was to determine the disease burden and spatial distribution of salmonellosis, as well as socioeconomic and environmental risk factors for these infections, in a large informal settlement near the city of Nairobi, from 2013 to 2017.

    Methods: Initially, a house-to-house baseline census of 150,000 population in Mukuru informal settlement was carried out and relevant socioeconomic, demographic, and healthcare utilization information was collected using structured questionnaires. Salmonella bacteria were cultured from the blood and faeces of children < 16 years of age who reported at three outpatient facilities with fever alone or fever and diarrhea. Tests of association between specific Salmonella serotypes and risk factors were conducted using Pearson Chi-Square (χ<sup>2</sup>) test.

    Results: A total of 16,236 children were recruited into the study. The prevalence of bloodstream infections by Non-Typhoidal Salmonella (NTS), consisting of Salmonella Typhimurium/Enteritidis, was 1.3%; Salmonella Typhi was 1.4%, and this was highest among children < 16 years of age. Occurrence of Salmonella Typhimurium/Enteritidis was not significantly associated with rearing any domestic animals. Rearing chicken was significantly associated with high prevalence of S. Typhi (2.1%; p = 0.011). The proportion of children infected with Salmonella Typhimurium/Enteritidis was significantly higher in households that used water pots as water storage containers compared to using water directly from the tap (0.6%). Use of pit latrines and open defecation were significant risk factors for S. Typhi infection (1.6%; p = 0.048). The proportion of Salmonella Typhimurium/Enteritidis among children eating street food 4 or more times per week was higher compared to 1 to 2 times/week on average (1.1%; p = 0.032).

    Conclusion: Typhoidal and NTS are important causes of illness in children in Mukuru informal settlement, especially among children less than 16 years of age. Improving Water, Sanitation and Hygiene (WASH) including boiling water, breastfeeding, hand washing practices, and avoiding animal contact in domestic settings could contribute to reducing the risk of transmission of Salmonella disease from contaminated environments.

    Funded by: NIH HHS: R01AI099525

    BMC infectious diseases 2020;20;1;422

  • Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes.

    McCarthy DJ, Rostom R, Huang Y, Kunz DJ, Danecek P, Bonder MJ, Hagai T, Lyu R, HipSci Consortium, Wang W, Gaffney DJ, Simons BD, Stegle O and Teichmann SA

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK.

    Bulk and single-cell DNA sequencing has enabled reconstructing clonal substructures of somatic tissues from frequency and cooccurrence patterns of somatic variants. However, approaches to characterize phenotypic variations between clones are not established. Here we present cardelino, a computational method for inferring the clonal tree configuration and the clone of origin of individual cells assayed using single-cell RNA-seq (scRNA-seq). Cardelino flexibly integrates information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. We apply cardelino to a published cancer dataset and to newly generated matched scRNA-seq and exome-seq data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a role for cell division genes in somatic evolution in healthy skin.

    Nature methods 2020

  • Mechanisms of β-lactam resistance of Streptococcus uberis isolated from bovine mastitis cases.

    McDougall S, Clausen L, Ha HJ, Gibson I, Bryan M, Hadjirin N, Lay E, Raisen C, Ba X, Restif O, Parkhill J and Holmes MA

    Cognosco, Anexa FVC, Morrinsville, New Zealand.

    A number of veterinary clinical pathology laboratories in New Zealand have been reporting emergence of increased minimum inhibitory concentrations (MIC) for β-lactams in the common clinical bovine mastitis pathogen Streptococcus uberis. The objective of this study was to determine the genetic basis of this increase in MIC for β-lactams amongst S. uberis. Illumina sequencing and determination of oxacillin MIC was performed on 265 clinical isolates. Published sequences of the five penicillin binding proteins pbp1a, pbp1b, pbp2a, pbp2b, and pbp2x were used to identify, extract and align these sequences from the study isolates. Amino acid substitutions resulting from single nucleotide polymorphisms (SNP) within these genes were analysed for associations with elevated (≥ 0.5 mg/L) oxacillin MIC together with a genome wide association study. The population structure of the study isolates was approximated using a phylogenetic tree generated from an alignment of the core genome. A total of 53 % of isolates had MIC ≥ 0.5 mg/L for oxacillin. A total of 101 substitutions within the five pbp were identified, of which 11 were statistically associated with an MIC ≥ 0.5 mg/L. All 140 isolates which exhibited an increased β-lactam MIC had SNPs leading to pbp2x E<sub>381</sub>K and Q<sub>554</sub>E substitutions. The phylogenetic tree indicated that the genotype and phenotype associated with the increased MIC for oxacillin were present in several different lineages suggesting that acquisition of this increased β-lactam MIC had occurred in multiple geographically distinct regions. Reanalysis of the data from the intervention studies from which the isolates were originally drawn found a tendency for the pbp2x E<sub>381</sub>K substitution to be associated with lower cure rates. It is concluded that there is geographically and genetically widespread presence of pbp substitutions associated with reduced susceptibility to β-lactam antimicrobials. Additionally, presence of pbp substitutions tended to be associated with poorer cure rate outcomes following antimicrobial therapy for clinical mastitis.

    Veterinary microbiology 2020;242;108592

  • Characterising a healthy adult with a rare HAO1 knockout to support a therapeutic strategy for primary hyperoxaluria.

    McGregor TL, Hunt KA, Yee E, Mason D, Nioi P, Ticau S, Pelosi M, Loken PR, Finer S, Lawlor DA, Fauman EB, Huang QQ, Griffiths CJ, MacArthur DG, Trembath RC, Oglesbee D, Lieske JC, Erbe DV, Wright J and van Heel DA

    Clinical Research, Alnylam Pharmaceuticals, Cambridge, United States.

    By sequencing autozygous human populations we identified a healthy adult woman with lifelong complete knockout of <i>HAO1</i> (expected ~1 in 30 million outbred people). <i>HAO1</i> (glycolate oxidase) silencing is the mechanism of lumasiran, an investigational RNA interference therapeutic for primary hyperoxaluria type 1. Her plasma glycolate levels were 12 times, and urinary glycolate 6 times, the upper limit of normal observed in healthy reference individuals (n=67). Plasma metabolomics and lipidomics (1871 biochemicals) revealed 18 markedly elevated biochemicals (>5sd outliers versus n=25 controls) suggesting additional HAO1 effects. Comparison with lumasiran preclinical and clinical trial data suggested she has <2% residual glycolate oxidase activity. Cell line p.Leu333SerfsTer4 expression showed markedly reduced HAO1 protein levels and cellular protein mis-localisation. In this woman, lifelong <i>HAO1</i> knockout is safe and without clinical phenotype, de-risking a therapeutic approach and informing therapeutic mechanisms. Unlocking evidence from the diversity of human genetic variation can facilitate drug development.

    Funded by: Medical Research Council: M009017; Wellcome: WT102627, WT210561

    eLife 2020;9

  • The genome sequence of the Eurasian red squirrel, Sciurus vulgaris Linnaeus 1758.

    Mead D, Fingland K, Cripps R, Portela Miguez R, Smith M, Corton C, Oliver K, Skelton J, Betteridge E, Dolucan J, Dudchenko O, Omer AD, Weisz D, Lieberman Aiden E, Fedrigo O, Mountcastle J, Jarvis E, McCarthy SA, Sims Y, Torrance J, Tracey A, Howe K, Challis R, Durbin R and Blaxter M

    Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

    We present a genome assembly from an individual male <i>Sciurus vulgaris</i> (the Eurasian red squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.88 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.

    Wellcome open research 2020;5;18

  • The genome sequence of the Eurasian river otter, Lutra lutra Linnaeus 1758.

    Mead D, Hailer F, Chadwick E, Portela Miguez R, Smith M, Corton C, Oliver K, Skelton J, Betteridge E, Doulcan JD, Dudchenko O, Omer A, Weisz D, Lieberman Aiden E, McCarthy S, Howe K, Sims Y, Torrance J, Tracey A, Challis R, Durbin R and Blaxter M

    Wellcome Genome Campus, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK.

    We present a genome assembly from an individual male <i>Lutra lutra</i> (the Eurasian river otter; Vertebrata; Mammalia; Eutheria; Carnivora; Mustelidae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled.

    Wellcome open research 2020;5;33

  • Putative cell type discovery from single-cell gene expression data.

    Miao Z, Moreno P, Huang N, Papatheodorou I, Brazma A and Teichmann SA

    European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK.

    We present the Single-Cell Clustering Assessment Framework, a method for the automated identification of putative cell types from single-cell RNA sequencing (scRNA-seq) data. By iteratively applying a machine learning approach to a given set of cells, we simultaneously identify distinct cell groups and a weighted list of feature genes for each group. The differentially expressed feature genes discriminate the given cell group from other cells. Each such group of cells corresponds to a putative cell type or state, characterized by the feature genes as markers. Benchmarking using expert-annotated scRNA-seq datasets shows that our method automatically identifies the 'ground truth' cell assignments with high accuracy.

    Funded by: Wellcome Trust (Wellcome): 108437/Z/15/Z

    Nature methods 2020

  • Societal considerations in host genome testing for COVID-19.

    Milne R

    Society and Ethics Research Group, Wellcome Genome Campus, Hinxton, UK.

    Genetics in medicine : official journal of the American College of Medical Genetics 2020

  • We need to think about data governance for dementia research in a digital era.

    Milne R and Brayne C

    Society and Ethics Research Group, Wellcome Genome Campus, Hinxton, UK.

    Background: Research into Alzheimer's disease and other dementias increasingly involves large-scale data-sharing initiatives. The development of novel digital tools and assessments is likely to increase the need for these. This presents ethics and governance challenges to ensure the use of these data is able to maximise the benefit to patients and the public.

    Discussion: We consider the challenges associated with informed consent and governance in the context of dementia research. We set out the potential of novel data governance approaches for the future of data sharing for dementia. The data trust model proposed in discussions of data governance may have potentially valuable application for dementia research. Such inclusive approaches to trustworthy data governance should be considered as data-sharing initiatives are established and develop.

    Funded by: Medical Research Council: MR/L023784/1; Wellcome Trust: 206194

    Alzheimer's research & therapy 2020;12;1;17

  • Evaluating drug targets through human loss-of-function genetic variation.

    Minikel EV, Karczewski KJ, Martin HC, Cummings BB, Whiffin N, Rhodes D, Alföldi J, Trembath RC, van Heel DA, Daly MJ, Genome Aggregation Database Production Team, Genome Aggregation Database Consortium, Schreiber SL and MacArthur DG

    Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

    Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development.

    Nature 2020;581;7809;459-464

  • Evaluation of circulating leukocyte transcriptome and its relationship with immune function and blood markers in dairy cows during the transition period.

    Minuti A, Jahan N, Lopreiato V, Piccioli-Cappelli F, Bomba L, Capomaccio S, Loor JJ, Ajmone-Marsan P and Trevisi E

    Department of Animal Sciences, Food and Nutrition - DIANA, Faculty of Agriculture, Food and Environmental Science, Università Cattolica del Sacro Cuore, Via Emilia Parmense 84, Piacenza, 29122, Italy.

    Dairy cows during the transition period are faced with important physiological changes which include a dysfunctional immune system and an increased inflammatory state. New data are necessary to understand the key factors involved in the immune system regulation. Six dairy cows were sampled during the transition period to investigate the leukocyte transcriptome changes and their relationship with blood biomarkers. Blood samples were collected at -20 ± 2, -3 ± 1, 3, and 7 days from parturition (DFP). Leukocyte transcriptome was analyzed by deep sequencing technology (Hiseq1000 Illumina, USA). Plasma was analyzed for metabolic biomarkers. Differentially expressed genes (DEG) were used to run an enrichment analysis through the Dynamic Impact Approach (DIA). Considering -20 DFP as the reference time, the main KEGG impacted pathways were activated before calving (-3 DFP) and were connected to lipid metabolism, lipid transport in plasma, and phagosome. The greatest differences were found after parturition with 281 DEG (179 upregulated and 102 downregulated). The activated pathways were mainly related to immunity and endocrine aspects, while metabolic pathways related to lipid and amino acid metabolism were inhibited. Plasma BHBA had a substantial inhibitory impact on KEGG pathways related to DNA replication and cell cycle, while plasma IL-1β had an inhibitory impact on fatty acid elongation in mitochondria and an activated impact in several pathways related to cellular energy metabolism. Overall, this study confirmed that many changes in lipid metabolism and immune competence of the circulating leukocytes occurred in dairy cows around calving. Interestingly, BHBA and IL-1β connected with the transcriptome.

    Functional & integrative genomics 2020;20;2;293-305

  • MGnify: the microbiome analysis resource in 2020.

    Mitchell AL, Almeida A, Beracochea M, Boland M, Burgin J, Cochrane G, Crusoe MR, Kale V, Potter SC, Richardson LJ, Sakharova E, Scheremetjew M, Korobeynikov A, Shlemov A, Kunyavskaya O, Lapidus A and Finn RD

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    MGnify provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M011755/1, BB/N018354/1, BB/R015228/1

    Nucleic acids research 2020;48;D1;D570-D578

  • COngenital heart disease and the Diagnostic yield with Exome sequencing (CODE Study): prospective cohort study and systematic review.

    Mone F, Eberhardt RY, Morris RK, Hurles ME, Mcmullan DJ, Maher ER, Lord J, Chitty LS, Giordano JL, Wapner RJ, Kilby MD and CODE Study Collaborators

    West Midlands Fetal Medicine Centre, Birmingham Women's and Children's National Health Service (NHS) Foundation Trust, Birmingham, UK.

    Objectives: To determine the yield of antenatal exome sequencing (ES) over chromosome microarray (CMA) / conventional karyotyping in; (i) any prenatally diagnosed congenital heart disease (CHD); (ii) isolated CHD; (iii) multi-system CHD and; (iv) CHD by phenotypic subgroup.

    Methods: A prospective cohort study of 197 trios undergoing ES following CMA/karyotype because CHD was identified prenatally and a systematic review of the literature was performed. MEDLINE, EMBASE and CINAHL (2000-Oct 2019) databases were searched electronically. Selected studies included those with; (i) >3 cases; (ii) initiation of testing based upon a prenatal phenotype only and; (iii) where CMA/karyotyping was negative. PROSPERO No. CRD42019140309.

    Results: In our cohort ES gave an additional diagnostic yield in; (i) all CHD; (ii) isolated CHD and; (iii) multi-system CHD of 12.7% (n=25/197), 11.5% (n=14/122) and 14.7% (n=11/75) (p=0.81). The pooled incremental yields for the aforementioned categories from 18 studies (n=636) were 21% (95% CI, 15-27%), 11% (95% CI, 7-15%) and 37% (95% CI, 18-56%) respectively. This did not differ significantly when sub-analyses were limited to studies including >20 cases. In instances of multi-system CHD in the primary analysis, the commonest extra-cardiac anomalies associated with a pathogenic variant were those affecting the genitourinary system 44.2% (n=23/52). Cardiac shunt lesions had the greatest incremental yield, 41% (95% CI, 19-63%), followed by right-sided lesions 26% (95% CI, 9-43%). In the majority of instances pathogenic variants occurred de novo and in autosomal dominant (monoallelic) disease genes (68/96; 70.8%). The commonest monogenic syndrome identified was Kabuki syndrome (n=19/96; 19.8%).

    Conclusions: Despite the apparent incremental yield of prenatal exome sequencing in congenital heart disease, the routine application of such a policy would require the adoption of robust bioinformatic, clinical and ethical pathways. Whilst the greatest yield is with multi-system anomalies, consideration may also be given to performing ES in the presence of isolated cardiac abnormalities.

    Ultrasound in obstetrics & gynecology : the official journal of the International Society of Ultrasound in Obstetrics and Gynecology 2020

  • The mutational landscape of normal human endometrial epithelium.

    Moore L, Leongamornlert D, Coorens THH, Sanders MA, Ellis P, Dentro SC, Dawson KJ, Butler T, Rahbari R, Mitchell TJ, Maura F, Nangalia J, Tarpey PS, Brunner SF, Lee-Six H, Hooks Y, Moody S, Mahbubani KT, Jimenez-Linan M, Brosens JJ, Iacobuzio-Donahue CA, Martincorena I, Saeb-Parsy K, Campbell PJ and Stratton MR

    Cancer, Ageing and Somatic Mutation (CASM), Wellcome Sanger Institute, Cambridge, UK.

    All normal somatic cells are thought to acquire mutations, but understanding of the rates, patterns, causes and consequences of somatic mutations in normal cells is limited. The uterine endometrium adopts multiple physiological states over a lifetime and is lined by a gland-forming epithelium<sup>1,2</sup>. Here, using whole-genome sequencing, we show that normal human endometrial glands are clonal cell populations with total mutation burdens that increase at about 29 base substitutions per year and that are many-fold lower than those of endometrial cancers. Normal endometrial glands frequently carry 'driver' mutations in cancer genes, the burden of which increases with age and decreases with parity. Cell clones with drivers often originate during the first decades of life and subsequently progressively colonize the epithelial lining of the endometrium. Our results show that mutational landscapes differ markedly between normal tissues, perhaps shaped by differences in their structure and physiology, and indicate that the procession of neoplastic change that leads to endometrial cancer is initiated early in life.

    Nature 2020;580;7805;640-646

  • Quantitative genetic analysis deciphers the impact of cis and trans regulation on cell-to-cell variability in protein expression levels.

    Morgan MD, Patin E, Jagla B, Hasan M, Quintana-Murci L and Marioni JC

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

    Identifying the factors that shape protein expression variability in complex multi-cellular organisms has primarily focused on promoter architecture and regulation of single-cell expression in cis. However, this targeted approach has to date been unable to identify major regulators of cell-to-cell gene expression variability in humans. To address this, we have combined single-cell protein expression measurements in the human immune system using flow cytometry with a quantitative genetics analysis. For the majority of proteins whose variability in expression has a heritable component, we find that genetic variants act in trans, with notably fewer variants acting in cis. Furthermore, we highlight using Mendelian Randomization that these variability-Quantitative Trait Loci might be driven by the cis regulation of upstream genes. This indicates that natural selection may balance the impact of gene regulation in cis with downstream impacts on expression variability in trans.

    PLoS genetics 2020;16;3;e1008686

  • Integration of genomics, metagenomics, and metabolomics to identify interplay between susceptibility alleles and microbiota in adenoma initiation.

    Moskowitz JE, Doran AG, Lei Z, Busi SB, Hart ML, Franklin CL, Sumner LW, Keane TM and Amos-Landgraf JM

    Department of Veterinary Pathobiology, University of Missouri, Columbia, MO, 65201, USA.

    Background: Colorectal cancer (CRC) is a multifactorial disease resulting from both genetic predisposition and environmental factors including the gut microbiota (GM), but deciphering the influence of genetic variants, environmental variables, and interactions with the GM is exceedingly difficult. We previously observed significant differences in intestinal adenoma multiplicity between C57BL/6J-Apc<sup>Min</sup> (B6-Min/J) from The Jackson Laboratory (JAX), and original founder strain C57BL/6JD-Apc<sup>Min</sup> (B6-Min/D) from the University of Wisconsin.

    Methods: To resolve genetic and environmental interactions and determine their contributions we utilized two genetically inbred, independently isolated Apc<sup>Min</sup> mouse colonies that have been separated for over 20 generations. Whole genome sequencing was used to identify genetic variants unique to the two substrains. To determine the influence of genetic variants and the impact of differences in the GM on phenotypic variability, we used complex microbiota targeted rederivation to generate two Apc mutant mouse colonies harboring complex GMs from two different sources (GMJAX originally from JAX or GMHSD originally from Envigo), creating four Apc<sup>Min</sup> groups. Untargeted metabolomics were used to characterize shifts in the fecal metabolite profile based on genetic variation and differences in the GM.

    Results: WGS revealed several thousand high quality variants unique to the two substrains. No homozygous variants were present in coding regions, with the vast majority of variants residing in noncoding regions. Host genetic divergence between Min/J and Min/D and the complex GM additively determined differential adenoma susceptibility. Untargeted metabolomics revealed that both genetic lineage and the GM collectively determined the fecal metabolite profile, and that each differentially regulates bile acid (BA) metabolism. Metabolomics pathway analysis facilitated identification of a functionally relevant private noncoding variant associated with the bile acid transporter Fatty acid binding protein 6 (Fabp6). Expression studies demonstrated differential expression of Fabp6 between Min/J and Min/D, and the variant correlates with adenoma multiplicity in backcrossed mice.

    Conclusions: We found that both genetic variation and differences in microbiota influence the quantitative adenoma phenotype in Apc<sup>Min</sup> mice. These findings demonstrate how the use of metabolomics datasets can aid as a functional genomic tool, and furthermore illustrate the power of a multi-omics approach to dissect complex disease susceptibility of noncoding variants.

    Funded by: ODCDC CDC HHS: U42 OD010918

    BMC cancer 2020;20;1;600

  • Loss of IL-10 signaling in macrophages limits bacterial killing driven by prostaglandin E2.

    Mukhopadhyay S, Heinz E, Porreca I, Alasoo K, Yeung A, Yang HT, Schwerd T, Forbester JL, Hale C, Agu CA, Choi YH, Rodrigues J, Capitani M, Jostins-Dean L, Thomas DC, Travis S, Gaffney D, Skarnes WC, Thomson N, Uhlig HH, Dougan G and Powrie F

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Loss of IL-10 signaling in macrophages (Mφs) leads to inflammatory bowel disease (IBD). Induced pluripotent stem cells (iPSCs) were generated from an infantile-onset IBD patient lacking a functional IL10RB gene. Mφs differentiated from IL-10RB-/- iPSCs lacked IL-10RB mRNA expression, were unable to phosphorylate STAT3, and failed to reduce LPS induced inflammatory cytokines in the presence of exogenous IL-10. IL-10RB-/- Mφs exhibited a striking defect in their ability to kill Salmonella enterica serovar Typhimurium, which was rescuable after experimentally introducing functional copies of the IL10RB gene. Genes involved in synthesis and receptor pathways for eicosanoid prostaglandin E2 (PGE2) were more highly induced in IL-10RB-/- Mφs, and these Mφs produced higher amounts of PGE2 after LPS stimulation compared with controls. Furthermore, pharmacological inhibition of PGE2 synthesis and PGE2 receptor blockade enhanced bacterial killing in Mφs. These results identify a regulatory interaction between IL-10 and PGE2, dysregulation of which may drive aberrant Mφ activation and impaired host defense contributing to IBD pathogenesis.

    Funded by: Wellcome Trust

    The Journal of experimental medicine 2020;217;2

  • Shared genetic risk between eating disorder- and substance-use-related phenotypes: Evidence from genome-wide association studies.

    Munn-Chernoff MA, Johnson EC, Chou YL, Coleman JRI, Thornton LM, Walters RK, Yilmaz Z, Baker JH, Hübel C, Gordon S, Medland SE, Watson HJ, Gaspar HA, Bryois J, Hinney A, Leppä VM, Mattheisen M, Ripke S, Yao S, Giusti-Rodríguez P, Hanscombe KB, Adan RAH, Alfredsson L, Ando T, Andreassen OA, Berrettini WH, Boehm I, Boni C, Boraska Perica V, Buehren K, Burghardt R, Cassina M, Cichon S, Clementi M, Cone RD, Courtet P, Crow S, Crowley JJ, Danner UN, Davis OSP, de Zwaan M, Dedoussis G, Degortes D, DeSocio JE, Dick DM, Dikeos D, Dina C, Dmitrzak-Weglarz M, Docampo E, Duncan LE, Egberts K, Ehrlich S, Escaramís G, Esko T, Estivill X, Farmer A, Favaro A, Fernández-Aranda F, Fichter MM, Fischer K, Föcker M, Foretova L, Forstner AJ, Forzan M, Franklin CS, Gallinger S, Giegling I, Giuranna J, Gonidakis F, Gorwood P, Gratacos Mayora M, Guillaume S, Guo Y, Hakonarson H, Hatzikotoulas K, Hauser J, Hebebrand J, Helder SG, Herms S, Herpertz-Dahlmann B, Herzog W, Huckins LM, Hudson JI, Imgart H, Inoko H, Janout V, Jiménez-Murcia S, Julià A, Kalsi G, Kaminská D, Karhunen L, Karwautz A, Kas MJH, Kennedy JL, Keski-Rahkonen A, Kiezebrink K, Kim YR, Klump KL, Knudsen GPS, La Via MC, Le Hellard S, Levitan RD, Li D, Lilenfeld L, Lin BD, Lissowska J, Luykx J, Magistretti PJ, Maj M, Mannik K, Marsal S, Marshall CR, Mattingsdal M, McDevitt S, McGuffin P, Metspalu A, Meulenbelt I, Micali N, Mitchell K, Monteleone AM, Monteleone P, Nacmias B, Navratilova M, Ntalla I, O'Toole JK, Ophoff RA, Padyukov L, Palotie A, Pantel J, Papezova H, Pinto D, Rabionet R, Raevuori A, Ramoz N, Reichborn-Kjennerud T, Ricca V, Ripatti S, Ritschel F, Roberts M, Rotondo A, Rujescu D, Rybakowski F, Santonastaso P, Scherag A, Scherer SW, Schmidt U, Schork NJ, Schosser A, Seitz J, Slachtova L, Slagboom PE, Slof-Op't Landt MCT, Slopien A, Sorbi S, Świątkowska B, Szatkiewicz JP, Tachmazidou I, Tenconi E, Tortorella A, Tozzi F, Treasure J, Tsitsika A, Tyszkiewicz-Nwafor M, Tziouvas K, van Elburg AA, van Furth EF, Wagner G, Walton E, Widen E, Zeggini E, Zerwas S, Zipfel S, Bergen AW, Boden JM, Brandt H, Crawford S, Halmi KA, Horwood LJ, Johnson C, Kaplan AS, Kaye WH, Mitchell J, Olsen CM, Pearson JF, Pedersen NL, Strober M, Werge T, Whiteman DC, Woodside DB, Grove J, Henders AK, Larsen JT, Parker R, Petersen LV, Jordan J, Kennedy MA, Birgegård A, Lichtenstein P, Norring C, Landén M, Mortensen PB, Polimanti R, McClintick JN, Adkins AE, Aliev F, Bacanu SA, Batzler A, Bertelsen S, Biernacka JM, Bigdeli TB, Chen LS, Clarke TK, Degenhardt F, Docherty AR, Edwards AC, Foo JC, Fox L, Frank J, Hack LM, Hartmann AM, Hartz SM, Heilmann-Heimbach S, Hodgkinson C, Hoffmann P, Hottenga JJ, Konte B, Lahti J, Lahti-Pulkkinen M, Lai D, Ligthart L, Loukola A, Maher BS, Mbarek H, McIntosh AM, McQueen MB, Meyers JL, Milaneschi Y, Palviainen T, Peterson RE, Ryu E, Saccone NL, Salvatore JE, Sanchez-Roige S, Schwandt M, Sherva R, Streit F, Strohmaier J, Thomas N, Wang JC, Webb BT, Wedow R, Wetherill L, Wills AG, Zhou H, Boardman JD, Chen D, Choi DS, Copeland WE, Culverhouse RC, Dahmen N, Degenhardt L, Domingue BW, Frye MA, Gäbel W, Hayward C, Ising M, Keyes M, Kiefer F, Koller G, Kramer J, Kuperman S, Lucae S, Lynskey MT, Maier W, Mann K, Männistö S, Müller-Myhsok B, Murray AD, Nurnberger JI, Preuss U, Räikkönen K, Reynolds MD, Ridinger M, Scherbaum N, Schuckit MA, Soyka M, Treutlein J, Witt SH, Wodarz N, Zill P, Adkins DE, Boomsma DI, Bierut LJ, Brown SA, Bucholz KK, Costello EJ, de Wit H, Diazgranados N, Eriksson JG, Farrer LA, Foroud TM, Gillespie NA, Goate AM, Goldman D, Grucza RA, Hancock DB, Harris KM, Hesselbrock V, Hewitt JK, Hopfer CJ, Iacono WG, Johnson EO, Karpyak VM, Kendler KS, Kranzler HR, Krauter K, Lind PA, McGue M, MacKillop J, Madden PAF, Maes HH, Magnusson PKE, Nelson EC, Nöthen MM, Palmer AA, Penninx BWJH, Porjesz B, Rice JP, Rietschel M, Riley BP, Rose RJ, Shen PH, Silberg J, Stallings MC, Tarter RE, Vanyukov MM, Vrieze S, Wall TL, Whitfield JB, Zhao H, Neale BM, Wade TD, Heath AC, Montgomery GW, Martin NG, Sullivan PF, Kaprio J, Breen G, Gelernter J, Edenberg HJ, Bulik CM and Agrawal A

    Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

    Eating disorders and substance use disorders frequently co-occur. Twin studies reveal shared genetic variance between liabilities to eating disorders and substance use, with the strongest associations between symptoms of bulimia nervosa and problem alcohol use (genetic correlation [r<sub>g</sub>], twin-based = 0.23-0.53). We estimated the genetic correlation between eating disorder and substance use and disorder phenotypes using data from genome-wide association studies (GWAS). Four eating disorder phenotypes (anorexia nervosa [AN], AN with binge eating, AN without binge eating, and a bulimia nervosa factor score), and eight substance-use-related phenotypes (drinks per week, alcohol use disorder [AUD], smoking initiation, current smoking, cigarettes per day, nicotine dependence, cannabis initiation, and cannabis use disorder) from eight studies were included. Significant genetic correlations were adjusted for variants associated with major depressive disorder and schizophrenia. Total study sample sizes per phenotype ranged from ~2400 to ~537 000 individuals. We used linkage disequilibrium score regression to calculate single nucleotide polymorphism-based genetic correlations between eating disorder- and substance-use-related phenotypes. Significant positive genetic associations emerged between AUD and AN (r<sub>g</sub> = 0.18; false discovery rate q = 0.0006), cannabis initiation and AN (r<sub>g</sub> = 0.23; q < 0.0001), and cannabis initiation and AN with binge eating (r<sub>g</sub> = 0.27; q = 0.0016). Conversely, significant negative genetic correlations were observed between three nondiagnostic smoking phenotypes (smoking initiation, current smoking, and cigarettes per day) and AN without binge eating (r<sub>gs</sub> = -0.19 to -0.23; qs < 0.04). The genetic correlation between AUD and AN was no longer significant after co-varying for major depressive disorder loci. The patterns of association between eating disorder- and substance-use-related phenotypes highlight the potentially complex and substance-specific relationships among these behaviors.

    Funded by: Medical Research Council: MR/R004803/1

    Addiction biology 2020;e12880

  • Molecular epidemiology and intercontinental spread of cholera.

    Mutreja A and Dougan G

    Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, United Kingdom; Translational Health Science and Technology Institute, Faridabad, Haryana 121001, India; Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1S, United Kingdom.

    Whole genome sequence analysis has revealed the phylogenetic structure of Vibrio cholerae and has shown that the current seventh pandemic is highly clonal, emerging from a single source. Such analysis has the potential to become a powerful public health tool as we build public sequence databases, and as the speed of sequencing and analysis increases. Examples of such studies, as applied to different settings of the disease cholera, are described and discussed.

    Vaccine 2020;38 Suppl 1;A46-A51

  • Minimal spatial heterogeneity in chronic lymphocytic leukemia at diagnosis.

    Nadeu F, Royo R, Maura F, Dawson KJ, Dueso-Barroso A, Aymerich M, Pinyol M, Beà S, López-Guillermo A, Delgado J, Puente XS and Campo E

    Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain.

    Funded by: "la Caixa" Foundation (Caixa Foundation): CLLEvolution-HR17-00221; Ministerio de Economía y Competitividad (Ministry of Economy and Competitiveness): BES-2016-076372; Ministry of Economy and Competitiveness | Instituto de Salud Carlos III (Institute of Health Carlos III): PMP15/00007

    Leukemia 2020

  • ARID1A influences HDAC1/BRD4 activity, intrinsic proliferative capacity and breast cancer treatment response.

    Nagarajan S, Rao SV, Sutton J, Cheeseman D, Dunn S, Papachristou EK, Prada JG, Couturier DL, Kumar S, Kishore K, Chilamakuri CSR, Glont SE, Archer Goode E, Brodie C, Guppy N, Natrajan R, Bruna A, Caldas C, Russell A, Siersbæk R, Yusa K, Chernukhin I and Carroll JS

    CRUK Cambridge Institute, University of Cambridge, Cambridge, UK.

    Using genome-wide clustered regularly interspaced short palindromic repeats (CRISPR) screens to understand endocrine drug resistance, we discovered ARID1A and other SWI/SNF complex components as the factors most critically required for response to two classes of estrogen receptor-alpha (ER) antagonists. In this context, SWI/SNF-specific gene deletion resulted in drug resistance. Unexpectedly, ARID1A was also the top candidate in regard to response to the bromodomain and extraterminal domain inhibitor JQ1, but in the opposite direction, with loss of ARID1A sensitizing breast cancer cells to bromodomain and extraterminal domain inhibition. We show that ARID1A is a repressor that binds chromatin at ER cis-regulatory elements. However, ARID1A elicits repressive activity in an enhancer-specific, but forkhead box A1-dependent and active, ER-independent manner. Deletion of ARID1A resulted in loss of histone deacetylase 1 binding, increased histone 4 lysine acetylation and subsequent BRD4-driven transcription and growth. ARID1A mutations are more frequent in treatment-resistant disease, and our findings provide mechanistic insight into this process while revealing rational treatment strategies for these patients.

    Funded by: Cancer Research UK (CRUK): Core funding; EC | EC Seventh Framework Programm | FP7 Ideas: European Research Council (FP7-IDEAS-ERC - Specific Programme: "Ideas" Implementing the Seventh Framework Programme of the European Community for Research, Technological Development and Demonstration Activities (2007 to 2013)): 646876

    Nature genetics 2020;52;2;187-197

  • Genetic Markers in S. Paratyphi C Reveal Primary Adaptation to Pigs.

    Nair S, Fookes M, Corton C, Thomson NR, Wain J and Langridge GC

    Gastrointestinal Bacteria Reference Unit, Public Health England, Colindale, London NW9 5EQ, UK.

    <i>Salmonella enterica</i> with the identical antigenic formula 6,7:c:1,5 can be differentiated biochemically and by disease syndrome. One grouping, <i>Salmonella</i> Paratyphi C, is currently considered a typhoidal serovar, responsible for enteric fever in humans. The human-restricted typhoidal serovars (<i>S.</i> Typhi and Paratyphi A, B and C) typically display high levels of genome degradation and are cited as an example of convergent evolution for host adaptation in humans. However, <i>S.</i> Paratyphi C presents a different clinical picture to <i>S.</i> Typhi/Paratyphi A, in a patient group with predisposition, raising the possibility that its natural history is different, and that infection is invasive salmonellosis rather than enteric fever. Using whole genome sequencing and metabolic pathway analysis, we compared the genomes of 17 <i>S.</i> Paratyphi C strains to other members of the 6,7:c:1,5 group and to two typhoidal serovars: <i>S.</i> Typhi and Paratyphi A. The genome degradation observed in <i>S.</i> Paratyphi C was much lower than <i>S.</i> Typhi/Paratyphi A, but similar to the other 6,7:c:1,5 strains. Genomic and metabolic comparisons revealed little to no overlap between <i>S.</i> Paratyphi C and the other typhoidal serovars, arguing against convergent evolution and instead providing evidence of a primary adaptation to pigs in accordance with the 6,7:c:1,5 strains.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/R012504/1, BBS/E/F/000PR10352

    Microorganisms 2020;8;5

  • CELLector: Genomics-Guided Selection of Cancer In Vitro Models.

    Najgebauer H, Yang M, Francies HE, Pacini C, Stronach EA, Garnett MJ, Saez-Rodriguez J and Iorio F

    Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.

    Selecting appropriate cancer models is a key prerequisite for maximizing translational potential and clinical relevance of in vitro oncology studies. We developed CELLector: an R package and R Shiny application allowing researchers to select the most relevant cancer cell lines in a patient-genomic-guided fashion. CELLector leverages tumor genomics to identify recurrent subtypes with associated genomic signatures. It then evaluates these signatures in cancer cell lines to prioritize their selection. This enables users to choose appropriate in vitro models for inclusion or exclusion in retrospective analyses and future studies. Moreover, this allows bridging outcomes from cancer cell line screens to precisely defined sub-cohorts of primary tumors. Here, we demonstrate the usefulness and applicability of CELLector, showing how it can aid prioritization of in vitro models for future development and unveil patient-derived multivariate prognostic and therapeutic markers. CELLector and its code are freely available.

    Cell systems 2020;10;5;424-432.e6

  • DNAJC6 Mutations Disrupt Dopamine Homeostasis in Juvenile Parkinsonism-Dystonia.

    Ng J, Cortès-Saladelafont E, Abela L, Termsarasab P, Mankad K, Sudhakar S, Gorman KM, Heales SJR, Pope S, Biassoni L, Csányi B, Cain J, Rakshi K, Coutts H, Jayawant S, Jefferson R, Hughes D, García-Cazorla À, Grozeva D, Raymond FL, Pérez-Dueñas B, De Goede C, Pearson TS, Meyer E and Kurian MA

    Molecular Neurosciences, Developmental Neurosciences Programme, UCL Great Ormond Street Institute of Child Health, London, United Kingdom.

    Background: Juvenile forms of parkinsonism are rare conditions with onset of bradykinesia, tremor and rigidity before the age of 21 years. These atypical presentations commonly have a genetic aetiology, highlighting important insights into underlying pathophysiology. Genetic defects may affect key proteins of the endocytic pathway and clathrin-mediated endocytosis (CME), as in DNAJC6-related juvenile parkinsonism.

    Objective: To report on a new patient cohort with juvenile-onset DNAJC6 parkinsonism-dystonia and determine the functional consequences on auxilin and dopamine homeostasis.

    Methods: Twenty-five children with juvenile parkinsonism were identified from a research cohort of patients with undiagnosed pediatric movement disorders. Molecular genetic investigations included autozygosity mapping studies and whole-exome sequencing. Patient fibroblasts and CSF were analyzed for auxilin, cyclin G-associated kinase and synaptic proteins.

    Results: We identified 6 patients harboring previously unreported, homozygous nonsense DNAJC6 mutations. All presented with neurodevelopmental delay in infancy, progressive parkinsonism, and neurological regression in childhood. <sup>123</sup>I-FP-CIT SPECT (DaTScan) was performed in 3 patients and demonstrated reduced or absent tracer uptake in the basal ganglia. CSF neurotransmitter analysis revealed an isolated reduction of homovanillic acid. Auxilin levels were significantly reduced in both patient fibroblasts and CSF. Cyclin G-associated kinase levels in CSF were significantly increased, whereas a number of presynaptic dopaminergic proteins were reduced.

    Conclusions: DNAJC6 is an emerging cause of recessive juvenile parkinsonism-dystonia. DNAJC6 encodes the cochaperone protein auxilin, involved in CME of synaptic vesicles. The observed dopamine dyshomeostasis in patients is likely to be multifactorial, secondary to auxilin deficiency and/or neurodegeneration. Increased patient CSF cyclin G-associated kinase, in tandem with reduced auxilin levels, suggests a possible compensatory role of cyclin G-associated kinase, as observed in the auxilin knockout mouse. DNAJC6 parkinsonism-dystonia should be considered as a differential diagnosis for pediatric neurotransmitter disorders associated with low homovanillic acid levels. Future research in elucidating disease pathogenesis will aid the development of better treatments for this pharmacoresistant disorder.

    Funded by: Agustí Pedró i Pons Foundation, Universitat de Barcelona and Río Hortega 2015-2017 Institute of Health Carlos III (ECS); Cambridge Biomedical Research Centre and Wellcome Trust for UK10K (FLR and DG); Dystonia Medical Research Foundation (PT); European Union Marie Curie training network (SJRH); Great Ormond Street Hospital Children's Charity (JN, EM, MAK); NIHR Biomedical Research Centres; PS09/01132 ISCIII-FEDER, Institute of Health Carlos III (AGC); Rosetrees Trust (JN, MAK); Sir Jules Thorn Charitable Trust (MAK); Swiss National Science Foundation (LA); UK Medical Research Council (JN); Wellcome Intermediate Fellowship and NIHR Professorship (MAK)

    Movement disorders : official journal of the Movement Disorder Society 2020

  • Genomic evidence supports a clonal diaspora model for metastases of esophageal adenocarcinoma.

    Noorani A, Li X, Goddard M, Crawte J, Alexandrov LB, Secrier M, Eldridge MD, Bower L, Weaver J, Lao-Sirieix P, Martincorena I, Debiram-Beecham I, Grehan N, MacRae S, Malhotra S, Miremadi A, Thomas T, Galbraith S, Petersen L, Preston SD, Gilligan D, Hindmarsh A, Hardwick RH, Stratton MR, Wedge DC and Fitzgerald RC

    MRC Cancer Unit, University of Cambridge, Cambridge, UK.

    The poor outcomes in esophageal adenocarcinoma (EAC) prompted us to interrogate the pattern and timing of metastatic spread. Whole-genome sequencing and phylogenetic analysis of 388 samples across 18 individuals with EAC showed, in 90% of patients, that multiple subclones from the primary tumor spread very rapidly from the primary site to form multiple metastases, including lymph nodes and distant tissues, a mode of dissemination that we term 'clonal diaspora'. Metastatic subclones at autopsy were present in tissue and blood samples from earlier time points. These findings have implications for our understanding and clinical evaluation of EAC.

    Funded by: Cancer Research UK (CRUK): RG66287; DH | National Institute for Health Research (NIHR): RG67258; RCUK | Medical Research Council (MRC): RG84369

    Nature genetics 2020;52;1;74-83

  • Novel 2D and 3D Assays to Determine the Activity of Anti-Leishmanial Drugs.

    O'Keeffe A, Hale C, Cotton JA, Yardley V, Gupta K, Ananthanarayanan A, Murdan S and Croft SL

    Department of Infection Biology, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK.

    The discovery of novel anti-leishmanial compounds remains essential as current treatments have known limitations and there are insufficient novel compounds in development. We have investigated three complex and physiologically relevant in vitro assays, including: (i) a media perfusion based cell culture model, (ii) two 3D cell culture models, and (iii) iPSC derived macrophages in place of primary macrophages or cell lines, to determine whether they offer improved approaches to anti-leishmanial drug discovery and development. Using a Leishmania major amastigote-macrophage assay the activities of standard drugs were investigated to show the effect of changing parameters in these assays. We determined that drug activity was reduced by media perfusion (EC<sub>50</sub> values for amphotericin B shifted from 54 (51-57) nM in the static system to 70 (61-75) nM under media perfusion; EC<sub>50</sub> values for miltefosine shifted from 12 (11-15) µM in the static system to 30 (26-34) µM under media perfusion) (mean and 95% confidence intervals), with corresponding reduced drug accumulation by macrophages. In the 3D cell culture model there was a significant difference in the EC<sub>50</sub> values of amphotericin B but not miltefosine (EC<sub>50</sub> values for amphotericin B were 34.9 (31.4-38.6) nM in the 2D and 52.3 (46.6-58.7) nM in 3D; EC<sub>50</sub> values for miltefosine were 5.0 (4.9-5.2) µM in 2D and 5.9 (5.5-6.2) µM in 3D) (mean and 95% confidence intervals). Finally, in experiments using iPSC derived macrophages infected with Leishmania, reported here for the first time, we observed a higher level of intracellular infection in iPSC derived macrophages compared to the other macrophage types for four different species of Leishmania studied. For L. major with an initial infection ratio of 0.5 parasites per host cell the percentage infection level of the macrophages after 72 h was 11.3% ± 1.5%, 46.0% ± 1.4%, 66.4% ± 3.5% and 75.1% ± 2.4% (average ± SD) for the four cell types: THP1 (a human monocytic cell line), mouse bone marrow macrophages (MBMMs), human bone marrow macrophages (HBMMs) and iPSC derived macrophages, respectively. Despite the higher infection levels, drug activity in iPSC derived macrophages was similar to that in other macrophage types, for example, amphotericin B EC<sub>50</sub> values were 35.9 (33.4-38.5), 33.5 (31.5-36.5), 33.6 (30.5-not calculated (NC)) and 46.4 (45.8-47.2) nM in iPSC, MBMMs, HBMMs and THP1 cells respectively (mean and 95% confidence intervals). We conclude that increasing the complexity of cellular assays does impact upon anti-leishmanial drug activities but not sufficiently to replace the current model used in HTS/HCS assays in drug discovery programmes. The impact of media perfusion on drug activities and the use of iPSC macrophages do, however, deserve further investigation.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M009513/1; Medical Research Council: IF MC-PC_13069

    Microorganisms 2020;8;6

  • Single-cell transcriptomics identifies CD44 as a marker and regulator of endothelial to haematopoietic transition.

    Oatley M, Bölükbası ÖV, Svensson V, Shvartsman M, Ganter K, Zirngibl K, Pavlovich PV, Milchevskaya V, Foteva V, Natarajan KN, Baying B, Benes V, Patil KR, Teichmann SA and Lancrin C

    European Molecular Biology Laboratory, EMBL Rome - Epigenetics and Neurobiology Unit, via E. Ramarini 32, 00015, Monterotondo, Italy.

    The endothelial to haematopoietic transition (EHT) is the process whereby haemogenic endothelium differentiates into haematopoietic stem and progenitor cells (HSPCs). The intermediary steps of this process are unclear; in particular, the identity of endothelial cells that give rise to HSPCs is unknown. Using single-cell transcriptome analysis and antibody screening, we identify CD44 as a marker of EHT enabling us to isolate robustly the different stages of EHT in the aorta-gonad-mesonephros (AGM) region. This allows us to provide a detailed phenotypical and transcriptional profile of CD44-positive arterial endothelial cells from which HSPCs emerge. They are characterized by high expression of genes related to Notch signalling, TGFbeta/BMP antagonists, a downregulation of genes related to glycolysis and the TCA cycle, and a lower rate of cell cycle. Moreover, we demonstrate that by inhibiting the interaction between CD44 and its ligand hyaluronan, we can block EHT, identifying an additional regulator of HSPC development.

    Nature communications 2020;11;1;586

  • Establishment, optimisation and quantitation of a bioluminescent murine infection model of visceral leishmaniasis for systematic vaccine screening.

    Ong HB, Clare S, Roberts AJ, Wilson ME and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Sanger Institute, Cambridge, UK.

    Visceral leishmaniasis is an infectious parasitic disease caused by the protozoan parasites Leishmania donovani and Leishmania infantum. The drugs currently used to treat visceral leishmaniasis suffer from toxicity and the emergence of parasite resistance, and so a better solution would be the development of an effective subunit vaccine; however, no approved vaccine currently exists. The comparative testing of a large number of vaccine candidates requires a quantitative and reproducible experimental murine infection model, but the parameters that influence infection pathology have not been systematically determined. To address this, we have established an infection model using a transgenic luciferase-expressing L. donovani parasite and longitudinally quantified the infections using in vivo bioluminescent imaging within individual mice. We examined the effects of varying the infection route, the site of adjuvant formulation administration, and standardised the parasite preparation and dose. We observed that the increase in parasite load within the liver during the first few weeks of infection was directly proportional to the parasite number in the initial inoculum. Finally, we show that immunity can be induced in pre-exposed animals that have resolved an initial infection. This murine infection model provides a platform for systematic subunit vaccine testing against visceral leishmaniasis.

    Funded by: Center for Integrated Healthcare, U.S. Department of Veterans Affairs: BX001983; NIH HHS: R01 AI045540; Wellcome Trust: 098051

    Scientific reports 2020;10;1;4689

  • Structure and mechanism of monoclonal antibody binding to the junctional epitope of Plasmodium falciparum circumsporozoite protein.

    Oyen D, Torres JL, Aoto PC, Flores-Garcia Y, Binter S, Pholcharee T, Carroll S, Reponen S, Wash R, Liang Q, Lemiale F, Locke E, Bradley A, King CR, Emerling D, Kellam P, Zavala F, Ward AB and Wilson IA

    Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, United States of America.

    Lasting protection has long been a goal for malaria vaccines. The major surface antigen on Plasmodium falciparum sporozoites, the circumsporozoite protein (PfCSP), has been an attractive target for vaccine development and most protective antibodies studied to date interact with the central NANP repeat region of PfCSP. However, it remains unclear what structural and functional characteristics correlate with better protection by one antibody over another. Binding to the junctional region between the N-terminal domain and central NANP repeats has been proposed to result in superior protection: this region initiates with the only NPDP sequence followed immediately by NANP. Here, we isolated antibodies in Kymab mice immunized with full-length recombinant PfCSP and two protective antibodies were selected for further study with reactivity against the junctional region. X-ray and EM structures of two monoclonal antibodies, mAb667 and mAb668, shed light on their differential affinity and specificity for the junctional region. Importantly, these antibodies also bind to the NANP repeat region with equal or better affinity. A comparison with an NANP-only binding antibody (mAb317) revealed roughly similar but statistically distinct levels of protection against sporozoite challenge in mouse liver burden models, suggesting that junctional antibody protection might relate to the ability to also cross-react with the NANP repeat region. Our findings indicate that additional efforts are necessary to isolate a true junctional antibody with no or much reduced affinity to the NANP region to elucidate the role of the junctional epitope in protection.

    PLoS pathogens 2020;16;3;e1008373

  • Integrative pathway enrichment analysis of multivariate omics data.

    Paczkowska M, Barenboim J, Sintupisut N, Fox NS, Zhu H, Abd-Rabbo D, Mee MW, Boutros PC, PCAWG Drivers and Functional Interpretation Working Group, Reimand J and PCAWG Consortium

    Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada.

    Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we integrated genes with coding and non-coding mutations and revealed frequently mutated pathways and additional cancer genes with infrequent mutations. We also analyzed prognostic molecular pathways by integrating genomic and transcriptomic features of 1780 breast cancers and highlighted associations with immune response and anti-apoptotic signaling. Integration of ChIP-seq and RNA-seq data for master regulators of the Hippo pathway across normal human tissues identified processes of tissue regeneration and stem cell regulation. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.

    Nature communications 2020;11;1;735

  • Expression Atlas update: from tissues to single cells.

    Papatheodorou I, Moreno P, Manning J, Fuentes AM, George N, Fexova S, Fonseca NA, Füllgrabe A, Green M, Huang N, Huerta L, Iqbal H, Jianu M, Mohammed S, Zhao L, Jarnuczak AF, Jupp S, Marioni J, Meyer K, Petryszak R, Prada Medina CA, Talavera-López C, Teichmann S, Vizcaino JA and Brazma A

    European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK.

    Expression Atlas is EMBL-EBI's resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.

    Nucleic acids research 2020;48;D1;D77-D83

  • Prenatal development of human immunity.

    Park JE, Jardine L, Gottgens B, Teichmann SA and Haniffa M

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The blood and immune systems develop in parallel during early prenatal life. Waves of hematopoiesis separated in anatomical space and time give rise to circulating and tissue-resident immune cells. Previous observations have relied on animal models, which differ from humans in both their developmental timeline and exposure to microorganisms. Decoding the composition of the human immune system is now tractable using single-cell multi-omics approaches. Large-scale single-cell genomics, imaging technologies, and the Human Cell Atlas initiative have together enabled a systems-level mapping of the developing human immune system and its emergent properties. Although the precise roles of specific immune cells during development require further investigation, the system as a whole displays malleable and responsive properties according to developmental need and environmental challenge.

    Science (New York, N.Y.) 2020;368;6491;600-603

  • Vitamin D Receptor Controls Cell Stemness in Acute Myeloid Leukemia and in Normal Bone Marrow.

    Paubelle E, Zylbersztejn F, Maciel TT, Carvalho C, Mupo A, Cheok M, Lieben L, Sujobert P, Decroocq J, Yokoyama A, Asnafi V, Macintyre E, Tamburini J, Bardet V, Castaigne S, Preudhomme C, Dombret H, Carmeliet G, Bouscary D, Ginzburg YZ, de Thé H, Benhamou M, Monteiro RC, Vassiliou GS, Hermine O and Moura IC

    INSERM UMR 1163, Laboratory of Cellular and Molecular Mechanisms of Hematological Disorders and Therapeutical Implications, 75015 Paris, France; Paris Descartes - Sorbonne Paris Cité University, Imagine Institute, 75015 Paris, France; CNRS ERL 8254, Laboratory of Cellular and Molecular Mechanisms of Hematological Disorders and Therapeutical Implications, 75015 Paris, France; Department of Clinical Hematology, Assistance Publique-Hôpitaux de Paris, Hôpital Necker, 75015 Paris, France.

    Vitamin D (VD) is a known differentiating agent, but the role of VD receptor (VDR) is still incompletely described in acute myeloid leukemia (AML), whose treatment is based mostly on antimitotic chemotherapy. Here, we present an unexpected role of VDR in normal hematopoiesis and in leukemogenesis. Limited VDR expression is associated with impaired myeloid progenitor differentiation and is a new prognostic factor in AML. In mice, the lack of Vdr results in increased numbers of hematopoietic and leukemia stem cells and quiescent hematopoietic stem cells. In addition, malignant transformation of Vdr<sup>-/-</sup> cells results in myeloid differentiation block and increases self-renewal. Vdr promoter is methylated in AML as in CD34<sup>+</sup> cells, and demethylating agents induce VDR expression. Association of VDR agonists with hypomethylating agents promotes leukemia stem cell exhaustion and decreases tumor burden in AML mouse models. Thus, Vdr functions as a regulator of stem cell homeostasis and leukemic propagation.

    Cell reports 2020;30;3;739-754.e4

  • The Open Targets Post-GWAS analysis pipeline.

    Peat G, Jones W, Nuhn M, Marugán JC, Newell W, Dunham I and Zerbino D

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.

    Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore requires crossing the genetics results with functional data.

    Results: We present a novel data integration pipeline, that feeds into an interactive data resource, that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes.

    Availability: The analysis code is available at, and the interactive data browser at

    Bioinformatics (Oxford, England) 2020

  • Retrograde signals from endosymbiotic organelles: a common control principle in eukaryotic cells.

    Pfannschmidt T, Terry MJ, Van Aken O and Quiros PM

    Institute of Botany, Plant Physiology, Leibniz University Hannover, Herrenhäuser Straße 2, 30419 Hannover, Germany.

    Endosymbiotic organelles of eukaryotic cells, the plastids, including chloroplasts and mitochondria, are highly integrated into cellular signalling networks. In both heterotrophic and autotrophic organisms, plastids and/or mitochondria require extensive organelle-to-nucleus communication in order to establish a coordinated expression of their own genomes with the nuclear genome, which encodes the majority of the components of these organelles. This goal is achieved by the use of a variety of signals that inform the cell nucleus about the number and developmental status of the organelles and their reaction to changing external environments. Such signals have been identified in both photosynthetic and non-photosynthetic eukaryotes (known as retrograde signalling and retrograde response, respectively) and, therefore, appear to be universal mechanisms acting in eukaryotes of all kingdoms. In particular, chloroplasts and mitochondria both harbour crucial redox reactions that are the basis of eukaryotic life and are, therefore, especially susceptible to stress from the environment, which they signal to the rest of the cell. These signals are crucial for cell survival, lifespan and environmental adjustment, and regulate quality control and targeted degradation of dysfunctional organelles, metabolic adjustments, and developmental signalling, as well as induction of apoptosis. The functional similarities between retrograde signalling pathways in autotrophic and non-autotrophic organisms are striking, suggesting the existence of common principles in signalling mechanisms or similarities in their evolution. Here, we provide a survey for the newcomers to this field of research and discuss the importance of retrograde signalling in the context of eukaryotic evolution. Furthermore, we discuss commonalities and differences in retrograde signalling mechanisms and propose retrograde signalling as a general signalling mechanism in eukaryotic cells that will be also of interest for the specialist. This article is part of the theme issue 'Retrograde signalling from endosymbiotic organelles'.

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2020;375;1801;20190396

  • A single-progenitor model as the unifying paradigm of epidermal and esophageal epithelial maintenance in mice.

    Piedrafita G, Kostiou V, Wabik A, Colom B, Fernandez-Antoran D, Herms A, Murai K, Hall BA and Jones PH

    Wellcome Sanger Institute, Hinxton, CB10 1SA, UK.

    In adult skin epidermis and the epithelium lining the esophagus, cells are constantly shed from the tissue surface and replaced by cell division. Tracking genetically labelled cells in transgenic mice has given insight into cell behavior, but conflicting models appear consistent with the results. Here, we use an additional transgenic assay to follow cell division in mouse esophagus and the epidermis at multiple body sites. We find that proliferating cells divide at a similar rate, and place bounds on the distribution of cell cycle times. By including these results in a common analytic approach, we show that data from eight lineage tracing experiments is consistent with tissue maintenance by a single population of proliferating cells. The outcome of a given cell division is unpredictable but, on average, the likelihood of producing proliferating and differentiating cells is equal, ensuring cellular homeostasis. These findings are key to understanding squamous epithelial homeostasis and carcinogenesis.

    Funded by: Cancer Research UK (CRUK): C/609/A17257, C609/A27326; Royal Society: UF130039; Wellcome Trust (Wellcome): 098051, 296194

    Nature communications 2020;11;1;1429
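
    The balance described in the abstract above (each division is unpredictable, yet proliferating and differentiating cells are produced with equal likelihood on average) can be illustrated with a toy simulation. This is only a sketch, not the authors' analysis: the three division outcomes, the parameter `r` (probability of a symmetric division of either kind) and all function names are assumptions introduced here for illustration.

```python
import random


def division_outcome(rng, r):
    """Change in the number of proliferating cells after one division.

    Assumed outcomes (hypothetical parameterisation, not from the paper):
      +1 with probability r       -> two proliferating daughters
      -1 with probability r       -> two differentiating daughters
       0 with probability 1 - 2r  -> one daughter of each kind
    """
    u = rng.random()
    if u < r:
        return 1
    if u < 2 * r:
        return -1
    return 0


def expected_change(r):
    """Average change in proliferating cells per division; zero by symmetry."""
    return r * 1 + r * (-1) + (1 - 2 * r) * 0


def simulate_clone(n_divisions, r=0.1, seed=0):
    """Follow the proliferating cells of one clone over successive divisions.

    The clone starts from a single proliferating cell; it stops changing
    once no proliferating cells remain.
    """
    rng = random.Random(seed)
    proliferating = 1
    for _ in range(n_divisions):
        if proliferating == 0:
            break
        proliferating += division_outcome(rng, r)
    return proliferating


if __name__ == "__main__":
    sizes = [simulate_clone(50, r=0.25, seed=s) for s in range(10)]
    print("proliferating cells per clone:", sizes)
```

    Individual clones drift up or down, or are lost, but `expected_change` is zero for any `r`, which is the sense in which a population of such cells is homeostatic on average.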

  • Phylogenetic Analysis Indicates a Longer Term Presence of the Globally Distributed H58 Haplotype of Salmonella Typhi in Southern India.

    Pragasam AK, Pickard D, Wong V, Dougan G, Kang G, Thompson A, John J, Balaji V and Mutreja A

    Department of Clinical Microbiology, Christian Medical College, Vellore, India.

    Background: Typhoid fever caused by Salmonella Typhi is a major public health concern in low-/middle-income countries. A recent study of 1900 global S. Typhi indicated that South Asia might be the site of the original emergence of the most successful and hypervirulent clone belonging to the 4.3.1 genotype. However, this study had limited samples from India.

    Methods: We analyzed 194 clinical S. Typhi, temporal representatives from those isolated from blood and bone marrow cultures in southern India, over 26 years (1991-2016). Antimicrobial resistance (AMR) testing was performed for most common clinical agents. Whole-genome sequencing and SNP-level analysis was conducted. Comparative genomics of Vellore isolates was performed to infer transmission and AMR events.

    Results: We identified multidrug-resistance (MDR)-associated clade 4.3.1 as the dominant genotype. We detected 4.3.1 S. Typhi as early as 1991, the earliest to be reported from India, and the majority were fluoroquinolone resistant and not MDR. MDR was not detected at all in other genotypes circulating in Vellore. Comparison with global S. Typhi showed 2 Vellore subgroups (I and II) that were phylogenetically highly related to previously described South Asia (subgroup I, II) and Southeast Asia (subgroup II) clades.

    Conclusions: 4.3.1 S. Typhi has dominated in Vellore for 2 decades. Our study would assist public health agencies in better tracking of transmission and persistence of this successful clade in India and globally. It informs clinicians of the AMR pattern of the circulating clone, which would add confidence to their prophylactic/treatment decision making and facilitate efficient patient care.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2020

  • Nasal DNA methylation profiling of asthma and rhinitis.

    Qi C, Jiang Y, Yang IV, Forno E, Wang T, Vonk JM, Gehring U, Smit HA, Milanzi EB, Carpaij OA, Berg M, Hesse L, Brouwer S, Cardwell J, Vermeulen CJ, Acosta-Pérez E, Canino G, Boutaoui N, van den Berge M, Teichmann SA, Nawijn MC, Chen W, Celedón JC, Xu CJ and Koppelman GH

    Department of Pediatric Pulmonology and Pediatric Allergy, Beatrix Children's Hospital, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

    Background: Epigenetic signatures in the nasal epithelium, which is a primary interface with the environment and an accessible proxy for the bronchial epithelium, might provide insights into mechanisms of allergic disease.

    Objective: We aimed to identify and interpret methylation signatures in nasal epithelial brushes associated with rhinitis and asthma.

    Methods: Nasal epithelial brushes were obtained from 455 children at the 16-year follow-up of the Dutch Prevention and Incidence of Asthma and Mite Allergy birth cohort study. Epigenome-wide association studies were performed on children with asthma, rhinitis, and asthma and/or rhinitis (AsRh) by using logistic regression, and the top results were replicated in 2 independent cohorts of African American and Puerto Rican children. Significant CpG sites were related to environmental exposures (pets, active and passive smoking, and molds) during secondary school and were correlated with gene expression by RNA-sequencing (n = 244).

    Results: The epigenome-wide association studies identified CpG sites significantly associated with rhinitis (n = 81) and AsRh (n = 75), but not with asthma. We significantly replicated 62 of 81 CpG sites with rhinitis and 60 of 75 with AsRh, as well as 1 CpG site with asthma. Methylation of cg03565274 was negatively associated with AsRh and positively associated with exposure to pets during secondary school. DNA methylation signals associated with AsRh were mainly driven by specific IgE-positive subjects. DNA methylation related to gene transcripts that were enriched for immune pathways and expressed in immune and epithelial cells. Nasal CpG sites performed well in predicting AsRh.

    Conclusions: We identified replicable DNA methylation profiles of asthma and rhinitis in nasal brushes. Exposure to pets may affect nasal epithelial methylation in relation to asthma and rhinitis.

    The Journal of allergy and clinical immunology 2020

  • Population structure and antimicrobial resistance patterns of Salmonella Typhi isolates in urban Dhaka, Bangladesh from 2004 to 2016.

    Rahman SIA, Dyson ZA, Klemm EJ, Khanam F, Holt KE, Chowdhury EK, Dougan G and Qadri F

    Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka, Bangladesh.

    Background: Multi-drug resistant typhoid fever remains an enormous public health threat in low and middle-income countries. However, we still lack a detailed understanding of the epidemiology and genomics of S. Typhi in many regions. Here we have undertaken a detailed genomic analysis of typhoid in urban Dhaka, Bangladesh to unravel the population structure and antimicrobial resistance patterns in S. Typhi isolated between 2004-2016.

    Principal findings: Whole genome sequencing of 202 S. Typhi isolates obtained from three study locations in urban Dhaka revealed a diverse range of S. Typhi genotypes and AMR profiles. The bacterial population within Dhaka was relatively homogeneous, with little stratification between different healthcare facilities or age groups. We also observed evidence of exchange of Bangladeshi genotypes with neighboring South Asian countries (India, Pakistan and Nepal), suggesting these are circulating throughout the region. This analysis revealed a decline in H58 (genotype 4.3.1) isolates from 2011 onwards, coinciding with a rise in a diverse range of non-H58 genotypes and a simultaneous rise in isolates with reduced susceptibility to fluoroquinolones, potentially reflecting a change in treatment practices. We identified a novel S. Typhi genotype, subclade 3.3.2 (previously defined only to clade level, 3.3), which formed two localized clusters (3.3.2.Bd1 and 3.3.2.Bd2) associated with different mutations in the Quinolone Resistance Determining Region (QRDR) of gene gyrA.

    Significance: Our analysis of S. Typhi isolates from urban Dhaka, Bangladesh, collected over a twelve-year period, identified a diverse range of AMR profiles and genotypes. The observed increase in non-H58 genotypes associated with reduced fluoroquinolone susceptibility may reflect a change in treatment practice in this region and highlights the importance of continued molecular surveillance to monitor the ongoing evolution of AMR in Dhaka. We have defined new genotypes and lineages of Bangladeshi S. Typhi which will facilitate the identification of these emerging AMR clones in future surveillance efforts.

    PLoS neglected tropical diseases 2020;14;2;e0008036

  • Defining metrics for whole-genome sequence analysis of MRSA in clinical practice.

    Raven KE, Blane B, Kumar N, Leek D, Bragin E, Coll F, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Box 157 Addenbrooke's Hospital, Hills Road, Cambridge, CB2 0QQ, UK.

    Bacterial sequencing will become increasingly adopted in routine microbiology laboratories. Here, we report the findings of a technical evaluation of almost 800 clinical methicillin-resistant <i>Staphylococcus aureus</i> (MRSA) isolates, in which we sought to define key quality metrics to support MRSA sequencing in clinical practice. We evaluated the accuracy of mapping to a generic reference versus clonal complex (CC)-specific mapping, which is more computationally challenging. Focusing on isolates that were genetically related (<50 single nucleotide polymorphisms (SNPs)) and belonged to prevalent sequence types, concordance between these methods was 99.5 %. We used MRSA MPROS0386 to control for base calling accuracy by the sequencer, and used multiple repeat sequences of the control to define a permitted range of SNPs different to the mapping reference for this control (equating to 3 standard deviations from the mean). Repeat sequences of the control were also used to demonstrate that SNP calling was most accurate across differing coverage depths (above 35×, the lowest depth in our study) when the depth required to call a SNP as present was at least 4-8×. Using 786 MRSA sequences, we defined a robust measure for <i>mec</i> gene detection to reduce false-positives arising from contamination, which was no greater than 2 standard deviations below the average depth of coverage across the genome. Sequencing from bacteria harvested from clinical plates runs an increased risk of contamination with the same or different species, and we defined a cut-off of 30 heterozygous sites >50 bp apart to identify same-species contamination for MRSA. These metrics were combined into a quality-control (QC) flowchart to determine whether sequence runs and individual clinical isolates passed QC, which could be adapted by future automated analysis systems to enable rapid hands-off sequence analysis by clinical laboratories.

    Microbial genomics 2020
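
    The thresholds quoted in the abstract above (a permitted control SNP range of 3 standard deviations around the mean, <i>mec</i> depth no more than 2 standard deviations below the genome-wide average, and a cut-off of 30 heterozygous sites more than 50 bp apart) could be combined into automated checks along the following lines. This is a minimal sketch, not the authors' flowchart: the function names, the input shapes, the greedy way of counting spaced sites, and the choice to flag contamination only when the count exceeds 30 are all assumptions made here for illustration.

```python
from statistics import mean, stdev

HET_SITE_CUTOFF = 30   # heterozygous sites, from the abstract
MIN_SITE_GAP_BP = 50   # sites must be more than this many bp apart


def control_snp_range(control_snp_counts, n_sd=3.0):
    """Permitted SNP range for the control: mean +/- n_sd standard deviations.

    `control_snp_counts` holds SNP counts from repeat sequences of the control.
    """
    m = mean(control_snp_counts)
    s = stdev(control_snp_counts)
    return m - n_sd * s, m + n_sd * s


def run_passes_control(snp_count, control_snp_counts):
    """A run passes if its control SNP count lies within the permitted range."""
    low, high = control_snp_range(control_snp_counts)
    return low <= snp_count <= high


def mec_detected(mec_depth, genome_mean_depth, genome_sd_depth, n_sd=2.0):
    """Accept mec only if its depth is no more than n_sd SDs below the genome mean."""
    return mec_depth >= genome_mean_depth - n_sd * genome_sd_depth


def count_spaced_sites(positions, min_gap=MIN_SITE_GAP_BP):
    """Count sites lying more than `min_gap` bp from the previously counted site.

    Assumption: a simple greedy pass over sorted positions.
    """
    count = 0
    last = None
    for pos in sorted(positions):
        if last is None or pos - last > min_gap:
            count += 1
            last = pos
    return count


def same_species_contamination(het_positions, cutoff=HET_SITE_CUTOFF):
    """Flag contamination when spaced heterozygous sites exceed the cut-off.

    Assumption: 'exceeds' (rather than 'reaches') the cut-off triggers the flag.
    """
    return count_spaced_sites(het_positions) > cutoff
```

    For example, `mec_detected(40, genome_mean_depth=100, genome_sd_depth=20)` returns `False`, because a depth of 40 is more than 2 standard deviations below the genome-wide mean, which in this sketch would be treated as possible contamination rather than a genuine <i>mec</i> call.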

  • Niacin-mediated rejuvenation of macrophage/microglia enhances remyelination of the aging central nervous system.

    Rawji KS, Young AMH, Ghosh T, Michaels NJ, Mirzaei R, Kappen J, Kolehmainen KL, Alaeiilkhchi N, Lozinski B, Mishra MK, Pu A, Tang W, Zein S, Kaushik DK, Keough MB, Plemel JR, Calvert F, Knights AJ, Gaffney DJ, Tetzlaff W, Franklin RJM and Yong VW

    Department of Clinical Neurosciences, Hotchkiss Brain Institute, University of Calgary, 3330 Hospital Drive, Calgary, AB, T2N 4N1, Canada.

    Remyelination following CNS demyelination restores rapid signal propagation and protects axons; however, its efficiency declines with increasing age. Both intrinsic changes in the oligodendrocyte progenitor cell population and extrinsic factors in the lesion microenvironment of older subjects contribute to this decline. Microglia and monocyte-derived macrophages are critical for successful remyelination, releasing growth factors and clearing inhibitory myelin debris. Several studies have implicated delayed recruitment of macrophages/microglia into lesions as a key contributor to the decline in remyelination observed in older subjects. Here we show that the decreased expression of the scavenger receptor CD36 of aging mouse microglia and human microglia in culture underlies their reduced phagocytic activity. Overexpression of CD36 in cultured microglia rescues the deficit in phagocytosis of myelin debris. By screening for clinically approved agents that stimulate macrophages/microglia, we have found that niacin (vitamin B3) upregulates CD36 expression and enhances myelin phagocytosis by microglia in culture. This increase in myelin phagocytosis is mediated through the niacin receptor (hydroxycarboxylic acid receptor 2). Genetic fate mapping and multiphoton live imaging show that systemic treatment of 9-12-month-old demyelinated mice with therapeutically relevant doses of niacin promotes myelin debris clearance in lesions by both peripherally derived macrophages and microglia. This is accompanied by enhancement of oligodendrocyte progenitor cell numbers and by improved remyelination in the treated mice. Niacin represents a safe and translationally amenable regenerative therapy for chronic demyelinating diseases such as multiple sclerosis.

    Funded by: CIHR: 690720

    Acta neuropathologica 2020

  • An intestinal zinc sensor regulates food intake and developmental growth.

    Redhai S, Pilgrim C, Gaspar P, Giesen LV, Lopes T, Riabinina O, Grenier T, Milona A, Chanana B, Swadling JB, Wang YF, Dahalan F, Yuan M, Wilsch-Brauninger M, Lin WH, Dennison N, Capriotti P, Lawniczak MKN, Baines RA, Warnecke T, Windbichler N, Leulier F, Bellono NW and Miguel-Aliaga I

    MRC London Institute of Medical Sciences, London, UK.

    In cells, organs and whole organisms, nutrient sensing is key to maintaining homeostasis and adapting to a fluctuating environment<sup>1</sup>. In many animals, nutrient sensors are found within the enteroendocrine cells of the digestive system; however, less is known about nutrient sensing in their cellular siblings, the absorptive enterocytes<sup>1</sup>. Here we use a genetic screen in Drosophila melanogaster to identify Hodor, an ionotropic receptor in enterocytes that sustains larval development, particularly in nutrient-scarce conditions. Experiments in Xenopus oocytes and flies indicate that Hodor is a pH-sensitive, zinc-gated chloride channel that mediates a previously unrecognized dietary preference for zinc. Hodor controls systemic growth from a subset of enterocytes (interstitial cells) by promoting food intake and insulin/IGF signalling. Although Hodor sustains gut luminal acidity and restrains microbial loads, its effect on systemic growth results from the modulation of Tor signalling and lysosomal homeostasis within interstitial cells. Hodor-like genes are insect-specific, and may represent targets for the control of disease vectors. Indeed, CRISPR-Cas9 genome editing revealed that the single hodor orthologue in Anopheles gambiae is an essential gene. Our findings highlight the need to consider the instructive contributions of metals (and, more generally, micronutrients) to energy homeostasis.

    Funded by: NIDDK NIH HHS: R00 DK115879

    Nature 2020;580;7802;263-268

  • Pathway and network analysis of more than 2500 whole cancer genomes.

    Reyna MA, Haan D, Paczkowska M, Verbeke LPC, Vazquez M, Kahraman A, Pulido-Tamayo S, Barenboim J, Wadi L, Dhingra P, Shrestha R, Getz G, Lawrence MS, Pedersen JS, Rubin MA, Wheeler DA, Brunak S, Izarzugaza JMG, Khurana E, Marchal K, von Mering C, Sahinalp SC, Valencia A, PCAWG Drivers and Functional Interpretation Working Group, Reimand J, Stuart JM, Raphael BJ and PCAWG Consortium

    Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA.

    The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project that was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, are altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit similar gene expression signatures as samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.

    Funded by: NCI NIH HHS: U24 CA211000; NHGRI NIH HHS: R01 HG007069

    Nature communications 2020;11;1;729

  • Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.

    Rheinbay E, Nielsen MM, Abascal F, Wala JA, Shapira O, Tiao G, Hornshøj H, Hess JM, Juul RI, Lin Z, Feuerbach L, Sabarinathan R, Madsen T, Kim J, Mularoni L, Shuai S, Lanzós A, Herrmann C, Maruvka YE, Shen C, Amin SB, Bandopadhayay P, Bertl J, Boroevich KA, Busanovich J, Carlevaro-Fita J, Chakravarty D, Chan CWY, Craft D, Dhingra P, Diamanti K, Fonseca NA, Gonzalez-Perez A, Guo Q, Hamilton MP, Haradhvala NJ, Hong C, Isaev K, Johnson TA, Juul M, Kahles A, Kahraman A, Kim Y, Komorowski J, Kumar K, Kumar S, Lee D, Lehmann KV, Li Y, Liu EM, Lochovsky L, Park K, Pich O, Roberts ND, Saksena G, Schumacher SE, Sidiropoulos N, Sieverling L, Sinnott-Armstrong N, Stewart C, Tamborero D, Tubio JMC, Umer HM, Uusküla-Reimand L, Wadelius C, Wadi L, Yao X, Zhang CZ, Zhang J, Haber JE, Hobolth A, Imielinski M, Kellis M, Lawrence MS, von Mering C, Nakagawa H, Raphael BJ, Rubin MA, Sander C, Stein LD, Stuart JM, Tsunoda T, Wheeler DA, Johnson R, Reimand J, Gerstein M, Khurana E, Campbell PJ, López-Bigas N, PCAWG Drivers and Functional Interpretation Working Group, PCAWG Structural Variation Working Group, Weischenfeldt J, Beroukhim R, Martincorena I, Pedersen JS, Getz G and PCAWG Consortium

    The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

    The discovery of drivers of cancer has traditionally focused on protein-coding genes<sup>1-4</sup>. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium<sup>5</sup> of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers<sup>6,7</sup>, raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.

    Funded by: NCI NIH HHS: R01 CA188228, R01 CA215489, U24 CA143845, U24 CA210999, U24 CA211000, U54 CA143798; NHGRI NIH HHS: R01 HG007069; NIGMS NIH HHS: R35 GM127029

    Nature 2020;578;7793;102-111

  • Multi-proxy analyses of a mid-15th century Middle Iron Age Bantu-speaker palaeo-faecal specimen elucidates the configuration of the 'ancestral' sub-Saharan African intestinal microbiome.

    Rifkin RF, Vikram S, Ramond JB, Rey-Iglesia A, Brand TB, Porraz G, Val A, Hall G, Woodborne S, Le Bailly M, Potgieter M, Underdown SJ, Koopman JE, Cowan DA, Van de Peer Y, Willerslev E and Hansen AJ

    Center for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Hatfield, South Africa.

    Background: The archaeological incidence of ancient human faecal material provides a rare opportunity to explore the taxonomic composition and metabolic capacity of the ancestral human intestinal microbiome (IM). Here, we report the results of the shotgun metagenomic analyses of an ancient South African palaeo-faecal specimen.

    Methods: Following the recovery of a single desiccated palaeo-faecal specimen from Bushman Rock Shelter in Limpopo Province, South Africa, we applied a multi-proxy analytical protocol to the sample. The extraction of ancient DNA from the specimen and its subsequent shotgun metagenomic sequencing facilitated the taxonomic and metabolic characterisation of this ancient human IM.

    Results: Our results indicate that the distal IM of the Neolithic 'Middle Iron Age' (c. AD 1460) Bantu-speaking individual exhibits features indicative of a largely mixed forager-agro-pastoralist diet. Subsequent comparison with the IMs of the Tyrolean Iceman (Ötzi) and contemporary Hadza hunter-gatherers, Malawian agro-pastoralists and Italians reveals that this IM precedes recent adaptation to 'Western' diets, including the consumption of coffee, tea, chocolate, citrus and soy, and the use of antibiotics, analgesics and also exposure to various toxic environmental pollutants.

    Conclusions: Our analyses reveal some of the causes and means by which current human IMs are likely to have responded to recent dietary changes, prescription medications and environmental pollutants, providing rare insight into human IM evolution following the advent of the Neolithic c. 12,000 years ago.

    Funded by: National Geographic Society: NGS-371R-18; National Research Foundation: UID Nr. 105197

    Microbiome 2020;8;1;62

  • Regional differences in human biliary tissues and corresponding in vitro derived organoids.

    Rimland CA, Tilson SG, Morell CM, Tomaz RA, Lu WY, Adams SE, Georgakopoulos N, Otaizo-Carrasquero F, Myers TG, Ferdinand JR, Gieseck RL, Sampaziotis F, Tysoe OC, Wesley B, Muraro D, Oniscu GC, Hannan NR, Forbes SJ, Saeb-Parsy K, Wynn TA and Vallier L

    Wellcome-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK.

    Background: Organoids provide a powerful system to study epithelia in vitro. Recently, this approach was applied successfully to the biliary tree, a series of ductular tissues responsible for the drainage of bile and pancreatic secretions. More precisely, organoids have been derived from ductal tissue located outside the liver (extrahepatic bile ducts, or EHBDs) or inside it (intrahepatic bile ducts, or IHBDs). These organoids share many characteristics including the expression of cholangiocyte markers such as KRT19. However, the relationship between these organoids and their tissues of origin, and to each other, is largely unknown.

    Methods/results: Organoids were derived from human gallbladder, common bile duct, pancreatic duct and intrahepatic bile ducts using culture conditions promoting WNT signaling. The resulting IHBD and EHBD organoids expressed stem/progenitor markers LGR5/PROM1 and ductal markers KRT19/KRT7. However, RNA-sequencing revealed that organoids conserve only a limited number of region-specific markers corresponding to their location of origin. Of particular interest, downregulation of biliary markers and upregulation of cell cycle genes were observed in organoids. IHBD and EHBD organoids diverged in their response to WNT signaling and only IHBD organoids were able to express a low level of hepatocyte markers under differentiation conditions.

    Conclusions: Taken together, our results demonstrate that differences exist not only between extrahepatic biliary organoids and their tissue of origin but also between IHBD and EHBD organoids. This information may help to understand the tissue specificity of cholangiopathies and also to identify new targets for therapeutic development.

    Hepatology (Baltimore, Md.) 2020

  • Gene Silencing in the Liver Fluke Fasciola hepatica: RNA Interference.

    Rinaldi G, Dell'Oca N, Castillo E and Tort JF

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.

    Chronic infection with liver flukes of the genus Fasciola is the most prevalent foodborne trematodiasis, affecting at least one-fourth of the world's livestock grazing in areas where the parasite is present. Moreover, fascioliasis is considered a major zoonosis, mainly in rural areas of central South America, Northern Africa, and Central Asia. Increasing evidence of resistance against triclabendazole may compromise its use as the drug of choice; thus, novel control strategies are desperately needed. Functional genomic approaches play a key role in the validation and characterization of new targets for drug and vaccine development. So far, RNA interference has been the only gene silencing approach successfully employed in liver flukes of the genus Fasciola. Herein, we describe a detailed step-by-step protocol to perform gene silencing mediated by RNAi in Fasciola hepatica.

    Methods in molecular biology (Clifton, N.J.) 2020;2137;67-92

  • The influence of rare variants in circulating metabolic biomarkers.

    Riveros-Mckay F, Oliver-Williams C, Karthikeyan S, Walter K, Kundu K, Ouwehand WH, Roberts D, Di Angelantonio E, Soranzo N, Danesh J, INTERVAL Study, Wheeler E, Zeggini E, Butterworth AS and Barroso I

    Wellcome Sanger Institute, Cambridge, United Kingdom.

    Circulating metabolite levels are biomarkers for cardiovascular disease (CVD). Here we studied the association of rare variants and 226 serum lipoproteins, lipids and amino acids in 7,142 (discovery plus follow-up) healthy participants. We leveraged the information from multiple metabolite measurements on the same participants to improve discovery in rare variant association analyses for gene-based and gene-set tests by incorporating correlated metabolites as covariates in the validation stage. Gene-based analysis, corrected for the effective number of tests performed, confirmed established associations at APOB, APOC3, PAH, HAL and PCSK (p < 1.32 × 10<sup>-7</sup>) and identified novel gene-trait associations at a lower stringency threshold with ACSL1, MYCN, FBXO36 and B4GALNT3 (p < 2.5 × 10<sup>-6</sup>). Regulation of the pyruvate dehydrogenase (PDH) complex was associated for the first time, in gene-set analyses also corrected for effective number of tests, with IDL and LDL parameters, as well as circulating cholesterol (p<sub>METASKAT</sub> < 2.41 × 10<sup>-6</sup>). In conclusion, using an approach that leverages metabolite measurements obtained in the same participants, we identified novel loci and pathways involved in the regulation of these important metabolic biomarkers. As large-scale biobanks continue to amass sequencing and phenotypic information, analytical approaches such as ours will be useful to fully exploit the copious amounts of biological data generated in these efforts.

    PLoS genetics 2020;16;3;e1008605

  • Screening of healthcare workers for SARS-CoV-2 highlights the role of asymptomatic carriage in COVID-19 transmission.

    Rivett L, Sridhar S, Sparkes D, Routledge M, Jones NK, Forrest S, Young J, Pereira-Dias J, Hamilton WL, Ferris M, Torok ME, Meredith L, CITIID-NIHR COVID-19 BioResource Collaboration, Gupta R, Lyons PA, Toshner M, Warne B, Bartholdson Scott J, Cormie C, Gill H, Kean I, Maes M, Reynolds N, Wantoch M, Caddy S, Caller L, Feltwell T, Hall G, Hosmillo M, Houldcroft C, Jahun A, Khokhar F, Yakovleva A, Butcher H, Caputo D, Clapham-Riley D, Dolling H, Furlong A, Graves B, Gresley EL, Kingston N, Papadia S, Stark H, Stirrups KE, Webster J, Calder J, Harris J, Hewitt S, Kennet J, Meadows A, Rastall R, Brien CO, Price J, Publico C, Rowlands J, Ruffolo V, Tordesillas H, Brookes K, Canna L, Cruz I, Dempsey K, Elmer A, Escoffery N, Jones H, Ribeiro C, Saunders C, Wright A, Nyagumbo R, Roberts A, Bucke A, Hargreaves S, Johnson D, Narcorda A, Read D, Sparke C, Warboys L, Lagadu K, Mactavous L, Gould T, Raine T, Mather C, Ramenatte N, Vallier AL, Kasanicki M, Eames PJ, McNicholas C, Thake L, Bartholomew N, Brown N, Parmar S, Zhang H, Bowring A, Martell G, Quinnell N, Wright J, Murphy H, Dunmore BJ, Legchenko E, Gräf S, Huang C, Hodgson J, Hunter K, Martin J, Mescia F, O'Donnell C, Pointon L, Shih J, Sutcliffe R, Tilly T, Tong Z, Treacy C, Wood J, Bergamaschi L, Betancourt A, Bowyer G, De Sa A, Epping M, Hinch A, Huhn O, Jarvis I, Lewis D, Marsden J, McCallum S, Nice F, Curran MD, Fuller S, Chaudhry A, Shaw A, Samworth RJ, Bradley JR, Dougan G, Smith KG, Lehner PJ, Matheson NJ, Wright G, Goodfellow IG, Baker S and Weekes MP

    Department of Infectious Diseases, Cambridge University NHS Hospitals Foundation Trust, Cambridge, United Kingdom.

    Significant differences exist in the availability of healthcare worker (HCW) SARS-CoV-2 testing between countries, and existing programmes focus on screening symptomatic rather than asymptomatic staff. Over a 3-week period (April 2020), 1032 asymptomatic HCWs were screened for SARS-CoV-2 in a large UK teaching hospital. Symptomatic staff and symptomatic household contacts were additionally tested. Real-time RT-PCR was used to detect viral RNA from a throat+nose self-swab. 3% of HCWs in the <i>asymptomatic screening group</i> tested positive for SARS-CoV-2. 17/30 (57%) were truly asymptomatic/pauci-symptomatic. 12/30 (40%) had experienced symptoms compatible with coronavirus disease 2019 (COVID-19) >7 days prior to testing, most self-isolating, returning well. Clusters of HCW infection were discovered on two independent wards. Viral genome sequencing showed that the majority of HCWs had the dominant lineage B.1. Our data demonstrate the utility of comprehensive screening of HCWs with minimal or no symptoms. This approach will be critical for protecting patients and hospital staff.

    Funded by: Academy of Medical Sciences: Clinician Scientist Fellowship; Cancer Research UK: PRECISION Grand Challenge C38317/A24043; Engineering and Physical Sciences Research Council: EP/N031938/1, EP/P031447/1; Medical Research Council: MR/P008801/1; NHS Blood and Transplant: WPA15-02; National Institute for Health Research: Cambridge Biomedical Research Centre; Wellcome: 108070/Z/15/Z, 200871/Z/16/Z, 206298/B/17/Z, 207498/Z/17/Z, 210688/Z/18/Z, 215515/Z/19/Z; Wellcome Trust

    eLife 2020;9

  • Automation of Multiplexed RNAscope Single-Molecule Fluorescent In Situ Hybridization and Immunohistochemistry for Spatial Tissue Mapping.

    Roberts K and Bayraktar OA

    Wellcome Sanger Institute, Hinxton, UK.

    In situ transcriptomic methods hold immense promise for spatially resolved mapping of cell types across human tissues. Here, we describe a protocol for automated single-molecule fluorescent in situ hybridization (smFISH) on standard histology sections at high throughput. We focus on the RNAscope smFISH assay that combines branched DNA amplification with tyramide signal amplification (TSA) to obtain high signal-to-noise ratio for tissue imaging. We describe the use of the robotic Leica BOND RX system for automation of liquid handling and the combination of the RNAscope assay with TSA-based immunohistochemistry without the need for specialized demultiplexed imaging.

    Methods in molecular biology (Clifton, N.J.) 2020;2148;229-244

  • Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition.

    Rodriguez-Martin B, Alvarez EG, Baez-Ortega A, Zamora J, Supek F, Demeulemeester J, Santamarina M, Ju YS, Temes J, Garcia-Souto D, Detering H, Li Y, Rodriguez-Castro J, Dueso-Barroso A, Bruzos AL, Dentro SC, Blanco MG, Contino G, Ardeljan D, Tojo M, Roberts ND, Zumalave S, Edwards PAW, Weischenfeldt J, Puiggròs M, Chong Z, Chen K, Lee EA, Wala JA, Raine K, Butler A, Waszak SM, Navarro FCP, Schumacher SE, Monlong J, Maura F, Bolli N, Bourque G, Gerstein M, Park PJ, Wedge DC, Beroukhim R, Torrents D, Korbel JO, Martincorena I, Fitzgerald RC, Van Loo P, Kazazian HH, Burns KH, PCAWG Structural Variation Working Group, Campbell PJ, Tubio JMC and PCAWG Consortium

    Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain.

    About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors.

    Funded by: NCI NIH HHS: R01 CA172652

    Nature genetics 2020;52;3;306-319

  • Mutational signatures in tumours induced by high and low energy radiation in Trp53 deficient mice.

    Rose Li Y, Halliwill KD, Adams CJ, Iyer V, Riva L, Mamunur R, Jen KY, Del Rosario R, Fredlund E, Hirst G, Alexandrov LB, Adams D and Balmain A

    UCSF Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, 94158, USA.

    Ionising radiation (IR) is a recognised carcinogen responsible for cancer development in patients previously treated using radiotherapy, and in individuals exposed as a result of accidents at nuclear energy plants. However, the mutational signatures induced by distinct types and doses of radiation are unknown. Here, we analyse the genetic architecture of mammary tumours, lymphomas and sarcomas induced by high (<sup>56</sup>Fe-ions) or low (gamma) energy radiation in mice carrying Trp53 loss of function alleles. In mammary tumours, high-energy radiation is associated with induction of focal structural variants, leading to genomic instability and Met amplification. Gamma-radiation is linked to large-scale structural variants and a point mutation signature associated with oxidative stress. The genomic architectures of carcinomas, sarcomas and lymphomas arising in the same animals are significantly different. Our study illustrates the complex interactions between radiation quality, germline Trp53 deficiency and tissue/cell of origin in shaping the genomic landscape of IR-induced tumours.

    Funded by: Cancer Research UK: 12401; NCI NIH HHS: F31 CA180715, F32 CA232635, R01 CA184510, R35 CA210018, U01 CA084244, U01 CA176287; NIGMS NIH HHS: T32 GM007175

    Nature communications 2020;11;1;394

  • Timing the initiation of multiple myeloma.

    Rustad EH, Yellapantula V, Leongamornlert D, Bolli N, Ledergor G, Nadeu F, Angelopoulos N, Dawson KJ, Mitchell TJ, Osborne RJ, Ziccheddu B, Carniti C, Montefusco V, Corradini P, Anderson KC, Moreau P, Papaemmanuil E, Alexandrov LB, Puente XS, Campo E, Siebert R, Avet-Loiseau H, Landgren O, Munshi N, Campbell PJ and Maura F

    Myeloma Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.

    The evolution and progression of multiple myeloma and its precursors over time are poorly understood. Here, we investigate the landscape and timing of mutational processes shaping multiple myeloma evolution in a large cohort of 89 whole genomes and 973 exomes. We identify eight processes, including a mutational signature caused by exposure to melphalan. Reconstructing the chronological activity of each mutational signature, we estimate that the initial transformation of a germinal center B-cell usually occurred during the 2<sup>nd</sup>-3<sup>rd</sup> decades of life. We define four main patterns of activation-induced deaminase (AID) and apolipoprotein B mRNA editing catalytic polypeptide-like (APOBEC) mutagenesis over time, including a subset of patients with evidence of prolonged AID activity during the pre-malignant phase, indicating antigen-responsiveness and germinal center reentry. Our findings provide a framework to study the etiology of multiple myeloma and explore strategies for prevention and early detection.

    Funded by: NCI NIH HHS: P30 CA008748

    Nature communications 2020;11;1;1917

  • A community effort to create standards for evaluating tumor subclonal reconstruction.

    Salcedo A, Tarabichi M, Espiritu SMG, Deshwar AG, David M, Wilson NM, Dentro S, Wintersinger JA, Liu LY, Ko M, Sivanandan S, Zhang H, Zhu K, Ou Yang TH, Chilton JM, Buchanan A, Lalansingh CM, P'ng C, Anghel CV, Umar I, Lo B, Zou W, DREAM SMC-Het Participants, Simpson JT, Stuart JM, Anastassiou D, Guan Y, Ewing AD, Ellrott K, Wedge DC, Morris Q, Van Loo P and Boutros PC

    Ontario Institute for Cancer Research, Toronto, Canada.

    Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking. To address this need, we systematically assess methods for reconstructing tumor subclonality. First, we elucidate the main algorithmic problems in subclonal reconstruction and develop quantitative metrics for evaluating them. Then we simulate realistic tumor genomes that harbor all known clonal and subclonal mutation types and processes. Finally, we benchmark 580 tumor reconstructions, varying tumor read depth, tumor type and somatic variant detection. Our analysis provides a baseline for the establishment of gold-standard methods to analyze tumor heterogeneity.

    Funded by: EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 Marie Sklodowska-Curie Actions (H2020 Excellent Science - Marie Sklodowska-Curie Actions): 747852-SIOMICS; Francis Crick Institute (Francis Crick Institute Limited): FC001202; Medical Research Council: FC001202, MR/L016311/1; Movember Foundation (Movember): RS2014-01; NCI NIH HHS: P30 CA016042, R01 CA180778, U24 CA143858, U24 CA210990; NHGRI NIH HHS: U41 HG006620; NIAID NIH HHS: R01 AI134384; Wellcome Trust

    Nature biotechnology 2020;38;1;97-107

  • A framework for an evidence-based gene list relevant to autism spectrum disorder.

    Schaaf CP, Betancur C, Yuen RKC, Parr JR, Skuse DH, Gallagher L, Bernier RA, Buchanan JA, Buxbaum JD, Chen CA, Dies KA, Elsabbagh M, Firth HV, Frazier T, Hoang N, Howe J, Marshall CR, Michaud JL, Rennie O, Szatmari P, Chung WK, Bolton PF, Cook EH, Scherer SW and Vorstman JAS

    Institute of Human Genetics, Heidelberg University, Heidelberg, Germany.

    Autism spectrum disorder (ASD) is often grouped with other brain-related phenotypes into a broader category of neurodevelopmental disorders (NDDs). In clinical practice, providers need to decide which genes to test in individuals with ASD phenotypes, which requires an understanding of the level of evidence for individual NDD genes that supports an association with ASD. Consensus is currently lacking about which NDD genes have sufficient evidence to support a relationship to ASD. Estimates of the number of genes relevant to ASD differ greatly among research groups and clinical sequencing panels, varying from a few to several hundred. This Roadmap discusses important considerations necessary to provide an evidence-based framework for the curation of NDD genes based on the level of information supporting a clinically relevant relationship between a given gene and ASD.

    Nature reviews. Genetics 2020

  • Lentiviral gene therapy rescues p47phox chronic granulomatous disease and the ability to fight Salmonella infection in mice.

    Schejtman A, Aragão-Filho WC, Clare S, Zinicola M, Weisser M, Burns SO, Booth C, Gaspar HB, Thomas DC, Condino-Neto A, Thrasher AJ and Santilli G

    Molecular and Cellular Immunology Unit, UCL Great Ormond Street Institute of Child Health, University College London, London, UK.

    Chronic granulomatous disease (CGD) is an inherited primary immunodeficiency disorder characterised by recurrent and often life-threatening infections and hyperinflammation. It is caused by defects of the phagocytic NADPH oxidase, a multicomponent enzyme system responsible for effective pathogen killing. A phase I/II clinical trial of lentiviral gene therapy is underway for the most common form of CGD, X-linked, caused by mutations in the gp91<sup>phox</sup> subunit of the NADPH oxidase. We propose to use a similar strategy to tackle p47<sup>phox</sup>-deficient CGD, caused by mutations in NCF1, which encodes the p47<sup>phox</sup> cytosolic component of the enzymatic complex. We generated a pCCLCHIM-p47<sup>phox</sup> lentiviral vector, containing the chimeric Cathepsin G/FES myeloid promoter and a codon-optimised version of the human NCF1 cDNA. Here we show that transduction with the pCCLCHIM-p47<sup>phox</sup> vector efficiently restores p47<sup>phox</sup> expression and biochemical NADPH oxidase function in p47<sup>phox</sup>-deficient human and murine cells. We also tested the ability of our gene therapy approach to control infection by challenging p47<sup>phox</sup>-null mice with Salmonella Typhimurium, a leading cause of sepsis in CGD patients, and found that mice reconstituted with lentivirus-transduced hematopoietic stem cells had a reduced bacterial load compared with untreated mice. Overall, our results potentially support the clinical development of a gene therapy approach using the pCCLCHIM-p47<sup>phox</sup> vector.

    Gene therapy 2020

  • Mosaicism in ASXL3-related syndrome: Description of five patients from three families.

    Schirwani S, Hauser N, Platt A, Punj S, Prescott K, Canham N, DDD Study, Mansour S and Balasubramanian M

    Academic Unit of Child Health, Department of Oncology & Metabolism, University of Sheffield, UK; Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, UK.

    De novo pathogenic variants in the additional sex combs-like 3 (ASXL3) gene cause a rare multi-systemic neurodevelopmental disorder. There is growing evidence that germline and somatic mosaicism are more common and play a greater role in genetic disorders than previously acknowledged. There is one previous report of ASXL3-related syndrome caused by de novo pathogenic variants in two siblings suggesting gonadal mosaicism. In this report, we present five patients with ASXL3-related syndrome, describing two families comprising two non-twin siblings harbouring apparent de novo pathogenic variants in ASXL3. Parents were clinically unaffected and there was no evidence of mosaicism from genomic DNA on exome-trio data, suggesting germline mosaicism in one of the parents. We also describe clinical details of a patient with typical features of ASXL3-related syndrome and a mosaic de novo pathogenic variant in ASXL3 in 30-35% of both blood and saliva samples on trio-exome sequencing. We expand the known genetic basis of ASXL3-related syndromes and discuss mosaicism as a disease mechanism in five patients from three unrelated families. The findings of this report highlight the importance of taking gonadal mosaicism into consideration when counselling families regarding recurrence risk. We also discuss postzygotic mosaicism as a cause of fully penetrant ASXL3-related syndrome.

    European journal of medical genetics 2020;103925

  • Somatic mosaicism and common genetic variation contribute to the risk of very-early-onset inflammatory bowel disease.

    Serra EG, Schwerd T, Moutsianas L, Cavounidis A, Fachal L, Pandey S, Kammermeier J, Croft NM, Posovszky C, Rodrigues A, Russell RK, Barakat F, Auth MKH, Heuschkel R, Zilbauer M, Fyderek K, Braegger C, Travis SP, Satsangi J, Parkes M, Thapar N, Ferry H, Matte JC, Gilmour KC, Wedrychowicz A, Sullivan P, Moore C, Sambrook J, Ouwehand W, Roberts D, Danesh J, Baeumler TA, Fulga TA, Karaminejadranjbar M, Ahmed A, Wilson R, Barrett JC, Elkadri A, Griffiths AM, COLORS in IBD group investigators, Oxford IBD cohort study investigators, INTERVAL Study, Swiss IBD cohort investigators, UK IBD Genetics Consortium, NIDDK IBD Genetics Consortium, Snapper SB, Shah N, Muise AM, Wilson DC, Uhlig HH and Anderson CA

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Very-early-onset inflammatory bowel disease (VEO-IBD) is a heterogeneous phenotype associated with a spectrum of rare Mendelian disorders. Here, we perform whole-exome-sequencing and genome-wide genotyping in 145 patients (median age-at-diagnosis of 3.5 years), in whom no Mendelian disorders were clinically suspected. In five patients we detect a primary immunodeficiency or enteropathy, with clinical consequences (XIAP, CYBA, SH2D1A, PCSK1). We also present a case study of a VEO-IBD patient with a mosaic de novo, pathogenic allele in CYBB. The mutation is present in ~70% of phagocytes and sufficient to result in defective bacterial handling but not life-threatening infections. Finally, we show that VEO-IBD patients have, on average, higher IBD polygenic risk scores than population controls (99 patients and 18,780 controls; P < 4 × 10<sup>-10</sup>), and replicate this finding in an independent cohort of VEO-IBD cases and controls (117 patients and 2,603 controls; P < 5 × 10<sup>-10</sup>). This discovery indicates that a polygenic component operates in VEO-IBD pathogenesis.

    Funded by: Department of Health: NIHR-RP-R3-12-026; Medical Research Council: MC_UU_00008/7, MC_UU_12010/7, MR/S036377/1; Wellcome Trust

    Nature communications 2020;11;1;995

  • Cell Surface Receptor Identification Using Genome-Scale CRISPR/Cas9 Genetic Screens.

    Sharma S and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute; EMBL-EBI, Wellcome Genome Campus

    Intercellular communication mediated by direct interactions between membrane-embedded cell surface receptors is crucial for the normal development and functioning of multicellular organisms. Detecting these interactions remains technically challenging, however. This manuscript describes a systematic genome-scale CRISPR/Cas9 knockout genetic screening approach that reveals cellular pathways required for specific cell surface recognition events. This assay utilizes recombinant proteins produced in a mammalian protein expression system as avid binding probes to identify interaction partners in a cell-based genetic screen. This method can be used to identify the genes necessary for cell surface interactions detected by recombinant binding probes corresponding to the ectodomains of membrane-embedded receptors. Importantly, given the genome-scale nature of this approach, it also has the advantage of not only identifying the direct receptor but also the cellular components that are required for the presentation of the receptor at the cell surface, thereby providing valuable insights into the biology of the receptor.

    Journal of visualized experiments : JoVE 2020;160

  • Genomic profiling reveals distinct routes to complement resistance in Klebsiella pneumoniae.

    Short FL, Di Sario G, Reichmann NT, Kleanthous C, Parkhill J and Taylor PW

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The serum complement (C') system is a first line of defense against bacterial invaders. Resistance to killing by serum enhances the capacity of <i>Klebsiella pneumoniae</i> to cause infection, but is an incompletely understood virulence trait. Identifying and characterising the factors responsible for preventing activation of, and killing by, serum C' could inform new approaches to treatment of <i>K. pneumoniae</i> infections. We have used functional genomic profiling to define the genetic basis of C' resistance in four diverse serum-resistant <i>K. pneumoniae</i> strains (NTUH-K2044, B5055, ATCC43816 and RH201207), and explored their recognition by key complement components. Over 90 genes contributed to resistance in one or more strains, but only three, <i>rfaH</i>, <i>lpp</i> and <i>arnD</i>, were common to all four. Deletion of the anti-terminator <i>rfaH</i>, controlling expression of capsule and O-side chains, resulted in dramatic C' resistance reductions in all strains. The murein lipoprotein gene <i>lpp</i> promoted capsule retention through a mechanism dependent on its C-terminal lysine residue; its deletion led to modest reductions in C' resistance. Binding experiments with the C' components C3b and C5b-9 showed that the underlying mechanism of evasion varied in the four strains: B5055 and NTUH-K2044 appeared to bypass recognition by C' entirely, while ATCC43816 and RH201207 were able to resist killing despite being associated with substantial levels of C5b-9. All <i>rfaH</i> and <i>lpp</i> mutants bound C3b and C5b-9 in large quantities. Our findings show that, even amongst this small selection of isolates, <i>K. pneumoniae</i> adopts differing mechanisms and utilises distinct gene sets to avoid C' attack.

    Infection and immunity 2020

  • Combined burden and functional impact tests for cancer driver discovery using DriverPower.

    Shuai S, PCAWG Drivers and Functional Interpretation Working Group, Gallinger S, Stein L and PCAWG Consortium

    Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada, M5S 1A8.

    The discovery of driver mutations is one of the key motivations for cancer genome sequencing. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1373 genomic features derived from public sources, DriverPower's background mutation model explains up to 93% of the regional variance in the mutation rate across multiple tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2583 cancer genomes from the PCAWG project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Compared with six published methods used by the PCAWG Drivers and Functional Interpretation Working Group, DriverPower has the highest F1 score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.
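
    The F1 score mentioned in the method comparison above is the harmonic mean of precision and recall. The following is a minimal sketch only, assuming hypothetical counts of true-positive, false-positive and false-negative driver calls against a reference set; the function name and the example counts are illustrative and are not taken from DriverPower or the PCAWG benchmark.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    The counts are hypothetical inputs: predicted drivers that are in the
    reference set (true_pos), predicted drivers that are not (false_pos),
    and reference drivers that were missed (false_neg).
    """
    if true_pos == 0:
        # No correct calls: precision and recall are both zero (or undefined),
        # so the score is reported as zero by convention.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct calls, 2 spurious calls, 2 missed drivers.
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

    Because the harmonic mean is dominated by the smaller of the two terms, a method scores well only when it is both precise and sensitive.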

    Nature communications 2020;11;1;734

  • Single cell sequencing shines a light on malaria parasite relatedness in complex infections.

    Siegel SV and Rayner JC

    Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, United Kingdom.

    Recent genomic studies are investigating the wide-ranging implications of malaria complex infections on parasite diversity, transmission, and downstream disease eradication efforts. Using single cell sequencing, new work by Nkhoma et al. provides evidence that there is unexpectedly frequent co-transmission of related parasites in the intense transmission setting in Malawi.

    Trends in parasitology 2020;36;2;83-85

  • Genomic footprints of activated telomere maintenance mechanisms in cancer.

    Sieverling L, Hong C, Koser SD, Ginsbach P, Kleinheinz K, Hutter B, Braun DM, Cortés-Ciriano I, Xi R, Kabbe R, Park PJ, Eils R, Schlesner M, PCAWG-Structural Variation Working Group, Brors B, Rippe K, Jones DTW, Feuerbach L and PCAWG Consortium

    Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany.

    Cancers require telomere maintenance mechanisms for unlimited replicative potential. They achieve this through TERT activation or alternative telomere lengthening associated with ATRX or DAXX loss. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we dissect whole-genome sequencing data of over 2500 matched tumor-control samples from 36 different tumor types to characterize the genomic footprints of these mechanisms. While the telomere content of tumors with ATRX or DAXX mutations (ATRX/DAXX<sup>trunc</sup>) is increased, tumors with TERT modifications show a moderate decrease of telomere content. One quarter of all tumor samples contain somatic integrations of telomeric sequences into non-telomeric DNA. This fraction is increased to 80% prevalence in ATRX/DAXX<sup>trunc</sup> tumors, which carry an aberrant telomere variant repeat (TVR) distribution as another genomic marker. The latter feature includes enrichment or depletion of the previously undescribed singleton TVRs TTCGGG and TTTGGG, respectively. Our systematic analysis provides new insight into the recurrent genomic alterations associated with telomere maintenance mechanisms in cancer.

    Funded by: Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research): 01KU1001A, -B, -C, and -D, 01KU1505A, 01ZX1302; Deutsche Forschungsgemeinschaft (German Research Foundation): Br3535/1-2; EC | Horizon 2020 (Horizon 2020 - Research and Innovation Framework Programme): 703543; NCI NIH HHS: R01 CA218112

    Nature communications 2020;11;1;733

  • Arctic-adapted dogs emerged at the Pleistocene-Holocene transition.

    Sinding MS, Gopalakrishnan S, Ramos-Madrigal J, de Manuel M, Pitulko VV, Kuderna L, Feuerborn TR, Frantz LAF, Vieira FG, Niemann J, Samaniego Castruita JA, Carøe C, Andersen-Ranberg EU, Jordan PD, Pavlova EY, Nikolskiy PA, Kasparov AK, Ivanova VV, Willerslev E, Skoglund P, Fredholm M, Wennerberg SE, Heide-Jørgensen MP, Dietz R, Sonne C, Meldgaard M, Dalén L, Larson G, Petersen B, Sicheritz-Pontén T, Bachmann L, Wiig Ø, Marques-Bonet T, Hansen AJ and Gilbert MTP

    The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark.

    Although sled dogs are one of the most specialized groups of dogs, their origin and evolution have received much less attention than those of many other dog groups. We applied a genomic approach to investigate their spatiotemporal emergence by sequencing the genomes of 10 modern Greenland sled dogs, an ~9500-year-old Siberian dog associated with archaeological evidence for sled technology, and an ~33,000-year-old Siberian wolf. We found noteworthy genetic similarity between the ancient dog and modern sled dogs. We detected gene flow from Pleistocene Siberian wolves, but not modern American wolves, to present-day sled dogs. The results indicate that the major ancestry of modern sled dogs traces back to Siberia, where sled dog-specific haplotypes of genes that potentially relate to Arctic adaptation were established by 9500 years ago.

    Science (New York, N.Y.) 2020;368;6498;1495-1499

  • Mapping the travel patterns of people with malaria in Bangladesh.

    Sinha I, Sayeed AA, Uddin D, Wesolowski A, Zaman SI, Faiz MA, Ghose A, Rahman MR, Islam A, Karim MJ, Saha A, Rezwan MK, Shamsuzzaman AKM, Jhora ST, Aktaruzzaman MM, Chang HH, Miotto O, Kwiatkowski D, Dondorp AM, Day NPJ, Hossain MA, Buckee C and Maude RJ

    Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand.

    Background: Spread of malaria and antimalarial resistance through human movement presents a major threat to current goals to eliminate the disease. Bordering the Greater Mekong Subregion, southeast Bangladesh is a potentially important route of spread to India and beyond, but information on travel patterns in this area is lacking.

    Methods: Using a standardised short survey tool, 2090 patients with malaria were interviewed at 57 study sites in 2015-2016 about their demographics and travel patterns in the preceding 2 months.

    Results: Most travel was in the south of the study region, between Cox's Bazar district (coastal region) and forested areas in Bandarban (31% by days and 45% by nights), forming a source-sink route. Less than 1% of travel reported was between the north and south forested areas of the study area. Farmers (21%) and students (19%) were the top two occupations recorded, with 67% and 47% reporting travel to the forest, respectively. Males aged 25-49 years accounted for 43% of cases visiting forests but only 24% of the study population. Children did not travel. Women, forest dwellers and farmers did not travel beyond union boundaries. Military personnel travelled the furthest, especially to remote forested areas.

    Conclusions: The approach demonstrated here provides a framework for identifying key traveller groups and their origins and destinations of travel in combination with knowledge of local epidemiology to inform malaria control and elimination efforts. Working with the NMEP, the findings were used to derive a set of policy recommendations to guide targeting of interventions for elimination.

    Funded by: Bill and Melinda Gates Foundation: NA; Harvard T.H. Chan School of Public Health (US): NA; Wellcome Trust: NA

    BMC medicine 2020;18;1;45

  • Haplotype-aware graph indexes.

    Sirén J, Garrison E, Novak AM, Paten B and Durbin R

    UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA.

    Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes.

    Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.

    Availability and implementation: Our software is available at, and

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 207492/Z/17/Z

    Bioinformatics (Oxford, England) 2020;36;2;400-407

  • Functional analysis of vasa/PL10-like genes in the ovary of Schistosoma mansoni.

    Skinner DE, Popratiloff A, Alrefaei YN, Mann VH, Rinaldi G and Brindley PJ

    Department of Microbiology, Immunology & Tropical Medicine, and Research Center for Neglected Diseases of Poverty, School of Medicine & Health Sciences, The George Washington University, Washington, DC 20037 USA; Center for Discovery and Innovation in Parasitic Diseases, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego 9500 Gilman Dr, La Jolla, CA, 92093, USA.

    The RNA helicase Vasa plays a pivotal role in the development of the germ line. To decipher the functional roles of vasa/PL10-like genes in the human blood fluke Schistosoma mansoni, we performed RNA interference followed by the analysis of the ovary in the adult female. Double-stranded RNA targeting the schistosome vasa-like gene Smvlg1 reduced the volume of the ovary. Changes in morphology of the ovary were analysed using carmine red-staining of the parasites followed by a novel confocal laser scanning microscopy (CLSM)-based approach to control for natural autofluorescence in female schistosome tissues. The reduction in the ovary volume may have been promoted by the loss of germ cells. By contrast, significant differences were not apparent in the number of eggs produced or hatching rate of eggs laid by the female schistosomes transfected with Smvlg1-specific dsRNA. The findings suggested a role for S. mansoni vasa/PL10-like gene-1 in germ cell development within the schistosome ovary that might affect pathogenesis and disease transmission by this neglected tropical disease pathogen.

    Molecular and biochemical parasitology 2020;236;111259

  • Comprehensive characterization of cell-free tumor DNA in plasma and urine of patients with renal tumors.

    Smith CG, Moser T, Mouliere F, Field-Rayner J, Eldridge M, Riediger AL, Chandrananda D, Heider K, Wan JCM, Warren AY, Morris J, Hudecova I, Cooper WN, Mitchell TJ, Gale D, Ruiz-Valdepenas A, Klatte T, Ursprung S, Sala E, Riddick ACP, Aho TF, Armitage JN, Perakis S, Pichler M, Seles M, Wcislo G, Welsh SJ, Matakidou A, Eisen T, Massie CE, Rosenfeld N, Heitzer E and Stewart GD

    Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.

    Background: Cell-free tumor-derived DNA (ctDNA) allows non-invasive monitoring of cancers, but its utility in renal cell cancer (RCC) has not been established.

    Methods: Here, a combination of untargeted and targeted sequencing methods, applied to two independent cohorts of patients (n = 91) with various renal tumor subtypes, were used to determine ctDNA content in plasma and urine.

    Results: Our data revealed lower plasma ctDNA levels in RCC relative to other cancers of similar size and stage, with untargeted detection in 27.5% of patients from both cohorts. A sensitive personalized approach, applied to plasma and urine from select patients (n = 22), improved detection to ~50%, including in patients with early-stage disease and even benign lesions. Detection in plasma, but not urine, was more frequent amongst patients with larger tumors and in those patients with venous tumor thrombus. With data from one extensively characterized patient, we observed that plasma and, for the first time, urine ctDNA may better represent tumor heterogeneity than a single tissue biopsy. Furthermore, in a subset of patients (n = 16), longitudinal sampling revealed that ctDNA can track disease course and may pre-empt radiological identification of minimal residual disease or disease progression on systemic therapy. Additional datasets will be required to validate these findings.

    Conclusions: These data highlight RCC as a ctDNA-low malignancy. The biological reasons for this are yet to be determined. Nonetheless, our findings indicate potential clinical utility in the management of patients with renal tumors, provided that isolation and detection approaches improve.

    Funded by: Addenbrooke's Charitable Trust, Cambridge University Hospitals; Austrian Science Fund: P28949-B28; Austrian federal ministry for digital and economic affairs; Cancer Research UK: C9545/A29580; European Research Council: CANCER EXOMES IN PLASMA; Renal Cancer Research Fund

    Genome medicine 2020;12;1;23

  • Human BDNF/TrkB variants impair hippocampal synaptogenesis and associate with neurobehavioural abnormalities.

    Sonoyama T, Stadler LKJ, Zhu M, Keogh JM, Henning E, Hisama F, Kirwan P, Jura M, Blaszczyk BK, DeWitt DC, Brouwers B, Hyvönen M, Barroso I, Merkle FT, Appleyard SM, Wayman GA and Farooqi IS

    University of Cambridge Metabolic Research Laboratories and NIHR Cambridge Biomedical Research Centre, Wellcome Trust-MRC Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.

    Brain-derived neurotrophic factor (BDNF) signals through its high affinity receptor Tropomyosin receptor kinase-B (TrkB) to regulate neuronal development, synapse formation and plasticity. In rodents, genetic disruption of Bdnf and TrkB leads to weight gain and a spectrum of neurobehavioural phenotypes. Here, we functionally characterised a de novo missense variant in BDNF and seven rare variants in TrkB identified in a large cohort of people with severe, childhood-onset obesity. In cells, the E183K BDNF variant resulted in impaired processing and secretion of the mature peptide. Multiple variants in the kinase domain and one variant in the extracellular domain of TrkB led to a loss of function through multiple signalling pathways, impaired neurite outgrowth and dominantly inhibited glutamatergic synaptogenesis in hippocampal neurons. BDNF/TrkB variant carriers exhibited learning difficulties, impaired memory, hyperactivity, and stereotyped and, sometimes, maladaptive behaviours. In conclusion, human loss of function BDNF/TrkB variants that impair hippocampal synaptogenesis may contribute to a spectrum of neurobehavioural disorders.

    Scientific reports 2020;10;1;9028

  • Effect of long-lasting insecticidal nets with and without piperonyl butoxide on malaria indicators in Uganda (LLINEUP): a pragmatic, cluster-randomised trial embedded in a national LLIN distribution campaign.

    Staedke SG, Gonahasa S, Dorsey G, Kamya MR, Maiteki-Sebuguzi C, Lynd A, Katureebe A, Kyohere M, Mutungi P, Kigozi SP, Opigo J, Hemingway J and Donnelly MJ

    Department of Clinical Research, London School of Hygiene & Tropical Medicine, London, UK; Infectious Diseases Research Collaboration, Kampala, Uganda.

    Background: Long-lasting insecticidal nets (LLINs) are the primary malaria prevention tool, but their effectiveness is threatened by pyrethroid resistance. We embedded a pragmatic cluster-randomised trial into Uganda's national LLIN campaign to compare conventional LLINs with those containing piperonyl butoxide (PBO), a synergist that can partially restore pyrethroid susceptibility in mosquito vectors.

    Methods: 104 health sub-districts, from 48 districts in Uganda, were randomly assigned to LLINs with PBO (PermaNet 3.0 and Olyset Plus) and conventional LLINs (PermaNet 2.0 and Olyset Net) by proportionate randomisation using an iterative process. At baseline and at 6, 12, and 18 months after LLIN distribution, cross-sectional surveys were done in 50 randomly selected households per cluster (5200 per survey); a subset of ten households per cluster (1040 per survey) were randomly selected for entomological surveys. The primary outcome was parasite prevalence by microscopy in children aged 2-10 years, assessed in the as-treated population at 6, 12, and 18 months. This trial is registered with ISRCTN, ISRCTN17516395.

    Findings: LLINs were delivered to households from March 25, 2017, to March 18, 2018. 32 clusters were randomly assigned to PermaNet 3.0, 20 to Olyset Plus, 37 to PermaNet 2.0, and 15 to Olyset Net. In the as-treated analysis, three clusters were excluded because no dominant LLIN was received, and four clusters were reassigned, resulting in 49 PBO LLIN clusters (31 received PermaNet 3.0 and 18 received Olyset Plus) and 52 non-PBO LLIN clusters (39 received PermaNet 2.0 and 13 received Olyset Net). At 6 months, parasite prevalence was 11% (386/3614) in the PBO group compared with 15% (556/3844) in the non-PBO group (prevalence ratio [PR] adjusted for baseline values 0·74, 95% CI 0·62-0·87; p=0·0003). Parasite prevalence was similar at month 12 (11% vs 13%; PR 0·73, 95% CI 0·63-0·85; p=0·0001) and month 18 (12% vs 14%; PR 0·84, 95% CI 0·72-0·98; p=0·029).

    Interpretation: In Uganda, where pyrethroid resistance is high, PBO LLINs reduced parasite prevalence more effectively than did conventional LLINs for up to 18 months. This study provides evidence needed to support WHO's final recommendation on use of PBO LLINs.

    Funding: The Against Malaria Foundation, UK Department for International Development, Innovative Vector Control Consortium, and Bill and Melinda Gates Foundation.

    Lancet (London, England) 2020;395;10232;1292-1303

  • Applying single-cell technologies to clinical pathology: progress in nephropathology.

    Stewart BJ and Clatworthy MR

    Molecular Immunity Unit, University of Cambridge Department of Medicine, Cambridge, UK.

    Cells represent the basic building blocks of living organisms. Accurate characterisation of cellular phenotype, intercellular signalling networks, and the spatial organisation of cells within organs is crucial to deliver a better understanding of the processes underpinning physiology, and the perturbations that lead to disease. Single-cell methodologies have rapidly increased in scale and scope in recent years and are set to generate important insights into human disease. Here, we review current practices in nephropathology, which are dominated by relatively simple morphological descriptions of tissue biopsies based on their appearance using light microscopy. Bulk transcriptomics have more recently been used to explore glomerular and tubulointerstitial kidney disease, renal cancer, and the responses to injury and alloimmunity in kidney transplantation, generating novel disease insights and prognostic biomarkers. These studies set the stage for single-cell transcriptomic approaches which reveal cell-type specific gene expression patterns in health and disease. These technologies allow genome-wide disease susceptibility genes to be interpreted with the knowledge of the specific cell populations within organs that express them, identifying candidate cell types for further study. Single cell technologies are also moving beyond assaying individual cellular transcriptomes, to measuring the epigenetic landscape of single cells. Single cell antigen receptor gene sequencing also enables specific T and B cell clones to be tracked in different tissues and disease states. In the coming years these rich 'multi-omic' descriptions of kidney disease will enable histopathological descriptions to be comprehensively integrated with molecular phenotypes, enabling better disease classification and prognostication and the application of personalised treatment strategies.

    The Journal of pathology 2020

  • A high-content RNAi screen reveals multiple roles for long noncoding RNAs in cell division.

    Stojic L, Lun ATL, Mascalchi P, Ernst C, Redmond AM, Mangei J, Barr AR, Bousgouni V, Bakal C, Marioni JC, Odom DT and Gergely F

    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.

    Genome stability relies on proper coordination of mitosis and cytokinesis, where dynamic microtubules capture and faithfully segregate chromosomes into daughter cells. With a high-content RNAi imaging screen targeting more than 2,000 human lncRNAs, we identify numerous lncRNAs involved in key steps of cell division such as chromosome segregation, mitotic duration and cytokinesis. Here, we provide evidence that the chromatin-associated lncRNA, linc00899, leads to robust mitotic delay upon its depletion in multiple cell types. We perform transcriptome analysis of linc00899-depleted cells and identify the neuronal microtubule-binding protein, TPPP/p25, as a target of linc00899. We further show that linc00899 binds TPPP/p25 and suppresses its transcription. In cells depleted of linc00899, upregulation of TPPP/p25 alters microtubule dynamics and delays mitosis. Overall, our comprehensive screen uncovers several lncRNAs involved in genome stability and reveals a lncRNA that controls microtubule behaviour with functional implications beyond cell division.

    Funded by: Cancer Research UK (CRUK): A20412, C14303/A17197

    Nature communications 2020;11;1;1851

  • GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution.

    Strauss ME, Kirk PDW, Reid JE and Wernisch L

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters.

    Results: The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with non-parametric Bayesian clustering methods, efficient Markov Chain Monte Carlo sampling and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings.

    Availability and implementation: An implementation is available on GitHub: and

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Medical Research Council: MC_UU_00002/1, MC_UU_00002/10

    Bioinformatics (Oxford, England) 2020;36;5;1484-1491

  • Persistence of Brucella abortus lineages revealed by genomic characterization and phylodynamic analysis.

    Suárez-Esquivel M, Hernández-Mora G, Ruiz-Villalobos N, Barquero-Calvo E, Chacón-Díaz C, Ladner JT, Oviedo-Sánchez G, Foster JT, Rojas-Campos N, Chaves-Olarte E, Thomson NR, Moreno E and Guzmán-Verri C

    Programa de Investigación en Enfermedades Tropicales, Escuela de Medicina Veterinaria, Universidad Nacional, Heredia, Costa Rica.

    Brucellosis, caused by Brucella abortus, is a major, globally distributed disease of cattle and humans. Eradication and control of the disease have been difficult in Central and South America, Central Asia, the Mediterranean and the Middle East. Epidemiological strategies combined with phylogenetic methods provide the high-resolution power needed to study relationships between surveillance data and pathogen population dynamics, using genetic diversity and spatiotemporal distributions. This information is crucial for prevention and control of disease spread at a local and worldwide level. In Costa Rica (CR), the disease was first reported at the beginning of the 20th century and has not been controlled despite many efforts. We characterized 188 B. abortus isolates from CR recovered from cattle, humans and water buffalo, from 2003 to 2018, and whole genome sequencing (WGS) was performed on 95 of them. They were also assessed based on geographic origin, date of introduction, and phylogenetic associations in a worldwide and national context. Our results show circulation of five B. abortus lineages (I to V) in CR, phylogenetically related to isolates from the United States, United Kingdom, and South America. Lineage I was dominant and probably introduced at the end of the 19th century. Lineage II, represented by a single isolate from a water buffalo, clustered with a Colombian sample, and was likely introduced after 1845. Lineages III and IV were likely introduced during the early 2000s. Fourteen isolates from humans were found within the same lineage (lineage I) regardless of their geographic origin within the country. The main CR lineages, introduced more than 100 years ago, are widely spread throughout the country, in contrast to new introductions that seemed to be more geographically restricted. Given the brucellosis prevalence and the farming practices of several middle- and low-income countries, similar scenarios could be found in other regions worldwide.

    PLoS neglected tropical diseases 2020;14;4;e0008235

  • Genome sequence of the root-knot nematode Meloidogyne luci.

    Susič N, Koutsovoulos GD, Riccio C, Danchin EGJ, Blaxter ML, Lunt DH, Strajnar P, Širca S, Urek G and Stare BG

    Agricultural Institute of Slovenia, Plant Protection Department, Ljubljana, Slovenia.

    Root-knot nematodes from the genus <i>Meloidogyne</i> are polyphagous plant endoparasites and agricultural pests of global importance. Here, we report the high-quality genome sequence of <i>Meloidogyne luci</i> population SI-Smartno V13. The resulting genome assembly of <i>M. luci</i> SI-Smartno V13 consists of 327 contigs, with an N50 contig length of 1,711,905 bp and a total assembly length of 209.16 Mb.

    Journal of nematology 2020;52;1-5
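
    As an illustration of the N50 statistic quoted in the abstract above, the following is a minimal sketch of how an N50 contig length can be computed from a list of contig lengths. The function name and the toy lengths are hypothetical and are not taken from the paper; this is the conventional definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly), not necessarily the exact tool the authors used.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Conventional definition: walk the contigs from longest to shortest
    and return the length at which the running total first reaches
    half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (not data from the paper).
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

    In the toy example the total length is 20, and the two longest contigs (6 and 5) are the first to reach half of it, so the N50 is 5.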

  • Evolutionary genomics at the human-environment interface in Africa.

    Svardal H, Rusuwa B, Linderoth T, Kiran A, Charmantier A, Lattorff HMG, Cagan A, Ommeh SC, Kamng'ona A, Katongo C, Santos ME, Durbin R, Kumwenda B, Visscher PM, von der Heyden S and participants of SMBE Malawi

    Department of Biology, University of Antwerp, Antwerp, Belgium.

    We report on the first meeting of SMBE in Africa. SMBE Malawi was initiated to bring together African and international researchers who use genetics or genomics to study natural systems impacted by human activities. The goals of this conference were (1) to reach a world class standard of science with a large number of contributions from within Africa, (2) to initiate an exchange between African and international researchers and (3) to identify challenges and opportunities for evolutionary genomics research in Africa. As we report here we think that we have achieved these goals and make suggestions on the way forward for African evolutionary genomics research.

    Molecular biology and evolution 2020

  • Non-typhoidal Salmonella bloodstream infections in Kisantu, DR Congo: Emergence of O5-negative Salmonella Typhimurium and extensive drug resistance.

    Tack B, Phoba MF, Barbé B, Kalonji LM, Hardy L, Van Puyvelde S, Ingelbeen B, Falay D, Ngbonda D, van der Sande MAB, Deborggraeve S, Jacobs J and Lunguya O

    Department of Clinical Sciences, Institute of Tropical Medicine, Antwerp, Belgium.

    Background: Non-typhoidal Salmonella (NTS) are a major cause of bloodstream infection (BSI) in sub-Saharan Africa. This study aimed to assess its longitudinal evolution as cause of BSI, its serotype distribution and its antibiotic resistance pattern in Kisantu, DR Congo.

    Methods: As part of a national surveillance network, blood cultures were sampled in patients with suspected BSI admitted to Kisantu referral hospital from 2015 to 2017. Blood cultures were worked up according to international standards. Results were compared to similar data from 2007 onwards.

    Results: In 2015-2017, NTS (n = 896) represented the primary cause of BSI. NTS were isolated from 7.6% of 11,764 suspected and 65.4% of 1371 confirmed BSI. In children <5 years, NTS accounted for 9.6% of suspected BSI. These data were in line with data from previous surveillance periods, except for the proportion of confirmed BSI, which was lower in previous surveillance periods. Salmonella Typhimurium accounted for 63.1% of NTS BSI and Salmonella Enteritidis for 36.4%. Of all Salmonella Typhimurium, 36.9% did not express the O5-antigen (i.e. variant Copenhagen). O5-negative Salmonella Typhimurium were rare before 2013, but increased gradually from then onwards. Multidrug resistance was observed in 87.4% of 864 NTS isolates, decreased ciprofloxacin susceptibility in 7.3%, ceftriaxone resistance in 15.7% and azithromycin resistance in 14.9%. A total of 14.2% of NTS isolates, that were all Salmonella Typhimurium, were multidrug resistant and ceftriaxone and azithromycin co-resistant. These Salmonella isolates were called extensively drug resistant. Compared to previous surveillance periods, proportions of NTS isolates with resistance to ceftriaxone and azithromycin and decreased ciprofloxacin susceptibility increased.

    Conclusion: As in previous surveillance periods, NTS ranked first as the cause of BSI in children. The emergence of O5-negative Salmonella Typhimurium needs to be considered in the light of vaccine development. The high proportions of antibiotic resistance are worrisome.

    PLoS neglected tropical diseases 2020;14;4;e0008121

  • Uptake of Plasmodium falciparum Gametocytes During Mosquito Bloodmeal by Direct and Membrane Feeding.

    Talman AM, Ouologuem DTD, Love K, Howick VM, Mulamba C, Haidara A, Dara N, Sylla D, Sacko A, Coulibaly MM, Dao F, Sangare CPO, Djimde A and Lawniczak MKN

    Wellcome Sanger Institute, Hinxton, United Kingdom.

    <i>Plasmodium falciparum</i> remains one of the leading causes of child mortality, and nearly half of the world's population is at risk of contracting malaria. While pathogenesis results from replication of asexual forms in human red blood cells, it is the sexually differentiated forms, gametocytes, which are responsible for the spread of the disease. For transmission to succeed, both mature male and female gametocytes must be taken up by a female <i>Anopheles</i> mosquito during its blood meal for subsequent differentiation into gametes and mating inside the mosquito gut. Observed circulating numbers of gametocytes in the human host are often surprisingly low. A pre-fertilization behavior, such as skin sequestration, has been hypothesized to explain the efficiency of human-to-mosquito transmission but has not been sufficiently tested due to a lack of appropriate tools. In this study, we describe the optimization of a qPCR tool that enables the relative quantification of gametocytes within very small input samples. Such a tool allows for the quantification of gametocytes in different compartments of the host and the vector that could potentially unravel mechanisms that enable highly efficient malaria transmission. We demonstrate the use of our gametocyte quantification method in mosquito blood meals from both direct skin feeding on <i>Plasmodium</i> gametocyte carriers and standard membrane feeding assay. Relative gametocyte abundance was not different between mosquitoes fed through a membrane or directly on the skin suggesting that there is no systematic enrichment of gametocytes picked up in the skin.

    Funded by: Wellcome Trust

    Frontiers in microbiology 2020;11;246

  • The network effect: studying COVID-19 pathology with the Human Cell Atlas.

    Teichmann S and Regev A

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Nature reviews. Molecular cell biology 2020

  • Whole-genome sequencing of a sporadic primary immunodeficiency cohort.

    Thaventhiran JED, Lango Allen H, Burren OS, Rae W, Greene D, Staples E, Zhang Z, Farmery JHR, Simeoni I, Rivers E, Maimaris J, Penkett CJ, Stephens J, Deevi SVV, Sanchis-Juan A, Gleadall NS, Thomas MJ, Sargur RB, Gordins P, Baxendale HE, Brown M, Tuijnenburg P, Worth A, Hanson S, Linger RJ, Buckland MS, Rayner-Matthews PJ, Gilmour KC, Samarghitean C, Seneviratne SL, Sansom DM, Lynch AG, Megy K, Ellinghaus E, Ellinghaus D, Jorgensen SF, Karlsen TH, Stirrups KE, Cutler AJ, Kumararatne DS, Chandra A, Edgar JDM, Herwadkar A, Cooper N, Grigoriadou S, Huissoon AP, Goddard S, Jolles S, Schuetz C, Boschann F, Primary Immunodeficiency Consortium for the NIHR Bioresource, Lyons PA, Hurles ME, Savic S, Burns SO, Kuijpers TW, Turro E, Ouwehand WH, Thrasher AJ and Smith KGC

    Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Cambridge, UK.

    Primary immunodeficiency (PID) is characterized by recurrent and often life-threatening infections, autoimmunity and cancer, and it poses major diagnostic and therapeutic challenges. Although the most severe forms of PID are identified in early childhood, most patients present in adulthood, typically with no apparent family history and a variable clinical phenotype of widespread immune dysregulation: about 25% of patients have autoimmune disease, allergy is prevalent and up to 10% develop lymphoid malignancies<sup>1-3</sup>. Consequently, in sporadic (or non-familial) PID genetic diagnosis is difficult and the role of genetics is not well defined. Here we address these challenges by performing whole-genome sequencing in a large PID cohort of 1,318 participants. An analysis of the coding regions of the genome in 886 index cases of PID found that disease-causing mutations in known genes that are implicated in monogenic PID occurred in 10.3% of these patients, and a Bayesian approach (BeviMed<sup>4</sup>) identified multiple new candidate PID-associated genes, including IVNS1ABP. We also examined the noncoding genome, and found deletions in regulatory regions that contribute to disease causation. In addition, we used a genome-wide association study to identify loci that are associated with PID, and found evidence for the colocalization of, and interplay between, novel high-penetrance monogenic variants and common variants (at the PTPN2 and SOCS1 loci). This begins to explain the contribution of common variants to the variable penetrance and phenotypic complexity that are observed in PID. Thus, using a cohort-based whole-genome-sequencing approach in the diagnosis of PID can increase diagnostic yield and further our understanding of the key pathways that influence immune responsiveness in humans.

    Nature 2020

  • Low rates of mutation in clinical grade human pluripotent stem cells under different culture conditions.

    Thompson O, von Meyenn F, Hewitt Z, Alexander J, Wood A, Weightman R, Gregory S, Krueger F, Andrews S, Barbaric I, Gokhale PJ, Moore HD, Reik W, Milo M, Nik-Zainal S, Yusa K and Andrews PW

    The Centre for Stem Cell Biology, Department of Biomedical Science, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK.

    The occurrence of repetitive genomic changes that provide a selective growth advantage in pluripotent stem cells is of concern for their clinical application. However, the effect of different culture conditions on the underlying mutation rate is unknown. Here we show that the mutation rate in two human embryonic stem cell lines derived and banked for clinical application is low and not substantially affected by culture with Rho Kinase inhibitor, commonly used in their routine maintenance. However, the mutation rate is reduced by >50% in cells cultured under 5% oxygen, when we also found alterations in imprint methylation and reversible DNA hypomethylation. Mutations are evenly distributed across the chromosomes, except for a slight increase on the X-chromosome, and an elevation in intergenic regions suggesting that chromatin structure may affect mutation rate. Overall the results suggest that pluripotent stem cells are not subject to unusually high rates of genetic or epigenetic alterations.

    Nature communications 2020;11;1;1528

  • A Maturity Matrix for Nurse Leaders to Facilitate and Benchmark Progress in Genomic Healthcare Policy, Infrastructure, Education, and Delivery.

    Tonkin E, Calzone KA, Badzek L, Benjamin C, Middleton A, Patch C and Kirk M

    Associate Professor of Genomics Healthcare, University of South Wales, Pontypridd, Rhondda Cynon Taff, Wales, UK.

    Purpose: Nurse leaders driving strategic integration of genomics across nursing need tools and resources to evaluate their environment, guide strategies to address deficits, and benchmark progress. We describe the development and pilot testing of a self-assessment maturity matrix (MM) that enables users to benchmark the current state of nursing genomic competency and integration for their country or nursing group; guides the development of a strategic course for improvement and implementation; and assesses change over time.

    Design: Mixed-methods participatory research and self-assessment.

    Methods: During a 3-day workshop involving nursing experts in health care and genomics, a genomic integration MM grid was built by consensus using iterative participatory methods. Data were analyzed using descriptive techniques. This work built on an online survey involving the same participants to identify the critical elements needed for "effective nursing which promotes health outcomes globally through genomics."

    Findings: Experts from 19 countries across six continents and seven organizations participated in item development. The Assessment of Strategic Integration of Genomics across Nursing (ASIGN) MM incorporates 55 outcome-focused items serving as subscales for six critical success factors (CSFs): education and workforce; effective nursing practice; infrastructure and resources; collaboration and communication; public/patient involvement; policy and leadership. Users select their current circumstances for each item against a 5-point ordinal scale (precontemplation to leading). Nurses representing 17 countries undertook matrix pilot testing. Results demonstrate variation across CSFs, with many countries at the earliest stages of implementation.

    Conclusions: The MM has the potential to guide the strategic integration of genomics across nursing and enables additional assessments within and between countries to be made.

    Clinical relevance: Nurse leadership and direction are essential to accelerate integration of genomics across nursing practice and education. The MM helps nurse leaders to benchmark progress and guide strategic planning to build global genomic nursing capacity.

    Funded by: Health Education England; NCI NIH HHS; Wellcome Genome Campus Connecting Science; Wellcome Trust: 206194

    Journal of nursing scholarship : an official publication of Sigma Theta Tau International Honor Society of Nursing 2020

  • A Roadmap for Global Acceleration of Genomics Integration Across Nursing.

    Tonkin E, Calzone KA, Badzek L, Benjamin C, Middleton A, Patch C and Kirk M

    Associate Professor of Genomic Healthcare, University of South Wales, Pontypridd, Wales, UK.

    Purpose: The changes needed to accelerate integration of genomics across nursing are complex, with significant challenges faced globally. Common themes lend themselves to a coordinated and collaborative strategic approach to sustained change. We aim to synthesize the outputs of a research program to present a roadmap for nursing leadership to guide integration of genomics across practice.

    Design: Mixed methods involving a purposive sample of global nursing leaders and nursing organizations in a sustained, highly interactive program.

    Methods: Experts in nursing, health care and healthcare services, policy, and leadership were recruited. Online surveys preceded a 3-day residential meeting utilizing participatory methods and techniques to gain consensus on the essential elements of a roadmap to promote genomics integration.

    Findings: Twenty-three leaders representing 19 countries and seven organizations participated overall. Data on the scope and status of nursing, genomics health care, and resources have been synthesized. Participants identified 117 facilitators to genomics integration across diverse sources. Barriers and priorities identified were mapped to the constructs of the Consolidated Framework for Implementation Research. The roadmap is underpinned by a maturity matrix created by participants to guide and benchmark progress in genomics integration.

    Conclusions: Nurse leaders seeking to accelerate change can access practical guidance with the roadmap, underpinned by support through the Global Genomics Nursing Alliance and its strategic priorities.

    Clinical relevance: Genomics is shaping the future of healthcare, but change is needed for integration across nursing. This practical roadmap, adaptable to local health systems and clinical and educational contexts, is relevant to nurse leaders aiming to accelerate change.

    Funded by: Health Education England; NIH HHS; National Cancer Institute, National Institutes of Health; National Human Genome Research Institute, National Institutes of Health; Wellcome Genome Campus Connecting Science; Wellcome Trust: 206194

    Journal of nursing scholarship : an official publication of Sigma Theta Tau International Honor Society of Nursing 2020

  • Phylogenomic analysis of Neisseria gonorrhoeae transmission to assess sexual mixing and HIV transmission risk in England: a cross-sectional, observational, whole-genome sequencing study.

    Town K, Field N, Harris SR, Sánchez-Busó L, Cole MJ, Pitt R, Fifer H, Mohammed H and Hughes G

    National Institute for Health Research Health Protection Research Unit in Blood Borne and Sexually Transmitted Infections, University College London, London, UK; Centre for Molecular Epidemiology and Translational Research, Institute for Global Health, University College London, London, UK; National Infection Service, Public Health England, London, UK.

    Background: Characterising sexual networks with transmission of sexually transmitted infections might allow identification of individuals at increased risk of infection. We aimed to investigate sexual mixing in Neisseria gonorrhoeae transmission networks between women, heterosexual men, and men who report sex with men (MSM), and between people with and without HIV.

    Methods: In this cross-sectional observational study, we whole-genome sequenced N gonorrhoeae isolates from the archive of the Gonococcal Resistance to Antimicrobials Surveillance Programme (GRASP). Isolates that varied by five single nucleotide polymorphisms or fewer were grouped into clusters that represented sexual networks with N gonorrhoeae transmission. Clusters were described by gender, sexual risk group, and HIV status.

    Findings: We sequenced 1277 N gonorrhoeae isolates with linked clinical and sociodemographic data that were collected in five clinics in England during 2013-16 (July 1 to Sept 30 in 2013-15; July 1 to Sept 9 in 2016). The isolates grouped into 213 clusters. 30 (14%) clusters contained isolates from heterosexual men and MSM but no women and three (1%) clusters contained isolates from only women and MSM. 146 (69%) clusters comprised solely people with negative or unknown HIV status and seven (3%) comprised only HIV-positive people. 60 (28%) clusters comprised MSM with positive and negative or unknown HIV status.

    Interpretation: N gonorrhoeae molecular data can provide information indicating risk of HIV or other sexually transmitted infections for some individuals for whom such risk might not be known from clinical history. These findings have implications for sexual health care, including offering testing, prevention advice, and preventive treatment, such as HIV pre-exposure prophylaxis.

    Funding: National Institute for Health Research Health Protection Research Unit; Wellcome; Public Health England.

    The Lancet. Infectious diseases 2020
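
    The Methods above group isolates that differ by five or fewer single nucleotide polymorphisms into clusters. A minimal sketch of one way such grouping could work is given below. It assumes single-linkage clustering (an isolate joins a cluster if it is within the threshold of any member) over pre-aligned sequences of equal length; the abstract does not state the linkage rule or the software used, so the function names, the linkage choice and the toy sequences are all hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def cluster_isolates(aligned, max_snps=5):
    """Single-linkage clustering of isolates by pairwise SNP distance.

    `aligned` maps isolate name -> aligned sequence. Two isolates are
    linked when their distance is <= max_snps; clusters are the connected
    components of that link graph (assumed rule, see lead-in).
    """
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if snp_distance(aligned[a], aligned[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy alignment (made-up sequences); a threshold of 1 keeps the example small.
toy = {"A": "AAAA", "B": "AAAT", "C": "AATT", "D": "CCCC"}
toy_clusters = cluster_isolates(toy, max_snps=1)
```

    With single linkage, A and C fall in the same cluster through B even though they differ at two sites; a stricter rule (for example, complete linkage) would split them, which is why the linkage assumption matters.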

  • Genomic and Phenotypic Variability in Neisseria gonorrhoeae Antimicrobial Susceptibility, England.

    Town K, Harris S, Sánchez-Busó L, Cole MJ, Pitt R, Fifer H, Mohammed H, Field N and Hughes G

    Antimicrobial resistance (AMR) in Neisseria gonorrhoeae is a global concern. Phylogenetic analyses resolve uncertainties regarding genetic relatedness of isolates with identical phenotypes and inform whether AMR is due to new mutations and clonal expansion or separate introductions by importation. We sequenced 1,277 isolates with associated epidemiologic and antimicrobial susceptibility data collected during 2013-2016 to investigate N. gonorrhoeae genomic variability in England. Comparing genetic markers and phenotypes for AMR, we identified 2 N. gonorrhoeae lineages with different antimicrobial susceptibility profiles and 3 clusters with elevated MICs for ceftriaxone, varying mutations in the penA allele, and different epidemiologic characteristics. Our results indicate N. gonorrhoeae with reduced antimicrobial susceptibility emerged independently and multiple times in different sexual networks in England, through new mutation or recombination events and by importation. Monitoring and control for AMR in N. gonorrhoeae should cover the entire population affected, rather than focusing on specific risk groups or locations.

    Emerging infectious diseases 2020;26;3;505-515

  • Nearly Complete Genome Sequence of Brugia malayi Strain FR3.

    Tracey A, Foster JM, Paulini M, Grote A, Mattick J, Tsai YC, Chung M, Cotton JA, Clark TA, Geber A, Holroyd N, Korlach J, Libro S, Lustigman S, Michalski ML, Rogers MB, Twaddle A, Dunning Hotopp JC, Berriman M and Ghedin E

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Lymphatic filariasis affects ∼120 million people and can result in elephantiasis and hydrocele. Here, we report the nearly complete genome sequence of the best-studied causative agent of lymphatic filariasis, <i>Brugia malayi</i>. The assembly contains four autosomes, an X chromosome, and only eight gaps but lacks a contiguous sequence for the known Y chromosome.

    Microbiology resource announcements 2020;9;24

  • A Novel Chemically Differentiated Mouse Embryonic Stem Cell-Based Model to Study Liver Stages of Plasmodium berghei.

    Tripathi J, Segeritz CP, Griffiths G, Bushell W, Vallier L, Skarnes WC, Mota MM and Billker O

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Asymptomatic and obligatory liver stage (LS) infection of Plasmodium parasites presents an attractive target for antimalarial vaccine and drug development. Lack of robust cellular models to study LS infection has hindered the discovery and validation of host genes essential for intrahepatic parasite development. Here, we present a chemically differentiated mouse embryonic stem cell (ESC)-based LS model, which supports complete development of Plasmodium berghei exoerythrocytic forms (EEFs) and can be used to define new host-parasite interactions. Using our model, we established that host Pnpla2, coding for adipose triglyceride lipase, is dispensable for P. berghei EEF development. In addition, we also evaluated in-vitro-differentiated human hepatocyte-like cells (iHLCs) to study LS of P. berghei and found it to be a sub-optimal infection model. Overall, our results present a new mouse ESC-based P. berghei LS infection model that can be utilized to study the impact of host genetic variation on parasite development.

    Stem cell reports 2020

  • Sequence Composition Underlying Centromeric and Heterochromatic Genome Compartments of the Pacific Oyster Crassostrea gigas.

    Tunjić Cvitanić M, Vojvoda Zeljko T, Pasantes JJ, García-Souto D, Gržan T, Despot-Slade E, Plohl M and Šatović E

    Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia.

    Segments of the genome enriched in repetitive sequences still present a challenge and are omitted in genome assemblies. For that reason, the exact composition of DNA sequences underlying the heterochromatic regions and the active centromeres is still unexplored for many organisms. The centromere is a crucial region of eukaryotic chromosomes responsible for the accurate segregation of genetic material. The typical landmark of centromere chromatin is the rapidly evolving variant of the histone H3, CenH3, while DNA sequences packed in constitutive heterochromatin are associated with H3K9me3-modified histones. In the Pacific oyster <i>Crassostrea gigas</i> we identified its centromere histone variant, Cg-CenH3, that shows stage-specific distribution in gonadal cells. In order to investigate the DNA composition of genomic regions associated with the two specific chromatin types, we employed chromatin immunoprecipitation followed by high-throughput next-generation sequencing of the Cg-CenH3- and H3K9me3-associated sequences. CenH3-associated sequences were assigned to six groups of repetitive elements, while H3K9me3-associated ones were assigned only to three. Those associated with CenH3 indicate the lack of uniformity in the chromosomal distribution of sequences building the centromeres, while also being dispersed throughout the genome. The heterochromatin of <i>C. gigas</i> exhibited general paucity and limited chromosomal localization as predicted, with H3K9me3-associated sequences being predominantly constituted of DNA transposons.

    Funded by: Hrvatska Zaklada za Znanost: IP-09-2014-3183; Xunta de Galicia and Fondos FEDER: "Unha maneira de facer Europa" (Axudas do programa de consolidación e estruturación de unidades de investigacións competitivas do SUG): ED431C 2016-037

    Genes 2020;11;6

  • A whole-genome screen identifies Salmonella enterica serovar Typhi genes involved in fluoroquinolone susceptibility.

    Turner AK, Eckert SE, Turner DJ, Yasir M, Webber MA, Charles IG, Parkhill J and Wain J

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Objectives: A whole-genome screen at sub-gene resolution was performed to identify candidate loci that contribute to enhanced or diminished ciprofloxacin susceptibility in Salmonella enterica serovar Typhi.

    Methods: A pool of over 1 million transposon insertion mutants of an S. Typhi Ty2 derivative were grown in a sub-MIC concentration of ciprofloxacin, or without ciprofloxacin. Transposon-directed insertion site sequencing (TraDIS) identified relative differences between the mutants that grew following the ciprofloxacin treatment compared with the untreated mutant pool, thereby indicating which mutations contribute to gain or loss of ciprofloxacin susceptibility.

    Results: Approximately 88% of the S. Typhi strain's 4895 annotated genes were assayed, and at least 116 were identified as contributing to gain or loss of ciprofloxacin susceptibility. Many of the identified genes are known to influence susceptibility to ciprofloxacin, thereby providing method validation. Genes were identified that were not known previously to be involved in susceptibility, and some of these had no previously known phenotype. Susceptibility to ciprofloxacin was enhanced by insertion mutations in genes coding for efflux, other surface-associated functions, DNA repair and expression regulation, including phoP, barA and marA. Insertion mutations that diminished susceptibility were predominantly in genes coding for surface polysaccharide biosynthesis and regulatory genes, including slyA, emrR, envZ and cpxR.

    Conclusions: A genomics approach has identified novel contributors to gain or loss of ciprofloxacin susceptibility in S. Typhi, expanding our understanding of the impact of fluoroquinolones on bacteria and of mechanisms that may contribute to resistance. The data also demonstrate the power of the TraDIS technology for antibacterial research.

    The Journal of antimicrobial chemotherapy 2020

  • Whole-genome sequencing of patients with rare diseases in a national health system.

    Turro E, Astle WJ, Megy K, Gräf S, Greene D, Shamardina O, Allen HL, Sanchis-Juan A, Frontini M, Thys C, Stephens J, Mapeta R, Burren OS, Downes K, Haimel M, Tuna S, Deevi SVV, Aitman TJ, Bennett DL, Calleja P, Carss K, Caulfield MJ, Chinnery PF, Dixon PH, Gale DP, James R, Koziell A, Laffan MA, Levine AP, Maher ER, Markus HS, Morales J, Morrell NW, Mumford AD, Ormondroyd E, Rankin S, Rendon A, Richardson S, Roberts I, Roy NBA, Saleem MA, Smith KGC, Stark H, Tan RYY, Themistocleous AC, Thrasher AJ, Watkins H, Webster AR, Wilkins MR, Williamson C, Whitworth J, Humphray S, Bentley DR, NIHR BioResource for the 100,000 Genomes Project, Kingston N, Walker N, Bradley JR, Ashford S, Penkett CJ, Freson K, Stirrups KE, Raymond FL and Ouwehand WH

    Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.

    Most patients with rare diseases do not receive a molecular diagnosis and the aetiological variants and causative genes for more than half such disorders remain to be discovered<sup>1</sup>. Here we used whole-genome sequencing (WGS) in a national health system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding regions of the genome. We generated WGS data for 13,037 participants, of whom 9,802 had a rare disease, and provided a genetic diagnosis to 1,138 of the 7,065 extensively phenotyped participants. We identified 95 Mendelian associations between genes and rare diseases, of which 11 have been discovered since 2015 and at least 79 are confirmed to be aetiological. By generating WGS data of UK Biobank participants<sup>2</sup>, we found that rare alleles can explain the presence of some individuals in the tails of a quantitative trait for red blood cells. Finally, we identified four novel non-coding variants that cause disease through the disruption of transcription of ARPC1B, GATA1, LRBA and MPL. Our study demonstrates a synergy by using WGS for diagnosis and aetiological discovery in routine healthcare.

    Nature 2020

  • Does Intensive Treatment Select for Praziquantel Resistance in High-Transmission Settings? Parasitological Trends and Treatment Efficacy Within a Cluster-Randomized Trial.

    Tushabe JV, Lubyayi L, Sserubanja J, Kabuubi P, Abayo E, Kiwanuka S, Nassuuna J, Kaweesa J, Corstjens P, van Dam G, Sanya RE, Ssenyonga W, Tukahebwa EM, Kabatereine NB, Elliott AM, Webb EL and LaVIISWA trial team

    Immunomodulation and Vaccines Research Programme, Medical Research Council/Uganda Virus Research Institute and London School of Hygiene and Tropical Medicine Uganda Research Unit, Entebbe, Uganda.

    Background: Praziquantel mass drug administration (MDA) is recommended in schistosomiasis-endemic areas. Animal models demonstrate <i>Schistosoma</i> parasite resistance to praziquantel after repeated exposure.

    Methods: We conducted a parasitological survey in 26 fishing communities in Uganda after 4 years of quarterly (13 communities) or annual (13 communities) praziquantel MDA, with <i>Schistosoma</i> infection detected by single-stool-sample Kato-Katz. A test of cure was done in participants who were positive on both urine circulating cathodic antigen test and 3-sample Kato-Katz. We calculated cure rates (CRs) and egg reduction rates (ERRs) based on 3-sample Kato-Katz and infection intensity using worm-specific circulating anodic antigen (CAA) in blood, comparing these between quarterly and annually treated participants.

    Results: Single-sample Kato-Katz <i>Schistosoma mansoni</i> prevalence was 22% in 1,056 quarterly treated participants and 34% in 1,030 annually treated participants (risk ratio, 0.62; 95% confidence interval [CI], 0.40 to 0.94). Among 110 test-of-cure participants, CRs were 65% and 51% in annually and quarterly treated villages, respectively (odds ratio, 0.65; 95% CI, 0.27 to 1.58); ERRs were 94% and 81% (difference, -13%; 95% CI, -48% to 2%). There was no impact of quarterly vs annual praziquantel on <i>S. mansoni</i> by CAA.

    Conclusions: In this schistosomiasis hot spot, there was little evidence of decreased praziquantel efficacy. However, in the absence of alternative therapies, there remains a need for continued vigilance of praziquantel efficacy in the MDA era.

    Open forum infectious diseases 2020;7;4;ofaa091
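
    The cure rate (CR) and egg reduction rate (ERR) reported above can be illustrated with a short sketch. It assumes the commonly used definitions: CR is the proportion of participants who were egg-positive before treatment and egg-negative afterwards, and ERR is one minus the ratio of arithmetic mean egg counts after and before treatment. The abstract does not state which means or adjustments the trial used, so the function names, the choice of arithmetic means and the toy counts are assumptions for illustration only.

```python
def _baseline_positive(pre_counts, post_counts):
    """Pair pre/post egg counts and keep participants positive at baseline."""
    if len(pre_counts) != len(post_counts):
        raise ValueError("pre and post counts must be paired")
    pairs = [(pre, post) for pre, post in zip(pre_counts, post_counts) if pre > 0]
    if not pairs:
        raise ValueError("no participants were egg-positive before treatment")
    return pairs


def cure_rate(pre_counts, post_counts):
    """Fraction of baseline-positive participants with zero eggs after treatment."""
    pairs = _baseline_positive(pre_counts, post_counts)
    cured = sum(1 for _, post in pairs if post == 0)
    return cured / len(pairs)


def egg_reduction_rate(pre_counts, post_counts):
    """1 - (arithmetic mean post-treatment count / arithmetic mean pre-treatment count)."""
    pairs = _baseline_positive(pre_counts, post_counts)
    mean_pre = sum(pre for pre, _ in pairs) / len(pairs)
    mean_post = sum(post for _, post in pairs) / len(pairs)
    return 1 - mean_post / mean_pre


# Made-up eggs-per-gram counts for four participants (not trial data).
pre = [100, 200, 0, 300]
post = [0, 50, 0, 10]
toy_cr = cure_rate(pre, post)
toy_err = egg_reduction_rate(pre, post)
```

    In the toy data three participants are positive at baseline and one is cleared, giving a CR of one third, while mean counts fall from 200 to 20, giving an ERR of 0.9. The two measures can therefore diverge: a treatment may sharply reduce egg output without fully clearing infection.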

  • Therapeutic targeting of preleukemia cells in a mouse model of NPM1 mutant acute myeloid leukemia.

    Uckelmann HJ, Kim SM, Wong EM, Hatton C, Giovinazzo H, Gadrey JY, Krivtsov AV, Rücker FG, Döhner K, McGeehan GM, Levine RL, Bullinger L, Vassiliou GS and Armstrong SA

    Department of Pediatric Oncology, Dana-Farber Cancer Institute, and Division of Hematology/Oncology, Boston, MA, USA.

    The initiating mutations that contribute to cancer development are sometimes present in premalignant cells. Whether therapies targeting these mutations can eradicate premalignant cells is unclear. Acute myeloid leukemia (AML) is an attractive system for investigating the effect of preventative treatment because this disease is often preceded by a premalignant state (clonal hematopoiesis or myelodysplastic syndrome). In <i>Npm1c/Dnmt3a</i> mutant knock-in mice, a model of AML development, leukemia is preceded by a period of extended myeloid progenitor cell proliferation and self-renewal. We found that this self-renewal can be reversed by oral administration of a small molecule (VTP-50469) that targets the MLL1-Menin chromatin complex. These preclinical results support the hypothesis that individuals at high risk of developing AML might benefit from targeted epigenetic therapy in a preventative setting.

    Funded by: Cancer Research UK; NCI NIH HHS: P01 CA066996, P30 CA008748, P50 CA206963, R01 CA176745, R01 CA204639; NIH HHS: U54 OD020355

    Science (New York, N.Y.) 2020;367;6477;586-590

  • Mechanisms generating cancer genome complexity from a single cell division error.

    Umbreit NT, Zhang CZ, Lynch LD, Blaine LJ, Cheng AM, Tourdot R, Sun L, Almubarak HF, Judge K, Mitchell TJ, Spektor A and Pellman D

    Howard Hughes Medical Institute, Chevy Chase, MD, USA.

    The chromosome breakage-fusion-bridge (BFB) cycle is a mutational process that produces gene amplification and genome instability. Signatures of BFB cycles can be observed in cancer genomes alongside chromothripsis, another catastrophic mutational phenomenon. We explain this association by elucidating a mutational cascade that is triggered by a single cell division error (chromosome bridge formation) that rapidly increases genomic complexity. We show that actomyosin forces are required for initial bridge breakage. Chromothripsis accumulates, beginning with aberrant interphase replication of bridge DNA. A subsequent burst of DNA replication in the next mitosis generates extensive DNA damage. During this second cell division, broken bridge chromosomes frequently missegregate and form micronuclei, promoting additional chromothripsis. We propose that iterations of this mutational cascade generate the continuing evolution and subclonal heterogeneity characteristic of many human cancers.

    Funded by: Cancer Research UK; Howard Hughes Medical Institute; NCI NIH HHS: K22 CA216319, R33 CA225344; NIGMS NIH HHS: R01 GM083299

    Science (New York, N.Y.) 2020;368;6488

  • Persistent and emerging pneumococcal carriage serotypes in a rural Gambian community after ten years of pneumococcal conjugate vaccine pressure.

    Usuf E, Christian B, Gladstone R, Bojang E, Jawneh K, Cox I, Jallow E, Bojang A, Greenwood B, Adegbola RA, Bentley SD, Hill PC and Roca A

    Disease Control and Elimination Theme, Medical Research Council Unit The Gambia at London School of Hygiene and Tropical Medicine, Fajara, The Gambia.

    Background: The continuing impact of pneumococcal conjugate vaccines (PCVs) in regions with high pneumococcal transmission is threatened by the persistence of vaccine serotypes (VT) and the emergence of non-vaccine serotypes (NVT).

    Methods: In 2016, we conducted a cross-sectional carriage survey (CSS5) in a community where PCV7 was first introduced in 2006 during a cluster randomised trial conducted before nationwide introduction of PCV7 (2009) and PCV13 (2011). We estimated the prevalence of PCV13 VT and NVT by age and compared these to earlier surveys before (CSS0), during (CSS1-3), and after the trial but before PCV13 (CSS4). Genomic analysis was conducted for the non-typeable pneumococci.

    Results: The prevalence of PCV13 VT carriage decreased during the 10 years between CSS0 and CSS5 across all age groups (67·6% to 13·5%, p<0.001; 59·8% to 14·4%, p<0.001; 43·1% to 17·9%, p<0.001; and 24·0% to 5·1%, p<0.001 in <2, 2-4, 5-14 and ≥15 years respectively). However, there was no difference between CSS4 and CSS5 in children ≥2 years and adults (children < 2 years, no data). The prevalence of PCV13 NVT increased between CSS0 and CSS5 for children <2 years but decreased in older children and adults. In CSS5, serotypes 3, 6A and 19F were the most common VT, and non-typeable isolates were the most common NVT. Among non-typeable isolates, 73·0% lost the ability to express a capsule. Of these, 70·8% were from a VT background.

    Conclusions: The decrease in PCV13 VT that has occurred since the introduction of PCV13 appears to have plateaued. Significant carriage of these serotypes remains in all age groups.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2020

  • Triple artemisinin-based combination therapies versus artemisinin-based combination therapies for uncomplicated Plasmodium falciparum malaria: a multicentre, open-label, randomised clinical trial.

    van der Pluijm RW, Tripura R, Hoglund RM, Pyae Phyo A, Lek D, Ul Islam A, Anvikar AR, Satpathi P, Satpathi S, Behera PK, Tripura A, Baidya S, Onyamboko M, Chau NH, Sovann Y, Suon S, Sreng S, Mao S, Oun S, Yen S, Amaratunga C, Chutasmit K, Saelow C, Runcharern R, Kaewmok W, Hoa NT, Thanh NV, Hanboonkunupakarn B, Callery JJ, Mohanty AK, Heaton J, Thant M, Gantait K, Ghosh T, Amato R, Pearson RD, Jacob CG, Gonçalves S, Mukaka M, Waithira N, Woodrow CJ, Grobusch MP, van Vugt M, Fairhurst RM, Cheah PY, Peto TJ, von Seidlein L, Dhorda M, Maude RJ, Winterberg M, Thuy-Nhien NT, Kwiatkowski DP, Imwong M, Jittamala P, Lin K, Hlaing TM, Chotivanich K, Huy R, Fanello C, Ashley E, Mayxay M, Newton PN, Hien TT, Valecha N, Smithuis F, Pukrittayakamee S, Faiz A, Miotto O, Tarning J, Day NPJ, White NJ, Dondorp AM and Tracking Resistance to Artemisinin Collaboration

    Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand; Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK; The Open University, Milton Keynes, UK.

    Background: Artemisinin and partner-drug resistance in Plasmodium falciparum are major threats to malaria control and elimination. Triple artemisinin-based combination therapies (TACTs), which combine existing co-formulated ACTs with a second partner drug that is slowly eliminated, might provide effective treatment and delay emergence of antimalarial drug resistance.

    Methods: In this multicentre, open-label, randomised trial, we recruited patients with uncomplicated P falciparum malaria at 18 hospitals and health clinics in eight countries. Eligible patients were aged 2-65 years, with acute, uncomplicated P falciparum malaria alone or mixed with non-falciparum species, and a temperature of 37·5°C or higher, or a history of fever in the past 24 h. Patients were randomly assigned (1:1) to one of two treatments using block randomisation, depending on their location: in Thailand, Cambodia, Vietnam, and Myanmar patients were assigned to either dihydroartemisinin-piperaquine or dihydroartemisinin-piperaquine plus mefloquine; at three sites in Cambodia they were assigned to either artesunate-mefloquine or dihydroartemisinin-piperaquine plus mefloquine; and in Laos, Myanmar, Bangladesh, India, and the Democratic Republic of the Congo they were assigned to either artemether-lumefantrine or artemether-lumefantrine plus amodiaquine. All drugs were administered orally and doses varied by drug combination and site. Patients were followed up weekly for 42 days. The primary endpoint was efficacy, defined by 42-day PCR-corrected adequate clinical and parasitological response. Primary analysis was by intention to treat. A detailed assessment of safety and tolerability of the study drugs was done in all patients randomly assigned to treatment. This study is registered at ClinicalTrials.gov, NCT02453308, and is complete.

    Findings: Between Aug 7, 2015, and Feb 8, 2018, 1100 patients were given either dihydroartemisinin-piperaquine (183 [17%]), dihydroartemisinin-piperaquine plus mefloquine (269 [25%]), artesunate-mefloquine (73 [7%]), artemether-lumefantrine (289 [26%]), or artemether-lumefantrine plus amodiaquine (286 [26%]). The median age was 23 years (IQR 13 to 34) and 854 (78%) of 1100 patients were male. In Cambodia, Thailand, and Vietnam the 42-day PCR-corrected efficacy after dihydroartemisinin-piperaquine plus mefloquine was 98% (149 of 152; 95% CI 94 to 100) and after dihydroartemisinin-piperaquine was 48% (67 of 141; 95% CI 39 to 56; risk difference 51%, 95% CI 42 to 59; p<0·0001). Efficacy of dihydroartemisinin-piperaquine plus mefloquine in the three sites in Myanmar was 91% (42 of 46; 95% CI 79 to 98) versus 100% (42 of 42; 95% CI 92 to 100) after dihydroartemisinin-piperaquine (risk difference 9%, 95% CI 1 to 17; p=0·12). The 42-day PCR-corrected efficacy of dihydroartemisinin-piperaquine plus mefloquine (96% [68 of 71; 95% CI 88 to 99]) was non-inferior to that of artesunate-mefloquine (95% [69 of 73; 95% CI 87 to 99]) in three sites in Cambodia (risk difference 1%; 95% CI -6 to 8; p=1·00). The overall 42-day PCR-corrected efficacy of artemether-lumefantrine plus amodiaquine (98% [281 of 286; 95% CI 97 to 99]) was similar to that of artemether-lumefantrine (97% [279 of 289; 95% CI 94 to 98]; risk difference 2%, 95% CI -1 to 4; p=0·30). Both TACTs were well tolerated, although early vomiting (within 1 h) was more frequent after dihydroartemisinin-piperaquine plus mefloquine (30 [3·8%] of 794) than after dihydroartemisinin-piperaquine (eight [1·5%] of 543; p=0·012). Vomiting after artemether-lumefantrine plus amodiaquine (22 [1·3%] of 1703) and artemether-lumefantrine (11 [0·6%] of 1721) was infrequent. Adding amodiaquine to artemether-lumefantrine extended the electrocardiogram corrected QT interval (mean increase at 52 h compared with baseline of 8·8 ms [SD 18·6] vs 0·9 ms [16·1]; p<0·01) but adding mefloquine to dihydroartemisinin-piperaquine did not (mean increase of 22·1 ms [SD 19·2] for dihydroartemisinin-piperaquine vs 20·8 ms [SD 17·8] for dihydroartemisinin-piperaquine plus mefloquine; p=0·50).

    Interpretation: Dihydroartemisinin-piperaquine plus mefloquine and artemether-lumefantrine plus amodiaquine TACTs are efficacious, well tolerated, and safe treatments of uncomplicated P falciparum malaria, including in areas with artemisinin and ACT partner-drug resistance.
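
    The headline risk difference for Cambodia, Thailand, and Vietnam can be rechecked from the counts quoted in the Findings (149 of 152 versus 67 of 141). The sketch below assumes a simple Wald interval for the difference of two proportions; the abstract does not say which interval method the trial used, and the function name is hypothetical.

```python
import math


def risk_difference(events_a, total_a, events_b, total_b, z=1.96):
    """Difference of two proportions with a Wald confidence interval (assumed method)."""
    p_a = events_a / total_a
    p_b = events_b / total_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    return diff, diff - z * se, diff + z * se


# Counts taken from the Findings paragraph:
# dihydroartemisinin-piperaquine plus mefloquine vs dihydroartemisinin-piperaquine.
diff, lower, upper = risk_difference(149, 152, 67, 141)
print(f"risk difference {diff:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

    Under this assumption the result rounds to the reported 51% (95% CI 42 to 59), although agreement after rounding does not confirm that the trial used the same method.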

    Funding: UK Department for International Development, Wellcome Trust, Bill & Melinda Gates Foundation, UK Medical Research Council, and US National Institutes of Health.

    Lancet (London, England) 2020

  • Companion canines: an under-utilised model to aid in translating anti-metastatics to the clinic.

    van der Weyden L, Starkey M, Abu-Helil B, Mutsaers AJ and Wood GA

    Experimental Cancer Genetics (T113), Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Clinical & experimental metastasis 2020;37;1;7-12

  • A Genome-Wide Screen in Mice To Identify Cell-Extrinsic Regulators of Pulmonary Metastatic Colonisation.

    van der Weyden L, Swiatkowska A, Iyer V, Speak AO and Adams DJ

    Wellcome Sanger Institute

    Metastatic colonisation, whereby a disseminated tumour cell is able to survive and proliferate at a secondary site, involves both tumour cell-intrinsic and -extrinsic factors. To identify tumour cell-extrinsic (microenvironmental) factors that regulate the ability of metastatic tumour cells to effectively colonise a tissue, we performed a genome-wide screen utilising the experimental metastasis assay on mutant mice. Mutant and wildtype (control) mice were tail vein-dosed with murine metastatic melanoma B16-F10 cells and 10 days later the number of pulmonary metastatic colonies was counted. Of the 1,300 genes/genetic locations (1,344 alleles) assessed in the screen, 34 genes were determined to significantly regulate pulmonary metastatic colonisation (15 increased and 19 decreased; <i>P</i><0.005 and genotype effect +60). Whilst several of these genes have known roles in immune system regulation (<i>Bach2, Cyba, Cybb, Cybc1, Id2, Igh-6, Irf1</i>, <i>Irf7, Ncf1, Ncf2, Ncf4</i> and <i>Pik3cg</i>) most are involved in a disparate range of biological processes, ranging from ubiquitination (<i>Herc1</i>) to diphthamide synthesis (<i>Dph6</i>) to Rho GTPase-activation (<i>Arhgap30</i> and <i>Fgd4</i>), with no previous reports of a role in the regulation of metastasis. Thus, we have identified numerous novel regulators of pulmonary metastatic colonisation, which may represent potential therapeutic targets.

    G3 (Bethesda, Md.) 2020

  • The single-cell eQTLGen consortium.

    van der Wijst MG, de Vries DH, Groot HE, Trynka G, Hon CC, Bonder MJ, Stegle O, Nawijn M, Idaghdour Y, van der Harst P, Ye CJ, Powell J, Theis FJ, Mahfouz A, Heinig M and Franke L

    Genetics, University of Groningen, University Medical Center Groningen, Groningen, Netherlands.

    In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes, many of which are obscured when using bulk methods. Rapid increase in throughput and reduction in cost per cell now allow this technology to be applied to large-scale population genetics studies. To fully leverage these emerging data resources, we have founded the single-cell eQTLGen consortium (sc-eQTLGen), aimed at pinpointing the cellular contexts in which disease-causing genetic variants affect gene expression. Here, we outline the goals, approach and potential utility of the sc-eQTLGen consortium. We also provide a set of study design considerations for future single-cell eQTL studies.

    Funded by: Dutch Research Council: NWO-Veni 192.029, ZonMW-VIDI 917.14.374; European Research Council: ERC Starting grant Immrisk 637640; National Health and Medical Research Council: Investigator grant 1175781

    eLife 2020;9

  • Inhibition of Resistance-Refractory P. falciparum Kinase PKG Delivers Prophylactic, Blood Stage, and Transmission-Blocking Antiplasmodial Activity.

    Vanaerschot M, Murithi JM, Pasaje CFA, Ghidelli-Disse S, Dwomoh L, Bird M, Spottiswoode N, Mittal N, Arendse LB, Owen ES, Wicht KJ, Siciliano G, Bösche M, Yeo T, Kumar TRS, Mok S, Carpenter EF, Giddins MJ, Sanz O, Ottilie S, Alano P, Chibale K, Llinás M, Uhlemann AC, Delves M, Tobin AB, Doerig C, Winzeler EA, Lee MCS, Niles JC and Fidock DA

    Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA.

    The search for antimalarial chemotypes with modes of action unrelated to existing drugs has intensified with the recent failure of first-line therapies across Southeast Asia. Here, we show that the trisubstituted imidazole MMV030084 potently inhibits hepatocyte invasion by Plasmodium sporozoites, merozoite egress from asexual blood stage schizonts, and male gamete exflagellation. Metabolomic, phosphoproteomic, and chemoproteomic studies, validated with conditional knockdown parasites, molecular docking, and recombinant kinase assays, identified cGMP-dependent protein kinase (PKG) as the primary target of MMV030084. PKG is known to play essential roles in Plasmodium invasion of and egress from host cells, matching MMV030084's activity profile. Resistance selections and gene editing identified tyrosine kinase-like protein 3 as a low-level resistance mediator for PKG inhibitors, while PKG itself never mutated under pressure. These studies highlight PKG as a resistance-refractory antimalarial target throughout the Plasmodium life cycle and promote MMV030084 as a promising Plasmodium PKG-targeting chemotype.

    Cell chemical biology 2020

  • Using Reactome to build an autophagy mechanism knowledgebase.

    Varusai TM, Jupe S, Sevilla C, Matthews L, Gillespie M, Stein L, Wu G, D'Eustachio P, Metzakopian E and Hermjakob H

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK.

    The 21st century has revealed much about the fundamental cellular process of autophagy. Autophagy controls the catabolism and recycling of various cellular components both as a constitutive process and as a response to stress and foreign material invasion. There is considerable knowledge of the molecular mechanisms of autophagy, and this is still growing as new modalities emerge. There is a need to investigate autophagy mechanisms reliably, comprehensively and conveniently. Reactome is a freely available knowledgebase that consists of manually curated molecular events (reactions) organized into cellular pathways. Pathways/reactions in Reactome are hierarchically structured, graphically presented and extensively annotated. Data analysis tools, such as pathway enrichment, expression data overlay and species comparison, are also available. For customized analysis, information can also be programmatically queried. Here, we discuss the curation and annotation of the molecular mechanisms of autophagy in Reactome. We also demonstrate the value that Reactome adds to research by reanalyzing a previously published work on genome-wide CRISPR screening of autophagy components.

    Abbreviations: CMA: chaperone-mediated autophagy; GO: Gene Ontology; MA: macroautophagy; MI: microautophagy; MTOR: mechanistic target of rapamycin kinase; SQSTM1: sequestosome 1.

    Autophagy 2020;1-12

  • Fine capsule variation affects bacteriophage susceptibility in Klebsiella pneumoniae ST258.

    Venturini C, Ben Zakour NL, Bowring B, Morales S, Cole R, Kovach Z, Branston S, Kettle E, Thomson N and Iredell JR

    Centre for Infectious Diseases and Microbiology, The Westmead Institute for Medical Research (WIMR), Westmead, NSW, Australia.

    Multidrug-resistant (MDR) carbapenemase-producing (CP) Klebsiella pneumoniae, belonging to clonal group CG258, is capable of causing severe disease in humans and is classified as an urgent threat by health agencies worldwide. Bacteriophages are being actively explored as therapeutic alternatives to antibiotics. In an effort to define a robust experimental approach for effective selection of lytic viruses for therapy, we have fully characterized the genomes of 18 K pneumoniae target strains and tested them against novel lytic bacteriophages (n = 65). The genomes of K pneumoniae carrying bla<sub>NDM</sub> and bla<sub>KPC</sub> were sequenced and CG258 isolates selected for bacteriophage susceptibility testing. The local K pneumoniae CG258 population was dominated by sequence type ST258 clade 1 (86%) with variations in capsular locus (cps) and prophage content. CG258-specific bacteriophages primarily targeted the capsule, but successful infection is also likely blocked in some by immunity conferred by existing prophages. Five tailed bacteriophages against K pneumoniae ST258 clade 1 were selected for further characterization. Our findings show that effective control of K pneumoniae CG258 with bacteriophage will require mixes of diverse lytic viruses targeting relevant cps variants and allowing for variable prophage content. These insights will facilitate identification and selection of therapeutic bacteriophage candidates against this serious pathogen.

    Funded by: Cancer Institute New South Wales; Ian Potter Foundation; National Health and Medical Research Council: GRP1107322; Westmead Research Hub

    FASEB journal : official publication of the Federation of American Societies for Experimental Biology 2020

  • De Novo Variants in CNOT1, a Central Component of the CCR4-NOT Complex Involved in Gene Expression and RNA and Protein Stability, Cause Neurodevelopmental Delay.

    Vissers LELM, Kalvakuri S, de Boer E, Geuer S, Oud M, van Outersterp I, Kwint M, Witmond M, Kersten S, Polla DL, Weijers D, Begtrup A, McWalter K, Ruiz A, Gabau E, Morton JEV, Griffith C, Weiss K, Gamble C, Bartley J, Vernon HJ, Brunet K, Ruivenkamp C, Kant SG, Kruszka P, Larson A, Afenjar A, Billette de Villemeur T, Nugent K, DDD Study, Raymond FL, Venselaar H, Demurger F, Soler-Alfonso C, Li D, Bhoj E, Hayes I, Hamilton NP, Ahmad A, Fisher R, van den Born M, Willems M, Sorlin A, Delanne J, Moutton S, Christophe P, Mau-Them FT, Vitobello A, Goel H, Massingham L, Phornphutkul C, Schwab J, Keren B, Charles P, Vreeburg M, De Simone L, Hoganson G, Iascone M, Milani D, Evenepoel L, Revencu N, Ward DI, Burns K, Krantz I, Raible SE, Murrell JR, Wood K, Cho MT, van Bokhoven H, Muenke M, Kleefstra T, Bodmer R and de Brouwer APM

    Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, PO Box 9101, 6500 HB Nijmegen, the Netherlands.

    CNOT1 is a member of the CCR4-NOT complex, which is a master regulator, orchestrating gene expression, RNA deadenylation, and protein ubiquitination. We report on 39 individuals with heterozygous de novo CNOT1 variants, including missense, splice site, and nonsense variants, who present with a clinical spectrum of intellectual disability, motor delay, speech delay, seizures, hypotonia, and behavioral problems. To link CNOT1 dysfunction to the neurodevelopmental phenotype observed, we generated variant-specific Drosophila models, which showed learning and memory defects upon CNOT1 knockdown. Introduction of human wild-type CNOT1 was able to rescue this phenotype, whereas mutants could not or only partially, supporting our hypothesis that CNOT1 impairment results in neurodevelopmental delay. Furthermore, the genetic interaction with autism-spectrum genes, such as ASH1L, DYRK1A, MED13, and SHANK3, was impaired in our Drosophila models. Molecular characterization of CNOT1 variants revealed normal CNOT1 expression levels, with both mutant and wild-type alleles expressed at similar levels. Analysis of protein-protein interactions with other members indicated that the CCR4-NOT complex remained intact. An integrated omics approach of patient-derived genomics and transcriptomics data suggested only minimal effects on endonucleolytic nonsense-mediated mRNA decay components, suggesting that de novo CNOT1 variants are likely haploinsufficient hypomorph or neomorph, rather than dominant negative. In summary, we provide strong evidence that de novo CNOT1 variants cause neurodevelopmental delay with a wide range of additional co-morbidities. Whereas the underlying pathophysiological mechanism warrants further analysis, our data demonstrate an essential and central role of the CCR4-NOT complex in human brain development.

    American journal of human genetics 2020

  • Willingness to donate genomic and other medical data: results from Germany.

    Voigt TH, Holtz V, Niemiec E, Howard HC, Middleton A and Prainsack B

    Institute of Sociology, RWTH Aachen University, Aachen, Germany.

    This paper reports findings from Germany-based participants in the "Your DNA, Your Say" study, a collaborative effort among researchers in more than 20 countries across the world to explore public attitudes, values and opinions towards willingness to donate genomic and other personal data for use by others. Based on a representative sample of German residents (n = 1506) who completed the German-language version of the survey, we found that views of genetic exceptionalism were less prevalent in the German-language arm of the study than in the English-language arm (43% versus 52%). Also, people's willingness to make their data available for research was lower in the German than in the English-language samples of the study (56% versus 67%). In the German sample, those who were more familiar with genetics, and those holding views of genetic exceptionalism were more likely to be willing to donate data than others. We explain these findings with reference to the important role that the "right of informational self-determination" plays in German public discourse. Rather than being a particularly strict interpretation of privacy in the sense of a right to be left alone, the German understanding of informational self-determination bestows on each citizen the responsibility to carefully consider how their personal data should be used to protect important rights and to serve the public good.

    Funded by: King's College London: n/a

    European journal of human genetics : EJHG 2020

  • Mutational signatures are jointly shaped by DNA damage and repair.

    Volkova NV, Meier B, González-Huici V, Bertolini S, Gonzalez S, Vöhringer H, Abascal F, Martincorena I, Campbell PJ, Gartner A and Gerstung M

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK.

    Cells possess an armamentarium of DNA repair pathways to counter DNA damage and prevent mutation. Here we use C. elegans whole genome sequencing to systematically quantify the contributions of these factors to mutational signatures. We analyse 2,717 genomes from wild-type and 53 DNA repair defective backgrounds, exposed to 11 genotoxins, including UV-B and ionizing radiation, alkylating compounds, aristolochic acid, aflatoxin B1, and cisplatin. Combined genotoxic exposure and DNA repair deficiency alters mutation rates or signatures in 41% of experiments, revealing how different DNA alterations induced by the same genotoxin are mended by separate repair pathways. Error-prone translesion synthesis causes the majority of genotoxin-induced base substitutions, but averts larger deletions. Nucleotide excision repair prevents up to 99% of point mutations, almost uniformly across the mutation spectrum. Our data show that mutational signatures are joint products of DNA damage and repair and suggest that multiple factors underlie signatures observed in cancer genomes.

    Funded by: Cancer Research UK (CRUK): C11852/A14695; Worldwide Cancer Research: 18-0644

    Nature communications 2020;11;1;2169

  • Cytogenetic investigations in Emballonuroidea. I. Taphozoinae and Emballonurinae karyotypes evolve at different rates and share no derived chromosomal characters.

    Volleth M, Muller S, Khan FAA, Yong H-S, Heller K-G, Baker RJ, Ray DA and Sotero-Caio CG

    We present a comparative molecular cytogenetic investigation of six emballonurid species using chromosome banding and cross-species chromosome painting with probes from Myotis myotis, supplemented by selected probes from human, tree shrew, and lemur. The main differences between the 2n = 42 Taphozous karyotype and the 2n = 44 chromosomal complement of Saccolaimus can be explained by one Robertsonian fusion, one type-b, and one type-c whole arm reciprocal translocation. The 2n = 24 karyotype of Emballonura is highly derived by splitting of 11 of the 25 chiropteran evolutionarily conserved units resulting in a total number of 36 segments. In contrast to the presence of several autapomorphies in the karyotypes of studied species from both subfamilies, no cytogenetic synapomorphy uniting Taphozoinae and Emballonurinae was found.

    Acta Chiropterologica 2020;21;257-269

  • Loss of ADAMTS19 causes progressive non-syndromic heart valve disease.

    Wünnemann F, Ta-Shma A, Preuss C, Leclerc S, van Vliet PP, Oneglia A, Thibeault M, Nordquist E, Lincoln J, Scharfenberg F, Becker-Pauly C, Hofmann P, Hoff K, Audain E, Kramer HH, Makalowski W, Nir A, Gerety SS, Hurles M, Comes J, Fournier A, Osinska H, Robins J, Pucéat M, MIBAVA Leducq Consortium principal investigators, Elpeleg O, Hitz MP and Andelfinger G

    Cardiovascular Genetics, Department of Pediatrics, Centre Hospitalier Universitaire Sainte-Justine Research Centre, University of Montreal, Montreal, Quebec, Canada.

    Valvular heart disease is observed in approximately 2% of the general population<sup>1</sup>. Although the initial observation is often localized (for example, to the aortic or mitral valve), disease manifestations are regularly observed in the other valves and patients frequently require surgery. Despite the high frequency of heart valve disease, only a handful of genes have so far been identified as the monogenic causes of disease<sup>2-7</sup>. Here we identify two consanguineous families, each with two affected family members presenting with progressive heart valve disease early in life. Whole-exome sequencing revealed homozygous, truncating nonsense alleles in ADAMTS19 in all four affected individuals. Homozygous knockout mice for Adamts19 show aortic valve dysfunction, recapitulating aspects of the human phenotype. Expression analysis using a lacZ reporter and single-cell RNA sequencing highlight Adamts19 as a novel marker for valvular interstitial cells; inference of gene regulatory networks in valvular interstitial cells positions Adamts19 in a highly discriminatory network driven by the transcription factor lymphoid enhancer-binding factor 1 downstream of the Wnt signaling pathway. Upregulation of endocardial Krüppel-like factor 2 in Adamts19 knockout mice precedes hemodynamic perturbation, showing that a tight balance in the Wnt-Adamts19-Klf2 axis is required for proper valve maturation and maintenance.

    Funded by: Deutsche Forschungsgemeinschaft (German Research Foundation): DFG-HI 1579/2-1; Fondation Leducq: 12CVD03; Fonds de Recherche du Québec - Santé (Fonds de la recherche en sante du Quebec): 27335; Heart and Stroke Foundation of Canada (Heart and Stroke Foundation): G-17-0019170; NHLBI NIH HHS: T32 HL134616; Wellcome Trust (Wellcome): WT098051

    Nature genetics 2020;52;1;40-47

  • Wikidata as a knowledge graph for the life sciences.

    Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith OL, Hanspers K, Hermjakob H, Hudson TS, Hybiske K, Keating SM, Manske M, Mayers M, Mietchen D, Mitraka E, Pico AR, Putman T, Riutta A, Queralt-Rosinach N, Schriml LM, Shafee T, Slenter D, Stephan R, Thornton K, Tsueng G, Tu R, Ul-Hasan S, Willighagen E, Wu C and Su AI

    Micelio, Antwerpen, Belgium.

    Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.

    Funded by: NCATS NIH HHS: UL1 TR002550; NCI NIH HHS: U24CA237719; NHGRI NIH HHS: R00HG007940; NIAID NIH HHS: R01 AI126785; NIGMS NIH HHS: R01 GM089820, R01 GM100039, U54 GM114833; V Foundation for Cancer Research: V2018-007

    eLife 2020;9

  • ctDNA monitoring using patient-specific sequencing and integration of variant reads.

    Wan JCM, Heider K, Gale D, Murphy S, Fisher E, Mouliere F, Ruiz-Valdepenas A, Santonja A, Morris J, Chandrananda D, Marshall A, Gill AB, Chan PY, Barker E, Young G, Cooper WN, Hudecova I, Marass F, Mair R, Brindle KM, Stewart GD, Abraham JE, Caldas C, Rassl DM, Rintoul RC, Alifrangis C, Middleton MR, Gallagher FA, Parkinson C, Durrani A, McDermott U, Smith CG, Massie C, Corrie PG and Rosenfeld N

    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.

    Circulating tumor-derived DNA (ctDNA) can be used to monitor cancer dynamics noninvasively. Detection of ctDNA can be challenging in patients with low-volume or residual disease, where plasma contains very few tumor-derived DNA fragments. We show that sensitivity for ctDNA detection in plasma can be improved by analyzing hundreds to thousands of mutations that are first identified by tumor genotyping. We describe the INtegration of VAriant Reads (INVAR) pipeline, which combines custom error-suppression methods and signal-enrichment approaches based on biological features of ctDNA. With this approach, the detection limit in each sample can be estimated independently based on the number of informative reads sequenced across multiple patient-specific loci. We applied INVAR to custom hybrid-capture sequencing data from 176 plasma samples from 105 patients with melanoma, lung, renal, glioma, and breast cancer across both early and advanced disease. By integrating signal across a median of >10<sup>5</sup> informative reads, ctDNA was routinely quantified to 1 mutant molecule per 100,000, and in some cases with high tumor mutation burden and/or plasma input material, to parts per million. This resulted in median area under the curve (AUC) values of 0.98 in advanced cancers and 0.80 in early-stage and challenging settings for ctDNA detection. We generalized this method to whole-exome and whole-genome sequencing, showing that INVAR may be applied without requiring personalized sequencing panels so long as a tumor mutation list is available. As tumor sequencing becomes increasingly performed, such methods for personalized cancer monitoring may enhance the sensitivity of cancer liquid biopsies.
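
    The link between informative reads and detection limit described above can be illustrated with simple arithmetic. This is a deliberately simplified sketch, not the INVAR pipeline: it assumes that the lowest quantifiable mutant fraction is roughly one mutant read among all informative reads, and that mutant reads are sampled independently (binomial). Both function names are hypothetical.

```python
def detection_limit(informative_reads):
    """Approximate lowest detectable mutant fraction: one mutant read in N informative reads."""
    if informative_reads <= 0:
        raise ValueError("informative_reads must be positive")
    return 1.0 / informative_reads


def prob_at_least_one_mutant_read(mutant_fraction, informative_reads):
    """Binomial probability of sampling one or more mutant reads (independence assumed)."""
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    return 1.0 - (1.0 - mutant_fraction) ** informative_reads


# 10^5 informative reads, the order of magnitude quoted in the abstract.
reads = 100_000
limit = detection_limit(reads)
print(f"approximate detection limit: 1 in {round(1 / limit):,}")
print(f"P(at least one mutant read at that fraction): "
      f"{prob_at_least_one_mutant_read(limit, reads):.2f}")
```

    The sketch ignores sequencing error and the error-suppression and signal-enrichment steps that the abstract attributes to INVAR, so it only conveys why more informative reads push the detection limit lower.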

    Science translational medicine 2020;12;548
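
    The abstract above states that the detection limit of each sample can be estimated from the number of informative reads sequenced across patient-specific loci. The sketch below illustrates that arithmetic only. It is not the INVAR pipeline: it assumes, as a simplification, that informative reads are roughly the number of loci multiplied by the mean unique depth, and that the lowest detectable mutant fraction is about one mutant read among them. The function names and the example values are hypothetical.

```python
def informative_reads(n_loci: int, mean_unique_depth: float) -> float:
    """Approximate informative reads as loci x mean unique depth (assumption)."""
    if n_loci < 0 or mean_unique_depth < 0:
        raise ValueError("inputs must be non-negative")
    return n_loci * mean_unique_depth


def approx_detection_limit(n_informative_reads: float) -> float:
    """Rough lower bound on detectable mutant fraction: one read in N (assumption)."""
    if n_informative_reads <= 0:
        raise ValueError("need at least one informative read")
    return 1.0 / n_informative_reads


# Hypothetical sample: 1,000 patient-specific loci at 100x unique depth.
reads = informative_reads(1000, 100)
limit = approx_detection_limit(reads)
print(f"informative reads: {reads:.0f}, approximate limit: {limit:.1e}")
```

    Under these assumptions, on the order of 10<sup>5</sup> informative reads corresponds to a limit near 1 mutant molecule per 100,000, which is consistent with the figures quoted in the abstract.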

  • Adult Human Glioblastomas Harbor Radial Glia-like Cells.

    Wang R, Sharma R, Shen X, Laughney AM, Funato K, Clark PJ, Shpokayte M, Morgenstern P, Navare M, Xu Y, Harbi S, Masilionis I, Nanjangud G, Yang Y, Duran-Rehbein G, Hemberg M, Pe'er D and Tabar V

    Department of Neurosurgery, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.

    Radial glia (RG) cells are the first neural stem cells to appear during embryonic development. Adult human glioblastomas harbor a subpopulation of RG-like cells with typical RG morphology and markers. The cells exhibit the classic and unique mitotic behavior of normal RG in a cell-autonomous manner. Single-cell RNA sequencing analyses of glioblastoma cells reveal transcriptionally dynamic clusters of RG-like cells that share the profiles of normal human fetal radial glia and that reside in quiescent and cycling states. Functional assays show a role for interleukin in triggering exit from dormancy into active cycling, suggesting a role for inflammation in tumor progression. These data are consistent with the possibility of persistence of RG into adulthood and their involvement in tumor initiation or maintenance. They also provide a putative cellular basis for the persistence of normal developmental programs in adult tumors.

    Stem cell reports 2020;14;2;338-350

  • Phylogenomics of expanding uncultured environmental Tenericutes provides insights into their pathogenicity and evolutionary relationship with Bacilli.

    Wang Y, Huang JM, Zhou YL, Almeida A, Finn RD, Danchin A and He LS

    Institute of Deep Sea Science and Engineering, Chinese Academy of Sciences, No. 28, Luhuitou Road, Sanya, Hai Nan, P.R. China.

    Background: The metabolic capacity, stress response and evolution of uncultured environmental Tenericutes have remained elusive, since previous studies have been largely focused on pathogenic species. In this study, we expanded analyses on Tenericutes lineages that inhabit various environments using a collection of 840 genomes.

    Results: Several environmental lineages were discovered inhabiting the human gut, ground water, bioreactors and hypersaline lake and spanning the Haloplasmatales and Mycoplasmatales orders. A phylogenomics analysis of Bacilli and Tenericutes genomes revealed that some uncultured Tenericutes are affiliated with novel clades in Bacilli, such as RF39, RFN20 and ML615. Erysipelotrichales and two major gut lineages, RF39 and RFN20, were found to be neighboring clades of Mycoplasmatales. We detected habitat-specific functional patterns between the pathogenic, gut and the environmental Tenericutes, where genes involved in carbohydrate storage, carbon fixation, mutation repair, environmental response and amino acid cleavage are overrepresented in the genomes of environmental lineages, perhaps as a result of environmental adaptation. We hypothesize that the two major gut lineages, namely RF39 and RFN20, are probably acetate and hydrogen producers. Furthermore, deteriorating capacity of bactoprenol synthesis for cell wall peptidoglycan precursors secretion is a potential adaptive strategy employed by these lineages in response to the gut environment.

    Conclusions: This study uncovers the characteristic functions of environmental Tenericutes and their relationships with Bacilli, which sheds new light onto the pathogenicity and evolutionary processes of Mycoplasmatales.

    Funded by: the National Key Research and Development Program of China: 2016YFC0302504 and 2018YFC0310005

    BMC genomics 2020;21;1;408

  • Transcriptome of the parasitic flatworm Schistosoma mansoni during intra-mammalian development.

    Wangwiwatsin A, Protasio AV, Wilson S, Owusu C, Holroyd NE, Sanders MJ, Keane J, Doenhoff MJ, Rinaldi G and Berriman M

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    Schistosomes are parasitic blood flukes that survive for many years within the mammalian host vasculature. How the parasites establish a chronic infection in the hostile bloodstream environment, whilst evading the host immune response is poorly understood. The parasite develops morphologically and grows as it migrates to its preferred vascular niche, avoiding or repairing damage from the host immune system. In this study, we investigated temporal changes in gene expression during the intra-mammalian development of Schistosoma mansoni. RNA-seq data were analysed from parasites developing in the lung through to egg-laying mature adult worms, providing a comprehensive picture of in vivo intra-mammalian development. Remarkably, genes involved in signalling pathways, developmental control, and adaptation to oxidative stress were up-regulated in the lung stage. The data also suggested a potential role in immune evasion for a previously uncharacterised gene. This study not only provides a large and comprehensive data resource for the research community, but also reveals new directions for further characterising host-parasite interactions that could ultimately lead to new control strategies for this neglected tropical disease pathogen.

    PLoS neglected tropical diseases 2020;14;5;e0007743

  • An improved pig reference genome sequence to enable pig genetics and genomics research.

    Warr A, Affara N, Aken B, Beiki H, Bickhart DM, Billis K, Chow W, Eory L, Finlayson HA, Flicek P, Girón CG, Griffin DK, Hall R, Hannum G, Hourlier T, Howe K, Hume DA, Izuogu O, Kim K, Koren S, Liu H, Manchanda N, Martin FJ, Nonneman DJ, O'Connor RE, Phillippy AM, Rohrer GA, Rosen BD, Rund LA, Sargent CA, Schook LB, Schroeder SG, Schwartz AS, Skinner BM, Talbot R, Tseng E, Tuggle CK, Watson M, Smith TPL and Archibald AL

    The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK.

    Background: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility.

    Results: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2.

    Conclusions: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.

    GigaScience 2020;9;6

  • Association of aberrant ASNS imprinting with asparaginase sensitivity and chromosomal abnormality in childhood BCP-ALL.

    Watanabe A, Miyake K, Nordlund J, Syvänen AC, van der Weyden L, Honda H, Yamasaki N, Nagamachi A, Inaba T, Ikawa T, Urayama KY, Kiyokawa N, Ohara A, Kimura S, Kubota Y, Takita J, Goto H, Sakaguchi K, Minegishi M, Iwamoto S, Shinohara T, Kagami K, Abe M, Akahane K, Goi K, Sugita K and Inukai T

    University of Yamanashi, School of Medicine, Chuo-city, Japan.

    Karyotype is an important prognostic factor in childhood B-cell precursor acute lymphoblastic leukemia (BCP-ALL), but the underlying pharmacogenomics remains unknown. Asparaginase is an integral component in current chemotherapy for childhood BCP-ALL. Asparaginase therapy depletes serum asparagine. Normal hematopoietic cells can produce asparagine by asparagine synthetase (ASNS) activity, while ALL cells are unable to synthesize adequate amounts of asparagine. The ASNS gene has a typical CpG island in its promoter. Thus, methylation of the ASNS CpG island could be one of the epigenetic mechanisms for ASNS gene silencing in BCP-ALL. To gain deep insights into the pharmacogenomics of asparaginase therapy, we investigated the association of ASNS methylation status with asparaginase sensitivity. ASNS CpG island is largely unmethylated in normal hematopoietic cells but is allele-specifically methylated in BCP-ALL cells. The ASNS gene is located at 7q21, an evolutionally conserved imprinted gene cluster. ASNS methylation in childhood BCP-ALL is associated with an aberrant methylation of the imprinted gene cluster at 7q21. Aberrant methylation of mouse Asns and a syntenic imprinted gene cluster is also confirmed in leukemic spleen samples from ETV6-RUNX1 knock-in mice. In three childhood BCP-ALL cohorts, ASNS is highly methylated in BCP-ALL with favorable karyotypes but is mostly unmethylated in BCP-ALL with poor prognostic karyotypes. Higher ASNS methylation is associated with higher l-asparaginase sensitivity in BCP-ALL through lower ASNS gene and protein expression levels. These observations demonstrate that silencing of the ASNS gene due to aberrant imprinting is a pharmacogenetic mechanism for the leukemia-specific activity of asparaginase therapy in BCP-ALL.

    Blood 2020

  • The dental proteome of Homo antecessor.

    Welker F, Ramos-Madrigal J, Gutenbrunner P, Mackie M, Tiwary S, Rakownikow Jersie-Christensen R, Chiva C, Dickinson MR, Kuhlwilm M, de Manuel M, Gelabert P, Martinón-Torres M, Margvelashvili A, Arsuaga JL, Carbonell E, Marques-Bonet T, Penkman K, Sabidó E, Cox J, Olsen JV, Lordkipanidze D, Racimo F, Lalueza-Fox C, Bermúdez de Castro JM, Willerslev E and Cappellini E

    Evolutionary Genomics Section, Globe Institute, University of Copenhagen, Copenhagen, Denmark.

    The phylogenetic relationships between hominins of the Early Pleistocene epoch in Eurasia, such as Homo antecessor, and hominins that appear later in the fossil record during the Middle Pleistocene epoch, such as Homo sapiens, are highly debated<sup>1-5</sup>. For the oldest remains, the molecular study of these relationships is hindered by the degradation of ancient DNA. However, recent research has demonstrated that the analysis of ancient proteins can address this challenge<sup>6-8</sup>. Here we present the dental enamel proteomes of H. antecessor from Atapuerca (Spain)<sup>9,10</sup> and Homo erectus from Dmanisi (Georgia)<sup>1</sup>, two key fossil assemblages that have a central role in models of Pleistocene hominin morphology, dispersal and divergence. We provide evidence that H. antecessor is a close sister lineage to subsequent Middle and Late Pleistocene hominins, including modern humans, Neanderthals and Denisovans. This placement implies that the modern-like face of H. antecessor (that is, similar to that of modern humans) may have a considerably deep ancestry in the genus Homo, and that the cranial morphology of Neanderthals represents a derived form. By recovering AMELY-specific peptide sequences, we also conclude that the H. antecessor molar fragment from Atapuerca that we analysed belonged to a male individual. Finally, these H. antecessor and H. erectus fossils preserve evidence of enamel proteome phosphorylation and proteolytic digestion that occurred in vivo during tooth formation. Our results provide important insights into the evolutionary relationships between H. antecessor and other hominin groups, and pave the way for future studies using enamel proteomes to investigate hominin biology across the existence of the genus Homo.

    Funded by: Wellcome Trust

    Nature 2020;580;7802;235-238

  • Obstacles to detecting isoforms using full-length scRNA-seq data.

    Westoby J, Artemov P, Hemberg M and Ferguson-Smith A

    Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK.

    Background: Early single-cell RNA-seq (scRNA-seq) studies suggested that it was unusual to see more than one isoform being produced from a gene in a single cell, even when multiple isoforms were detected in matched bulk RNA-seq samples. However, these studies generally did not consider the impact of dropouts or isoform quantification errors, potentially confounding the results of these analyses.

    Results: In this study, we take a simulation based approach in which we explicitly account for dropouts and isoform quantification errors. We use our simulations to ask to what extent it is possible to study alternative splicing using scRNA-seq. Additionally, we ask what limitations must be overcome to make splicing analysis feasible. We find that the high rate of dropouts associated with scRNA-seq is a major obstacle to studying alternative splicing. In mice and other well-established model organisms, the relatively low rate of isoform quantification errors poses a lesser obstacle to splicing analysis. We find that different models of isoform choice meaningfully change our simulation results.

    Conclusions: To accurately study alternative splicing with single-cell RNA-seq, a better understanding of isoform choice and the errors associated with scRNA-seq is required. An increase in the capture efficiency of scRNA-seq would also be beneficial. Until some or all of the above are achieved, we do not recommend attempting to resolve isoforms in individual cells using scRNA-seq.

    Genome biology 2020;21;1;74

  • Lean, mean, learning machines.

    Wheeler NE, Sánchez-Busó L, Argimón S and Jeffrey B

    Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Nature reviews. Microbiology 2020

  • Cell Atlas technologies and insights into tissue architecture.

    Wilbrey-Clark A, Roberts K and Teichmann SA

    Wellcome Sanger Institute, Cambridge, U.K.

    Since Robert Hooke first described the existence of 'cells' in 1665, scientists have sought to identify and further characterise these fundamental units of life. While our understanding of cell location, morphology and function has expanded greatly, our understanding of cell types and states at the molecular level, and how these function within tissue architecture, is still limited. A greater understanding of our cells could revolutionise basic biology and medicine. Atlasing initiatives like the Human Cell Atlas aim to identify all cell types at the molecular level, including their physical locations, and to make this reference data openly available to the scientific community. This is made possible by a recent technology revolution, both in single-cell molecular profiling, particularly single-cell RNA sequencing, and in spatially resolved methods for assessing gene and protein expression. Here, we review available and upcoming atlasing technologies, the biological insights gained to date and the promise of this field for the future.

    The Biochemical journal 2020;477;8;1427-1442

  • Genome-Wide Association Study of Cryptosporidiosis in Infants Implicates PRKCA.

    Wojcik GL, Korpe P, Marie C, Mentzer AJ, Carstensen T, Mychaleckyj J, Kirkpatrick BD, Rich SS, Concannon P, Faruque ASG, Haque R, Petri WA and Duggal P

    Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA.

    Diarrhea is a major cause of both morbidity and mortality worldwide, especially among young children. Cryptosporidiosis is a leading cause of diarrhea in children, particularly in South Asia and sub-Saharan Africa, where it is responsible for over 200,000 deaths per year. Beyond the initial clinical presentation of diarrhea, it is associated with long-term sequelae such as malnutrition and neurocognitive developmental deficits. Risk factors include poverty and overcrowding, and yet not all children with these risk factors and exposure are infected, nor do all infected children develop symptomatic disease. One potential risk factor to explain these differences is their human genome. To identify genetic variants associated with symptomatic cryptosporidiosis, we conducted a genome-wide association study (GWAS) examining 6.5 million single nucleotide polymorphisms (SNPs) in 873 children from three independent cohorts in Dhaka, Bangladesh, namely, the Dhaka Birth Cohort (DBC), the Performance of Rotavirus and Oral Polio Vaccines in Developing Countries (PROVIDE) study, and the Cryptosporidiosis Birth Cohort (CBC). Associations were estimated separately for each cohort under an additive model, adjusting for length-for-age Z-score at 12 months of age, the first two principal components to account for population substructure, and genotyping batch. The strongest meta-analytic association was with rs58296998 (<i>P</i> = 3.73 × 10<sup>-8</sup>), an intronic SNP and expression quantitative trait locus (eQTL) of protein kinase C alpha (<i>PRKCA</i>). Each additional risk allele conferred 2.4 times the odds of <i>Cryptosporidium</i>-associated diarrhea in the first year of life. This genetic association suggests a role for protein kinase C alpha in pediatric cryptosporidiosis and warrants further investigation.

    <b>IMPORTANCE</b> Globally, diarrhea remains one of the major causes of pediatric morbidity and mortality. The initial symptoms of diarrhea can often lead to long-term consequences for the health of young children, such as malnutrition and neurocognitive developmental deficits. Despite many children having similar exposures to infectious causes of diarrhea, not all develop symptomatic disease, indicating a possible role for human genetic variation. Here, we conducted a genetic study of susceptibility to symptomatic disease associated with <i>Cryptosporidium</i> infection (a leading cause of diarrhea) in three independent cohorts of infants from Dhaka, Bangladesh. We identified a genetic variant within protein kinase C alpha (<i>PRKCA</i>) associated with higher risk of cryptosporidiosis in the first year of life. These results indicate a role for human genetics in susceptibility to cryptosporidiosis and warrant further research to elucidate the mechanism.

    mBio 2020;11;1
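
    The abstract above reports that, under an additive model, each additional risk allele conferred 2.4 times the odds of disease. The sketch below works through what that implies, assuming the usual reading of an additive model in logistic regression (genotype coded as 0, 1 or 2 risk alleles, so the odds ratio multiplies per allele). The baseline risk of 20% used in the example is hypothetical and is not taken from the study.

```python
PER_ALLELE_ODDS_RATIO = 2.4  # value reported in the abstract


def odds_ratio_for_genotype(n_risk_alleles: int) -> float:
    """Odds ratio relative to zero risk alleles under a log-additive model."""
    if n_risk_alleles not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 risk alleles")
    return PER_ALLELE_ODDS_RATIO ** n_risk_alleles


def risk_given_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Convert a baseline probability and an odds ratio into a probability."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    odds = baseline_risk / (1.0 - baseline_risk) * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical baseline risk for children with no risk alleles.
baseline = 0.20
for n in (0, 1, 2):
    ratio = odds_ratio_for_genotype(n)
    risk = risk_given_odds_ratio(baseline, ratio)
    print(f"{n} risk allele(s): OR = {ratio:.2f}, risk = {risk:.3f}")
```

    Note that an odds ratio is not a risk ratio: with two risk alleles the odds are multiplied by 2.4 squared, but the resulting probability rises by a smaller factor.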

  • Genomic surveillance for hypervirulence and multi-drug resistance in invasive Klebsiella pneumoniae from South and Southeast Asia.

    Wyres KL, Nguyen TNT, Lam MMC, Judd LM, van Vinh Chau N, Dance DAB, Ip M, Karkey A, Ling CL, Miliya T, Newton PN, Lan NPH, Sengduangphachanh A, Turner P, Veeraraghavan B, Vinh PV, Vongsouvath M, Thomson NR, Baker S and Holt KE

    Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, 3004, Australia.

    Background: Klebsiella pneumoniae is a leading cause of bloodstream infection (BSI). Strains producing extended-spectrum beta-lactamases (ESBLs) or carbapenemases are considered global priority pathogens for which new treatment and prevention strategies are urgently required, due to severely limited therapeutic options. South and Southeast Asia are major hubs for antimicrobial-resistant (AMR) K. pneumoniae and also for the characteristically antimicrobial-sensitive, community-acquired "hypervirulent" strains. The emergence of hypervirulent AMR strains and lack of data on exopolysaccharide diversity pose a challenge for K. pneumoniae BSI control strategies worldwide.

    Methods: We conducted a retrospective genomic epidemiology study of 365 BSI K. pneumoniae from seven major healthcare facilities across South and Southeast Asia, extracting clinically relevant information (AMR, virulence, K and O antigen loci) using Kleborate, a K. pneumoniae-specific genomic typing tool.

    Results: K. pneumoniae BSI isolates were highly diverse, comprising 120 multi-locus sequence types (STs) and 63 K-loci. ESBL and carbapenemase gene frequencies were 47% and 17%, respectively. The aerobactin synthesis locus (iuc), associated with hypervirulence, was detected in 28% of isolates. Importantly, 7% of isolates harboured iuc plus ESBL and/or carbapenemase genes. The latter represent genotypic AMR-virulence convergence, which is generally considered a rare phenomenon but was particularly common among South Asian BSI (17%). Of greatest concern, we identified seven novel plasmids carrying both iuc and AMR genes, raising the prospect of co-transfer of these phenotypes among K. pneumoniae.

    Conclusions: K. pneumoniae BSI in South and Southeast Asia are caused by different STs from those predominating in other regions, and with higher frequency of acquired virulence determinants. K. pneumoniae carrying both iuc and AMR genes were also detected at higher rates than have been reported elsewhere. The study demonstrates how genomics-based surveillance, reporting full molecular profiles including STs, AMR, virulence and serotype locus information, can help standardise comparisons between sites and identify regional differences in pathogen populations.

    Funded by: Wellcome Trust: #206194

    Genome medicine 2020;12;1;11

  • Butler enables rapid cloud-based analysis of thousands of human genomes.

    Yakneen S, Waszak SM, PCAWG Technical Working Group, Gertz M, Korbel JO and PCAWG Consortium

    European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany.

    We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.

    Funded by: EC | European Research Council (ERC): 336045; European Commission (EC): 739563; European Molecular Biology Organization (EMBO): ALTF 755-2014

    Nature biotechnology 2020;38;3;288-292

  • Stratification and prediction of drug synergy based on target functional similarity.

    Yang M, Jaaks P, Dry J, Garnett M, Menden MP and Saez-Rodriguez J

    Heidelberg University, Faculty of Biosciences, Germany.

    Drug combinations can expand therapeutic options and address cancer's resistance. However, the combinatorial space is enormous, precluding its systematic exploration. Therefore, synergy prediction strategies are essential. We here present an approach to prioritise drug combinations in high-throughput screens and to stratify synergistic responses. At the core of our approach is the observation that the likelihood of synergy increases when targeting proteins with either strong functional similarity or dissimilarity. We estimate the similarity applying a multitask machine learning approach to basal gene expression and response to single drugs. We tested 7 protein target pairs (representing 29 combinations) and predicted their synergies in 33 breast cancer cell lines. In addition, we experimentally validated predicted synergy of the BRAF/insulin receptor combination (Dabrafenib/BMS-754807) in 48 colorectal cancer cell lines. We anticipate that our approaches can be used for prioritization of drug combinations in large scale screenings, and to maximize the efficacy of drugs already known to induce synergy, ultimately enabling patient stratification.

    NPJ systems biology and applications 2020;6;1;16

  • Reply to: 'Browning capabilities of human primary adipose-derived stromal cells compared to SGBS cells'.

    Yeo CR, Agrawal M, Hoon S, Shabbir A, Shrivastava MK, Huang S, Khoo CM, Chhay V, Shabeer M, Shyong Tai E, Vidal-Puig A and Toh SA

    Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, 117599, Singapore, Singapore.

    Scientific reports 2020;10;1;9634

  • The Antimalarial Natural Product Salinipostin A Identifies Essential α/β Serine Hydrolases Involved in Lipid Metabolism in P. falciparum Parasites.

    Yoo E, Schulze CJ, Stokes BH, Onguka O, Yeo T, Mok S, Gnädig NF, Zhou Y, Kurita K, Foe IT, Terrell SM, Boucher MJ, Cieplak P, Kumpornsin K, Lee MCS, Linington RG, Long JZ, Uhlemann AC, Weerapana E, Fidock DA and Bogyo M

    Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA.

    Salinipostin A (Sal A) is a potent antiplasmodial marine natural product with an undefined mechanism of action. Using a Sal A-derived activity-based probe, we identify its targets in the Plasmodium falciparum parasite. All of the identified proteins contain α/β serine hydrolase domains and several are essential for parasite growth. One of the essential targets displays a high degree of homology to human monoacylglycerol lipase (MAGL) and is able to process lipid esters including a MAGL acylglyceride substrate. This Sal A target is inhibited by the anti-obesity drug Orlistat, which disrupts lipid metabolism. Resistance selections yielded parasites that showed only minor reductions in sensitivity and that acquired mutations in a PRELI domain-containing protein linked to drug resistance in Toxoplasma gondii. This inability to evolve efficient resistance mechanisms combined with the non-essentiality of human homologs makes the serine hydrolases identified here promising antimalarial targets.

    Cell chemical biology 2020;27;2;143-157.e5

  • Tobacco smoking and somatic mutations in human bronchial epithelium.

    Yoshida K, Gowers KHC, Lee-Six H, Chandrasekharan DP, Coorens T, Maughan EF, Beal K, Menzies A, Millar FR, Anderson E, Clarke SE, Pennycuick A, Thakrar RM, Butler CR, Kakiuchi N, Hirano T, Hynds RE, Stratton MR, Martincorena I, Janes SM and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    Tobacco smoking causes lung cancer<sup>1-3</sup>, a process that is driven by more than 60 carcinogens in cigarette smoke that directly damage and mutate DNA<sup>4,5</sup>. The profound effects of tobacco on the genome of lung cancer cells are well-documented<sup>6-10</sup>, but equivalent data for normal bronchial cells are lacking. Here we sequenced whole genomes of 632 colonies derived from single bronchial epithelial cells across 16 subjects. Tobacco smoking was the major influence on mutational burden, typically adding from 1,000 to 10,000 mutations per cell; massively increasing the variance both within and between subjects; and generating several distinct mutational signatures of substitutions and of insertions and deletions. A population of cells in individuals with a history of smoking had mutational burdens that were equivalent to those expected for people who had never smoked: these cells had less damage from tobacco-specific mutational processes, were fourfold more frequent in ex-smokers than current smokers and had considerably longer telomeres than their more-mutated counterparts. Driver mutations increased in frequency with age, affecting 4-14% of cells in middle-aged subjects who had never smoked. In current smokers, at least 25% of cells carried driver mutations and 0-6% of cells had two or even three drivers. Thus, tobacco smoking increases mutational burden, cell-to-cell heterogeneity and driver mutations, but quitting promotes replenishment of the bronchial epithelium from mitotically quiescent cells that have avoided tobacco mutagenesis.

    Funded by: Cancer Research UK: A21777, A23024; Medical Research Council: MR/R015635/1; Wellcome Trust: 088340, 206194, 209199

    Nature 2020;578;7794;266-272

  • COVID-19 autopsy in people who died in community settings: the first series.

    Youd E and Moore L

    Department of Histopathology, Cwm Taf Morgannwg Health Board, Llantrisant, UK.

    Here, we report the pathological findings of nine complete autopsies of individuals who died in community settings in the UK, three of which were positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), three tested negative for SARS-CoV-2 but are likely false negatives, and three died of other respiratory infections. Autopsy revealed firm, consolidated lungs or lobar pneumonia. Histology of the lungs showed changes of diffuse alveolar damage with fibrin membrane formation, thickened alveolar walls and interstitium with lymphocytic infiltrate, and type 2 pneumocyte hyperplasia with shedding into the alveolar space. This series is the first in the world to describe autopsy findings in individuals dying suddenly in the community, not previously known to have COVID-19 infection, and the first autopsy series in the UK. During a time when testing in the UK is currently primarily offered to patients in hospital or symptomatic key workers, with limited testing available in community settings, it highlights the importance of testing for COVID-19 at autopsy. Two deaths occurred in care homes where a diagnosis of COVID-19 allowed the health protection team to provide support in that 'closed setting' to reduce the risks of onward transmission. This work highlights the need for frequent COVID-19 testing in the management of patients in community settings. Comprehensive virology and microbiology assessment is pivotal to correctly identify the cause of death, including those due to COVID-19 infection, and to derive accurate death statistics.

    Journal of clinical pathology 2020

  • Comprehensive molecular characterization of mitochondrial genomes in human cancers.

    Yuan Y, Ju YS, Kim Y, Li J, Wang Y, Yoon CJ, Yang Y, Martincorena I, Creighton CJ, Weinstein JN, Xu Y, Han L, Kim HL, Nakagawa H, Park K, Campbell PJ, Liang H and PCAWG Consortium

    Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.

    Mitochondria are essential cellular organelles that play critical roles in cancer. Here, as part of the International Cancer Genome Consortium/The Cancer Genome Atlas Pan-Cancer Analysis of Whole Genomes Consortium, which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we performed a multidimensional, integrated characterization of mitochondrial genomes and related RNA sequencing data. Our analysis presents the most definitive mutational landscape of mitochondrial genomes and identifies several hypermutated cases. Truncating mutations are markedly enriched in kidney, colorectal and thyroid cancers, suggesting oncogenic effects with the activation of signaling pathways. We find frequent somatic nuclear transfers of mitochondrial DNA, some of which disrupt therapeutic target genes. Mitochondrial copy number varies greatly within and across cancers and correlates with clinical variables. Co-expression analysis highlights the function of mitochondrial genes in oxidative phosphorylation, DNA repair and the cell cycle, and shows their connections with clinically actionable genes. Our study lays a foundation for translating mitochondrial biology into clinical applications.

    Nature genetics 2020;52;3;342-352

  • Lineage specific evolution and gene flow in Listeria monocytogenes is independent of bacteriophages.

    Zamudio R, Haigh RD, Ralph JD, De Ste Croix M, Tasara T, Zurfluh K, Kwun MJ, Millard AD, Bentley SD, Croucher NJ, Stephan R and Oggioni MR

    Department of Genetics and Genome Biology, University of Leicester, Leicester, UK.

    Listeria monocytogenes is a foodborne pathogen causing systemic infection with high mortality. To allow efficient tracing of outbreaks a clear definition of the genomic signature of a cluster of related isolates is required, but lineage specific characteristics call for a more detailed understanding of evolution. In our work we used core genome MLST (cgMLST) to identify new outbreaks combined with core genome SNP analysis to characterize the population structure and gene flow between lineages. Whilst analysing differences between the four lineages of L. monocytogenes we have detected differences in the recombination rate, and interestingly also divergence in the SNP differences between sub-lineages. In addition, the exchange of core genome variation between the lineages exhibited a distinct pattern, with lineage III being the best donor for horizontal gene transfer. Whilst attempting to link bacteriophage mediated transduction to observed gene transfer, we found an inverse correlation between phage presence in a lineage and the extent of recombination. Irrespective of the profound differences in recombination rates observed between sub-lineages and lineages we found that the previously proposed cut-off of 10 allelic differences in cgMLST can be still considered valid for the definition of a foodborne outbreak cluster of L. monocytogenes.

    Environmental microbiology 2020

  • Local and Universal Action: The Paradoxes of Indole Signalling in Bacteria.

    Zarkan A, Liu J, Matuszewska M, Gaimster H and Summers DK

    Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK.

    Indole is a signalling molecule produced by many bacterial species and involved in intraspecies, interspecies, and interkingdom signalling. Despite the increasing volume of research published in this area, many aspects of indole signalling remain enigmatic. There is disagreement over the mechanism of indole import and export and no clearly defined target through which its effects are exerted. Progress is hindered further by the confused and sometimes contradictory body of indole research literature. We explore the reasons behind this lack of consistency and speculate whether the discovery of a new, pulse mode of indole signalling, together with a move away from the idea of a conventional protein target, might help to overcome these problems and enable the field to move forward.

    Trends in microbiology 2020;28;7;566-577

  • Widening of the genetic and clinical spectrum of Lamb-Shaffer syndrome, a neurodevelopmental disorder due to SOX5 haploinsufficiency.

    Zawerton A, Mignot C, Sigafoos A, Blackburn PR, Haseeb A, McWalter K, Ichikawa S, Nava C, Keren B, Charles P, Marey I, Tabet AC, Levy J, Perrin L, Hartmann A, Lesca G, Schluth-Bolard C, Monin P, Dupuis-Girod S, Guillen Sacoto MJ, Schnur RE, Zhu Z, Poisson A, El Chehadeh S, Alembik Y, Bruel AL, Lehalle D, Nambot S, Moutton S, Odent S, Jaillard S, Dubourg C, Hilhorst-Hofstee Y, Barbaro-Dieber T, Ortega L, Bhoj EJ, Masser-Frye D, Bird LM, Lindstrom K, Ramsey KM, Narayanan V, Fassi E, Willing M, Cole T, Salter CG, Akilapa R, Vandersteen A, Canham N, Rump P, Gerkes EH, Klein Wassink-Ruiter JS, Bijlsma E, Hoffer MJV, Vargas M, Wojcik A, Cherik F, Francannet C, Rosenfeld JA, Machol K, Scott DA, Bacino CA, Wang X, Clark GD, Bertoli M, Zwolinski S, Thomas RH, Akay E, Chang RC, Bressi R, Sanchez Russo R, Srour M, Russell L, Goyette AE, Dupuis L, Mendoza-Londono R, Karimov C, Joseph M, Nizon M, Cogné B, Kuechler A, Piton A, Deciphering Developmental Disorder Study, Klee EW, Lefebvre V, Clark KJ and Depienne C

    Department of Cellular & Molecular Medicine, Cleveland Clinic Lerner Research Institute, Cleveland, OH, USA.

    Purpose: Lamb-Shaffer syndrome (LAMSHF) is a neurodevelopmental disorder described in just over two dozen patients with heterozygous genetic alterations involving SOX5, a gene encoding a transcription factor regulating cell fate and differentiation in neurogenesis and other discrete developmental processes. The genetic alterations described so far are mainly microdeletions. The present study was aimed at increasing our understanding of LAMSHF, its clinical and genetic spectrum, and the pathophysiological mechanisms involved.

    Methods: Clinical and genetic data were collected through GeneMatcher and clinical or genetic networks for 41 novel patients harboring various types of SOX5 alterations. Functional consequences of selected substitutions were investigated.

    Results: Microdeletions and truncating variants occurred throughout SOX5. In contrast, most missense variants clustered in the pivotal SOX-specific high-mobility-group domain. The latter variants prevented SOX5 from binding DNA and promoting transactivation in vitro, whereas missense variants located outside the high-mobility-group domain did not. Clinical manifestations and severity varied among patients. No clear genotype-phenotype correlations were found, except that missense variants outside the high-mobility-group domain were generally better tolerated.

    Conclusions: This study extends the clinical and genetic spectrum associated with LAMSHF and consolidates evidence that SOX5 haploinsufficiency leads to variable degrees of intellectual disability, language delay, and other clinical features.

    Funded by: AFSGT: none; Agence Nationale de la Recherche: EUHFAUTISM; Assistance Publique - Hôpitaux de Paris: none; Institut National de la Santé et de la Recherche Médicale: none

    Genetics in medicine : official journal of the American College of Medical Genetics 2020;22;3;524-537

  • High-throughput discovery of genetic determinants of circadian misalignment.

    Zhang T, Xie P, Dong Y, Liu Z, Zhou F, Pan D, Huang Z, Zhai Q, Gu Y, Wu Q, Tanaka N, Obata Y, Bradley A, Lelliott CJ, Sanger Institute Mouse Genetics Project, Nutter LMJ, McKerlie C, Flenniken AM, Champy MF, Sorg T, Herault Y, Angelis MH, Durner VG, Mallon AM, Brown SDM, Meehan T, Parkinson HE, Smedley D, Lloyd KCK, Yan J, Gao X, Seong JK, Wang CL, Sedlacek R, Liu Y, Rozman J, Yang L and Xu Y

    Cambridge-Suda Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical college of Soochow University, Suzhou, Jiangsu, China.

    Circadian systems provide a fitness advantage to organisms by allowing them to adapt to daily changes of environmental cues, such as light/dark cycles. The molecular mechanism underlying the circadian clock has been well characterized. However, how internal circadian clocks are entrained with regular daily light/dark cycles remains unclear. By collecting and analyzing indirect calorimetry (IC) data from more than 2000 wild-type mice available from the International Mouse Phenotyping Consortium (IMPC), we show that the onset time and peak phase of activity and food intake rhythms are reliable parameters for screening defects of circadian misalignment. We developed a machine learning algorithm to quantify these two parameters in our misalignment screen (SyncScreener) with existing datasets and used it to screen 750 mutant mouse lines from five IMPC phenotyping centres. Mutants of five genes (Slc7a11, Rhbdl1, Spop, Ctc1 and Oxtr) were found to be associated with altered patterns of activity or food intake. By further studying the Slc7a11<sup>tm1a/tm1a</sup> mice, we confirmed its advanced activity phase phenotype in response to a simulated jetlag and skeleton photoperiod stimuli. Disruption of Slc7a11 affected the intercellular communication in the suprachiasmatic nucleus, suggesting a defect in synchronization of clock neurons. Our study has established a systematic phenotype analysis approach that can be used to uncover the mechanism of circadian entrainment in mice.

    Funded by: Medical Research Council: MC_U142684172

    PLoS genetics 2020;16;1;e1008577

  • Novel Subclone of Carbapenem-Resistant Klebsiella pneumoniae Sequence Type 11 with Enhanced Virulence and Transmissibility, China.

    Zhou K, Xiao T, David S, Wang Q, Zhou Y, Guo L, Aanensen D, Holt KE, Thomson NR, Grundmann H, Shen P and Xiao Y

    We aimed to clarify the epidemiologic and clinical importance of evolutionary events that occurred in carbapenem-resistant Klebsiella pneumoniae (CRKP). We collected 203 CRKP causing bloodstream infections in a tertiary hospital in China during 2013-2017. We detected a subclonal shift in the dominant clone sequence type (ST) 11 CRKP in which the previously prevalent capsular loci (KL) 47 had been replaced by KL64 since 2016. Patients infected with ST11-KL64 CRKP had a significantly higher 30-day mortality rate than other CRKP-infected patients. Enhanced virulence was further evidenced by phenotypic tests. Phylogenetic reconstruction demonstrated that ST11-KL64 is derived from an ST11-KL47-like ancestor through recombination. We identified a pLVPK-like virulence plasmid carrying rmpA and peg-344 in ST11-KL64 exclusively from 2016 onward. The pLVPK-like-positive ST11-KL64 isolates exhibited enhanced environmental survival. Retrospective screening of a national collection identified ST11-KL64 in multiple regions. Targeted surveillance of this high-risk CRKP clone is urgently needed.

    Emerging infectious diseases 2020;26;2;289-297

  • Cell-type-specific visualisation and biochemical isolation of endogenous synaptic proteins in mice.

    Zhu F, Collins MO, Harmse J, Choudhary JS, Grant SGN and Komiyama NH

    Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK.

    In recent years, the remarkable molecular complexity of synapses has been revealed, with over 1,000 proteins identified in the synapse proteome. Although it is known that different receptors and other synaptic proteins are present in different types of neurons, the extent of synapse diversity across the brain is largely unknown. This is mainly due to the limitations of current techniques. Here, we report an efficient method for the purification of synaptic protein complexes, fusing a high-affinity tag to endogenous PSD95 in specific cell types. We also developed a strategy, which enables the visualisation of endogenous PSD95 with fluorescent-protein tag in Cre-recombinase-expressing cells. We demonstrate the feasibility of proteomic analysis of synaptic protein complexes and visualisation of these in specific cell types. We find that the composition of PSD95 complexes purified from specific cell types differs from those extracted from tissues with diverse cellular composition. The results suggest that there might be differential interactions in the PSD95 complexes in different brain regions. We have detected differentially interacting proteins by comparing data sets from the whole hippocampus and the CA3 subfield of the hippocampus. Therefore, these novel conditional PSD95 tagging lines will not only serve as powerful tools for precisely dissecting synapse diversity in specific brain regions and subsets of neuronal cells, but also provide an opportunity to better understand brain region- and cell-type-specific alterations associated with various psychiatric/neurological diseases. These newly developed conditional gene tagging methods can be applied to many different synaptic proteins and will facilitate research on the molecular complexity of synapses.

    Funded by: Simons Foundation: R83730; Wellcome Trust: R44459

    The European journal of neuroscience 2020;51;3;793-805

  • Translating Basic Cancer Discoveries to the Clinic.

    No authors listed

    Cancer cell 2020;37;6;735-737