Sanger Institute - Publications 2018

Number of papers published in 2018: 633

  • Loose ends: almost one in five human genes still have unresolved coding status.

    Abascal F, Juan D, Jungreis I, Martinez L, Rigau M, Rodriguez JM, Vazquez J and Tress ML

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK.

    Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.

    Nucleic acids research 2018;46;14;7070-7084

  • Prediction of acute myeloid leukaemia risk in healthy individuals.

    Abelson S, Collord G, Ng SWK, Weissbrod O, Mendelson Cohen N, Niemeyer E, Barda N, Zuzarte PC, Heisler L, Sundaravadanam Y, Luben R, Hayat S, Wang TT, Zhao Z, Cirlan I, Pugh TJ, Soave D, Ng K, Latimer C, Hardy C, Raine K, Jones D, Hoult D, Britten A, McPherson JD, Johansson M, Mbabaali F, Eagles J, Miller JK, Pasternack D, Timms L, Krzyzanowski P, Awadalla P, Costa R, Segal E, Bratman SV, Beer P, Behjati S, Martincorena I, Wang JCY, Bowles KM, Quirós JR, Karakatsani A, La Vecchia C, Trichopoulou A, Salamanca-Fernández E, Huerta JM, Barricarte A, Travis RC, Tumino R, Masala G, Boeing H, Panico S, Kaaks R, Krämer A, Sieri S, Riboli E, Vineis P, Foll M, McKay J, Polidoro S, Sala N, Khaw KT, Vermeulen R, Campbell PJ, Papaemmanuil E, Minden MD, Tanay A, Balicer RD, Wareham NJ, Gerstung M, Dick JE, Brennan P, Vassiliou GS and Shlush LI

    Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, Ontario, Canada.

    The incidence of acute myeloid leukaemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 65. Most cases arise without any detectable early symptoms and patients usually present with the acute complications of bone marrow failure<sup>1</sup>. The onset of such de novo AML cases is typically preceded by the accumulation of somatic mutations in preleukaemic haematopoietic stem and progenitor cells (HSPCs) that undergo clonal expansion<sup>2,3</sup>. However, recurrent AML mutations also accumulate in HSPCs during ageing of healthy individuals who do not develop AML, a phenomenon referred to as age-related clonal haematopoiesis (ARCH)<sup>4-8</sup>. Here we use deep sequencing to analyse genes that are recurrently mutated in AML to distinguish between individuals who have a high risk of developing AML and those with benign ARCH. We analysed peripheral blood cells from 95 individuals that were obtained on average 6.3 years before AML diagnosis (pre-AML group), together with 414 unselected age- and gender-matched individuals (control group). Pre-AML cases were distinct from controls and had more mutations per sample, higher variant allele frequencies, indicating greater clonal expansion, and showed enrichment of mutations in specific genes. Genetic parameters were used to derive a model that accurately predicted AML-free survival; this model was validated in an independent cohort of 29 pre-AML cases and 262 controls. Because AML is rare, we also developed an AML predictive model using a large electronic health record database that identified individuals at greater risk. Collectively our findings provide proof-of-concept that it is possible to discriminate ARCH from pre-AML many years before malignant transformation. This could in future enable earlier detection and monitoring, and may help to inform intervention.

    Funded by: Cancer Research UK: 14136; Medical Research Council: G0401527, G1000143, MC_PC_12009, MC_UU_12015/1, MR/N003284/1; Wellcome Trust; World Health Organization: 001

    Nature 2018;559;7714;400-404

  • Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion.

    Abou Tayoun AN, Pesaran T, DiStefano MT, Oza A, Rehm HL, Biesecker LG, Harrison SM and ClinGen Sequence Variant Interpretation Working Group (ClinGen SVI)

    The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania.

    The 2015 ACMG/AMP sequence variant interpretation guideline provided a framework for classifying variants based on several benign and pathogenic evidence criteria, including a pathogenic criterion (PVS1) for predicted loss of function variants. However, the guideline did not elaborate on specific considerations for the different types of loss of function variants, nor did it provide decision-making pathways assimilating information about variant type, its location, or any additional evidence for the likelihood of a true null effect. Furthermore, this guideline did not take into account the relative strengths for each evidence type and the final outcome of their combinations with respect to PVS1 strength. Finally, criteria specifying the genes for which PVS1 can be applied are still missing. Here, as part of the ClinGen Sequence Variant Interpretation (SVI) Workgroup's goal of refining ACMG/AMP criteria, we provide recommendations for applying the PVS1 criterion using detailed guidance addressing the above-mentioned gaps. Evaluation of the refined criterion by seven disease-specific groups using heterogeneous types of loss of function variants (n = 56) showed 89% agreement with the new recommendation, while discrepancies in six variants (11%) were appropriately due to disease-specific refinements. Our recommendations will facilitate consistent and accurate interpretation of predicted loss of function variants.

    Funded by: NHGRI NIH HHS: U41 HG006834; National Human Genome Research Institute: U41HG006834

    Human mutation 2018

  • Whole-Body Single-Cell Sequencing Reveals Transcriptional Domains in the Annelid Larval Body.

    Achim K, Eling N, Vergara HM, Bertucci PY, Musser J, Vopalensky P, Brunet T, Collier P, Benes V, Marioni JC and Arendt D

    Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.

    Animal bodies comprise diverse arrays of cells. To characterize cellular identities across an entire body, we have compared the transcriptomes of single cells randomly picked from dissociated whole larvae of the marine annelid Platynereis dumerilii. We identify five transcriptionally distinct groups of differentiated cells, each expressing a unique set of transcription factors and effector genes that implement cellular phenotypes. Spatial mapping of cells into a cellular expression atlas, and wholemount in situ hybridization of group-specific genes reveals spatially coherent transcriptional domains in the larval body, comprising, for example, apical sensory-neurosecretory cells versus neural/epidermal surface cells. These domains represent new, basic subdivisions of the annelid body based entirely on differential gene expression, and are composed of multiple, transcriptionally similar cell types. They do not represent clonal domains, as revealed by developmental lineage analysis. We propose that the transcriptional domains that subdivide the annelid larval body represent families of related cell types that have arisen by evolutionary diversification. Their possible evolutionary conservation makes them a promising tool for evo-devo research.

    Funded by: Cancer Research UK

    Molecular biology and evolution 2018;35;5;1047-1062

  • Development and Evaluation of a Novel Loop-Mediated Isothermal Amplification Assay for Diagnosis of Cutaneous and Visceral Leishmaniasis.

    Adams ER, Schoone G, Versteeg I, Gomez MA, Diro E, Mori Y, Perlee D, Downing T, Saravia N, Assaye A, Hailu A, Albertini A, Ndung'u JM and Schallig H

    Research Centre for Drugs and Diagnostics, Parasitology Department, Liverpool School of Tropical Medicine, Liverpool, United Kingdom

    A novel pan-<i>Leishmania</i> loop-mediated isothermal amplification (LAMP) assay for the diagnosis of cutaneous and visceral leishmaniasis (CL and VL) that can be used in near-patient settings was developed. Primers were designed based on the 18S ribosomal DNA (rDNA) and the conserved region of minicircle kinetoplast DNA (kDNA), selected on the basis of high copy number. LAMP assays were evaluated for CL diagnosis in a prospective cohort trial of 105 patients in southwest Colombia. Lesion swab samples from CL suspects were collected and were tested using the LAMP assay, and the results were compared to those of a composite reference of microscopy and/or culture in order to calculate diagnostic accuracy. LAMP assays were tested on samples (including whole blood, peripheral blood mononuclear cells, and buffy coat) from 50 suspected VL patients from Ethiopia. Diagnostic accuracy was calculated against a reference standard of microscopy of splenic or bone marrow aspirates. To calculate analytical specificity, 100 clinical samples and isolates from fever-causing pathogens, including malaria parasites, arboviruses, and bacteria, were tested. We found that the LAMP assay had a sensitivity of 95% (95% confidence interval [CI], 87.2% to 98.5%) and a specificity of 86% (95% CI, 67.3% to 95.9%) for the diagnosis of CL. With VL suspects, the sensitivity of the LAMP assay was 92% (95% CI, 74.9% to 99.1%) and its specificity was 100% (95% CI, 85.8% to 100%) in whole blood. For CL, the LAMP assay is a sensitive tool for diagnosis and requires less equipment, time, and expertise than alternative CL diagnostics. For VL, the LAMP assay using a minimally invasive sample is more sensitive than the gold standard. Analytical specificity was 100%.

    Journal of clinical microbiology 2018;56;7

  • CTCF maintains regulatory homeostasis of cancer pathways.

    Aitken SJ, Ibarra-Soria X, Kentepozidou E, Flicek P, Feig C, Marioni JC and Odom DT

    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.

    Background: CTCF binding to DNA helps partition the mammalian genome into discrete structural and regulatory domains. Complete removal of CTCF from mammalian cells causes catastrophic genome dysregulation, likely due to widespread collapse of 3D chromatin looping and alterations to inter- and intra-TAD interactions within the nucleus. In contrast, Ctcf hemizygous mice with lifelong reduction of CTCF expression are viable, albeit with increased cancer incidence. Here, we exploit chronic Ctcf hemizygosity to reveal its homeostatic roles in maintaining genome function and integrity.

    Results: We find that Ctcf hemizygous cells show modest but robust changes in almost a thousand sites of genomic CTCF occupancy; these are enriched for lower affinity binding events with weaker evolutionary conservation across the mouse lineage. Furthermore, we observe dysregulation of the expression of several hundred genes, which are concentrated in cancer-related pathways, and are caused by changes in transcriptional regulation. Chromatin structure is preserved but some loop interactions are destabilized; these are often found around differentially expressed genes and their enhancers. Importantly, the transcriptional alterations identified in vitro are recapitulated in mouse tumors and also in human cancers.

    Conclusions: This multi-dimensional genomic and epigenomic profiling of a Ctcf hemizygous mouse model system shows that chronic depletion of CTCF dysregulates steady-state gene expression by subtly altering transcriptional regulation, changes which can also be observed in primary tumors.

    Funded by: Cancer Research UK: 20412; European Research Council: 615584; Pathological Society of Great Britain and Ireland: SGS 2015/04/04; Wellcome Trust: 106563/Z/14, 108438/Z/15, 108749/Z/15/Z, 202878/A/16/Z, 202878/B/16/Z

    Genome biology 2018;19;1;106

  • Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response.

    Alasoo K, Rodrigues J, Mukhopadhyay S, Knights AJ, Mann AL, Kundu K, HIPSCI Consortium, Hale C, Dougan G and Gaffney DJ

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Regulatory variants are often context specific, modulating gene expression in a subset of possible cellular states. Although these genetic effects can play important roles in disease, the molecular mechanisms underlying context specificity are poorly understood. Here, we identified shared quantitative trait loci (QTLs) for chromatin accessibility and gene expression in human macrophages exposed to IFNγ, Salmonella and IFNγ plus Salmonella. We observed that ~60% of stimulus-specific expression QTLs with a detectable effect on chromatin altered the chromatin accessibility in naive cells, thus suggesting that they perturb enhancer priming. Such variants probably influence binding of cell-type-specific transcription factors, such as PU.1, which can then indirectly alter the binding of stimulus-specific transcription factors, such as NF-κB or STAT2. Thus, although chromatin accessibility assays are powerful for fine-mapping causal regulatory variants, detecting their downstream effects on gene expression will be challenging, requiring profiling of large numbers of stimulated cellular states and time points.

    Funded by: Medical Research Council: MC_PC_12026; Wellcome Trust: 098051, 098503

    Nature genetics 2018;50;3;424-431

  • The Malaria-Protective Human Glycophorin Structural Variant DUP4 Shows Somatic Mosaicism and Association with Hemoglobin Levels.

    Algady W, Louzada S, Carpenter D, Brajer P, Färnert A, Rooth I, Ngasala B, Yang F, Shaw MA and Hollox EJ

    Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK.

    Glycophorin A and glycophorin B are red blood cell surface proteins and are both receptors for the parasite Plasmodium falciparum, which is the principal cause of malaria in sub-Saharan Africa. DUP4 is a complex structural genomic variant that carries extra copies of a glycophorin A-glycophorin B fusion gene and has a dramatic effect on malaria risk by reducing the risk of severe malaria by up to 40%. Using fiber-FISH and Illumina sequencing, we validate the structural arrangement of the glycophorin locus in the DUP4 variant and reveal somatic variation in copy number of the glycophorin B-glycophorin A fusion gene. By developing a simple, specific, PCR-based assay for DUP4, we show that the DUP4 variant reaches a frequency of 13% in the population of a malaria-endemic village in south-eastern Tanzania. We genotype a substantial proportion of that village and demonstrate an association of DUP4 genotype with hemoglobin levels, a phenotype related to malaria, using a family-based association test. Taken together, we show that DUP4 is a complex structural variant that may be susceptible to somatic variation and show that DUP4 is associated with a malarial-related phenotype in a longitudinally followed population.

    Funded by: Wellcome Trust: WT098051

    American journal of human genetics 2018;103;5;769-776

  • A cross-sectional analysis of ITN and IRS coverage in Namibia in 2013.

    Allcock SH, Young EH and Sandhu MS

    Department of Medicine, University of Cambridge, Cambridge, Cambridgeshire, UK.

    Background: Achieving vector control targets is a key step towards malaria elimination. Because of variations in reporting of progress towards vector control targets in 2013, the coverage of these vector control interventions in Namibia was assessed.

    Methods: Data on 9846 households, representing 41,314 people, collected in the 2013 nationally-representative Namibia Demographic and Health Survey were used to explore the coverage of two vector control methods: indoor residual spraying (IRS) and insecticide-treated nets (ITNs). Regional data on Plasmodium falciparum parasite rate in those aged 2-10 years (PfPR<sub>2-10</sub>), obtained from the Malaria Atlas Project, were used to provide information on malaria transmission intensity. Poisson regression analyses were carried out exploring the relationship between household interventions and PfPR<sub>2-10</sub>, with fully adjusted models adjusting for wealth and residence type and accounting for regional and enumeration area clustering. Additionally, the coverage as a function of government intervention zones was explored and models were compared using log-likelihood ratio tests.

    Results: Intervention coverage was greatest in the highest transmission areas (PfPR<sub>2-10</sub> ≥ 5%), but was still below target levels of 95% coverage in these regions, with 27.6% of households covered by IRS, 32.3% with an ITN and 49.0% with at least one intervention (ITN and/or IRS). In fully adjusted models, PfPR<sub>2-10</sub> ≥ 5% was strongly associated with IRS (RR 14.54; 95% CI 5.56-38.02; p < 0.001), ITN ownership (RR 5.70; 95% CI 2.84-11.45; p < 0.001) and ITN and/or IRS coverage (RR 5.32; 95% CI 3.09-9.16; p < 0.001).

    Conclusions: The prevalence of IRS and ITN interventions in 2013 did not reflect the Namibian government intervention targets. As such, there is a need to include quantitative monitoring of such interventions to reliably inform intervention strategies for malaria elimination in Namibia.

    Funded by: African Partnership for Chronic Disease Research (Medical Research Council UK partnership grant): MR/K013491/1; Medical Research Council: MR/K013491/1; Wellcome Trust; Wellcome Trust Sanger Institute: WT098051

    Malaria journal 2018;17;1;264

  • Predicting the mutations generated by repair of Cas9-induced double-strand breaks.

    Allen F, Crepaldi L, Alsinet C, Strong AJ, Kleshchevnikov V, De Angeli P, Páleníková P, Khodak A, Kiselev V, Kosicki M, Bassett AR, Harding H, Galanty Y, Muñoz-Martínez F, Metzakopian E, Jackson SP and Parts L

    Wellcome Sanger Institute, Hinxton, UK.

    The DNA mutation produced by cellular repair of a CRISPR-Cas9-generated double-strand break determines its phenotypic effect. It is known that the mutational outcomes are not random, but depend on DNA sequence at the targeted location. Here we systematically study the influence of flanking DNA sequence on repair outcome by measuring the edits generated by >40,000 guide RNAs (gRNAs) in synthetic constructs. We performed the experiments in a range of genetic backgrounds and using alternative CRISPR-Cas9 reagents. In total, we gathered data for >10<sup>9</sup> mutational outcomes. The majority of reproducible mutations are insertions of a single base, short deletions or longer microhomology-mediated deletions. Each gRNA has an individual cell-line-dependent bias toward particular outcomes. We uncover sequence determinants of the mutations produced and use these to derive a predictor of Cas9 editing outcomes. Improved understanding of sequence repair will allow better design of gene editing experiments.

    Funded by: Wellcome Trust: 098051

    Nature biotechnology 2018

  • Genome watch: Keeping tally in the microbiome.

    Almeida A and Shao Y

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    This month's Genome Watch highlights how the development of new approaches for quantifying the human microbiome may pave the way for a better understanding of microbial shifts in the context of human health and disease.

    Nature reviews. Microbiology 2018;16;3;124

  • Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments.

    Almeida A, Mitchell AL, Tarkowska A and Finn RD

    EMBL-EBI European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Background: Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. Quantitative Insights Into Microbial Ecology (QIIME) and mothur have been the most widely used taxonomic analysis tools for this purpose, with MAPseq and QIIME 2 being two recently released alternatives. However, no independent and direct comparison between these four main tools has been performed. Here, we compared the default classifiers of MAPseq, mothur, QIIME, and QIIME 2 using synthetic simulated datasets comprised of some of the most abundant genera found in the human gut, ocean, and soil environments. We evaluate their accuracy when paired with both different reference databases and variable sub-regions of the 16S rRNA gene.

    Findings: We show that QIIME 2 provided the best recall and F-scores at genus and family levels, together with the lowest distance estimates between the observed and simulated samples. However, MAPseq showed the highest precision, with miscall rates consistently <2%. Notably, QIIME 2 was the most computationally expensive tool, with CPU time and memory usage almost 2 and 30 times higher than MAPseq, respectively. Using the SILVA database generally yielded a higher recall than using Greengenes, while assignment results of different 16S rRNA variable sub-regions varied up to 40% between samples analysed with the same pipeline.

    Conclusions: Our results support the use of either QIIME 2 or MAPseq for optimal 16S rRNA gene profiling, and we suggest that the choice between the two should be based on the level of recall, precision, and/or computational performance required.

    GigaScience 2018;7;5

  • Consistent signatures of selection from genomic analysis of pairs of temporal and spatial Plasmodium falciparum populations from The Gambia.

    Amambua-Ngwa A, Jeffries D, Amato R, Worwui A, Karim M, Ceesay S, Nyang H, Nwakanma D, Okebe J, Kwiatkowski D, Conway DJ and D'Alessandro U

    Medical Research Council Unit The Gambia at LSHTM, Banjul, The Gambia.

    Genome sequences of 247 Plasmodium falciparum isolates collected in The Gambia in 2008 and 2014 were analysed to identify changes possibly related to the scale-up of antimalarial interventions that occurred during this period. Overall, there were 15 regions across the genomes with signatures of positive selection. Five of these were sweeps around known drug resistance and antigenic loci. Signatures at antigenic loci such as thrombospondin related adhesive protein (Pftrap) were most frequent in eastern Gambia, where parasite prevalence and transmission remain high. There was a strong temporal differentiation at a non-synonymous SNP in a cysteine desulfurase (Pfnfs) involved in iron-sulphur complex biogenesis. During the 7-year period, the frequency of the lysine variant at codon 65 (Pfnfs-Q65K) increased by 22% (10% to 32%) in the Greater Banjul area. Between 2014 and 2015, the frequency of this variant increased by 6% (20% to 26%) in eastern Gambia. IC<sub>50</sub> for lumefantrine was significantly higher in Pfnfs-65K isolates. This is probably the first evidence of directional selection on Pfnfs or linked loci by lumefantrine. Given the declining malaria transmission, the consequent loss of population immunity, and sustained drug pressure, it is important to monitor Gambian P. falciparum populations for further signs of adaptation.

    Funded by: Medical Research Council: MC_EX_MR/K02440X/1, MC_UP_A900_1119

    Scientific reports 2018;8;1;9687

  • Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci.

    Amaral PP, Leonardi T, Han N, Viré E, Gascoigne DK, Arias-Carrasco R, Büscher M, Pandolfini L, Zhang A, Pluchino S, Maracaja-Coutinho V, Nakaya HI, Hemberg M, Shiekhattar R, Enright AJ and Kouzarides T

    The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

    Background: The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider promoter conservation and positional conservation as indicators of functional commonality.

    Results: We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other's expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers.

    Conclusions: This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.

    Funded by: Cancer Research UK: 10827, C6/A18796, C6946/A14492; European Research Council: 268569; NIGMS NIH HHS: R01 GM078455; Wellcome Trust: 092096

    Genome biology 2018;19;1;32

  • Origins of the current outbreak of multidrug-resistant malaria in southeast Asia: a retrospective genetic study.

    Amato R, Pearson RD, Almagro-Garcia J, Amaratunga C, Lim P, Suon S, Sreng S, Drury E, Stalker J, Miotto O, Fairhurst RM and Kwiatkowski DP

    Wellcome Sanger Institute, Hinxton, UK; MRC Centre for Genomics and Global Health, Big Data Institute, Oxford University, Oxford, UK.

    Background: Antimalarial resistance is rapidly spreading across parts of southeast Asia where dihydroartemisinin-piperaquine is used as first-line treatment for Plasmodium falciparum malaria. The first published reports about resistance to antimalarial drugs came from western Cambodia in 2013. Here, we analyse genetic changes in the P falciparum population of western Cambodia in the 6 years before those reports.

    Methods: We analysed genome sequence data on 1492 P falciparum samples from 11 locations across southeast Asia, including 464 samples collected in western Cambodia between 2007 and 2013. Different epidemiological origins of resistance were identified by haplotypic analysis of the kelch13 artemisinin resistance locus and the plasmepsin 2-3 piperaquine resistance locus.

    Findings: We identified more than 30 independent origins of artemisinin resistance, of which the KEL1 lineage accounted for 140 (91%) of 154 parasites resistant to dihydroartemisinin-piperaquine. In 2008, KEL1 combined with PLA1, the major lineage associated with piperaquine resistance. By 2013, the KEL1/PLA1 co-lineage had reached a frequency of 63% (24/38) in western Cambodia and had spread to northern Cambodia.

    Interpretation: The KEL1/PLA1 co-lineage emerged in the same year that dihydroartemisinin-piperaquine became the first-line antimalarial drug in western Cambodia and spread rapidly thereafter, displacing other artemisinin-resistant parasite lineages. These findings have important implications for management of the global health risk associated with the current outbreak of multidrug-resistant malaria in southeast Asia.

    Funding: Wellcome Trust, Bill & Melinda Gates Foundation, Medical Research Council, UK Department for International Development, and the Intramural Research Program of the National Institute of Allergy and Infectious Diseases.

    Funded by: Medical Research Council: G0600718, MR/M006212/1; Wellcome Trust: 090770, 098051, 204911, 206194

    The Lancet. Infectious diseases 2018;18;3;337-345

  • Rearrangement bursts generate canonical gene fusions in bone and soft tissue tumors.

    Anderson ND, de Borja R, Young MD, Fuligni F, Rosic A, Roberts ND, Hajjar S, Layeghifard M, Novokmet A, Kowalski PE, Anaka M, Davidson S, Zarrei M, Id Said B, Schreiner LC, Marchand R, Sitter J, Gokgoz N, Brunga L, Graham GT, Fullam A, Pillay N, Toretsky JA, Yoshida A, Shibata T, Metzler M, Somers GR, Scherer SW, Flanagan AM, Campbell PJ, Schiffman JD, Shago M, Alexandrov LB, Wunder JS, Andrulis IL, Malkin D, Behjati S and Shlien A

    Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada.

    Sarcomas are cancers of the bone and soft tissue often defined by gene fusions. Ewing sarcoma involves fusions between <i>EWSR1</i>, a gene encoding an RNA binding protein, and E26 transformation-specific (ETS) transcription factors. We explored how and when <i>EWSR1-ETS</i> fusions arise by studying the whole genomes of Ewing sarcomas. In 52 of 124 (42%) tumors, the fusion gene arises by a sudden burst of complex, loop-like rearrangements, a process called chromoplexy, rather than by simple reciprocal translocations. These loops always contained the disease-defining fusion at the center, but they disrupted multiple additional genes. The loops occurred preferentially in early replicating and transcriptionally active genomic regions. Similar loops forming canonical fusions were found in three other sarcoma types. Chromoplexy-generated fusions appear to be associated with an aggressive form of Ewing sarcoma. These loops arise early, giving rise to both primary and relapse Ewing sarcoma tumors, which can continue to evolve in parallel.

    Funded by: Wellcome Trust: 110104

    Science (New York, N.Y.) 2018;361;6405

  • False signals induced by single-cell imputation.

    Andrews TS and Hemberg M

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells. A challenge in the analysis of this data is the large number of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to address this issue, but because they generally rely on structure inherent to the dataset under consideration, they may not provide any additional information and are therefore limited by the information contained therein and the validity of their assumptions.

    Methods: We evaluated the risk of generating false positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and considered the number of false positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data we examined whether cell-type specific markers were reproducible across datasets derived from the same tissue before and after imputation.

    Results: The extent of false-positives introduced by imputation varied considerably by method. Data smoothing-based methods (MAGIC, knn-smooth and dca) generated many false-positives in both real and simulated data. Model-based imputation methods typically generated fewer false-positives but this varied greatly depending on the diversity of cell-types in the sample. All imputation methods decreased the reproducibility of cell-type specific markers, although this could be mitigated by selecting markers with large effect size and significance.

    Conclusions: Imputation of single-cell RNA-seq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results and thus should be favoured over alternatives if imputation is necessary.

    F1000Research 2018;7;1740

  • Identifying cell populations with scRNASeq.

    Andrews TS and Hemberg M

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Single-cell RNASeq (scRNASeq) has emerged as a powerful method for quantifying the transcriptome of individual cells. However, the data from scRNASeq experiments is often both noisy and high dimensional, making the computational analysis non-trivial. Here we provide an overview of different experimental protocols and the most popular methods for facilitating the computational analysis. We focus on approaches for identifying biologically important genes, projecting data into lower dimensions and clustering data into putative cell-populations. Finally we discuss approaches to validation and biological interpretation of the identified cell-types or cell-states.

    Molecular aspects of medicine 2018;59;114-122

  • Demographic History and Genetic Adaptation in the Himalayan Region Inferred from Genome-Wide SNP Genotypes of 49 Populations.

    Arciero E, Kraaijenbrink T, Asan, Haber M, Mezzavilla M, Ayub Q, Wang W, Pingcuo Z, Yang H, Wang J, Jobling MA, van Driem G, Xue Y, de Knijff P and Tyler-Smith C

    The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    We genotyped 738 individuals belonging to 49 populations from Nepal, Bhutan, North India, or Tibet at over 500,000 SNPs, and analyzed the genotypes in the context of available worldwide population data in order to investigate the demographic history of the region and the genetic adaptations to the harsh environment. The Himalayan populations resembled other South and East Asians, but in addition displayed their own specific ancestral component and showed strong population structure and genetic drift. We also found evidence for multiple admixture events involving Himalayan populations and South/East Asians between 200 and 2,000 years ago. In comparisons with available ancient genomes, the Himalayans, like other East and South Asian populations, showed similar genetic affinity to Eurasian hunter-gatherers (a 24,000-year-old Upper Palaeolithic Siberian), and the related Bronze Age Yamnaya. The high-altitude Himalayan populations all shared a specific ancestral component, suggesting that genetic adaptation to life at high altitude originated only once in this region and subsequently spread. Combining four approaches to identifying specific positively selected loci, we confirmed that the strongest signals of high-altitude adaptation were located near the Endothelial PAS domain-containing protein 1 and Egl-9 Family Hypoxia Inducible Factor 1 loci, and discovered eight additional robust signals of high-altitude adaptation, five of which have strong biological functional links to such adaptation. In conclusion, the demographic history of Himalayan populations is complex, with strong local differentiation, reflecting both genetic and cultural factors; these populations also display evidence of multiple genetic adaptations to high-altitude environments.

    Funded by: Wellcome Trust: 087576, 098051

    Molecular biology and evolution 2018;35;8;1916-1933

  • Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets.

    Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, Buettner F, Huber W and Stegle O

    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.

    Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and <i>ex vivo</i> drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.

    Funded by: Medical Research Council: MR/M01536X/1

    Molecular systems biology 2018;14;6;e8124

  • Genome-wide interaction study of a proxy for stress-sensitivity and its prediction of major depressive disorder.

    Arnau-Soler A, Adams MJ, Generation Scotland, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Hayward C and Thomson PA

    Medical Genetics Section, Centre for Genomic and Experimental Medicine, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom.

    Individual response to stress is correlated with neuroticism and is an important predictor of both neuroticism and the onset of major depressive disorder (MDD). Identification of the genetics underpinning individual differences in response to negative events (stress-sensitivity) may improve our understanding of the molecular pathways involved, and its association with stress-related illnesses. We sought to generate a proxy for stress-sensitivity through modelling the interaction between SNP allele and MDD status on neuroticism score in order to identify genetic variants that contribute to the higher neuroticism seen in individuals with a lifetime diagnosis of depression compared to unaffected individuals. Meta-analysis of genome-wide interaction studies (GWIS) in UK Biobank (N = 23,092) and Generation Scotland: Scottish Family Health Study (N = 7,155) identified no genome-wide significant SNP interactions. However, gene-based tests identified a genome-wide significant gene, ZNF366, a negative regulator of glucocorticoid receptor function implicated in alcohol dependence (p = 1.48 × 10<sup>-7</sup>; Bonferroni-corrected significance threshold p < 2.79 × 10<sup>-6</sup>). Using summary statistics from the stress-sensitivity term of the GWIS, SNP heritability for stress-sensitivity was estimated at 5.0%. In models fitting polygenic risk scores of both MDD and neuroticism derived from independent GWAS, we show that polygenic risk scores derived from the UK Biobank stress-sensitivity GWIS significantly improved the prediction of MDD in Generation Scotland. This study may improve interpretation of larger genome-wide association studies of MDD and other stress-related illnesses, and the understanding of the etiological mechanisms underpinning stress-sensitivity.

    Funded by: Chief Scientist Office: CZD/16/6; Medical Research Council: G0200243, MC_PC_17228, MC_QA137853, MC_UU_00007/10, MR/K007017/1, MR/L010305/1, MR/N015746/1; NIMH NIH HHS: U01 MH109528, U01 MH109532, U01 MH109536; Wellcome Trust: 104036/Z/14/Z

    PloS one 2018;13;12;e0209160

  • mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species.

    Arredondo-Alonso S, Rogers MRC, Braat JC, Verschuuren TD, Top J, Corander J, Willems RJL and Schürch AC

    Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands.

    Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. These were used to train several popular machine learning methods to classify the origin of contigs from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. We selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium=0.92, F1-score K. pneumoniae=0.90, F1-score E. coli=0.76), which outperformed other existing plasmid prediction tools using a benchmarking set of isolates. We demonstrated the scalability of our models by accurately predicting the plasmidome of a large collection of 1644 E. faecium isolates and illustrate its applicability by predicting the location of antibiotic-resistance genes in all three species. The SVM classifiers are publicly available as an R package and graphical-user interface called 'mlplasmids'. We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.

    Microbial genomics 2018;4;11

  • Streptococcus suis contains multiple phase-variable methyltransferases that show a discrete lineage distribution.

    Atack JM, Weinert LA, Tucker AW, Husna AU, Wileman TM, F Hadjirin N, Hoa NT, Parkhill J, Maskell DJ, Blackall PJ and Jennings MP

    Institute for Glycomics, Griffith University, Gold Coast, Queensland 4222, Australia.

    Streptococcus suis is a major pathogen of swine, responsible for a number of chronic and acute infections, and is also emerging as a major zoonotic pathogen, particularly in South-East Asia. Our study of a diverse population of S. suis shows that this organism contains both Type I and Type III phase-variable methyltransferases. In all previous examples, phase-variation of methyltransferases results in genome wide methylation differences, and results in differential regulation of multiple genes, a system known as the phasevarion (phase-variable regulon). We hypothesized that each variant in the Type I and Type III systems encoded a methyltransferase with a unique specificity, and could therefore control a distinct phasevarion, either by recombination-driven shuffling between different specificities (Type I) or by biphasic on-off switching via simple sequence repeats (Type III). Here, we present the identification of the target specificities for each Type III allelic variant from S. suis using single-molecule, real-time methylome analysis. We demonstrate phase-variation is occurring in both Type I and Type III methyltransferases, and show a distinct association between methyltransferase type and presence, and population clades. In addition, we show that the phase-variable Type I methyltransferase was likely acquired at the origin of a highly virulent zoonotic sub-population.

    Funded by: Wellcome Trust: 109385/Z/15/Z

    Nucleic acids research 2018;46;21;11466-11476

  • Genomic analysis of a pre-elimination Malaysian Plasmodium vivax population reveals selective pressures and changing transmission dynamics.

    Auburn S, Benavente ED, Miotto O, Pearson RD, Amato R, Grigg MJ, Barber BE, William T, Handayuni I, Marfurt J, Trimarsanto H, Noviyanti R, Sriprawat K, Nosten F, Campino S, Clark TG, Anstey NM, Kwiatkowski DP and Price RN

    Global and Tropical Health Division, Menzies School of Health Research and Charles Darwin University, Darwin, NT, 0811, Australia.

    The incidence of Plasmodium vivax infection has declined markedly in Malaysia over the past decade despite evidence of high-grade chloroquine resistance. Here we investigate the genetic changes in a P. vivax population approaching elimination in 51 isolates from Sabah, Malaysia and compare these with data from 104 isolates from Thailand and 104 isolates from Indonesia. Sabah displays extensive population structure, mirroring that previously seen with the emergence of artemisinin-resistant P. falciparum founder populations in Cambodia. Fifty-four percent of the Sabah isolates have identical genomes, consistent with a rapid clonal expansion. Across Sabah, there is a high prevalence of loci known to be associated with antimalarial drug resistance. Measures of differentiation between the three countries reveal several gene regions under putative selection in Sabah. Our findings highlight important factors pertinent to parasite resurgence and molecular cues that can be used to monitor low-endemic populations at the end stages of P. vivax elimination.

    Funded by: Bill and Melinda Gates Foundation: OPP1164105; Department of Health | National Health and Medical Research Council (NHMRC): 1037304, 1042072, 1045156, 1074795, 1088738, 1131932, 1135820; Medical Research Council: MR/M006212/1; Medical Research Council (MRC): M006212, MC_PC_15103, MR/K000551/1, MR/M01360X/1, MR/N010469/1; Wellcome Trust: 200909, 204911, 204911/Z/16/Z, 206194

    Nature communications 2018;9;1;2585

  • The impact of serotype-specific vaccination on phylodynamic parameters of Streptococcus pneumoniae and the pneumococcal pan-genome.

    Azarian T, Grant LR, Arnold BJ, Hammitt LL, Reid R, Santosham M, Weatherholtz R, Goklish N, Thompson CM, Bentley SD, O'Brien KL, Hanage WP and Lipsitch M

    Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University; Cambridge, Massachusetts, United States of America.

    In the United States, the introduction of the heptavalent pneumococcal conjugate vaccine (PCV) largely eliminated vaccine serotypes (VT); non-vaccine serotypes (NVT) subsequently increased in carriage and disease. Vaccination also disrupts the composition of the pneumococcal pangenome, which includes mobile genetic elements and polymorphic non-capsular antigens important for virulence, transmission, and pneumococcal ecology. Antigenic proteins are of interest for future vaccines; yet, little is known about how they are affected by PCV use. To investigate the evolutionary impact of vaccination, we assessed recombination, evolution, and pathogen demographic history of 937 pneumococci collected from 1998-2012 among Navajo and White Mountain Apache Native American communities. We analyzed changes in the pneumococcal pangenome, focusing on metabolic loci and 19 polymorphic protein antigens. We found the impact of PCV on the pneumococcal population could be observed in reduced diversity, a smaller pangenome, and changing frequencies of accessory clusters of orthologous groups (COGs). Post-PCV7, diversity rebounded through clonal expansion of NVT lineages and inferred in-migration of two previously unobserved lineages. Accessory COG frequencies trended toward pre-PCV7 values with increasing time since vaccine introduction. Contemporary frequencies of protein antigen variants are better predicted by pre-PCV7 values (1998-2000) than the preceding period (2006-2008), suggesting balancing selection may have acted in maintaining variant frequencies in this population. Overall, we present the largest genomic analysis of pneumococcal carriage in the United States to date, which includes a snapshot of a true vaccine-naïve community prior to the introduction of PCV7. These data improve our understanding of pneumococcal evolution and emphasize the need to consider pangenome composition when inferring the impact of vaccination and developing future protein-based pneumococcal vaccines.

    Funded by: NIAID NIH HHS: R01 AI048935

    PLoS pathogens 2018;14;4;e1006966

  • Global emergence and population dynamics of divergent serotype 3 CC180 pneumococci.

    Azarian T, Mitchell PK, Georgieva M, Thompson CM, Ghouila A, Pollard AJ, von Gottberg A, du Plessis M, Antonio M, Kwambana-Adams BA, Clarke SC, Everett D, Cornick J, Sadowy E, Hryniewicz W, Skoczynska A, Moïsi JC, McGee L, Beall B, Metcalf BJ, Breiman RF, Ho PL, Reid R, O'Brien KL, Gladstone RA, Bentley SD and Hanage WP

    Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America.

    Streptococcus pneumoniae serotype 3 remains a significant cause of morbidity and mortality worldwide, despite inclusion in the 13-valent pneumococcal conjugate vaccine (PCV13). Serotype 3 increased in carriage since the implementation of PCV13 in the USA, while invasive disease rates remain unchanged. We investigated the persistence of serotype 3 in carriage and disease, through genomic analyses of a global sample of 301 serotype 3 isolates of the Netherlands3-31 (PMEN31) clone CC180, combined with associated patient data and PCV utilization among countries of isolate collection. We assessed phenotypic variation between dominant clades in capsule charge (zeta potential), capsular polysaccharide shedding, and susceptibility to opsonophagocytic killing, which have previously been associated with carriage duration, invasiveness, and vaccine escape. We identified a recent shift in the CC180 population attributed to a lineage termed Clade II, which was estimated by Bayesian coalescent analysis to have first appeared in 1968 [95% HPD: 1939-1989] and increased in prevalence and effective population size thereafter. Clade II isolates are divergent from the pre-PCV13 serotype 3 population in non-capsular antigenic composition, competence, and antibiotic susceptibility, the last of which resulted from the acquisition of a Tn916-like conjugative transposon. Differences in recombination rates among clades correlated with variations in the ATP-binding subunit of Clp protease, as well as amino acid substitutions in the comCDE operon. Opsonophagocytic killing assays elucidated the low observed efficacy of PCV13 against serotype 3. Variation in PCV13 use among sampled countries was not independently correlated with the CC180 population shift; therefore, genotypic and phenotypic differences in protein antigens and, in particular, antibiotic resistance may have contributed to the increase of Clade II. Our analysis emphasizes the need for routine, representative sampling of isolates from disperse geographic regions, including historically under-sampled areas. We also highlight the value of genomics in resolving antigenic and epidemiological variations within a serotype, which may have implications for future vaccine development.

    Funded by: NIAID NIH HHS: R01 AI106786; Wellcome Trust

    PLoS pathogens 2018;14;11;e1007438

  • Complete avian malaria parasite genomes reveal features associated with lineage-specific evolution in birds and mammals.

    Böhme U, Otto TD, Cotton JA, Steinbiss S, Sanders M, Oyola SO, Nicot A, Gandon S, Patra KP, Herd C, Bushell E, Modrzynska KK, Billker O, Vinetz JM, Rivero A, Newbold CI and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Avian malaria parasites are prevalent around the world and infect a wide diversity of bird species. Here, we report the sequencing and analysis of high-quality draft genome sequences for two avian malaria species, <i>Plasmodium relictum</i> and <i>Plasmodium gallinaceum</i>. We identify 50 genes that are specific to avian malaria, located in an otherwise conserved core of the genome that shares gene synteny with all other sequenced malaria genomes. Phylogenetic analysis suggests that the avian malaria species form an outgroup to the mammalian <i>Plasmodium</i> species, and using amino acid divergence between species, we estimate the avian- and mammalian-infective lineages diverged in the order of 10 million years ago. Consistent with their phylogenetic position, we identify orthologs of genes that had previously appeared to be restricted to the clades of parasites containing <i>Plasmodium falciparum</i> and <i>Plasmodium vivax</i>, the species with the greatest impact on human health. From these orthologs, we explore differential diversifying selection across the genus and show that the avian lineage is remarkable in the extent to which invasion-related genes are evolving. The subtelomeres of the <i>P. relictum</i> and <i>P. gallinaceum</i> genomes contain several novel gene families, including an expanded <i>surf</i> multigene family. We also identify an expansion of reticulocyte binding protein homologs in <i>P. relictum</i>, and within these proteins, we detect distinct regions that are specific to nonhuman primate, humans, rodent, and avian hosts. For the first time in the <i>Plasmodium</i> lineage, we find evidence of transposable elements, including several hundred fragments of LTR-retrotransposons in both species and an apparently complete LTR-retrotransposon in the genome of <i>P. gallinaceum</i>.

    Funded by: Wellcome Trust: 206194, 104792/Z/14/Z, WT099198MA

    Genome research 2018;28;4;547-560

  • A test metric for assessing single-cell RNA-seq batch correction.

    Büttner M, Miao Z, Wolf FA, Teichmann SA and Theis FJ

    Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.

    Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations, but as with all genomics experiments, batch effects can hamper data integration and interpretation. The success of batch-effect correction is often evaluated by visual inspection of low-dimensional embeddings, which are inherently imprecise. Here we present a user-friendly, robust and sensitive k-nearest-neighbor batch-effect test (kBET) for quantification of batch effects. We used kBET to assess commonly used batch-regression and normalization approaches, and to quantify the extent to which they remove batch effects while preserving biological variability. We also demonstrate the application of kBET to data from peripheral blood mononuclear cells (PBMCs) from healthy donors to distinguish cell-type-specific inter-individual variability from changes in relative proportions of cell populations. This has important implications for future data-integration efforts, central to projects such as the Human Cell Atlas.

    Nature methods 2018;16;1;43-49

  • A synthesis approach of mouse studies to identify genes and proteins in arterial thrombosis and bleeding.

    Baaten CCFMJ, Meacham S, de Witt SM, Feijge MAH, Adams DJ, Akkerman JN, Cosemans JMEM, Grassi L, Jupe S, Kostadima M, Mattheij NJA, Prins MH, Ramirez-Solis R, Soehnlein O, Swieringa F, Weber C, White JK, Ouwehand WH and Heemskerk JWM

    Department of Biochemistry, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands.

    Antithrombotic therapies reduce cardiovascular diseases by preventing arterial thrombosis and thromboembolism, but at the expense of increased bleeding risks. Arterial thrombosis studies using genetically modified mice have been invaluable for identification of new molecular targets. Because of low sample sizes and heterogeneity in approaches or methodologies, a formal meta-analysis to compare studies of mice with single-gene defects encountered major limitations. To overcome these, we developed a novel synthesis approach to quantitatively scale 1514 published studies of arterial thrombus formation (in vivo and in vitro), thromboembolism, and tail-bleeding of genetically modified mice. Using a newly defined consistency parameter (CP), indicating the strength of published data, comparisons were made of 431 mouse genes, of which 17 consistently contributed to thrombus formation without affecting hemostasis. Ranking analysis indicated high correlations between collagen-dependent thrombosis models in vivo (FeCl<sub>3</sub> injury or ligation/compression) and in vitro. Integration of scores and CP values resulted in a network of protein interactions in thrombosis and hemostasis (PITH), which was combined with databases of genetically linked human bleeding and thrombotic disorders. The network contained 2946 nodes linked to modifying genes of thrombus formation, mostly with expression in megakaryocytes. Reactome pathway analysis and network characteristics revealed multiple novel genes with potential contribution to thrombosis/hemostasis. Studies with additional knockout mice revealed that 4 of 8 (<i>Apoe</i>, <i>Fpr2</i>, <i>Ifnar1</i>, <i>Vps13a</i>) new genes were modifying in thrombus formation. The PITH network further: (i) revealed a high similarity of murine and human hemostatic and thrombotic processes and (ii) identified multiple new candidate proteins regulating these processes.

    Funded by: British Heart Foundation: RG/09/12/28096; NHGRI NIH HHS: U41 HG003751

    Blood 2018;132;24;e35-e46

  • Shared activity patterns arising at genetic susceptibility loci reveal underlying genomic and cellular architecture of human disease.

    Baillie JK, Bretherick A, Haley CS, Clohisey S, Gray A, Neyton LPA, Barrett J, Stahl EA, Tenesa A, Andersson R, Brown JB, Faulkner GJ, Lizio M, Schaefer U, Daub C, Itoh M, Kondo N, Lassmann T, Kawai J, IIBDGC Consortium, Mole D, Bajic VB, Heutink P, Rehli M, Kawaji H, Sandelin A, Suzuki H, Satsangi J, Wells CA, Hacohen N, Freeman TC, Hayashizaki Y, Carninci P, Forrest ARR and Hume DA

    Division of Genetics and Genomics, The Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom.

    Genetic variants underlying complex traits, including disease susceptibility, are enriched within the transcriptional regulatory elements, promoters and enhancers. There is emerging evidence that regulatory elements associated with particular traits or diseases share similar patterns of transcriptional activity. Accordingly, shared transcriptional activity (coexpression) may help prioritise loci associated with a given trait, and help to identify underlying biological processes. Using cap analysis of gene expression (CAGE) profiles of promoter- and enhancer-derived RNAs across 1824 human samples, we have analysed coexpression of RNAs originating from trait-associated regulatory regions using a novel quantitative method (network density analysis; NDA). For most traits studied, phenotype-associated variants in regulatory regions were linked to tightly-coexpressed networks that are likely to share important functional characteristics. Coexpression provides a new signal, independent of phenotype association, to enable fine mapping of causative variants. The NDA coexpression approach identifies new genetic variants associated with specific traits, including an association between the regulation of the OCT1 cation transporter and genetic variants underlying circulating cholesterol levels. NDA strongly implicates particular cell types and tissues in disease pathogenesis. For example, distinct groupings of disease-associated regulatory regions implicate two distinct biological processes in the pathogenesis of ulcerative colitis; a further two separate processes are implicated in Crohn's disease. Thus, our functional analysis of genetic predisposition to disease defines new distinct disease endotypes. We predict that patients with a preponderance of susceptibility variants in each group are likely to respond differently to pharmacological therapy. Together, these findings enable a deeper biological understanding of the causal basis of complex traits.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I001107/1, BBS/E/D/20211551, BBS/E/D/20211552, BBS/E/D/20211553, BBS/E/D/20211554; Medical Research Council: MC_PC_U127592696, MC_UU_00007/10, MR/P008887/1; NHGRI NIH HHS: R00 HG006698

    PLoS computational biology 2018;14;3;e1005934

  • Genomic epidemiology of Shigella in the United Kingdom shows transmission of pathogen sublineages and determinants of antimicrobial resistance.

    Baker KS, Dallman TJ, Field N, Childs T, Mitchell H, Day M, Weill FX, Lefèvre S, Tourdjman M, Hughes G, Jenkins C and Thomson N

    Institute for Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, United Kingdom.

    Shigella are globally important diarrhoeal pathogens that are endemic in low-to-middle income nations and also occur in high income nations, typically in travellers or community-based risk-groups. Shigella phylogenetics reveals population structures that are more reliable than those built with traditional typing methods, and has identified sublineages associated with specific geographical regions or patient groups. Genomic analyses reveal temporal increases in Shigella antimicrobial resistance (AMR) gene content, which is frequently encoded on mobile genetic elements. Here, we whole genome sequenced representative subsamples of S. flexneri 2a and S. sonnei (n = 366) from the United Kingdom from 2008 to 2014, and analysed these alongside publicly available data to make qualitative insights on the genomic epidemiology of shigellosis and its AMR within the broader global context. Combined phylogenetic, epidemiological and genomic analyses revealed the presence of domestically-circulating sublineages in patient risk-groups and the importation of travel-related sublineages from both Africa and Asia, including ciprofloxacin-resistant sublineages of both species from Asia. Genomic analyses revealed common AMR determinants among travel-related and domestically-acquired isolates, and the evolution of mutations associated with reduced quinolone susceptibility in domestically-circulating sublineages. Collectively, this study provides unprecedented insights on the contribution and mobility of endemic and travel-imported sublineages and AMR determinants responsible for disease in a high-income nation.

    Funded by: Wellcome Trust: 106690/A/14/Z, 206194

    Scientific reports 2018;8;1;7389

  • Horizontal antimicrobial resistance transfer drives epidemics of multiple Shigella species.

    Baker KS, Dallman TJ, Field N, Childs T, Mitchell H, Day M, Weill FX, Lefèvre S, Tourdjman M, Hughes G, Jenkins C and Thomson N

    Institute for Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK.

    Horizontal gene transfer has played a role in developing the global public health crisis of antimicrobial resistance (AMR). However, the dynamics of AMR transfer through bacterial populations and its direct impact on human disease is poorly elucidated. Here, we study parallel epidemic emergences of multiple Shigella species, a priority AMR organism, in men who have sex with men to gain insight into AMR emergence and spread. Using genomic epidemiology, we show that repeated horizontal transfer of a single AMR plasmid among Shigella enhanced existing and facilitated new epidemics. These epidemic patterns contrasted with slighter, slower increases in disease caused by organisms with vertically inherited (chromosomally encoded) AMR. This demonstrates that horizontal transfer of AMR directly affects epidemiological outcomes of globally important AMR pathogens and highlights the need for integration of genomic analyses into all areas of AMR research, surveillance and management.

    Nature communications 2018;9;1;1462

  • An outbreak of a rare Shiga-toxin-producing Escherichia coli serotype (O117:H7) among men who have sex with men.

    Baker KS, Dallman TJ, Thomson NR and Jenkins C

    Institute for Integrative Biology, University of Liverpool, Liverpool, UK.

    Sexually transmissible enteric infections (STEIs) are commonly associated with transmission among men who have sex with men (MSM). In the past decade, the UK has experienced multiple parallel STEI emergences in MSM caused by a range of bacterial species of the genus Shigella, and an outbreak of an uncommon serotype (O117 : H7) of Shiga-toxin-producing Escherichia coli (STEC). Here, we used microbial genomics on 6 outbreak and 30 sporadic STEC O117 : H7 isolates to explore the origins and pathogenic drivers of the STEC O117 : H7 emergence in MSM. Using genomic epidemiology, we found that the STEC O117 : H7 outbreak lineage was potentially imported from Latin America and likely continues to circulate both in the UK MSM population and in Latin America. We found genomic relationships consistent with existing symptomatic evidence for chronic infection with this STEC serotype. Comparative genomic analysis indicated the existence of a novel Shiga toxin 1-encoding prophage in the outbreak isolates, and evidence of horizontal gene exchange among the STEC O117 : H7 outbreak lineage and other enteric pathogens. There was no evidence of increased virulence in the outbreak strains relative to contextual isolates, but the outbreak lineage was associated with azithromycin resistance. Comparing these findings with similar genomic investigations of emerging MSM-associated Shigella in the UK highlighted many parallels, the most striking of which was the importance of the azithromycin phenotype for STEI emergence in this patient group.

    Funded by: Wellcome Trust: 106690/A/14/Z

    Microbial genomics 2018;4;7

  • Genomic insights into the emergence and spread of antimicrobial-resistant bacterial pathogens.

    Baker S, Thomson N, Weill FX and Holt KE

    Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Whole-genome sequencing (WGS) has been vital for revealing the rapid temporal and spatial evolution of antimicrobial resistance (AMR) in bacterial pathogens. Some antimicrobial-resistant pathogens have outpaced us, with untreatable infections appearing in hospitals and the community. However, WGS has additionally provided us with enough knowledge to initiate countermeasures. Although we cannot stop bacterial adaptation, the predictability of many evolutionary processes in AMR bacteria offers us an opportunity to channel them using new control strategies. Furthermore, by using WGS for coordinating surveillance and to create a more fundamental understanding of the outcome of antimicrobial treatment and AMR mechanisms, we can use current and future antimicrobials more effectively and aim to extend their longevity.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 2018;360;6390;733-738

  • Targeting of NAT10 enhances healthspan in a mouse model of human accelerated aging syndrome.

    Balmus G, Larrieu D, Barros AC, Collins C, Abrudan M, Demir M, Geisler NJ, Lelliott CJ, White JK, Karp NA, Atkinson J, Kirton A, Jacobsen M, Clift D, Rodriguez R, Sanger Mouse Genetics Project, Adams DJ and Jackson SP

    The Wellcome Trust/Cancer Research UK Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QN, UK.

    Hutchinson-Gilford Progeria Syndrome (HGPS) is a rare but devastating genetic disease characterized by segmental premature aging, with cardiovascular disease being the main cause of death. Cells from HGPS patients accumulate progerin, a permanently farnesylated, toxic form of Lamin A, disrupting the nuclear shape and chromatin organization, leading to DNA-damage accumulation and senescence. Therapeutic approaches targeting farnesylation or aiming to reduce progerin levels have provided only partial health improvements. Recently, we identified Remodelin, a small-molecule agent that leads to amelioration of HGPS cellular defects through inhibition of the enzyme N-acetyltransferase 10 (NAT10). Here, we show the preclinical data demonstrating that targeting NAT10 in vivo, either via chemical inhibition or genetic depletion, significantly enhances the healthspan in a Lmna G609G HGPS mouse model. Collectively, the data provided here highlight NAT10 as a potential therapeutic target for HGPS.

    Funded by: Medical Research Council: MC_U105181010, MR/L019116/1; Wellcome Trust

    Nature communications 2018;9;1;1700

  • Sphingolipid dysregulation due to lack of functional KDSR impairs proplatelet formation causing thrombocytopenia.

    Bariana TK, Labarque V, Heremans J, Thys C, De Reys M, Greene D, Jenkins B, Grassi L, Seyres D, Burden F, Whitehorn D, Shamardina O, Papadia S, Gomez K, NIHR BioResource, Van Geet C, Koulman A, Ouwehand WH, Ghevaert C, Frontini M, Turro E and Freson K

    University College London.

    Sphingolipids are fundamental to membrane trafficking, apoptosis and cell differentiation and proliferation. KDSR or 3-keto-dihydrosphingosine reductase is an essential enzyme for de novo sphingolipid synthesis, and pathogenic mutations in KDSR result in the severe skin disorder erythrokeratodermia variabilis et progressiva-4. Four of the eight reported cases also had thrombocytopenia but the underlying mechanism has remained unexplored. Here we expand upon the phenotypic spectrum of KDSR deficiency with studies in two siblings with novel compound heterozygous variants associated with thrombocytopenia, anemia and minimal skin involvement. We report a novel phenotype of progressive juvenile myelofibrosis in the propositus, with spontaneous recovery of anemia and thrombocytopenia in the first decade of life. Examination of bone marrow biopsies showed megakaryocyte hyperproliferation and dysplasia. Megakaryocytes obtained by culture of CD34+ stem cells confirmed hyperproliferation and showed reduced proplatelet formation. The effect of KDSR insufficiency on the sphingolipid profile was unknown, and was explored in vivo and in vitro by a broad metabolomics screen that indicated activation of an in vivo compensatory pathway that leads to normalisation of downstream metabolites such as ceramide. Differentiation of propositus-derived induced pluripotent stem cells to megakaryocytes followed by expression of functional KDSR showed correction of the aberrant cellular and biochemical phenotypes, corroborating the critical role of KDSR in proplatelet formation. Finally, Kdsr depletion in zebrafish recapitulated the thrombocytopenia and showed biochemical changes similar to those observed in the affected siblings. These studies support an important role for sphingolipids as regulators of cytoskeletal organisation during megakaryopoiesis and proplatelet formation.

    Haematologica 2018

  • Objective measurement of physical activity: improving the evidence base to address non-communicable diseases in Africa.

    Barr AL, Young EH and Sandhu MS

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Funded by: Medical Research Council: MR/K013491/1; Wellcome Trust

    BMJ global health 2018;3;5;e001044

  • Delineating the HMGB1 and HMGB2 interactome in prostate and ovary epithelial cells and its relationship with cancer.

    Barreiro-Alonso A, Lamas-Maceiras M, García-Díaz R, Rodríguez-Belmonte E, Yu L, Pardo M, Choudhary JS and Cerdán ME

    EXPRELA Group, Centro de Investigacións Científicas Avanzadas, Departamento de Biología, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, A Coruña, 15071, Spain.

    High Mobility Group B (HMGB) proteins are involved in cancer progression and in cellular responses to platinum compounds used in the chemotherapy of prostate and ovary cancer. Here we use affinity purification coupled to mass spectrometry (MS) and yeast two-hybrid (Y2H) screening to carry out an exhaustive study of HMGB1 and HMGB2 protein interactions in the context of prostate and ovary epithelia. We present a proteomic study of HMGB1 partners based on immunoprecipitation of HMGB1 from a non-cancerous prostate epithelial cell line. In addition, HMGB1 and HMGB2 were used as baits in yeast two-hybrid screening of libraries from prostate and ovary epithelial cell lines as well as from healthy ovary tissue. HMGB1 interacts with many nuclear proteins that control gene expression, but also with proteins that form part of the cytoskeleton, cell-adhesion structures and others involved in intracellular protein translocation, cellular migration, secretion, apoptosis and cell survival. HMGB2 interacts with proteins involved in apoptosis, cell motility and cellular proliferation. High confidence interactors, based on repeated identification in different cell types or in both MS and Y2H approaches, are discussed in relation to cancer. This study represents a useful resource for detailed investigation of the role of HMGB1 in cancer of epithelial origins, as well as potential alternative avenues of therapeutic intervention.

    Funded by: Wellcome Trust

    Oncotarget 2018;9;27;19050-19064

  • ADCY3, neuronal primary cilia and obesity.

    Barroso I

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Nature genetics 2018;50;2;166-167

  • Editorial overview: Molecular and genetic basis of [metabolic] disease: Genes, glucose, glycerol and girth: metabolism in our DNA.

    Barroso I and Florez JC

    Current opinion in genetics & development 2018;50;iv-vi

  • Microevolution and Patterns of Transmission of Shigella sonnei within Cyclic Outbreaks Shigellosis, Israel.

    Behar A, Baker KS, Bassal R, Ezernitchi A, Valinsky L, Thomson NR and Cohen D

    Whole-genome sequencing unveiled host and environment-related insights to Shigella sonnei transmission within cyclic epidemics during 2000-2012 in Israel. The Israeli reservoir contains isolates belonging to S. sonnei lineage III but of different origin, shows loss of tetracycline resistance genes, and little genetic variation within the O antigen: highly relevant for Shigella vaccine development.

    Funded by: Wellcome Trust: 098051, 106690/A/14/Z

    Emerging infectious diseases 2018;24;7;1335-1339

  • Mapping human development at single-cell resolution.

    Behjati S, Lindsay S, Teichmann SA and Haniffa M

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

    Human development is regulated by spatiotemporally restricted molecular programmes and is pertinent to many areas of basic biology and human medicine, such as stem cell biology, reproductive medicine and childhood cancer. Mapping human development has presented significant technological, logistical and ethical challenges. The availability of established human developmental biorepositories and the advent of cutting-edge single-cell technologies provide new opportunities to study human development. Here, we present a working framework for the establishment of a human developmental cell atlas exploiting single-cell genomics and spatial analysis. We discuss how the development atlas will benefit the scientific and clinical communities to advance our understanding of basic biology, health and disease.

    Funded by: Medical Research Council: G0700089; Wellcome Trust: 110104/Z/15/Z

    Development (Cambridge, England) 2018;145;3

  • Single-cell transcriptomics reveals a new dynamical function of transcription factors during embryonic hematopoiesis.

    Bergiers I, Andrews T, Vargel Bölükbaşı Ö, Buness A, Janosz E, Lopez-Anguita N, Ganter K, Kosim K, Celen C, Itır Perçin G, Collier P, Baying B, Benes V, Hemberg M and Lancrin C

    European Molecular Biology Laboratory, EMBL Rome, Monterotondo, Italy.

    Recent advances in single-cell transcriptomics techniques have opened the door to the study of gene regulatory networks (GRNs) at the single-cell level. Here, we studied the GRNs controlling the emergence of hematopoietic stem and progenitor cells from mouse embryonic endothelium using a combination of single-cell transcriptome assays. We found that a heptad of transcription factors (Runx1, Gata2, Tal1, Fli1, Lyl1, Erg and Lmo2) is specifically co-expressed in an intermediate population expressing both endothelial and hematopoietic markers. Within the heptad, we identified two sets of factors of opposing functions: one (Erg/Fli1) promoting the endothelial cell fate, the other (Runx1/Gata2) promoting the hematopoietic fate. Surprisingly, our data suggest that even though Fli1 initially supports the endothelial cell fate, it acquires a pro-hematopoietic role when co-expressed with Runx1. This work demonstrates the power of single-cell RNA-sequencing for characterizing complex transcription factor dynamics.

    Funded by: Wellcome Trust

    eLife 2018;7

  • Human Genetics: Busy Subway Networks in Remote Oceania?

    Bergström A and Tyler-Smith C

    The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Ancient human DNA from the Oceanian islands of Vanuatu reveals a surprisingly complex history of human settlement, featuring almost complete replacement shortly after initial colonisation, followed by mixing and a puzzling disconnect between genetic ancestry and language.

    Current biology : CB 2018;28;9;R549-R551

  • The SMAD2/3 interactome reveals that TGFβ controls m6A mRNA methylation in pluripotency.

    Bertero A, Brown S, Madrigal P, Osnato A, Ortmann D, Yiangou L, Kadiwala J, Hubner NC, de Los Mozos IR, Sadée C, Lenaerts AS, Nakanoh S, Grandy R, Farnell E, Ule J, Stunnenberg HG, Mendjan S and Vallier L

    Wellcome Trust-MRC Cambridge Stem Cell Institute, Anne McLaren Laboratory and Department of Surgery, University of Cambridge, Cambridge CB2 0SZ, UK.

    The TGFβ pathway has essential roles in embryonic development, organ homeostasis, tissue repair and disease. These diverse effects are mediated through the intracellular effectors SMAD2 and SMAD3 (hereafter SMAD2/3), whose canonical function is to control the activity of target genes by interacting with transcriptional regulators. Therefore, a complete description of the factors that interact with SMAD2/3 in a given cell type would have broad implications for many areas of cell biology. Here we describe the interactome of SMAD2/3 in human pluripotent stem cells. This analysis reveals that SMAD2/3 is involved in multiple molecular processes in addition to its role in transcription. In particular, we identify a functional interaction with the METTL3-METTL14-WTAP complex, which mediates the conversion of adenosine to N6-methyladenosine (m6A) on RNA. We show that SMAD2/3 promotes binding of the m6A methyltransferase complex to a subset of transcripts involved in early cell fate decisions. This mechanism destabilizes specific SMAD2/3 transcriptional targets, including the pluripotency factor gene NANOG, priming them for rapid downregulation upon differentiation to enable timely exit from pluripotency. Collectively, these findings reveal the mechanism by which extracellular signalling can induce rapid cellular responses through regulation of the epitranscriptome. These aspects of TGFβ signalling could have far-reaching implications in many other cell types and in diseases such as cancer.

    Funded by: European Research Council: 281335; Medical Research Council: MC_PC_12009, MC_U105185858; Wellcome Trust

    Nature 2018;555;7695;256-259

  • Conditional Manipulation of Gene Function in Human Cells with Optimized Inducible shRNA.

    Bertero A, Yiangou L, Brown S, Ortmann D, Pawlowski M and Vallier L

    Wellcome Trust-MRC Stem Cell Institute, Anne McLaren Laboratory, University of Cambridge, Cambridge, United Kingdom.

    The difficulties involved in conditionally perturbing complex gene expression networks represent major challenges toward defining the mechanisms controlling human development, physiology, and disease. We developed an OPTimized inducible KnockDown (OPTiKD) platform that addresses the limitations of previous approaches by allowing streamlined, tightly-controlled, and potent loss-of-function experiments for both single and multiple genes. The method relies on single-step genetic engineering of the AAVS1 genomic safe harbor with an optimized tetracycline-responsive cassette driving one or more inducible short hairpin RNAs (shRNAs). OPTiKD provides homogeneous, dose-responsive, and reversible gene knockdown. When implemented in human pluripotent stem cells (hPSCs), the approach can then be applied to a broad range of hPSC-derived mature cell lineages that include neurons, cardiomyocytes, and hepatocytes. Generation of OPTiKD hPSCs in commonly used culture conditions is simple (plasmid based), rapid (two weeks), and highly efficient (>95%). Overall, this method facilitates the functional annotation of the human genome in health and disease. © 2018 by John Wiley & Sons, Inc.

    Funded by: British Heart Foundation: FS/11/77/39327; Medical Research Council: MC_PC_12009, PSAG028; Wellcome Trust: PSAG/048

    Current protocols in stem cell biology 2018;44;5C.4.1-5C.4.48

  • Complexity and conservation of regulatory landscapes underlie evolutionary resilience of mammalian gene expression.

    Berthelot C, Villar D, Horvath JE, Odom DT and Flicek P

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    To gain insight into how mammalian gene expression is controlled by rapidly evolving regulatory elements, we jointly analysed promoter and enhancer activity with downstream transcription levels in liver samples from 15 species. Genes associated with complex regulatory landscapes generally exhibit high expression levels that remain evolutionarily stable. While the number of regulatory elements is the key driver of transcriptional output and resilience, regulatory conservation matters: elements active across mammals most effectively stabilize gene expression. In contrast, recently evolved enhancers typically contribute weakly, consistent with their high evolutionary plasticity. These effects are observed across the entire mammalian clade and are robust to potential confounders, such as the gene expression level. Using liver as a representative somatic tissue, our results illuminate how the evolutionary stability of gene expression is profoundly entwined with both the number and conservation of surrounding promoters and enhancers.

    Funded by: European Research Council: 615584; Wellcome Trust: 108749, 202878FLICEK

    Nature ecology & evolution 2018;2;1;152-163

  • Hepatitis E in southern Vietnam: Seroepidemiology in humans and molecular epidemiology in pigs.

    Berto A, Pham HA, Thao TTN, Vy NHT, Caddy SL, Hiraide R, Tue NT, Goodfellow I, Carrique-Mas JJ, Thwaites GE, Baker S, Boni MF and VIZIONS consortium

    Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Viral pathogens account for a significant proportion of the burden of emerging infectious diseases in humans. The Wellcome Trust-Vietnamese Initiative on Zoonotic Infections (WT-VIZIONS) is aiming to understand the circulation of viral zoonotic pathogens in animals that pose a potential risk to human health. Evidence suggests that human exposure and infections with hepatitis E virus (HEV) genotypes (GT) 3 and 4 result from zoonotic transmission. Hypothesising that HEV GT3 and GT4 are circulating in the Vietnamese pig population and can be transmitted to humans, we aimed to estimate the seroprevalence of HEV exposure in a population of farmers and the general population. We additionally performed sequence analysis of HEV in pig populations in the same region to address knowledge gaps regarding HEV circulation and to evaluate if pigs were a potential source of HEV exposure. We found a high prevalence of HEV GT3 viral RNA in pigs (19.1% in faecal samples and 8.2% in rectal swabs) and a high HEV seroprevalence in pig farmers (16.0%) and a hospital-attending population (31.7%) in southern Vietnam. The hospital population was recruited as a general-population proxy even though this particular population subgroup may introduce bias. The detection of HEV RNA in pigs indicates that HEV may be a zoonotic disease risk in this location, although a larger sample size is required to infer an association between HEV positivity in pigs and seroprevalence in humans.

    Funded by: Wellcome Trust: 093724, 097997, 097997/Z/11/Z, 098511, 098511/Z/12/Z, 100087, 100087/Z/12/Z, WT/093724

    Zoonoses and public health 2018;65;1;43-50

  • Chemical Synergy between Ionophore PBT2 and Zinc Reverses Antibiotic Resistance.

    Bohlmann L, De Oliveira DMP, El-Deeb IM, Brazel EB, Harbison-Price N, Ong CY, Rivera-Hernandez T, Ferguson SA, Cork AJ, Phan MD, Soderholm AT, Davies MR, Nimmo GR, Dougan G, Schembri MA, Cook GM, McEwan AG, von Itzstein M, McDevitt CA and Walker MJ

    School of Chemistry and Molecular Biosciences and Australian Infectious Diseases Research Centre, The University of Queensland, Brisbane, QLD, Australia.

    The World Health Organization reports that antibiotic-resistant pathogens represent an imminent global health disaster for the 21st century. Gram-positive superbugs threaten to breach last-line antibiotic treatment, and the pharmaceutical industry antibiotic development pipeline is waning. Here we report the synergy between ionophore-induced physiological stress in Gram-positive bacteria and antibiotic treatment. PBT2 is a safe-for-human-use zinc ionophore that has progressed to phase 2 clinical trials for Alzheimer's and Huntington's disease treatment. In combination with zinc, PBT2 exhibits antibacterial activity and disrupts cellular homeostasis in erythromycin-resistant group A Streptococcus (GAS), methicillin-resistant Staphylococcus aureus (MRSA), and vancomycin-resistant Enterococcus (VRE). We were unable to select for mutants resistant to PBT2-zinc treatment. While ineffective alone against resistant bacteria, several clinically relevant antibiotics act synergistically with PBT2-zinc to enhance killing of these Gram-positive pathogens. These data represent a new paradigm whereby disruption of bacterial metal homeostasis reverses antibiotic-resistant phenotypes in a number of priority human bacterial pathogens. IMPORTANCE: The rise of bacterial antibiotic resistance coupled with a reduction in new antibiotic development has placed significant burdens on global health care. Resistant bacterial pathogens such as methicillin-resistant Staphylococcus aureus and vancomycin-resistant Enterococcus are leading causes of community- and hospital-acquired infection and present a significant clinical challenge. These pathogens have acquired resistance to broad classes of antimicrobials. Furthermore, Streptococcus pyogenes, a significant disease agent among Indigenous Australians, has now acquired resistance to several antibiotic classes. With a rise in antibiotic resistance and reduction in new antibiotic discovery, it is imperative to investigate alternative therapeutic regimens that complement the use of current antibiotic treatment strategies. As stated by the WHO Director-General, "On current trends, common diseases may become untreatable. Doctors facing patients will have to say, 'Sorry, there is nothing I can do for you.'"

    mBio 2018;9;6

  • Genomic patterns of progression in smoldering multiple myeloma.

    Bolli N, Maura F, Minvielle S, Gloznik D, Szalat R, Fullam A, Martincorena I, Dawson KJ, Samur MK, Zamora J, Tarpey P, Davies H, Fulciniti M, Shammas MA, Tai YT, Magrangeas F, Moreau P, Corradini P, Anderson K, Alexandrov L, Wedge DC, Avet-Loiseau H, Campbell P and Munshi N

    Department of Oncology and Hemato-Oncology, University of Milan, Milan, 20122, Italy.

    We analyzed whole genomes of unique paired samples from smoldering multiple myeloma (SMM) patients progressing to multiple myeloma (MM). We report that the genomic landscape, including mutational profile and structural rearrangements, at the smoldering stage is very similar to that of MM. Paired sample analysis shows two different patterns of progression: a "static progression model", where the subclonal architecture is retained as the disease progresses to MM, suggesting that progression solely reflects the time needed to accumulate a sufficient disease burden; and a "spontaneous evolution model", where a change in the subclonal composition is observed. We also observe that activation-induced cytidine deaminase plays a major role in shaping the mutational landscape of early subclinical phases, while progression is driven by APOBEC cytidine deaminases. These results provide a unique insight into myelomagenesis with potential implications for the definition of smoldering disease and timing of treatment initiation.

    Funded by: BLRD VA: I01 BX001584; NCI NIH HHS: P01 CA155258, P50 CA100707

    Nature communications 2018;9;1;3363

  • Using WormBase ParaSite: An Integrated Platform for Exploring Helminth Genomic Data.

    Bolt BJ, Rodgers FH, Shafie M, Kersey PJ, Berriman M and Howe KL

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.

    WormBase ParaSite is a comprehensive resource for the genomes of parasitic nematodes and flatworms (helminths). It currently includes genomic data for over 100 helminth species, adding value by way of consistent functional annotation, gene comparative analysis and gene expression analysis. We provide several ways of exploring the data including a choice of genome browsers, genome and gene summary pages, text and sequence searching, a query wizard, bulk downloads, and programmatic interfaces. WormBase ParaSite is released three to six times per year, and is developed in collaboration with WormBase and Ensembl Genomes.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K020080

    Methods in molecular biology (Clifton, N.J.) 2018;1757;471-491

  • Comparative sequence analysis of the capsular polysaccharide loci of Actinobacillus pleuropneumoniae serovars 1-18, and development of two multiplex PCRs for comprehensive capsule typing.

    Bossé JT, Li Y, Fernandez Crespo R, Lacouture S, Gottschalk M, Sárközi R, Fodor L, Casas Amoribieta M, Angen Ø, Nedbalcova K, Holden MTG, Maskell DJ, Tucker AW, Wren BW, Rycroft AN, Langford PR and BRaDP1T consortium

    Section of Paediatrics, Department of Medicine, Imperial College London, St. Mary's Campus, London, UK.

    Problems with serological cross-reactivity have led to development of a number of PCRs (individual and multiplex) for molecular typing of Actinobacillus pleuropneumoniae, the causative agent of porcine pleuropneumonia. Most of these assays were developed for detection of specific amplicons within capsule biosynthetic genes before the availability of complete sequences for the different serovars. Here we describe comparative analysis of the complete capsular loci for all 18 serovars of A. pleuropneumoniae, and development of two multiplex PCRs for comprehensive capsule typing of this important pig pathogen.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1

    Veterinary microbiology 2018;220;83-89

  • Proposal of serovars 17 and 18 of Actinobacillus pleuropneumoniae based on serological and genotypic analysis.

    Bossé JT, Li Y, Sárközi R, Fodor L, Lacouture S, Gottschalk M, Casas Amoribieta M, Angen Ø, Nedbalcova K, Holden MTG, Maskell DJ, Tucker AW, Wren BW, Rycroft AN, Langford PR and BRaDP1T consortium

    Section of Paediatrics, Department of Medicine, Imperial College London, St. Mary's Campus, London, UK.

    The aim of this study was to investigate isolates of Actinobacillus pleuropneumoniae previously designated serologically either as non-typable (NT) or as 'K2:O7', which did not produce serovar-specific amplicons in PCR assays. We used whole genome sequencing to identify the capsule (CPS) loci of six previously designated biovar 1 NT and two biovar 1 'K2:O7' isolates of A. pleuropneumoniae from Denmark, as well as a recent biovar 2 NT isolate from Canada. All of the NT isolates have the same six-gene type I CPS locus, sharing common cpsABC genes with serovars 2, 3, 6, 7, 8, 9, 11 and 13. The two 'K2:O7' isolates contain a unique three-gene type II CPS locus, having a cpsA gene similar to that of serovars 1, 4, 12, 14 and 15. The previously NT isolates share the same O-antigen genes, found between erpA and rpsU, as serovars 3, 6, 8, and 15, whereas the 'K2:O7' isolates have the same O-antigen genes as serovar 7, which likely contributed to their previous misidentification. All of the NT and 'K2:O7' isolates have only the genes required for production of ApxII (apxIICA structural genes, and apxIBD export genes). Rabbit polyclonal antisera raised against representative isolates with these new CPS loci demonstrated distinct reactivity compared to the 16 known serovars. The serological and genomic results indicate that the isolates constitute new serovars 17 (previously NT) and 18 (previously 'K2:O7'). Primers designed for amplification of specific serovar 17 and 18 sequences for molecular diagnostics will facilitate epidemiological tracking of these two new serovars of A. pleuropneumoniae.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1

    Veterinary microbiology 2018;217;1-6

  • Analysis of shared heritability in common disorders of the brain.

    Brainstorm Consortium, Anttila V, Bulik-Sullivan B, Finucane HK, Walters RK, Bras J, Duncan L, Escott-Price V, Falcone GJ, Gormley P, Malik R, Patsopoulos NA, Ripke S, Wei Z, Yu D, Lee PH, Turley P, Grenier-Boley B, Chouraki V, Kamatani Y, Berr C, Letenneur L, Hannequin D, Amouyel P, Boland A, Deleuze JF, Duron E, Vardarajan BN, Reitz C, Goate AM, Huentelman MJ, Kamboh MI, Larson EB, Rogaeva E, St George-Hyslop P, Hakonarson H, Kukull WA, Farrer LA, Barnes LL, Beach TG, Demirci FY, Head E, Hulette CM, Jicha GA, Kauwe JSK, Kaye JA, Leverenz JB, Levey AI, Lieberman AP, Pankratz VS, Poon WW, Quinn JF, Saykin AJ, Schneider LS, Smith AG, Sonnen JA, Stern RA, Van Deerlin VM, Van Eldik LJ, Harold D, Russo G, Rubinsztein DC, Bayer A, Tsolaki M, Proitsi P, Fox NC, Hampel H, Owen MJ, Mead S, Passmore P, Morgan K, Nöthen MM, Rossor M, Lupton MK, Hoffmann P, Kornhuber J, Lawlor B, McQuillin A, Al-Chalabi A, Bis JC, Ruiz A, Boada M, Seshadri S, Beiser A, Rice K, van der Lee SJ, De Jager PL, Geschwind DH, Riemenschneider M, Riedel-Heller S, Rotter JI, Ransmayr G, Hyman BT, Cruchaga C, Alegret M, Winsvold B, Palta P, Farh KH, Cuenca-Leon E, Furlotte N, Kurth T, Ligthart L, Terwindt GM, Freilinger T, Ran C, Gordon SD, Borck G, Adams HHH, Lehtimäki T, Wedenoja J, Buring JE, Schürks M, Hrafnsdottir M, Hottenga JJ, Penninx B, Artto V, Kaunisto M, Vepsäläinen S, Martin NG, Montgomery GW, Kurki MI, Hämäläinen E, Huang H, Huang J, Sandor C, Webber C, Muller-Myhsok B, Schreiber S, Salomaa V, Loehrer E, Göbel H, Macaya A, Pozo-Rosich P, Hansen T, Werge T, Kaprio J, Metspalu A, Kubisch C, Ferrari MD, Belin AC, van den Maagdenberg AMJM, Zwart JA, Boomsma D, Eriksson N, Olesen J, Chasman DI, Nyholt DR, Avbersek A, Baum L, Berkovic S, Bradfield J, Buono RJ, Catarino CB, Cossette P, De Jonghe P, Depondt C, Dlugos D, Ferraro TN, French J, Hjalgrim H, Jamnadas-Khoda J, Kälviäinen R, Kunz WS, Lerche H, Leu C, Lindhout D, Lo W, Lowenstein D, McCormack M, Møller RS, Molloy A, Ng PW, Oliver K, 
Privitera M, Radtke R, Ruppert AK, Sander T, Schachter S, Schankin C, Scheffer I, Schoch S, Sisodiya SM, Smith P, Sperling M, Striano P, Surges R, Thomas GN, Visscher F, Whelan CD, Zara F, Heinzen EL, Marson A, Becker F, Stroink H, Zimprich F, Gasser T, Gibbs R, Heutink P, Martinez M, Morris HR, Sharma M, Ryten M, Mok KY, Pulit S, Bevan S, Holliday E, Attia J, Battey T, Boncoraglio G, Thijs V, Chen WM, Mitchell B, Rothwell P, Sharma P, Sudlow C, Vicente A, Markus H, Kourkoulis C, Pera J, Raffeld M, Silliman S, Boraska Perica V, Thornton LM, Huckins LM, William Rayner N, Lewis CM, Gratacos M, Rybakowski F, Keski-Rahkonen A, Raevuori A, Hudson JI, Reichborn-Kjennerud T, Monteleone P, Karwautz A, Mannik K, Baker JH, O'Toole JK, Trace SE, Davis OSP, Helder SG, Ehrlich S, Herpertz-Dahlmann B, Danner UN, van Elburg AA, Clementi M, Forzan M, Docampo E, Lissowska J, Hauser J, Tortorella A, Maj M, Gonidakis F, Tziouvas K, Papezova H, Yilmaz Z, Wagner G, Cohen-Woods S, Herms S, Julià A, Rabionet R, Dick DM, Ripatti S, Andreassen OA, Espeseth T, Lundervold AJ, Steen VM, Pinto D, Scherer SW, Aschauer H, Schosser A, Alfredsson L, Padyukov L, Halmi KA, Mitchell J, Strober M, Bergen AW, Kaye W, Szatkiewicz JP, Cormand B, Ramos-Quiroga JA, Sánchez-Mora C, Ribasés M, Casas M, Hervas A, Arranz MJ, Haavik J, Zayats T, Johansson S, Williams N, Dempfle A, Rothenberger A, Kuntsi J, Oades RD, Banaschewski T, Franke B, Buitelaar JK, Arias Vasquez A, Doyle AE, Reif A, Lesch KP, Freitag C, Rivero O, Palmason H, Romanos M, Langley K, Rietschel M, Witt SH, Dalsgaard S, Børglum AD, Waldman I, Wilmot B, Molly N, Bau CHD, Crosbie J, Schachar R, Loo SK, McGough JJ, Grevet EH, Medland SE, Robinson E, Weiss LA, Bacchelli E, Bailey A, Bal V, Battaglia A, Betancur C, Bolton P, Cantor R, Celestino-Soper P, Dawson G, De Rubeis S, Duque F, Green A, Klauck SM, Leboyer M, Levitt P, Maestrini E, Mane S, De-Luca DM, Parr J, Regan R, Reichenberg A, Sandin S, Vorstman J, Wassink T, Wijsman E, Cook E, 
Santangelo S, Delorme R, Rogé B, Magalhaes T, Arking D, Schulze TG, Thompson RC, Strohmaier J, Matthews K, Melle I, Morris D, Blackwood D, McIntosh A, Bergen SE, Schalling M, Jamain S, Maaser A, Fischer SB, Reinbold CS, Fullerton JM, Guzman-Parra J, Mayoral F, Schofield PR, Cichon S, Mühleisen TW, Degenhardt F, Schumacher J, Bauer M, Mitchell PB, Gershon ES, Rice J, Potash JB, Zandi PP, Craddock N, Ferrier IN, Alda M, Rouleau GA, Turecki G, Ophoff R, Pato C, Anjorin A, Stahl E, Leber M, Czerski PM, Cruceanu C, Jones IR, Posthuma D, Andlauer TFM, Forstner AJ, Streit F, Baune BT, Air T, Sinnamon G, Wray NR, MacIntyre DJ, Porteous D, Homuth G, Rivera M, Grove J, Middeldorp CM, Hickie I, Pergadia M, Mehta D, Smit JH, Jansen R, de Geus E, Dunn E, Li QS, Nauck M, Schoevers RA, Beekman AT, Knowles JA, Viktorin A, Arnold P, Barr CL, Bedoya-Berrio G, Bienvenu OJ, Brentani H, Burton C, Camarena B, Cappi C, Cath D, Cavallini M, Cusi D, Darrow S, Denys D, Derks EM, Dietrich A, Fernandez T, Figee M, Freimer N, Gerber G, Grados M, Greenberg E, Hanna GL, Hartmann A, Hirschtritt ME, Hoekstra PJ, Huang A, Huyser C, Illmann C, Jenike M, Kuperman S, Leventhal B, Lochner C, Lyon GJ, Macciardi F, Madruga-Garrido M, Malaty IA, Maras A, McGrath L, Miguel EC, Mir P, Nestadt G, Nicolini H, Okun MS, Pakstis A, Paschou P, Piacentini J, Pittenger C, Plessen K, Ramensky V, Ramos EM, Reus V, Richter MA, Riddle MA, Robertson MM, Roessner V, Rosário M, Samuels JF, Sandor P, Stein DJ, Tsetsos F, Van Nieuwerburgh F, Weatherall S, Wendland JR, Wolanczyk T, Worbe Y, Zai G, Goes FS, McLaughlin N, Nestadt PS, Grabe HJ, Depienne C, Konkashbaev A, Lanzagorta N, Valencia-Duarte A, Bramon E, Buccola N, Cahn W, Cairns M, Chong SA, Cohen D, Crespo-Facorro B, Crowley J, Davidson M, DeLisi L, Dinan T, Donohoe G, Drapeau E, Duan J, Haan L, Hougaard D, Karachanak-Yankova S, Khrunin A, Klovins J, Kučinskas V, Lee Chee Keong J, Limborska S, Loughland C, Lönnqvist J, Maher B, Mattheisen M, McDonald C, Murphy KC, 
Nenadic I, van Os J, Pantelis C, Pato M, Petryshen T, Quested D, Roussos P, Sanders AR, Schall U, Schwab SG, Sim K, So HC, Stögmann E, Subramaniam M, Toncheva D, Waddington J, Walters J, Weiser M, Cheng W, Cloninger R, Curtis D, Gejman PV, Henskens F, Mattingsdal M, Oh SY, Scott R, Webb B, Breen G, Churchhouse C, Bulik CM, Daly M, Dichgans M, Faraone SV, Guerreiro R, Holmans P, Kendler KS, Koeleman B, Mathews CA, Price A, Scharf J, Sklar P, Williams J, Wood NW, Cotsapas C, Palotie A, Smoller JW, Sullivan P, Rosand J, Corvin A, Neale BM, Schott JM, Anney R, Elia J, Grigoroiu-Serbanescu M, Edenberg HJ and Murray R

    Analytic Translational Genetics Unit, Massachusetts General Hospital Harvard Medical School, Boston, Massachusetts, USA.

    Disorders of the brain can exhibit considerable epidemiological comorbidity and often share symptoms, provoking debate about their etiologic overlap. We quantified the genetic sharing of 25 brain disorders from genome-wide association studies of 265,218 patients and 784,643 control participants and assessed their relationship to 17 phenotypes from 1,191,588 individuals. Psychiatric disorders share common variant risk, whereas neurological disorders appear more distinct from one another and from the psychiatric disorders. We also identified significant sharing between disorders and a number of brain phenotypes, including cognitive measures. Further, we conducted simulations to explore how statistical power, diagnostic misclassification, and phenotypic heterogeneity affect genetic correlations. These results highlight the importance of common genetic variation as a risk factor for brain disorders and the value of heritability-based methods in understanding their etiology.

    Funded by: Department of Health: PDA/02/06/016; Intramural NIH HHS: Z99 AG999999; Medical Research Council: G0401207, G0800637, G0901310, MC_G1000735, MC_UU_00024/1, MR/K026992/1, MR/L010305/1, MR/L023784/2, MR/L501529/1, MR/L501554/1, MR/N008324/1, MR/P005748/1, MR/R024804/1; Motor Neurone Disease Association: ALCHALABI-DOBSON/APR14/829-791; NCATS NIH HHS: UL1 TR002369; NIA NIH HHS: P01 AG003991, P01 AG026276, P30 AG008017, P30 AG010129, P30 AG010161, P50 AG005133, P50 AG005136, P50 AG005681, R01 AG030653, R01 AG041718, R01 AG054076, U01 AG016976; NIMH NIH HHS: K01 MH109782, R00 MH101367, R01 MH092293, R01 MH106490, R01 MH107649, R01 MH115961, R25 MH077823, T32 MH076694, U01 MH094432, U01 MH109536; NINDS NIH HHS: R01 NS017950; Parkinson's UK: J-0901

    Science (New York, N.Y.) 2018;360;6395

  • A single nucleotide polymorphism in the Plasmodium falciparum atg18 gene associates with artemisinin resistance and confers enhanced parasite survival under nutrient deprivation.

    Breglio KF, Amato R, Eastman R, Lim P, Sa JM, Guha R, Ganesan S, Dorward DW, Klumpp-Thomas C, McKnight C, Fairhurst RM, Roberts D, Thomas C and Simon AK

    National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA.

    Background: Artemisinin-resistant Plasmodium falciparum has been reported throughout the Greater Mekong subregion and threatens to disrupt current malaria control efforts worldwide. Polymorphisms in kelch13 have been associated with clinical and in vitro resistance phenotypes; however, several studies suggest that the genetic determinants of resistance may involve multiple genes. Current proposed mechanisms of resistance conferred by polymorphisms in kelch13 hint at a connection to an autophagy-like pathway in P. falciparum.

    Results: A SNP in autophagy-related gene 18 (atg18) was associated with long parasite clearance half-life in patients following artemisinin-based combination therapy. This gene encodes PfAtg18, which is shown to be similar to the mammalian/yeast homologue WIPI/Atg18 in terms of structure, binding abilities, and ability to form puncta in response to stress. To investigate the contribution of this polymorphism, the atg18 gene was edited using CRISPR/Cas9 to introduce a T38I mutation into a k13-edited Dd2 parasite. The presence of this SNP confers a fitness advantage by enabling parasites to grow faster in nutrient-limited settings. The mutant and parent parasites were screened against drug libraries of 6349 unique compounds. While the SNP did not modulate the parasite's susceptibility to any of the anti-malarial compounds using a 72-h drug pulse, it did alter the parasite's susceptibility to 227 other compounds.

    Conclusions: These results suggest that the atg18 T38I polymorphism may provide additional resistance against artemisinin derivatives, but not partner drugs, even in the absence of kelch13 mutations, and may also be important in parasite survival during nutrient deprivation.

    Funded by: Wellcome Trust

    Malaria journal 2018;17;1;391

  • Generating CRISPR/Cas9-Derived Mutant Mice by Zygote Cytoplasmic Injection Using an Automatic Microinjector

    Brendan Doe, Ellen Brown and Katharina Boroviak

    Methods and Protocols 2018;1;1;5

  • Rapid HIV disease progression following superinfection in an HLA-B*27:05/B*57:01-positive transmission recipient.

    Brener J, Gall A, Hurst J, Batorsky R, Lavandier N, Chen F, Edwards A, Bolton C, Dsouza R, Allen T, Pybus OG, Kellam P, Matthews PC and Goulder PJR

    Department of Paediatrics, University of Oxford, Oxford, UK.

    Background: The factors determining differential HIV disease outcome among individuals expressing protective HLA alleles such as HLA-B*27:05 and HLA-B*57:01 remain unknown. Here we analyse two HIV-infected subjects expressing both HLA-B*27:05 and HLA-B*57:01. One subject maintained low-to-undetectable viral loads for more than a decade of follow-up. The other progressed to AIDS in < 3 years.

    Results: The rapid progressor was the recipient within a known transmission pair, enabling virus sequences to be tracked from transmission. Progression was associated with a 12% Gag sequence change and 26% Nef sequence change at the amino acid level within 2 years. Although next generation sequencing from early timepoints indicated that multiple CD8+ cytotoxic T lymphocyte (CTL) escape mutants were being selected prior to superinfection, < 4% of the amino acid changes arising from superinfection could be ascribed to CTL escape. Analysis of an HLA-B*27:05/B*57:01 non-progressor, in contrast, demonstrated minimal virus sequence diversification (1.1% Gag amino acid sequence change over 10 years), and dominant HIV-specific CTL responses previously shown to be effective in control of viraemia were maintained. Clonal sequencing demonstrated that escape variants were generated within the non-progressor, but in many cases were not selected. In the rapid progressor, progression occurred despite substantial reductions in viral replicative capacity (VRC), and non-progression in the elite controller despite relatively high VRC.

    Conclusions: These data are consistent with previous studies demonstrating rapid progression in association with superinfection and show that rapid disease progression can occur despite the relatively low VRC that is typically observed in the setting of multiple CTL escape mutants.

    Funded by: NIAID NIH HHS: R01 AI046995, R01 AI133673; NIH HHS: RO1AI46995; Wellcome Trust: WT104748MA

    Retrovirology 2018;15;1;7

  • Laboratory and molecular surveillance of paediatric typhoidal Salmonella in Nepal: Antimicrobial resistance and implications for vaccine policy.

    Britto CD, Dyson ZA, Duchene S, Carter MJ, Gurung M, Kelly DF, Murdoch DR, Ansari I, Thorson S, Shrestha S, Adhikari N, Dougan G, Holt KE and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, University of Oxford and the NIHR Oxford Biomedical Research Centre, Oxford, United Kingdom.

    Background: Children are substantially affected by enteric fever in most settings with a high burden of the disease, including Nepal. However, pathogen population structure and transmission dynamics are poorly delineated in young children, the proposed target group for immunization programs. Here we present whole genome sequencing and antimicrobial susceptibility data on 198 S. Typhi and 66 S. Paratyphi A isolated from children aged 2 months to 15 years during blood culture surveillance at Patan Hospital, Nepal, 2008-2016.

    Principal findings: S. Typhi was the dominant agent and comprised several distinct genotypes, dominated by 4.3.1 (H58). The heterogeneity of genotypes in children under five was reduced compared to data from 2005-2006, attributable to ongoing clonal expansion of H58. Most isolates (86%) were non-susceptible to fluoroquinolones, associated mainly with S. Typhi H58 lineage II and S. Paratyphi A harbouring mutations in the quinolone resistance-determining region (QRDR); non-susceptible strains from these groups accounted for 50% and 25% of all isolates. Multi-drug resistance (MDR) was rare (3.5% of S. Typhi, 0 S. Paratyphi A) and restricted to chromosomal insertions of resistance genes in H58 lineage I strains. Temporal analyses revealed a shift in dominance from H58 Lineage I to H58 Lineage II, with the latter being significantly more common after 2010. Comparison to global data sets showed the local S. Typhi and S. Paratyphi A strains had close genetic relatives in other South Asian countries, indicating regional strain circulation. Multiple imports from India of ciprofloxacin-resistant H58 lineage II strains were identified, but these were rare and showed no evidence of clonal replacement of local S. Typhi.

    Significance: These data indicate that enteric fever in Nepal continues to be a major public health issue with ongoing inter- and intra-country transmission, and highlight the need for regional coordination of intervention strategies. The absence of a S. Paratyphi A vaccine is cause for concern, given its prevalence as a fluoroquinolone-resistant enteric fever agent in this setting.

    PLoS neglected tropical diseases 2018;12;4;e0006408

  • A systematic review of antimicrobial resistance in Salmonella enterica serovar Typhi, the etiological agent of typhoid.

    Britto CD, Wong VK, Dougan G and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, University of Oxford and the NIHR Oxford Biomedical Research Centre, Oxford, United Kingdom.

    Background: Temporal and spatial changes in trends of antimicrobial resistance (AMR) in typhoid have not been systematically studied, and such information will be critical for defining intervention, as well as planning sustainable prevention strategies.

    Methodology and findings: To identify the phenotypic trends in AMR, 13,833 individual S. Typhi isolates, reported from 1973 to 2018 in 62 publications, were analysed to determine the AMR preponderance over time. Separate analyses of molecular resistance determinants present in over 4,000 isolates reported in 61 publications were also conducted. Multi-drug resistant (MDR) typhoid is in decline in Asia in a setting of high fluoroquinolone resistance while it is on the increase in Africa. Mutations in QRDRs in gyrA (S83F, D87N) and parC (S80I) are the most common mechanisms responsible for fluoroquinolone resistance. Cephalosporin-resistant S. Typhi, dubbed extensively drug-resistant (XDR), is a real threat and underscores the urgency in deploying the Vi-conjugate vaccines.

    Conclusion: From these observations, it appears that AMR in S. Typhi will continue to emerge leading to treatment failure, changes in antimicrobial policy and further resistance developing in S. Typhi isolates and other Gram-negative bacteria in endemic regions. The deployment of typhoid conjugate vaccines to control the disease in endemic regions may be the best defence.

    Funded by: Wellcome Trust

    PLoS neglected tropical diseases 2018;12;10;e0006779

  • Whole genome sequencing and microsatellite analysis of the Plasmodium falciparum E5 NF54 strain show that the var, rifin and stevor gene families follow Mendelian inheritance.

    Bruske E, Otto TD and Frank M

    Institute of Tropical Medicine, University of Tuebingen, Wilhelmstr. 27, 72074, Tuebingen, Germany.

    Background: Plasmodium falciparum exhibits a high degree of inter-isolate genetic diversity in its variant surface antigen (VSA) families: P. falciparum erythrocyte membrane protein 1, repetitive interspersed family (RIFIN) and subtelomeric variable open reading frame (STEVOR). The role of recombination for the generation of this diversity is a subject of ongoing research. Here the genome of E5, a sibling of the 3D7 genome strain, is presented. Short and long read whole genome sequencing (WGS) techniques (Illumina, Pacific Biosciences) and a set of 84 microsatellites (MS) were employed to characterize the 3D7 and non-3D7 parts of the E5 genome. This is the first time that VSA genes in sibling parasites were analysed with long read sequencing technology.

    Results: Of the 5733 E5 genes only 278 genes, mostly var and rifin/stevor genes, had no orthologues in the 3D7 genome. WGS and MS analysis revealed that chromosomal crossovers occurred at a rate of 0-3 per chromosome. var, stevor and rifin genes were inherited within the respective non-3D7 or 3D7 chromosomal context. 54 of the 84 MS PCR fragments correctly identified the respective MS as 3D7- or non-3D7 and this correlated with var and rifin/stevor gene inheritance in the adjacent chromosomal regions. E5 had 61 var and 189 rifin/stevor genes. One large non-chromosomal recombination event resulted in a new var gene on chromosome 14. The remainder of the E5 3D7-type subtelomeric and central regions were identical to 3D7.

    Conclusions: The data show that the rifin/stevor and var gene families represent the most diverse compartments of the P. falciparum genome but that the majority of var genes are inherited without alterations within their respective parental chromosomal context. Furthermore, MS genotyping with 54 MS can successfully distinguish between two sibling progeny of a natural P. falciparum cross and thus can be used to investigate identity by descent in field isolates.

    Funded by: Bundesministerium für Bildung und Forschung: BMBF-grant 01KA110; Wellcome Trust: 098051

    Malaria journal 2018;17;1;376

  • Itraconazole targets cell cycle heterogeneity in colorectal cancer.

    Buczacki SJA, Popova S, Biggs E, Koukorava C, Buzzelli J, Vermeulen L, Hazelwood L, Francies H, Garnett MJ and Winton DJ

    Cancer Research UK (CRUK) Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, England, UK

    Cellular dormancy and heterogeneity in cell cycle length provide important explanations for treatment failure after adjuvant therapy with S-phase cytotoxics in colorectal cancer (CRC), yet the molecular control of the dormant versus cycling state remains unknown. We sought to understand the molecular features of dormant CRC cells to facilitate rational identification of compounds to target both dormant and cycling tumor cells. Unexpectedly, we demonstrate that dormant CRC cells are differentiated, yet retain clonogenic capacity. Mouse organoid drug screening identifies that itraconazole generates spheroid collapse and loss of dormancy. Human CRC cell dormancy and tumor growth can also be perturbed by itraconazole, which is found to inhibit Wnt signaling through noncanonical hedgehog signaling. Preclinical validation shows itraconazole to be effective in multiple assays through Wnt inhibition, causing both cycling and dormant cells to switch to global senescence. These data provide preclinical evidence to support an early phase trial of itraconazole in CRC.

    Funded by: Cancer Research UK: C14094/A16485, C44943/A22536; Wellcome Trust: 102696

    The Journal of experimental medicine 2018;215;7;1891-1912

  • Fitness Loss under Amino Acid Starvation in Artemisinin-Resistant Plasmodium falciparum Isolates from Cambodia.

    Bunditvorapoom D, Kochakarn T, Kotanan N, Modchang C, Kümpornsin K, Loesbanluechai D, Krasae T, Cui L, Chotivanich K, White NJ, Wilairat P, Miotto O and Chookajorn T

    Genomics and Evolutionary Medicine Unit (GEM), Center of Excellence in Malaria Research, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand.

    Artemisinin is the most rapidly effective drug for Plasmodium falciparum malaria treatment currently in clinical use. Emerging artemisinin-resistant parasites pose a great global health risk. At present, the level of artemisinin resistance is still relatively low with evidence pointing towards a trade-off between artemisinin resistance and fitness loss. Here we show that artemisinin-resistant P. falciparum isolates from Cambodia manifested fitness loss, showing fewer progenies during the intra-erythrocytic developmental cycle. The loss in fitness was exacerbated under the condition of low exogenous amino acid supply. The resistant parasites failed to undergo maturation, whereas their drug-sensitive counterparts were able to complete the erythrocytic cycle under conditions of amino acid deprivation. The artemisinin-resistant phenotype was not stable, and loss of the phenotype was associated with changes in the expression of a putative target, Exp1, a membrane glutathione transferase. Analysis of SNPs in haemoglobin processing genes revealed associations with parasite clearance times, suggesting changes in haemoglobin catabolism may contribute to artemisinin resistance. These findings on fitness and protein homeostasis could provide clues on how to contain emerging artemisinin-resistant parasites.

    Funded by: FIC NIH HHS: D43 TW006571; NIAID NIH HHS: R01 AI128940, U19 AI089672

    Scientific reports 2018;8;1;12622

  • Association of LPA Variants With Risk of Coronary Disease and the Implications for Lipoprotein(a)-Lowering Therapies: A Mendelian Randomization Analysis.

    Burgess S, Ference BA, Staley JR, Freitag DF, Mason AM, Nielsen SF, Willeit P, Young R, Surendran P, Karthikeyan S, Bolton TR, Peters JE, Kamstrup PR, Tybjærg-Hansen A, Benn M, Langsted A, Schnohr P, Vedel-Krogh S, Kobylecki CJ, Ford I, Packard C, Trompet S, Jukema JW, Sattar N, Di Angelantonio E, Saleheen D, Howson JMM, Nordestgaard BG, Butterworth AS, Danesh J and European Prospective Investigation Into Cancer and Nutrition–Cardiovascular Disease (EPIC-CVD) Consortium

    Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom.

    Importance: Human genetic studies have indicated that plasma lipoprotein(a) (Lp[a]) is causally associated with the risk of coronary heart disease (CHD), but randomized trials of several therapies that reduce Lp(a) levels by 25% to 35% have not provided any evidence that lowering Lp(a) level reduces CHD risk.

    Objective: To estimate the magnitude of the change in plasma Lp(a) levels needed to have the same evidence of an association with CHD risk as a 38.67-mg/dL (ie, 1-mmol/L) change in low-density lipoprotein cholesterol (LDL-C) level, a change that has been shown to produce a clinically meaningful reduction in the risk of CHD.

    Design, setting, and participants: A mendelian randomization analysis was conducted using individual participant data from 5 studies and with external validation using summarized data from 48 studies. Population-based prospective cohort and case-control studies featured 20 793 individuals with CHD and 27 540 controls with individual participant data, whereas summarized data included 62 240 patients with CHD and 127 299 controls. Data were analyzed from November 2016 to March 2018.

    Exposures: Genetic LPA score and plasma Lp(a) mass concentration.

    Main outcomes and measures: Coronary heart disease.

    Results: Of the included study participants, 53% were men, all were of white European ancestry, and the mean age was 57.5 years. The association of genetically predicted Lp(a) with CHD risk was linearly proportional to the absolute change in Lp(a) concentration. A 10-mg/dL lower genetically predicted Lp(a) concentration was associated with a 5.8% lower CHD risk (odds ratio [OR], 0.942; 95% CI, 0.933-0.951; P = 3 × 10<sup>-37</sup>), whereas a 10-mg/dL lower genetically predicted LDL-C level estimated using an LDL-C genetic score was associated with a 14.5% lower CHD risk (OR, 0.855; 95% CI, 0.818-0.893; P = 2 × 10<sup>-12</sup>). Thus, a 101.5-mg/dL change (95% CI, 71.0-137.0) in Lp(a) concentration had the same association with CHD risk as a 38.67-mg/dL change in LDL-C level. The association of genetically predicted Lp(a) concentration with CHD risk appeared to be independent of changes in LDL-C level owing to genetic variants that mimic the relationship of statins, PCSK9 inhibitors, and ezetimibe with CHD risk.

    Conclusions and relevance: The clinical benefit of lowering Lp(a) is likely to be proportional to the absolute reduction in Lp(a) concentration. Large absolute reductions in Lp(a) of approximately 100 mg/dL may be required to produce a clinically meaningful reduction in the risk of CHD similar in magnitude to what can be achieved by lowering LDL-C level by 38.67 mg/dL (ie, 1 mmol/L).
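
    The equivalence quoted in this abstract can be roughly reproduced from the two reported odds ratios. The sketch below is not the authors' analysis: it assumes the association is linear on the log-odds scale per mg/dL, uses only the rounded point estimates given above, and the function name is hypothetical. Under those assumptions it returns a value of roughly 101 mg/dL, close to the reported 101.5 mg/dL; the small gap is expected from rounding of the odds ratios.

```python
import math

def equivalent_lpa_change(or_lpa_per_10, or_ldl_per_10, ldl_change_mg_dl):
    """Lp(a) change (mg/dL) with the same log-odds association as an LDL-C change.

    Assumes (illustratively) that log-odds of CHD scale linearly with the
    absolute change in each biomarker.
    """
    # Convert "OR per 10 mg/dL lower" into log-odds per 1 mg/dL.
    log_or_lpa_per_mg = math.log(or_lpa_per_10) / 10.0
    log_or_ldl_per_mg = math.log(or_ldl_per_10) / 10.0
    # Log-odds shift associated with the reference LDL-C change.
    target_log_or = log_or_ldl_per_mg * ldl_change_mg_dl
    # Lp(a) change needed to reach the same log-odds shift.
    return target_log_or / log_or_lpa_per_mg

# Point estimates taken from the abstract: OR 0.942 (Lp[a]) and 0.855 (LDL-C)
# per 10 mg/dL lower, and a reference LDL-C change of 38.67 mg/dL (1 mmol/L).
lpa_equiv = equivalent_lpa_change(0.942, 0.855, 38.67)
print(f"Equivalent Lp(a) change: {lpa_equiv:.1f} mg/dL")

# Percent lower risk implied by each odds ratio (1 - OR).
lpa_pct = (1 - 0.942) * 100
ldl_pct = (1 - 0.855) * 100
print(f"Lp(a): {lpa_pct:.1f}% lower; LDL-C: {ldl_pct:.1f}% lower per 10 mg/dL")
```

    The confidence interval reported in the abstract (71.0-137.0) cannot be recovered this way, since it depends on the joint uncertainty of both estimates.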

    Funded by: British Heart Foundation: RG/08/014, RG/13/13/30194, SP/09/002; Department of Health; Medical Research Council: G0800270, MC_UU_00002/7, MR/L003120/1, MR/S003746/1; Wellcome Trust: 204623/Z/16/Z

    JAMA cardiology 2018;3;7;619-627

  • Insular Celtic population structure and genomic footprints of migration.

    Byrne RP, Martiniano R, Cassidy LM, Carrigan M, Hellenthal G, Hardiman O, Bradley DG and McLaughlin RL

    Complex Trait Genomics Laboratory, Smurfit Institute of Genetics, School of Genetics and Microbiology, Trinity College Dublin, College Green, Dublin, Republic of Ireland.

    Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.

    Funded by: Motor Neurone Disease Association: MCLAUGHLIN/OCT15/957-799

    PLoS genetics 2018;14;1;e1007152

  • Farewell Stan Stanley Falkow: 1934-2018.

    Cabello FC, Cohen SN, Curtiss R, Dougan G, van Embden J, Finlay BB, Heffron F, Helinski D, Hull R, Hull S, Isberg R, Kopecko DJ, Levy S, Mekalanos J, Ortiz JM, Rappuoli R, Roberts MC, So M and Timmis KN

    Department of Microbiology and Immunology, New York Medical College, Valhalla, NY, USA.

    Environmental microbiology 2018;20;7;2322-2333

  • Morphological, genomic and transcriptomic responses of Klebsiella pneumoniae to the last-line antibiotic colistin.

    Cain AK, Boinett CJ, Barquist L, Dordel J, Fookes M, Mayho M, Ellington MJ, Goulding D, Pickard D, Wick RR, Holt KE, Parkhill J and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Colistin remains one of the few antibiotics effective against multi-drug resistant (MDR) hospital pathogens, such as Klebsiella pneumoniae. Yet resistance to this last-line drug is rapidly increasing. Characterized mechanisms of col<sup>R</sup> in K. pneumoniae are largely due to chromosomal mutations in two-component regulators, although a plasmid-mediated col<sup>R</sup> mechanism has recently been uncovered. However, the effects of intrinsic colistin resistance are yet to be characterized on a whole-genome level. Here, we used a genomics-based approach to understand the mechanisms of adaptive col<sup>R</sup> acquisition in K. pneumoniae. In controlled directed-evolution experiments we observed two distinct paths to colistin resistance acquisition. Whole genome sequencing identified mutations in two colistin resistance genes: in the known col<sup>R</sup> regulator phoQ which became fixed in the population and resulted in a single amino acid change, and unstable minority variants in the recently described two-component sensor crrB. Through RNAseq and microscopy, we reveal the broad range of effects that colistin exposure has on the cell. This study is the first to use genomics to identify a population of minority variants with mutations in a col<sup>R</sup> gene in K. pneumoniae.

    Funded by: Medical Research Council: G1100100; Wellcome Trust: WT098051

    Scientific reports 2018;8;1;9868

  • Increasing nursing capacity in genomics: Overview of existing global genomics resources.

    Calzone KA, Kirk M, Tonkin E, Badzek L, Benjamin C and Middleton A

    National Institutes of Health, National Cancer Institute, Center for Cancer Research, Genetics Branch, 37 Convent Drive, Building 37, RM 6002C, MSC 4256, Bethesda, MD 20892, USA.

    Background: Global genomic literacy of all health professions, including nurses, remains low despite an inundation of genomic information with established clinical and analytic validity and clinical utility. Genomic literacy and competency deficits contribute to lost opportunities to take advantage of the benefits that genomic information provides to improve health outcomes, reduce healthcare costs, and increase patient quality and safety. Nurses are essential to the integration of genomics into healthcare. The greatest challenges to realizing their potential in successful integration include education and awareness. Identification of resources, their focus, whether they are targeted at nursing, and how to access them forms the foundation for a global genomic resource initiative led by the Global Genomics Nursing Alliance.

    Objectives: The aim was to identify existing global genomic resources and competencies, identifying the source, type and accessibility.

    Design: Cross-sectional online descriptive survey to ascertain existing genomic resources.

    Settings: Limited to eighteen countries and seven organizations represented by delegates attending the inaugural meeting in 2017 of the Global Genomics Nursing Alliance.

    Participants: A purposive sample of global nursing leaders and representatives of national and international nursing organizations.

    Methods: The primary method was by online survey administered following an orientation webinar. Given the small numbers of nurse leaders in genomics within our sample (and indeed within the world), results were analyzed and presented descriptively. Those identifying resources provided further detailed resource information. Additional data were collected during a face-to-face meeting using an electronic audience-response system.

    Results: Of the twenty-three global delegates responding, 9 identified existing genomic resources that could be used for academic or continuing genomics education. Three countries have competence frameworks to guide learning and 5 countries have national organizations for genetics nurses.

    Conclusions: The genomic resources that already exist are not readily accessible or discoverable to the international nursing community and as such are underutilized.

    Funded by: Intramural NIH HHS: Z99 CA999999; Wellcome Trust

    Nurse education today 2018;69;53-59

  • A forward genetic screen reveals a primary role for Plasmodium falciparum Reticulocyte Binding Protein Homologue 2a and 2b in determining alternative erythrocyte invasion pathways.

    Campino S, Marin-Menendez A, Kemp A, Cross N, Drought L, Otto TD, Benavente ED, Ravenhall M, Schwach F, Girling G, Manske M, Theron M, Gould K, Drury E, Clark TG, Kwiatkowski DP, Pance A and Rayner JC

    Malaria Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

    Invasion of human erythrocytes is essential for Plasmodium falciparum parasite survival and pathogenesis, and is also a complex phenotype. While some later steps in invasion appear to be invariant and essential, the earlier steps of recognition are controlled by a series of redundant, and only partially understood, receptor-ligand interactions. Reverse genetic analysis of laboratory adapted strains has identified multiple genes that when deleted can alter invasion, but how the relative contributions of each gene translate to the phenotypes of clinical isolates is far from clear. We used a forward genetic approach to identify genes responsible for variable erythrocyte invasion by phenotyping the parents and progeny of previously generated experimental genetic crosses. Linkage analysis using whole genome sequencing data revealed a single major locus was responsible for the majority of phenotypic variation in two invasion pathways. This locus contained the PfRh2a and PfRh2b genes, members of one of the major invasion ligand gene families, but not widely thought to play such a prominent role in specifying invasion phenotypes. Variation in invasion pathways was linked to significant differences in PfRh2a and PfRh2b expression between parasite lines, and their role in specifying alternative invasion was confirmed by CRISPR-Cas9-mediated genome editing. Expansion of the analysis to a large set of clinical P. falciparum isolates revealed common deletions, suggesting that variation at this locus is a major cause of invasion phenotypic variation in the endemic setting. This work has implications for blood-stage vaccine development and will help inform the design and location of future large-scale studies of invasion in clinical isolates.

    Funded by: Medical Research Council: MR/M01360X/1, MR/M006212/1; Wellcome Trust: 090851

    PLoS pathogens 2018;14;11;e1007436

  • Homozygous loss-of-function mutations in SLC26A7 cause goitrous congenital hypothyroidism.

    Cangul H, Liao XH, Schoenmakers E, Kero J, Barone S, Srichomkwun P, Iwayama H, Serra EG, Saglam H, Eren E, Tarim O, Nicholas AK, Zvetkova I, Anderson CA, Frankl FEK, Boelaert K, Ojaniemi M, Jääskeläinen J, Patyra K, Löf C, Williams ED, UK10K Consortium, Soleimani M, Barrett T, Maher ER, Chatterjee VK, Refetoff S and Schoenmakers N

    Department of Medical Genetics, Istanbul Medipol University, International School of Medicine, Istanbul, Turkey.

    Defects in genes mediating thyroid hormone biosynthesis result in dyshormonogenic congenital hypothyroidism (CH). Here, we report homozygous truncating mutations in SLC26A7 in 6 unrelated families with goitrous CH and show that goitrous hypothyroidism also occurs in Slc26a7-null mice. In both species, the gene is expressed predominantly in the thyroid gland, and loss of function is associated with impaired availability of iodine for thyroid hormone synthesis, partially corrected in mice by iodine supplementation. SLC26A7 is a member of the same transporter family as SLC26A4 (pendrin), an anion exchanger with affinity for iodide and chloride (among others), whose gene mutations cause congenital deafness and dyshormonogenic goiter. However, in contrast to pendrin, SLC26A7 does not mediate cellular iodide efflux and hearing in affected individuals is normal. We delineate a hitherto unrecognized role for SLC26A7 in thyroid hormone biosynthesis, for which the mechanism remains unclear.

    Funded by: Department of Health; Medical Research Council: G0502115, G0600717, MC_UU_12012/5; NIDDK NIH HHS: R01 DK015070, R37 DK015070; Wellcome Trust: 095564/Z/11/Z, 098051, 100574/Z/12/Z, 100585/Z/12/Z, WT091310

    JCI insight 2018;3;20

  • Evaluation of Protein-Ligand Docking by Cyscore.

    Cao Y, Dai W and Miao Z

    Center of Growth, Metabolism and Aging, Key Lab of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China.

    Protein-ligand docking is a powerful method in drug discovery. The reliability of docking can be quantified by RMSD between a docking structure and an experimentally determined one. However, most experimentally determined structures are not available in practice. Evaluation by scoring functions is an alternative for assessing protein-ligand docking results. This chapter first provides a brief introduction to scoring methods used in docking. Then details are provided on how to use Cyscore programs. Finally it describes a case study for evaluation of protein-ligand docking.

    Methods in molecular biology (Clifton, N.J.) 2018;1762;233-243

  • Ancient Biomolecules and Evolutionary Inference.

    Cappellini E, Prohaska A, Racimo F, Welker F, Pedersen MW, Allentoft ME, de Barros Damgaard P, Gutenbrunner P, Dunne J, Hammann S, Roffet-Salque M, Ilardo M, Moreno-Mayar JV, Wang Y, Sikora M, Vinner L, Cox J, Evershed RP and Willerslev E

    Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen, Denmark.

    Over the past three decades, studies of ancient biomolecules (particularly ancient DNA, proteins, and lipids) have revolutionized our understanding of evolutionary history. Though initially fraught with many challenges, today the field stands on firm foundations. Researchers now successfully retrieve nucleotide and amino acid sequences, as well as lipid signatures, from progressively older samples, originating from geographic areas and depositional environments that, until recently, were regarded as hostile to long-term preservation of biomolecules. Sampling frequencies and the spatial and temporal scope of studies have also increased markedly, and with them the size and quality of the data sets generated. This progress has been made possible by continuous technical innovations in analytical methods, enhanced criteria for the selection of ancient samples, integrated experimental methods, and advanced computational approaches. Here, we discuss the history and current state of ancient biomolecule research, its applications to evolutionary inference, and future directions for this young and exciting field.

    Annual review of biochemistry 2018;87;1029-1060

  • In silico guided reconstruction and analysis of ICAM-1-binding var genes from Plasmodium falciparum.

    Carrington E, Otto TD, Szestak T, Lennartz F, Higgins MK, Newbold CI and Craig AG

    Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK.

    The Plasmodium falciparum variant surface antigen PfEMP1 expressed on the surface of infected erythrocytes is thought to play a major role in the pathology of severe malaria. As the sequence pool of the var genes encoding PfEMP1 expands there are opportunities, despite the high degree of sequence diversity demonstrated by this gene family, to reconstruct full-length var genes from small sequence tags generated from patient isolates. To test whether this is possible we have used a set of recently laboratory adapted ICAM-1-binding parasite isolates to generate sequence tags and, from these, to identify the full-length PfEMP1 being expressed by them. In a subset of the strains available we were able to produce validated, full-length var gene sequences and use these to conduct biophysical analyses of the ICAM-1 binding regions.

    Funded by: Medical Research Council: G0901062, MC_PC_16052; Wellcome Trust: 095507/Z/11/Z, 104792/Z/14/Z

    Scientific reports 2018;8;1;3282

  • A novel variant in GLIS3 is associated with osteoarthritis.

    Casalone E, Tachmazidou I, Zengini E, Hatzikotoulas K, Hackinger S, Suveges D, Steinberg J, Rayner NW, arcOGEN Consortium, Wilkinson JM, Panoutsopoulou K and Zeggini E

    Department of Medical Sciences, University of Turin, Turin, Italy.

    Objectives: Osteoarthritis (OA) is a complex disease, but its genetic aetiology remains poorly characterised. To identify novel susceptibility loci for OA, we carried out a genome-wide association study (GWAS) in individuals from the largest UK-based OA collections to date.

    Methods: We carried out a discovery GWAS in 5414 OA individuals with knee and/or hip total joint replacement (TJR) and 9939 population-based controls. We followed up prioritised variants in OA subjects from the interim release of the UK Biobank resource (up to 12 658 cases and 50 898 controls) and our lead finding in operated OA subjects from the full release of UK Biobank (17 894 cases and 89 470 controls). We investigated its functional implications in methylation, gene expression and proteomics data in primary chondrocytes from 12 pairs of intact and degraded cartilage samples from patients undergoing TJR.

    Results: We detect a genome-wide significant association at rs10116772 with TJR (P=3.7×10⁻⁸; for allele A: OR (95% CI) 0.97 (0.96 to 0.98)), an intronic variant in GLIS3, which is expressed in cartilage. Variants in strong correlation with rs10116772 have been associated with elevated plasma glucose levels and diabetes.

    Conclusions: We identify a novel susceptibility locus for OA that has been previously implicated in diabetes and glycaemic traits.

    Funded by: Medical Research Council: G1001799, MC_QA137853, MR/N01104X/1, MR/N01104X/2, MR/P020941/1; Versus Arthritis: 20308; Wellcome Trust

    Annals of the rheumatic diseases 2018;77;4;620-623

  • Multiplexed ChIP-Seq Using Direct Nucleosome Barcoding: A Tool for High-Throughput Chromatin Analysis.

    Chabbert CD, Adjalley SH, Steinmetz LM and Pelechano V

    Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany.

    Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) or microarray hybridization (ChIP-on-chip) are standard methods for the study of transcription factor binding sites and histone chemical modifications. However, these approaches only allow profiling of a single factor or protein modification at a time. In this chapter, we present Bar-ChIP, a higher throughput version of ChIP-Seq that relies on the direct ligation of molecular barcodes to chromatin fragments. Bar-ChIP enables the concurrent profiling of multiple DNA-protein interactions and is therefore amenable to experimental scale-up, without the need for any robotic instrumentation.

    Methods in molecular biology (Clifton, N.J.) 2018;1689;177-194

  • Latin Americans show wide-spread Converso ancestry and imprint of local Native ancestry on physical appearance.

    Chacón-Duque JC, Adhikari K, Fuentes-Guajardo M, Mendoza-Revilla J, Acuña-Alonzo V, Barquera R, Quinto-Sánchez M, Gómez-Valdés J, Everardo Martínez P, Villamil-Ramírez H, Hünemeier T, Ramallo V, Silva de Cerqueira CC, Hurtado M, Villegas V, Granja V, Villena M, Vásquez R, Llop E, Sandoval JR, Salazar-Granara AA, Parolin ML, Sandoval K, Peñaloza-Espinosa RI, Rangel-Villalobos H, Winkler CA, Klitz W, Bravi C, Molina J, Corach D, Barrantes R, Gomes V, Resende C, Gusmão L, Amorim A, Xue Y, Dugoujon JM, Moral P, González-José R, Schuler-Faccini L, Salzano FM, Bortolini MC, Canizales-Quinteros S, Poletti G, Gallo C, Bedoya G, Rothhammer F, Balding D, Hellenthal G and Ruiz-Linares A

    Department of Genetics, Evolution and Environment and UCL Genetics Institute, University College London, London, WC1E 6BT, UK.

    Historical records and genetic analyses indicate that Latin Americans trace their ancestry mainly to the intermixing (admixture) of Native Americans, Europeans and Sub-Saharan Africans. Using novel haplotype-based methods, here we infer sub-continental ancestry in over 6,500 Latin Americans and evaluate the impact of regional ancestry variation on physical appearance. We find that Native American ancestry components in Latin Americans correspond geographically to the present-day genetic structure of Native groups, and that sources of non-Native ancestry, and admixture timings, match documented migratory flows. We also detect South/East Mediterranean ancestry across Latin America, probably stemming mostly from the clandestine colonial migration of Christian converts of non-European origin (Conversos). Furthermore, we find that ancestry related to highland (Central Andean) versus lowland (Mapuche) Natives is associated with variation in facial features, particularly nose morphology, and detect significant differences in allele frequencies between these groups at loci previously associated with nose morphology in this sample.

    Nature communications 2018;9;1;5388

  • Single-Cell (Multi)omics Technologies.

    Chappell L, Russell AJC and Voet T

    Wellcome Sanger Institute, Cambridge CB10 1SA, United Kingdom.

    Single-cell multiomics technologies typically measure multiple types of molecule from the same individual cell, enabling more profound biological insight than can be inferred by analyzing each molecular layer from separate cells. These single-cell multiomics technologies can reveal cellular heterogeneity at multiple molecular layers within a population of cells and reveal how this variation is coupled or uncoupled between the captured omic layers. The data sets generated by these techniques have the potential to enable a deeper understanding of the key biological processes and mechanisms driving cellular heterogeneity and how they are linked with normal development and aging as well as disease etiology. This review details both established and novel single-cell mono- and multiomics technologies and considers their limitations, applications, and likely future developments.

    Funded by: Wellcome Trust: 105031/E/14/Z, 105045/Z/14/Z

    Annual review of genomics and human genetics 2018;19;15-41

  • A rapid and robust method for single cell chromatin accessibility profiling.

    Chen X, Miragaia RJ, Natarajan KN and Teichmann SA

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is widely used to identify regulatory regions throughout the genome. However, very few studies have been performed at the single cell level (scATAC-seq) due to technical challenges. Here we developed a simple and robust plate-based scATAC-seq method, combining upfront bulk Tn5 tagging with single-nuclei sorting. We demonstrate that our method works robustly across various systems, including fresh and cryopreserved cells from primary tissues. By profiling over 3000 splenocytes, we identify distinct immune cell types and reveal cell type-specific regulatory regions and related transcription factors.

    Funded by: Wellcome Trust

    Nature communications 2018;9;1;5345

  • Pooled extracellular receptor-ligand interaction screening using CRISPR activation.

    Chong ZS, Ohnishi S, Yusa K and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

    Extracellular interactions between cell surface receptors are necessary for signaling and adhesion but identifying them remains technically challenging. We describe a cell-based genome-wide approach employing CRISPR activation to identify receptors for a defined ligand. We show receptors for high-affinity antibodies and low-affinity ligands can be unambiguously identified when used in pools or as individual binding probes. We apply this technique to identify ligands for the adhesion G-protein-coupled receptors and show that the Nogo myelin-associated inhibitory proteins are ligands for ADGRB1. This method will enable extracellular receptor-ligand identification on a genome-wide scale.

    Funded by: Wellcome Trust: 206194

    Genome biology 2018;19;1;205

  • RecQ helicases in the malaria parasite Plasmodium falciparum affect genome stability, gene expression patterns and DNA replication dynamics.

    Claessens A, Harris LM, Stanojcic S, Chappell L, Stanton A, Kuk N, Veneziano-Broccia P, Sterkers Y, Rayner JC and Merrick CJ

    London School of Hygiene and Tropical Medicine, London, United Kingdom.

    The malaria parasite Plasmodium falciparum has evolved an unusual genome structure. The majority of the genome is relatively stable, with mutation rates similar to most eukaryotic species. However, some regions are very unstable with high recombination rates, driving the generation of new immune evasion-associated var genes. The molecular factors controlling the inconsistent stability of this genome are not known. Here we studied the roles of the two putative RecQ helicases in P. falciparum, PfBLM and PfWRN. When PfWRN was knocked down, recombination rates increased four-fold, generating chromosomal abnormalities, a high rate of chimeric var genes and many microindels, particularly in known 'fragile sites'. This is the first identification of a gene involved in suppressing recombination and maintaining genome stability in Plasmodium. By contrast, no change in mutation rate appeared when the second RecQ helicase, PfBLM, was mutated. At the transcriptional level, however, both helicases evidently modulate the transcription of large cohorts of genes, with several hundred genes, including a large proportion of vars, showing deregulated expression in each RecQ mutant. Aberrant processing of stalled replication forks is a possible mechanism underlying elevated mutation rates and this was assessed by measuring DNA replication dynamics in the RecQ mutant lines. Replication forks moved slowly and stalled at elevated rates in both mutants, confirming that RecQ helicases are required for efficient DNA replication. Overall, this work identifies the Plasmodium RecQ helicases as major players in DNA replication, antigenic diversification and genome stability in the most lethal human malaria parasite, with important implications for genome evolution in this pathogen.

    Funded by: Medical Research Council: MR/K000535/1, MR/L008823/1, MR/P010873/1, MR/P010873/2; Wellcome Trust: 090851

    PLoS genetics 2018;14;7;e1007490

  • scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells.

    Clark SJ, Argelaguet R, Kapourani CA, Stubbs TM, Lee HJ, Alda-Catalinas C, Krueger F, Sanguinetti G, Kelsey G, Marioni JC, Stegle O and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge, CB22 3AT, UK.

    Parallel single-cell sequencing protocols represent powerful methods for investigating regulatory relationships, including epigenome-transcriptome interactions. Here, we report a single-cell method for parallel chromatin accessibility, DNA methylation and transcriptome profiling. scNMT-seq (single-cell nucleosome, methylation and transcription sequencing) uses a GpC methyltransferase to label open chromatin followed by bisulfite and RNA sequencing. We validate scNMT-seq by applying it to differentiating mouse embryonic stem cells, finding links between all three molecular layers and revealing dynamic coupling between epigenomic layers during differentiation.

    Funded by: Medical Research Council: MR/K011332/1; Wellcome Trust

    Nature communications 2018;9;1;781

  • The genomics of insecticide resistance: insights from recent studies in African malaria vectors.

    Clarkson CS, Temple HJ and Miles A

    Wellcome Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    Over 80% of the world's population is at risk from arthropod-vectored diseases, and arthropod crop pests are a significant threat to food security. Insecticides are our front-line response for controlling these disease vectors and pests, and consequently the increasing prevalence of insecticide resistance is of global concern. Here we provide a brief overview of how genomics can be used to implement effective insecticide resistance management (IRM), with a focus on recent advances in the study of Anopheles gambiae, the major vector of malaria in Africa. These advances unlock the potential for a predictive form of IRM, allowing tractable feedback for stakeholders, where the latest field data and well parameterised models can maximise the lifetime and effectiveness of available insecticides.

    Current opinion in insect science 2018;27;111-115

  • Pneumococcal vaccine impacts on the population genomics of non-typeable Haemophilus influenzae.

    Cleary D, Devine V, Morris D, Osman K, Gladstone R, Bentley S, Faust S and Clarke S

    Faculty of Medicine and Institute for Life Sciences, University of Southampton, Southampton, UK.

    The implementation of pneumococcal conjugate vaccines (PCVs) has led to a decline in vaccine-type disease. However, there is evidence that the epidemiology of non-typeable Haemophilus influenzae (NTHi) carriage and disease can be altered as a consequence of PCV introduction. We explored the epidemiological shifts in NTHi carriage using whole genome sequencing over a 5-year period that included PCV13 replacement of PCV7 in the UK's National Immunization Programme in 2010. Between 2008/09 and 2012/13 (October to March), nasopharyngeal swabs were taken from children <5 years of age. Significantly increased carriage post-PCV13 was observed and lineage-specific associations with Streptococcus pneumoniae were seen before but not after PCV13 introduction. NTHi were characterized into 11 discrete, temporally stable lineages, congruent with current knowledge regarding the clonality of NTHi. The increased carriage could not be linked to the expansion of a particular clone and different co-carriage dynamics were seen before PCV13 implementation when NTHi co-carried with vaccine serotype pneumococci. In summary, PCV13 introduction has been shown to have an indirect effect on NTHi epidemiology and there exists both negative and positive, distinct associations between pneumococci and NTHi. This should be considered when evaluating the impacts of pneumococcal vaccine design and policy.

    Funded by: Department of Health; Wellcome Trust

    Microbial genomics 2018;4;9

  • GDSCTools for mining pharmacogenomic interactions in cancer.

    Cokelaer T, Chen E, Iorio F, Menden MP, Lightfoot H, Saez-Rodriguez J and Garnett MJ

    Institut Pasteur-Bioinformatics and Biostatistics Hub-C3BI, USR 3756 IP CNRS, Paris, France.

    Motivation: Large pharmacogenomic screenings integrate heterogeneous cancer genomic datasets as well as anti-cancer drug responses on a thousand human cancer cell lines. Mining this data to identify new therapies for cancer sub-populations would benefit from common data structures, modular computational biology tools and user-friendly interfaces.

    Results: We have developed GDSCTools, software aimed at the identification of clinically relevant genomic markers of drug response. The Genomics of Drug Sensitivity in Cancer (GDSC) database integrates heterogeneous cancer genomic datasets as well as anti-cancer drug responses on a thousand cancer cell lines. Including statistical tools (analysis of variance) and predictive methods (Elastic Net), as well as common data structures, GDSCTools allows users to reproduce published results from GDSC and to implement new analytical methods. In addition, non-GDSC data resources can also be analysed since drug responses and genomic features can be encoded as CSV files.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2018;34;7;1226-1228

  • Genome-wide analysis of multi- and extensively drug-resistant Mycobacterium tuberculosis.

    Coll F, Phelan J, Hill-Cawthorne GA, Nair MB, Mallard K, Ali S, Abdallah AM, Alghamdi S, Alsomali M, Ahmed AO, Portelli S, Oppong Y, Alves A, Bessa TB, Campino S, Caws M, Chatterjee A, Crampin AC, Dheda K, Furnham N, Glynn JR, Grandjean L, Minh Ha D, Hasan R, Hasan Z, Hibberd ML, Joloba M, Jones-López EC, Matsumoto T, Miranda A, Moore DJ, Mocillo N, Panaiotov S, Parkhill J, Penha C, Perdigão J, Portugal I, Rchiad Z, Robledo J, Sheen P, Shesha NT, Sirgel FA, Sola C, Oliveira Sousa E, Streicher EM, Helden PV, Viveiros M, Warren RM, McNerney R, Pain A and Clark TG

    Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK.

    To characterize the genetic determinants of resistance to antituberculosis drugs, we performed a genome-wide association study (GWAS) of 6,465 Mycobacterium tuberculosis clinical isolates from more than 30 countries. A GWAS approach within a mixed-regression framework was followed by a phylogenetics-based test for independent mutations. In addition to mutations in established and recently described resistance-associated genes, novel mutations were discovered for resistance to cycloserine, ethionamide and para-aminosalicylic acid. The capacity to detect mutations associated with resistance to ethionamide, pyrazinamide, capreomycin, cycloserine and para-aminosalicylic acid was enhanced by inclusion of insertions and deletions. Odds ratios for mutations within candidate genes were found to reflect levels of resistance. New epistatic relationships between candidate drug-resistance-associated genes were identified. Findings also suggest the involvement of efflux pumps (drrA and Rv2688c) in the emergence of resistance. This study will inform the design of new diagnostic tests and expedite the investigation of resistance and compensatory epistatic mechanisms.

    Funded by: Medical Research Council: MR/K020420/1; Wellcome Trust: 098610

    Nature genetics 2018;50;2;307-316

  • Dietary trehalose enhances virulence of epidemic Clostridium difficile.

    Collins J, Robinson C, Danhof H, Knetsch CW, van Leeuwen HC, Lawley TD, Auchtung JM and Britton RA

    Baylor College of Medicine, Department of Molecular Virology and Microbiology, One Baylor Plaza, Houston, Texas 77030, USA.

    Clostridium difficile disease has recently increased to become a dominant nosocomial pathogen in North America and Europe, although little is known about what has driven this emergence. Here we show that two epidemic ribotypes (RT027 and RT078) have acquired unique mechanisms to metabolize low concentrations of the disaccharide trehalose. RT027 strains contain a single point mutation in the trehalose repressor that increases the sensitivity of this ribotype to trehalose by more than 500-fold. Furthermore, dietary trehalose increases the virulence of a RT027 strain in a mouse model of infection. RT078 strains acquired a cluster of four genes involved in trehalose metabolism, including a PTS permease that is both necessary and sufficient for growth on low concentrations of trehalose. We propose that the implementation of trehalose as a food additive into the human diet, shortly before the emergence of these two epidemic lineages, helped select for their emergence and contributed to hypervirulence.

    Funded by: NIAID NIH HHS: R01 AI123278, U01 AI124290; NIDDK NIH HHS: P30 DK056338; NIH HHS: 5U19AI09087202

    Nature 2018;553;7688;291-294

  • Recurrent histone mutations in T-cell acute lymphoblastic leukaemia.

    Collord G, Martincorena I, Young MD, Foroni L, Bolli N, Stratton MR, Vassiliou GS, Campbell PJ and Behjati S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust; Wellcome Trust Clinical PhD Fellowship: WT098051; Wellcome Trust Senior Clinical Research Fellowship: WT088340MA

    British journal of haematology 2018;184;4;676-679

  • Clonal haematopoiesis is not prevalent in survivors of childhood cancer.

    Collord G, Park N, Podestà M, Dagnino M, Cilloni D, Jones D, Varela I, Frassoni F and Vassiliou GS

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust: WT095663MA, WT098051

    British journal of haematology 2018;181;4;537-539

  • An integrated genomic analysis of anaplastic meningioma identifies prognostic molecular signatures.

    Collord G, Tarpey P, Kurbatova N, Martincorena I, Moran S, Castro M, Nagy T, Bignell G, Maura F, Young MD, Berna J, Tubio JMC, McMurran CE, Young AMH, Sanders M, Noorani I, Price SJ, Watts C, Leipnitz E, Kirsch M, Schackert G, Pearson D, Devadass A, Ram Z, Collins VP, Allinson K, Jenkinson MD, Zakaria R, Syed K, Hanemann CO, Dunn J, McDermott MW, Kirollos RW, Vassiliou GS, Esteller M, Behjati S, Brazma A, Santarius T and McDermott U

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

    Anaplastic meningioma is a rare and aggressive brain tumor characterised by intractable recurrences and dismal outcomes. Here, we present an integrated analysis of the whole genome, transcriptome and methylation profiles of primary and recurrent anaplastic meningioma. A key finding was the delineation of distinct molecular subgroups that were associated with diametrically opposed survival outcomes. Relative to lower grade meningiomas, anaplastic tumors harbored frequent driver mutations in SWI/SNF complex genes, which were confined to the poor prognosis subgroup. Aggressive disease was further characterised by transcriptional evidence of increased PRC2 activity, stemness and epithelial-to-mesenchymal transition. Our analyses discern biologically distinct variants of anaplastic meningioma with prognostic and therapeutic significance.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust: WT098051

    Scientific reports 2018;8;1;13537

  • Computational pan-genomics: status, promises and challenges.

    Computational Pan-Genomics Consortium

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

    Briefings in bioinformatics 2018;19;1;118-135

  • Formalising recall by genotype as an efficient approach to detailed phenotyping and causal inference.

    Corbin LJ, Tan VY, Hughes DA, Wade KH, Paul DS, Tansey KE, Butcher F, Dudbridge F, Howson JM, Jallow MW, John C, Kingston N, Lindgren CM, O'Donovan M, O'Rahilly S, Owen MJ, Palmer CNA, Pearson ER, Scott RA, van Heel DA, Whittaker J, Frayling T, Tobin MD, Wain LV, Smith GD, Evans DM, Karpe F, McCarthy MI, Danesh J, Franks PW and Timpson NJ

    MRC Integrative Epidemiology Unit at University of Bristol, Bristol, BS8 2BN, UK.

    Detailed phenotyping is required to deepen our understanding of the biological mechanisms behind genetic associations. In addition, the impact of potentially modifiable risk factors on disease requires analytical frameworks that allow causal inference. Here, we discuss the characteristics of Recall-by-Genotype (RbG) as a study design aimed at addressing both these needs. We describe two broad scenarios for the application of RbG: studies using single variants and those using multiple variants. We consider the efficacy and practicality of the RbG approach, provide a catalogue of UK-based resources for such studies and present an online RbG study planner.

    Funded by: British Heart Foundation: RG/16/4/32218; Medical Research Council: G0600705, G0902313, G9815508, MC_EX_MR/M01424X/1, MC_PC_15018, MC_PC_MR/R020183/1, MC_U123292700, MC_UU_00026/3, MC_UU_12012/1, MC_UU_12012/5, MC_UU_12013/1, MC_UU_12013/3, MC_UU_12013/4, MR/L003120/1, MR/L010305/1, MR/L020149/1, MR/N011317/1, MR/P00167X/1, MR/P013880/1, MR/P02811X/1; NIDDK NIH HHS: U01 DK105535

    Nature communications 2018;9;1;711

  • Draft Genome Sequences of Two Multidrug-Resistant Salmonella enterica Serovar Typhimurium Clinical Isolates from Uruguay.

    Cordeiro NF, D'Alessandro B, Iriarte A, Pickard D, Yim L, Chabalgoity JA, Betancor L and Vignoli R

    Departamento de Bacteriología y Virología, Instituto de Higiene, Facultad de Medicina, UDELAR, Montevideo, Uruguay.

    Multidrug-resistant Salmonella enterica isolates are an increasing problem worldwide; nevertheless, the mechanisms responsible for such resistance are rarely well defined. Multidrug-resistant S. enterica serovar Typhimurium isolates ST3224 and ST827 were collected from two patients. The characteristics of both genomes and antimicrobial resistance genes were determined using next-generation sequencing.

    Microbiology resource announcements 2018;7;4

  • PPARs and Metabolic Disorders Associated with Challenged Adipose Tissue Plasticity.

    Corrales P, Vidal-Puig A and Medina-Gómez G

    Área de Bioquímica y Biología Molecular, Departamento de Ciencias Básicas de la Salud, Facultad de Ciencias de la Salud, Universidad Rey Juan Carlos, Avda. de Atenas s/n. Alcorcón, 28922 Madrid, Spain.

    Peroxisome proliferator-activated receptors (PPARs) are members of a family of nuclear hormone receptors that exert their transcriptional control on genes harboring PPAR-responsive regulatory elements (PPRE) in partnership with retinoid X receptors (RXR). The activation of PPARs coordinated by specific coactivators/repressors regulates networks of genes controlling diverse homeostatic processes involving inflammation, adipogenesis, lipid metabolism, glucose homeostasis, and insulin resistance. Defects in PPARs have been linked to lipodystrophy, obesity, and insulin resistance as a result of the impairment of adipose tissue expandability and functionality. PPARs can act as lipid sensors, and when optimally activated, can rewire many of the metabolic pathways typically disrupted in obesity leading to an improvement of metabolic homeostasis. PPARs also contribute to the homeostasis of adipose tissue under challenging physiological circumstances, such as pregnancy and aging. Given their potential pathogenic role and their therapeutic potential, the benefits of PPARs activation should not only be considered relevant in the context of energy balance-associated pathologies and insulin resistance but also as potentially relevant targets in the context of diabetic pregnancy and changes in body composition and metabolic stress associated with aging. Here, we review the rationale for the optimization of PPAR activation under these conditions.

    International journal of molecular sciences 2018;19;7

  • Sequence variation of Epstein-Barr virus: viral types, geography, codon usage and diseases.

    Correia S, Bridges R, Wegner F, Venturini C, Palser A, Middeldorp JM, Cohen JI, Lorenzetti MA, Bassano I, White RE, Kellam P, Breuer J and Farrell PJ

    Section of Virology, Faculty of Medicine, Norfolk Place, London W2 1PG, UK.

    138 new Epstein-Barr virus (EBV) genome sequences have been determined. 125 of these and 116 from previous reports were combined to produce a multiple sequence alignment of 241 EBV genomes, which we have used to analyze variation within the viral genome. The type 1/type 2 classification of EBV remains the major form of variation and is defined mostly by EBNA2 and EBNA3, but the type 2 SNPs at the EBNA3 locus extend into the adjacent gp350 and gp42 genes, whose products mediate infection of B cells by EBV. A small insertion within the BART miRNA region of the genome was present in 21 EBV strains. EBV from saliva of USA patients with chronic active EBV infection aligned with the wild type EBV genome, with no evidence of WZhet rearrangements. The V3 polymorphism in the Zp promoter for BZLF1 was found to be frequent in nasopharyngeal carcinoma cases both from Hong Kong and Indonesia. Codon usage was found to differ between latent and lytic cycle EBV genes and the main forms of variation of the EBNA1 protein have been identified. IMPORTANCE: Epstein-Barr virus causes most cases of infectious mononucleosis and post-transplant lymphoproliferative disease. It contributes to several types of cancer including Hodgkin's lymphoma, Burkitt's lymphoma, diffuse large B cell lymphoma, nasopharyngeal carcinoma and gastric carcinoma. EBV genome variation is important because some of the diseases associated with EBV have very different incidences in different populations and geographic regions - differences in the EBV genome might contribute to these diseases. Some specific EBV genome alterations that appear to be significant in EBV associated cancers are already known and current efforts to make an EBV vaccine and antiviral drugs should also take account of sequence differences in the proteins used as targets.

    Journal of virology 2018

  • Eradication genomics-lessons for parasite control.

    Cotton JA, Berriman M, Dalén L and Barnes I

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Science (New York, N.Y.) 2018;361;6398;130-131

  • Leishmania naiffi and Leishmania guyanensis reference genomes highlight genome structure and gene evolution in the Viannia subgenus.

    Coughlan S, Taylor AS, Feane E, Sanders M, Schonian G, Cotton JA and Downing T

    School of Mathematics, Applied Mathematics and Statistics, National University of Ireland, Galway, Republic of Ireland.

    The unicellular protozoan parasite Leishmania causes the neglected tropical disease leishmaniasis, affecting 12 million people in 98 countries. In South America, where the Viannia subgenus predominates, so far only L. (Viannia) braziliensis and L. (V.) panamensis have been sequenced, assembled and annotated as reference genomes. Addressing this deficit in molecular information can inform species typing, epidemiological monitoring and clinical treatment. Here, L. (V.) naiffi and L. (V.) guyanensis genomic DNA was sequenced to assemble these two genomes as draft references from short sequence reads. The methods used were tested using short sequence reads for L. braziliensis M2904 against its published reference as a comparison. This assembly and annotation pipeline identified 70 additional genes not annotated on the original M2904 reference. Phylogenetic and evolutionary comparisons of L. guyanensis and L. naiffi with 10 other Viannia genomes revealed four traits common to all Viannia: aneuploidy, 22 orthologous groups of genes absent in other Leishmania subgenera, elevated TATE transposon copies and a high NADH-dependent fumarate reductase gene copy number. Within the Viannia, there were limited structural changes in genome architecture specific to individual species: a 45 Kb amplification on chromosome 34 was present in all bar L. lainsoni, L. naiffi had a higher copy number of the virulence factor leishmanolysin, and laboratory isolate L. shawi M8408 had a possible minichromosome derived from the 3' end of chromosome 34. This combination of genome assembly, phylogenetics and comparative analysis across an extended panel of diverse Viannia has uncovered new insights into the origin and evolution of this subgenus and can help improve diagnostics for leishmaniasis surveillance.

    Royal Society open science 2018;5;4;172212

  • Mapping the malaria parasite druggable genome by using in vitro evolution and chemogenomics.

    Cowell AN, Istvan ES, Lukens AK, Gomez-Lorenzo MG, Vanaerschot M, Sakata-Kato T, Flannery EL, Magistrado P, Owen E, Abraham M, LaMonte G, Painter HJ, Williams RM, Franco V, Linares M, Arriaga I, Bopp S, Corey VC, Gnädig NF, Coburn-Flynn O, Reimer C, Gupta P, Murithi JM, Moura PA, Fuchs O, Sasaki E, Kim SW, Teng CH, Wang LT, Akidil A, Adjalley S, Willis PA, Siegel D, Tanaseichuk O, Zhong Y, Zhou Y, Llinás M, Ottilie S, Gamo FJ, Lee MCS, Goldberg DE, Fidock DA, Wirth DF and Winzeler EA

    School of Medicine, University of California San Diego (UCSD), 9500 Gilman Drive, La Jolla, CA 92093, USA.

    Chemogenetic characterization through in vitro evolution combined with whole-genome analysis can identify antimalarial drug targets and drug-resistance genes. We performed a genome analysis of 262 Plasmodium falciparum parasites resistant to 37 diverse compounds. We found 159 gene amplifications and 148 nonsynonymous changes in 83 genes associated with drug-resistance acquisition, where gene amplifications contributed to one-third of resistance acquisition events. Beyond confirming previously identified multidrug-resistance mechanisms, we discovered hitherto unrecognized drug target-inhibitor pairs, including thymidylate synthase and a benzoquinazolinone, farnesyltransferase and a pyrimidinedione, and a dipeptidylpeptidase and an arylurea. This exploration of the P. falciparum resistome and druggable genome will likely guide drug discovery and structural biology efforts, while also advancing our understanding of resistance mechanisms available to the malaria parasite.

    Funded by: NIAID NIH HHS: F32 AI102567, R01 AI050234, R01 AI090141, R01 AI099105, R01 AI103058, R37 AI050234, T32 AI007036; NIGMS NIH HHS: P50 GM085764, T32 GM007198, T32 GM008666

    Science (New York, N.Y.) 2018;359;6372;191-199

  • Transposon Insertion Sequencing Elucidates Novel Gene Involvement in Susceptibility and Resistance to Phages T4 and T7 in Escherichia coli O157.

    Cowley LA, Low AS, Pickard D, Boinett CJ, Dallman TJ, Day M, Perry N, Gally DL, Parkhill J, Jenkins C and Cain AK

    Gastrointestinal Bacterial Reference Unit, Public Health England, London, United Kingdom.

    Experiments using bacteriophage (phage) to infect bacterial strains have helped define some basic genetic concepts in microbiology, but our understanding of the complexity of bacterium-phage interactions is still limited. As the global threat of antibiotic resistance continues to increase, phage therapy has reemerged as an attractive alternative or supplement to treating antibiotic-resistant bacterial infections. Further, the long-used method of phage typing to classify bacterial strains is being replaced by molecular genetic techniques. Thus, there is a growing need for a complete understanding of the precise molecular mechanisms underpinning phage-bacterium interactions to optimize phage therapy for the clinic as well as for retrospectively interpreting phage typing data on the molecular level. In this study, a genomics-based fitness assay (TraDIS) was used to identify all host genes involved in phage susceptibility and resistance for a T4 phage infecting Shiga-toxigenic Escherichia coli O157. The TraDIS results identified both established and previously unidentified genes involved in phage infection, and a subset were confirmed by site-directed mutagenesis and phenotypic testing of 14 T4 and 2 T7 phages. For the first time, the entire sap operon was implicated in phage susceptibility and, conversely, the stringent starvation protein A gene (sspA) was shown to provide phage resistance. Identifying genes involved in phage infection and replication should facilitate the selection of bespoke phage combinations to target specific bacterial pathogens. IMPORTANCE: Antibiotic resistance has diminished treatment options for many common bacterial infections. Phage therapy is an alternative option that was once popularly used across Europe to kill bacteria within humans. Phage therapy acts by using highly specific viruses (called phages) that infect and lyse certain bacterial species to treat the infection. Whole-genome sequencing has allowed modernization of the investigations into phage-bacterium interactions. Here, using E. coli O157 and T4 bacteriophage as a model, we have exploited a genome-wide fitness assay to investigate all genes involved in defining phage resistance or susceptibility. This knowledge of the genetic determinants of phage resistance and susceptibility can be used to design bespoke phage combinations targeted to specific bacterial infections for successful infection eradication.

    Funded by: Biotechnology and Biological Sciences Research Council: P013740; Medical Research Council: G1100100, G1100100/1; Wellcome Trust: WT098051

    mBio 2018;9;4

  • The evolutionary landscape of colorectal tumorigenesis.

    Cross W, Kovac M, Mustonen V, Temko D, Davis H, Baker AM, Biswas S, Arnold R, Chegwidden L, Gatenbee C, Anderson AR, Koelzer VH, Martinez P, Jiang X, Domingo E, Woodcock DJ, Feng Y, Kovacova M, Maughan T, S:CORT Consortium, Jansen M, Rodriguez-Justo M, Ashraf S, Guy R, Cunningham C, East JE, Wedge DC, Wang LM, Palles C, Heinimann K, Sottoriva A, Leedham SJ, Graham TA and Tomlinson IPM

    Evolution and Cancer Laboratory, Barts Cancer Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    The evolutionary events that cause colorectal adenomas (benign) to progress to carcinomas (malignant) remain largely undetermined. Using multi-region genome and exome sequencing of 24 benign and malignant colorectal tumours, we investigate the evolutionary fitness landscape occupied by these neoplasms. Unlike carcinomas, advanced adenomas frequently harbour sub-clonal driver mutations (considered to be functionally important in the carcinogenic process) that have not swept to fixation, and have relatively high genetic heterogeneity. Carcinomas are distinguished from adenomas by widespread aneusomies that are usually clonal and often accrue in a 'punctuated' fashion. We conclude that adenomas evolve across an undulating fitness landscape, whereas carcinomas occupy a sharper fitness peak, probably owing to stabilizing selection.

    Funded by: Cancer Research UK: A16459, A19771; European Research Council: 340560; Medical Research Council: MR/K000063/1, MR/L016508/1, MR/M009157/1, MR/M016587/1; Wellcome Trust

    Nature ecology & evolution 2018;2;10;1661-1672

  • Pneumococcal Vaccines: Host Interactions, Population Dynamics, and Design Principles.

    Croucher NJ, Løchen A and Bentley SD

    Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, United Kingdom.

    Streptococcus pneumoniae (the pneumococcus) is a nasopharyngeal commensal and respiratory pathogen. Most isolates express a capsule, the species-wide diversity of which has been immunologically classified into ∼100 serotypes. Capsule polysaccharides have been combined into multivalent vaccines widely used in adults, but the T cell independence of the antibody response means they are not protective in infants. Polysaccharide conjugate vaccines (PCVs) trigger a T cell-dependent response through attaching a carrier protein to capsular polysaccharides. The immune response stimulated by PCVs in infants inhibits carriage of vaccine serotypes (VTs), resulting in population-wide herd immunity. These were replaced in carriage by non-VTs. Nevertheless, PCVs drove reductions in infant pneumococcal disease, due to the lower mean invasiveness of the postvaccination bacterial population; age-varying serotype invasiveness resulted in a smaller reduction in adult disease. Alternative vaccines being tested in trials are designed to provide species-wide protection through stimulating innate and cellular immune responses, alongside antibodies to conserved antigens.

    Funded by: Medical Research Council: MR/R015600/1

    Annual review of microbiology 2018;72;521-549

  • Preclinical Development of a Novel, Orally-Administered Anti-Tumour Necrosis Factor Domain Antibody for the Treatment of Inflammatory Bowel Disease.

    Crowe JS, Roberts KJ, Carlton TM, Maggiore L, Cubitt MF, Clare S, Harcourt K, Reckless J, MacDonald TT, Ray KP, Vossenkämper A and West MR

    VHsquared Ltd., 1 Lower Court, Copley Hill, Cambridge Road, Babraham, Cambridge, CB22 3GN, UK.

    TNFα is an important cytokine in inflammatory bowel disease. V565 is a novel anti-TNFα domain antibody developed for oral administration in IBD patients, derived from a llama domain antibody and engineered to enhance intestinal protease resistance. V565 activity was evaluated in TNFα-TNFα receptor-binding ELISAs as well as TNFα responsive cellular assays and demonstrated neutralisation of both soluble and membrane TNFα with potencies similar to those of adalimumab. Although sensitive to pepsin, V565 retained activity after lengthy incubations with trypsin, chymotrypsin, and pancreatin, as well as mouse small intestinal and human ileal and faecal supernatants. In orally dosed naïve and DSS colitis mice, high V565 concentrations were observed in intestinal contents and faeces and immunostaining revealed V565 localisation in mouse colon tissue. V565 was detected by ELISA in post-dose serum of colitis mice, but not naïve mice, demonstrating penetration of disrupted epithelium. In an ex vivo human IBD tissue culture model, V565 inhibition of tissue phosphoprotein levels and production of inflammatory cytokine biomarkers was similar to infliximab, demonstrating efficacy when present at the disease site. Taken together, results of these studies provide confidence that oral V565 dosing will be therapeutic in IBD patients where the mucosal epithelial barrier is compromised.

    Funded by: Medical Research Council: G0800746; Wellcome Trust

    Scientific reports 2018;8;1;4941

  • Integrated genomic and metabolomic profiling of ISC1, an emerging Leishmania donovani population in the Indian subcontinent.

    Cuypers B, Berg M, Imamura H, Dumetz F, De Muylder G, Domagalska MA, Rijal S, Bhattarai NR, Maes I, Sanders M, Cotton JA, Meysman P, Laukens K and Dujardin JC

    Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium; Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium.

    Leishmania donovani is the responsible agent for visceral leishmaniasis (VL) in the Indian subcontinent (ISC). The disease is lethal without treatment and causes 0.2 to 0.4 million cases each year. Recently, reports of VL in Nepalese hilly districts have increased as well as VL cases caused by L. donovani from the ISC1 genetic group, a new and emerging genotype. In this study, we perform for the first time an integrated, untargeted genomics and metabolomics approach to characterize ISC1, in comparison with the Core Group (CG), the main population that drove the most recent outbreak of VL in the ISC. We show that the ISC1 population is very different from the CG, both at genome and metabolome levels. The genomic differences include SNPs, CNV and small indels in genes coding for known virulence factors, immunogens and surface proteins. Both genomic and metabolic approaches highlighted dissimilarities related to membrane lipids, the nucleotide salvage pathway and the urea cycle in ISC1 versus CG. Many of these pathways and molecules are important for the interaction with the host/extracellular environment. Altogether, our data predict major functional differences in ISC1 versus CG parasites, including virulence. Therefore, particular attention is required to monitor the fate of this emerging ISC1 population in the ISC, especially in a post-VL elimination context.

    Funded by: Wellcome Trust: 206194

    Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 2018;62;170-178

  • Fluoroquinolone resistance in Salmonella: insights by whole-genome sequencing.

    Cuypers WL, Jacobs J, Wong V, Klemm EJ, Deborggraeve S and Van Puyvelde S

    Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerpen, Belgium.

    Fluoroquinolone (FQ)-resistant Salmonella spp. were listed by the WHO in 2017 as priority pathogens for which new antibiotics were urgently needed. The overall global burden of Salmonella infections is high, but differs per region. Whereas typhoid fever is most prevalent in South and South-East Asia, non-typhoidal salmonellosis is prevalent across the globe and associated with a mild gastroenteritis. By contrast, invasive non-typhoidal Salmonella cause bloodstream infections associated with high mortality, particularly in sub-Saharan Africa. Most Salmonella strains from clinical sources are resistant to first-line antibiotics, with FQs now being the antibiotic of choice for treatment of invasive Salmonella infections. However, FQ resistance is increasingly being reported in Salmonella, and multiple molecular mechanisms are already described. Whole-genome sequencing (WGS) is becoming more frequently used to analyse bacterial genomes for antibiotic-resistance markers, and to understand the phylogeny of bacteria in relation to their antibiotic-resistance profiles. This mini-review provides an overview of FQ resistance in Salmonella, guided by WGS studies that demonstrate that WGS is a valuable tool for global surveillance.

    Funded by: Wellcome Trust

    Microbial genomics 2018;4;7

  • A novel prophage identified in strains from Salmonella enterica serovar Enteritidis is a phylogenetic signature of the lineage ST-1974.

    D'Alessandro B, Pérez Escanda V, Balestrazzi L, Iriarte A, Pickard D, Yim L, Chabalgoity JA and Betancor L

    Instituto de Higiene, Facultad de Medicina, UDELAR, Montevideo, Uruguay.

    Salmonella enterica serovar Enteritidis is a major agent of foodborne diseases worldwide. In Uruguay, this serovar was almost negligible until the mid 1990s but since then it has become the most prevalent. Previously, we characterized a collection of strains isolated from 1988 to 2005 and found that the two oldest strains were the most genetically divergent. In order to further characterize these strains, we sequenced and annotated eight genomes including those of the two oldest isolates. We report on the identification and characterization of a novel 44 kbp Salmonella prophage found exclusively in these two genomes. Sequence analysis reveals that the prophage is a mosaic, with homologous regions in different Salmonella prophages. It contains 60 coding sequences, including two genes, gogB and sseK3, involved in virulence and modulation of host immune response. Analysis of serovar Enteritidis genomes available in public databases confirmed that this prophage is absent in most of them, with the exception of a group of 154 genomes. All 154 strains carrying this prophage belong to the same sequence type (ST-1974), suggesting that its acquisition occurred in a common ancestor. We tested this by phylogenetic analysis of 203 genomes representative of the intraserovar diversity. The ST-1974 forms a distinctive monophyletic lineage, and the newly described prophage is a phylogenetic signature of this lineage that could be used as a molecular marker. The phylogenetic analysis also shows that the major ST (ST-11) is polyphyletic and might have given rise to almost all other STs, including ST-1974.

    Microbial genomics 2018;4;3

  • Fine-mapping of prostate cancer susceptibility loci in a large meta-analysis identifies candidate causal variants.

    Dadaev T, Saunders EJ, Newcombe PJ, Anokian E, Leongamornlert DA, Brook MN, Cieza-Borrella C, Mijuskovic M, Wakerell S, Olama AAA, Schumacher FR, Berndt SI, Benlloch S, Ahmed M, Goh C, Sheng X, Zhang Z, Muir K, Govindasami K, Lophatananon A, Stevens VL, Gapstur SM, Carter BD, Tangen CM, Goodman P, Thompson IM, Batra J, Chambers S, Moya L, Clements J, Horvath L, Tilley W, Risbridger G, Gronberg H, Aly M, Nordström T, Pharoah P, Pashayan N, Schleutker J, Tammela TLJ, Sipeky C, Auvinen A, Albanes D, Weinstein S, Wolk A, Hakansson N, West C, Dunning AM, Burnet N, Mucci L, Giovannucci E, Andriole G, Cussenot O, Cancel-Tassin G, Koutros S, Freeman LEB, Sorensen KD, Orntoft TF, Borre M, Maehle L, Grindedal EM, Neal DE, Donovan JL, Hamdy FC, Martin RM, Travis RC, Key TJ, Hamilton RJ, Fleshner NE, Finelli A, Ingles SA, Stern MC, Rosenstein B, Kerns S, Ostrer H, Lu YJ, Zhang HW, Feng N, Mao X, Guo X, Wang G, Sun Z, Giles GG, Southey MC, MacInnis RJ, FitzGerald LM, Kibel AS, Drake BF, Vega A, Gómez-Caamaño A, Fachal L, Szulkin R, Eklund M, Kogevinas M, Llorca J, Castaño-Vinyals G, Penney KL, Stampfer M, Park JY, Sellers TA, Lin HY, Stanford JL, Cybulski C, Wokolorczyk D, Lubinski J, Ostrander EA, Geybels MS, Nordestgaard BG, Nielsen SF, Weisher M, Bisbjerg R, Røder MA, Iversen P, Brenner H, Cuk K, Holleczek B, Maier C, Luedeke M, Schnoeller T, Kim J, Logothetis CJ, John EM, Teixeira MR, Paulo P, Cardoso M, Neuhausen SL, Steele L, Ding YC, De Ruyck K, De Meerleer G, Ost P, Razack A, Lim J, Teo SH, Lin DW, Newcomb LF, Lessel D, Gamulin M, Kulis T, Kaneva R, Usmani N, Slavov C, Mitev V, Parliament M, Singhal S, Claessens F, Joniau S, Van den Broeck T, Larkin S, Townsend PA, Aukim-Hastie C, Gago-Dominguez M, Castelao JE, Martinez ME, Roobol MJ, Jenster G, van Schaik RHN, Menegaux F, Truong T, Koudou YA, Xu J, Khaw KT, Cannon-Albright L, Pandha H, Michael A, Kierzek A, Thibodeau SN, McDonnell SK, Schaid DJ, Lindstrom S, Turman C, Ma J, Hunter DJ, Riboli E, Siddiq A, Canzian F, Kolonel LN, Le Marchand L, Hoover RN, Machiela MJ, Kraft P, PRACTICAL (Prostate Cancer Association Group to Investigate Cancer-Associated Alterations in the Genome) Consortium, Freedman M, Wiklund F, Chanock S, Henderson BE, Easton DF, Haiman CA, Eeles RA, Conti DV and Kote-Jarai Z

    The Institute of Cancer Research, London, SW7 3RP, UK.

    Prostate cancer is a polygenic disease with a large heritable component. A number of common, low-penetrance prostate cancer risk loci have been identified through GWAS. Here we apply the Bayesian multivariate variable selection algorithm JAM to fine-map 84 prostate cancer susceptibility loci, using summary data from a large European ancestry meta-analysis. We observe evidence for multiple independent signals at 12 regions and 99 risk signals overall. Only 15 original GWAS tag SNPs remain among the catalogue of candidate variants identified; the remainder are replaced by more likely candidates. Biological annotation of our credible set of variants indicates significant enrichment within promoter and enhancer elements, and transcription factor-binding sites, including AR, ERG and FOXA1. In 40 regions at least one variant is colocalised with an eQTL in prostate cancer tissue. The refined set of candidate variants substantially increases the proportion of familial relative risk explained by these known susceptibility regions, which highlights the importance of fine-mapping studies and has implications for clinical risk profiling.

    Funded by: Cancer Research UK: 13232, 14136; Medical Research Council: G0401527, G1000143; NCI NIH HHS: K07 CA187546, UM1 CA182883

    Nature communications 2018;9;1;2256

  • 137 ancient human genomes from across the Eurasian steppes.

    Damgaard PB, Marchi N, Rasmussen S, Peyrot M, Renaud G, Korneliussen T, Moreno-Mayar JV, Pedersen MW, Goldberg A, Usmanova E, Baimukhanov N, Loman V, Hedeager L, Pedersen AG, Nielsen K, Afanasiev G, Akmatov K, Aldashev A, Alpaslan A, Baimbetov G, Bazaliiskii VI, Beisenov A, Boldbaatar B, Boldgiv B, Dorzhu C, Ellingvag S, Erdenebaatar D, Dajani R, Dmitriev E, Evdokimov V, Frei KM, Gromov A, Goryachev A, Hakonarson H, Hegay T, Khachatryan Z, Khaskhanov R, Kitov E, Kolbina A, Kubatbek T, Kukushkin A, Kukushkin I, Lau N, Margaryan A, Merkyte I, Mertz IV, Mertz VK, Mijiddorj E, Moiyesev V, Mukhtarova G, Nurmukhanbetov B, Orozbekova Z, Panyushkina I, Pieta K, Smrčka V, Shevnina I, Logvin A, Sjögren KG, Štolcová T, Taravella AM, Tashbaeva K, Tkachev A, Tulegenov T, Voyakin D, Yepiskoposyan L, Undrakhbold S, Varfolomeev V, Weber A, Wilson Sayres MA, Kradin N, Allentoft ME, Orlando L, Nielsen R, Sikora M, Heyer E, Kristiansen K and Willerslev E

    Center for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark.

    For thousands of years the Eurasian steppes have been a centre of human migrations and cultural change. Here we sequence the genomes of 137 ancient humans (about 1× average coverage), covering a period of 4,000 years, to understand the population history of the Eurasian steppes after the Bronze Age migrations. We find that the genetics of the Scythian groups that dominated the Eurasian steppes throughout the Iron Age were highly structured, with diverse origins comprising Late Bronze Age herders, European farmers and southern Siberian hunter-gatherers. Later, Scythians admixed with the eastern steppe nomads who formed the Xiongnu confederations, and moved westward in about the second or third century BC, forming the Hun traditions in the fourth-fifth century AD, and carrying with them plague that was basal to the Justinian plague. These nomads were further admixed with East Asian groups during several short-term khanates in the Medieval period. These historical events transformed the Eurasian steppes from being inhabited by Indo-European speakers of largely West Eurasian ancestry to the mostly Turkic-speaking groups of the present day, who are primarily of East Asian ancestry.

    Nature 2018;557;7705;369-374

  • Amino acid residues in five separate HLA genes can explain most of the known associations between the MHC and primary biliary cholangitis.

    Darlay R, Ayers KL, Mells GF, Hall LS, Liu JZ, Almarri MA, Alexander GJ, Jones DE, Sandford RN, Anderson CA and Cordell HJ

    Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom.

    Primary Biliary Cholangitis (PBC) is a chronic autoimmune liver disease characterised by progressive destruction of intrahepatic bile ducts. The strongest genetic association is with HLA-DQA1*04:01, but at least three additional independent HLA haplotypes contribute to susceptibility. We used dense single nucleotide polymorphism (SNP) data in 2861 PBC cases and 8514 controls to impute classical HLA alleles and amino acid polymorphisms using state-of-the-art methodologies. We then demonstrated through stepwise regression that association in the HLA region can be largely explained by variation at five separate amino acid positions. Three-dimensional modelling of protein structures and calculation of electrostatic potentials for the implicated HLA alleles/amino acid substitutions demonstrated a correlation between the electrostatic potential of pocket P6 in HLA-DP molecules and the HLA-DPB1 alleles/amino acid substitutions conferring PBC susceptibility/protection, highlighting potential new avenues for future functional investigation.

    Funded by: Medical Research Council: MR/L001489/1; Wellcome Trust: 085925/Z/08/Z, 102858/Z/13/Z, WT090355/A/09/Z, WT090355/B/09/Z

    PLoS genetics 2018;14;12;e1007833

  • Spatial structuring of a Legionella pneumophila population within the water system of a large occupational building.

    David S, Mentasti M, Lai S, Vaghji L, Ready D, Chalker VJ and Parkhill J

    Pathogen Genomics, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    The diversity of Legionella pneumophila populations within single water systems is not well understood, particularly in those unassociated with cases of Legionnaires' disease. Here, we performed genomic analysis of 235 L. pneumophila isolates obtained from 28 water samples in 13 locations within a large occupational building. Despite regular treatment, the water system of this building is thought to have been colonized by L. pneumophila for at least 30 years without evidence of association with Legionnaires' disease cases. All isolates belonged to one of three sequence types (STs), ST27 (n=81), ST68 (n=122) and ST87 (n=32), all three of which have been recovered from Legionnaires' disease patients previously. Pairwise single nucleotide polymorphism differences amongst isolates of the same ST were low, ranging from 0 to 19 in ST27, from 0 to 30 in ST68 and from 0 to 7 in ST87, and no homologous recombination was observed in any lineage. However, there was evidence of horizontal transfer of a plasmid, which was found in all ST87 isolates and only one ST68 isolate. A single ST was found in 10/13 sampled locations, and isolates of each ST were also more similar to those from the same location compared with those from different locations, demonstrating spatial structuring of the population within the water system. These findings provide the first insights into the diversity and genomic evolution of a L. pneumophila population within a complex water system not associated with disease.

    Funded by: Wellcome Trust: 098051

    Microbial genomics 2018;4;10

  • Low genomic diversity of Legionella pneumophila within clinical specimens.

    David S, Mentasti M, Parkhill J and Chalker VJ

    Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Objectives: Legionella pneumophila is the leading cause of Legionnaires' disease, a severe form of pneumonia acquired from environmental sources. Investigations of both sporadic cases and outbreaks rely mostly on analysis of a single to a few colony pick(s) isolated from each patient. However, because of the lack of data describing diversity within single patients, the optimal number of picks is unknown. Here, we investigated diversity within individual patients using sequence-based typing (SBT) and whole-genome sequencing (WGS).

    Methods: Ten isolates of L. pneumophila were obtained from each of ten epidemiologically unrelated patients. SBT and WGS were undertaken, and single-nucleotide polymorphisms (SNPs) were identified between isolates from the same patient.

    Results: The same sequence type (ST) was obtained for each set of ten isolates. Using genomic analysis, zero SNPs were identified between isolates from seven patients, a maximum of one SNP was found between isolates from two patients, and a maximum of two SNPs was found amongst isolates from one patient. Assuming that the full within-host diversity has been captured with ten isolates, statistical analyses showed that, on average, analysis of one isolate would yield a 70% chance of capturing all observed genotypes, and seven isolates would yield a 90% chance.

    Conclusions: SBT and WGS analyses of multiple colony picks obtained from ten patients showed no, or very low, within-host genomic diversity in L. pneumophila, suggesting that analysis of one colony pick per patient will often be sufficient to obtain reliable typing data to aid investigation of cases of Legionnaires' disease.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2018;24;9;1020.e1-1020.e4

  • Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function.

    Davies G, Lam M, Harris SE, Trampush JW, Luciano M, Hill WD, Hagenaars SP, Ritchie SJ, Marioni RE, Fawns-Ritchie C, Liewald DCM, Okely JA, Ahola-Olli AV, Barnes CLK, Bertram L, Bis JC, Burdick KE, Christoforou A, DeRosse P, Djurovic S, Espeseth T, Giakoumaki S, Giddaluru S, Gustavson DE, Hayward C, Hofer E, Ikram MA, Karlsson R, Knowles E, Lahti J, Leber M, Li S, Mather KA, Melle I, Morris D, Oldmeadow C, Palviainen T, Payton A, Pazoki R, Petrovic K, Reynolds CA, Sargurupremraj M, Scholz M, Smith JA, Smith AV, Terzikhan N, Thalamuthu A, Trompet S, van der Lee SJ, Ware EB, Windham BG, Wright MJ, Yang J, Yu J, Ames D, Amin N, Amouyel P, Andreassen OA, Armstrong NJ, Assareh AA, Attia JR, Attix D, Avramopoulos D, Bennett DA, Böhmer AC, Boyle PA, Brodaty H, Campbell H, Cannon TD, Cirulli ET, Congdon E, Conley ED, Corley J, Cox SR, Dale AM, Dehghan A, Dick D, Dickinson D, Eriksson JG, Evangelou E, Faul JD, Ford I, Freimer NA, Gao H, Giegling I, Gillespie NA, Gordon SD, Gottesman RF, Griswold ME, Gudnason V, Harris TB, Hartmann AM, Hatzimanolis A, Heiss G, Holliday EG, Joshi PK, Kähönen M, Kardia SLR, Karlsson I, Kleineidam L, Knopman DS, Kochan NA, Konte B, Kwok JB, Le Hellard S, Lee T, Lehtimäki T, Li SC, Lill CM, Liu T, Koini M, London E, Longstreth WT, Lopez OL, Loukola A, Luck T, Lundervold AJ, Lundquist A, Lyytikäinen LP, Martin NG, Montgomery GW, Murray AD, Need AC, Noordam R, Nyberg L, Ollier W, Papenberg G, Pattie A, Polasek O, Poldrack RA, Psaty BM, Reppermund S, Riedel-Heller SG, Rose RJ, Rotter JI, Roussos P, Rovio SP, Saba Y, Sabb FW, Sachdev PS, Satizabal CL, Schmid M, Scott RJ, Scult MA, Simino J, Slagboom PE, Smyrnis N, Soumaré A, Stefanis NC, Stott DJ, Straub RE, Sundet K, Taylor AM, Taylor KD, Tzoulaki I, Tzourio C, Uitterlinden A, Vitart V, Voineskos AN, Kaprio J, Wagner M, Wagner H, Weinhold L, Wen KH, Widen E, Yang Q, Zhao W, Adams HHH, Arking DE, Bilder RM, Bitsios P, Boerwinkle E, Chiba-Falek O, Corvin A, De Jager PL, Debette S, Donohoe G, Elliott P, Fitzpatrick AL, Gill M, Glahn DC, Hägg S, Hansell NK, Hariri AR, Ikram MK, Jukema JW, Vuoksimaa E, Keller MC, Kremen WS, Launer L, Lindenberger U, Palotie A, Pedersen NL, Pendleton N, Porteous DJ, Räikkönen K, Raitakari OT, Ramirez A, Reinvang I, Rudan I, Rujescu D, Schmidt R, Schmidt H, Schofield PW, Schofield PR, Starr JM, Steen VM, Trollor JN, Turner ST, Van Duijn CM, Villringer A, Weinberger DR, Weir DR, Wilson JF, Malhotra A, McIntosh AM, Gale CR, Seshadri S, Mosley TH, Bressler J, Lencz T and Deary IJ

    Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, School of Philosophy, Psychology and Language Sciences, The University of Edinburgh, Edinburgh, EH8 9JZ, UK.

    General cognitive function is a prominent and relatively stable human trait that is associated with many important life outcomes. We combine cognitive and genetic data from the CHARGE and COGENT consortia, and UK Biobank (total N = 300,486; age 16-102) and find 148 genome-wide significant independent loci (P < 5 × 10⁻⁸) associated with general cognitive function. Within the novel genetic loci are variants associated with neurodegenerative and neurodevelopmental disorders, physical and psychiatric illnesses, and brain structure. Gene-based analyses find 709 genes associated with general cognitive function. Expression levels across the cortex are associated with general cognitive function. Using polygenic scores, up to 4.3% of variance in general cognitive function is predicted in independent samples. We detect significant genetic overlap between general cognitive function, reaction time, and many health variables including eyesight, hypertension, and longevity. In conclusion we identify novel genetic loci and pathways contributing to the heritability of general cognitive function.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1, BB/F022441/1; Medical Research Council: G0100594, G0400491, G0600237, G0901461, MC_PC_17228, MC_QA137853, MC_U147585819, MC_U147585824, MC_U147585827, MC_UP_A620_1014, MC_UP_A620_1017, MC_UU_00007/10, MC_UU_12011/1, MC_UU_12011/2, MC_UU_12011/4, MR/J000094/1, MR/K026992/1, MR/S015132/1; NCATS NIH HHS: UL1 TR001881; NHLBI NIH HHS: R01 HL070825, R01 HL105756, U01 HL096812, U01 HL096814, U01 HL096899, U01 HL096902, U01 HL096917, U01 HL130114; NIA NIH HHS: P30 AG010161, R01 AG033193, R01 AG049789, R01 AG054076, R01 AG055406, RF1 AG015819, U01 AG009740, U01 AG049505, U01 AG052409; NIGMS NIH HHS: U54 GM115428; NIMH NIH HHS: R01 MH085018; NINDS NIH HHS: R01 NS017950

    Nature communications 2018;9;1;2098

  • The first horse herders and the impact of early Bronze Age steppe expansions into Asia.

    de Barros Damgaard P, Martiniano R, Kamm J, Moreno-Mayar JV, Kroonen G, Peyrot M, Barjamovic G, Rasmussen S, Zacho C, Baimukhanov N, Zaibert V, Merz V, Biddanda A, Merz I, Loman V, Evdokimov V, Usmanova E, Hemphill B, Seguin-Orlando A, Yediay FE, Ullah I, Sjögren KG, Iversen KH, Choin J, de la Fuente C, Ilardo M, Schroeder H, Moiseyev V, Gromov A, Polyakov A, Omura S, Senyurt SY, Ahmad H, McKenzie C, Margaryan A, Hameed A, Samad A, Gul N, Khokhar MH, Goriunova OI, Bazaliiskii VI, Novembre J, Weber AW, Orlando L, Allentoft ME, Nielsen R, Kristiansen K, Sikora M, Outram AK, Durbin R and Willerslev E

    Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark.

    The Yamnaya expansions from the western steppe into Europe and Asia during the Early Bronze Age (~3000 BCE) are believed to have brought with them Indo-European languages and possibly horse husbandry. We analyzed 74 ancient whole-genome sequences from across Inner Asia and Anatolia and show that the Botai people associated with the earliest horse husbandry derived from a hunter-gatherer population deeply diverged from the Yamnaya. Our results also suggest distinct migrations bringing West Eurasian ancestry into South Asia before and after, but not at the time of, Yamnaya culture. We find no evidence of steppe ancestry in Bronze Age Anatolia from when Indo-European languages are attested there. Thus, in contrast to Europe, Early Bronze Age Yamnaya-related migrations had limited direct genetic impact in Asia.

    Funded by: NIGMS NIH HHS: T32 GM007197; Wellcome Trust: 207492/Z/17/Z

    Science (New York, N.Y.) 2018;360;6396

  • Single-cell sequencing reveals the origin and the order of mutation acquisition in T-cell acute lymphoblastic leukemia.

    De Bie J, Demeyer S, Alberti-Servera L, Geerdens E, Segers H, Broux M, De Keersmaecker K, Michaux L, Vandenberghe P, Voet T, Boeckx N, Uyttebroeck A and Cools J

    Center for Human Genetics, KU Leuven, Leuven, Belgium.

    Next-generation sequencing has provided a detailed overview of the various genomic lesions implicated in the pathogenesis of T-cell acute lymphoblastic leukemia (T-ALL). Typically, 10-20 protein-altering lesions are found in T-ALL cells at diagnosis. However, it is currently unclear in which order these mutations are acquired and in which progenitor cells this is initiated. To address these questions, we used targeted single-cell sequencing of total bone marrow cells and CD34⁺CD38⁻ multipotent progenitor cells for four T-ALL cases. Hierarchical clustering detected a dominant leukemia cluster at diagnosis, accompanied by a few smaller clusters harboring only a fraction of the mutations. We developed a graph-based algorithm to determine the order of mutation acquisition. Two of the four patients had an early event in a known oncogene (MED12, STAT5B) among various pre-leukemic events. Intermediate events included loss of 9p21 (CDKN2A/B) and acquisition of fusion genes, while NOTCH1 mutations were typically late events. Analysis of CD34⁺CD38⁻ cells and myeloid progenitors revealed that in half of the cases somatic mutations were detectable in multipotent progenitor cells. We demonstrate that targeted single-cell sequencing can elucidate the order of mutation acquisition in T-ALL and that T-ALL development can start in a multipotent progenitor cell.

    Funded by: European Research Council: 617340

    Leukemia 2018;32;6;1358-1369

  • Recognizing the reagent microbiome.

    de Goffau MC, Lager S, Salter SJ, Wagner J, Kronbichler A, Charnock-Jones DS, Peacock SJ, Smith GCS and Parkhill J

    Wellcome Sanger Institute, Cambridge, UK.

    Funded by: Medical Research Council: G1100221, MR/K021133/1

    Nature microbiology 2018;3;8;851-853

  • Applying polygenic risk scoring for psychiatric disorders to a large family with bipolar disorder and major depressive disorder.

    de Jong S, Diniz MJA, Saloma A, Gadelha A, Santoro ML, Ota VK, Noto C, Major Depressive Disorder and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium, Curtis C, Newhouse SJ, Patel H, Hall LS, O Reilly PF, Belangero SI, Bressan RA and Breen G

    MRC Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry Psychology and Neuroscience, King's College London, London, SE5 8AF, UK.

    Psychiatric disorders are thought to have a complex genetic pathology consisting of interplay of common and rare variation. Traditionally, pedigrees are used to shed light on the latter only, while here we discuss the application of polygenic risk scores to also highlight patterns of common genetic risk. We analyze polygenic risk scores for psychiatric disorders in a large pedigree (n ~ 260) in which 30% of family members suffer from major depressive disorder or bipolar disorder. Studying patterns of assortative mating and anticipation, it appears increased polygenic risk is contributed by affected individuals who married into the family, resulting in an increasing genetic risk over generations. This may explain the observation of anticipation in mood disorders, whereby onset is earlier and the severity increases over the generations of a family. Joint analyses of rare and common variation may be a powerful way to understand the familial genetics of psychiatric disorders.

    Funded by: NIMH NIH HHS: U01 MH109536; Wellcome Trust

    Communications biology 2018;1;163

  • Genomic insights into the origin and diversification of late maritime hunter-gatherers from the Chilean Patagonia.

    de la Fuente C, Ávila-Arcos MC, Galimany J, Carpenter ML, Homburger JR, Blanco A, Contreras P, Cruz Dávalos D, Reyes O, San Roman M, Moreno-Estrada A, Campos PF, Eng C, Huntsman S, Burchard EG, Malaspinas AS, Bustamante CD, Willerslev E, Llop E, Verdugo RA and Moraga M

    Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago 8380453, Chile.

    Patagonia was the last region of the Americas reached by humans who entered the continent from Siberia ∼15,000-20,000 y ago. Despite recent genomic approaches to reconstruct the continental evolutionary history, regional characterization of ancient and modern genomes remains understudied. Exploring the genomic diversity within Patagonia is not just a valuable strategy to gain a better understanding of the history and diversification of human populations in the southernmost tip of the Americas, but it would also improve the representation of Native American diversity in global databases of human variation. Here, we present genome data from four modern populations from Central Southern Chile and Patagonia (n = 61) and four ancient maritime individuals from Patagonia (∼1,000 y old). Both the modern and ancient individuals studied in this work have a greater genetic affinity with other modern Native Americans than to any non-American population, showing within South America a clear structure between major geographical regions. Native Patagonian Kawéskar and Yámana showed the highest genetic affinity with the ancient individuals, indicating genetic continuity in the region during the past 1,000 y before present, together with an important agreement between the ethnic affiliation and historical distribution of both groups. Lastly, the ancient maritime individuals were genetically equidistant to a ∼200-y-old terrestrial hunter-gatherer from Tierra del Fuego, which supports a model with an initial separation of a common ancestral group to both maritime populations from a terrestrial population, with a later diversification of the maritime groups.

    Proceedings of the National Academy of Sciences of the United States of America 2018

  • Streptococcus bovimastitidis sp. nov., isolated from a dairy cow with mastitis.

    de Vries SPW, Hadjirin NF, Lay EM, Zadoks RN, Peacock SJ, Parkhill J, Grant AJ, McDougall S and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Here we describe a new species of the genus Streptococcus that was isolated from a dairy cow with mastitis in New Zealand. Strain NZ1587ᵀ was Gram-positive, coccus-shaped and arranged as chains, catalase and coagulase negative, γ-haemolytic and negative for Lancefield carbohydrates (A-D, F and G). The 16S rRNA sequence did not match sequences in the NCBI 16S rRNA or GreenGenes databases. Taxonomic classification of strain NZ1587ᵀ was investigated using 16S rRNA and core genome phylogeny, genome-wide average nucleotide identity (ANI) and predicted DNA-DNA hybridisation (DDH) analyses. Phylogeny based on 16S rRNA was unable to resolve the taxonomic position of strain NZ1587ᵀ; however, NZ1587ᵀ shared 99.4 % identity at the 16S rRNA level with a distinct branch of S. pseudoporcinus. Importantly, core genome phylogeny demonstrated that NZ1587ᵀ grouped amongst the 'pyogenic' streptococcal species and formed a distinct branch supported by a 100 % bootstrap value. In addition, average nucleotide identity and inferred DNA-DNA hybridisation analyses showed that NZ1587ᵀ represents a novel species. Biochemical profiling using the rapid ID 32 strep identification test enabled differentiation of strain NZ1587ᵀ from closely related streptococcal species. In conclusion, strain NZ1587ᵀ can be classified as a novel species, and we propose a novel taxon named Streptococcus bovimastitidis sp. nov.; the type strain is NZ1587ᵀ. NZ1587ᵀ has been deposited in the Culture Collection University of Gothenburg (CCUG 69277ᵀ) and the Belgian Co-ordinated Collections of Micro-organisms/LMG (LMG 29747).

    Funded by: Medical Research Council: G1001787

    International journal of systematic and evolutionary microbiology 2018;68;1;21-27

  • Comparative genomics reveals that loss of lunatic fringe (LFNG) promotes melanoma metastasis.

    Del Castillo Velasco-Herrera M, van der Weyden L, Nsengimana J, Speak AO, Sjöberg MK, Bishop DT, Jönsson G, Newton-Bishop J and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Metastasis is the leading cause of death in patients with advanced melanoma, yet the somatic alterations that aid tumour cell dissemination and colonisation are poorly understood. Here, we deploy comparative genomics to identify and validate clinically relevant drivers of melanoma metastasis. To do this, we identified a set of 976 genes whose expression level was associated with a poor outcome in patients from two large melanoma cohorts. Next, we characterised the genomes and transcriptomes of mouse melanoma cell lines defined as weakly metastatic, and their highly metastatic derivatives. By comparing expression data between species, we identified lunatic fringe (LFNG), among 28 genes whose expression level is predictive of poor prognosis and whose altered expression is associated with a prometastatic phenotype in mouse melanoma cells. CRISPR/Cas9-mediated knockout of Lfng dramatically enhanced the capability of weakly metastatic melanoma cells to metastasise in vivo, a phenotype that could be rescued with the Lfng cDNA. Notably, genomic alterations disrupting LFNG are found exclusively in human metastatic melanomas sequenced as part of The Cancer Genome Atlas. Using comparative genomics, we show that LFNG expression plays a functional role in regulating melanoma metastasis.

    Funded by: Cancer Research UK: 11963, 13031; Wellcome Trust; Worldwide Cancer Research: 12-0023

    Molecular oncology 2018;12;2;239-255

  • Outer membrane vesicles from Neisseria gonorrhoeae target PorB to mitochondria and induce apoptosis.

    Deo P, Chow SH, Hay ID, Kleifeld O, Costin A, Elgass KD, Jiang JH, Ramm G, Gabriel K, Dougan G, Lithgow T, Heinz E and Naderer T

    Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia.

    Neisseria gonorrhoeae causes the sexually transmitted disease gonorrhoea by evading innate immunity. Colonizing the mucosa of the reproductive tract depends on the bacterial outer membrane porin, PorB, which is essential for ion and nutrient uptake. PorB is also targeted to host mitochondria and regulates apoptosis pathways to promote infections. How PorB traffics from the outer membrane of N. gonorrhoeae to mitochondria and whether it modulates innate immune cells, such as macrophages, remains unclear. Here, we show that N. gonorrhoeae secretes PorB via outer membrane vesicles (OMVs). Purified OMVs contained primarily outer membrane proteins including oligomeric PorB. The porin was targeted to mitochondria of macrophages after exposure to purified OMVs and wild type N. gonorrhoeae. This was associated with loss of mitochondrial membrane potential, release of cytochrome c, activation of apoptotic caspases and cell death in a time-dependent manner. Consistent with this, OMV-induced macrophage death was prevented with the pan-caspase inhibitor, Q-VD-PH. This shows that N. gonorrhoeae utilizes OMVs to target PorB to mitochondria and to induce apoptosis in macrophages, thus affecting innate immunity.

    PLoS pathogens 2018;14;3;e1006945

  • Genome-wide haplotyping embryos developing from 0PN and 1PN zygotes increases transferrable embryos in PGT-M.

    Destouni A, Dimitriadou E, Masset H, Debrock S, Melotte C, Van Den Bogaert K, Zamani Esteki M, Ding J, Voet T, Denayer E, de Ravel T, Legius E, Meuleman C, Peeraer K and Vermeesch JR

    Laboratory for Cytogenetics and Genome Research, Center for Human Genetics, University of Leuven, O&N I Herestraat 49, KU Leuven, Leuven, Belgium.

    Study question: Can genome-wide haplotyping increase success following preimplantation genetic testing for a monogenic disorder (PGT-M) by including zygotes with absence of pronuclei (0PN) or the presence of only one pronucleus (1PN)?

    Summary answer: Genome-wide haplotyping 0PNs and 1PNs increases the number of PGT-M cycles reaching embryo transfer (ET) by 81% and the live-birth rate by 75%.

    What is known already: Although a significant subset of 0PN and 1PN zygotes can develop into balanced, diploid and developmentally competent embryos, they are usually discarded because parental diploidy detection is not part of the routine work-up of PGT-M.

    Study design, size, duration: This prospective cohort study evaluated the pronuclear number in 2229 zygotes from 2337 injected metaphase II (MII) oocytes in 268 cycles. PGT-M for 0PN and 1PN embryos developing into Day 5/6 blastocysts with adequate quality for vitrification was performed in 42 of the 268 cycles (15.7%). In these 42 cycles, we genome-wide haplotyped 216 good quality embryos corresponding to 49 0PNs, 15 1PNs and 152 2PNs. The reported outcomes include parental contribution to embryonic ploidy, embryonic aneuploidy, genetic diagnosis for the monogenic disorder, cycles reaching ETs, pregnancy and live birth rates (LBR) for unaffected offspring.

    Participants/materials, setting, methods: Blastomere DNA was whole-genome amplified and hybridized on the Illumina Human CytoSNP12V2.1.1 BeadChip arrays. Subsequently, genome-wide haplotyping and copy-number profiling was applied to investigate the embryonic genome architecture. Bi-parental, unaffected embryos were transferred regardless of their initial zygotic PN score.

    Main results and the role of chance: A staggering 75.51% of 0PN and 42.86% of 1PN blastocysts are diploid bi-parental allowing accurate genetic diagnosis for the monogenic disorder. In total, 31% (13/42) of the PGT-M cycles reached ET or could repeat ET with an unaffected 0PN or 1PN embryo. The LBR per initiated cycle increased from 9.52 to 16.67%.

    Limitations, reasons for caution: The clinical efficacy of the routine inclusion of 0PN and 1PN zygotes in PGT-M cycles should be confirmed in larger cohorts from multicenter studies.

    Wider implications of the findings: Genome-wide haplotyping allows the inclusion of 0PN and 1PN embryos and subsequently increases the cycles reaching ET following PGT-M and potentially PGT for aneuploidy (PGT-A) and chromosomal structural rearrangements (PGT-SR). Establishing measures of clinical efficacy could lead to an update of the ESHRE guidelines which advise against the use of these zygotes.

    Study funding/competing interest(s): SymBioSys (PFV/10/016 and C1/018 to J.R.V. and T.V.), the Horizon 2020 WIDENLIFE: 692065 to J.R.V., T.V., E.D., A.D. and M.Z.E. M.Z.E., T.V. and J.R.V. co-invented haplarithmisis ('Haplotyping and copy-number typing using polymorphic variant allelic frequencies'), which has been licensed to Agilent Technologies. H.M. is fully supported by the FWO (ZKD1543-ASP/16). The authors have no competing interests to declare.

    Human reproduction (Oxford, England) 2018;33;12;2302-2311

  • Shieldin complex promotes DNA end-joining and counters homologous recombination in BRCA1-null cells.

    Dev H, Chiang TW, Lescale C, de Krijger I, Martin AG, Pilger D, Coates J, Sczaniecka-Clift M, Wei W, Ostermaier M, Herzog M, Lam J, Shea A, Demir M, Wu Q, Yang F, Fu B, Lai Z, Balmus G, Belotserkovskaya R, Serra V, O'Connor MJ, Bruna A, Beli P, Pellegrini L, Caldas C, Deriano L, Jacobs JJL, Galanty Y and Jackson SP

    The Wellcome Trust/Cancer Research UK Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge, UK.

    BRCA1 deficiencies cause breast, ovarian, prostate and other cancers, and render tumours hypersensitive to poly(ADP-ribose) polymerase (PARP) inhibitors. To understand the resistance mechanisms, we conducted whole-genome CRISPR-Cas9 synthetic-viability/resistance screens in BRCA1-deficient breast cancer cells treated with PARP inhibitors. We identified two previously uncharacterized proteins, C20orf196 and FAM35A, whose inactivation confers strong PARP-inhibitor resistance. Mechanistically, we show that C20orf196 and FAM35A form a complex, 'Shieldin' (SHLD1/2), with FAM35A interacting with single-stranded DNA through its C-terminal oligonucleotide/oligosaccharide-binding fold region. We establish that Shieldin acts as the downstream effector of 53BP1/RIF1/MAD2L2 to promote DNA double-strand break (DSB) end-joining by restricting DSB resection and to counteract homologous recombination by antagonizing BRCA2/RAD51 loading in BRCA1-deficient cells. Notably, Shieldin inactivation further sensitizes BRCA1-deficient cells to cisplatin, suggesting how defining the SHLD1/2 status of BRCA1-deficient tumours might aid patient stratification and yield new treatment opportunities. Highlighting this potential, we document reduced SHLD1/2 expression in human breast cancers displaying intrinsic or acquired PARP-inhibitor resistance.

    Funded by: Cancer Research UK: A18796; European Research Council: 310917, 311565; Wellcome Trust: 206388

    Nature cell biology 2018;20;8;954-965

  • Bayesian inference of ancestral dates on bacterial phylogenetic trees.

    Didelot X, Croucher NJ, Bentley SD, Harris SR and Wilson DJ

    Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK.

    The sequencing and comparative analysis of a collection of bacterial genomes from a single species or lineage of interest can lead to key insights into its evolution, ecology or epidemiology. The tool of choice for such a study is often to build a phylogenetic tree, and more specifically when possible a dated phylogeny, in which the dates of all common ancestors are estimated. Here, we propose a new Bayesian methodology to construct dated phylogenies which is specifically designed for bacterial genomics. Unlike previous Bayesian methods aimed at building dated phylogenies, we consider that the phylogenetic relationships between the genomes have been previously evaluated using a standard phylogenetic method, which makes our methodology much faster and scalable. This two-step approach also allows us to directly exploit existing phylogenetic methods that detect bacterial recombination, and therefore to account for the effect of recombination in the construction of a dated phylogeny. We analysed many simulated datasets in order to benchmark the performance of our approach in a wide range of situations. Furthermore, we present applications to three different real datasets from recent bacterial genomic studies. Our methodology is implemented in an R package called BactDating which is freely available for download at

    Funded by: Medical Research Council: MR/N010760/1, MR/R015600/1; Wellcome Trust: 101237/Z/13/Z

    Nucleic acids research 2018;46;22;e134

  • Comparative genomics of Czech vaccine strains of Bordetella pertussis.

    Dienstbier A, Pouchnik D, Wildung M, Amman F, Hofacker IL, Parkhill J, Holubova J, Sebo P and Vecerek B

    Institute of Microbiology v.v.i., Laboratory of post-transcriptional control of gene expression, 14220 Prague, Czech Republic.

    Bordetella pertussis is a strictly human pathogen causing the respiratory infectious disease called whooping cough or pertussis. B. pertussis adaptation to acellular pertussis vaccine pressure has been repeatedly highlighted, but recent data indicate that adaptation of circulating strains started already in the era of the whole cell pertussis vaccine (wP) use. We sequenced the genomes of five B. pertussis wP vaccine strains isolated in the former Czechoslovakia in the pre-wP (1954-1957) and early wP (1958-1965) eras, when only limited population travel into and out of the country was possible. Four isolates exhibit a similar genome organization and form a distinct phylogenetic cluster with a geographic signature. The fifth strain is rather distinct, both in genome organization and SNP-based phylogeny. Surprisingly, despite isolation of this strain before 1966, its closest sequenced relative appears to be a recent isolate from the US. On the genome content level, the five vaccine strains contained both new and already described regions of difference. One of the new regions contains duplicated genes potentially associated with transport across the membrane. The prevalence of this region in recent isolates indicates that its spread might be associated with selective advantage leading to increased strain fitness.

    Pathogens and disease 2018;76;7

  • Mutational Analysis Identifies Therapeutic Biomarkers in Inflammatory Bowel Disease-Associated Colorectal Cancers.

    Din S, Wong K, Mueller MF, Oniscu A, Hewinson J, Black CJ, Miller ML, Jiménez-Sánchez A, Rabbie R, Rashid M, Satsangi J, Adams DJ and Arends MJ

    NHS Lothian, Gastrointestinal Unit, Western General Hospital, Edinburgh, Scotland, United Kingdom.

    Purpose: Inflammatory bowel disease-associated colorectal cancers (IBD-CRC) are associated with a higher mortality than sporadic colorectal cancers. The poorly defined molecular pathogenesis of IBD-CRCs limits development of effective prevention, detection, and treatment strategies. We aimed to identify biomarkers using whole-exome sequencing of IBD-CRCs to guide individualized management.

    Experimental Design: Whole-exome sequencing was performed on 34 formalin-fixed paraffin-embedded primary IBD-CRCs and 31 matched normal lymph nodes. Computational methods were used to identify somatic point mutations, small insertions and deletions, mutational signatures, and somatic copy number alterations. Mismatch repair status was examined.

    Results: Hypermutation was observed in 27% of IBD-CRCs. All hypermutated cancers were from the proximal colon; all but one of the cancers with hypermutation had defective mismatch repair or somatic mutations in the proofreading domain of DNA POLE. Hypermutated IBD-CRCs had increased numbers of predicted neo-epitopes, which could be exploited using immunotherapy. We identified six distinct mutation signatures in IBD-CRCs, three of which corresponded to known mechanisms of mutagenesis. Driver genes were also identified.

    Conclusions: IBD-CRCs should be evaluated for hypermutation and defective mismatch repair to identify patients with a higher neo-epitope load who may benefit from immunotherapies. Prospective trials are required to determine whether IHC to detect loss of MLH1 expression in dysplastic colonic tissue could identify patients at increased risk of developing IBD-CRC. We identified mutations in genes in IBD-CRCs with hypermutation that might be targeted therapeutically. These approaches would complement and individualize surveillance and treatment programs.

    Funded by: Cancer Research UK: A21717, A6997; European Research Council: 319661; Wellcome Trust: 082356, 206194

    Clinical cancer research : an official journal of the American Association for Cancer Research 2018;24;20;5133-5142

  • SRSF3 maintains transcriptome integrity in oocytes by regulation of alternative splicing and transposable elements.

    Do DV, Strauss B, Cukuroglu E, Macaulay I, Wee KB, Hu TX, Igor RLM, Lee C, Harrison A, Butler R, Dietmann S, Jernej U, Marioni J, Smith CWJ, Göke J and Surani MA

    Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

    The RNA-binding protein SRSF3 (also known as SRp20) has critical roles in the regulation of pre-mRNA splicing. Zygotic knockout of <i>Srsf3</i> results in embryo arrest at the blastocyst stage. However, SRSF3 is also present in oocytes, suggesting that it might be critical as a maternally inherited factor. Here we identify SRSF3 as an essential regulator of alternative splicing and of transposable elements to maintain transcriptome integrity in mouse oocyte. Using 3D time-lapse confocal live imaging, we show that conditional deletion of <i>Srsf3</i> in fully grown germinal vesicle oocytes substantially compromises the capacity of germinal vesicle breakdown (GVBD), and consequently entry into meiosis. By combining single cell RNA-seq, and oocyte micromanipulation with steric blocking antisense oligonucleotides and RNAse-H inducing gapmers, we found that the GVBD defect in mutant oocytes is due to both aberrant alternative splicing and derepression of B2 SINE transposable elements. Together, our study highlights how control of transcriptional identity of the maternal transcriptome by the RNA-binding protein SRSF3 is essential to the development of fertilized-competent oocytes.

    Cell discovery 2018;4;33

  • Defining endemic cholera at three levels of spatiotemporal resolution within Bangladesh.

    Domman D, Chowdhury F, Khan AI, Dorman MJ, Mutreja A, Uddin MI, Paul A, Begum YA, Charles RC, Calderwood SB, Bhuiyan TR, Harris JB, LaRocque RC, Ryan ET, Qadri F and Thomson NR

    Infection Genomics Programme, Wellcome Sanger Institute, Hinxton, UK.

    Although much focus is placed on cholera epidemics, the greatest burden occurs in settings in which cholera is endemic, including areas of South Asia, Africa and now Haiti<sup>1,2</sup>. Dhaka, Bangladesh is a megacity that is hyper-endemic for cholera, and experiences two regular seasonal outbreaks of cholera each year<sup>3</sup>. Despite this, a detailed understanding of the diversity of Vibrio cholerae strains circulating in this setting, and their relationships to annual outbreaks, has not yet been obtained. Here we performed whole-genome sequencing of V. cholerae across several levels of focus and scale, at the maximum possible resolution. We analyzed bacterial isolates to define cholera dynamics at multiple levels, ranging from infection within individuals, to disease dynamics at the household level, to regional and intercontinental cholera transmission. Our analyses provide a genomic framework for understanding cholera diversity and transmission in an endemic setting.

    Funded by: FIC NIH HHS: D43 TW005572, K43 TW010362; NIAID NIH HHS: R01 AI103055, R01 AI106878, R56 AI106878, U01 AI058935, U01 AI077883; NIDDK NIH HHS: P30 DK043351

    Nature genetics 2018;50;7;951-955

  • A genomic infection control study for Staphylococcus aureus in two Ghanaian hospitals.

    Donkor ES, Jamrozy D, Mills RO, Dankwah T, Amoo PK, Egyir B, Badoe EV, Twasam J and Bentley SD

    Department of Medical Microbiology, School of Biomedical and Allied Health Sciences, College of Health Sciences, University of Ghana, Accra, Ghana.

    Background: Whole genome sequencing analysis (WGSA) provides the best resolution for typing of bacterial isolates and has the potential for identification of transmission pathways. The aim of the study was to apply WGSA to elucidate the possible transmission events involved in two suspected <i>Staphylococcus aureus</i> hospital outbreaks in Ghana and describe genomic features of the <i>S. aureus</i> isolates sampled in the outbreaks.

    Methods: The study was carried out at Korle-Bu Teaching Hospital and Lekma Hospital where the suspected outbreaks occurred in 2012 and 2015, respectively. The <i>S. aureus</i> isolates collected from the two hospitals were from three sources including carriage, invasive disease, and the environment. Whole genome sequencing of the <i>S. aureus</i> isolates was performed and the sequence reads were mapped to the <i>S. aureus</i> reference genome of strain USA300_FPR3757. A maximum-likelihood phylogenetic tree was reconstructed. Multilocus sequence typing together with the analysis of antimicrobial resistance and virulence genes were performed by short read mapping using the SRST2.

    Results: The <i>S. aureus</i> isolates belonged to diverse sequence types (STs) with ST15 and ST152 most common. All isolates carried the <i>blaZ</i> gene, with low prevalence of <i>tetK</i> and <i>dfrG</i> genes also observed. All isolates were <i>mecA</i> negative. The <i>pvl</i> genes were common and observed in distinct lineages that revealed diverse <i>Sa2int</i> phages. At Korle-Bu Teaching Hospital, the genomics data indicated several transmission events of <i>S. aureus</i> ST15 involving contamination of various surfaces in the pediatric emergency ward where the outbreak occurred.

    Conclusion: The pattern of dissemination of the ST15 clone in the emergency ward of Korle-Bu Teaching Hospital highlights a basic problem with disinfection of environmental surfaces at the hospital. Diverse phage population rather than a single highly transmissible phage type likely mediates the high prevalence of <i>pvl</i> genes among the <i>S. aureus</i> isolates.

    Funded by: Wellcome Trust

    Infection and drug resistance 2018;11;1757-1765

  • Regulatory Hierarchies Controlling Virulence Gene Expression in Shigella flexneri and Vibrio cholerae.

    Dorman MJ and Dorman CJ

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.

    Gram-negative enteropathogenic bacteria use a variety of strategies to cause disease in the human host and gene regulation in some form is typically a part of the strategy. This article will compare the toxin-based infection strategy used by the non-invasive pathogen <i>Vibrio cholerae</i>, the etiological agent in human cholera, with the invasive approach used by <i>Shigella flexneri</i>, the cause of bacillary dysentery. Despite the differences in the mechanisms by which the two pathogens cause disease, they use environmentally-responsive regulatory hierarchies to control the expression of genes that have some features, and even some components, in common. The involvement of AraC-like transcription factors, the integration host factor, the Factor for inversion stimulation, small regulatory RNAs, the RNA chaperone Hfq, horizontal gene transfer, variable DNA topology and the need to overcome the pervasive silencing of transcription by H-NS of horizontally acquired genes are all shared features. A comparison of the regulatory hierarchies in these two pathogens illustrates some striking cross-species similarities and differences among mechanisms coordinating virulence gene expression. <i>S. flexneri</i>, with its low infectious dose, appears to use a strategy that is centered on the individual bacterial cell, whereas <i>V. cholerae</i>, with a community-based, quorum-dependent approach and an infectious dose that is several orders of magnitude higher, seems to rely more on the actions of a bacterial collective.

    Funded by: Wellcome Trust

    Frontiers in microbiology 2018;9;2686

  • The Capsule Regulatory Network of Klebsiella pneumoniae Defined by density-TraDISort.

    Dorman MJ, Feltwell T, Goulding DA, Parkhill J and Short FL

    Wellcome Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    <i>Klebsiella pneumoniae</i> infections affect infants and the immunocompromised, and the recent emergence of hypervirulent and multidrug-resistant <i>K. pneumoniae</i> lineages is a critical health care concern. Hypervirulence in <i>K. pneumoniae</i> is mediated by several factors, including the overproduction of extracellular capsule. However, the full details of how <i>K. pneumoniae</i> capsule biosynthesis is achieved or regulated are not known. We have developed a robust and sensitive procedure to identify genes influencing capsule production, density-TraDISort, which combines density gradient centrifugation with transposon insertion sequencing. We have used this method to explore capsule regulation in two clinically relevant <i>Klebsiella</i> strains, <i>K. pneumoniae</i> NTUH-K2044 (capsule type K1) and <i>K. pneumoniae</i> ATCC 43816 (capsule type K2). We identified multiple genes required for full capsule production in <i>K. pneumoniae</i>, as well as putative suppressors of capsule in NTUH-K2044, and have validated the results of our screen with targeted knockout mutants. Further investigation of several of the <i>K. pneumoniae</i> capsule regulators identified (ArgR, MprA/KvrB, SlyA/KvrA, and the Sap ABC transporter) revealed effects on capsule amount and architecture, serum resistance, and virulence. We show that capsule production in <i>K. pneumoniae</i> is at the center of a complex regulatory network involving multiple global regulators and environmental cues and that the majority of capsule regulatory genes are located in the core genome. Overall, our findings expand our understanding of how capsule is regulated in this medically important pathogen and provide a technology that can be easily implemented to study capsule regulation in other bacterial species. <b>IMPORTANCE</b> Capsule production is essential for <i>K. pneumoniae</i> to cause infections, but its regulation and mechanism of synthesis are not fully understood in this organism. We have developed and applied a new method for genome-wide identification of capsule regulators. Using this method, many genes that positively or negatively affect capsule production in <i>K. pneumoniae</i> were identified, and we use these data to propose an integrated model for capsule regulation in this species. Several of the genes and biological processes identified have not previously been linked to capsule synthesis. We also show that the methods presented here can be applied to other species of capsulated bacteria, providing the opportunity to explore and compare capsule regulatory networks in other bacterial strains and species.

    Funded by: Wellcome Trust: 106063/A/14/Z, 206194

    mBio 2018;9;6

  • A Genome Resequencing-Based Genetic Map Reveals the Recombination Landscape of an Outbred Parasitic Nematode in the Presence of Polyploidy and Polyandry.

    Doyle SR, Laing R, Bartley DJ, Britton C, Chaudhry U, Gilleard JS, Holroyd N, Mable BK, Maitland K, Morrison AA, Tait A, Tracey A, Berriman M, Devaney E, Cotton JA and Sargison ND

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    The parasitic nematode Haemonchus contortus is an economically and clinically important pathogen of small ruminants, and a model system for understanding the mechanisms and evolution of traits such as anthelmintic resistance. Anthelmintic resistance is widespread and is a major threat to the sustainability of livestock agriculture globally; however, little is known about the genome architecture and parameters such as recombination that will ultimately influence the rate at which resistance may evolve and spread. Here, we performed a genetic cross between two divergent strains of H. contortus, and subsequently used whole-genome resequencing of a female worm and her brood to identify the distribution of genome-wide variation that characterizes these strains. Using a novel bioinformatic approach to identify variants that segregate as expected in a pseudotestcross, we characterized linkage groups and estimated genetic distances between markers to generate a chromosome-scale F1 genetic map. We exploited this map to reveal the recombination landscape, the first for any helminth species, demonstrating extensive variation in recombination rate within and between chromosomes. Analyses of these data also revealed the extent of polyandry, whereby at least eight males were found to have contributed to the genetic variation of the progeny analyzed. Triploid offspring were also identified, which we hypothesize are the result of nondisjunction during female meiosis or polyspermy. These results expand our knowledge of the genetics of parasitic helminths and the unusual life-history of H. contortus, and enhance ongoing efforts to understand the genetic basis of resistance to the drugs used to control these worms and for related species that infect livestock and humans throughout the world. This study also demonstrates the feasibility of using whole-genome resequencing data to directly construct a genetic map in a single generation cross from a noninbred nonmodel organism with a complex lifecycle.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M003949; Wellcome Trust

    Genome biology and evolution 2018;10;2;396-409

  • Malaria Vaccines: Recent Advances and New Horizons.

    Draper SJ, Sack BK, King CR, Nielsen CM, Rayner JC, Higgins MK, Long CA and Seder RA

    The Jenner Institute, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK.

    The development of highly effective and durable vaccines against the human malaria parasites Plasmodium falciparum and P. vivax remains a key priority. Decades of endeavor have taught that achieving this goal will be challenging; however, recent innovation in malaria vaccine research and a diverse pipeline of novel vaccine candidates for clinical assessment provides optimism. With first-generation pre-erythrocytic vaccines aiming for licensure in the coming years, it is important to reflect on how next-generation approaches can improve on their success. Here we review the latest vaccine approaches that seek to prevent malaria infection, disease, and transmission and highlight some of the major underlying immunological and molecular mechanisms of protection. The synthesis of rational antigen selection, immunogen design, and immunization strategies to induce quantitatively and qualitatively improved immune effector mechanisms offers promise for achieving sustained high-level protection.

    Funded by: Wellcome Trust

    Cell host & microbe 2018;24;1;43-56

  • Strategies for managing rival bacterial communities: Lessons from burying beetles.

    Duarte A, Welch M, Swannack C, Wagner J and Kilner RM

    Department of Zoology, University of Cambridge, Cambridge, UK.

    The role of bacteria in animal development, ecology and evolution is increasingly well understood, yet little is known of how animal behaviour affects bacterial communities. Animals that benefit from defending a key resource from microbial competitors are likely to evolve behaviours to control or manipulate the animal's associated external microbiota. We describe four possible mechanisms by which animals could gain a competitive edge by disrupting a rival bacterial community: "weeding," "seeding," "replanting" and "preserving." By combining detailed behavioural observations with molecular and bioinformatic analyses, we then test which of these mechanisms best explains how burying beetles, Nicrophorus vespilloides, manipulate the bacterial communities on their carcass breeding resource. Burying beetles are a suitable species to study how animals manage external microbiota because reproduction revolves around a small vertebrate carcass. Parents shave a carcass and apply antimicrobial exudates on its surface, shaping it into an edible nest for their offspring. We compared bacterial communities in mice carcasses that were either fresh, prepared by beetles or unprepared but buried underground for the same length of time. We also analysed bacterial communities in the burying beetle's gut, during and after breeding, to understand whether beetles could be "seeding" the carcass with particular microbes. We show that burying beetles do not "preserve" the carcass by reducing bacterial load, as is commonly supposed. Instead, our results suggest they "seed" the carcass with bacterial groups which are part of the Nicrophorus core microbiome. They may also "replant" other bacteria from the carcass gut onto the surface of their carrion nest. Both these processes may lead to the observed increase in bacterial load on the carcass surface in the presence of beetles. Beetles may also "weed" the bacterial community by eliminating some groups of bacteria on the carcass, perhaps through the production of antimicrobials themselves. Whether these alterations to the bacterial community are adaptive from the beetle's perspective, or are simply a by-product of the way in which the beetles prepare the carcass for reproduction, remains to be determined in future work. In general, our work suggests that animals might use more sophisticated techniques for attacking and disrupting rival microbial communities than is currently appreciated.

    Funded by: European Research Council: 310785

    The Journal of animal ecology 2018;87;2;414-427

  • Multi-population genomic analysis of malaria parasites indicates local selection and differentiation at the gdv1 locus regulating sexual development.

    Duffy CW, Amambua-Ngwa A, Ahouidi AD, Diakite M, Awandare GA, Ba H, Tarr SJ, Murray L, Stewart LB, D'Alessandro U, Otto TD, Kwiatkowski DP and Conway DJ

    Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, Keppel St, London, UK.

    Parasites infect hosts in widely varying environments, encountering diverse challenges for adaptation. To identify malaria parasite genes under locally divergent selection across a large endemic region with a wide spectrum of transmission intensity, genome sequences were obtained from 284 clinical Plasmodium falciparum infections from four newly sampled locations in Senegal, The Gambia, Mali and Guinea. Combining these with previous data from seven other sites in West Africa enabled a multi-population analysis to identify discrete loci under varying local selection. A genome-wide scan showed the most exceptional geographical divergence to be at the early gametocyte gene locus gdv1 which is essential for parasite sexual development and transmission. We identified a major structural dimorphism with alternative 1.5 kb and 1.0 kb sequence deletions at different positions of the 3'-intergenic region, in tight linkage disequilibrium with the most highly differentiated single nucleotide polymorphism, one of the alleles being very frequent in Senegal and The Gambia but rare in the other locations. Long non-coding RNA transcripts were previously shown to include the entire antisense of the gdv1 coding sequence and the portion of the intergenic region with allelic deletions, suggesting adaptive regulation of parasite sexual development and transmission in response to local conditions.

    Funded by: Biotechnology and Biological Sciences Research Council (BBSRC): LIDO studentship; EC | European Research Council (ERC): AdG-2011-294428; European Research Council: 294428; Medical Research Council: G1100123, MC_EX_MR/K02440X/1, MC_UP_A900_1119, MR/M006212/1; Medical Research Council (MRC): G1100123; Royal Society: AA110050; Wellcome Trust: 090770/Z/09/Z

    Scientific reports 2018;8;1;15763

  • Relationship Between Sequence Homology, Genome Architecture, and Meiotic Behavior of the Sex Chromosomes in North American Voles.

    Dumont BL, Williams CL, Ng BL, Horncastle V, Chambers CL, McGraw LA, Adams D, Mackay TFC and Breen M

    Initiative in Biological Complexity, North Carolina State University, Raleigh, North Carolina 04609

    In most mammals, the X and Y chromosomes synapse and recombine along a conserved region of homology known as the pseudoautosomal region (PAR). These homology-driven interactions are required for meiotic progression and are essential for male fertility. Although the PAR fulfills key meiotic functions in most mammals, several exceptional species lack PAR-mediated sex chromosome associations at meiosis. Here, we leveraged the natural variation in meiotic sex chromosome programs present in North American voles (<i>Microtus</i>) to investigate the relationship between meiotic sex chromosome dynamics and X/Y sequence homology. To this end, we developed a novel, reference-blind computational method to analyze sparse sequencing data from flow-sorted X and Y chromosomes isolated from vole species with sex chromosomes that always (<i>Microtus montanus</i>), never (<i>Microtus mogollonensis</i>), and occasionally synapse (<i>Microtus ochrogaster</i>) at meiosis. Unexpectedly, we find more shared X/Y homology in the two vole species with no and sporadic X/Y synapsis compared to the species with obligate synapsis. Sex chromosome homology in the asynaptic and occasionally synaptic species is interspersed along chromosomes and largely restricted to low-complexity sequences, including a striking enrichment for the telomeric repeat sequence, TTAGGG. In contrast, homology is concentrated in high complexity, and presumably euchromatic, sequence on the X and Y chromosomes of the synaptic vole species, <i>M. montanus</i>. Taken together, our findings suggest key conditions required to sustain the standard program of X/Y synapsis at meiosis and reveal an intriguing connection between heterochromatic repeat architecture and noncanonical, asynaptic mechanisms of sex chromosome segregation in voles.

    Funded by: NIGMS NIH HHS: R00 GM110332

    Genetics 2018;210;1;83-97

  • Alpha-v-containing integrins are host receptors for the Plasmodium falciparum sporozoite surface protein, TRAP.

    Dundas K, Shears MJ, Sun Y, Hopp CS, Crosnier C, Metcalf T, Girling G, Sinnis P, Billker O and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, CB10 1SA Cambridge, United Kingdom.

    Malaria-causing <i>Plasmodium</i> sporozoites are deposited in the dermis by the bite of an infected mosquito and move by gliding motility to the liver where they invade and develop within host hepatocytes. Although extracellular interactions between <i>Plasmodium</i> sporozoite ligands and host receptors provide important guidance cues for productive infection and are good vaccine targets, these interactions remain largely uncharacterized. Thrombospondin-related anonymous protein (TRAP) is a parasite cell surface ligand that is essential for both gliding motility and invasion because it couples the extracellular binding of host receptors to the parasite cytoplasmic actinomyosin motor; however, the molecular nature of the host TRAP receptors is poorly defined. Here, we use a systematic extracellular protein interaction screening approach to identify the integrin αvβ3 as a directly interacting host receptor for <i>Plasmodium falciparum</i> TRAP. Biochemical characterization of the interaction suggests a two-site binding model, requiring contributions from both the von Willebrand factor A domain and the RGD motif of TRAP for integrin binding. We show that TRAP binding to cells is promoted in the presence of integrin-activating proadhesive Mn<sup>2+</sup> ions, and that cells genetically targeted so that they lack cell surface expression of the integrin αv-subunit are no longer able to bind TRAP. <i>P. falciparum</i> sporozoites moved with greater speed in the dermis of <i>Itgb3</i>-deficient mice, suggesting that the interaction has a role in sporozoite migration. The identification of the integrin αvβ3 as the host receptor for TRAP provides an important demonstration of a sporozoite surface ligand that directly interacts with host receptors.

    Funded by: Medical Research Council: MR/J004111/1; NIAID NIH HHS: R01 AI056840, R01 AI132359; Wellcome Trust: 206194

    Proceedings of the National Academy of Sciences of the United States of America 2018;115;17;4477-4482

  • Registered access: authorizing data access.

    Dyke SOM, Linden M, Lappalainen I, De Argila JR, Carey K, Lloyd D, Spalding JD, Cabili MN, Kerry G, Foreman J, Cutts T, Shabani M, Rodriguez LL, Haeussler M, Walsh B, Jiang X, Wang S, Perrett D, Boughtwood T, Matern A, Brookes AJ, Cupak M, Fiume M, Pandya R, Tulchinsky I, Scollen S, Törnroos J, Das S, Evans AC, Malin BA, Beck S, Brenner SE, Nyrönen T, Blomberg N, Firth HV, Hurles M, Philippakis AA, Rätsch G, Brudno M, Boycott KM, Rehm HL, Baudis M, Sherry ST, Kato K, Knoppers BM, Baker D and Flicek P

    Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC, Canada.

    The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model, "registered access", to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.

    Funded by: CIHR: CEE-151618, EP1-120608, EP1-120609; NHGRI NIH HHS: R00 HG008175, U41 HG002371, U41 HG007346; Wellcome Trust: 201535

    European journal of human genetics : EJHG 2018;26;12;1721-1731

  • A Requirement for Zic2 in the Regulation of Nodal Expression Underlies the Establishment of Left-Sided Identity.

    Dykes IM, Szumska D, Kuncheria L, Puliyadi R, Chen CM, Papanayotou C, Lockstone H, Dubourg C, David V, Schneider JE, Keane TM, Adams DJ, Brown SDM, Mercier S, Odent S, Collignon J and Bhattacharya S

    Department of Cardiovascular Medicine, BHF Centre of Research Excellence, University of Oxford, Roosevelt Drive, Headington, Oxford, OX3 7BN, United Kingdom.

    ZIC2 mutation is known to cause holoprosencephaly (HPE). A subset of ZIC2 HPE probands harbour cardiovascular and visceral anomalies suggestive of laterality defects. 3D-imaging of novel mouse Zic2 mutants uncovers, in addition to HPE, laterality defects in lungs, heart, vasculature and viscera. A strong bias towards right isomerism indicates a failure to establish left identity in the lateral plate mesoderm (LPM), a phenotype that cannot be explained simply by the defective ciliogenesis previously noted in Zic2 mutants. Gene expression analysis showed that the left-determining NODAL-dependent signalling cascade fails to be activated in the LPM, and that the expression of Nodal at the node, which normally triggers this event, is itself defective in these embryos. Analysis of ChIP-seq data, in vitro transcriptional assays and mutagenesis reveals a requirement for a low-affinity ZIC2 binding site for the activation of the Nodal enhancer HBE, which is normally active in node precursor cells. These data show that ZIC2 is required for correct Nodal expression at the node and suggest a model in which ZIC2 acts at different levels to establish LR asymmetry, promoting both the production of the signal that induces left side identity and the morphogenesis of the cilia that bias its distribution.

    Funded by: British Heart Foundation: FS/11/50/29038, RG/10/17/28553; British Heart Foundation (BHF): CH/09/003/26631, FS/11/50/29038, RE/08/004, RE/13/1/30181, RG/10/17/28553, RM/13/3/30159; Medical Research Council: MC_U142684172; Wellcome Trust: 077012/Z/05/Z, 083228, 090532/Z/09/Z

    Scientific reports 2018;8;1;10439

  • Epigenetic and Transcriptional Variability Shape Phenotypic Plasticity.

    Ecker S, Pancaldi V, Valencia A, Beck S and Paul DS

    UCL Cancer Institute, University College London, 72 Huntley Street, London, WC1E 6BT, UK.

    Epigenetic and transcriptional variability contribute to the vast diversity of cellular and organismal phenotypes and are key in human health and disease. In this review, we describe different types, sources, and determinants of epigenetic and transcriptional variability, enabling cells and organisms to adapt and evolve to a changing environment. We highlight the latest research and hypotheses on how chromatin structure and the epigenome influence gene expression variability. Further, we provide an overview of challenges in the analysis of biological variability. An improved understanding of the molecular mechanisms underlying epigenetic and transcriptional variability, at both the intra- and inter-individual level, provides great opportunity for disease prevention, better therapeutic approaches, and personalized medicine.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/13/13/30194; Medical Research Council: MR/L003120/1

    BioEssays : news and reviews in molecular, cellular and developmental biology 2018;40;2

  • Dynamics of the epigenetic landscape during the maternal-to-zygotic transition.

    Eckersley-Maslin MA, Alda-Catalinas C and Reik W

    Epigenetics Programme, Babraham Institute, Cambridge, UK.

    A remarkable epigenetic remodelling process occurs shortly after fertilization, which restores totipotency to the zygote. This involves global DNA demethylation, chromatin remodelling, genome spatial reorganization and substantial transcriptional changes. Key to these changes is the transition from the maternal environment of the oocyte to an embryonic-driven developmental expression programme, a process termed the maternal-to-zygotic transition (MZT). Zygotic genome activation occurs predominantly at the two-cell stage in mice and the eight-cell stage in humans, yet the dynamics of its control are still mostly obscure. In recent years, partly due to single-cell and low-cell number epigenomic studies, our understanding of the epigenetic and chromatin landscape of preimplantation development has improved considerably. In this Review, we discuss the latest advances in the study of the MZT, focusing on DNA methylation, histone post-translational modifications, local chromatin structure and higher-order genome organization. We also discuss key mechanistic studies that investigate the mode of action of chromatin regulators, transcription factors and non-coding RNAs during preimplantation development. Finally, we highlight areas requiring additional research, as well as new technological advances that could assist in eventually completing our understanding of the MZT.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K010867/1; Wellcome Trust: 095645/Z/11/Z

    Nature reviews. Molecular cell biology 2018;19;7;436-450

  • Inter-homologue repair in fertilized human eggs?

    Egli D, Zuccaro MV, Kosicki M, Church GM, Bradley A and Jasin M

    Department of Obstetrics and Gynecology and Department of Pediatrics, Columbia University, New York, NY, USA.

    Nature 2018;560;7717;E5-E7

  • Lysogenic conversion of atypical enteropathogenic Escherichia coli (aEPEC) from human, murine, and bovine origin with bacteriophage Φ3538 Δstx2::cat proves their enterohemorrhagic E. coli (EHEC) progeny.

    Eichhorn I, Heidemanns K, Ulrich RG, Schmidt H, Semmler T, Fruth A, Bethe A, Goulding D, Pickard D, Karch H and Wieler LH

    Institute for Microbiology and Epizootics, Freie Universität Berlin, Berlin, Germany.

    Bacteriophages play an important role in the evolution of bacterial pathogens. A phage-mediated transfer of stx-genes to atypical enteropathogenic E. coli (aEPEC) which are prevalent in different hosts, would convert them to enterohemorrhagic E. coli (EHEC). We decided to confirm this hypothesis experimentally to provide conclusive evidence that aEPEC isolated from different mammalian hosts are indeed progenitors of typical EHEC which gain the ability to produce Shiga-Toxin by lysogeny with stx-converting bacteriophages, utilizing the model phage Φ3538 Δstx<sub>2</sub>::cat. We applied a modified in vitro plaque-assay, using a high titer of a bacteriophage carrying a deletion in the stx<sub>2</sub> gene (Φ3538 Δstx<sub>2</sub>::cat) to increase the detection of lysogenic conversion events. Three wild-type aEPEC strains were chosen as acceptor strains: the murine aEPEC-strain IMT14505 (sequence type (ST)28, serotype Ont:H6), isolated from a striped field mouse (Apodemus agrarius) in the surrounding of a cattle shed, and the human aEPEC-strain 910#00 (ST28, Ont:H6). The close genomic relationship of both strains implies a high zoonotic potential. A third strain, the bovine aEPEC IMT19981, was of serotype O26:H11 and ST21 (STC29). All three aEPEC were successfully lysogenized with phage Φ3538 Δstx<sub>2</sub>::cat. Integration of the bacteriophage DNA into the aEPEC host genomes was confirmed by amplification of chloramphenicol transferase (cat) marker gene and by Southern-Blot hybridization. Analysis of the whole genome sequence of each of the three lysogens showed that the bacteriophage was integrated into the known tRNA integration site argW, which is highly variable among E. coli. In conclusion, the successful lysogenic conversion of aEPEC with a stx-phage in vitro underlines the important role of aEPEC as progenitors of EHEC. Given the high prevalence and the wide host range of aEPEC acceptors, their high risk of zoonotic transmission should be recognized in infection control measures.

    International journal of medical microbiology : IJMM 2018;308;7;890-898

  • Microevolution of epidemiological highly relevant non-O157 enterohemorrhagic Escherichia coli of serogroups O26 and O111.

    Eichhorn I, Semmler T, Mellmann A, Pickard D, Anjum MF, Fruth A, Karch H and Wieler LH

    Institute of Microbiology and Epizootics, Freie Universität Berlin, Centre for Infection Medicine, Berlin, Germany.

    Enterohemorrhagic Escherichia coli (EHEC) are a cause of bloody diarrhea, hemorrhagic colitis (HC) and the potentially fatal hemolytic uremic syndrome (HUS). While O157:H7 is the dominant EHEC serotype, non-O157 EHEC have emerged as serious causes of disease. In Germany, the most important non-O157 O-serogroups causing one third of EHEC infections, including diarrhea as well as HUS, are O26, O103, O111 and O145. Interestingly, we identified EHEC O-serogroups O26 and O111 in one single sequence type complex, STC29, that also harbours atypical enteropathogenic E. coli (aEPEC). aEPEC differ from typical EHEC merely in the absence of stx-genes. These findings inspired us to unravel a putative microevolutionary scenario of these non-O157 EHEC by whole genome analyses. Analysis of single nucleotide polymorphisms (SNPs) of the maximum common genome (MCG) of 20 aEPEC (11 human/ 9 bovine) and 79 EHEC (42 human/ 36 bovine/ 1 food source) of STC29 identified three distinct clusters: Cluster 1 harboured strains of O-serogroup O111, the central Cluster 2 harboured only O26 aEPEC strains, while the more heterogeneous Cluster 3 contained both EHEC and aEPEC strains of O-serogroup O26. Further combined analyses of accessory virulence associated genes (VAGs) and insertion sites for mobile genetic elements suggested a parallel evolution of the MCG and the acquisition of virulence genes. The resulting microevolutionary model suggests the development of two distinct EHEC lineages from one common aEPEC ancestor of ST29 by lysogenic conversion with stx-converting bacteriophages, independent of the host species the strains had been isolated from. In conclusion, our cumulative data indicate that EHEC of O-serogroups O26 and O111 of STC29 originate from a common aEPEC ancestor and are bona fide zoonotic agents. The role of aEPEC in the emergence of O26 and O111 EHEC should be considered for infection control measures to prevent possible lysogenic conversion with stx-converting bacteriophages as a major vehicle driving the emergence of EHEC lineages with direct public health consequences.

    International journal of medical microbiology : IJMM 2018;308;8;1085-1095

  • HIV treatment is associated with a two-fold higher probability of raised triglycerides: Pooled Analyses in 21 023 individuals in sub-Saharan Africa.

    Ekoru K, Young EH, Dillon DG, Gurdasani D, Stehouwer N, Faurholt-Jepsen D, Levitt NS, Crowther NJ, Nyirenda M, Njelekela MA, Ramaiya K, Nyan O, Adewole OO, Anastos K, Compostella C, Dave JA, Fourie CM, Friis H, Kruger IM, Longenecker CT, Maher DP, Mutimura E, Ndhlovu CE, Praygod G, Pefura Yone EW, Pujades-Rodriguez M, Range N, Sani MU, Sanusi M, Schutte AE, Sliwa K, Tien PC, Vorster EH, Walsh C, Gareta D, Mashili F, Sobngwi E, Adebamowo C, Kamali A, Seeley J, Smeeth L, Pillay D, Motala AA, Kaleebu P and Sandhu MS

    Department of Medicine, University of Cambridge, Cambridge, United Kingdom.

    Background: Anti-retroviral therapy (ART) regimes for HIV are associated with raised levels of circulating triglycerides (TG) in western populations. However, there are limited data on the impact of ART on cardiometabolic risk in sub-Saharan African (SSA) populations.

    Methods: Pooled analyses of 14 studies comprising 21 023 individuals, on whom relevant cardiometabolic risk factors (including TG), HIV and ART status were assessed between 2003 and 2014, in SSA. The association between ART and raised TG (>2.3 mmol/L) was analysed using regression models.

    Findings: Among 10 615 individuals, ART was associated with a two-fold higher probability of raised TG (RR 2.05, 95% CI 1.51-2.77, I<sup>2</sup>=45.2%). The associations between ART and raised blood pressure, glucose, HbA1c, and other lipids were inconsistent across studies.

    Interpretation: Evidence from this study confirms the association of ART with raised TG in SSA populations. Given the possible causal effect of raised TG on cardiovascular disease (CVD), the evidence highlights the need for prospective studies to clarify the impact of long term ART on CVD outcomes in SSA.

    Funded by: Medical Research Council: MR/K013491/1; World Health Organization: 001

    Global health, epidemiology and genomics 2018;3

  • Uncovering Natural Longevity Alleles from Intercrossed Pools of Aging Fission Yeast Cells.

    Ellis DA, Mustonen V, Rodríguez-López M, Rallis C, Malecki M, Jeffares DC and Bähler J

    Department of Genetics, Evolution and Environment and Institute of Healthy Ageing, University College London, WC1E 6BT, U.K.

    Quantitative traits often show large variation caused by multiple genetic factors. One such trait is the chronological lifespan of non-dividing yeast cells, serving as a model for cellular aging. Screens for genetic factors involved in aging typically assay mutants of protein-coding genes. To identify natural genetic variants contributing to cellular aging, we exploited two strains of the fission yeast, <i>Schizosaccharomyces pombe</i>, that differ in chronological lifespan. We generated segregant pools from these strains and subjected them to advanced intercrossing over multiple generations to break up linkage groups. We chronologically aged the intercrossed segregant pool, followed by genome sequencing at different times to detect genetic variants that became reproducibly enriched as a function of age. A region on Chromosome II showed strong positive selection during aging. Based on expected functions, two candidate variants from this region in the long-lived strain were most promising to be causal: small insertions and deletions in the 5'-untranslated regions of <i>ppk31</i> and <i>SPBC409.08</i>. Ppk31 is an ortholog of Rim15, a conserved kinase controlling cell proliferation in response to nutrients, while SPBC409.08 is a predicted spermine transmembrane transporter. Both Rim15 and the spermine-precursor, spermidine, are implicated in aging as they are involved in autophagy-dependent lifespan extension. Single and double allele replacement suggests that both variants, alone or combined, have subtle effects on cellular longevity. Furthermore, deletion mutants of both <i>ppk31</i> and <i>SPBC409.08</i> rescued growth defects caused by spermidine. We propose that Ppk31 and SPBC409.08 may function together to modulate lifespan, thus linking Rim15/Ppk31 with spermidine metabolism.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 095598/Z/11/Z

    Genetics 2018;210;2;733-744

  • Analysis of predicted loss-of-function variants in UK Biobank identifies variants protective for disease.

    Emdin CA, Khera AV, Chaffin M, Klarin D, Natarajan P, Aragam K, Haas M, Bick A, Zekavat SM, Nomura A, Ardissino D, Wilson JG, Schunkert H, McPherson R, Watkins H, Elosua R, Bown MJ, Samani NJ, Baber U, Erdmann J, Gupta N, Danesh J, Chasman D, Ridker P, Denny J, Bastarache L, Lichtman JH, D'Onofrio G, Mattera J, Spertus JA, Sheu WH, Taylor KD, Psaty BM, Rich SS, Post W, Rotter JI, Chen YI, Krumholz H, Saleheen D, Gabriel S and Kathiresan S

    Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA.

    Less than 3% of protein-coding genetic variants are predicted to result in loss of protein function through the introduction of a stop codon, frameshift, or the disruption of an essential splice site; however, such predicted loss-of-function (pLOF) variants provide insight into effector transcript and direction of biological effect. In >400,000 UK Biobank participants, we conduct association analyses of 3759 pLOF variants with six metabolic traits, six cardiometabolic diseases, and twelve additional diseases. We identified 18 new low-frequency or rare (allele frequency < 5%) pLOF variant-phenotype associations. pLOF variants in the gene GPR151 protect against obesity and type 2 diabetes, in the gene IL33 against asthma and allergic disease, and in the gene IFIH1 against hypothyroidism. In the gene PDE3B, pLOF variants associate with elevated height, improved body fat distribution and protection from coronary artery disease. Our findings prioritize genes for which pharmacologic mimics of pLOF variants may lower risk for disease.

    Funded by: British Heart Foundation: CS/14/2/30841, RG/13/13/30194, RG/16/4/32218; Medical Research Council: G0800270, MC_PC_17228, MC_QA137853, MR/L003120/1, MR/P013880/1, MR/P02811X/1; NCATS NIH HHS: KL2 TR001100, UL1 TR000040, UL1 TR001079, UL1 TR001420, UL1 TR001881; NCI NIH HHS: R01 CA047988, UM1 CA182913; NHGRI NIH HHS: UM1 HG008895; NHLBI NIH HHS: HHSN268201100005C, HHSN268201100005G, HHSN268201100005I, HHSN268201100006C, HHSN268201100007C, HHSN268201100007I, HHSN268201100008C, HHSN268201100008I, HHSN268201100009C, HHSN268201100009I, HHSN268201100010C, HHSN268201100011C, HHSN268201100011I, HHSN268201100012C, HHSN268201500003C, K08 HL140203, N01HC95159, N01HC95160, N01HC95161, N01HC95162, N01HC95163, N01HC95164, N01HC95165, N01HC95166, N01HC95167, N01HC95168, N01HC95169, R01 HL043851, R01 HL080467, R01 HL081153, RC2 HL102419; NIDDK NIH HHS: P30 DK063491; NIGMS NIH HHS: T32 GM007205, U54 GM115428

    Nature communications 2018;9;1;1613

  • Phenotypic Consequences of a Genetic Predisposition to Enhanced Nitric Oxide Signaling.

    Emdin CA, Khera AV, Klarin D, Natarajan P, Zekavat SM, Nomura A, Haas M, Aragam K, Ardissino D, Wilson JG, Schunkert H, McPherson R, Watkins H, Elosua R, Bown MJ, Samani NJ, Baber U, Erdmann J, Gormley P, Palotie A, Stitziel NO, Gupta N, Danesh J, Saleheen D, Gabriel S and Kathiresan S

    Center for Genomic Medicine (C.A.E., A.V.K., D.K., P.N., S.M.Z., A.N., M.H., K.A., A.P., N.G., S.G., S.K.).

    Background: Nitric oxide signaling plays a key role in the regulation of vascular tone and platelet activation. Here, we seek to understand the impact of a genetic predisposition to enhanced nitric oxide signaling on risk for cardiovascular diseases, thus informing the potential utility of pharmacological stimulation of the nitric oxide pathway as a therapeutic strategy.

    Methods: We analyzed the association of common and rare genetic variants in 2 genes that mediate nitric oxide signaling (Nitric Oxide Synthase 3 [<i>NOS3</i>] and Guanylate Cyclase 1, Soluble, Alpha 3 [<i>GUCY1A3</i>]) with a range of human phenotypes. We selected 2 common variants (rs3918226 in <i>NOS3</i> and rs7692387 in <i>GUCY1A3</i>) known to associate with increased <i>NOS3</i> and <i>GUCY1A3</i> expression and reduced mean arterial pressure, combined them into a genetic score, and standardized this exposure to a 5 mm Hg reduction in mean arterial pressure. Using individual-level data from 335 464 participants in the UK Biobank and summary association results from 7 large-scale genome-wide association studies, we examined the effect of this nitric oxide signaling score on cardiometabolic and other diseases. We also examined whether rare loss-of-function mutations in <i>NOS3</i> and <i>GUCY1A3</i> were associated with coronary heart disease using gene sequencing data from the Myocardial Infarction Genetics Consortium (n=27 815).

    Results: A genetic predisposition to enhanced nitric oxide signaling was associated with reduced risks of coronary heart disease (odds ratio, 0.37; 95% confidence interval [CI], 0.31-0.45; <i>P</i>=5.5*10<sup>-26</sup>), peripheral arterial disease (odds ratio, 0.42; 95% CI, 0.26-0.68; <i>P</i>=0.0005), and stroke (odds ratio, 0.53; 95% CI, 0.37-0.76; <i>P</i>=0.0006). In a mediation analysis, the effect of the genetic score on decreased coronary heart disease risk extended beyond its effect on blood pressure. Conversely, rare variants that inactivate the <i>NOS3</i> or <i>GUCY1A3</i> genes were associated with a 23 mm Hg higher systolic blood pressure (95% CI, 12-34; <i>P</i>=5.6*10<sup>-5</sup>) and a 3-fold higher risk of coronary heart disease (odds ratio, 3.03; 95% CI, 1.29-7.12; <i>P</i>=0.01).

    Conclusions: A genetic predisposition to enhanced nitric oxide signaling is associated with reduced risks of coronary heart disease, peripheral arterial disease, and stroke. Pharmacological stimulation of nitric oxide signaling may prove useful in the prevention or treatment of cardiovascular disease.

    Funded by: British Heart Foundation: CS/14/2/30841; Medical Research Council: MC_QA137853; NCATS NIH HHS: KL2 TR001100; NHGRI NIH HHS: U54 HG003067; NHLBI NIH HHS: HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, HHSN268201300050C, K08 HL114642, R01 HL127564, R01 HL131961, RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010; NIGMS NIH HHS: U54 GM115428

    Circulation 2018;137;3;222-232

  • SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data.

    Epping L, van Tonder AJ, Gladstone RA, The Global Pneumococcal Sequencing Consortium, Bentley SD, Page AJ and Keane JA

    Microbial Genomics, Robert Koch Institute, Berlin, Germany.

    Streptococcus pneumoniae is responsible for 240 000-460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15-21×. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence.

    Funded by: Wellcome Trust: WT 098051

    Microbial genomics 2018;4;7
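
    The SeroBA abstract above describes a k-mer approach without giving details. As a rough illustration of the general idea only, and not SeroBA's actual algorithm, the following sketch scores hypothetical reference sequences by the fraction of their k-mers that also occur in a set of reads. The function names, the toy sequences and the choice of k=5 are all assumptions made for this example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(reads, references, k=5):
    """Score each reference by the fraction of its k-mers seen in the reads.

    Hypothetical helper for illustration; real serotyping tools use far
    larger k, handle sequencing errors, and refine the call afterwards.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    scores = {}
    for name, ref_seq in references.items():
        ref_kmers = kmers(ref_seq, k)
        # Guard against references shorter than k.
        scores[name] = len(ref_kmers & read_kmers) / len(ref_kmers) if ref_kmers else 0.0

    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two made-up "reference loci" and two short reads.
references = {"typeA": "ACGTACGTTAGC", "typeB": "TTTTGGGGCCCCAAAA"}
reads = ["ACGTACG", "CGTTAGC"]
best, scores = best_reference(reads, references, k=5)
```

    With this toy input, the reads cover most k-mers of the first reference and none of the second, so the first is selected.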

  • Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits.

    Evangelou E, Warren HR, Mosen-Ansorena D, Mifsud B, Pazoki R, Gao H, Ntritsos G, Dimou N, Cabrera CP, Karaman I, Ng FL, Evangelou M, Witkowska K, Tzanis E, Hellwege JN, Giri A, Velez Edwards DR, Sun YV, Cho K, Gaziano JM, Wilson PWF, Tsao PS, Kovesdy CP, Esko T, Mägi R, Milani L, Almgren P, Boutin T, Debette S, Ding J, Giulianini F, Holliday EG, Jackson AU, Li-Gao R, Lin WY, Luan J, Mangino M, Oldmeadow C, Prins BP, Qian Y, Sargurupremraj M, Shah N, Surendran P, Thériault S, Verweij N, Willems SM, Zhao JH, Amouyel P, Connell J, de Mutsert R, Doney ASF, Farrall M, Menni C, Morris AD, Noordam R, Paré G, Poulter NR, Shields DC, Stanton A, Thom S, Abecasis G, Amin N, Arking DE, Ayers KL, Barbieri CM, Batini C, Bis JC, Blake T, Bochud M, Boehnke M, Boerwinkle E, Boomsma DI, Bottinger EP, Braund PS, Brumat M, Campbell A, Campbell H, Chakravarti A, Chambers JC, Chauhan G, Ciullo M, Cocca M, Collins F, Cordell HJ, Davies G, de Borst MH, de Geus EJ, Deary IJ, Deelen J, Del Greco M F, Demirkale CY, Dörr M, Ehret GB, Elosua R, Enroth S, Erzurumluoglu AM, Ferreira T, Frånberg M, Franco OH, Gandin I, Gasparini P, Giedraitis V, Gieger C, Girotto G, Goel A, Gow AJ, Gudnason V, Guo X, Gyllensten U, Hamsten A, Harris TB, Harris SE, Hartman CA, Havulinna AS, Hicks AA, Hofer E, Hofman A, Hottenga JJ, Huffman JE, Hwang SJ, Ingelsson E, James A, Jansen R, Jarvelin MR, Joehanes R, Johansson Å, Johnson AD, Joshi PK, Jousilahti P, Jukema JW, Jula A, Kähönen M, Kathiresan S, Keavney BD, Khaw KT, Knekt P, Knight J, Kolcic I, Kooner JS, Koskinen S, Kristiansson K, Kutalik Z, Laan M, Larson M, Launer LJ, Lehne B, Lehtimäki T, Liewald DCM, Lin L, Lind L, Lindgren CM, Liu Y, Loos RJF, Lopez LM, Lu Y, Lyytikäinen LP, Mahajan A, Mamasoula C, Marrugat J, Marten J, Milaneschi Y, Morgan A, Morris AP, Morrison AC, Munson PJ, Nalls MA, Nandakumar P, Nelson CP, Niiranen T, Nolte IM, Nutile T, Oldehinkel AJ, Oostra BA, O'Reilly PF, Org E, Padmanabhan S, Palmas W, Palotie A, Pattie A, Penninx BWJH, Perola M, Peters A, Polasek O, Pramstaller PP, Nguyen QT, Raitakari OT, Ren M, Rettig R, Rice K, Ridker PM, Ried JS, Riese H, Ripatti S, Robino A, Rose LM, Rotter JI, Rudan I, Ruggiero D, Saba Y, Sala CF, Salomaa V, Samani NJ, Sarin AP, Schmidt R, Schmidt H, Shrine N, Siscovick D, Smith AV, Snieder H, Sõber S, Sorice R, Starr JM, Stott DJ, Strachan DP, Strawbridge RJ, Sundström J, Swertz MA, Taylor KD, Teumer A, Tobin MD, Tomaszewski M, Toniolo D, Traglia M, Trompet S, Tuomilehto J, Tzourio C, Uitterlinden AG, Vaez A, van der Most PJ, van Duijn CM, Vergnaud AC, Verwoert GC, Vitart V, Völker U, Vollenweider P, Vuckovic D, Watkins H, Wild SH, Willemsen G, Wilson JF, Wright AF, Yao J, Zemunik T, Zhang W, Attia JR, Butterworth AS, Chasman DI, Conen D, Cucca F, Danesh J, Hayward C, Howson JMM, Laakso M, Lakatta EG, Langenberg C, Melander O, Mook-Kanamori DO, Palmer CNA, Risch L, Scott RA, Scott RJ, Sever P, Spector TD, van der Harst P, Wareham NJ, Zeggini E, Levy D, Munroe PB, Newton-Cheh C, Brown MJ, Metspalu A, Hung AM, O'Donnell CJ, Edwards TL, Psaty BM, Tzoulaki I, Barnes MR, Wain LV, Elliott P, Caulfield MJ and Million Veteran Program

    Department of Epidemiology and Biostatistics, Imperial College London, London, UK.

    High blood pressure is a highly heritable and modifiable risk factor for cardiovascular disease. We report the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry. We identify 535 novel blood pressure loci that not only offer new biological insights into blood pressure regulation but also highlight shared genetic architecture between blood pressure and lifestyle exposures. Our findings identify new biological pathways for blood pressure regulation with potential for improved cardiovascular disease prevention in the future.

    Funded by: BLRD VA: I01 BX003360; Biotechnology and Biological Sciences Research Council: BB/F019394/1; British Heart Foundation: FS/12/82/29736, RG/13/13/30194, RG/15/12/31616, SP/13/2/30111; CSRD VA: I01 CX000982; Cancer Research UK: 14136; Medical Research Council: G0401527, G0601966, G1000143, G1001799, MC_PC_13040, MC_PC_U127561128, MC_UU_00007/10, MC_UU_12015/1, MR/K007017/1, MR/L003120/1, MR/L01341X/1, MR/L01632X/1, MR/M004422/1, MR/N003284/1, MR/N01104X/1, MR/N01104X/2, MR/N015746/1, MR/R023484/1, MR/S003746/1; NCI NIH HHS: T32 CA160056; NCRR NIH HHS: M01 RR000070; NHGRI NIH HHS: U01 HG007417; NHLBI NIH HHS: R01 HL105756, R21 HL121429, U01 HL130114; NICHD NIH HHS: K12 HD043483; NIDDK NIH HHS: P30 DK020572, R01 DK062370, R01 DK075787, R01 DK101855, R01 DK107786, R01 DK110113, U01 DK062370, U01 DK102163; NIH HHS: S10 OD023680; Wellcome Trust

    Nature genetics 2018;50;10;1412-1425

  • The Reactome Pathway Knowledgebase.

    Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, Haw R, Jassal B, Korninger F, May B, Milacic M, Roca CD, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Viteri G, Weiser J, Wu G, Stein L, Hermjakob H and D'Eustachio P

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    The Reactome Knowledgebase provides molecular details of signal transduction, transport, DNA replication, metabolism, and other cellular processes as an ordered network of molecular transformations, an extended version of a classic metabolic map, in a single consistent data model. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression profiles or somatic mutation catalogues from tumor cells. To support the continued brisk growth in the size and complexity of Reactome, we have implemented a graph database, improved performance of data analysis tools, and designed new data structures and strategies to boost diagram viewer performance. To make our website more accessible to human users, we have improved pathway display and navigation by implementing interactive Enhanced High Level Diagrams (EHLDs) with an associated icon library, and subpathway highlighting and zooming, in a simplified and reorganized web site with adaptive design. To encourage re-use of our content, we have enabled export of pathway diagrams as 'PowerPoint' files.

    Funded by: NHGRI NIH HHS: U41 HG003751

    Nucleic acids research 2018;46;D1;D649-D655

  • Reactome graph database: Efficient access to complex pathway data.

    Fabregat A, Korninger F, Viteri G, Sidiropoulos K, Marin-Garcia P, Ping P, Wu G, Stein L, D'Eustachio P and Hermjakob H

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom.

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

    Funded by: NHGRI NIH HHS: U41 HG003751; NIH HHS: P41HG003751, U54GM114833

    PLoS computational biology 2018;14;1;e1005968
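
    The abstract above argues that queries traversing highly interconnected data are a natural fit for a graph model. The sketch below illustrates that point with a plain in-memory adjacency map and a breadth-first traversal. It does not use Neo4j, Cypher, or the Reactome ContentService, and the node labels and the `HAS_EVENT` mapping are hypothetical stand-ins rather than the real Reactome data model.

```python
from collections import deque

# Hypothetical toy hierarchy: each key "contains" the events listed in its value.
HAS_EVENT = {
    "Pathway:A": ["Pathway:B", "Reaction:1"],
    "Pathway:B": ["Reaction:2", "Reaction:3"],
    "Reaction:1": [],
    "Reaction:2": [],
    "Reaction:3": [],
}


def descendants(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    found = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


# All events nested at any depth below the top-level pathway.
nested = descendants(HAS_EVENT, "Pathway:A")
```

    In a relational layout the same question typically needs one self-join per nesting level or a recursive query, whereas here the traversal simply follows edges until none remain.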

  • Epistasis studies reveal redundancy among calcium-dependent protein kinases in motility and invasion of malaria parasites.

    Fang H, Gomes AR, Klages N, Pino P, Maco B, Walker EM, Zenonos ZA, Angrisano F, Baum J, Doerig C, Baker DA, Billker O and Brochet M

    Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, CH-1211, Switzerland.

    In malaria parasites, evolution of parasitism has been linked to functional optimisation. Despite this optimisation, most members of a calcium-dependent protein kinase (CDPK) family show genetic redundancy during erythrocytic proliferation. To identify relationships between phospho-signalling pathways, we here screen 294 genetic interactions among protein kinases in Plasmodium berghei. This reveals a synthetic negative interaction between a hypomorphic allele of the protein kinase G (PKG) and CDPK4 to control erythrocyte invasion which is conserved in P. falciparum. CDPK4 becomes critical when PKG-dependent calcium signals are attenuated to phosphorylate proteins important for the stability of the inner membrane complex, which serves as an anchor for the acto-myosin motor required for motility and invasion. Finally, we show that multiple kinases functionally complement CDPK4 during erythrocytic proliferation and transmission to the mosquito. This study reveals how CDPKs are wired within a stage-transcending signalling network to control motility and host cell invasion in malaria parasites.

    Funded by: EC | European Research Council (ERC): 695596; Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Swiss National Science Foundation): BSSGI0_155852; Wellcome Trust: 098051, 100993/Z/13/Z, 106240/Z/14/Z

    Nature communications 2018;9;1;4248

  • Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.

    Farmery JHR, Smith ML, NIHR BioResource - Rare Diseases and Lynch AG

    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.

    Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.

    Funded by: Medical Research Council: MC_UU_00002/10, MR/K023489/1, MR/L006197/1; Wellcome Trust

    Scientific reports 2018;8;1;1300
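
    Sequence-based telomere length estimation starts from reads that carry the telomeric repeat. The following sketch shows only that first step: flagging reads that contain several consecutive copies of the human telomeric hexamer TTAGGG or its reverse complement CCCTAA. It is not Telomerecat's method, which according to the abstract above additionally handles interstitial telomeric reads and sequencing errors. The function names and the `min_repeats` threshold are assumptions chosen for illustration.

```python
TELOMERE_REPEAT = "TTAGGG"   # human telomeric hexamer
REVCOMP_REPEAT = "CCCTAA"    # its reverse complement


def is_telomeric(read, min_repeats=3):
    """True if the read contains min_repeats consecutive exact repeat copies.

    Exact matching only: a single sequencing error inside the run breaks it.
    """
    return (TELOMERE_REPEAT * min_repeats in read
            or REVCOMP_REPEAT * min_repeats in read)


def telomeric_fraction(reads, min_repeats=3):
    """Fraction of reads flagged as telomeric (0.0 for an empty input)."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if is_telomeric(read, min_repeats))
    return hits / len(reads)


# Toy reads: two carry three or more consecutive repeats, two do not.
reads = [
    "TTAGGGTTAGGGTTAGGGAC",
    "ACGTACGTACGT",
    "GGCCCTAACCCTAACCCTAA",
    "TTAGGGTTAGGGACGT",
]
fraction = telomeric_fraction(reads)
```

    Turning such a fraction into a length estimate requires further modelling, such as separating true telomere ends from interstitial repeats, which this sketch deliberately leaves out.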

  • Relative Suffix Trees.

    Farruggia A, Gagie T, Navarro G, Puglisi SJ and Sirén J

    Department of Computer Science, University of Pisa, Largo Bruno Pontecorvo 3, 56127 Pisa PI, Italy.

    Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while using space proportional to a compressed representation of the sequence. In this work, we take a new approach to compressed suffix trees for repetitive sequence collections, such as collections of individual genomes. We compress the suffix trees of individual sequences relative to the suffix tree of a reference sequence. These relative data structures provide competitive time/space trade-offs, being almost as small as the smallest compressed suffix trees for repetitive collections, and competitive in time with the largest and fastest compressed suffix trees.

    The computer journal 2018;61;5;773-788

  • Histone Lysine Methylases and Demethylases in the Landscape of Human Developmental Disorders.

    Faundes V, Newman WG, Bernardini L, Canham N, Clayton-Smith J, Dallapiccola B, Davies SJ, Demos MK, Goldman A, Gill H, Horton R, Kerr B, Kumar D, Lehman A, McKee S, Morton J, Parker MJ, Rankin J, Robertson L, Temple IK, Clinical Assessment of the Utility of Sequencing and Evaluation as a Service (CAUSES) Study, Deciphering Developmental Disorders (DDD) Study and Banka S

    Manchester Centre for Genomic Medicine, Division of Evolution & Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9WL, UK; Laboratorio de Genética y Enfermedades Metabólicas, Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Santiago 7830490, Chile.

    Histone lysine methyltransferases (KMTs) and demethylases (KDMs) underpin gene regulation. Here we demonstrate that variants causing haploinsufficiency of KMTs and KDMs are frequently encountered in individuals with developmental disorders. Using a combination of human variation databases and existing animal models, we determine 22 KMTs and KDMs as additional candidates for dominantly inherited developmental disorders. We show that KMTs and KDMs that are associated with, or are candidates for, dominant developmental disorders tend to have a higher level of transcription, longer canonical transcripts, more interactors, and a higher number and more types of post-translational modifications than other KMTs and KDMs. We provide evidence to firmly associate KMT2C, ASH1L, and KMT5B haploinsufficiency with dominant developmental disorders. Whereas KMT2C or ASH1L haploinsufficiency results in a predominantly neurodevelopmental phenotype with occasional physical anomalies, KMT5B mutations cause an overgrowth syndrome with intellectual disability. We further expand the phenotypic spectrum of KMT2B-related disorders and show that some individuals can have severe developmental delay without dystonia at least until mid-childhood. Additionally, we describe a recessive histone lysine-methylation defect caused by homozygous or compound heterozygous KDM5B variants and resulting in a recognizable syndrome with developmental delay, facial dysmorphism, and camptodactyly. Collectively, these results emphasize the significance of histone lysine methylation in normal human development and the importance of this process in human developmental disorders. Our results demonstrate that systematic clinically oriented pathway-based analysis of genomic data can accelerate the discovery of rare genetic disorders.

    Funded by: Wellcome Trust

    American journal of human genetics 2018;102;1;175-187

  • Maturing Human CD127+ CCR7+ PDL1+ Dendritic Cells Express AIRE in the Absence of Tissue Restricted Antigens.

    Fergusson JR, Morgan MD, Bruchard M, Huitema L, Heesters BA, van Unen V, van Hamburg JP, van der Wel NN, Picavet D, Koning F, Tas SW, Anderson MS, Marioni JC, Holländer GA and Spits H

    Department of Experimental Immunology, Academic Medical Center, Amsterdam, Netherlands.

    Expression of the Autoimmune regulator (AIRE) outside of the thymus has long been suggested in both humans and mice, but the cellular source in humans has remained undefined. Here we identified AIRE expression in human tonsils and extensively analyzed these "extra-thymic AIRE expressing cells" (eTACs) using combinations of flow cytometry, CyTOF and single cell RNA-sequencing. We identified AIRE+ cells as dendritic cells (DCs) with a mature and migratory phenotype including high levels of antigen presenting molecules and costimulatory molecules, and specific expression of CD127, CCR7, and PDL1. These cells also possessed the ability to stimulate and re-stimulate T cells and displayed reduced responses to toll-like receptor (TLR) agonists compared to conventional DCs. While expression of <i>AIRE</i> was enriched within CCR7+CD127+ DCs, single-cell RNA sequencing revealed expression of <i>AIRE</i> to be transient, rather than stable, and associated with the differentiation to a mature phenotype. The role of AIRE in central tolerance induction within the thymus is well established; however, our study shows that <i>AIRE</i> expression within the periphery is not associated with an enriched expression of tissue-restricted antigens (TRAs). This unexpected finding, suggestive of wider functions of AIRE, may provide an explanation for the non-autoimmune symptoms of APECED patients who lack functional AIRE.

    Funded by: Cancer Research UK: 17197; NIDDK NIH HHS: R01 DK101622; Wellcome Trust: 105045/Z/14/Z

    Frontiers in immunology 2018;9;2902

  • Beyond the lysosome: Cholesterol role on endoplasmic reticulum and lipid droplets in Parkinson's disease.

    Fernandes HJR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Movement disorders : official journal of the Movement Disorder Society 2018;33;2;342

  • Development and worldwide use of non-lethal, and minimal population-level impact, protocols for the isolation of amphibian chytrid fungi.

    Fisher MC, Ghosh P, Shelton JMG, Bates K, Brookes L, Wierzbicki C, Rosa GM, Farrer RA, Aanensen DM, Alvarado-Rybak M, Bataille A, Berger L, Böll S, Bosch J, Clare FC, A Courtois E, Crottini A, Cunningham AA, Doherty-Bone TM, Gebresenbet F, Gower DJ, Höglund J, James TY, Jenkinson TS, Kosch TA, Lambertini C, Laurila A, Lin CF, Loyau A, Martel A, Meurling S, Miaud C, Minting P, Ndriantsoa S, O'Hanlon SJ, Pasmans F, Rakotonanahary T, Rabemananjara FCE, Ribeiro LP, Schmeller DS, Schmidt BR, Skerratt L, Smith F, Soto-Azat C, Tessa G, Toledo LF, Valenzuela-Sánchez A, Verster R, Vörös J, Waldman B, Webb RJ, Weldon C, Wombwell E, Zamudio KR, Longcore JE and Garner TWJ

    Department of Infectious Disease Epidemiology, School of Public Health, Faculty of Medicine (St Mary's campus), Imperial College London, London, W2 1PG, UK.

    Parasitic chytrid fungi have emerged as a significant threat to amphibian species worldwide, necessitating the development of techniques to isolate these pathogens into culture for research purposes. However, early methods of isolating chytrids from their hosts relied on killing amphibians. We modified a pre-existing protocol for isolating chytrids from infected animals to use toe clips and biopsies from toe webbing rather than euthanizing hosts, and distributed the protocol to researchers as part of the BiodivERsA project RACE; here called the RML protocol. In tandem, we developed a lethal procedure for isolating chytrids from tadpole mouthparts. Reviewing a database of use a decade after their inception, we find that these methods have been applied across 5 continents, 23 countries and in 62 amphibian species. Isolation of chytrids by the non-lethal RML protocol occurred in 18% of attempts with 207 fungal isolates and three species of chytrid being recovered. Isolation of chytrids from tadpoles occurred in 43% of attempts with 334 fungal isolates of one species (Batrachochytrium dendrobatidis) being recovered. Together, these methods have resulted in a significant reduction and refinement of our use of threatened amphibian species and have improved our ability to work with this group of emerging pathogens.

    Funded by: Medical Research Council: MR/R015600/1

    Scientific reports 2018;8;1;7772

  • Recurrent rearrangements of FOS and FOSB define osteoblastoma.

    Fittall MW, Mifsud W, Pillay N, Ye H, Strobl AC, Verfaillie A, Demeulemeester J, Zhang L, Berisha F, Tarabichi M, Young MD, Miranda E, Tarpey PS, Tirabosco R, Amary F, Grigoriadis AE, Stratton MR, Van Loo P, Antonescu CR, Campbell PJ, Flanagan AM and Behjati S

    The Francis Crick Institute, London, NW1 1AT, UK.

    The transcription factor FOS has long been implicated in the pathogenesis of bone tumours, following the discovery that the viral homologue, v-fos, caused osteosarcoma in laboratory mice. However, mutations of FOS have not been found in human bone-forming tumours. Here, we report recurrent rearrangement of FOS and its paralogue, FOSB, in the most common benign tumours of bone, osteoblastoma and osteoid osteoma. Combining whole-genome DNA and RNA sequences, we find rearrangement of FOS in five tumours and of FOSB in one tumour. Extending our findings into a cohort of 55 cases, using FISH and immunohistochemistry, we provide evidence of ubiquitous mutation of FOS or FOSB in osteoblastoma and osteoid osteoma. Overall, our findings reveal a human bone tumour defined by mutations of FOS and FOSB.

    Funded by: NCI NIH HHS: P30 CA008748, P50 CA140146; Wellcome Trust

    Nature communications 2018;9;1;2150

  • Proteomic identification of Axc, a novel beta-lactamase with carbapenemase activity in a meropenem-resistant clinical isolate of Achromobacter xylosoxidans.

    Fleurbaaij F, Henneman AA, Corver J, Knetsch CW, Smits WK, Nauta ST, Giera M, Dragan I, Kumar N, Lawley TD, Verhoeven A, van Leeuwen HC, Kuijper EJ and Hensbergen PJ

    Department of Medical Microbiology, Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands.

    The development of antibiotic resistance during treatment is a threat to patients and their environment. Insight into the mechanisms of resistance development is important for appropriate therapy and infection control. Here, we describe how through the application of mass spectrometry-based proteomics, a novel beta-lactamase Axc was identified as an indicator of acquired carbapenem resistance in a clinical isolate of Achromobacter xylosoxidans. Comparative proteomic analysis of consecutively collected susceptible and resistant isolates from the same patient revealed that high Axc protein levels were only observed in the resistant isolate. Heterologous expression of Axc in Escherichia coli significantly increased the resistance towards carbapenems. Importantly, direct Axc-mediated hydrolysis of imipenem was demonstrated using pH shift assays and ¹H-NMR, confirming Axc as a legitimate carbapenemase. Whole genome sequencing revealed that the susceptible and resistant isolates were remarkably similar. Together these findings provide a molecular context for the fast development of meropenem resistance in A. xylosoxidans during treatment and demonstrate the use of mass spectrometric techniques in identifying novel resistance determinants.

    Scientific reports 2018;8;1;8181

  • Interleukin-22 promotes phagolysosomal fusion to induce protection against Salmonella enterica Typhimurium in human epithelial cells.

    Forbester JL, Lees EA, Goulding D, Forrest S, Yeung A, Speak A, Clare S, Coomber EL, Mukhopadhyay S, Kraiczy J, Schreiber F, Lawley TD, Hancock REW, Uhlig HH, Zilbauer M, Powrie F and Dougan G

    Institute of Infection and Immunity, School of Medicine, Cardiff University, Cardiff CF14 4XN, United Kingdom.

    Intestinal epithelial cells (IECs) play a key role in regulating immune responses and controlling infection. However, the direct role of IECs in restricting pathogens remains incompletely understood. Here, we provide evidence that IL-22 primed intestinal organoids derived from healthy human induced pluripotent stem cells (hIPSCs) to restrict Salmonella enterica serovar Typhimurium SL1344 infection. A combination of transcriptomics, bacterial invasion assays, and imaging suggests that IL-22-induced antimicrobial activity is driven by increased phagolysosomal fusion in IL-22-pretreated cells. The antimicrobial phenotype was absent in hIPSCs derived from a patient harboring a homozygous mutation in the IL10RB gene that inactivates the IL-22 receptor but was restored by genetically complementing the IL10RB deficiency. This study highlights a mechanism through which the IL-22 pathway facilitates the human intestinal epithelium to control microbial infection.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2018;115;40;10118-10123

  • Natural Genetic Variation in a Multigenerational Phenotype in C. elegans.

    Frézal L, Demoinet E, Braendle C, Miska E and Félix MA

    Institut de Biologie de l'Ecole Normale Supérieure, Centre National de la Recherche Scientifique, INSERM, École Normale Supérieure, Paris Sciences et Lettres, Paris, France; Wellcome Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.

    Although heredity mostly relies on the transmission of DNA sequence, additional molecular and cellular features are heritable across several generations. In the nematode Caenorhabditis elegans, insights into such unconventional inheritance result from two lines of work. First, the mortal germline (Mrt) phenotype was defined as a multigenerational phenotype whereby a selfing lineage becomes sterile after several generations, implying multigenerational memory [1, 2]. Second, certain RNAi effects are heritable over several generations in the absence of the initial trigger [3-5]. Both lines of work converged when the subset of Mrt mutants that are heat sensitive were found to closely correspond to mutants defective in the RNAi-inheritance machinery, including histone modifiers [6-9]. Here, we report the surprising finding that several C. elegans wild isolates display a heat-sensitive mortal germline phenotype in laboratory conditions: upon chronic exposure to higher temperatures, such as 25°C, lines reproducibly become sterile after several generations. This phenomenon is reversible, as it can be suppressed by temperature alternations at each generation, suggesting a non-genetic basis for the sterility. We tested whether natural variation in the temperature-induced Mrt phenotype was of genetic nature by building recombinant inbred lines between the isolates MY10 (Mrt) and JU1395 (non-Mrt). Using bulk segregant analysis, we detected two quantitative trait loci. After further recombinant mapping and genome editing, we identified the major causal locus as a polymorphism in the set-24 gene, encoding a SET- and SPK-domain protein. We conclude that C. elegans natural populations may harbor natural genetic variation in epigenetic inheritance phenomena.

    Funded by: NIH HHS: P40 OD010440

    Current biology : CB 2018;28;16;2588-2596.e8

  • Population size changes and selection drive patterns of parallel evolution in a host-virus system.

    Frickel J, Feulner PGD, Karakoc E and Becks L

    Community Dynamics Group, Department Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany.

    Predicting the repeatability of evolution remains elusive. Theory and empirical studies suggest that strong selection and large population sizes increase the probability for parallel evolution at the phenotypic and genotypic levels. However, selection and population sizes are not constant, but rather change continuously and directly affect each other even on short time scales. Here, we examine the degree of parallel evolution shaped through eco-evolutionary dynamics in an algal host population coevolving with a virus. We find high degrees of parallelism at the level of population size changes (ecology) and at the phenotypic level between replicated populations. At the genomic level, we find evidence for parallelism, as the same large genomic region was duplicated in all replicated populations, but also substantial novel sequence divergence between replicates. These patterns of genome evolution can be explained by considering population size changes as an important driver of rapid evolution.

    Nature communications 2018;9;1;1706

  • Surveillance and Epidemiology of Drug Resistant Infections Consortium (SEDRIC): Supporting the transition from strategy to action.

    Fukuda K, Limmathurotsakul D, Okeke IN, Shetty N, van Doorn R, Feasey NA, Chiara F, Zoubiane G, Jinks T, Parkhill J, Patel J, Reid SWJ, Holmes AH, Peacock SJ and Surveillance and Epidemiology of Drug Resistant Infections Consortium (SEDRIC)

    School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong.

    In recognition of the central importance of surveillance and epidemiology in the control of antimicrobial resistance and the need to strengthen surveillance at all levels, Wellcome has brought together a new international expert group SEDRIC (Surveillance and Epidemiology of Drug Resistant Infections Consortium). SEDRIC aims to advance and transform the ways of tracking, sharing and analysing rates of infection and drug resistance, burden of disease, information on antibiotic use, opportunities for preventative measures such as vaccines, and contamination of the environment. SEDRIC will strengthen the availability of information needed to monitor and track risks, including an evaluation of access to, and utility of data generated by pharma and research activities, and will support the translation of surveillance data into interventions, changes in policy and more effective practices. Ways of working will include the provision of independent scientific analysis, advocacy and expert advice to groups, such as the Wellcome Drug Resistant Infection Priority Programme. A priority for SEDRIC's first Working Group is to review mechanisms to strengthen the generation, collection, collation and dissemination of high quality data, together with the need for creativity in the use of existing data and proxy measures, and linking to existing in-country networking infrastructure. SEDRIC will also promote the translation of technological innovations into public health solutions.

    Wellcome open research 2018;3;59

  • A CRISPR knockout screen Identifies SETDB1-target retroelement silencing factors in embryonic stem cells.

    Fukuda K, Okuda A, Yusa K and Shinkai Y


    In mouse embryonic stem cells (mESCs), expression of provirus and endogenous retroelements is epigenetically repressed. Although many cellular factors involved in retroelement silencing have been identified, the complete molecular mechanism remains elusive. In this study, we performed a genome-wide CRISPR screen to advance our understanding of retroelement silencing in mESCs. The Moloney murine leukemia virus (MLV)-based retroviral vector MSCV-GFP, which is repressed by the SETDB1/KAP1 pathway in mESCs, was used as a reporter provirus, and we identified more than 80 genes involved in this process. In particular, ATF7IP and the BAF complex components are linked with the repression of most of the SETDB1 targets. We characterized two factors, MORC2A and DRES1, of which DRES1 is a novel molecule in retroelement silencing. Although both factors are recruited to repress provirus, their roles in repression are different. MORC2A appears to function dependent on repressive epigenetic modifications while DRES1 regulates repressive epigenetic modifications associated with SETDB1. Our genome-wide CRISPR screen cataloged genes which function at different levels in silencing of SETDB1-target retroelements and provides a useful resource for further molecular studies.

    Genome research 2018

  • Glutaminolysis is a metabolic dependency in FLT3-ITD acute myeloid leukemia unmasked by FLT3 tyrosine kinase inhibition.

    Gallipoli P, Giotopoulos G, Tzelepis K, Costa ASH, Vohra S, Medina-Perez P, Basheer F, Marando L, Di Lisio L, Dias JML, Yun H, Sasca D, Horton SJ, Vassiliou G, Frezza C and Huntly BJP

    Wellcome Trust-MRC Cambridge Stem Cell Institute, Cambridge, United Kingdom.

    FLT3 internal tandem duplication (FLT3-ITD) mutations are common in acute myeloid leukemia (AML) and are associated with poor patient prognosis. Although new-generation FLT3 tyrosine kinase inhibitors (TKI) have shown promising results, the outcome of FLT3-ITD AML patients remains poor and demands the identification of novel, specific, and validated therapeutic targets for this highly aggressive AML subtype. Utilizing an unbiased genome-wide clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 screen, we identify GLS, the first enzyme in glutamine metabolism, as synthetically lethal with FLT3-TKI treatment. Using complementary metabolomic and gene-expression analysis, we demonstrate that glutamine metabolism, through its ability to support both mitochondrial function and cellular redox metabolism, becomes a metabolic dependency of FLT3-ITD AML, specifically unmasked by FLT3-TKI treatment. We extend these findings to AML subtypes driven by other tyrosine kinase (TK) activating mutations and validate the role of GLS as a clinically actionable therapeutic target in both primary AML and in vivo models. Our work highlights the role of metabolic adaptations as a resistance mechanism to several TKI and suggests glutaminolysis as a therapeutically targetable vulnerability when combined with specific TKI in FLT3-ITD and other TK activating mutation-driven leukemias.

    Funded by: Cancer Research UK: C56179/A21617; Medical Research Council: G1000288, MC_PC_12009, MC_UU_12022/6, MR/M010392/1; Wellcome Trust: 109967/Z/15/Z; Worldwide Cancer Research: 14-1069

    Blood 2018;131;15;1639-1653

  • Quantifying the Impact of Rare and Ultra-rare Coding Variation across the Phenotypic Spectrum.

    Ganna A, Satterstrom FK, Zekavat SM, Das I, Kurki MI, Churchhouse C, Alfoldi J, Martin AR, Havulinna AS, Byrnes A, Thompson WK, Nielsen PR, Karczewski KJ, Saarentaus E, Rivas MA, Gupta N, Pietiläinen O, Emdin CA, Lescai F, Bybjerg-Grauholm J, Flannick J, GoT2D/T2D-GENES Consortium, Mercader JM, Udler M, SIGMA Consortium, Helmsley IBD Exome Sequencing Project, FinMetSeq Consortium, iPSYCH-Broad Consortium, Laakso M, Salomaa V, Hultman C, Ripatti S, Hämäläinen E, Moilanen JS, Körkkö J, Kuismin O, Nordentoft M, Hougaard DM, Mors O, Werge T, Mortensen PB, MacArthur D, Daly MJ, Sullivan PF, Locke AE, Palotie A, Børglum AD, Kathiresan S and Neale BM

    Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17176, Sweden.

    There is a limited understanding about the impact of rare protein-truncating variants across multiple phenotypes. We explore the impact of this class of variants on 13 quantitative traits and 10 diseases using whole-exome sequencing data from 100,296 individuals. Protein-truncating variants in genes intolerant to this class of mutations increased risk of autism, schizophrenia, bipolar disorder, intellectual disability, and ADHD. In individuals without these disorders, there was an association with shorter height, lower education, increased hospitalization, and reduced age at enrollment. Gene sets implicated from GWASs did not show a significant protein-truncating variant burden beyond what was captured by established Mendelian genes. In conclusion, we provide a thorough investigation of the impact of rare deleterious coding variants on complex traits, suggesting widespread pleiotropic risk.

    Funded by: NHGRI NIH HHS: R01 HG006855, U54 HG003067; NIDDK NIH HHS: K23 DK114551, L30 DK106874, P30 DK043351, U54 DK105566; NIMH NIH HHS: R01 MH077139, R01 MH101244, RC2 MH089905, U01 MH105666, U01 MH109528

    American journal of human genetics 2018;102;6;1204-1211

  • APOBEC3A/B deletion polymorphism and cancer risk.

    Gansmo LB, Romundstad P, Hveem K, Vatten L, Nik-Zainal S, Lønning PE and Knappskog S

    Section of Oncology, Department of Clinical Science, University of Bergen, Bergen, Norway.

    Activity of the apolipoprotein B mRNA editing enzyme, catalytic-polypeptide-like (APOBEC) enzymes has been linked to specific mutational processes in human cancer genomes. A germline APOBEC3A/B deletion polymorphism is associated with APOBEC-dependent mutational signatures, and the deletion allele has been reported to confer an elevated risk of some cancers in Asian populations, while the results in European populations, so far, have been conflicting. We genotyped the APOBEC3A/B deletion polymorphism in a large population-based sample consisting of 11 106 Caucasian (Norwegian) individuals, including 7279 incident cancer cases (1769 breast, 1360 lung, 1585 colon, and 2565 prostate cancer) and a control group of 3827 matched individuals without cancer (1918 females and 1909 males) from the same population. Overall, the APOBEC3A/B deletion polymorphism was not associated with risk of any of the four cancer types. However, in subgroup analyses stratified by age, we found that the deletion allele was associated with increased risk for lung cancer among individuals <50 years of age (OR 2.17, CI 1.19-3.97), and that the association was gradually reduced with increasing age (P = 0.01). A similar but weaker pattern was observed for prostate cancer. In support of these findings, the APOBEC3A/B deletion was associated with young age at diagnosis among the cancer cases for both cancer forms (lung cancer: P = 0.02; dominant model and prostate cancer: P = 0.03; recessive model). No such associations were observed for breast or colon cancer.

    Carcinogenesis 2018;39;2;118-124

  • Alterations in sperm long RNA contribute to the epigenetic inheritance of the effects of postnatal trauma.

    Gapp K, van Steenwyk G, Germain PL, Matsushima W, Rudolph KLM, Manuella F, Roszkowski M, Vernaz G, Ghosh T, Pelczar P, Mansuy IM and Miska EA

    Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge, CB2 1QN, UK.

    Psychiatric diseases have a strong heritable component known to not be restricted to DNA sequence-based genetic inheritance alone but to also involve epigenetic factors in germ cells. Initial evidence suggested that sperm RNA is causally linked to the transmission of symptoms induced by traumatic experiences. Here, we show that alterations in long RNA in sperm contribute to the inheritance of specific trauma symptoms. Injection of long RNA fraction from sperm of males exposed to postnatal trauma recapitulates the effects on food intake, glucose response to insulin and risk-taking in adulthood whereas the small RNA fraction alters body weight and behavioural despair. Alterations in long RNA are maintained after fertilization, suggesting a direct link between sperm and embryo RNA.

    Funded by: Cancer Research UK: A24843; Swiss National Science Foundation: 159096, 167698; Wellcome Trust: 203144

    Molecular psychiatry 2018;25;9;2162-2174

  • Alcohol and endogenous aldehydes damage chromosomes and mutate stem cells.

    Garaycoechea JI, Crossan GP, Langevin F, Mulderrig L, Louzada S, Yang F, Guilbaud G, Park N, Roerink S, Nik-Zainal S, Stratton MR and Patel KJ

    MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK.

    Haematopoietic stem cells renew blood. Accumulation of DNA damage in these cells promotes their decline, while misrepair of this damage initiates malignancies. Here we describe the features and mutational landscape of DNA damage caused by acetaldehyde, an endogenous and alcohol-derived metabolite. This damage results in DNA double-stranded breaks that, despite stimulating recombination repair, also cause chromosome rearrangements. We combined transplantation of single haematopoietic stem cells with whole-genome sequencing to show that this damage occurs in stem cells, leading to deletions and rearrangements that are indicative of microhomology-mediated end-joining repair. Moreover, deletion of p53 completely rescues the survival of aldehyde-stressed and mutated haematopoietic stem cells, but does not change the pattern or the intensity of genome instability within individual stem cells. These findings characterize the mutation of the stem-cell genome by an alcohol-derived and endogenous source of DNA damage. Furthermore, we identify how the choice of DNA-repair pathway and a stringent p53 response limit the transmission of aldehyde-induced mutations in stem cells.

    Funded by: Medical Research Council: MC_U105178811; Wellcome Trust

    Nature 2018;553;7687;171-177

  • A graph-based approach to diploid genome assembly.

    Garg S, Rautiainen M, Novak AM, Garrison E, Durbin R and Marschall T

    Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany.

    Motivation: Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community.

    Results: We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants.

    Availability and implementation:

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: NHGRI NIH HHS: U41 HG007234; Wellcome Trust: WT206194

    Bioinformatics (Oxford, England) 2018;34;13;i105-i114

  • Variation graph toolkit improves read mapping by representing genetic variation in the reference.

    Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET, Jones W, Garg S, Markello C, Lin MF, Paten B and Durbin R

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.

    Funded by: NHGRI NIH HHS: T32 HG008345, U41 HG007234, U54 HG007990; NHLBI NIH HHS: U01 HL137183; Wellcome Trust: 109083, 206194, 207492, 207492/Z/17/Z

    Nature biotechnology 2018;36;9;875-879

  • Noncanonical secondary structures arising from non-B DNA motifs are determinants of mutagenesis.

    Georgakopoulos-Soares I, Morganella S, Jain N, Hemberg M and Nik-Zainal S

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Somatic mutations show variation in density across cancer genomes. Previous studies have shown that chromatin organization and replication time domains are correlated with, and thus predictive of, this variation. Here, we analyze 1809 whole-genome sequences from 10 cancer types to show that a subset of repetitive DNA sequences, called non-B motifs, which predict noncanonical secondary structure formation, can independently account for variation in mutation density. Combined with epigenetic factors and replication timing, the variance explained can be improved to 43%-76%. Approximately twofold mutation enrichment is observed directly within non-B motifs, is focused on exposed structural components, and is dependent on physical properties that are optimal for secondary structure formation. Therefore, there is mounting evidence that secondary structures arising from non-B motifs are not simply associated with increased mutation density; they are possibly causally implicated. Our results suggest that they are determinants of mutagenesis and increase the likelihood of recurrent mutations in the genome. This analysis calls for caution in the interpretation of recurrent mutations and highlights the importance of taking non-B motifs, which can simply be inferred from the reference sequence, into consideration in background models of mutability henceforth.

    Funded by: Wellcome Trust

    Genome research 2018;28;9;1264-1271

  • Updated recommendation for the benign stand-alone ACMG/AMP criterion.

    Ghosh R, Harrison SM, Rehm HL, Plon SE, Biesecker LG and ClinGen Sequence Variant Interpretation Working Group

    Department of Pediatrics, Baylor College of Medicine, Houston, Texas.

    The Clinical Genome Resource (ClinGen) Sequence Variant Interpretation Working Group set out to refine the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) variant pathogenicity recommendations for stand-alone rule BA1 (a variant with minor allele frequency [MAF] > 0.05 is benign), by clarifying how it should be used and specifying a set of variants that should be exempted from this rule. We cross-referenced ClinVar and Exome Aggregation Consortium data to identify variants for which there was a plausible argument for pathogenicity and the variant exists in one or more population data sets at MAF > 0.05. We identified nine such variants that were present in these data sets that may not be benign. Applying the ACMG/AMP criteria to these variants resulted in four pathogenic variants and five variants of uncertain significance. We have refined benign rule BA1 by clarifying terms used to describe its use, which databases we recommend using, and assumptions made about this rule. We also recognized an initial list of nine variants for which there was some evidence of pathogenicity even though the MAF was high for these variants. We specify processes whereby individuals can petition ClinGen for amendments to our variant-specific assertions and the criteria experts should use when setting a numerically lower threshold for BA1 for specific genes.

    Funded by: Intramural NIH HHS: ZIA HG200328-13, ZIA HG200359-09; NCI NIH HHS: U41HG009650

    Human mutation 2018;39;11;1525-1530

  • Plasmodium vivax-like genome sequences shed new insights into Plasmodium vivax biology and evolution.

    Gilabert A, Otto TD, Rutledge GG, Franzon B, Ollomo B, Arnathau C, Durand P, Moukodoum ND, Okouga AP, Ngoubangoye B, Makanga B, Boundenga L, Paupy C, Renaud F, Prugnolle F and Rougeron V

    MIVEGEC, IRD, CNRS, University of Montpellier, Montpellier, France.

    Although Plasmodium vivax is responsible for the majority of malaria infections outside Africa, little is known about its evolution and pathway to humans. Its closest genetic relative, P. vivax-like, was discovered in African great apes and is hypothesized to have given rise to P. vivax in humans. To unravel the evolutionary history and adaptation of P. vivax to different host environments, we used long- and short-read sequencing technologies to generate 2 new P. vivax-like reference genomes and 9 additional P. vivax-like genotypes. Analyses show that the genomes of P. vivax and P. vivax-like are highly similar and colinear within the core regions. Phylogenetic analyses clearly show that P. vivax-like parasites form a genetically distinct clade from P. vivax. Concerning the relative divergence dating, we show that the evolution of P. vivax in humans did not occur at the same time as the other agents of human malaria, thus suggesting that the transfer of Plasmodium parasites to humans happened several times independently over the history of the Homo genus. We further identify several key genes that exhibit signatures of positive selection exclusively in the human P. vivax parasites. Two of these genes have been identified to also be under positive selection in the other main human malaria agent, P. falciparum, thus suggesting their key role in the evolution of the ability of these parasites to infect humans or their anthropophilic vectors. Finally, we demonstrate that some gene families important for red blood cell (RBC) invasion (a key step of the life cycle of these parasites) have undergone lineage-specific evolution in the human parasite (e.g., reticulocyte-binding proteins [RBPs]).

    Funded by: Medical Research Council: MR/J004111/1; Wellcome Trust: 098051

    PLoS biology 2018;16;8;e2006035

  • Genetic Diversity of Cryptosporidium hominis in a Bangladeshi Community as Revealed by Whole-Genome Sequencing.

    Gilchrist CA, Cotton JA, Burkey C, Arju T, Gilmartin A, Lin Y, Ahmed E, Steiner K, Alam M, Ahmed S, Robinson G, Zaman SU, Kabir M, Sanders M, Chalmers RM, Ahmed T, Ma JZ, Haque R, Faruque ASG, Berriman M and Petri WA

    Department of Medicine, University of Virginia, Charlottesville.

    We studied the genetic diversity of Cryptosporidium hominis infections in slum-dwelling infants from Dhaka over a 2-year period. Cryptosporidium hominis infections were common during the monsoon, and were genetically diverse as measured by gp60 genotyping and whole-genome resequencing. Recombination in the parasite was evidenced by the decay of linkage disequilibrium in the genome over <300 bp. Regions of the genome with high levels of polymorphism were also identified. Yet to be determined is if genomic diversity is responsible in part for the high rate of reinfection, seasonality, and varied clinical presentations of cryptosporidiosis in this population.

    Funded by: NIAID NIH HHS: R01 AI043596, T32 AI055432; NIGMS NIH HHS: T32 GM007267; Wellcome Trust: 206194

    The Journal of infectious diseases 2018;218;2;259-264

  • Cohort-wide deep whole genome sequencing and the allelic architecture of complex traits.

    Gilly A, Suveges D, Kuchenbaecker K, Pollard M, Southam L, Hatzikotoulas K, Farmaki AE, Bjornland T, Waples R, Appel EVR, Casalone E, Melloni G, Kilian B, Rayner NW, Ntalla I, Kundu K, Walter K, Danesh J, Butterworth A, Barroso I, Tsafantakis E, Dedoussis G, Moltke I and Zeggini E

    Department of Human Genetics, Wellcome Sanger Institute, Hinxton, CB10 1SA, United Kingdom.

    The role of rare variants in complex traits remains uncharted. Here, we conduct deep whole genome sequencing of 1457 individuals from an isolated population, and test for rare variant burdens across six cardiometabolic traits. We identify a role for rare regulatory variation, which has hitherto been missed. We find evidence of rare variant burdens that are independent of established common variant signals (ADIPOQ and adiponectin, P = 4.2 × 10⁻⁸; APOC3 and triglyceride levels, P = 1.5 × 10⁻²⁶), and identify replicating evidence for a burden associated with triglyceride levels in FAM189B (P = 2.2 × 10⁻⁸), indicating a role for this gene in lipid metabolism.

    Funded by: British Heart Foundation: RG/13/13/30194; Medical Research Council: G0800270, MR/L003120/1; Wellcome Trust

    Nature communications 2018;9;1;4674

  • Interferon lambda is required for interferon gamma-expressing NK cell responses but does not afford antiviral protection during acute and persistent murine cytomegalovirus infection.

    Gimeno Brias S, Marsden M, Forbester J, Clement M, Brandt C, Harcourt K, Kane L, Chapman L, Clare S and Humphreys IR

    Institute of Infection Immunity, School of Medicine/Systems Immunity University Research Institute, Cardiff University, Cardiff, United Kingdom.

    Interferon lambda (IFNλ) is a group of cytokines that belong to the IL-10 family. They exhibit antiviral activities against certain viruses during infection of the liver and mucosal tissues. Here we report that IFNλ restricts in vitro replication of the β-herpesvirus murine cytomegalovirus (mCMV). However, IFNλR1-deficient (Ifnλr1-/-) mice were not preferentially susceptible to mCMV infection in vivo during acute infection after systemic or mucosal challenge, or during virus persistence in the mucosa. Instead, our studies revealed that IFNλ influences NK cell responses during mCMV infection. Ifnλr1-/- mice exhibited defective development of conventional interferon-gamma (IFNγ)-expressing NK cells in the spleen during mCMV infection whereas accumulation of granzyme B-expressing NK cells was unaltered. In vitro, development of splenic IFNγ+ NK cells following stimulation with IL-12 or, to a lesser extent, IL-18 was abrogated by IFNλR1-deficiency. Thus, IFNλ regulates NK cell responses during mCMV infection and restricts virus replication in vitro but is redundant in the control of acute and persistent mCMV replication within mucosal and non-mucosal tissues.

    Funded by: Medical Research Council; Wellcome Trust: 207503/Z/17/Z, WT098026

    PloS one 2018;13;5;e0197596

  • scanPAV: a pipeline for extracting presence-absence variations in genome pairs.

    Giordano F, Stammnitz MR, Murchison EP and Ning Z

    The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton.

    Motivation: The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new software to evaluate assembly quality and completeness. One way to determine the completeness of an assembly is by detecting its Presence-Absence variations (PAV) with respect to a reference, where PAVs between two assemblies are defined as the sequences present in one assembly but entirely missing in the other one. Beyond assembly error or technology bias, PAVs can also reveal real genome polymorphism, consequence of species or individual evolution, or horizontal transfer from viruses and bacteria.

    Results: We present scanPAV, a pipeline for pairwise assembly comparison to identify and extract sequences present in one assembly but not the other. In this note, we use the GRCh38 reference assembly to assess the completeness of six human genome assemblies from various assembly strategies and sequencing technologies including Illumina short reads, 10× genomics linked-reads, PacBio and Oxford Nanopore long reads, and Bionano optical maps. We also discuss the PAV polymorphism of seven Tasmanian devil whole genome assemblies of normal animal tissues and devil facial tumour 1 (DFT1) and 2 (DFT2) samples, and the identification of bacterial sequences as contamination in some of the tumorous assemblies.

    Availability and implementation: The pipeline is available under the MIT License at

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 102942/Z/13/A, WT098051

    Bioinformatics (Oxford, England) 2018;34;17;3022-3024

  • Trans-ethnic association study of blood pressure determinants in over 750,000 individuals.

    Giri A, Hellwege JN, Keaton JM, Park J, Qiu C, Warren HR, Torstenson ES, Kovesdy CP, Sun YV, Wilson OD, Robinson-Cohen C, Roumie CL, Chung CP, Birdwell KA, Damrauer SM, DuVall SL, Klarin D, Cho K, Wang Y, Evangelou E, Cabrera CP, Wain LV, Shrestha R, Mautz BS, Akwo EA, Sargurupremraj M, Debette S, Boehnke M, Scott LJ, Luan J, Zhao JH, Willems SM, Thériault S, Shah N, Oldmeadow C, Almgren P, Li-Gao R, Verweij N, Boutin TS, Mangino M, Ntalla I, Feofanova E, Surendran P, Cook JP, Karthikeyan S, Lahrouchi N, Liu C, Sepúlveda N, Richardson TG, Kraja A, Amouyel P, Farrall M, Poulter NR, Understanding Society Scientific Group, International Consortium for Blood Pressure, Blood Pressure-International Consortium of Exome Chip Studies, Laakso M, Zeggini E, Sever P, Scott RA, Langenberg C, Wareham NJ, Conen D, Palmer CNA, Attia J, Chasman DI, Ridker PM, Melander O, Mook-Kanamori DO, Harst PV, Cucca F, Schlessinger D, Hayward C, Spector TD, Jarvelin MR, Hennig BJ, Timpson NJ, Wei WQ, Smith JC, Xu Y, Matheny ME, Siew EE, Lindgren C, Herzig KH, Dedoussis G, Denny JC, Psaty BM, Howson JMM, Munroe PB, Newton-Cheh C, Caulfield MJ, Elliott P, Gaziano JM, Concato J, Wilson PWF, Tsao PS, Velez Edwards DR, Susztak K, Million Veteran Program, O'Donnell CJ, Hung AM and Edwards TL

    Division of Quantitative Sciences, Department of Obstetrics & Gynecology, Vanderbilt Genetics Institute, Vanderbilt Epidemiology Center, Institute for Medicine and Public Health, Vanderbilt University Medical Center, Nashville, TN, USA.

    In this trans-ethnic multi-omic study, we reinterpret the genetic architecture of blood pressure to identify genes, tissues, phenomes and medication contexts of blood pressure homeostasis. We discovered 208 novel common blood pressure SNPs and 53 rare variants in genome-wide association studies of systolic, diastolic and pulse pressure in up to 776,078 participants from the Million Veteran Program (MVP) and collaborating studies, with analysis of the blood pressure clinical phenome in MVP. Our transcriptome-wide association study detected 4,043 blood pressure associations with genetically predicted gene expression of 840 genes in 45 tissues, and mouse renal single-cell RNA sequencing identified upregulated blood pressure genes in kidney tubule cells.

    Funded by: BLRD VA: I01 BX003340, I01 BX003360, I01 BX003362; British Heart Foundation: FS/12/82/29736; CSRD VA: I01 CX000982; Medical Research Council: G1001799, G9815508, MC_PC_15018, MC_UU_00007/10, MC_UU_12015/1, MR/L01341X/1, MR/M004422/1, MR/N01104X/1, MR/N01104X/2, MR/S003746/1, MR/S003886/1; NCI NIH HHS: P30 CA068485, T32 CA160056; NCRR NIH HHS: S10 RR025141; NEI NIH HHS: P30 EY008126; NHGRI NIH HHS: T32 HG008341; NHLBI NIH HHS: R01 HL105756, R01 HL113933, R01 HL124262, R01 HL133786, R21 HL121429, U01 HL130114, U19 HL065962; NIA NIH HHS: P30 AG010129, U01 AG052409; NIAMS NIH HHS: K23 AR064768; NICHD NIH HHS: K12 HD043483; NIDDK NIH HHS: DP3 DK108220, K01 DK109019, P30 DK020572, R01 DK076077, R01 DK087635, R01 DK105821, U01 DK062370; NIGMS NIH HHS: P50 GM115305; NIH HHS: S10 OD023680; NINDS NIH HHS: R01 NS017950, UH3 NS100605

    Nature genetics 2018;51;1;51-62

  • Mutational mechanisms of amplifications revealed by analysis of clustered rearrangements in breast cancers.

    Glodzik D, Purdie C, Rye IH, Simpson PT, Staaf J, Span PN, Russnes HG and Nik-Zainal S

    Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden.

    Background: Complex clusters of rearrangements are a challenge in interpretation of cancer genomes. Some clusters of rearrangements demarcate clear amplifications of driver oncogenes but others are less well understood. A detailed analysis of rearrangements within these complex clusters could reveal new insights into selection and underlying mutational mechanisms.

    Patients and methods: Here, we systematically investigate rearrangements that are densely clustered in individual tumours in a cohort of 560 breast cancers. Applying an agnostic approach, we identify 21 hotspots where clustered rearrangements recur across cancers.

    Results: Some hotspots coincide with known oncogene loci including CCND1, ERBB2, ZNF217, chr8:ZNF703/FGFR1, IGF1R, and MYC. Others contain cancer genes not typically associated with breast cancer: MCL1, PTP4A1, and MYB. Intriguingly, we identify clustered rearrangements that physically connect distant hotspots. In particular, we observe simultaneous amplification of chr8:ZNF703/FGFR1 and chr11:CCND1 where deep analysis reveals that a chr8-chr11 translocation is likely to be an early, critical, initiating event.

    Conclusions: We present an overview of complex rearrangements in breast cancer, highlighting a potential new way for detecting drivers and revealing novel mechanistic insights into the formation of two common amplicons.

    Funded by: Cancer Research UK: C60100/A23916; Wellcome Trust: 077012/Z/05/Z, 101126/B/13/Z

    Annals of oncology : official journal of the European Society for Medical Oncology 2018;29;11;2223-2231

  • Hydroxycarbamide Plus Aspirin Versus Aspirin Alone in Patients With Essential Thrombocythemia Age 40 to 59 Years Without High-Risk Features.

    Godfrey AL, Campbell PJ, MacLean C, Buck G, Cook J, Temple J, Wilkins BS, Wheatley K, Nangalia J, Grinfeld J, McMullin MF, Forsyth C, Kiladjian JJ, Green AR, Harrison CN, United Kingdom Medical Research Council Primary Thrombocythemia-1 Study, United Kingdom National Cancer Research Institute Myeloproliferative Neoplasms Subgroup, French Intergroup of Myeloproliferative Neoplasms and the Australasian Leukaemia and Lymphoma Group.

    Anna L. Godfrey, Jacob Grinfeld, and Anthony R. Green, Cambridge University Hospitals National Health Service (NHS) Foundation Trust; Peter J. Campbell and Jyoti Nangalia, Wellcome Trust Sanger Institute, Hinxton; Cathy MacLean, Julia Cook, Julie Temple, and Anthony R. Green, University of Cambridge; Anthony R. Green, Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge; Georgina Buck, University of Oxford, Oxford; Bridget S. Wilkins and Claire N. Harrison, Guy's and St Thomas' NHS Foundation Trust, London; Keith Wheatley, University of Birmingham, Birmingham; Mary Frances McMullin, Queen's University Belfast, Belfast, United Kingdom; Cecily Forsyth, Gosford Hospital, Gosford, and Australasian Leukaemia and Lymphoma Group, Australia; and Jean-Jacques Kiladjian, Hôpital Saint-Louis, Paris, France.

    Purpose: Cytoreductive therapy is beneficial in patients with essential thrombocythemia (ET) at high risk of thrombosis. However, its value in those lacking high-risk features remains unknown. This open-label, randomized trial compared hydroxycarbamide plus aspirin with aspirin alone in patients with ET age 40 to 59 years and without high-risk factors or extreme thrombocytosis. Patients and Methods: Patients were age 40 to 59 years and lacked a history of ischemia, thrombosis, embolism, hemorrhage, extreme thrombocytosis (platelet count ≥ 1,500 × 10⁹/L), hypertension, or diabetes requiring therapy. In all, 382 patients were randomly assigned 1:1 to hydroxycarbamide plus aspirin or aspirin alone. The composite primary end point was time to arterial or venous thrombosis, serious hemorrhage, or death from vascular causes. Secondary end points were time to first arterial or venous thrombosis, first serious hemorrhage, death, incidence of transformation, and patient-reported quality of life. Results: After a median follow-up of 73 months and a total follow-up of 2,373 patient-years, there was no significant difference between the arms in the likelihood of patients reaching the primary end point (hazard ratio, 0.98; 95% CI, 0.42 to 2.25; P = 1.0). The incidence of significant vascular events was low, at 0.93 per 100 patient-years (95% CI, 0.61 to 1.41). There were also no differences in overall survival; in the composite end point of transformation to myelofibrosis, acute myeloid leukemia, or myelodysplasia; in adverse events; or in patient-reported quality of life. Conclusion: In patients with ET age 40 to 59 years and lacking high-risk factors for thrombosis or extreme thrombocytosis, preemptive addition of hydroxycarbamide to aspirin did not reduce vascular events, myelofibrotic transformation, or leukemic transformation. Patients age 40 to 59 years without other clinical indications for treatment (such as previous thrombosis or hemorrhage) who have a platelet count < 1,500 × 10⁹/L should not receive cytoreductive therapy.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2018;JCO2018788414

  • Polycythaemia Vera, Essential Thrombocythaemia and Myelofibrosis

    Godfrey, Anna L., Vassiliou, George S. and Green, Anthony R.

    ABC of Clinical Haematology 2018;21

  • Antimicrobial resistance prediction and phylogenetic analysis of Neisseria gonorrhoeae isolates using the Oxford Nanopore MinION sequencer.

    Golparian D, Donà V, Sánchez-Busó L, Foerster S, Harris S, Endimiani A, Low N and Unemo M

    WHO Collaborating Centre for Gonorrhoea and other Sexually Transmitted Infections, Department of Laboratory Medicine, Clinical Microbiology, Faculty of Medicine and Health, Örebro University, Örebro, Sweden.

    Antimicrobial resistance (AMR) in Neisseria gonorrhoeae is common, compromising gonorrhoea treatment internationally. Rapid characterisation of AMR strains could ensure appropriate and personalised treatment, and support identification and investigation of gonorrhoea outbreaks in nearly real-time. Whole-genome sequencing is ideal for investigation of emergence and dissemination of AMR determinants, predicting AMR, in the gonococcal population and spread of AMR strains in the human population. The novel, rapid and revolutionary long-read sequencer MinION is a small hand-held device that generates bacterial genomes within one day. However, accuracy of MinION reads has been suboptimal for many objectives and the MinION has not been evaluated for gonococci. In this first MinION study for gonococci, we show that MinION-derived sequences analysed with existing open-access, web-based sequence analysis tools are not sufficiently accurate to identify key gonococcal AMR determinants. Nevertheless, using an in-house-developed CLC Genomics Workbench including de novo assembly and optimised BLAST algorithms, we show that 2D ONT-derived sequences can be used for accurate prediction of decreased susceptibility or resistance to recommended antimicrobials in gonococcal isolates. We also show that the 2D ONT-derived sequences are useful for rapid phylogenomic-based molecular epidemiological investigations, and, in hybrid assemblies with Illumina sequences, for producing contiguous assemblies and finished reference genomes.

    Funded by: Wellcome Trust: 098051

    Scientific reports 2018;8;1;17596

  • Chromosomal evolution and phylogeny in the Nullicauda group (Chiroptera, Phyllostomidae): evidence from multidirectional chromosome painting.

    Gomes AJB, Nagamachi CY, Rodrigues LRR, Ferguson-Smith MA, Yang F, O'Brien PCM and Pieczarka JC

    Laboratório de Citogenética, CEABIO, ICB, Universidade Federal do Pará, Av. Bernardo Sayão, sn. Guamá, Belém, Pará, 66075-900, Brazil.

    Background: The family Phyllostomidae (Chiroptera) shows wide morphological, molecular and cytogenetic variation; many disagreements regarding its phylogeny and taxonomy remain to be resolved. In this study, we use chromosome painting with whole chromosome probes from the Phyllostomidae Phyllostomus hastatus and Carollia brevicauda to determine the rearrangements among several genera of the Nullicauda group (subfamilies Gliphonycterinae, Carolliinae, Rhinophyllinae and Stenodermatinae).

    Results: These data, when compared with previously published chromosome homology maps, allow the construction of a phylogeny comparable to those previously obtained by morphological and molecular analysis. Our phylogeny is largely in agreement with that proposed with molecular data, both on relationships between the subfamilies and among genera; it confirms, for instance, that Carollia and Rhinophylla, previously considered as part of the same subfamily are, in fact, distant genera.

    Conclusions: The occurrence of the karyotype considered ancestral for this family in several different branches suggests that the diversification of Phyllostomidae into many subfamilies has occurred in a short period of time. Finally, the comparison with published maps using human whole chromosome probes allows us to track some syntenic associations prior to the emergence of this family.

    Funded by: Banco Nacional de Desenvolvimento Economico e Social: 2.318.697.0001; Conselho Nacional de Desenvolvimento Científico e Tecnológico: 308401/2013-1, 308428/2013-7; Conselho Nacional de Desenvolvimento Científico e Tecnológico (BR): 552032/2010-7; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior: 047/2012; Fundação Amazônia Paraense de Amparo à Pesquisa: 064/2008, 064/2011

    BMC evolutionary biology 2018;18;1;62

  • Gene-level association analysis of systemic sclerosis: A comparison of African-Americans and White populations.

    Gorlova OY, Li Y, Gorlov I, Ying J, Chen WV, Assassi S, Reveille JD, Arnett FC, Zhou X, Bossini-Castillo L, Lopez-Isac E, Acosta-Herrera M, Gregersen PK, Lee AT, Steen VD, Fessler BJ, Khanna D, Schiopu E, Silver RM, Molitor JA, Furst DE, Kafaja S, Simms RW, Lafyatis RA, Carreira P, Simeon CP, Castellvi I, Beltran E, Ortego N, Amos CI, Martin J and Mayes MD

    Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Lebanon, NH, United States of America.

    Gene-level analysis of ImmunoChip or genome-wide association studies (GWAS) data has not been previously reported for systemic sclerosis (SSc, scleroderma). The objective of this study was to analyze genetic susceptibility loci in SSc at the gene level and to determine if the detected associations were shared in African-American and White populations, using data from ImmunoChip and GWAS genotyping studies. The White sample included 1833 cases and 3466 controls (956 cases and 2741 controls from the US and 877 cases and 725 controls from Spain) and the African American sample, 291 cases and 260 controls. In both Whites and African Americans, we performed a gene-level analysis that integrates association statistics in a gene possibly harboring multiple SNPs with weak effect on disease risk, using Versatile Gene-based Association Study (VEGAS) software. The SNP-level analysis was performed using PLINK v.1.07. We identified 4 novel candidate genes (STAT1, FCGR2C, NIPSNAP3B, and SCT) significantly associated and 4 genes (SERBP1, PINX1, TMEM175 and EXOC2) suggestively associated with SSc in the gene level analysis in White patients. As an exploratory analysis we compared the results on Whites with those from African Americans. Of previously established susceptibility genes identified in Whites, only TNFAIP3 was significant at the nominal level (p = 6.13 × 10⁻³) in African Americans in the gene-level analysis of the ImmunoChip data. Among the top suggestive novel genes identified in Whites based on the ImmunoChip data, FCGR2C and PINX1 were only nominally significant in African Americans (p = 0.016 and p = 0.028, respectively), while among the top novel genes identified in the gene-level analysis in African Americans, UNC5C (p = 5.57 × 10⁻⁴) and CLEC16A (p = 0.0463) were also nominally significant in Whites. We also present the gene-level analysis of SSc clinical and autoantibody phenotypes among Whites. Our findings need to be validated by independent studies, particularly due to the limited sample size of African Americans.

    Funded by: NCI NIH HHS: P30 CA023108; NIH HHS: N01-AR-02251, P50-AR054144, R01-AR-055258

    PloS one 2018;13;1;e0189498

  • Antimicrobial-Resistant Klebsiella pneumoniae Carriage and Infection in Specialized Geriatric Care Wards Linked to Acquisition in the Referring Hospital.

    Gorrie CL, Mirceta M, Wick RR, Judd LM, Wyres KL, Thomson NR, Strugnell RA, Pratt NF, Garlick JS, Watson KM, Hunter PC, McGloughlin SA, Spelman DW, Jenney AWJ and Holt KE

    Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, Melbourne, Victoria, Australia.

    Background: Klebsiella pneumoniae is a leading cause of extended-spectrum β-lactamase (ESBL)-producing hospital-associated infections, for which elderly patients are at increased risk.

    Methods: We conducted a 1-year prospective cohort study, in which a third of patients admitted to 2 geriatric wards in a specialized hospital were recruited and screened for carriage of K. pneumoniae by microbiological culture. Clinical isolates were monitored via the hospital laboratory. Colonizing and clinical isolates were subjected to whole-genome sequencing and antimicrobial susceptibility testing.

    Results: K. pneumoniae throat carriage prevalence was 4.1%, rectal carriage 10.8%, and ESBL carriage 1.7%, and the incidence of K. pneumoniae infection was 1.2%. The isolates were diverse, and most patients were colonized or infected with a unique phylogenetic lineage, with no evidence of transmission in the wards. ESBL strains carried blaCTX-M-15 and belonged to clones associated with hospital-acquired ESBL infections in other countries (sequence type [ST] 29, ST323, and ST340). One also carried the carbapenemase blaIMP-26. Genomic and epidemiological data provided evidence that ESBL strains were acquired in the referring hospital. Nanopore sequencing also identified strain-to-strain transmission of a blaCTX-M-15 FIBK/FIIK plasmid in the referring hospital.

    Conclusions: The data suggest the major source of K. pneumoniae was the patient's own gut microbiome, but ESBL strains were acquired in the referring hospital. This highlights the importance of the wider hospital network to understanding K. pneumoniae risk and infection prevention. Rectal screening for ESBL organisms on admission to geriatric wards could help inform patient management and infection control in such facilities.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2018;67;2;161-170

  • Genomic Surveillance of Enterococcus faecium Reveals Limited Sharing of Strains and Resistance Genes between Livestock and Humans in the United Kingdom.

    Gouliouris T, Raven KE, Ludden C, Blane B, Corander J, Horner CS, Hernandez-Garcia J, Wood P, Hadjirin NF, Radakovic M, Holmes MA, de Goffau M, Brown NM, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge, United Kingdom.

    Vancomycin-resistant Enterococcus faecium (VREfm) is a major cause of nosocomial infection and is categorized as high priority by the World Health Organization global priority list of antibiotic-resistant bacteria. In the past, livestock have been proposed as a putative reservoir for drug-resistant E. faecium strains that infect humans, and isolates of the same lineage have been found in both reservoirs. We undertook cross-sectional surveys to isolate E. faecium (including VREfm) from livestock farms, retail meat, and wastewater treatment plants in the United Kingdom. More than 600 isolates from these sources were sequenced, and their relatedness and antibiotic resistance genes were compared with genomes of almost 800 E. faecium isolates from patients with bloodstream infection in the United Kingdom and Ireland. E. faecium was isolated from 28/29 farms; none of these isolates were VREfm, suggesting a decrease in VREfm prevalence since the last UK livestock survey in 2003. However, VREfm was isolated from 1% to 2% of retail meat products and was ubiquitous in wastewater treatment plants. Phylogenetic comparison demonstrated that the majority of human and livestock-related isolates were genetically distinct, although pig isolates from three farms were more genetically related to human isolates from 2001 to 2004 (minimum of 50 single-nucleotide polymorphisms [SNPs]). Analysis of accessory (variable) genes added further evidence for distinct niche adaptation. An analysis of acquired antibiotic resistance genes and their variants revealed limited sharing between humans and livestock. Our findings indicate that the majority of E. faecium strains infecting patients are largely distinct from those from livestock in this setting, with limited sharing of strains and resistance genes. IMPORTANCE: The rise in rates of human infection caused by vancomycin-resistant Enterococcus faecium (VREfm) strains between 1988 and the 2000s in Europe was suggested to be associated with acquisition from livestock. As a result, the European Union banned the use of the glycopeptide drug avoparcin as a growth promoter in livestock feed. While some studies reported a decrease in VREfm in livestock, others reported no reduction. Here, we report the first livestock VREfm prevalence survey in the UK since 2003 and the first large-scale study using whole-genome sequencing to investigate the relationship between E. faecium strains in livestock and humans. We found a low prevalence of VREfm in retail meat and limited evidence for recent sharing of strains between livestock and humans with bloodstream infection. There was evidence for limited sharing of genes encoding antibiotic resistance between these reservoirs, a finding which requires further research.

    Funded by: Department of Health: HICF-T5-342; Wellcome Trust: 098051, 103387/Z/13/Z, 110243/Z/15/Z, WT098600

    mBio 2018;9;6

  • UTX-mediated enhancer and chromatin remodeling suppresses myeloid leukemogenesis through noncatalytic inverse regulation of ETS and GATA programs.

    Gozdecka M, Meduri E, Mazan M, Tzelepis K, Dudek M, Knights AJ, Pardo M, Yu L, Choudhary JS, Metzakopian E, Iyer V, Yun H, Park N, Varela I, Bautista R, Collord G, Dovey O, Garyfallos DA, De Braekeleer E, Kondo S, Cooper J, Göttgens B, Bullinger L, Northcott PA, Adams D, Vassiliou GS and Huntly BJP

    Haematological Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.

    The histone H3 Lys27-specific demethylase UTX (or KDM6A) is targeted by loss-of-function mutations in multiple cancers. Here, we demonstrate that UTX suppresses myeloid leukemogenesis through noncatalytic functions, a property shared with its catalytically inactive Y-chromosome paralog, UTY (or KDM6C). In keeping with this, we demonstrate concomitant loss/mutation of KDM6A (UTX) and UTY in multiple human cancers. Mechanistically, global genomic profiling showed only minor changes in H3K27me3 but significant and bidirectional alterations in H3K27ac and chromatin accessibility; a predominant loss of H3K4me1 modifications; alterations in ETS and GATA-factor binding; and altered gene expression after Utx loss. By integrating proteomic and genomic analyses, we link these changes to UTX regulation of ATP-dependent chromatin remodeling, coordination of the COMPASS complex and enhanced pioneering activity of ETS factors during evolution to AML. Collectively, our findings identify a dual role for UTX in suppressing acute myeloid leukemia via repression of oncogenic ETS and upregulation of tumor-suppressive GATA programs.

    Funded by: Medical Research Council: MC_PC_12009, MR/M010392/1, MR/R009708/1

    Nature genetics 2018;50;6;883-894

  • Identification of rare sequence variation underlying heritable pulmonary arterial hypertension.

    Gräf S, Haimel M, Bleda M, Hadinnapola C, Southgate L, Li W, Hodgson J, Liu B, Salmon RM, Southwood M, Machado RD, Martin JM, Treacy CM, Yates K, Daugherty LC, Shamardina O, Whitehorn D, Holden S, Aldred M, Bogaard HJ, Church C, Coghlan G, Condliffe R, Corris PA, Danesino C, Eyries M, Gall H, Ghio S, Ghofrani HA, Gibbs JSR, Girerd B, Houweling AC, Howard L, Humbert M, Kiely DG, Kovacs G, MacKenzie Ross RV, Moledina S, Montani D, Newnham M, Olschewski A, Olschewski H, Peacock AJ, Pepke-Zaba J, Prokopenko I, Rhodes CJ, Scelsi L, Seeger W, Soubrier F, Stein DF, Suntharalingam J, Swietlik EM, Toshner MR, van Heel DA, Vonk Noordegraaf A, Waisfisz Q, Wharton J, Wort SJ, Ouwehand WH, Soranzo N, Lawrie A, Upton PD, Wilkins MR, Trembath RC and Morrell NW

    Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, United Kingdom.

    Pulmonary arterial hypertension (PAH) is a rare disorder with a poor prognosis. Deleterious variation within components of the transforming growth factor-β pathway, particularly the bone morphogenetic protein type 2 receptor (BMPR2), underlies most heritable forms of PAH. To identify the missing heritability we perform whole-genome sequencing in 1038 PAH index cases and 6385 PAH-negative control subjects. Case-control analyses reveal significant overrepresentation of rare variants in ATP13A3, AQP1 and SOX17, and provide independent validation of a critical role for GDF2 in PAH. We demonstrate familial segregation of mutations in SOX17 and AQP1 with PAH. Mutations in GDF2, encoding a BMPR2 ligand, lead to reduced secretion from transfected cells. In addition, we identify pathogenic mutations in the majority of previously reported PAH genes, and provide evidence for further putative genes. Taken together these findings contribute new insights into the molecular basis of PAH and indicate unexplored pathways for therapeutic intervention.

    Funded by: British Heart Foundation: FS/13/48/30453, FS/15/59/31839, PG/12/54/29734, PG/15/39/31519, PG/17/1/32532, PG/17/58/33134, RG/13/4/30107, SP/12/12/29836; Medical Research Council: MR/K020919/1; NHLBI NIH HHS: R01 HL098199

    Nature communications 2018;9;1;1416

  • Loss-of-function variants in ADCY3 increase risk of obesity and type 2 diabetes.

    Grarup N, Moltke I, Andersen MK, Dalby M, Vitting-Seerup K, Kern T, Mahendran Y, Jørsboe E, Larsen CVL, Dahl-Petersen IK, Gilly A, Suveges D, Dedoussis G, Zeggini E, Pedersen O, Andersson R, Bjerregaard P, Jørgensen ME, Albrechtsen A and Hansen T

    Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

    We have identified a variant in ADCY3 (encoding adenylate cyclase 3) associated with markedly increased risk of obesity and type 2 diabetes in the Greenlandic population. The variant disrupts a splice acceptor site, and carriers have decreased ADCY3 RNA expression. Additionally, we observe an enrichment of rare ADCY3 loss-of-function variants among individuals with type 2 diabetes in trans-ancestry cohorts. These findings provide new information on disease etiology relevant for future treatment strategies.

    Funded by: European Research Council: 638273

    Nature genetics 2018;50;2;172-174

  • Dynamics of Transcription Regulation in Human Bone Marrow Myeloid Differentiation to Mature Blood Neutrophils.

    Grassi L, Pourfarzad F, Ullrich S, Merkel A, Were F, Carrillo-de-Santa-Pau E, Yi G, Hiemstra IH, Tool ATJ, Mul E, Perner J, Janssen-Megens E, Berentsen K, Kerstens H, Habibi E, Gut M, Yaspo ML, Linser M, Lowy E, Datta A, Clarke L, Flicek P, Vingron M, Roos D, van den Berg TK, Heath S, Rico D, Frontini M, Kostadima M, Gut I, Valencia A, Ouwehand WH, Stunnenberg HG, Martens JHA and Kuijpers TW

    Department of Haematology, University of Cambridge, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK.

    Neutrophils are short-lived blood cells that play a critical role in host defense against infections. To better comprehend neutrophil functions and their regulation, we provide a complete epigenetic overview, assessing important functional features of their differentiation stages from bone marrow-residing progenitors to mature circulating cells. Integration of chromatin modifications, methylation, and transcriptome dynamics reveals an enforced regulation of differentiation, for cellular functions such as release of proteases, respiratory burst, cell cycle regulation, and apoptosis. We observe an early establishment of the cytotoxic capability, while the signaling components that activate these antimicrobial mechanisms are transcribed at later stages, outside the bone marrow, thus preventing toxic effects in the bone marrow niche. Altogether, these data reveal how the developmental dynamics of the chromatin landscape orchestrate the daily production of a large number of neutrophils required for innate host defense and provide a comprehensive overview of differentiating human neutrophils.

    Funded by: Wellcome Trust: 108749

    Cell reports 2018;24;10;2784-2794

  • De Novo Variants in the F-Box Protein FBXO11 in 20 Individuals with a Variable Neurodevelopmental Disorder.

    Gregor A, Sadleir LG, Asadollahi R, Azzarello-Burri S, Battaglia A, Ousager LB, Boonsawat P, Bruel AL, Buchert R, Calpena E, Cogné B, Dallapiccola B, Distelmaier F, Elmslie F, Faivre L, Haack TB, Harrison V, Henderson A, Hunt D, Isidor B, Joset P, Kumada S, Lachmeijer AMA, Lees M, Lynch SA, Martinez F, Matsumoto N, McDougall C, Mefford HC, Miyake N, Myers CT, Moutton S, Nesbitt A, Novelli A, Orellana C, Rauch A, Rosello M, Saida K, Santani AB, Sarkar A, Scheffer IE, Shinawi M, Steindl K, Symonds JD, Zackai EH, University of Washington Center for Mendelian Genomics, DDD Study, Reis A, Sticht H and Zweier C

    Institute of Human Genetics, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany.

    Next-generation sequencing combined with international data sharing has enormously facilitated identification of new disease-associated genes and mutations. This is particularly true for genetically extremely heterogeneous entities such as neurodevelopmental disorders (NDDs). Through exome sequencing and world-wide collaborations, we identified and assembled 20 individuals with de novo variants in FBXO11. They present with mild to severe developmental delay associated with a range of features including short (4/20) or tall (2/20) stature, obesity (5/20), microcephaly (4/19) or macrocephaly (2/19), behavioral problems (17/20), seizures (5/20), cleft lip or palate or bifid uvula (3/20), and minor skeletal anomalies. FBXO11 encodes a member of the F-Box protein family, constituting a subunit of an E3-ubiquitin ligase complex. This complex is involved in ubiquitination and proteasomal degradation and thus in controlling critical biological processes by regulating protein turnover. The identified de novo aberrations comprise two large deletions, ten likely gene disrupting variants, and eight missense variants distributed throughout FBXO11. Structural modeling for missense variants located in the CASH or the Zinc-finger UBR domains suggests destabilization of the protein. This, in combination with the observed spectrum and localization of identified variants and the lack of apparent genotype-phenotype correlations, is compatible with loss of function or haploinsufficiency as an underlying mechanism. We implicate de novo missense and likely gene disrupting variants in FBXO11 in a neurodevelopmental disorder with variable intellectual disability and various other features.

    Funded by: NHGRI NIH HHS: U54 HG006493, UM1 HG006493; NICHD NIH HHS: U54 HD083091, U54 HD087011; NINDS NIH HHS: R01 NS069605

    American journal of human genetics 2018;103;2;305-316

  • Detection and removal of barcode swapping in single-cell RNA-seq data.

    Griffiths JA, Richard AC, Bach K, Lun ATL and Marioni JC

    Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom.

    Barcode swapping results in the mislabelling of sequencing reads between multiplexed samples on patterned flow-cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays; however, the severity and consequences of barcode swapping remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in two plate-based single-cell RNA-sequencing datasets. We found that approximately 2.5% of reads were mislabelled between samples on the HiSeq 4000, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA-sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10x Genomics experiments, allowing the continued use of cutting-edge sequencing machines for these assays.

    Funded by: Cancer Research UK (CRUK): A17197; Wellcome Trust: 109081

    Nature communications 2018;9;1;2667

  • Using single-cell genomics to understand developmental processes and cell fate decisions.

    Griffiths JA, Scialdone A and Marioni JC

    Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.

    High-throughput -omics techniques have revolutionised biology, allowing for thorough and unbiased characterisation of the molecular states of biological systems. However, cellular decision-making is inherently a unicellular process to which "bulk" -omics techniques are poorly suited, as they capture ensemble averages of cell states. Recently developed single-cell methods bridge this gap, allowing high-throughput molecular surveys of individual cells. In this review, we cover core concepts of analysis of single-cell gene expression data and highlight areas of developmental biology where single-cell techniques have made important contributions. These include understanding of cell-to-cell heterogeneity, the tracing of differentiation pathways, quantification of gene expression from specific alleles, and the future directions of cell lineage tracing and spatial gene expression analysis.

    Funded by: Cancer Research UK: A17197; Wellcome Trust: 105031/B/14/Z, 109081/Z/15/A

    Molecular systems biology 2018;14;4;e8046

  • Classification and Personalized Prognosis in Myeloproliferative Neoplasms.

    Grinfeld J, Nangalia J, Baxter EJ, Wedge DC, Angelopoulos N, Cantrill R, Godfrey AL, Papaemmanuil E, Gundem G, MacLean C, Cook J, O'Neil L, O'Meara S, Teague JW, Butler AP, Massie CE, Williams N, Nice FL, Andersen CL, Hasselbalch HC, Guglielmelli P, McMullin MF, Vannucchi AM, Harrison CN, Gerstung M, Green AR and Campbell PJ

    From the Wellcome-MRC Cambridge Stem Cell Institute and Cambridge Institute for Medical Research (J.G., C.E.M., F.L.N., A.R.G., P.J.C.), the Department of Haematology, University of Cambridge (J.G., E.J.B., C.M., J.C., C.E.M., F.L.N., A.R.G.), and the Department of Haematology, Cambridge University Hospitals NHS Foundation Trust (J.G., E.J.B., A.L.G., C.M., J.C., A.R.G.), Cambridge, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus (J.N., D.C.W., N.A., E.P., G.G., L.O., S.O., J.W.T., A.P.B., N.W., P.J.C.), and the European Molecular Biology Laboratory, European Bioinformatics Institute (R.C., M.G.), Hinxton, Big Data Institute, University of Oxford, Oxford (D.C.W.), the Department of Haematology, Queen's University Belfast, Belfast (M.F.M.), and the Department of Haematology, Guy's and St. Thomas' NHS Foundation Trust, London (C.N.H.) - all in the United Kingdom; the Center for Molecular Oncology and the Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York (E.P., G.G.); the Department of Hematology, Zealand University Hospital, Roskilde, and the University of Copenhagen, Copenhagen (C.L.A., H.C.H.); and the Department of Experimental and Clinical Medicine, Center of Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy (P.G., A.M.V.).

    Background: Myeloproliferative neoplasms, such as polycythemia vera, essential thrombocythemia, and myelofibrosis, are chronic hematologic cancers with varied progression rates. The genomic characterization of patients with myeloproliferative neoplasms offers the potential for personalized diagnosis, risk stratification, and treatment.

    Methods: We sequenced coding exons from 69 myeloid cancer genes in patients with myeloproliferative neoplasms, comprehensively annotating driver mutations and copy-number changes. We developed a genomic classification for myeloproliferative neoplasms and multistage prognostic models for predicting outcomes in individual patients. Classification and prognostic models were validated in an external cohort.

    Results: A total of 2035 patients were included in the analysis. A total of 33 genes had driver mutations in at least 5 patients, with mutations in JAK2, CALR, or MPL being the sole abnormality in 45% of the patients. The numbers of driver mutations increased with age and advanced disease. Driver mutations, germline polymorphisms, and demographic variables independently predicted whether patients received a diagnosis of essential thrombocythemia as compared with polycythemia vera or a diagnosis of chronic-phase disease as compared with myelofibrosis. We defined eight genomic subgroups that showed distinct clinical phenotypes, including blood counts, risk of leukemic transformation, and event-free survival. Integrating 63 clinical and genomic variables, we created prognostic models capable of generating personally tailored predictions of clinical outcomes in patients with chronic-phase myeloproliferative neoplasms and myelofibrosis. The predicted and observed outcomes correlated well in internal cross-validation of a training cohort and in an independent external cohort. Even within individual categories of existing prognostic schemas, our models substantially improved predictive accuracy.

    Conclusions: Comprehensive genomic characterization identified distinct genetic subgroups and provided a classification of myeloproliferative neoplasms on the basis of causal biologic mechanisms. Integration of genomic data with clinical variables enabled the personalized predictions of patients' outcomes and may support the treatment of patients with myeloproliferative neoplasms. (Funded by the Wellcome Trust and others.)

    Funded by: Medical Research Council: MC_PC_12009; NCI NIH HHS: P30 CA008748; Wellcome Trust: 203151

    The New England journal of medicine 2018;379;15;1416-1430

  • FusC, a member of the M16 protease family acquired by bacteria for iron piracy against plants.

    Grinter R, Hay ID, Song J, Wang J, Teng D, Dhanesakaran V, Wilksch JJ, Davies MR, Littler D, Beckham SA, Henderson IR, Strugnell RA, Dougan G and Lithgow T

    Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, Australia.

    Iron is essential for life. Accessing iron from the environment can be a limiting factor that determines success in a given environmental niche. For bacteria, access of chelated iron from the environment is often mediated by TonB-dependent transporters (TBDTs), which are β-barrel proteins that form sophisticated channels in the outer membrane. Reports of iron-bearing proteins being used as a source of iron indicate specific protein import reactions across the bacterial outer membrane. The molecular mechanism by which a folded protein can be imported in this way had remained mysterious, as did the evolutionary process that could lead to such a protein import pathway. How does the bacterium evolve the specificity factors that would be required to select and import a protein encoded on another organism's genome? We describe here a model whereby the plant iron-bearing protein ferredoxin can be imported across the outer membrane of the plant pathogen Pectobacterium by means of a Brownian ratchet mechanism, thereby liberating iron into the bacterium to enable its growth in plant tissues. This import pathway is facilitated by FusC, a member of the same protein family as the mitochondrial processing peptidase (MPP). The Brownian ratchet depends on binding sites discovered in crystal structures of FusC that engage a linear segment of the plant protein ferredoxin. Sequence relationships suggest that the bacterial gene encoding FusC has previously unappreciated homologues in plants and that the protein import mechanism employed by the bacterium is an evolutionary echo of the protein import pathway in plant mitochondria and plastids.

    Funded by: Wellcome Trust: 106077/Z/14/Z

    PLoS biology 2018;16;8;e2006026

  • Cryo-EM structure of an essential Plasmodium vivax invasion complex.

    Gruszczyk J, Huang RK, Chan LJ, Menant S, Hong C, Murphy JM, Mok YF, Griffin MDW, Pearson RD, Wong W, Cowman AF, Yu Z and Tham WH

    The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.

    Plasmodium vivax is the most widely distributed malaria parasite that infects humans. P. vivax invades reticulocytes exclusively, and successful entry depends on specific interactions between the P. vivax reticulocyte-binding protein 2b (PvRBP2b) and transferrin receptor 1 (TfR1). TfR1-deficient erythroid cells are refractory to invasion by P. vivax, and anti-PvRBP2b monoclonal antibodies inhibit reticulocyte binding and block P. vivax invasion in field isolates. Here we report a high-resolution cryo-electron microscopy structure of a ternary complex of PvRBP2b bound to human TfR1 and transferrin, at 3.7 Å resolution. Mutational analyses show that PvRBP2b residues involved in complex formation are conserved; this suggests that antigens could be designed that act across P. vivax strains. Functional analyses of TfR1 highlight how P. vivax hijacks TfR1, an essential housekeeping protein, by binding to sites that govern host specificity, without affecting its cellular function of transporting iron. Crystal and solution structures of PvRBP2b in complex with antibody fragments characterize the inhibitory epitopes. Our results establish a structural framework for understanding how P. vivax reticulocyte-binding protein engages its receptor and the molecular mechanism of inhibitory monoclonal antibodies, providing important information for the design of novel vaccine candidates.

    Funded by: Wellcome Trust: 090770, 208693

    Nature 2018;559;7712;135-139

  • Transferrin receptor 1 is a reticulocyte-specific receptor for Plasmodium vivax.

    Gruszczyk J, Kanjee U, Chan LJ, Menant S, Malleret B, Lim NTY, Schmidt CQ, Mok YF, Lin KM, Pearson RD, Rangel G, Smith BJ, Call MJ, Weekes MP, Griffin MDW, Murphy JM, Abraham J, Sriprawat K, Menezes MJ, Ferreira MU, Russell B, Renia L, Duraisingh MT and Tham WH

    The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia.

    Plasmodium vivax shows a strict host tropism for reticulocytes. We identified transferrin receptor 1 (TfR1) as the receptor for P. vivax reticulocyte-binding protein 2b (PvRBP2b). We determined the structure of the N-terminal domain of PvRBP2b involved in red blood cell binding, elucidating the molecular basis for TfR1 recognition. We validated TfR1 as the biological target of PvRBP2b engagement by means of TfR1 expression knockdown analysis. TfR1 mutant cells deficient in PvRBP2b binding were refractory to invasion of P. vivax but not to invasion of P. falciparum. Using Brazilian and Thai clinical isolates, we show that PvRBP2b monoclonal antibodies that inhibit reticulocyte binding also block P. vivax entry into reticulocytes. These data show that the TfR1-PvRBP2b invasion pathway is critical for the recognition of reticulocytes during P. vivax invasion.

    Funded by: CIHR; Howard Hughes Medical Institute; Wellcome Trust: 090770, 098051, 108070, 206194, 208693

    Science (New York, N.Y.) 2018;359;6371;48-55

  • Mutations in Vps15 perturb neuronal migration in mice and are associated with neurodevelopmental disease in humans.

    Gstrein T, Edwards A, Přistoupilová A, Leca I, Breuss M, Pilat-Carotta S, Hansen AH, Tripathy R, Traunbauer AK, Hochstoeger T, Rosoklija G, Repic M, Landler L, Stránecký V, Dürnberger G, Keane TM, Zuber J, Adams DJ, Flint J, Honzik T, Gut M, Beltran S, Mechtler K, Sherr E, Kmoch S, Gut I and Keays DA

    Institute of Molecular Pathology (IMP), Vienna Biocentre (VBC), Vienna, Austria.

    The formation of the vertebrate brain requires the generation, migration, differentiation and survival of neurons. Genetic mutations that perturb these critical cellular events can result in malformations of the telencephalon, providing a molecular window into brain development. Here we report the identification of an N-ethyl-N-nitrosourea-induced mouse mutant characterized by a fractured hippocampal pyramidal cell layer, attributable to defects in neuronal migration. We show that this is caused by a hypomorphic mutation in Vps15 that perturbs endosomal-lysosomal trafficking and autophagy, resulting in an upregulation of Nischarin, which inhibits Pak1 signaling. The complete ablation of Vps15 results in the accumulation of autophagic substrates, the induction of apoptosis and severe cortical atrophy. Finally, we report that mutations in VPS15 are associated with cortical atrophy and epilepsy in humans. These data highlight the importance of the Vps15-Vps34 complex and the Nischarin-Pak1 signaling hub in the development of the telencephalon.

    Funded by: NINDS NIH HHS: R01 NS058721

    Nature neuroscience 2018;21;2;207-217

  • KIAA1109 Variants Are Associated with a Severe Disorder of Brain Development and Arthrogryposis.

    Gueneau L, Fish RJ, Shamseldin HE, Voisin N, Tran Mau-Them F, Preiksaitiene E, Monroe GR, Lai A, Putoux A, Allias F, Ambusaidi Q, Ambrozaityte L, Cimbalistienė L, Delafontaine J, Guex N, Hashem M, Kurdi W, Jamuar SS, Ying LJ, Bonnard C, Pippucci T, Pradervand S, Roechert B, van Hasselt PM, Wiederkehr M, Wright CF, DDD Study, Xenarios I, van Haaften G, Shaw-Smith C, Schindewolf EM, Neerman-Arbez M, Sanlaville D, Lesca G, Guibaud L, Reversade B, Chelly J, Kučinskas V, Alkuraya FS and Reymond A

    Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.

    Whole-exome and targeted sequencing of 13 individuals from 10 unrelated families with overlapping clinical manifestations identified loss-of-function and missense variants in KIAA1109 allowing delineation of an autosomal-recessive multi-system syndrome, which we suggest to name Alkuraya-Kučinskas syndrome (MIM 617822). Shared phenotypic features representing the cardinal characteristics of this syndrome combine brain atrophy with clubfoot and arthrogryposis. Affected individuals present with cerebral parenchymal underdevelopment, ranging from major cerebral parenchymal thinning with lissencephalic aspect to moderate parenchymal rarefaction, severe to mild ventriculomegaly, cerebellar hypoplasia with brainstem dysgenesis, and cardiac and ophthalmologic anomalies, such as microphthalmia and cataract. Severe loss-of-function cases were incompatible with life, whereas those individuals with milder missense variants presented with severe global developmental delay, syndactyly of 2nd and 3rd toes, and severe muscle hypotonia resulting in incapacity to stand without support. Consistent with a causative role for KIAA1109 loss-of-function/hypomorphic variants in this syndrome, knockdowns of the zebrafish orthologous gene resulted in embryos with hydrocephaly and abnormally curved notochords and overall body shape, whereas published knockouts of the fruit fly and mouse orthologous genes resulted in lethality or severe neurological defects reminiscent of the probands' features.

    Funded by: Wellcome Trust

    American journal of human genetics 2018;102;1;116-132

  • The opium poppy genome and morphinan production.

    Guo L, Winzer T, Yang X, Li Y, Ning Z, He Z, Teodor R, Lu Y, Bowser TA, Graham IA and Ye K

    MOE Key Lab for Intelligent Networks and Networks Security, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049 China.

    Morphinan-based painkillers are derived from opium poppy (Papaver somniferum L.). We report a draft of the opium poppy genome, with 2.72 gigabases assembled into 11 chromosomes with contig N50 and scaffold N50 of 1.77 and 204 megabases, respectively. Synteny analysis suggests a whole-genome duplication at ~7.8 million years ago and ancient segmental or whole-genome duplication(s) that occurred before the Papaveraceae-Ranunculaceae divergence 110 million years ago. Syntenic blocks representative of phthalideisoquinoline and morphinan components of a benzylisoquinoline alkaloid cluster of 15 genes provide insight into how this cluster evolved. Paralog analysis identified P450 and oxidoreductase genes that combined to form the STORR gene fusion essential for morphinan biosynthesis in opium poppy. Thus, gene duplication, rearrangement, and fusion events have led to evolution of specialized metabolic products in opium poppy.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 2018;362;6412;343-347

  • Compensation between CSF1R+ macrophages and Foxp3+ Treg cells drives resistance to tumor immunotherapy.

    Gyori D, Lim EL, Grant FM, Spensberger D, Roychoudhuri R, Shuttleworth SJ, Okkenhaug K, Stephens LR and Hawkins PT

    Signalling ISP, Babraham Institute, Babraham Research Campus, Cambridge, Cambridgeshire, United Kingdom.

    Redundancy and compensation provide robustness to biological systems but may contribute to therapy resistance. Both tumor-associated macrophages (TAMs) and Foxp3+ regulatory T (Treg) cells promote tumor progression by limiting antitumor immunity. Here we show that genetic ablation of CSF1 in colorectal cancer cells reduces the influx of immunosuppressive CSF1R+ TAMs within tumors. This reduction in CSF1-dependent TAMs resulted in increased CD8+ T cell attack on tumors, but its effect on tumor growth was limited by a compensatory increase in Foxp3+ Treg cells. Similarly, disruption of Treg cell activity through their experimental ablation produced moderate effects on tumor growth and was associated with elevated numbers of CSF1R+ TAMs. Importantly, codepletion of CSF1R+ TAMs and Foxp3+ Treg cells resulted in an increased influx of CD8+ T cells, augmentation of their function, and a synergistic reduction in tumor growth. Further, inhibition of Treg cell activity either through systemic pharmacological blockade of PI3Kδ, or its genetic inactivation within Foxp3+ Treg cells, sensitized previously unresponsive solid tumors to CSF1R+ TAM depletion and enhanced the effect of CSF1R blockade. These findings identify CSF1R+ TAMs and PI3Kδ-driven Foxp3+ Treg cells as the dominant compensatory cellular components of the immunosuppressive tumor microenvironment, with implications for the design of combinatorial immunotherapies.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E009867/1, BB/J004456/1, BB/N007794/1, BBS/E/B/0000H235, BBS/E/B/000C0407, BBS/E/B/000C0409, BBS/E/B/000C0427, BBS/E/B/000C0428; Cancer Research UK: C52623/A22597; Wellcome Trust: 095198/Z/10/Z, 095691, 105663/Z/14/Z

    JCI insight 2018;3;11

  • Response to Giem.

    Haber M, Doumet-Serhal C, Scheib C, Xue Y, Danecek P, Mezzavilla M, Youhanna S, Martiniano R, Prado-Martinez J, Szpak M, Matisoo-Smith E, Schutkowski H, Mikulski R, Zalloua P, Kivisild T and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambs. CB10 1SA, United Kingdom.

    American journal of human genetics 2018;102;2;331

  • Evidence for genetic contribution to the increased risk of type 2 diabetes in schizophrenia.

    Hackinger S, Prins B, Mamakou V, Zengini E, Marouli E, Brčić L, Serafetinidis I, Lamnissou K, Kontaxakis V, Dedoussis G, Gonidakis F, Thanopoulou A, Tentolouris N, Tsezou A and Zeggini E

    Human Genetics, Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK.

    The epidemiologic link between schizophrenia (SCZ) and type 2 diabetes (T2D) remains poorly understood. Here, we investigate the presence and extent of a shared genetic background between SCZ and T2D using genome-wide approaches. We performed a genome-wide association study (GWAS) and polygenic risk score analysis in a Greek sample collection (GOMAP) comprising three patient groups: SCZ only (n = 924), T2D only (n = 822), comorbid SCZ and T2D (n = 505); samples from two separate Greek cohorts were used as population-based controls (n = 1,125). We used genome-wide summary statistics from two large-scale GWAS of SCZ and T2D from the PGC and DIAGRAM consortia, respectively, to perform genetic overlap analyses, including a regional colocalisation test. We show for the first time that patients with comorbid SCZ and T2D have a higher genetic predisposition to both disorders compared to controls. We identify five genomic regions with evidence of colocalising SCZ and T2D signals, three of which contain known loci for both diseases. We also observe a significant excess of shared association signals between SCZ and T2D at nine out of ten investigated p value thresholds. Finally, we identify 29 genes associated with both T2D and SCZ, several of which have been implicated in biological processes relevant to these disorders. Together our results demonstrate that the observed comorbidity between SCZ and T2D is at least in part due to shared genetic mechanisms.

    Funded by: Wellcome Trust: WT098051

    Translational psychiatry 2018;8;1;252

  • Phandango: an interactive viewer for bacterial population genomics.

    Hadfield J, Croucher NJ, Goater RJ, Abudahab K, Aanensen DM and Harris SR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesizing and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner.

    Availability and implementation: Phandango is a web application freely available for use at and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at

    Bioinformatics (Oxford, England) 2018;34;2;292-293

  • Gene expression variability across cells and species shapes innate immunity.

    Hagai T, Chen X, Miragaia RJ, Rostom R, Gomes T, Kunowska N, Henriksson J, Park JE, Proserpio V, Donati G, Bossini-Castillo L, Vieira Braga FA, Naamati G, Fletcher J, Stephenson E, Vegh P, Trynka G, Kondova I, Dennis M, Haniffa M, Nourmohammad A, Lässig M and Teichmann SA

    Wellcome Sanger Institute, Cambridge, UK.

    As the first line of defence against pathogens, cells mount an innate immune response, which varies widely from cell to cell. The response must be potent but carefully controlled to avoid self-damage. How these constraints have shaped the evolution of innate immunity remains poorly understood. Here we characterize the innate immune response's transcriptional divergence between species and variability in expression among cells. Using bulk and single-cell transcriptomics in fibroblasts and mononuclear phagocytes from different species, challenged with immune stimuli, we map the architecture of the innate immune response. Transcriptionally diverging genes, including those that encode cytokines and chemokines, vary across cells and have distinct promoter structures. Conversely, genes that are involved in the regulation of this response, such as those that encode transcription factors and kinases, are conserved between species and display low cell-to-cell variability in expression. We suggest that this expression pattern, which is observed across species and conditions, has evolved as a mechanism for fine-tuned regulation to achieve an effective but balanced response.

    Funded by: Medical Research Council: MR/N014995/1

    Nature 2018;563;7730;197-202

  • Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.

    Haghverdi L, Lun ATL, Morgan MD and Marioni JC

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

    Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.

    Funded by: Wellcome Trust

    Nature biotechnology 2018;36;5;421-427
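
    The mutual-nearest-neighbour idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mutual_nearest_neighbors`, the parameter `k`, the use of Euclidean distance and the toy data are all assumptions made for illustration, and the sketch only detects MNN pairs; it does not perform the batch correction itself.

```python
import numpy as np


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell i of batch_a and cell j of
    batch_b are each among the other's k nearest neighbours.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    a = np.asarray(batch_a, dtype=float)
    b = np.asarray(batch_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # For each cell in A, the indices of its k nearest cells in B.
    nn_a_to_b = np.argsort(dist, axis=1)[:, :k]
    # For each cell in B, the indices of its k nearest cells in A.
    nn_b_to_a = np.argsort(dist, axis=0)[:k, :].T
    pairs = set()
    for i, neighbours in enumerate(nn_a_to_b):
        for j in neighbours:
            # Keep the pair only if the relationship holds in both directions.
            if i in nn_b_to_a[j]:
                pairs.add((i, int(j)))
    return sorted(pairs)


# Toy data (invented): batch B is batch A shifted by 0.5, and the population
# represented by the third cell of A is absent from B.
batch_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
batch_b = [[0.5, 0.5], [10.5, 10.5]]
print(mutual_nearest_neighbors(batch_a, batch_b, k=1))
```

    In this toy case the third cell of batch A has a nearest neighbour in B, but that neighbour is closer to another cell of A, so no pair is formed for it. This mirrors the abstract's point that only a subset of the population needs to be shared between batches.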

  • Hepatic gene body hypermethylation is a shared epigenetic signature of murine longevity.

    Hahn O, Stubbs TM, Reik W, Grönke S, Beyer A and Partridge L

    Max Planck Institute for Biology of Ageing, Cologne, Germany.

    Dietary, pharmacological and genetic interventions can extend health- and lifespan in diverse mammalian species. DNA methylation has been implicated in mediating the beneficial effects of these interventions; methylation patterns deteriorate during ageing, and this is prevented by lifespan-extending interventions. However, whether these interventions also actively shape the epigenome, and whether such epigenetic reprogramming contributes to improved health at old age, remains underexplored. We analysed published, whole-genome, BS-seq data sets from mouse liver to explore DNA methylation patterns in aged mice in response to three lifespan-extending interventions: dietary restriction (DR), reduced TOR signaling (rapamycin), and reduced growth (Ames dwarf mice). Dwarf mice show enhanced DNA hypermethylation in the body of key genes in lipid biosynthesis, cell proliferation and somatotropic signaling, which strongly correlates with the pattern of transcriptional repression. Remarkably, DR causes a similar hypermethylation in lipid biosynthesis genes, while rapamycin treatment increases methylation signatures in genes coding for growth factor and growth hormone receptors. Shared changes of DNA methylation were restricted to hypermethylated regions, and they were not merely a consequence of slowed ageing, thus suggesting an active mechanism driving their formation. By comparing the overlap in ageing-independent hypermethylated patterns between all three interventions, we identified four regions, which, independent of genetic background or gender, may serve as novel biomarkers for longevity-extending interventions. In summary, we identified gene body hypermethylation as a novel and partly conserved signature of lifespan-extending interventions in mouse, highlighting epigenetic reprogramming as a possible intervention to improve health at old age.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    PLoS genetics 2018;14;11;e1007766

  • Tissue-specific transcriptome analyses provide new insights into GPCR signalling in adult Schistosoma mansoni.

    Hahnel S, Wheeler N, Lu Z, Wangwiwatsin A, McVeigh P, Maule A, Berriman M, Day T, Ribeiro P and Grevelding CG

    Institute of Parasitology, BFS, Justus Liebig University, Giessen, Germany.

    Schistosomes are blood-dwelling trematodes with global impact on human and animal health. Because medical treatment is currently based on a single drug, praziquantel, there is urgent need for the development of alternative control strategies. The Schistosoma mansoni genome project provides a platform to study and connect the genetic repertoire of schistosomes to specific biological functions essential for successful parasitism. G protein-coupled receptors (GPCRs) form the largest superfamily of transmembrane receptors throughout the Eumetazoan phyla, including platyhelminths. Due to their involvement in diverse biological processes, their pharmacological importance, and proven druggability, GPCRs are promising targets for new anthelmintics. However, to identify candidate receptors, a more detailed understanding of the roles of GPCR signalling in schistosome biology is essential. An updated phylogenetic analysis of the S. mansoni GPCR genome (GPCRome) is presented, facilitated by updated genome data that allowed a more precise annotation of GPCRs. Additionally, we review the current knowledge on GPCR signalling in this parasite and provide new insights into the potential roles of GPCRs in schistosome reproduction based on the findings of a recent tissue-specific transcriptomic study in paired and unpaired S. mansoni. According to the current analysis, GPCRs contribute to gonad-specific functions but also to nongonad, pairing-dependent processes. The latter may regulate gonad-unrelated functions during the multifaceted male-female interaction. Finally, we compare the schistosome GPCRome to that of another parasitic trematode, Fasciola, and discuss the importance of GPCRs to basic and applied research. Phylogenetic analyses display GPCR diversity in free-living and parasitic platyhelminths and suggest diverse functions in schistosomes. Although their roles need to be substantiated by functional studies in the future, the data support the selection of GPCR candidates for basic and applied studies, invigorating the exploitation of this important receptor class for drug discovery against schistosomes but also other trematodes.

    PLoS pathogens 2018;14;1;e1006718

  • Tissue-Restricted Adaptive Type 2 Immunity Is Orchestrated by Expression of the Costimulatory Molecule OX40L on Group 2 Innate Lymphoid Cells.

    Halim TYF, Rana BMJ, Walker JA, Kerscher B, Knolle MD, Jolin HE, Serrao EM, Haim-Vilmovsky L, Teichmann SA, Rodewald HR, Botto M, Vyse TJ, Fallon PG, Li Z, Withers DR and McKenzie ANJ

    MRC Laboratory of Molecular Biology, Cambridge CB2 0QH, UK; University of Cambridge, CRUK Cambridge Institute, Cambridge CB2 0RE, UK.

    The local regulation of type 2 immunity relies on dialog between the epithelium and the innate and adaptive immune cells. Here we found that alarmin-induced expression of the co-stimulatory molecule OX40L on group 2 innate lymphoid cells (ILC2s) provided tissue-restricted T cell co-stimulation that was indispensable for Th2 and regulatory T (Treg) cell responses in the lung and adipose tissue. Interleukin (IL)-33 administration resulted in organ-specific surface expression of OX40L on ILC2s and the concomitant expansion of Th2 and Treg cells, which was abolished upon deletion of OX40L on ILC2s (Il7ra<sup>Cre/+</sup>Tnfsf4<sup>fl/fl</sup> mice). Moreover, Il7ra<sup>Cre/+</sup>Tnfsf4<sup>fl/fl</sup> mice failed to mount effective Th2 and Treg cell responses and corresponding adaptive type 2 pulmonary inflammation arising from Nippostrongylus brasiliensis infection or allergen exposure. Thus, the increased expression of OX40L in response to IL-33 acts as a licensing signal in the orchestration of tissue-specific adaptive type 2 immunity, without which this response fails to establish.

    Funded by: Medical Research Council: MC_U105178805; Wellcome Trust

    Immunity 2018;48;6;1195-1207.e6

  • Heterozygous mutations affecting the protein kinase domain of CDK13 cause a syndromic form of developmental delay and intellectual disability.

    Hamilton MJ, Caswell RC, Canham N, Cole T, Firth HV, Foulds N, Heimdal K, Hobson E, Houge G, Joss S, Kumar D, Lampe AK, Maystadt I, McKay V, Metcalfe K, Newbury-Ecob R, Park SM, Robert L, Rustad CF, Wakeling E, Wilkie AOM, DDD Study, Twigg SRF and Suri M

    West of Scotland Genetics Service, Queen Elizabeth University Hospital, Glasgow, UK.

    Introduction: Recent evidence has emerged linking mutations in <i>CDK13</i> to syndromic congenital heart disease. We present here genetic and phenotypic data pertaining to 16 individuals with <i>CDK13</i> mutations.

    Methods: Patients were investigated by exome sequencing, having presented with developmental delay and additional features suggestive of a syndromic cause.

    Results: Our cohort comprised 16 individuals aged 4-16 years. All had developmental delay, including six with autism spectrum disorder. Common findings included feeding difficulties (15/16), structural cardiac anomalies (9/16), seizures (4/16) and abnormalities of the corpus callosum (4/11 patients who had undergone MRI). All had craniofacial dysmorphism, with common features including short, upslanting palpebral fissures, hypertelorism or telecanthus, medial epicanthic folds, low-set, posteriorly rotated ears and a small mouth with thin upper lip vermilion. Fifteen patients had predicted missense mutations, including five identical p.(Asn842Ser) substitutions and two p.(Gly717Arg) substitutions. One patient had a canonical splice acceptor site variant (c.2898-1G>A). All mutations were located within the protein kinase domain of CDK13. The affected amino acids are highly conserved, and in silico analyses including comparative protein modelling predict that they will interfere with protein function. The location of the missense mutations in a key catalytic domain suggests that they are likely to cause loss of catalytic activity but retention of cyclin K binding, resulting in a dominant negative mode of action. Although the splice-site mutation was predicted to produce a stable internally deleted protein, this was not supported by expression studies in lymphoblastoid cells. A loss of function contribution to the underlying pathological mechanism therefore cannot be excluded, and the clinical significance of this variant remains uncertain.

    Conclusions: These patients demonstrate that heterozygous, likely dominant negative mutations affecting the protein kinase domain of the <i>CDK13</i> gene result in a recognisable, syndromic form of intellectual disability, with or without congenital heart disease.

    Funded by: Wellcome Trust: 102731/Z/13/Z

    Journal of medical genetics 2018;55;1;28-38

  • The widespread use of topical antimicrobials enriches for resistance in Staphylococcus aureus isolated from Atopic Dermatitis patients.

    Harkins CP, McAleer MA, Bennett D, McHugh M, Fleury OM, Pettigrew KA, Oravcová K, Parkhill J, Proby CM, Dawe RS, Geoghegan JA, Irvine AD and Holden MTG

    School of Medicine, University of St Andrews, St Andrews, KY11 9TF, UK.

    Background: Carriage rates of Staphylococcus aureus on affected skin in atopic dermatitis (AD) are approximately 70%. Increasing disease severity during flares and overall disease severity correlate with increased burden of S. aureus. Treatment in AD therefore often targets S. aureus, with topical and systemic antimicrobials.

    Objectives: To determine if antimicrobial sensitivities and genetic determinants of resistance differed in S. aureus isolates from the skin of children with AD compared with healthy child nasal carriers.

    Methods: In this case-control study, we compared S. aureus isolates from children with AD (n=50) attending a hospital dermatology department to nasal carriage isolates from children without skin disease (n=49) attending a hospital emergency department for non-infective conditions. Using whole genome sequencing we generated a phylogenetic framework for the isolates based on variation in the core genome, then compared antimicrobial resistance phenotype and genotypes between disease groups.

    Results and conclusions: S. aureus from cases and controls had on average similar numbers of phenotypic resistances per isolate. Case isolates differed in their resistance patterns, with fusidic acid resistance (Fus<sup>R</sup>) being significantly more frequent in AD (p=0.009). The genetic basis of Fus<sup>R</sup> also differentiated the populations, with chromosomal mutations in fusA predominating in AD (p=0.049). Analysis revealed that Fus<sup>R</sup> evolved multiple times and via multiple mechanisms in the population. Carriage of plasmid-derived qac genes, which have been associated with reduced susceptibility to antiseptics, was 8 times more frequent in AD (p=0.016). The results suggest strong selective pressure drives the emergence and maintenance of specific resistances in AD.

    The British journal of dermatology 2018

  • The Microevolution and Epidemiology of Staphylococcus aureus Colonization during Atopic Eczema Disease Flare.

    Harkins CP, Pettigrew KA, Oravcová K, Gardner J, Hearn RMR, Rice D, Mather AE, Parkhill J, Brown SJ, Proby CM and Holden MTG

    School of Medicine, University of St Andrews, St Andrews, UK; Department of Dermatology, Ninewells Hospital, Dundee, UK; School of Medicine, University of Dundee, Dundee, UK.

    Staphylococcus aureus is an opportunistic pathogen and variable component of the human microbiota. A characteristic of atopic eczema (AE) is colonization by S. aureus, with exacerbations associated with an increased bacterial burden of the organism. Despite this, the origins and genetic diversity of S. aureus colonizing individual patients during AE disease flares are poorly understood. To examine the microevolution of S. aureus colonization, we deep sequenced S. aureus populations from nine children with moderate to severe AE and 18 non-atopic children asymptomatically carrying S. aureus nasally. Colonization by clonal S. aureus populations was observed in both AE patients and control participants, with all but one of the individuals carrying colonies belonging to a single sequence type. Phylogenetic analysis showed that disease flares were associated with the clonal expansion of the S. aureus population, occurring over a period of weeks to months. There was a significant difference in the genetic backgrounds of S. aureus colonizing AE cases versus controls (Fisher exact test, P = 0.03). Examination of intra-host genetic heterogeneity of the colonizing S. aureus populations identified evidence of within-host selection in the AE patients, with AE variants being potentially selectively advantageous for intracellular persistence and treatment resistance.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M014088/1; Wellcome Trust

    The Journal of investigative dermatology 2018;138;2;336-343

  • Public health surveillance of multidrug-resistant clones of Neisseria gonorrhoeae in Europe: a genomic survey.

    Harris SR, Cole MJ, Spiteri G, Sánchez-Busó L, Golparian D, Jacobsson S, Goater R, Abudahab K, Yeats CA, Bercot B, Borrego MJ, Crowley B, Stefanelli P, Tripodo F, Abad R, Aanensen DM, Unemo M and Euro-GASP study group

    Infection Genomics, Wellcome Sanger Institute, Hinxton, UK.

    Background: Traditional methods for molecular epidemiology of Neisseria gonorrhoeae are suboptimal. Whole-genome sequencing (WGS) offers ideal resolution to describe population dynamics and to predict and infer transmission of antimicrobial resistance, and can enhance infection control through linkage with epidemiological data. We used WGS, in conjunction with linked epidemiological and phenotypic data, to describe the gonococcal population in 20 European countries. We aimed to detail changes in phenotypic antimicrobial resistance levels (and the reasons for these changes) and strain distribution (with a focus on antimicrobial resistance strains in risk groups), and to predict antimicrobial resistance from WGS data.

    Methods: We carried out an observational study, in which we sequenced isolates taken from patients with gonorrhoea from the European Gonococcal Antimicrobial Surveillance Programme in 20 countries from September to November, 2013. We also developed a web platform that we used for automated antimicrobial resistance prediction, molecular typing (N gonorrhoeae multi-antigen sequence typing [NG-MAST] and multilocus sequence typing), and phylogenetic clustering in conjunction with epidemiological and phenotypic data.

    Findings: The multidrug-resistant NG-MAST genogroup G1407 was predominant and accounted for the most cephalosporin resistance, but the prevalence of this genogroup decreased from 248 (23%) of 1066 isolates in a previous study from 2009-10 to 174 (17%) of 1054 isolates in this survey in 2013. This genogroup previously showed an association with men who have sex with men, but changed to an association with heterosexual people (odds ratio=4·29). WGS provided substantially improved resolution and accuracy over NG-MAST and multilocus sequence typing, predicted antimicrobial resistance relatively well, and identified discrepant isolates, mixed infections or contaminants, and multidrug-resistant clades linked to risk groups.

    Interpretation: To our knowledge, we provide the first use of joint analysis of WGS and epidemiological data in an international programme for regional surveillance of sexually transmitted infections. WGS provided enhanced understanding of the distribution of antimicrobial resistance clones, including replacement with clones that were more susceptible to antimicrobials, in several risk groups nationally and regionally. We provide a framework for genomic surveillance of gonococci through standardised sampling, use of WGS, and a shared information architecture for interpretation and dissemination by use of open access software.

    Funding: The European Centre for Disease Prevention and Control, The Centre for Genomic Pathogen Surveillance, Örebro University Hospital, and Wellcome.

    Funded by: Wellcome Trust: 098051, 099202

    The Lancet. Infectious diseases 2018;18;7;758-768

  • Genome-wide association study of developmental dysplasia of the hip identifies an association with GDF5.

    Hatzikotoulas K, Roposch A, DDH Case Control Consortium, Shah KM, Clark MJ, Bratherton S, Limbani V, Steinberg J, Zengini E, Warsame K, Ratnayake M, Tselepi M, Schwartzentruber J, Loughlin J, Eastwood DM, Zeggini E and Wilkinson JM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Morgan Building, Hinxton, Cambridge, CB10 1HH, UK.

    Developmental dysplasia of the hip (DDH) is the most common skeletal developmental disease. However, its genetic architecture is poorly understood. We conduct the largest DDH genome-wide association study to date and replicate our findings in independent cohorts. We find the heritable component of DDH attributable to common genetic variants to be 55% and distributed equally across the autosomal and X-chromosomes. We identify replicating evidence for association between <i>GDF5</i> promoter variation and DDH (rs143384, effect allele A, odds ratio 1.44, 95% confidence interval 1.34-1.56, <i>P</i> = 3.55 × 10<sup>-22</sup>). Gene-based analysis implicates <i>GDF5</i> (<i>P</i> = 9.24 × 10<sup>-12</sup>), <i>UQCC1</i> (<i>P</i> = 1.86 × 10<sup>-10</sup>), <i>MMP24</i> (<i>P</i> = 3.18 × 10<sup>-9</sup>), <i>RETSAT</i> (<i>P</i> = 3.70 × 10<sup>-8</sup>) and <i>PDRG1</i> (<i>P</i> = 1.06 × 10<sup>-7</sup>) in DDH susceptibility. We find shared genetic architecture between DDH and hip osteoarthritis, but no predictive power of osteoarthritis polygenic risk score on DDH status, underscoring the complex nature of the two traits. We report a scalable, time-efficient recruitment strategy and establish for the first time to our knowledge a robust DDH genetic association locus at <i>GDF5</i>.

    Funded by: Medical Research Council: MR/P020941/1; Versus Arthritis: 20771; Wellcome Trust

    Communications biology 2018;1;56

  • The return of Pfeiffer's bacillus: Rising incidence of ampicillin resistance in Haemophilus influenzae.

    Heinz E

    Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.

    Haemophilus influenzae, originally named Pfeiffer's bacillus after its discoverer Richard Pfeiffer in 1892, was a major risk for global health at the beginning of the 20th century, causing childhood pneumonia and invasive disease as well as otitis media and other upper respiratory tract infections. The implementation of the Hib vaccine, targeting the major capsule type of H. influenzae, almost eradicated the disease in countries that adopted the vaccination scheme. However, a rising number of infections are caused by non-typeable H. influenzae (NTHi), which has no capsule and against which the vaccine therefore provides no protection, as well as other serotypes equally not recognised by the vaccine. The first line of treatment is ampicillin, but there is a steady rise in ampicillin resistance. This arises through both acquired and intrinsic mechanisms, and is cause for serious concern and calls for more surveillance. There are also increasing reports of new modifications of the intrinsic ampicillin-resistance mechanism leading to resistance against cephalosporins and carbapenems, the last line of well-tolerated drugs, and ampicillin-resistant H. influenzae was included in the recently released priority list of antibiotic-resistant bacteria by the WHO. This review provides an overview of ampicillin resistance prevalence and mechanisms in the context of our current knowledge about population dynamics of H. influenzae.

    Funded by: Wellcome Trust: 206194

    Microbial genomics 2018;4;9

  • De Novo Pathogenic Variants in CACNA1E Cause Developmental and Epileptic Encephalopathy with Contractures, Macrocephaly, and Dyskinesias.

    Helbig KL, Lauerer RJ, Bahr JC, Souza IA, Myers CT, Uysal B, Schwarz N, Gandini MA, Huang S, Keren B, Mignot C, Afenjar A, Billette de Villemeur T, Héron D, Nava C, Valence S, Buratti J, Fagerberg CR, Soerensen KP, Kibaek M, Kamsteeg EJ, Koolen DA, Gunning B, Schelhaas HJ, Kruer MC, Fox J, Bakhtiari S, Jarrar R, Padilla-Lopez S, Lindstrom K, Jin SC, Zeng X, Bilguvar K, Papavasileiou A, Xing Q, Zhu C, Boysen K, Vairo F, Lanpher BC, Klee EW, Tillema JM, Payne ET, Cousin MA, Kruisselbrink TM, Wick MJ, Baker J, Haan E, Smith N, Sadeghpour A, Davis EE, Katsanis N, Task Force for Neonatal Genomics, Corbett MA, MacLennan AH, Gecz J, Biskup S, Goldmann E, Rodan LH, Kichula E, Segal E, Jackson KE, Asamoah A, Dimmock D, McCarrier J, Botto LD, Filloux F, Tvrdik T, Cascino GD, Klingerman S, Neumann C, Wang R, Jacobsen JC, Nolan MA, Snell RG, Lehnert K, Sadleir LG, Anderlid BM, Kvarnung M, Guerrini R, Friez MJ, Lyons MJ, Leonhard J, Kringlen G, Casas K, El Achkar CM, Smith LA, Rotenberg A, Poduri A, Sanchis-Juan A, Carss KJ, Rankin J, Zeman A, Raymond FL, Blyth M, Kerr B, Ruiz K, Urquhart J, Hughes I, Banka S, Deciphering Developmental Disorders Study, Hedrich UBS, Scheffer IE, Helbig I, Zamponi GW, Lerche H and Mefford HC

    Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.

    Developmental and epileptic encephalopathies (DEEs) are severe neurodevelopmental disorders often beginning in infancy or early childhood that are characterized by intractable seizures, abundant epileptiform activity on EEG, and developmental impairment or regression. CACNA1E is highly expressed in the central nervous system and encodes the α<sub>1</sub>-subunit of the voltage-gated Ca<sub>V</sub>2.3 channel, which conducts high voltage-activated R-type calcium currents that initiate synaptic transmission. Using next-generation sequencing techniques, we identified de novo CACNA1E variants in 30 individuals with DEE, characterized by refractory infantile-onset seizures, severe hypotonia, and profound developmental impairment, often with congenital contractures, macrocephaly, hyperkinetic movement disorders, and early death. Most of the 14, partially recurring, variants cluster within the cytoplasmic ends of all four S6 segments, which form the presumed Ca<sub>V</sub>2.3 channel activation gate. Functional analysis of several S6 variants revealed consistent gain-of-function effects comprising facilitated voltage-dependent activation and slowed inactivation. Another variant located in the domain II S4-S5 linker results in facilitated activation and increased current density. Five participants achieved seizure freedom on the anti-epileptic drug topiramate, which blocks R-type calcium channels. We establish pathogenic variants in CACNA1E as a cause of DEEs and suggest facilitated R-type calcium currents as a disease mechanism for human epilepsy and developmental disorders.

    Funded by: NINDS NIH HHS: R01 NS069605, R56 NS069605; Wellcome Trust

    American journal of human genetics 2018;103;5;666-678

  • Refining the phenotype associated with GNB1 mutations: Clinical data on 18 newly identified patients and review of the literature.

    Hemati P, Revah-Politi A, Bassan H, Petrovski S, Bilancia CG, Ramsey K, Griffin NG, Bier L, Cho MT, Rosello M, Lynch SA, Colombo S, Weber A, Haug M, Heinzen EL, Sands TT, Narayanan V, Primiano M, Aggarwal VS, Millan F, Sattler-Holtrop SG, Caro-Llopis A, Pillar N, Baker J, Freedman R, Kroes HY, Sacharow S, Stong N, Lapunzina P, Schneider MC, Mendelsohn NJ, Singleton A, Loik Ramey V, Wou K, Kuzminsky A, Monfort S, Weiss M, Doyle S, Iglesias A, Martinez F, Mckenzie F, Orellana C, van Gassen KLI, Palomares M, Bazak L, Lee A, Bircher A, Basel-Vanagaite L, Hafström M, Houge G, C4RCD Research Group, DDD study, Goldstein DB and Anyane-Yeboa K

    Institute for Genomic Medicine, Columbia University Medical Center, New York, New York.

    De novo germline mutations in GNB1 have been associated with a neurodevelopmental phenotype. To date, 28 patients with variants classified as pathogenic have been reported. We add 18 patients with de novo mutations to this cohort, including a patient with mosaicism for a GNB1 mutation who presented with a milder phenotype. Consistent with previous reports, developmental delay in these patients was moderate to severe, and more than half of the patients were non-ambulatory and nonverbal. The most frequently observed substitution affects the p.Ile80 residue encoded in exon 6, with 28% of patients carrying a variant at this residue. Dystonia and growth delay were observed more frequently in patients carrying variants in this residue, suggesting a potential genotype-phenotype correlation. In the new cohort of 18 patients, 50% of males had genitourinary anomalies and 61% of patients had gastrointestinal anomalies, suggesting a possible association of these findings with variants in GNB1. In addition, cutaneous mastocytosis, reported once before in a patient with a GNB1 variant, was observed in three additional patients, providing further evidence for an association with GNB1. We will review clinical and molecular data of these new cases and all previously reported cases to further define the phenotype and establish possible genotype-phenotype correlations.

    American journal of medical genetics. Part A 2018;176;11;2259-2275

  • Single-cell genomics.

    Hemberg M

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Briefings in functional genomics 2018;17;4;207-208

  • ProxECAT: Proxy External Controls Association Test. A new case-control gene region association test using allele frequencies from public controls.

    Hendricks AE, Billups SC, Pike HNC, Farooqi IS, Zeggini E, Santorico SA, Barroso I and Dupuis J

    Mathematical and Statistical Sciences Department, University of Colorado Denver, Denver, CO, United States of America.

    A primary goal of the recent investment in sequencing is to detect novel genetic associations in health and disease, improving the development of treatments and playing a critical role in precision medicine. While this investment has resulted in an enormous total number of sequenced genomes, individual studies of complex traits and diseases are often smaller and underpowered to detect rare variant genetic associations. Existing genetic resources such as the Exome Aggregation Consortium (>60,000 exomes) and the Genome Aggregation Database (~140,000 sequenced samples) have the potential to be used as controls in these studies. Fully utilizing these and other existing sequencing resources may increase power and could be especially useful in studies where resources to sequence additional samples are limited. However, to date, these large, publicly available genetic resources remain underutilized, or even misused, in large part due to the lack of statistical methods that can appropriately use these summary-level data. Here, we present a new method to incorporate external controls in case-control analysis called ProxECAT (Proxy External Controls Association Test). ProxECAT estimates enrichment of rare variants within a gene region using internally sequenced cases and external controls. We evaluated ProxECAT in simulations and empirical analyses of obesity cases using both low-depth of coverage (7x) whole-genome sequenced controls and ExAC as controls. We find that ProxECAT maintains the expected type I error rate with increased power as the number of external controls increases. With an accompanying R package, ProxECAT enables the use of publicly available allele frequencies as external controls in case-control analysis.

    Funded by: NIDDK NIH HHS: U01 DK078616; Wellcome Trust: WT098051, WT206194

    PLoS genetics 2018;14;10;e1007591

  • Single-cell transcriptional analysis reveals ILC-like cells in zebrafish.

    Hernández PP, Strzelecka PM, Athanasiadis EI, Hall D, Robalo AF, Collins CM, Boudinot P, Levraud JP and Cvejic A

    Macrophages et Développement de l'Immunité, Institut Pasteur, Paris, France.

    Innate lymphoid cells (ILCs) are important mediators of the immune response and homeostasis in barrier tissues of mammals. However, the existence and function of ILCs in other vertebrates are poorly understood. Here, we use single-cell RNA sequencing to generate a comprehensive atlas of zebrafish lymphocytes during tissue homeostasis and after immune challenge. We profiled 14,080 individual cells from the gut of wild-type zebrafish, as well as of <i>rag1</i>-deficient zebrafish that lack T and B cells, and discovered populations of ILC-like cells. We uncovered a <i>rorc</i>-positive subset of ILCs that could express cytokines associated with type 1, 2, and 3 responses upon immune challenge. Specifically, these ILC-like cells expressed <i>il22</i> and <i>tnfa</i> after exposure to inactivated bacteria or <i>il13</i> after exposure to helminth extract. Cytokine-producing ILC-like cells express a specific repertoire of novel immune-type receptors, likely involved in recognition of environmental cues. We identified additional novel markers of zebrafish ILCs and generated a cloud repository for their in-depth exploration.

    Funded by: Cancer Research UK: C45041/A14953; Medical Research Council: MC_PC_12009; Wellcome Trust

    Science immunology 2018;3;29

  • Long- and short-term outcomes in renal allografts with deceased donors: A large recipient and donor genome-wide association study.

    Hernandez-Fuentes MP, Franklin C, Rebollo-Mesa I, Mollon J, Delaney F, Perucha E, Stapleton C, Borrows R, Byrne C, Cavalleri G, Clarke B, Clatworthy M, Feehally J, Fuggle S, Gagliano SA, Griffin S, Hammad A, Higgins R, Jardine A, Keogan M, Leach T, MacPhee I, Mark PB, Marsh J, Maxwell P, McKane W, McLean A, Newstead C, Augustine T, Phelan P, Powis S, Rowe P, Sheerin N, Solomon E, Stephens H, Thuraisingham R, Trembath R, Topham P, Vaughan R, Sacks SH, Conlon P, Opelz G, Soranzo N, Weale ME, Lord GM and United Kingdom and Ireland Renal Transplant Consortium (UKIRTC) and the Wellcome Trust Case Control Consortium (WTCCC)-3

    King's College London, MRC Centre for Transplantation, London, UK.

    Improvements in immunosuppression have modified short-term survival of deceased-donor allografts, but not their rate of long-term failure. Mismatches between donor and recipient HLA play an important role in the acute and chronic allogeneic immune response against the graft. Perfect matching at clinically relevant HLA loci does not obviate the need for immunosuppression, suggesting that additional genetic variation plays a critical role in both short- and long-term graft outcomes. By combining patient data and samples from supranational cohorts across the United Kingdom and European Union, we performed the first large-scale genome-wide association study analyzing both donor and recipient DNA in 2094 complete renal transplant-pairs with replication in 5866 complete pairs. We studied deceased-donor grafts allocated on the basis of preferential HLA matching, which provided some control for HLA genetic effects. No strong donor or recipient genetic effects contributing to long- or short-term allograft survival were found outside the HLA region. We discuss the implications for future research and clinical application.

    Funded by: Medical Research Council: G0600698, G0600892, G0701320, G0802068, MC_PC_15025, MR/J006742/1, MR/K002996/1, MR/K500999/1; Wellcome Trust: 088849/Z/09/Z, 090355/A/09/Z, 090355/B/09/Z, WT091310, WT098051

    American journal of transplantation : official journal of the American Society of Transplantation and the American Society of Transplant Surgeons 2018;18;6;1370-1379

  • Improving communication for interdisciplinary teams working on storage of digital information in DNA.

    Hesketh EE, Sayir J and Goldman N

    Wellcome Sanger Institute, Hinxton, CB10 1SA, UK.

    Close collaboration between specialists from diverse backgrounds and working in different scientific domains is an effective strategy to overcome challenges in areas that interface between biology, chemistry, physics and engineering. Communication in such collaborations can itself be challenging. Even when projects are successfully concluded, resulting publications - necessarily multi-authored - have the potential to be disjointed. Few, both in the field and outside, may be able to fully understand the work as a whole. This needs to be addressed to facilitate efficient working, peer review, accessibility and impact to larger audiences. We are an interdisciplinary team working in a nascent scientific area, the repurposing of DNA as a storage medium for digital information. In this note, we highlight some of the difficulties that arise from such collaborations and outline our efforts to improve communication through a glossary and a controlled vocabulary and accessibility via short plain-language summaries. We hope to stimulate early discussion within this emerging field of how our community might improve the description and presentation of our work to facilitate clear communication within and between research groups and increase accessibility to those not familiar with our respective fields - be it molecular biology, computer science, information theory or others that might become relevant in future. To enable an open and inclusive discussion we have created a glossary and controlled vocabulary as a cloud-based shared document and we invite other scientists to critique our suggestions and contribute their own ideas.

    F1000Research 2018;7;39

  • What Is Resistance? Impact of Phenotypic versus Molecular Drug Resistance Testing on Therapy for Multi- and Extensively Drug-Resistant Tuberculosis.

    Heyckendorf J, Andres S, Köser CU, Olaru ID, Schön T, Sturegård E, Beckert P, Schleusener V, Kohl TA, Hillemann D, Moradigaravand D, Parkhill J, Peacock SJ, Niemann S, Lange C and Merker M

    Division of Clinical Infectious Diseases, Research Center Borstel, Borstel, Germany.

    Rapid and accurate drug susceptibility testing (DST) is essential for the treatment of multi- and extensively drug-resistant tuberculosis (M/XDR-TB). We compared the utility of genotypic DST assays with phenotypic DST (pDST) using Bactec 960 MGIT or Löwenstein-Jensen to construct M/XDR-TB treatment regimens for a cohort of 25 consecutive M/XDR-TB patients and 15 possible anti-TB drugs. Genotypic DST results from Cepheid GeneXpert MTB/RIF (Xpert) and line probe assays (LPAs; Hain GenoType MTBDRplus 2.0 and MTBDRsl 2.0) and whole-genome sequencing (WGS) were translated into individual algorithm-derived treatment regimens for each patient. We further analyzed if discrepancies between the various methods were due to flaws in the genotypic or phenotypic test using MIC results. Compared with pDST, the average agreement in the number of drugs prescribed in genotypic regimens ranged from just 49% (95% confidence interval [CI], 39 to 59%) for Xpert and 63% (95% CI, 56 to 70%) for LPAs to 93% (95% CI, 88 to 98%) for WGS. Only the WGS regimens did not contain any drugs to which pDST showed resistance. Importantly, MIC testing revealed that pDST likely underestimated the true rate of resistance for key drugs (rifampin, levofloxacin, moxifloxacin, and kanamycin) because critical concentrations (CCs) were too high. WGS can be used to rule in resistance even in M/XDR strains with complex resistance patterns, but pDST for some drugs is still needed to confirm susceptibility and construct the final regimens. Some CCs for pDST need to be reexamined to avoid systematic false-susceptible results in low-level resistant isolates.

    Funded by: Wellcome Trust: WT098600

    Antimicrobial agents and chemotherapy 2018;62;2

  • The contribution of CACNA1A, ATP1A2 and SCN1A mutations in hemiplegic migraine: A clinical and genetic study in Finnish migraine families.

    Hiekkala ME, Vuola P, Artto V, Häppölä P, Häppölä E, Vepsäläinen S, Cuenca-León E, Lal D, Gormley P, Hämäläinen E, Ilmavirta M, Nissilä M, Säkö E, Sumelahti ML, Harno H, Havanka H, Keski-Säntti P, Färkkilä M, Palotie A, Wessman M, Kaunisto MA and Kallela M

    Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland.

    Objective: To study the position of hemiplegic migraine in the clinical spectrum of migraine with aura and to reveal the importance of CACNA1A, ATP1A2 and SCN1A in the development of hemiplegic migraine in Finnish migraine families. Methods: The International Classification of Headache Disorders 3rd edition criteria were used to determine clinical characteristics and occurrence of hemiplegic migraine, based on detailed questionnaires, in a Finnish migraine family collection consisting of 9087 subjects. Involvement of CACNA1A, ATP1A2 and SCN1A was studied using whole exome sequencing data from 293 patients with hemiplegic migraine. Results: Overall, hemiplegic migraine patients reported clinically more severe headache and aura episodes than non-hemiplegic migraine with aura patients. We identified two mutations, c.1816G>A (p.Ala606Thr) and c.1148G>A (p.Arg383His), in ATP1A2 and one mutation, c.1994C>T (p.Thr665Met) in CACNA1A. Conclusions: The results highlight hemiplegic migraine as a clinically and genetically heterogeneous disease. Hemiplegic migraine patients do not form a clearly separate group with distinct symptoms, but rather have an extreme phenotype in the migraine with aura continuum. We have shown that mutations in CACNA1A, ATP1A2 and SCN1A are not the major cause of the disease in Finnish hemiplegic migraine patients, suggesting that there are additional genetic factors contributing to the phenotype.

    Cephalalgia : an international journal of headache 2018;38;12;1849-1863

  • Dual-stressor selection alters eco-evolutionary dynamics in experimental communities.

    Hiltunen T, Cairns J, Frickel J, Jalasvuori M, Laakso J, Kaitala V, Künzel S, Karakoc E and Becks L

    Department of Microbiology, University of Helsinki, Helsinki, Finland.

    Recognizing when and how rapid evolution drives ecological change is fundamental for our understanding of almost all ecological and evolutionary processes such as community assembly, genetic diversification and the stability of communities and ecosystems. Generally, rapid evolutionary change is driven through selection on genetic variation and is affected by evolutionary constraints, such as tradeoffs and pleiotropic effects, all contributing to the overall rate of evolutionary change. Each of these processes can be influenced by the presence of multiple environmental stressors reducing a population's reproductive output. Potential consequences of multistressor selection for the occurrence and strength of the link from rapid evolution to ecological change are unclear. However, understanding these is necessary for predicting when rapid evolution might drive ecological change. Here we investigate how the presence of two stressors affects this link using experimental evolution with the bacterium Pseudomonas fluorescens and its predator Tetrahymena thermophila. We show that the combination of predation and sublethal antibiotic concentrations delays the evolution of anti-predator defence and antibiotic resistance compared with the presence of only one of the two stressors. Rapid defence evolution drives stabilization of the predator-prey dynamics but this link between evolution and ecology is weaker in the two-stressor environment, where defence evolution is slower, leading to less stable population dynamics. Tracking the molecular evolution of whole populations over time shows further that mutations in different genes are favoured under multistressor selection. Overall, we show that selection by multiple stressors can significantly alter eco-evolutionary dynamics and their predictability.

    Nature ecology & evolution 2018;2;12;1974-1981

  • Integrative Molecular Characterization of Malignant Pleural Mesothelioma.

    Hmeljak J, Sanchez-Vega F, Hoadley KA, Shih J, Stewart C, Heiman D, Tarpey P, Danilova L, Drill E, Gibb EA, Bowlby R, Kanchi R, Osmanbeyoglu HU, Sekido Y, Takeshita J, Newton Y, Graim K, Gupta M, Gay CM, Diao L, Gibbs DL, Thorsson V, Iype L, Kantheti H, Severson DT, Ravegnini G, Desmeules P, Jungbluth AA, Travis WD, Dacic S, Chirieac LR, Galateau-Sallé F, Fujimoto J, Husain AN, Silveira HC, Rusch VW, Rintoul RC, Pass H, Kindler H, Zauderer MG, Kwiatkowski DJ, Bueno R, Tsao AS, Creaney J, Lichtenberg T, Leraas K, Bowen J, TCGA Research Network, Felau I, Zenklusen JC, Akbani R, Cherniack AD, Byers LA, Noble MS, Fletcher JA, Robertson AG, Shen R, Aburatani H, Robinson BW, Campbell P and Ladanyi M

    Department of Pathology and Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York.

    Malignant pleural mesothelioma (MPM) is a highly lethal cancer of the lining of the chest cavity. To expand our understanding of MPM, we conducted a comprehensive integrated genomic study, including the most detailed analysis of BAP1 alterations to date. We identified histology-independent molecular prognostic subsets, and defined a novel genomic subtype with TP53 and SETDB1 mutations and extensive loss of heterozygosity. We also report strong expression of the immune-checkpoint gene VISTA in epithelioid MPM, strikingly higher than in other solid cancers, with implications for the immune response to MPM and for its immunotherapy. Our findings highlight new avenues for further investigation of MPM biology and novel therapeutic options. SIGNIFICANCE: Through a comprehensive integrated genomic study of 74 MPMs, we provide a deeper understanding of histology-independent determinants of aggressive behavior, define a novel genomic subtype with TP53 and SETDB1 mutations and extensive loss of heterozygosity, and discover strong expression of the immune-checkpoint gene VISTA in epithelioid MPM.

    Funded by: NCI NIH HHS: K99 CA207871, P30 CA008748, P30 CA016672, R00 CA207871, R50 CA221675, T32 CA009666, U24 CA143799, U24 CA143835, U24 CA143840, U24 CA143843, U24 CA143845, U24 CA143848, U24 CA143858, U24 CA143866, U24 CA143867, U24 CA143882, U24 CA143883, U24 CA144025, U24 CA199461, U24 CA210950, U24 CA210990; NHGRI NIH HHS: T32 HG008345, U54 HG003067, U54 HG003079, U54 HG003273

    Cancer discovery 2018;8;12;1548-1565

  • Congenital macrothrombocytopenia with focal myelofibrosis due to mutations in human G6b-B is rescued in humanized mice.

    Hofmann I, Geer MJ, Vögtle T, Crispin A, Campagna DR, Barr A, Calicchio ML, Heising S, van Geffen JP, Kuijpers MJE, Heemskerk JWM, Eble JA, Schmitz-Abe K, Obeng EA, Douglas M, Freson K, Pondarré C, Favier R, Jarvis GE, Markianos K, Turro E, Ouwehand WH, Mazharian A, Fleming MD and Senis YA

    Division of Hematology, Oncology, and Bone Marrow Transplantation, Department of Pediatrics, University of Wisconsin, Madison, WI.

    Unlike primary myelofibrosis (PMF) in adults, myelofibrosis in children is rare. Congenital (inherited) forms of myelofibrosis (cMF) have been described, but the underlying genetic mechanisms remain elusive. Here we describe 4 families with autosomal recessive inherited macrothrombocytopenia with focal myelofibrosis due to germ line loss-of-function mutations in the megakaryocyte-specific immunoreceptor tyrosine-based inhibitory motif (ITIM)-containing receptor G6b-B (G6b, C6orf25, or MPIG6B). Patients presented with a mild-to-moderate bleeding diathesis, macrothrombocytopenia, anemia, leukocytosis and atypical megakaryocytes associated with a distinctive, focal, perimegakaryocytic pattern of bone marrow fibrosis. In addition to identifying the responsible gene, the description of G6b-B as the mutated protein potentially implicates aberrant G6b-B megakaryocytic signaling and activation in the pathogenesis of myelofibrosis. Targeted insertion of human G6b in mice rescued the knockout phenotype and a copy number effect of human G6b-B expression was observed. Homozygous knockin mice expressed 25% of human G6b-B and exhibited a marginal reduction in platelet count and mild alterations in platelet function; these phenotypes were more severe in heterozygous mice that expressed only 12% of human G6b-B. This study establishes G6b-B as a critical regulator of platelet homeostasis in humans and mice. In addition, the humanized G6b mouse will provide an invaluable tool for further investigating the physiological functions of human G6b-B as well as testing the efficacy of drugs targeting this receptor.

    Funded by: British Heart Foundation: FS/13/1/29894, FS/15/58/31784; Medical Research Council: GBT1564; NIDDK NIH HHS: R24 DK099808

    Blood 2018;132;13;1399-1412

  • Ecto-5'-nucleotidase (CD73) regulates peripheral chemoreceptor activity and cardiorespiratory responses to hypoxia.

    Holmes AP, Ray CJ, Pearson SA, Coney AM and Kumar P

    Institute of Cardiovascular Sciences.

    Key points: Carotid body dysfunction is recognized as a cause of hypertension in a number of cardiorespiratory disease states and has therefore been identified as a potential therapeutic target. Purinergic transmission is an important element of the carotid body chemotransduction pathway. We show that inhibition of ecto-5'-nucleotidase (CD73) in vitro reduces carotid body basal discharge and responses to hypoxia and mitochondrial inhibition. Additionally, inhibition of CD73 in vivo decreased the hypoxic ventilatory response, reduced the hypoxia-induced heart rate elevation and exaggerated the blood pressure decrease in response to hypoxia. Our data show CD73 to be a novel regulator of carotid body sensory function and therefore suggest that this enzyme may offer a new target for reducing carotid body activity in selected cardiovascular diseases.

    Abstract: Augmented sensory neuronal activity from the carotid body (CB) has emerged as a principal cause of hypertension in a number of cardiovascular related pathologies, including obstructive sleep apnoea, heart failure and diabetes. Development of new targets and pharmacological treatment strategies aiming to reduce CB sensory activity may thus improve outcomes in these key patient cohorts. The present study investigated whether ecto-5'-nucleotidase (CD73), an enzyme that generates adenosine, is functionally important in modifying CB sensory activity and cardiovascular respiratory responses to hypoxia. Inhibition of CD73 by α,β-methylene ADP (AOPCP) in the whole CB preparation in vitro reduced basal discharge frequency by 76 ± 5% and reduced sensory activity throughout graded hypoxia. AOPCP also significantly attenuated elevations in sensory activity evoked by mitochondrial inhibition. These effects were mimicked by antagonism of adenosine receptors with 8-(p-sulfophenyl) theophylline. Infusion of AOPCP in vivo significantly decreased the hypoxic ventilatory response (ΔV̇E control 74 ± 6%, ΔV̇E AOPCP 64 ± 5%, P < 0.05). AOPCP also modified cardiovascular responses to hypoxia, as indicated by reduced elevations in heart rate and exaggerated changes in femoral vascular conductance and mean arterial blood pressure. Thus we identify CD73 as a novel regulator of CB sensory activity. Future investigations are warranted to clarify whether inhibition of CD73 can effectively reduce CB activity in CB-mediated cardiovascular pathology.

    Funded by: A. E. Hills Postgraduate Scholarship; College of Medical and Dental Sciences; University of Birmingham

    The Journal of physiology 2018;596;15;3137-3148

  • The Human Cell Atlas: Technical approaches and challenges.

    Hon CC, Shin JW, Carninci P and Stubbington MJT

    RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan.

    The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.

    Briefings in functional genomics 2018;17;4;283-294

  • SLING: a tool to search for linked genes in bacterial datasets.

    Horesh G, Harms A, Fino C, Parts L, Gerdes K, Heinz E and Thomson NR

    Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Gene arrays and operons that encode functionally linked proteins form the most basic unit of transcriptional regulation in bacteria. Rules that govern the order and orientation of genes in these systems have been defined; however, these were based on a small set of genomes that may not be representative. The growing availability of large genomic datasets presents an opportunity to test these rules, to define the full range and diversity of these systems, and to understand their evolution. Here we present SLING, a tool to Search for LINked Genes by searching for a single functionally essential gene, along with its neighbours in a rule-defined proximity. Examining this subset of genes enables us to understand the basic diversity of these genetic systems in large datasets. We demonstrate the utility of SLING on a clinical collection of enteropathogenic Escherichia coli for two relevant operons: toxin antitoxin (TA) systems and RND efflux pumps. By examining the diversity of these systems, we gain insight on distinct classes of operons which present variable levels of prevalence and ability to be lost or gained. The importance of this analysis is not limited to TA systems and RND pumps, and can be expanded to understand the diversity of many other relevant gene arrays.

    Nucleic acids research 2018;46;21;e128

  • Identification of novel adenovirus genotype 90 in children from Bangladesh.

    Houldcroft CJ, Beale MA, Sayeed MA, Qadri F, Dougan G and Mutreja A

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Novel adenovirus genotypes are associated with outbreaks of disease, such as acute gastroenteritis, renal disease, upper respiratory tract infection and keratoconjunctivitis. Here, we identify novel and variant adenovirus genotypes in children coinfected with enterotoxigenic Escherichia coli, in Bangladesh. Metagenomic sequencing of stool was performed and whole adenovirus genomes were extracted. A novel species D virus, designated genotype 90 (P33H27F67) was identified, and the partial genome of a putative recombinant species B virus was recovered. Furthermore, the enteric types HAdV-A61 and HAdV-A40 were found in stool specimens. Knowledge of the diversity of adenovirus genomes circulating worldwide, especially in low-income countries where the burden of disease is high, will be required to ensure that future vaccination strategies cover the diversity of adenovirus strains associated with disease.

    Funded by: Department of Health; Wellcome Trust: 098051

    Microbial genomics 2018;4;10

  • DNA Methylation and Transcription Patterns in Intestinal Epithelial Cells From Pediatric Patients With Inflammatory Bowel Diseases Differentiate Disease Subtypes and Associate With Outcome.

    Howell KJ, Kraiczy J, Nayak KM, Gasparetto M, Ross A, Lee C, Mak TN, Koo BK, Kumar N, Lawley T, Sinha A, Rosenstiel P, Heuschkel R, Stegle O and Zilbauer M

    University Department of Paediatrics, University of Cambridge, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Background & aims: We analyzed DNA methylation patterns and transcriptomes of primary intestinal epithelial cells (IEC) of children newly diagnosed with inflammatory bowel diseases (IBD) to learn more about pathogenesis.

    Methods: We obtained mucosal biopsies (N = 236) collected from terminal ileum and ascending and sigmoid colons of children (median age 13 years) newly diagnosed with IBD (43 with Crohn's disease [CD], 23 with ulcerative colitis [UC]), and 30 children without IBD (controls). Patients were recruited and managed at a hospital in the United Kingdom from 2013 through 2016. We also obtained biopsies collected at later stages from a subset of patients. IECs were purified and analyzed for genome-wide DNA methylation patterns and gene expression profiles. Adjacent microbiota were isolated from biopsies and analyzed by 16S gene sequencing. We generated intestinal organoid cultures from a subset of samples and genome-wide DNA methylation analysis was performed.

    Results: We found gut segment-specific differences in DNA methylation and transcription profiles of IECs from children with IBD vs controls; some were independent of mucosal inflammation. Changes in gut microbiota between IBD and control groups were not as large and were difficult to assess because of large amounts of intra-individual variation. Only IECs from patients with CD had changes in DNA methylation and transcription patterns in terminal ileum epithelium, compared with controls. Colon epithelium from patients with CD and from patients with ulcerative colitis had distinct changes in DNA methylation and transcription patterns, compared with controls. In IECs from patients with IBD, changes in DNA methylation, compared with controls, were stable over time and were partially retained in ex-vivo organoid cultures. Statistical analyses of epithelial cell profiles allowed us to distinguish children with CD or UC from controls; profiles correlated with disease outcome parameters, such as the requirement for treatment with biologic agents.

    Conclusions: We identified specific changes in DNA methylation and transcriptome patterns in IECs from pediatric patients with IBD compared with controls. These data indicate that IECs undergo changes during IBD development and could be involved in pathogenesis. Further analyses of primary IECs from patients with IBD could improve our understanding of the large variations in disease progression and outcomes.

    Funded by: Medical Research Council: MC_PC_12009; Wellcome Trust

    Gastroenterology 2018;154;3;585-598

  • PPM1D Mutations Drive Clonal Hematopoiesis in Response to Cytotoxic Chemotherapy.

    Hsu JI, Dayaram T, Tovy A, De Braekeleer E, Jeong M, Wang F, Zhang J, Heffernan TP, Gera S, Kovacs JJ, Marszalek JR, Bristow C, Yan Y, Garcia-Manero G, Kantarjian H, Vassiliou G, Futreal PA, Donehower LA, Takahashi K and Goodell MA

    Translational Biology and Molecular Medicine Graduate Program and Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA; Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX 77030, USA.

    Clonal hematopoiesis (CH), in which stem cell clones dominate blood production, becomes increasingly common with age and can presage malignancy development. The conditions that promote ascendancy of particular clones are unclear. We found that mutations in PPM1D (protein phosphatase Mn²⁺/Mg²⁺-dependent 1D), a DNA damage response regulator that is frequently mutated in CH, were present in one-fifth of patients with therapy-related acute myeloid leukemia or myelodysplastic syndrome and strongly correlated with cisplatin exposure. Cell lines with hyperactive PPM1D mutations expand to outcompete normal cells after exposure to cytotoxic DNA damaging agents including cisplatin, and this effect was predominantly mediated by increased resistance to apoptosis. Moreover, heterozygous mutant Ppm1d hematopoietic cells outcompeted their wild-type counterparts in vivo after exposure to cisplatin and doxorubicin, but not during recovery from bone marrow transplantation. These findings establish the clinical relevance of PPM1D mutations in CH and the importance of studying mutation-treatment interactions.

    Funded by: Cancer Research UK: C22324/A23015; Medical Research Council: MC_PC_12009; NCI NIH HHS: P30 CA016672, P30 CA125123, R01 CA183252; NCRR NIH HHS: S10 RR024574; NHLBI NIH HHS: T32 HL092332; NIAID NIH HHS: P30 AI036211; NIDDK NIH HHS: F30 DK116428, R01 DK092883, R56 DK092883, T32 DK060445; Wellcome Trust: WT098051

    Cell stem cell 2018;23;5;700-713.e6

  • Transformation of Accessible Chromatin and 3D Nucleome Underlies Lineage Commitment of Early T Cells.

    Hu G, Cui K, Fang D, Hirose S, Wang X, Wangsa D, Jin W, Ried T, Liu P, Zhu J, Rothenberg EV and Zhao K

    Systems Biology Center, National Heart, Lung, and Blood Institute, NIH, Bethesda, MD 20892, USA.

    How chromatin reorganization coordinates differentiation and lineage commitment from hematopoietic stem and progenitor cells (HSPCs) to mature immune cells has not been well understood. Here, we carried out an integrative analysis of chromatin accessibility, topologically associating domains, AB compartments, and gene expression from HSPCs to CD4+CD8+ T cells. We found that abrupt genome-wide changes at all three levels of chromatin organization occur during the transition from double-negative stage 2 (DN2) to DN3, accompanying the T lineage commitment. The transcription factor BCL11B, a critical regulator of T cell commitment, is associated with increased chromatin interaction, and Bcl11b deletion compromised chromatin interaction at its target genes. We propose that these large-scale and concerted changes in chromatin organization present an energy barrier to prevent the cell from reversing its fate to earlier stages or redirecting to alternatives and thus lock the cell fate into the T lineages.

    Funded by: Intramural NIH HHS: Z01 HL005801-05; NIAID NIH HHS: R01 AI083514

    Immunity 2018;48;2;227-242.e8

  • Investigation of common, low-frequency and rare genome-wide variation in anorexia nervosa.

    Huckins LM, Hatzikotoulas K, Southam L, Thornton LM, Steinberg J, Aguilera-McKay F, Treasure J, Schmidt U, Gunasinghe C, Romero A, Curtis C, Rhodes D, Moens J, Kalsi G, Dempster D, Leung R, Keohane A, Burghardt R, Ehrlich S, Hebebrand J, Hinney A, Ludolph A, Walton E, Deloukas P, Hofman A, Palotie A, Palta P, van Rooij FJA, Stirrups K, Adan R, Boni C, Cone R, Dedoussis G, van Furth E, Gonidakis F, Gorwood P, Hudson J, Kaprio J, Kas M, Keski-Rahkonen A, Kiezebrink K, Knudsen GP, Slof-Op 't Landt MCT, Maj M, Monteleone AM, Monteleone P, Raevuori AH, Reichborn-Kjennerud T, Tozzi F, Tsitsika A, van Elburg A, Eating Disorder Working Group of the Psychiatric Genomics Consortium, Collier DA, Sullivan PF, Breen G, Bulik CM and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Anorexia nervosa (AN) is a complex neuropsychiatric disorder presenting with dangerously low body weight, and a deep and persistent fear of gaining weight. To date, only one genome-wide significant locus associated with AN has been identified. We performed an exome-chip-based genome-wide association study (GWAS) in 2158 cases from nine populations of European origin and 15 485 ancestrally matched controls. Unlike previous studies, this GWAS also probed association in low-frequency and rare variants. Sixteen independent variants were taken forward for in silico and de novo replication (11 common and 5 rare). No findings reached genome-wide significance. Two notable common variants were identified: rs10791286, an intronic variant in OPCML (P=9.89 × 10⁻⁶), and rs7700147, an intergenic variant (P=2.93 × 10⁻⁵). No low-frequency variant associations were identified at genome-wide significance, although the study was well-powered to detect low-frequency variants with large effect sizes, suggesting that there may be no AN loci in this genomic search space with large effect sizes.

    Funded by: Department of Health; Medical Research Council: MR/J500355/1; NCI NIH HHS: P50 CA093459, P50 CA097007, R01 CA133996; NIA NIH HHS: P01 AG002219, P50 AG005138; NIEHS NIH HHS: R01 ES011740; NIMH NIH HHS: HHSN271201300031C, K01 MH109782, P50 MH066392, P50 MH084053, R01 MH075916, R01 MH080405, R01 MH085542, R01 MH093725, R01 MH097276, R37 MH057881, U01 MH109528; Wellcome Trust: 098051, WT088827/Z/09

    Molecular psychiatry 2018;23;5;1169-1180

  • Novel viral vectors in infectious diseases.

    Humphreys IR and Sebastian S

    Institute of Infection and Immunity/Systems Immunity University Research Institute, Cardiff University, Cardiff, UK.

    Since the development of vaccinia virus as a vaccine vector in 1984, the utility of numerous viruses in vaccination strategies has been explored. In recent years, key improvements to existing vectors such as those based on adenovirus have led to significant improvements in immunogenicity and efficacy. Furthermore, exciting new vectors that exploit viruses such as cytomegalovirus (CMV) and vesicular stomatitis virus (VSV) have emerged. Herein, we summarize these recent developments in viral vector technologies, focusing on novel vectors based on CMV, VSV, measles and modified adenovirus. We discuss the potential utility of these exciting approaches in eliciting protection against infectious diseases.

    Funded by: Wellcome Trust

    Immunology 2018;153;1;1-9

  • CADM1 is essential for KSHV-encoded vGPCR- and vFLIP-mediated chronic NF-κB activation.

    Hunte R, Alonso P, Thomas R, Bazile CA, Ramos JC, van der Weyden L, Dominguez-Bendala J, Khan WN and Shembade N

    Department of Microbiology and Immunology, Viral Oncology Program, Sylvester Comprehensive Cancer Center, Miller School of Medicine, The University of Miami, Miami, FL, United States of America.

    Approximately 12% of all human cancers worldwide are caused by infections with oncogenic viruses. Kaposi's sarcoma herpesvirus/human herpesvirus 8 (KSHV/HHV8) is one of the oncogenic viruses responsible for human cancers, including Kaposi's sarcoma (KS), Primary Effusion Lymphoma (PEL), and the lymphoproliferative disorder multicentric Castleman's disease (MCD). Chronic inflammation mediated by KSHV infection plays a decisive role in the development and survival of these cancers. NF-κB, a family of transcription factors regulating inflammation, cell survival, and proliferation, is persistently activated in KSHV-infected cells. The KSHV latent and lytic expressing oncogenes involved in NF-κB activation are vFLIP/K13 and vGPCR, respectively. However, the mechanisms by which NF-κB is activated by vFLIP and vGPCR are poorly understood. In this study, we have found that a host molecule, Cell Adhesion Molecule 1 (CADM1), is robustly upregulated in KSHV-infected PBMCs and KSHV-associated PEL cells. Further investigation determined that both vFLIP and vGPCR interacted with CADM1. The PDZ binding motif localized at the carboxyl terminus of CADM1 is essential for both vGPCR and vFLIP to maintain chronic NF-κB activation. Membrane lipid raft associated CADM1 interaction with vFLIP is critical for the initiation of IKK kinase complex and NF-κB activation in the PEL cells. In addition, CADM1 played essential roles in the survival of KSHV-associated PEL cells. These data indicate that CADM1 plays key roles in the activation of NF-κB pathways during latent and lytic phases of the KSHV life cycle and the survival of KSHV-infected cells.

    Funded by: NIH HHS: P30AI073961, R01CA223232

    PLoS pathogens 2018;14;4;e1006968

  • Defining murine organogenesis at single-cell resolution reveals a role for the leukotriene pathway in regulating blood progenitor formation.

    Ibarra-Soria X, Jawaid W, Pijuan-Sala B, Ladopoulos V, Scialdone A, Jörg DJ, Tyser RCV, Calero-Nieto FJ, Mulas C, Nichols J, Vallier L, Srinivas S, Simons BD, Göttgens B and Marioni JC

    Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.

    During gastrulation, cell types from all three germ layers are specified and the basic body plan is established<sup>1</sup>. However, molecular analysis of this key developmental stage has been hampered by limited cell numbers and a paucity of markers. Single-cell RNA sequencing circumvents these problems, but has so far been limited to specific organ systems<sup>2</sup>. Here, we report single-cell transcriptomic characterization of >20,000 cells immediately following gastrulation at E8.25 of mouse development. We identify 20 major cell types, which frequently contain substructure, including three distinct signatures in early foregut cells. Pseudo-space ordering of somitic progenitor cells identifies dynamic waves of transcription and candidate regulators, which are validated by molecular characterization of spatially resolved regions of the embryo. Within the endothelial population, cells that transition from haemogenic endothelial to erythro-myeloid progenitors specifically express Alox5 and its co-factor Alox5ap, which control leukotriene production. Functional assays using mouse embryonic stem cells demonstrate that leukotrienes promote haematopoietic progenitor cell generation. Thus, this comprehensive single-cell map can be exploited to reveal previously unrecognized pathways that contribute to tissue development.

    Funded by: Bloodwise: 12029; Medical Research Council: G0900951, MC_PC_12009, MR/M008975/1; NIDDK NIH HHS: R24 DK106766; Wellcome Trust: 105031REIK, 206328

    Nature cell biology 2018;20;2;127-134

  • Physiological and Genetic Adaptations to Diving in Sea Nomads.

    Ilardo MA, Moltke I, Korneliussen TS, Cheng J, Stern AJ, Racimo F, de Barros Damgaard P, Sikora M, Seguin-Orlando A, Rasmussen S, van den Munckhof ICL, Ter Horst R, Joosten LAB, Netea MG, Salingkat S, Nielsen R and Willerslev E

    Centre for GeoGenetics, University of Copenhagen, Copenhagen 1350, Denmark.

    Understanding the physiology and genetics of human hypoxia tolerance has important medical implications, but this phenomenon has thus far only been investigated in high-altitude human populations. Another system, yet to be explored, is humans who engage in breath-hold diving. The indigenous Bajau people ("Sea Nomads") of Southeast Asia live a subsistence lifestyle based on breath-hold diving and are renowned for their extraordinary breath-holding abilities. However, it is unknown whether this has a genetic basis. Using a comparative genomic study, we show that natural selection on genetic variants in the PDE10A gene has increased spleen size in the Bajau, providing them with a larger reservoir of oxygenated red blood cells. We also find evidence of strong selection specific to the Bajau on BDKRB2, a gene affecting the human diving reflex. Thus, the Bajau, and possibly other diving populations, provide a new opportunity to study human adaptation to hypoxia tolerance.

    Cell 2018;173;3;569-580.e15

  • The genomic landscape of cutaneous SCC reveals drivers and a novel azathioprine associated mutational signature.

    Inman GJ, Wang J, Nagano A, Alexandrov LB, Purdie KJ, Taylor RG, Sherwood V, Thomson J, Hogan S, Spender LC, South AP, Stratton M, Chelala C, Harwood CA, Proby CM and Leigh IM

    Division of Cancer Research, Jacqui Wood Cancer Centre, School of Medicine, University of Dundee, Dundee, DD1 9SY, UK.

    Cutaneous squamous cell carcinoma (cSCC) has a high tumour mutational burden (50 mutations per megabase DNA pair). Here, we combine whole-exome analyses from 40 primary cSCC tumours, comprising 20 well-differentiated and 20 moderately/poorly differentiated tumours, with accompanying clinical data from a longitudinal study of immunosuppressed and immunocompetent patients and integrate this analysis with independent gene expression studies. We identify commonly mutated genes, copy number changes and altered pathways and processes. Comparisons with tumour differentiation status suggest events which may drive disease progression. Mutational signature analysis reveals the presence of a novel signature (signature 32), whose incidence correlates with chronic exposure to the immunosuppressive drug azathioprine. Characterisation of a panel of 15 cSCC tumour-derived cell lines reveals that they accurately reflect the mutational signatures and genomic alterations of primary tumours and provide a valuable resource for the validation of tumour drivers and therapeutic targets.

    Funded by: Cancer Research UK: 13044; Cancer Research UK (CRUK): A13044; European Research Council: 250170

    Nature communications 2018;9;1;3667

  • Genomic Risk Prediction of Coronary Artery Disease in 480,000 Adults: Implications for Primary Prevention.

    Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, Lai FY, Kaptoge S, Brozynska M, Wang T, Ye S, Webb TR, Rutter MK, Tzoulaki I, Patel RS, Loos RJF, Keavney B, Hemingway H, Thompson J, Watkins H, Deloukas P, Di Angelantonio E, Butterworth AS, Danesh J, Samani NJ and UK Biobank CardioMetabolic Consortium CHD Working Group

    Cambridge Baker Systems Genomics Initiative, Melbourne, Victoria, Australia, and Cambridge, United Kingdom; Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; Department of Clinical Pathology and School of BioSciences, University of Melbourne, Parkville, Victoria, Australia; The Alan Turing Institute, London, United Kingdom.

    Background: Coronary artery disease (CAD) has substantial heritability and a polygenic architecture. However, the potential of genomic risk scores to help predict CAD outcomes has not been evaluated comprehensively, because available studies have involved limited genomic scope and limited sample sizes.

    Objectives: This study sought to construct a genomic risk score for CAD and to estimate its potential as a screening tool for primary prevention.

    Methods: Using a meta-analytic approach to combine large-scale, genome-wide, and targeted genetic association data, we developed a new genomic risk score for CAD (metaGRS) consisting of 1.7 million genetic variants. We externally tested metaGRS, both by itself and in combination with available data on conventional risk factors, in 22,242 CAD cases and 460,387 noncases from the UK Biobank.

    Results: The hazard ratio (HR) for CAD was 1.71 (95% confidence interval [CI]: 1.68 to 1.73) per SD increase in metaGRS, an association larger than any other externally tested genetic risk score previously published. The metaGRS stratified individuals into significantly different life course trajectories of CAD risk, with those in the top 20% of metaGRS distribution having an HR of 4.17 (95% CI: 3.97 to 4.38) compared with those in the bottom 20%. The corresponding HR was 2.83 (95% CI: 2.61 to 3.07) among individuals on lipid-lowering or antihypertensive medications. The metaGRS had a higher C-index (C = 0.623; 95% CI: 0.615 to 0.631) for incident CAD than any of 6 conventional factors (smoking, diabetes, hypertension, body mass index, self-reported high cholesterol, and family history). For men in the top 20% of metaGRS with >2 conventional factors, 10% cumulative risk of CAD was reached by 48 years of age.

    Conclusions: The genomic score developed and evaluated here substantially advances the concept of using genomic information to stratify individuals with different trajectories of CAD risk and highlights the potential for genomic screening in early life to complement conventional risk prediction.

    Funded by: British Heart Foundation: RG/13/13/30194, RG/14/5/30893, RG/15/12/31616; Medical Research Council: MC_PC_17228, MC_QA137853, MR/K006584/1, MR/L003120/1

    Journal of the American College of Cardiology 2018;72;16;1883-1893

  • Comparative genomics of the major parasitic worms.

    International Helminth Genomes Consortium

    Parasitic nematodes (roundworms) and platyhelminths (flatworms) cause debilitating chronic infections of humans and animals, decimate crop production and are a major impediment to socioeconomic development. Here we report a broad comparative study of 81 genomes of parasitic and non-parasitic worms. We have identified gene family births and hundreds of expanded gene families at key nodes in the phylogeny that are relevant to parasitism. Examples include gene families that modulate host immune responses, enable parasite migration through host tissues or allow the parasite to feed. We reveal extensive lineage-specific differences in core metabolism and protein families historically targeted for drug development. From an in silico screen, we have identified and prioritized new potential drug targets and compounds for testing. This comparative genomics resource provides a much-needed boost for the research community to understand and combat parasitic worms.

    Funded by: Biotechnology and Biological Sciences Research Council: REI18431; Medical Research Council: MR/L001020/1, MR/S000453/1; NHGRI NIH HHS: U54 HG003079; NIAID NIH HHS: K22 AI125473, R01 AI081803, R21 AI126466; NIGMS NIH HHS: R01 GM097435; Wellcome Trust

    Nature genetics 2018;51;1;163-174

  • Genome-wide mega-analysis identifies 16 loci and highlights diverse biological mechanisms in the common epilepsies.

    International League Against Epilepsy Consortium on Complex Epilepsies

    The epilepsies affect around 65 million people worldwide and have a substantial missing heritability component. We report a genome-wide mega-analysis involving 15,212 individuals with epilepsy and 29,677 controls, which reveals 16 genome-wide significant loci, of which 11 are novel. Using various prioritization criteria, we pinpoint the 21 most likely epilepsy genes at these loci, with the majority in genetic generalized epilepsies. These genes have diverse biological functions, including coding for ion-channel subunits, transcription factors and a vitamin-B6 metabolism enzyme. Converging evidence shows that the common variants associated with epilepsy play a role in epigenetic regulation of gene expression in the brain. The results show an enrichment for monogenic epilepsy genes as well as known targets of antiepileptic drugs. Using SNP-based heritability analyses we disentangle both the unique and overlapping genetic basis to seven different epilepsy subtypes. Together, these findings provide leads for epilepsy therapies based on underlying pathophysiology.

    Funded by: Medical Research Council: G0800637

    Nature communications 2018;9;1;5269

  • Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting.

    Iorio F, Behan FM, Gonçalves E, Bhosle SG, Chen E, Shepherd R, Beaver C, Ansari R, Pooley R, Wilkinson P, Harper S, Butler AP, Stronach EA, Saez-Rodriguez J, Yusa K and Garnett MJ

    European Molecular Biology Laboratory - European Bioinformatics Institute, Cambridge, UK.

    Background: Genome editing by CRISPR-Cas9 technology allows large-scale screening of gene essentiality in cancer. A confounding factor when interpreting CRISPR-Cas9 screens is the high false-positive rate in detecting essential genes within copy number amplified regions of the genome. We have developed the computational tool CRISPRcleanR which is capable of identifying and correcting gene-independent responses to CRISPR-Cas9 targeting. CRISPRcleanR uses an unsupervised approach based on the segmentation of single-guide RNA fold change values across the genome, without making any assumption about the copy number status of the targeted genes.

    Results: Applying our method to existing and newly generated genome-wide essentiality profiles from 15 cancer cell lines, we demonstrate that CRISPRcleanR reduces false positives when calling essential genes, correcting biases within and outside of amplified regions, while maintaining true positive rates. Established cancer dependencies and essentiality signals of amplified cancer driver genes are detectable post-correction. CRISPRcleanR reports sgRNA fold changes and normalised read counts, is therefore compatible with downstream analysis tools, and works with multiple sgRNA libraries.

    Conclusions: CRISPRcleanR is a versatile open-source tool for the analysis of CRISPR-Cas9 knockout screens to identify essential genes.

    Funded by: Cancer Research UK: C44943/A22536, SU2C-AACR-DT1213; Open Targets: 015; Wellcome Trust (GB): 102696

    BMC genomics 2018;19;1;604

  • Pathway-based dissection of the genomic heterogeneity of cancer hallmarks' acquisition with SLAPenrich.

    Iorio F, Garcia-Alonso L, Brammeld JS, Martincorena I, Wille DR, McDermott U and Saez-Rodriguez J

    European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD, UK.

    Cancer hallmarks are evolutionary traits required by a tumour to develop. While extensively characterised, the way these traits are achieved through the accumulation of somatic mutations in key biological pathways is not fully understood. To shed light on this subject, we characterised the landscape of pathway alterations associated with somatic mutations observed in 4,415 patients across ten cancer types, using 374 orthogonal pathway gene-sets mapped onto canonical cancer hallmarks. Towards this end, we developed SLAPenrich: a computational method based on population-level statistics, freely available as an open source R package. Assembling the identified pathway alterations into sets of hallmark signatures allowed us to connect somatic mutations to clinically interpretable cancer mechanisms. Further, we explored the heterogeneity of these signatures, in terms of ratio of altered pathways associated with each individual hallmark, assuming that this is reflective of the extent of selective advantage provided to the cancer type under consideration. Our analysis revealed the predominance of certain hallmarks in specific cancer types, thus suggesting different evolutionary trajectories across cancer lineages. Finally, although many pathway alteration enrichments are guided by somatic mutations in frequently altered high-confidence cancer genes, excluding these driver mutations preserves the hallmark heterogeneity signatures, thus the detected hallmarks' predominance across cancer types. As a consequence, we propose the hallmark signatures as a ground truth to characterise tails of infrequent genomic alterations and identify potential novel cancer driver genes and networks.

    Scientific reports 2018;8;1;6713

  • Cancer-mutation network and the number and specificity of driver mutations.

    Iranzo J, Martincorena I and Koonin EV

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894;

    Cancer genomics has produced extensive information on cancer-associated genes, but the number and specificity of cancer-driver mutations remains a matter of debate. We constructed a bipartite network in which 7,665 tumors from 30 cancer types are connected via shared mutations in 198 previously identified cancer genes. We show that about 27% of the tumors can be assigned to statistically supported modules, most of which encompass one or two cancer types. The rest of the tumors belong to a diffuse network component suggesting lower gene specificity of driver mutations. Linear regression of the mutational loads in cancer genes was used to estimate the number of drivers required for the onset of different cancers. The mean number of drivers in known cancer genes is approximately two, with a range of one to five. Cancers that are associated with modules had more drivers than those from the diffuse network component, suggesting that unidentified and/or interchangeable drivers exist in the latter.

    Proceedings of the National Academy of Sciences of the United States of America 2018;115;26;E6010-E6019

  • Surveying what's flushed away.

    Iraola G and Kumar N

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Nature reviews. Microbiology 2018;16;8;456

  • Genome-wide transcriptional analyses in Anopheles mosquitoes reveal an unexpected association between salivary gland gene expression and insecticide resistance.

    Isaacs AT, Mawejje HD, Tomlinson S, Rigden DJ and Donnelly MJ

    Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK.

    Background: To combat malaria transmission, the Ugandan government has embarked upon an ambitious programme of indoor residual spraying (IRS) with a carbamate class insecticide, bendiocarb. In preparation for this campaign, we characterized bendiocarb resistance and associated transcriptional variation among Anopheles gambiae s.s. mosquitoes from two sites in Uganda.

    Results: Gene expression in two mosquito populations displaying some resistance to bendiocarb (95% and 79% An. gambiae s.l. WHO tube bioassay mortality in Nagongera and Kihihi, respectively) was investigated using whole-genome microarrays. Significant overexpression of several genes encoding salivary gland proteins, including D7r2 and D7r4, was detected in mosquitoes from Nagongera. In Kihihi, D7r4, two detoxification-associated genes (Cyp6m2 and Gstd3) and an epithelial serine protease were among the genes most highly overexpressed in resistant mosquitoes. Following the first round of IRS in Nagongera, bendiocarb-resistant mosquitoes were collected, and real-time quantitative PCR analyses detected significant overexpression of D7r2 and D7r4 in resistant mosquitoes. A single nucleotide polymorphism located in a non-coding transcript downstream of the D7 genes was found at a significantly higher frequency in resistant individuals. In silico modelling of the interaction between D7r4 and bendiocarb demonstrated similarity between the insecticide and serotonin, a known ligand of D7 proteins. A meta-analysis of published microarray studies revealed a recurring association between D7 expression and insecticide resistance across Anopheles species and locations.

    Conclusions: A whole-genome microarray approach identified an association between novel insecticide resistance candidates and bendiocarb resistance in Uganda. In addition, a single nucleotide polymorphism associated with this resistance mechanism was discovered. The use of such impartial screening methods allows for discovery of resistance candidates that have no previously-ascribed function in insecticide binding or detoxification. Characterizing these novel candidates will broaden our understanding of resistance mechanisms and yield new strategies for combatting widespread insecticide resistance among malaria vectors.

    Funded by: NIAID NIH HHS: U19 AI089674; National Institute of Allergy and Infectious Diseases: U19AI089674

    BMC genomics 2018;19;1;225

  • No unexpected CRISPR-Cas9 off-target activity revealed by trio sequencing of gene-edited mice.

    Iyer V, Boroviak K, Thomas M, Doe B, Riva L, Ryder E and Adams DJ

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

    CRISPR-Cas9 technologies have transformed genome-editing of experimental organisms and have immense therapeutic potential. Despite significant advances in our understanding of the CRISPR-Cas9 system, concerns remain over the potential for off-target effects. Recent studies have addressed these concerns using whole-genome sequencing (WGS) of gene-edited embryos or animals to search for de novo mutations (DNMs), which may represent candidate changes introduced by poor editing fidelity. Critically, these studies used strain-matched, but not pedigree-matched controls and thus were unable to reliably distinguish generational or colony-related differences from true DNMs. Here we used a trio design and whole genome sequenced 8 parents and 19 embryos, where 10 of the embryos were mutagenised with well-characterised gRNAs targeting the coat colour Tyrosinase (Tyr) locus. Detailed analyses of these whole genome data allowed us to conclude that if CRISPR mutagenesis were causing SNV or indel off-target mutations in treated embryos, then the number of these mutations is not statistically distinguishable from the background rate of DNMs occurring due to other processes.

    Funded by: Wellcome Trust

    PLoS genetics 2018;14;7;e1007503

  • Meta-analysis of exome array data identifies six novel genetic loci for lung function.

    Jackson VE, Latourelle JC, Wain LV, Smith AV, Grove ML, Bartz TM, Obeidat M, Province MA, Gao W, Qaiser B, Porteous DJ, Cassano PA, Ahluwalia TS, Grarup N, Li J, Altmaier E, Marten J, Harris SE, Manichaikul A, Pottinger TD, Li-Gao R, Lind-Thomsen A, Mahajan A, Lahousse L, Imboden M, Teumer A, Prins B, Lyytikäinen LP, Eiriksdottir G, Franceschini N, Sitlani CM, Brody JA, Bossé Y, Timens W, Kraja A, Loukola A, Tang W, Liu Y, Bork-Jensen J, Justesen JM, Linneberg A, Lange LA, Rawal R, Karrasch S, Huffman JE, Smith BH, Davies G, Burkart KM, Mychaleckyj JC, Bonten TN, Enroth S, Lind L, Brusselle GG, Kumar A, Stubbe B, Understanding Society Scientific Group, Kähönen M, Wyss AB, Psaty BM, Heckbert SR, Hao K, Rantanen T, Kritchevsky SB, Lohman K, Skaaby T, Pisinger C, Hansen T, Schulz H, Polasek O, Campbell A, Starr JM, Rich SS, Mook-Kanamori DO, Johansson Å, Ingelsson E, Uitterlinden AG, Weiss S, Raitakari OT, Gudnason V, North KE, Gharib SA, Sin DD, Taylor KD, O'Connor GT, Kaprio J, Harris TB, Pederson O, Vestergaard H, Wilson JG, Strauch K, Hayward C, Kerr S, Deary IJ, Barr RG, de Mutsert R, Gyllensten U, Morris AP, Ikram MA, Probst-Hensch N, Gläser S, Zeggini E, Lehtimäki T, Strachan DP, Dupuis J, Morrison AC, Hall IP, Tobin MD and London SJ

    Department of Health Sciences, University of Leicester, Leicester, UK.

    <b>Background:</b> Over 90 regions of the genome have been associated with lung function to date, many of which have also been implicated in chronic obstructive pulmonary disease. <b>Methods:</b> We carried out meta-analyses of exome array data and three lung function measures: forced expiratory volume in one second (FEV<sub>1</sub>), forced vital capacity (FVC) and the ratio of FEV<sub>1</sub> to FVC (FEV<sub>1</sub>/FVC). These analyses by the SpiroMeta and CHARGE consortia included 60,749 individuals of European ancestry from 23 studies, and 7,721 individuals of African ancestry from 5 studies in the discovery stage, with follow-up in up to 111,556 independent individuals. <b>Results:</b> We identified significant (P<2.8×10<sup>-7</sup>) associations with six SNPs: a nonsynonymous variant in <i>RPAP1</i>, which is predicted to be damaging, three intronic SNPs (<i>SEC24C, CASC17</i> and <i>UQCC1</i>) and two intergenic SNPs near to <i>LY86</i> and <i>FGF10</i>. Expression quantitative trait loci analyses found evidence for regulation of gene expression at three signals and implicated several genes, including <i>TYRO3</i> and <i>PLAU</i>. <b>Conclusions:</b> Further interrogation of these loci could provide greater understanding of the determinants of lung function and pulmonary disease.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Medical Research Council: MC_UU_00007/10, MR/K026992/1, MR/N011317/1; NHLBI NIH HHS: R01 HL105756, U01 HL130114

    Wellcome open research 2018;3;4

  • Schistosoma mansoni infection is associated with quantitative and qualitative modifications of the mammalian intestinal microbiota.

    Jenkins TP, Peachey LE, Ajami NJ, MacDonald AS, Hsieh MH, Brindley PJ, Cantacessi C and Rinaldi G

    Department of Veterinary Medicine, University of Cambridge, Cambridge, CB3 0ES, UK.

    In spite of the extensive contribution of intestinal pathology to the pathophysiology of schistosomiasis, little is known of the impact of schistosome infection on the composition of the gut microbiota of its mammalian host. Here, we characterised the fluctuations in the composition of the gut microbial flora of the small and large intestine, as well as the changes in abundance of individual microbial species, of mice experimentally infected with Schistosoma mansoni with the goal of identifying microbial taxa with potential roles in the pathophysiology of infection and disease. Bioinformatic analyses of bacterial 16S rRNA gene data revealed an overall reduction in gut microbial alpha diversity, alongside a significant increase in microbial beta diversity characterised by expanded populations of Akkermansia muciniphila (phylum Verrucomicrobia) and lactobacilli, in the gut microbiota of S. mansoni-infected mice when compared to uninfected control animals. These data support a role of the mammalian gut microbiota in the pathogenesis of hepato-intestinal schistosomiasis and serve as a foundation for the design of mechanistic studies to unravel the complex relationships amongst parasitic helminths, gut microbiota, pathophysiology of infection and host immunity.

    Funded by: Biotechnology and Biological Sciences Research Council; HHSN272201000005I; NIAID NIH HHS: R01 AI072773, R21 AI109532

    Scientific reports 2018;8;1;12072

  • Single cell RNA-seq and ATAC-seq analysis of cardiac progenitor cell transition states and lineage settlement.

    Jia G, Preussner J, Chen X, Guenther S, Yuan X, Yekelchyk M, Kuenne C, Looso M, Zhou Y, Teichmann S and Braun T

    Department of Cardiac Development and Remodeling, Max Planck Institute for Heart and Lung Research, 61231, Bad Nauheim, Germany.

    Formation and segregation of cell lineages forming the heart have been studied extensively but the underlying gene regulatory networks and epigenetic changes driving cell fate transitions during early cardiogenesis are still only partially understood. Here, we comprehensively characterize mouse cardiac progenitor cells (CPCs) marked by Nkx2-5 and Isl1 expression from E7.5 to E9.5 using single-cell RNA sequencing and transposase-accessible chromatin profiling (ATAC-seq). By leveraging cell-to-cell transcriptome and chromatin accessibility heterogeneity, we identify different previously unknown cardiac subpopulations. Reconstruction of developmental trajectories reveals that multipotent Isl1<sup>+</sup> CPCs pass through an attractor state before separating into different developmental branches, whereas extended expression of Nkx2-5 commits CPCs to a unidirectional cardiomyocyte fate. Furthermore, we show that CPC fate transitions are associated with distinct open chromatin states critically depending on Isl1 and Nkx2-5. Our data provide a model of transcriptional and epigenetic regulations during cardiac progenitor cell fate decisions at single-cell resolution.

    Nature communications 2018;9;1;4877

  • Genome-wide association study in 79,366 European-ancestry individuals informs the genetic architecture of 25-hydroxyvitamin D levels.

    Jiang X, O'Reilly PF, Aschard H, Hsu YH, Richards JB, Dupuis J, Ingelsson E, Karasik D, Pilz S, Berry D, Kestenbaum B, Zheng J, Luan J, Sofianopoulou E, Streeten EA, Albanes D, Lutsey PL, Yao L, Tang W, Econs MJ, Wallaschofski H, Völzke H, Zhou A, Power C, McCarthy MI, Michos ED, Boerwinkle E, Weinstein SJ, Freedman ND, Huang WY, Van Schoor NM, van der Velde N, Groot LCPGM, Enneman A, Cupples LA, Booth SL, Vasan RS, Liu CT, Zhou Y, Ripatti S, Ohlsson C, Vandenput L, Lorentzon M, Eriksson JG, Shea MK, Houston DK, Kritchevsky SB, Liu Y, Lohman KK, Ferrucci L, Peacock M, Gieger C, Beekman M, Slagboom E, Deelen J, Heemst DV, Kleber ME, März W, de Boer IH, Wood AC, Rotter JI, Rich SS, Robinson-Cohen C, den Heijer M, Jarvelin MR, Cavadino A, Joshi PK, Wilson JF, Hayward C, Lind L, Michaëlsson K, Trompet S, Zillikens MC, Uitterlinden AG, Rivadeneira F, Broer L, Zgaga L, Campbell H, Theodoratou E, Farrington SM, Timofeeva M, Dunlop MG, Valdes AM, Tikkanen E, Lehtimäki T, Lyytikäinen LP, Kähönen M, Raitakari OT, Mikkilä V, Ikram MA, Sattar N, Jukema JW, Wareham NJ, Langenberg C, Forouhi NG, Gundersen TE, Khaw KT, Butterworth AS, Danesh J, Spector T, Wang TJ, Hyppönen E, Kraft P and Kiel DP

    Program in Genetic Epidemiology and Statistical Genetics. Department of Epidemiology, Harvard T.H.Chan School of Public Health, 677 Huntington Avenue, Boston, 02115, MA, USA.

    Vitamin D is a steroid hormone precursor that is associated with a range of human traits and diseases. Previous GWAS of serum 25-hydroxyvitamin D concentrations have identified four genome-wide significant loci (GC, NADSYN1/DHCR7, CYP2R1, CYP24A1). In this study, we expand the previous SUNLIGHT Consortium GWAS discovery sample size from 16,125 to 79,366 (all European descent). This larger GWAS yields two additional loci harboring genome-wide significant variants (P = 4.7×10<sup>-9</sup> at rs8018720 in SEC23A, and P = 1.9×10<sup>-14</sup> at rs10745742 in AMDHD1). The overall estimate of heritability of 25-hydroxyvitamin D serum concentrations attributable to GWAS common SNPs is 7.5%, with statistically significant loci explaining 38% of this total. Further investigation identifies signal enrichment in immune and hematopoietic tissues, and clustering with autoimmune diseases in cell-type-specific analysis. Larger studies are required to identify additional common SNPs, and to explore the role of rare or structural variants and gene-gene interactions in the heritability of circulating 25-hydroxyvitamin D levels.

    Funded by: British Heart Foundation: PG/09/023/26806, RG/08/014/24067; Cancer Research UK: 12076, 14136; Department of Health: PHCS/C4/4/016; Medical Research Council: G0401527, G0600237, G0600329, G0601653, G1000143, G1001799, MC_PC_U127527198, MC_PC_U127561128, MC_U127527198, MC_UU_00007/1, MC_UU_12015/1, MC_UU_12015/5, MR/K018647/1, MR/L003120/1, MR/N003284/1, MR/N01104X/1, MR/N01104X/2, MR/N015746/1; NCATS NIH HHS: UL1 TR001881; NIAMS NIH HHS: R01 AR041398, R01 AR072199; NIDDK NIH HHS: K01 DK109019, P30 DK063491; Worldwide Cancer Research: 12-1087

    Nature communications 2018;9;1;260

  • Comparison of Salmonella enterica Serovars Typhi and Typhimurium Reveals Typhoidal Serovar-Specific Responses to Bile.

    Johnson R, Ravenhall M, Pickard D, Dougan G, Byrne A and Frankel G

    MRC Centre for Molecular Bacteriology and Infection, Department of Life Sciences, Imperial College London, London, United Kingdom.

    <i>Salmonella enterica</i> serovars Typhi and Typhimurium cause typhoid fever and gastroenteritis, respectively. A unique feature of typhoid infection is asymptomatic carriage within the gallbladder, which is linked with <i>S.</i> Typhi transmission. Despite this, <i>S.</i> Typhi responses to bile have been poorly studied. Transcriptome sequencing (RNA-Seq) of <i>S.</i> Typhi Ty2 and a clinical <i>S.</i> Typhi isolate belonging to the globally dominant H58 lineage (strain 129-0238), as well as <i>S.</i> Typhimurium 14028, revealed that 249, 389, and 453 genes, respectively, were differentially expressed in the presence of 3% bile compared to control cultures lacking bile. <i>fad</i> genes, the <i>actP-acs</i> operon, and putative sialic acid uptake and metabolism genes (t1787 to t1790) were upregulated in all strains following bile exposure, which may represent adaptation to the small intestine environment. Genes within the <i>Salmonella</i> pathogenicity island 1 (SPI-1), those encoding a type III secretion system (T3SS), and motility genes were significantly upregulated in both <i>S.</i> Typhi strains in bile but downregulated in <i>S.</i> Typhimurium. Western blots of the SPI-1 proteins SipC, SipD, SopB, and SopE validated the gene expression data. Consistent with this, bile significantly increased <i>S.</i> Typhi HeLa cell invasion, while <i>S.</i> Typhimurium invasion was significantly repressed. Protein stability assays demonstrated that in <i>S.</i> Typhi the half-life of HilD, the dominant regulator of SPI-1, is three times longer in the presence of bile; this increase in stability was independent of the acetyltransferase Pat. Overall, we found that <i>S.</i> Typhi exhibits a specific response to bile, especially with regard to virulence gene expression, which could impact pathogenesis and transmission.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/J014567/1; Medical Research Council: MR/J006874/1; Wellcome Trust

    Infection and immunity 2018;86;3

  • COSMIC-3D provides structural perspectives on cancer genetics for drug discovery.

    Jubb HC, Saini HK, Verdonk ML and Forbes SA

    COSMIC, Wellcome Sanger Institute, Cambridge, UK.

    Funded by: Wellcome Trust

    Nature genetics 2018;50;9;1200-1202

  • Advances in the generation of bioengineered bile ducts.

    Justin AW, Saeb-Parsy K, Markaki AE, Vallier L and Sampaziotis F

    Department of Engineering, University of Cambridge, Cambridge, UK.

    The generation of bioengineered biliary tissue could contribute to the management of some of the most impactful cholangiopathies associated with liver transplantation, such as biliary atresia or ischemic cholangiopathy. Recent advances in tissue engineering and in vitro cholangiocyte culture have made the achievement of this goal possible. Here we provide an overview of these developments and review the progress towards the generation and transplantation of bioengineered bile ducts. This article is part of a Special Issue entitled: Cholangiocytes in Health and Disease edited by Jesus Banales, Marco Marzioni and Peter Jansen.

    Funded by: Medical Research Council: MC_PC_12009, MR/L016761/1

    Biochimica et biophysica acta. Molecular basis of disease 2018;1864;4 Pt B;1532-1538

  • Reply to Dookie et al., "Whole-Genome Sequencing To Guide the Selection of Treatment for Drug-Resistant Tuberculosis".

    Köser CU, Heyckendorf J, Andres S, Olaru ID, Schön T, Sturegård E, Beckert P, Schleusener V, Kohl TA, Hillemann D, Moradigaravand D, Parkhill J, Peacock SJ, Niemann S, Lange C and Merker M

    Department of Genetics, University of Cambridge, Cambridge, United Kingdom.

    Funded by: Department of Health: HICF-T5-342; Wellcome Trust: WT098600

    Antimicrobial agents and chemotherapy 2018;62;8

  • Whole-exome sequencing of a meningeal melanocytic tumour reveals activating CYSLTR2 and EIF1AX hotspot mutations and similarities to uveal melanoma.

    Küsters-Vandevelde HVN, Germans MR, Rabbie R, Rashid M, Ten Broek R, Blokx WAM, Prinsen CFM, Adams DJ and Ter Laan M

    Department of Pathology, Canisius Wilhelmina Hospital, P.O. Box 9015, 6500 GS, Nijmegen, The Netherlands.

    Funded by: Cancer Research UK: C20510/A13031; Wellcome Trust: 077012/Z/05/Z

    Brain tumor pathology 2018;35;2;127-130

  • KILchip v1.0: A Novel Plasmodium falciparum Merozoite Protein Microarray to Facilitate Malaria Vaccine Candidate Prioritization.

    Kamuyu G, Tuju J, Kimathi R, Mwai K, Mburu J, Kibinge N, Chong Kwan M, Hawkings S, Yaa R, Chepsat E, Njunge JM, Chege T, Guleid F, Rosenkranz M, Kariuki CK, Frank R, Kinyanjui SM, Murungi LM, Bejon P, Färnert A, Tetteh KKA, Beeson JG, Conway DJ, Marsh K, Rayner JC and Osier FHA

    KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast, Kilifi, Kenya.

    Passive transfer studies in humans clearly demonstrated the protective role of IgG antibodies against malaria. Identifying the precise parasite antigens that mediate immunity is essential for vaccine design, but has proved difficult. Completion of the <i>Plasmodium falciparum</i> genome revealed thousands of potential vaccine candidates, but a significant bottleneck remains in their validation and prioritization for further evaluation in clinical trials. Focusing initially on the <i>Plasmodium falciparum</i> merozoite proteome, we used peer-reviewed publications, multiple proteomic and bioinformatic approaches, to select and prioritize potential immune targets. We expressed 109 <i>P. falciparum</i> recombinant proteins, the majority of which were obtained using a mammalian expression system that has been shown to produce biologically functional extracellular proteins, and used them to create KILchip v1.0: a novel protein microarray to facilitate high-throughput multiplexed antibody detection from individual samples. The microarray assay was highly specific; antibodies against <i>P. falciparum</i> proteins were detected exclusively in sera from malaria-exposed but not malaria-naïve individuals. The intensity of antibody reactivity varied as expected from strong to weak across well-studied antigens such as AMA1 and RH5 (Kruskal-Wallis H test for trend: <i>p</i> < 0.0001). The inter-assay and intra-assay variability was minimal, with reproducible results obtained in re-assays using the same chip over a duration of 3 months. Antibodies quantified using the multiplexed format in KILchip v1.0 were highly correlated with those measured in the gold-standard monoplex ELISA [median (range) Spearman's R of 0.84 (0.65-0.95)]. KILchip v1.0 is a robust, scalable and adaptable protein microarray that has broad applicability to studies of naturally acquired immunity against malaria by providing a standardized tool for the detection of antibody correlates of protection. It will facilitate rapid high-throughput validation and prioritization of potential <i>Plasmodium falciparum</i> merozoite-stage antigens paving the way for urgently needed clinical trials for the next generation of malaria vaccines.

    Funded by: Medical Research Council: MR/L00450X/1, MR/M003906/1, MR/P020321/1; Wellcome Trust

    Frontiers in immunology 2018;9;2866

  • Biology and genome of a newly discovered sibling species of Caenorhabditis elegans.

    Kanzaki N, Tsai IJ, Tanaka R, Hunt VL, Liu D, Tsuyama K, Maeda Y, Namai S, Kumagai R, Tracey A, Holroyd N, Doyle SR, Woodruff GC, Murase K, Kitazume H, Chai C, Akagi A, Panda O, Ke HM, Schroeder FC, Wang J, Berriman M, Sternberg PW, Sugimoto A and Kikuchi T

    Forestry and Forest Products Research Institute, Tsukuba, 305-8687, Japan.

    A 'sibling' species of the model organism Caenorhabditis elegans has long been sought for use in comparative analyses that would enable deep evolutionary interpretations of biological phenomena. Here, we describe the first sibling species of C. elegans, C. inopinata n. sp., isolated from fig syconia in Okinawa, Japan. We investigate the morphology, developmental processes and behaviour of C. inopinata, which differ significantly from those of C. elegans. The 123-Mb C. inopinata genome was sequenced and assembled into six nuclear chromosomes, allowing delineation of Caenorhabditis genome evolution and revealing unique characteristics, such as highly expanded transposable elements that might have contributed to the genome evolution of C. inopinata. In addition, C. inopinata exhibits massive gene losses in chemoreceptor gene families, which could be correlated with its limited habitat area. We have developed genetic and molecular techniques for C. inopinata; thus C. inopinata provides an exciting new platform for comparative evolutionary studies.

    Funded by: Japan Society for the Promotion of Science (JSPS): 15K14503, 16H04722, 26292178; NIGMS NIH HHS: R01 GM088290; Wellcome Trust: 206194

    Nature communications 2018;9;1;3216

  • Designing an intuitive web application for drug discovery scientists.

    Karamanis N, Pignatelli M, Carvalho-Silva D, Rowland F, Cham JA and Dunham I

    Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    We discuss how we designed the Open Targets Platform, an intuitive application for bench scientists working in early drug discovery. To meet the needs of our users, we applied lean user experience (UX) design methods: we started engaging with users very early and carried out research, design and evaluation activities within an iterative development process. We also emphasize the collaborative nature of applying lean UX design, which we believe is a foundation for success in this and many other scientific projects.

    Funded by: Wellcome Trust

    Drug discovery today 2018;23;6;1169-1174

  • Identification, Characterization, and Heritability of Murine Metastable Epialleles: Implications for Non-genetic Inheritance.

    Kazachenka A, Bertozzi TM, Sjoberg-Herrera MK, Walker N, Gardner J, Gunning R, Pahita E, Adams S, Adams D and Ferguson-Smith AC

    Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK.

    Generally repressed by epigenetic mechanisms, retrotransposons represent around 40% of the murine genome. At the Agouti viable yellow (A<sup>vy</sup>) locus, an endogenous retrovirus (ERV) of the intracisternal A particle (IAP) class retrotransposed upstream of the agouti coat-color locus, providing an alternative promoter that is variably DNA methylated in genetically identical individuals. This results in variable expressivity of coat color that is inherited transgenerationally. Here, a systematic genome-wide screen identifies multiple C57BL/6J murine IAPs with A<sup>vy</sup> epigenetic properties. Each exhibits a stable methylation state within an individual but varies between individuals. Only in rare instances do they act as promoters controlling adjacent gene expression. Their methylation state is locus-specific within an individual, and their flanking regions are enriched for CTCF. Variably methylated IAPs are reprogrammed after fertilization and re-established as variable loci in the next generation, indicating reconstruction of metastable epigenetic states and challenging the generalizability of non-genetic inheritance at these regions.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: MR/J001597/1; Wellcome Trust: WT095606RR

    Cell 2018;175;5;1259-1271.e13

  • The mRNA cap methyltransferase gene TbCMT1 is not essential in vitro but is a virulence factor in vivo for bloodstream form Trypanosoma brucei.

    Kelner A, Tinti M, Guther MLS, Foth BJ, Chappell L, Berriman M, Cowling VH and Ferguson MAJ

    Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee, United Kingdom.

    Messenger RNA is modified by the addition of a 5' methylated cap structure, which protects the transcript and recruits protein complexes that mediate RNA processing and/or the initiation of translation. Two genes encoding mRNA cap methyltransferases have been identified in T. brucei: TbCMT1 and TbCGM1. Here we analysed the impact of TbCMT1 gene deletion on bloodstream form T. brucei cells. TbCMT1 was dispensable for parasite proliferation in in vitro culture. However, significantly decreased parasitemia was observed in mice inoculated with TbCMT1 null and conditional null cell lines. Using RNA-Seq, we observed that several cysteine peptidase mRNAs were downregulated in TbCMT1 null cell lines. The cysteine peptidase Cathepsin-L was also shown to be reduced at the protein level in TbCMT1 null cell lines. Our data suggest that TbCMT1 is not essential to bloodstream form T. brucei growth in vitro or in vivo but that it contributes significantly to parasite virulence in vivo.

    Funded by: Medical Research Council: MR/K024213/1; Wellcome Trust: 093712/Z/10/Z, 101842/Z13/Z

    PloS one 2018;13;7;e0201263

  • The Impact of NOD2 Variants on Fecal Microbiota in Crohn's Disease and Controls Without Gastrointestinal Disease.

    Kennedy NA, Lamb CA, Berry SH, Walker AW, Mansfield J, Parkes M, Simpkins R, Tremelling M, Nutland S, UK IBD Genetics Consortium, Parkhill J, Probert C, Hold GL and Lees CW

    GI Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.

    Background/aims: Current models of Crohn's disease (CD) describe an inappropriate immune response to gut microbiota in genetically susceptible individuals. NOD2 variants are strongly associated with development of CD, and NOD2 is part of the innate immune response to bacteria. This study aimed to identify differences in fecal microbiota in CD patients and non-IBD controls stratified by NOD2 genotype.

    Methods: Patients with CD and non-IBD controls of known NOD2 genotype were identified from patients in previous UK IBD genetics studies and the Cambridge bioresource (genotyped/phenotyped volunteers). Individuals with known CD-associated NOD2 mutations were matched to those with wild-type genotype. We obtained fecal samples from patients in clinical remission with low fecal calprotectin (<250 µg/g) and controls without gastrointestinal disease. After extracting DNA, the V1-2 region of 16S rRNA genes was polymerase chain reaction (PCR)-amplified and sequenced. Analysis was undertaken using the mothur package. Volatile organic compounds (VOC) were also measured.

    Results: Ninety-one individuals were in the primary analysis (37 CD, 30 bioresource controls, and 24 household controls). Comparing CD with non-IBD controls, there were reductions in bacterial diversity, Ruminococcaceae, Rikenellaceae, and Christensenellaceae and an increase in Enterobacteriaceae. No significant differences could be identified in microbiota by NOD2 genotype, but fecal butanoic acid was higher in Crohn's patients carrying NOD2 mutations.

    Conclusions: In this well-controlled study of NOD2 genotype and fecal microbiota, we identified no significant genotype-microbiota associations. This suggests that the changes associated with NOD2 genotype might only be seen at the mucosal level, or that environmental factors and prior inflammation are the predominant determinants of the observed dysbiosis in gut microbiota.

    Funded by: Department of Health: NIHR-RP-R3-12-026; Medical Research Council: MC_UU_12010/7; Wellcome Trust: 093885, 097943, 098051

    Inflammatory bowel diseases 2018;24;3;583-592

  • Inducible developmental reprogramming redefines commitment to sexual development in the malaria parasite Plasmodium berghei.

    Kent RS, Modrzynska KK, Cameron R, Philip N, Billker O and Waters AP

    Institute of Infection, Immunity and Inflammation, University of Glasgow, Glasgow, UK.

    During malaria infection, Plasmodium spp. parasites cyclically invade red blood cells and can follow two different developmental pathways. They can either replicate asexually to sustain the infection, or differentiate into gametocytes, the sexual stage that can be taken up by mosquitoes, ultimately leading to disease transmission. Despite its importance for malaria control, the process of gametocytogenesis remains poorly understood, partially due to the difficulty of generating high numbers of sexually committed parasites in laboratory conditions<sup>1</sup>. Recently, an apicomplexa-specific transcription factor (AP2-G) was identified as necessary for gametocyte production in multiple Plasmodium species<sup>2,3</sup>, and suggested to be an epigenetically regulated master switch that initiates gametocytogenesis<sup>4,5</sup>. Here we show that in a rodent malaria parasite, Plasmodium berghei, conditional overexpression of AP2-G can be used to synchronously convert the great majority of the population into fertile gametocytes. This discovery allowed us to redefine the time frame of sexual commitment, identify a number of putative AP2-G targets and chart the sequence of transcriptional changes through gametocyte development, including the observation that gender-specific transcription occurred within 6 h of induction. These data provide entry points for further detailed characterization of the key process required for malaria transmission.

    Funded by: Biotechnology and Biological Sciences Research Council: J013854/01; Wellcome Trust: 098051, 104111, 107046, 202600

    Nature microbiology 2018;3;11;1206-1213

  • High-throughput DNA methylation analysis in anorexia nervosa confirms TNXB hypermethylation.

    Kesselmeier M, Pütter C, Volckmar AL, Baurecht H, Grallert H, Illig T, Ismail K, Ollikainen M, Silén Y, Keski-Rahkonen A, Bulik CM, Collier DA, Zeggini E, Hebebrand J, Scherag A, Hinney A and GCAN and WTCCC3

    Clinical Epidemiology, Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Jena, Germany.

    Objectives: Patients with anorexia nervosa (AN) are ideally suited to identify differentially methylated genes in response to starvation.

    Methods: We examined high-throughput DNA methylation derived from whole blood of 47 females with AN, 47 lean females without AN and 100 population-based females to compare AN with both controls. To account for different cell type compositions, we applied two reference-free methods (FastLMM-EWASher, RefFreeEWAS) and searched for consensus CpG sites identified by both methods. We used a validation sample of five monozygotic AN-discordant twin pairs.

    Results: Fifty-one consensus sites were identified in AN vs. lean and 81 in AN vs. population-based comparisons. These sites have not been reported in AN methylation analyses, but for the latter comparison 54/81 sites showed directionally consistent differential methylation effects in the AN-discordant twins. For a single nucleotide polymorphism rs923768 in CSGALNACT1 a nearby site was nominally associated with AN. At the gene level, we confirmed hypermethylated sites at TNXB. We found support for a locus at NR1H3 in the AN vs. lean control comparison, but the methylation direction was opposite to the one previously reported.

    Conclusions: We confirm genes like TNXB previously described to comprise differentially methylated sites, and highlight further sites that might be specifically involved in AN starvation processes.

    Funded by: Wellcome Trust: WT088827/Z/09, WT098051

    The world journal of biological psychiatry : the official journal of the World Federation of Societies of Biological Psychiatry 2018;19;3;187-199

  • Observation of Cleft Palate in an Individual with SOX11 Mutation: Indication of a Role for SOX11 in Human Palatogenesis.

    Khan U, Study D, Baker E and Clayton-Smith J

    University of Manchester, Faculty of Biology, Medicine and Health, Manchester, United Kingdom.

    Objective: Point mutations and deletions within the SOX11 gene have recently been described in individuals with a rare variant of Coffin-Siris syndrome, OMIM 615866, an intellectual disability syndrome with associated features of nail hypoplasia, microcephaly, and characteristic facial features including a wide mouth and prominent lips.

    Participant: We describe a further patient with a mutation in SOX11 and phenotype resembling mild Coffin-Siris syndrome.

    Results: This boy had a cleft palate, a feature not previously seen in other patients with SOX11 mutations.

    Conclusion: This adds to the current evidence that SOX11 is a gene involved in palatogenesis.

    Funded by: Department of Health; Wellcome Trust: WT098051

    The Cleft palate-craniofacial journal : official publication of the American Cleft Palate-Craniofacial Association 2018;55;3;456-461

  • Functional analysis of Salmonella Typhi adaptation to survival in water.

    Kingsley RA, Langridge G, Smith SE, Makendi C, Fookes M, Wileman TM, El Ghany MA, Keith Turner A, Dyson ZA, Sridhar S, Pickard D, Kay S, Feasey N, Wong V, Barquist L and Dougan G

    Quadram Institute Bioscience, Norwich Research Park, Norwich, UK.

    Contaminated water is a major risk factor associated with the transmission of Salmonella enterica serovar Typhi (S. Typhi), the aetiological agent of human typhoid. However, little is known about how this pathogen adapts to living in the aqueous environment. We used transcriptome analysis (RNA-seq) and transposon mutagenesis (TraDIS) to characterize these adaptive changes and identify multiple genes that contribute to survival. Over half of the genes in the S. Typhi genome altered expression level within the first 24 h following transfer from broth culture to water, although relatively few did so in the first 30 min. Genes linked to central metabolism, stress associated with arrested proton motive force and respiratory chain factors changed expression levels. Additionally, motility and chemotaxis genes increased expression, consistent with a scavenging lifestyle. The viaB-associated gene tviC encoding a glcNAc epimerase that is required for Vi polysaccharide biosynthesis was, along with several other genes, shown to contribute to survival in water. Thus, we define regulatory adaptation operating in S. Typhi that facilitates survival in water.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/R012504/1; Wellcome Trust

    Environmental microbiology 2018;20;11;4079-4090

  • Multiplexing for Oxidative Bisulfite Sequencing (oxBS-seq).

    Kirschner K, Krueger F, Green AR and Chandra T

    Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, UK.

    DNA modifications, especially methylation, are known to play a crucial part in many regulatory processes in the cell. Recently, 5-hydroxymethylcytosine (5hmC) was discovered, a DNA modification derived as an intermediate of 5-methylcytosine (5mC) oxidation. Efforts to gain insights into the function of this DNA modification are underway, and several methods were recently described to assess 5hmC levels using sequencing approaches. Here we integrate adaptation-based multiplexing and high-efficiency library prep into the oxidative Bisulfite Sequencing (oxBS-seq) workflow, reducing the starting amount and cost per sample to identify 5hmC levels genome-wide.

    Funded by: Medical Research Council: MC_PC_12009

    Methods in molecular biology (Clifton, N.J.) 2018;1708;665-678

  • Quantitative mass spectrometry for human melanocortin peptides in vitro and in vivo suggests prominent roles for β-MSH and desacetyl α-MSH in energy homeostasis.

    Kirwan P, Kay RG, Brouwers B, Herranz-Pérez V, Jura M, Larraufie P, Jerber J, Pembroke J, Bartels T, White A, Gribble FM, Reimann F, Farooqi IS, O'Rahilly S and Merkle FT

    Metabolic Research Laboratories and Medical Research Council Metabolic Diseases Unit, Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge, CB2 0QQ, UK; The Anne McLaren Laboratory for Regenerative Medicine, Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, CB2 0SZ, UK.

    Objective: The lack of pro-opiomelanocortin (POMC)-derived melanocortin peptides results in hypoadrenalism and severe obesity in both humans and rodents that is treatable with synthetic melanocortins. However, there are significant differences in POMC processing between humans and rodents, and little is known about the relative physiological importance of POMC products in the human brain. The aim of this study was to determine which POMC-derived peptides are present in the human brain, to establish their relative concentrations, and to test if their production is dynamically regulated.

    Methods: We analysed both fresh post-mortem human hypothalamic tissue and hypothalamic neurons derived from human pluripotent stem cells (hPSCs) using liquid chromatography tandem mass spectrometry (LC-MS/MS) to determine the sequence and quantify the production of hypothalamic neuropeptides, including those derived from POMC.

    Results: In both in vitro and in vivo hypothalamic cells, LC-MS/MS revealed the sequence of hundreds of neuropeptides as a resource for the field. Although the existence of β-melanocyte stimulating hormone (MSH) is controversial, we found that both this peptide and desacetyl α-MSH (d-α-MSH) were produced in considerable excess of acetylated α-MSH. In hPSC-derived hypothalamic neurons, these POMC derivatives were appropriately trafficked, secreted, and their production was significantly (P < 0.0001) increased in response to the hormone leptin.

    Conclusions: Our findings challenge the assumed pre-eminence of α-MSH and suggest that in humans, d-α-MSH and β-MSH are likely to be the predominant physiological products acting on melanocortin receptors.

    Funded by: Medical Research Council: G0900554, MC_PC_12009, MC_UU_12012/1, MC_UU_12012/3, MC_UU_12012/5, MR/M009041/1, MR/M024873/1, MR/P501967/1

    Molecular metabolism 2018;17;82-97

  • scmap: projection of single-cell RNA-seq data across data sets.

    Kiselev VY, Yiu A and Hemberg M

    Wellcome Sanger Institute, Hinxton, UK.

    Single-cell RNA-seq (scRNA-seq) allows researchers to define cell types on the basis of unsupervised clustering of the transcriptome. However, differences in experimental methods and computational analyses make it challenging to compare data across experiments. Here we present scmap, a method for projecting cells from an scRNA-seq data set onto cell types or individual cells from other experiments.

    Nature methods 2018;15;5;359-362

  • Human Coronavirus NL63 Molecular Epidemiology and Evolutionary Patterns in Rural Coastal Kenya.

    Kiyuka PK, Agoti CN, Munywoki PK, Njeru R, Bett A, Otieno JR, Otieno GP, Kamau E, Clark TG, van der Hoek L, Kellam P, Nokes DJ and Cotten M

    Epidemiology and Demography Department, Kenya Medical Research Institute-Wellcome Trust Research Programme.

    Background: Human coronavirus NL63 (HCoV-NL63) is a globally endemic pathogen causing mild and severe respiratory tract infections with reinfections occurring repeatedly throughout a lifetime.

    Methods: Nasal samples were collected in coastal Kenya through community-based and hospital-based surveillance. HCoV-NL63 was detected with multiplex real-time reverse transcription PCR, and positive samples were targeted for nucleotide sequencing of the spike (S) protein. Additionally, paired samples from 25 individuals with evidence of repeat HCoV-NL63 infection were selected for whole-genome virus sequencing.

    Results: HCoV-NL63 was detected in 1.3% (75/5573) of child pneumonia admissions. Two HCoV-NL63 genotypes circulated in Kilifi between 2008 and 2014. Full genome sequences formed a monophyletic clade closely related to contemporary HCoV-NL63 from other global locations. An unexpected pattern of repeat infections was observed with some individuals showing higher viral titers during their second infection. Similar patterns for 2 other endemic coronaviruses, HCoV-229E and HCoV-OC43, were observed. Repeat infections by HCoV-NL63 were not accompanied by detectable genotype switching.

    Conclusions: In this coastal Kenya setting, HCoV-NL63 exhibited low prevalence in hospital pediatric pneumonia admissions. Clade persistence with low genetic diversity suggests limited immune selection, and absence of detectable clade switching in reinfections indicates initial exposure was insufficient to elicit a protective immune response.

    The Journal of infectious diseases 2018;217;11;1728-1739

  • A large impact crater beneath Hiawatha Glacier in northwest Greenland.

    Kjær KH, Larsen NK, Binder T, Bjørk AA, Eisen O, Fahnestock MA, Funder S, Garde AA, Haack H, Helm V, Houmark-Nielsen M, Kjeldsen KK, Khan SA, Machguth H, McDonald I, Morlighem M, Mouginot J, Paden JD, Waight TE, Weikusat C, Willerslev E and MacGregor JA

    Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Copenhagen, Denmark.

    We report the discovery of a large impact crater beneath Hiawatha Glacier in northwest Greenland. From airborne radar surveys, we identify a 31-kilometer-wide, circular bedrock depression beneath up to a kilometer of ice. This depression has an elevated rim that cross-cuts tributary subglacial channels and a subdued central uplift that appears to be actively eroding. From ground investigations of the deglaciated foreland, we identify overprinted structures within Precambrian bedrock along the ice margin that strike tangent to the subglacial rim. Glaciofluvial sediment from the largest river draining the crater contains shocked quartz and other impact-related grains. Geochemical analysis of this sediment indicates that the impactor was a fractionated iron asteroid, which must have been more than a kilometer wide to produce the identified crater. Radiostratigraphy of the ice in the crater shows that the Holocene ice is continuous and conformable, but all deeper and older ice appears to be debris rich or heavily disturbed. The age of this impact crater is presently unknown, but from our geological and geophysical evidence, we conclude that it is unlikely to predate the Pleistocene inception of the Greenland Ice Sheet.

    Science advances 2018;4;11;eaar8173

  • Implications of insecticide resistance for malaria vector control with long-lasting insecticidal nets: a WHO-coordinated, prospective, international, observational cohort study.

    Kleinschmidt I, Bradley J, Knox TB, Mnzava AP, Kafy HT, Mbogo C, Ismail BA, Bigoga JD, Adechoubou A, Raghavendra K, Cook J, Malik EM, Nkuni ZJ, Macdonald M, Bayoh N, Ochomo E, Fondjo E, Awono-Ambene HP, Etang J, Akogbeto M, Bhatt RM, Chourasia MK, Swain DK, Kinyari T, Subramaniam K, Massougbodji A, Okê-Sopoh M, Ogouyemi-Hounto A, Kouambeng C, Abdin MS, West P, Elmardi K, Cornelie S, Corbel V, Valecha N, Mathenge E, Kamau L, Lines J and Donnelly MJ

    MRC Tropical Epidemiology Group, Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK; School of Public Health, University of the Witwatersrand, Johannesburg, South Africa.

    Background: Scale-up of insecticide-based interventions has averted more than 500 million malaria cases since 2000. Increasing insecticide resistance could herald a rebound in disease and mortality. We aimed to investigate whether insecticide resistance was associated with loss of effectiveness of long-lasting insecticidal nets and increased malaria disease burden.

    Methods: This WHO-coordinated, prospective, observational cohort study was done at 279 clusters (villages or groups of villages in which phenotypic resistance was measurable) in Benin, Cameroon, India, Kenya, and Sudan. Pyrethroid long-lasting insecticidal nets were the principal form of malaria vector control in all study areas; in Sudan this approach was supplemented by indoor residual spraying. Cohorts of children from randomly selected households in each cluster were recruited and followed up by community health workers to measure incidence of clinical malaria and prevalence of infection. Mosquitoes were assessed for susceptibility to pyrethroids using the standard WHO bioassay test. Country-specific results were combined using meta-analysis.

    Findings: Between June 2, 2012, and Nov 4, 2016, 40 000 children were enrolled and assessed for clinical incidence during 1·4 million follow-up visits. 80 000 mosquitoes were assessed for insecticide resistance. Long-lasting insecticidal net users had lower infection prevalence (adjusted odds ratio [OR] 0·63, 95% CI 0·51-0·78) and disease incidence (adjusted rate ratio [RR] 0·62, 0·41-0·94) than did non-users across a range of resistance levels. We found no evidence of an association between insecticide resistance and infection prevalence (adjusted OR 0·86, 0·70-1·06) or incidence (adjusted RR 0·89, 0·72-1·10). Users of nets, although significantly better protected than non-users, were nevertheless subject to high malaria infection risk (ranging from an average incidence in net users of 0·023 [95% CI 0·016-0·033] per person-year in India to 0·80 [0·65-0·97] per person-year in Kenya; and an average infection prevalence in net users of 0·8% [0·5-1·3] in India to an average infection prevalence of 50·8% [43·4-58·2] in Benin).

    Interpretation: Irrespective of resistance, populations in malaria endemic areas should continue to use long-lasting insecticidal nets to reduce their risk of infection. As nets provide only partial protection, the development of additional vector control tools should be prioritised to reduce the unacceptably high malaria burden.

    Funding: Bill & Melinda Gates Foundation, UK Medical Research Council, and UK Department for International Development.

    Funded by: Medical Research Council: MR/K012126/1; World Health Organization: 001

    The Lancet. Infectious diseases 2018;18;6;640-649

  • Emergence of an Extensively Drug-Resistant Salmonella enterica Serovar Typhi Clone Harboring a Promiscuous Plasmid Encoding Resistance to Fluoroquinolones and Third-Generation Cephalosporins.

    Klemm EJ, Shakoor S, Page AJ, Qamar FN, Judge K, Saeed DK, Wong VK, Dallman TJ, Nair S, Baker S, Shaheen G, Qureshi S, Yousafzai MT, Saleem MK, Hasan Z, Dougan G and Hasan R

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Antibiotic resistance is a major problem in <i>Salmonella enterica</i> serovar Typhi, the causative agent of typhoid. Multidrug-resistant (MDR) isolates are prevalent in parts of Asia and Africa and are often associated with the dominant H58 haplotype. Reduced susceptibility to fluoroquinolones is also widespread, and sporadic cases of resistance to third-generation cephalosporins or azithromycin have also been reported. Here, we report the first large-scale emergence and spread of a novel <i>S</i> Typhi clone harboring resistance to three first-line drugs (chloramphenicol, ampicillin, and trimethoprim-sulfamethoxazole) as well as fluoroquinolones and third-generation cephalosporins in Sindh, Pakistan, which we classify as extensively drug resistant (XDR). Over 300 XDR typhoid cases have emerged in Sindh, Pakistan, since November 2016. Additionally, a single case of travel-associated XDR typhoid has recently been identified in the United Kingdom. Whole-genome sequencing of over 80 of the XDR isolates revealed remarkable genetic clonality and sequence conservation, identified a large number of resistance determinants, and showed that these isolates were of haplotype H58. The XDR <i>S</i> Typhi clone encodes a chromosomally located resistance region and harbors a plasmid encoding additional resistance elements, including the <i>bla</i><sub>CTX-M-15</sub> extended-spectrum β-lactamase, and carrying the <i>qnrS</i> fluoroquinolone resistance gene. This antibiotic resistance-associated IncY plasmid exhibited high sequence identity to plasmids found in other enteric bacteria isolated from widely distributed geographic locations. 
    This study highlights three concerning problems: the receding antibiotic arsenal for typhoid treatment, the ability of <i>S</i> Typhi to transform from MDR to XDR in a single step by acquisition of a plasmid, and the ability of XDR clones to spread globally.

    <b>IMPORTANCE</b> Typhoid fever is a severe disease caused by the Gram-negative bacterium <i>Salmonella enterica</i> serovar Typhi. Antibiotic-resistant <i>S</i> Typhi strains have become increasingly common. Here, we report the first large-scale emergence and spread of a novel extensively drug-resistant (XDR) <i>S</i> Typhi clone in Sindh, Pakistan. The XDR <i>S</i> Typhi is resistant to the majority of drugs available for the treatment of typhoid fever. This study highlights the evolving threat of antibiotic resistance in <i>S</i> Typhi and the value of antibiotic susceptibility testing and whole-genome sequencing in understanding emerging infectious diseases. We genetically characterized the XDR <i>S</i> Typhi to investigate the phylogenetic relationship between these isolates and a global collection of <i>S</i> Typhi isolates and to identify multiple genes linked to antibiotic resistance. This <i>S</i> Typhi clone harbored a promiscuous antibiotic resistance plasmid previously identified in other enteric bacteria. The increasing antibiotic resistance in <i>S</i> Typhi observed here adds urgency to the need for typhoid prevention measures.

    Funded by: Wellcome Trust

    mBio 2018;9;1

  • Emergence of dominant multidrug-resistant bacterial clades: Lessons from history and whole-genome sequencing.

    Klemm EJ, Wong VK and Dougan G

    Infection Genomics Programme, Wellcome Trust Sanger Institute, CB10 1SA Cambridge, United Kingdom.

    Antibiotic resistance in bacteria has emerged as a global challenge over the past 90 years, compromising our ability to effectively treat infections. There has been a dramatic increase in antibiotic resistance-associated determinants in bacterial populations, driven by the mobility and infectious nature of such determinants. Bacterial genome flexibility and antibiotic-driven selection are at the root of the problem. Genome evolution and the emergence of highly successful multidrug-resistant clades in different pathogens have made this a global challenge. Here, we describe some of the factors driving the origin, evolution, and spread of the antibiotic resistance genotype.

    Proceedings of the National Academy of Sciences of the United States of America 2018;115;51;12872-12877

  • XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data.

    Kluin RJC, Kemper K, Kuilman T, de Ruiter JR, Iyer V, Forment JV, Cornelissen-Steijger P, de Rink I, Ter Brugge P, Song JY, Klarenbeek S, McDermott U, Jonkers J, Velds A, Adams DJ, Peeper DS and Krijgsman O

    Central Genomic Facility, Netherlands Cancer Institute, Amsterdam, The Netherlands.

    Background: Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from the murine host. The reads of murine origin result in false positives in mutation analysis of DNA samples and obscure gene expression levels when sequencing RNA. However, currently available algorithms are limited and improvements in accuracy and ease of use are necessary.

    Results: We developed the R-package XenofilteR, which separates mouse from human sequence reads based on the edit-distance between a sequence read and reference genome. To assess the accuracy of XenofilteR, we generated sequence data by in silico mixing of mouse and human DNA sequence data. These analyses revealed that XenofilteR removes > 99.9% of sequence reads of mouse origin while retaining human sequences. This allowed for mutation analysis of xenograft samples with accurate variant allele frequencies, and retrieved all non-synonymous somatic tumor mutations.

    Conclusions: XenofilteR accurately dissects RNA and DNA sequences from mouse and human origin, thereby outperforming currently available tools. XenofilteR is open source and available at .

    Funded by: FP7 Ideas: European Research Council: 319661; KWF Kankerbestrijding: NKI-2013-5799; Wellcome Trust

    BMC bioinformatics 2018;19;1;366
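
    Illustrative note on the XenofilteR entry above: a minimal Python sketch of edit-distance-based read assignment, the idea the abstract describes. It is not the package's code. The Alignment record, the function names, and the rule that ties are treated as mouse are assumptions made for illustration; in practice the edit distance would come from aligning each read to both reference genomes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Alignment:
    """Hypothetical summary of one read's best alignment to one reference."""
    read_id: str
    edit_distance: int  # mismatches plus inserted/deleted bases


def assign_origin(human: Optional[Alignment], mouse: Optional[Alignment]) -> str:
    """Return 'human', 'mouse', or 'unmapped' for a single read."""
    if human is None and mouse is None:
        return "unmapped"
    if mouse is None:
        return "human"
    if human is None:
        return "mouse"
    # Assumption: a tie counts as mouse, so ambiguous reads are discarded.
    return "human" if human.edit_distance < mouse.edit_distance else "mouse"


def filter_human_reads(
    human_alns: Dict[str, Alignment], mouse_alns: Dict[str, Alignment]
) -> List[str]:
    """Return the sorted ids of reads that are assigned to the human genome."""
    kept = []
    for read_id in sorted(set(human_alns) | set(mouse_alns)):
        origin = assign_origin(human_alns.get(read_id), mouse_alns.get(read_id))
        if origin == "human":
            kept.append(read_id)
    return kept
```

    A read that aligns to only one genome is assigned to that genome; a read that aligns to both goes to whichever reference it matches with fewer edits.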

  • Zoonotic Transfer of Clostridium difficile Harboring Antimicrobial Resistance between Farm Animals and Humans.

    Knetsch CW, Kumar N, Forster SC, Connor TR, Browne HP, Harmanus C, Sanders IM, Harris SR, Turner L, Morris T, Perry M, Miyajima F, Roberts P, Pirmohamed M, Songer JG, Weese JS, Indra A, Corver J, Rupnik M, Wren BW, Riley TV, Kuijper EJ and Lawley TD

    Section Experimental Bacteriology, Department of Medical Microbiology, Leiden University Medical Center, Leiden, Netherlands.

    The emergence of <i>Clostridium difficile</i> as a significant human diarrheal pathogen is associated with the production of highly transmissible spores and the acquisition of antimicrobial resistance genes (ARGs) and virulence factors. Unlike the hospital-associated <i>C. difficile</i> RT027 lineage, the community-associated <i>C. difficile</i> RT078 lineage is isolated from both humans and farm animals; however, the geographical population structure and transmission networks remain unknown. Here, we applied whole-genome phylogenetic analysis of 248 <i>C. difficile</i> RT078 strains from 22 countries. Our results demonstrate limited geographical clustering for <i>C. difficile</i> RT078 and extensive coclustering of human and animal strains, thereby revealing a highly linked intercontinental transmission network between humans and animals. Comparative whole-genome analysis reveals indistinguishable accessory genomes between human and animal strains and a variety of antimicrobial resistance genes in the pangenome of <i>C. difficile</i> RT078. Thus, bidirectional spread of <i>C. difficile</i> RT078 between farm animals and humans may represent an unappreciated route disseminating antimicrobial resistance genes between humans and animals. These results highlight the importance of the "One Health" concept to monitor infectious disease emergence and the dissemination of antimicrobial resistance genes.

    Funded by: Medical Research Council: G0902453, MR/K000551/1, MR/L015080/1, PF451; Wellcome Trust: 098051

    Journal of clinical microbiology 2018;56;3

  • Loss of functional BAP1 augments sensitivity to TRAIL in cancer cells.

    Kolluri KK, Alifrangis C, Kumar N, Ishii Y, Price S, Michaut M, Williams S, Barthorpe S, Lightfoot H, Busacca S, Sharkey A, Yuan Z, Sage EK, Vallath S, Le Quesne J, Tice DA, Alrifai D, von Karstedt S, Montinaro A, Guppy N, Waller DA, Nakas A, Good R, Holmes A, Walczak H, Fennell DA, Garnett M, Iorio F, Wessels L, McDermott U and Janes SM

    Lungs for Living Research Centre, UCL Respiratory, University College London, London, United Kingdom.

    Malignant mesothelioma (MM) is poorly responsive to systemic cytotoxic chemotherapy and invariably fatal. Here we describe a screen of 94 drugs in 15 exome-sequenced MM lines and the discovery of a subset defined by loss of function of the nuclear deubiquitinase BRCA associated protein-1 (BAP1) that demonstrate heightened sensitivity to TRAIL (tumour necrosis factor-related apoptosis-inducing ligand). This association is observed across human early passage MM cultures, mouse xenografts and human tumour explants. We demonstrate that BAP1 deubiquitinase activity and its association with ASXL1 to form the Polycomb repressive deubiquitinase complex (PR-DUB) impacts TRAIL sensitivity implicating transcriptional modulation as an underlying mechanism. Death receptor agonists are well-tolerated anti-cancer agents demonstrating limited therapeutic benefit in trials without a targeting biomarker. We identify <i>BAP1</i> loss-of-function mutations, which are frequent in MM, as a potential genomic stratification tool for TRAIL sensitivity with immediate and actionable therapeutic implications.

    Funded by: Cancer Research UK: A17341; Medical Research Council: MC_UP_1203/1, MR/M015831/1; Wellcome: WT097452MA; Wellcome Trust: 106555/Z/14/Z, WT107963AIA

    eLife 2018;7

  • Chromosome assembly of large and complex genomes using multiple references.

    Kolmogorov M, Armstrong J, Raney BJ, Streeter I, Dunn M, Yang F, Odom D, Flicek P, Keane TM, Thybert D, Paten B and Pham S

    Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA.

    Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. By taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. By using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We applied Ragout 2 to the <i>Mus caroli</i> and <i>Mus pahari</i> genomes, which exhibit karyotype-scale variations compared with other genomes from the <i>Muridae</i> family. Chromosome painting maps confirmed most large-scale rearrangements that Ragout 2 detected. We applied Ragout 2 to improve draft sequences of three ape genomes that have recently been published. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.

    Funded by: NHGRI NIH HHS: U41 HG007234, U54 HG007990; NHLBI NIH HHS: U01 HL137183; Wellcome Trust: WT098051, WT108749/Z/15/Z, WT202878/B/16/Z

    Genome research 2018;28;11;1720-1732

  • Global and targeted approaches to single-cell transcriptome characterization.

    Kolodziejczyk AA and Lönnberg T

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    Analysing transcriptomes of cell populations is a standard molecular biology approach to understand how cells function. Recent methodological development has allowed performing similar experiments on single cells. This has opened up the possibility to examine samples with limited cell number, such as cells of the early embryo, and to obtain an understanding of heterogeneity within populations such as blood cell types or neurons. There are two major approaches for single-cell transcriptome analysis: quantitative reverse transcription PCR (RT-qPCR) on a limited number of genes of interest, or more global approaches targeting entire transcriptomes using RNA sequencing. RT-qPCR is sensitive, fast and arguably more straightforward, while whole-transcriptome approaches offer an unbiased perspective on a cell's expression status.

    Briefings in functional genomics 2018;17;4;209-219

  • Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements.

    Kosicki M, Tomberg K and Bradley A

    Wellcome Sanger Institute, Hinxton, UK.

    CRISPR-Cas9 is poised to become the gene editing tool of choice in clinical contexts. Thus far, exploration of Cas9-induced genetic alterations has been limited to the immediate vicinity of the target site and distal off-target sequences, leading to the conclusion that CRISPR-Cas9 was reasonably specific. Here we report significant on-target mutagenesis, such as large deletions and more complex genomic rearrangements at the targeted sites in mouse embryonic stem cells, mouse hematopoietic progenitors and a human differentiated cell line. Using long-read sequencing and long-range PCR genotyping, we show that DNA breaks introduced by single-guide RNA/Cas9 frequently resolved into deletions extending over many kilobases. Furthermore, lesions distal to the cut site and crossover events were identified. The observed genomic damage in mitotically active cells caused by CRISPR-Cas9 editing may have pathogenic consequences.

    Funded by: Wellcome Trust: 079643

    Nature biotechnology 2018;36;8;765-771

  • Naturally occurring polymorphisms in the virulence regulator Rsp modulate Staphylococcus aureus survival in blood and antibiotic susceptibility.

    Krishna A, Holden MTG, Peacock SJ, Edwards AM and Wigneshweraraj S

    MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK.

    Nasal colonization by the pathogen Staphylococcus aureus is a risk factor for subsequent infection. Loss of function mutations in the gene encoding the virulence regulator Rsp are associated with the transition of S. aureus from a colonizing isolate to one that causes bacteraemia. Here, we report the identification of several novel activity-altering mutations in rsp detected in clinical isolates, including for the first time, mutations that enhance agr operon activity. We assessed how these mutations affected infection-relevant phenotypes and found loss and enhancement of function mutations to have contrasting effects on S. aureus survival in blood and antibiotic susceptibility. These findings add to the growing body of evidence that suggests S. aureus 'trades off' virulence for the acquisition of traits that benefit survival in the host, and indicates that infection severity and treatment options can be significantly affected by mutations in the virulence regulator rsp.

    Funded by: Chief Scientist Office: SIRN10; Department of Health; Medical Research Council: G1000803, MR/P028225/1; Wellcome Trust

    Microbiology (Reading, England) 2018;164;9;1189-1195

  • Assessing Rare Variation in Complex Traits.

    Kuchenbaecker K and Appel EVR

    Wellcome Trust Sanger Institute, Cambridge, UK.

    While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.

    Methods in molecular biology (Clifton, N.J.) 2018;1793;51-71
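
    Illustrative note on the entry above: one simple form of aggregate testing is a burden score, which collapses the rare alleles an individual carries in a region into a single count that can then be tested against the trait. The Python sketch below is an example under stated assumptions, not a method taken from the chapter: the genotype layout, the function name, and the frequency threshold are all hypothetical.

```python
from typing import List, Tuple


def burden_scores(
    genotypes: List[List[int]], maf_threshold: float = 0.01
) -> Tuple[List[int], List[int]]:
    """Collapse rare variants in one region into a per-individual burden score.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) carried by
    individual i at variant j. Returns (scores, rare_variant_indices).
    """
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    # Alternate-allele frequency of each variant in this sample.
    freqs = [
        sum(ind[j] for ind in genotypes) / (2 * n_individuals)
        for j in range(n_variants)
    ]
    # Keep variants that are observed at least once but fall below the threshold.
    rare = [j for j, f in enumerate(freqs) if 0 < f < maf_threshold]
    scores = [sum(ind[j] for j in rare) for ind in genotypes]
    return scores, rare
```

    The scores could then be used as a single predictor in a regression on the trait. A plain count assumes every rare allele shifts the trait in the same direction, which is one reason the abstract notes that performance depends on the genetic architecture of the region.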

  • High-resolution genetic mapping of putative causal interactions between regions of open chromatin.

    Kumasaka N, Knights AJ and Gaffney DJ

    Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    Physical interaction of regulatory elements in three-dimensional space poses a challenge for studies of disease because non-coding risk variants may be great distances from the genes they regulate. Experimental methods to capture these interactions, such as chromosome conformation capture, usually cannot assign causal direction of effect between regulatory elements, an important component of fine-mapping studies. We developed a Bayesian hierarchical approach that uses two-stage least squares and applied it to an ATAC-seq (assay for transposase-accessible chromatin using sequencing) data set from 100 individuals, to identify over 15,000 high-confidence causal interactions. Most (60%) interactions occurred over <20 kb, where chromosome conformation capture-based methods perform poorly. For a fraction of loci, we identified a single variant that alters accessibility across multiple regions, and experimentally validated the BLK locus, which is associated with multiple autoimmune diseases, using CRISPR genome editing. Our study highlights how association genetics of chromatin state is a powerful approach for identifying interactions between regulatory elements.

    Funded by: Wellcome Trust

    Nature genetics 2018;51;1;128-137
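
    Illustrative note on the entry above: the abstract names two-stage least squares as a component of its approach. The Python sketch below shows only the plain, single-instrument form of that estimator on simulated data. It is not the authors' Bayesian hierarchical model; the variable roles (genotype z, accessibility x at one region, accessibility y at another), the simulation, and all function names are assumptions for illustration.

```python
import random
from statistics import mean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def two_stage_least_squares(z: List[float], x: List[float], y: List[float]) -> float:
    """Estimate the effect of x on y using z as the instrument."""
    # Stage 1: predict x from the instrument z.
    a1, b1 = fit_line(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress y on the predicted x; the slope is the effect estimate.
    _, effect = fit_line(x_hat, y)
    return effect


def simulate(n: int = 5000, true_effect: float = 0.5, seed: int = 1):
    """Simulate a genotype z, exposure x and outcome y sharing a confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # genotype coded 0/1/2
        u = rng.gauss(0, 1)  # unobserved confounder affecting both x and y
        xi = 1.0 * g + u + rng.gauss(0, 1)
        yi = true_effect * xi + u + rng.gauss(0, 1)
        z.append(float(g))
        x.append(xi)
        y.append(yi)
    return z, x, y
```

    Because the confounder u affects both x and y, regressing y directly on x overstates the effect, whereas the two-stage estimate uses only the variation in x that is explained by the genotype. This is what allows a direction of effect to be assigned between two regions.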

  • Immune Cell Dynamics Unfolded by Single-Cell Technologies.

    Kunz DJ, Gomes T and James KR

    Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, United Kingdom.

    The single-cell revolution is paving the way towards the molecular characterisation of every cell type in the human body, revealing relationships between cell types and states at high resolution. Changes in cellular phenotypes are particularly prevalent in the immune system and can be observed in its continuous remodelling up to adulthood, response to disease and development of immunological memory. In this review, we delve into the world of cellular dynamics of the immune system. We discuss current single-cell experimental and computational approaches in this area, giving insights into plasticity and commitment of cell fates. Finally, we provide an outlook on upcoming technological developments and predict how these will improve our understanding of the immune system.

    Frontiers in immunology 2018;9;1435

  • A Standard Nomenclature for Referencing and Authentication of Pluripotent Stem Cells.

    Kurtz A, Seltmann S, Bairoch A, Bittner MS, Bruce K, Capes-Davis A, Clarke L, Crook JM, Daheron L, Dewender J, Faulconbridge A, Fujibuchi W, Gutteridge A, Hei DJ, Kim YO, Kim JH, Kokocinski AK, Lekschas F, Lomax GP, Loring JF, Ludwig T, Mah N, Matsui T, Müller R, Parkinson H, Sheldon M, Smith K, Stachelscheid H, Stacey G, Streeter I, Veiga A and Xu RH

    Charité - Universitätsmedizin Berlin, Berlin-Brandenburg Center for Regenerative Therapies, Berlin 13353, Germany.

    Unambiguous cell line authentication is essential to avoid loss of association between data and cells. The risk for loss of references increases with the rapidity that new human pluripotent stem cell (hPSC) lines are generated, exchanged, and implemented. Ideally, a single name should be used as a generally applied reference for each cell line to access and unify cell-related information across publications, cell banks, cell registries, and databases and to ensure scientific reproducibility. We discuss the needs and requirements for such a unique identifier and implement a standard nomenclature for hPSCs, which can be automatically generated and registered by the human pluripotent stem cell registry (hPSCreg). To avoid ambiguities in PSC-line referencing, we strongly urge publishers to demand registration and use of the standard name when publishing research based on hPSC lines.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    Stem cell reports 2018;10;1;1-6

  • Excision-reintegration at a pneumococcal phase-variable restriction-modification locus drives within- and between-strain epigenetic differentiation and inhibits gene acquisition.

    Kwun MJ, Oggioni MR, De Ste Croix M, Bentley SD and Croucher NJ

    MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, UK.

    Phase-variation of Type I restriction-modification systems can rapidly alter the sequence motifs they target, diversifying both the epigenetic patterns and endonuclease activity within clonally descended populations. Here, we characterize the Streptococcus pneumoniae SpnIV phase-variable Type I RMS, encoded by the translocating variable restriction (tvr) locus, to identify its target motifs, mechanism and regulation of phase variation, and effects on exchange of sequence through transformation. The specificity-determining hsdS genes were shuffled through a recombinase-mediated excision-reintegration mechanism involving circular intermediate molecules, guided by two types of direct repeat. The rate of rearrangements was limited by an attenuator and toxin-antitoxin system homologs that inhibited recombinase gene transcription. Target motifs for both the SpnIV, and multiple Type II, MTases were identified through methylation-sensitive sequencing of a panel of recombinase-null mutants. This demonstrated the species-wide diversity observed at the tvr locus can likely specify nine different methylation patterns. This will reduce sequence exchange in this diverse species, as the native form of the SpnIV RMS was demonstrated to inhibit the acquisition of genomic islands by transformation. Hence the tvr locus can drive variation in genome methylation both within and between strains, and limits the genomic plasticity of S. pneumoniae.

    Funded by: Medical Research Council: MR/R015600/1

    Nucleic acids research 2018;46;21;11438-11453

  • Detecting eukaryotic microbiota with single-cell sensitivity in human tissue.

    Lager S, de Goffau MC, Sovio U, Peacock SJ, Parkhill J, Charnock-Jones DS and Smith GCS

    Department of Obstetrics and Gynaecology, University of Cambridge, National Institute for Health Research Cambridge Biomedical Research Centre, Cambridge, UK.

    Background: Fetal growth restriction, pre-eclampsia, and pre-term birth are major adverse pregnancy outcomes. These complications are considerable contributors to fetal/maternal morbidity and mortality worldwide. A significant proportion of these cases are thought to be due to dysfunction of the placenta. However, the underlying mechanisms of placental dysfunction are unclear. The aim of the present study was to investigate whether adverse pregnancy outcomes are associated with evidence of placental eukaryotic infection.

    Results: We modified the 18S Illumina Amplicon Protocol of the Earth Microbiome Project and made it capable of detecting just a single spiked-in genome copy of Plasmodium falciparum, Saccharomyces cerevisiae, or Toxoplasma gondii among more than 70,000 human cells. Using this method, we were unable to detect eukaryotic pathogens in placental biopsies in instances of adverse pregnancy outcome (n = 199) or in healthy controls (n = 99).

    Conclusions: Eukaryotic infection of the placenta is not an underlying cause of the aforementioned pregnancy complications. Possible clinical applications for this non-targeted, yet extremely sensitive, eukaryotic screening method are manifest.

    Funded by: Department of Health; Medical Research Council: G1100221, MR/K021133/1

    Microbiome 2018;6;1;151

  • Toll-like receptor 2 costimulation potentiates the antitumor efficacy of CAR T Cells.

    Lai Y, Weng J, Wei X, Qin L, Lai P, Zhao R, Jiang Z, Li B, Lin S, Wang S, Wu Q, Tang Z, Liu P, Pei D, Yao Y, Du X and Li P

    Key Laboratory of Regenerative Biology, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China.

    Chimeric antigen receptor (CAR) T-cell immunotherapies have shown unprecedented success in treating leukemia but limited clinical efficacy in solid tumors. Here, we generated 1928zT2 and m28zT2, targeting CD19 and mesothelin, respectively, by introducing the Toll/interleukin-1 receptor domain of Toll-like receptor 2 (TLR2) to 1928z and m28z. T cells expressing 1928zT2 or m28zT2 showed improved expansion, persistency and effector function against CD19<sup>+</sup> leukemia or mesothelin<sup>+</sup> solid tumors respectively in vitro and in vivo. In a patient with relapsed B-cell acute lymphoblastic leukemia, a single dose of 5 × 10<sup>4</sup>/kg 1928zT2 T cells resulted in robust expansion and leukemia eradication and led to complete remission. Hence, our results demonstrate that TLR2 signaling can contribute to the efficacy of CAR T cells. Further clinical trials are warranted to establish the safety and efficacy of this approach.

    Leukemia 2018;32;3;801-808

  • Loss of Genomic Diversity in a Neisseria meningitidis Clone Through a Colonization Bottleneck.

    Lamelas A, Hamid AM, Dangy JP, Hauser J, Jud M, Röltgen K, Hodgson A, Junghanss T, Harris SR, Parkhill J, Bentley SD and Pluschke G

    Swiss Tropical and Public Health Institute, Basel, Switzerland.

    Neisseria meningitidis is the leading cause of epidemic meningitis in the "meningitis belt" of Africa, where clonal waves of colonization and disease are observed. Point mutations and horizontal gene exchange lead to constant diversification of meningococcal populations during clonal spread. Maintaining a high genomic diversity may be an evolutionary strategy of meningococci that increases chances of fixing occasionally new highly successful "fit genotypes". We have performed a longitudinal study of meningococcal carriage and disease in northern Ghana by analyzing cerebrospinal fluid samples from all suspected meningitis cases and monitoring carriage of meningococci by twice yearly colonization surveys. In the framework of this study, we observed complete replacement of an A: sequence types (ST)-2859 clone by a W: ST-2881 clone. However, after a gap of 1 year, A: ST-2859 meningococci re-emerged both as colonizer and meningitis causing agent. Our whole genome sequencing analyses compared the A population isolated prior to the W colonization and disease wave with the re-emerging A meningococci. This analysis revealed expansion of one clone differing in only one nonsynonymous SNP from several isolates already present in the original A: ST-2859 population. The colonization bottleneck caused by the competing W meningococci thus resulted in a profound reduction in genomic diversity of the A meningococcal population.

    Genome biology and evolution 2018;10;8;2102-2109

  • Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.

    Lane WJ, Westhoff CM, Gleadall NS, Aguad M, Smeland-Wagman R, Vege S, Simmons DP, Mah HH, Lebo MS, Walter K, Soranzo N, Di Angelantonio E, Danesh J, Roberts DJ, Watkins NA, Ouwehand WH, Butterworth AS, Kaufman RM, Rehm HL, Silberstein LE, Green RC and MedSeq Project

    Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.

    Background: There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh, the most important blood groups, cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens.

    Methods: This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30 × depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15 × depth) with serological comparisons.

    Findings: We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 MedSeq genomes. Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage).

    Interpretation: By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine.

    Funding: National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust.

    Funded by: NCI NIH HHS: R01 CA154517; NHGRI NIH HHS: R03 HG008809, U01 HG006500, U01 HG008685; NHLBI NIH HHS: P01 HL095489, T32 HL007627; NIA NIH HHS: RF1 AG047866, U01 AG024904; NIAMS NIH HHS: P60 AR047782; NICHD NIH HHS: U19 HD077671; Wellcome Trust

    The Lancet. Haematology 2018;5;6;e241-e251
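
    The concordance figures quoted in the Findings above can be read as the share of antigen calls on which two typing methods agree. A minimal Python sketch of that arithmetic follows; the function name, the sample and antigen labels, and the calls themselves are hypothetical illustrations, not bloodTyper's actual interface or data.

```python
def concordance(predicted, reference):
    """Return the fraction of shared antigen calls on which two methods agree."""
    shared = [key for key in predicted if key in reference]
    if not shared:
        raise ValueError("no antigen calls in common")
    matches = sum(predicted[key] == reference[key] for key in shared)
    return matches / len(shared)


# Hypothetical calls keyed by (sample, antigen); values are invented for illustration.
wgs_calls = {
    ("S1", "K"): "neg", ("S1", "Fya"): "pos",
    ("S2", "K"): "pos", ("S2", "Fya"): "pos",
}
serology = {
    ("S1", "K"): "neg", ("S1", "Fya"): "pos",
    ("S2", "K"): "neg", ("S2", "Fya"): "pos",
}

rate = concordance(wgs_calls, serology)
print(f"{rate:.1%}")  # 3 of 4 calls agree
```

    Only calls present in both sets are compared, so antigens lacking a serological reagent do not count against the sequencing-based method.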

  • Separate and combined associations of obesity and metabolic health with coronary heart disease: a pan-European case-cohort analysis.

    Lassale C, Tzoulaki I, Moons KGM, Sweeting M, Boer J, Johnson L, Huerta JM, Agnoli C, Freisling H, Weiderpass E, Wennberg P, van der A DL, Arriola L, Benetou V, Boeing H, Bonnet F, Colorado-Yohar SM, Engström G, Eriksen AK, Ferrari P, Grioni S, Johansson M, Kaaks R, Katsoulis M, Katzke V, Key TJ, Matullo G, Melander O, Molina-Portillo E, Moreno-Iribas C, Norberg M, Overvad K, Panico S, Quirós JR, Saieva C, Skeie G, Steffen A, Stepien M, Tjønneland A, Trichopoulou A, Tumino R, van der Schouw YT, Verschuren WMM, Langenberg C, Di Angelantonio E, Riboli E, Wareham NJ, Danesh J and Butterworth AS

    Department of Epidemiology and Biostatistics, Imperial College London, London W2 1PG, UK.

    Aims: The hypothesis of 'metabolically healthy obesity' implies that, in the absence of metabolic dysfunction, individuals with excess adiposity are not at greater cardiovascular risk. We tested this hypothesis in a large pan-European prospective study.

    Methods and results: We conducted a case-cohort analysis in the 520 000-person European Prospective Investigation into Cancer and Nutrition study ('EPIC-CVD'). During a median follow-up of 12.2 years, we recorded 7637 incident coronary heart disease (CHD) cases. Using cut-offs recommended by guidelines, we defined obesity and overweight using body mass index (BMI), and metabolic dysfunction ('unhealthy') as ≥ 3 of elevated blood pressure, hypertriglyceridaemia, low HDL-cholesterol, hyperglycaemia, and elevated waist circumference. We calculated hazard ratios (HRs) and 95% confidence intervals (95% CI) within each country using Prentice-weighted Cox proportional hazard regressions, accounting for age, sex, centre, education, smoking, diet, and physical activity. Compared with metabolically healthy normal weight people (reference), HRs were 2.15 (95% CI: 1.79; 2.57) for unhealthy normal weight, 2.33 (1.97; 2.76) for unhealthy overweight, and 2.54 (2.21; 2.92) for unhealthy obese people. Compared with the reference group, HRs were 1.26 (1.14; 1.40) and 1.28 (1.03; 1.58) for metabolically healthy overweight and obese people, respectively. These results were robust to various sensitivity analyses.

    Conclusion: Irrespective of BMI, metabolically unhealthy individuals had higher CHD risk than their healthy counterparts. Conversely, irrespective of metabolic health, overweight and obese people had higher CHD risk than lean people. These findings challenge the concept of 'metabolically healthy obesity', encouraging population-wide strategies to tackle obesity.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/13/13/30194; Cancer Research UK: A16491; European Research Council: 268834; Medical Research Council: G0800270, MC_UU_12015/1, MR/L003120/1, MR/M012190/1

    European heart journal 2018;39;5;397-406

  • Population-based analysis of ocular Chlamydia trachomatis in trachoma-endemic West African communities identifies genomic markers of disease severity.

    Last AR, Pickering H, Roberts CH, Coll F, Phelan J, Burr SE, Cassama E, Nabicassa M, Seth-Smith HMB, Hadfield J, Cutcliffe LT, Clarke IN, Mabey DCW, Bailey RL, Clark TG, Thomson NR and Holland MJ

    Clinical Research Department, London School of Hygiene and Tropical Medicine, Keppel Street, London, UK.

    Background: Chlamydia trachomatis (Ct) is the most common infectious cause of blindness and bacterial sexually transmitted infection worldwide. Ct strain-specific differences in clinical trachoma suggest that genetic polymorphisms in Ct may contribute to the observed variability in severity of clinical disease.

    Methods: Using Ct whole genome sequences obtained directly from conjunctival swabs, we studied Ct genomic diversity and associations of Ct genetic polymorphisms with ocular localization and disease severity in a treatment-naïve trachoma-endemic population in Guinea-Bissau, West Africa.

    Results: All Ct sequences fall within the T2 ocular clade phylogenetically. This is consistent with the presence of the characteristic deletion in trpA resulting in a truncated non-functional protein and the ocular tyrosine repeat regions present in tarP associated with ocular tissue localization. We have identified 21 Ct non-synonymous single nucleotide polymorphisms (SNPs) associated with ocular localization, including SNPs within pmpD (odds ratio, OR = 4.07, p* = 0.001) and tarP (OR = 0.34, p* = 0.009). Eight synonymous SNPs associated with disease severity were found in yjfH (rlmB) (OR = 0.13, p* = 0.037), CTA0273 (OR = 0.12, p* = 0.027), trmD (OR = 0.12, p* = 0.032), CTA0744 (OR = 0.12, p* = 0.041), glgA (OR = 0.10, p* = 0.026), alaS (OR = 0.10, p* = 0.032), pmpE (OR = 0.08, p* = 0.001) and the intergenic region CTA0744-CTA0745 (OR = 0.13, p* = 0.043).

    Conclusions: This study demonstrates the extent of genomic diversity within a naturally circulating population of ocular Ct and is the first to describe novel genomic associations with disease severity. These findings direct investigation of host-pathogen interactions that may be important in ocular Ct pathogenesis and disease transmission.

    Funded by: Medical Research Council: MR/K000551/1; Wellcome Trust: 079246/Z/06/Z, 097330/Z/11/Z, 098051, 105609/Z/14/Z

    Genome medicine 2018;10;1;15
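
    The odds ratios reported in the Results above compare the odds of an outcome with and without a given SNP. As a rough illustration only, an unadjusted odds ratio and Wald confidence interval can be computed from a 2x2 table as sketched below in Python; the counts are invented, and the study's own estimates and p* values come from its own models, which this sketch does not reproduce.

```python
import math


def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio for a 2x2 table.

    a: carriers with the outcome      b: carriers without the outcome
    c: non-carriers with the outcome  d: non-carriers without the outcome
    """
    return (a * d) / (b * c)


def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% Wald confidence interval computed on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only.
estimate = odds_ratio(20, 10, 10, 20)
low, high = wald_ci(20, 10, 10, 20)
print(estimate, round(low, 2), round(high, 2))
```

    An odds ratio above 1 indicates the outcome is more common among carriers, and below 1 less common, which is how values such as 4.07 and 0.34 above are read.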

  • Support for a clade of Placozoa and Cnidaria in genes with minimal compositional bias.

    Laumer CE, Gruber-Vodicka H, Hadfield MG, Pearse VB, Riesgo A, Marioni JC and Giribet G

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The phylogenetic placement of the morphologically simple placozoans is crucial to understanding the evolution of complex animal traits. Here, we examine the influence of adding new genomes from placozoans to a large dataset designed to study the deepest splits in the animal phylogeny. Using site-heterogeneous substitution models, we show that it is possible to obtain strong support, in both amino acid and reduced-alphabet matrices, for either a sister-group relationship between Cnidaria and Placozoa, or for Cnidaria and Bilateria as seen in most published work to date, depending on the orthologues selected to construct the matrix. We demonstrate that a majority of genes show evidence of compositional heterogeneity, and that support for the Cnidaria + Bilateria clade can be assigned to this source of systematic error. In interpreting these results, we caution against a peremptory reading of placozoans as secondarily reduced forms of little relevance to broader discussions of early animal evolution.

    eLife 2018;7

  • BCL11A interacts with SOX2 to control the expression of epigenetic regulators in lung squamous carcinoma.

    Lazarus KA, Hadi F, Zambon E, Bach K, Santolla MF, Watson JK, Correia LL, Das M, Ugur R, Pensa S, Becker L, Campos LS, Ladds G, Liu P, Evan GI, McCaughan FM, Le Quesne J, Lee JH, Calado D and Khaled WT

    Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK.

    Patients diagnosed with lung squamous cell carcinoma (LUSC) have limited targeted therapies. We report here the identification and characterisation of BCL11A as a LUSC oncogene. Analysis of cancer genomics datasets revealed BCL11A to be upregulated in LUSC but not in lung adenocarcinoma (LUAD). Experimentally we demonstrate that non-physiological levels of BCL11A in vitro and in vivo promote squamous-like phenotypes, while its knockdown abolishes xenograft tumour formation. At the molecular level we found that BCL11A is transcriptionally regulated by SOX2 and is required for its oncogenic functions. Furthermore, we show that BCL11A and SOX2 regulate the expression of several transcription factors, including SETD8. We demonstrate that shRNA-mediated or pharmacological inhibition of SETD8 selectively inhibits LUSC growth. Collectively, our study indicates that BCL11A is integral to LUSC pathology and highlights the disruption of the BCL11A-SOX2 transcriptional programme as a novel candidate for drug development.

    Funded by: Biotechnology and Biological Sciences Research Council (BBSRC): BB/M00015X/2; Cancer Research UK (CRUK): C47525/A17348; Medical Research Council: MC_PC_12009, MC_UP_1203/1, MR/J008060/1; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/N002369/1; Wellcome Trust

    Nature communications 2018;9;1;3327

  • Terminal uridylyltransferases target RNA viruses as part of the innate immune system.

    Le Pen J, Jiang H, Di Domenico T, Kneuss E, Kosałka J, Leung C, Morgan M, Much C, Rudolph KLM, Enright AJ, O'Carroll D, Wang D and Miska EA

    Gurdon Institute, University of Cambridge, Cambridge, UK.

    RNA viruses are a major threat to animals and plants. RNA interference (RNAi) and the interferon response provide innate antiviral defense against RNA viruses. Here, we performed a large-scale screen using Caenorhabditis elegans and its natural pathogen the Orsay virus (OrV), and we identified cde-1 as important for antiviral defense. CDE-1 is a homolog of the mammalian TUT4 and TUT7 terminal uridylyltransferases (collectively called TUT4(7)); its catalytic activity is required for its antiviral function. CDE-1 uridylates the 3' end of the OrV RNA genome and promotes its degradation in a manner independent of the RNAi pathway. Likewise, TUT4(7) enzymes uridylate influenza A virus (IAV) mRNAs in mammalian cells. Deletion of TUT4(7) leads to increased IAV mRNA and protein levels. Collectively, these data implicate 3'-terminal uridylation of viral RNAs as a conserved antiviral defense mechanism.

    Funded by: Cancer Research UK: A14492, A18583; European Research Council: 260688; Medical Research Council: MR/K017047/1; Wellcome Trust: 092096, 093970

    Nature structural & molecular biology 2018;25;9;778-786

  • A Distinct Class of Genome Rearrangements Driven by Heterologous Recombination.

    León-Ortiz AM, Panier S, Sarek G, Vannier JB, Patel H, Campbell PJ and Boulton SJ

    DSB Repair Metabolism Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK.

    Erroneous DNA repair by heterologous recombination (Ht-REC) is a potential threat to genome stability, but evidence supporting its prevalence is lacking. Here we demonstrate that recombination is possible between heterologous sequences and that it is a source of chromosomal alterations in mitotic and meiotic cells. Mechanistically, we find that the RTEL1 and HIM-6/BLM helicases and the BRCA1 homolog BRC-1 counteract Ht-REC in Caenorhabditis elegans, whereas mismatch repair does not. Instead, MSH-2/6 drives Ht-REC events in rtel-1 and brc-1 mutants and excessive crossovers in rtel-1 mutant meioses. Loss of vertebrate Rtel1 also causes a variety of unusually large and complex structural variations, including chromothripsis, breakage-fusion-bridge events, and tandem duplications with distant intra-chromosomal insertions, whose structures are consistent with a role for RTEL1 in preventing Ht-REC during break-induced replication. Our data establish Ht-REC as an unappreciated source of genome instability that underpins a novel class of complex genome rearrangements that likely arise during replication stress.

    Funded by: Cancer Research UK: FC0010048; Medical Research Council: FC0010048, MC_UP_1102/14; Wellcome Trust: 077012/Z/05/Z, FC0010048, WT088340MA

    Molecular cell 2018;69;2;292-305.e6

  • Integrated pathogen load and dual transcriptome analysis of systemic host-pathogen interactions in severe malaria.

    Lee HJ, Georgiadou A, Walther M, Nwakanma D, Stewart LB, Levin M, Otto TD, Conway DJ, Coin LJ and Cunnington AJ

    Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia.

    The pathogenesis of infectious diseases depends on the interaction of host and pathogen. In Plasmodium falciparum malaria, host and parasite processes can be assessed by dual RNA sequencing of blood from infected patients. We performed dual transcriptome analyses on samples from 46 malaria-infected Gambian children to reveal mechanisms driving the systemic pathophysiology of severe malaria. Integrating these transcriptomic data with estimates of parasite load and detailed clinical information allowed consideration of potentially confounding effects due to differing leukocyte proportions in blood, parasite developmental stage, and whole-body pathogen load. We report hundreds of human and parasite genes differentially expressed between severe and uncomplicated malaria, with distinct profiles associated with coma, hyperlactatemia, and thrombocytopenia. High expression of neutrophil granule-related genes was consistently associated with all severe malaria phenotypes. We observed severity-associated variation in the expression of parasite genes, which determine cytoadhesion to vascular endothelium, rigidity of infected erythrocytes, and parasite growth rate. Up to 99% of human differential gene expression in severe malaria was driven by differences in parasite load, whereas parasite gene expression showed little association with parasite load. Coexpression analyses revealed interactions between human and P. falciparum, with prominent co-regulation of translation genes in severe malaria between host and parasite. Multivariate analyses suggested that increased expression of granulopoiesis and interferon-γ-related genes, together with inadequate suppression of type 1 interferon signaling, best explained severity of infection. These findings provide a framework for understanding the contributions of host and parasite to the pathogenesis of severe malaria and identifying new treatments.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K003240/1; European Research Council: 294428; Medical Research Council: MR/L006529/1, MR/M008924/1; Wellcome Trust: 098051, WT097835MF, WT101650MA

    Science translational medicine 2018;10;447

  • WormBase 2017: molting into a new stage.

    Lee RYN, Howe KL, Harris TW, Arnaboldi V, Cain S, Chan J, Chen WJ, Davis P, Gao S, Grove C, Kishore R, Muller HM, Nakamura C, Nuin P, Paulini M, Raciti D, Rodgers F, Russell M, Schindelman G, Tuli MA, Van Auken K, Wang Q, Williams G, Wright A, Yook K, Berriman M, Kersey P, Schedl T, Stein L and Sternberg PW

    Division of Biology and Biological Engineering 156-29, California Institute of Technology, Pasadena, CA 91125, USA.

    WormBase is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine, Gene Enrichment Analysis; new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and a faster response time. To better serve the broader research community, WormBase, with five other Model Organism Databases and The Gene Ontology project, has begun to collaborate formally as the Alliance of Genome Resources.

    Funded by: Medical Research Council: MR/L001020/1; NHGRI NIH HHS: U24 HG002223, U41 HG002223; NLM NIH HHS: U01 LM012672

    Nucleic acids research 2018;46;D1;D869-D874

  • Population dynamics of normal human blood inferred from somatic mutations.

    Lee-Six H, Øbro NF, Shepherd MS, Grossmann S, Dawson K, Belmonte M, Osborne RJ, Huntly BJP, Martincorena I, Anderson E, O'Neill L, Stratton MR, Laurenti E, Green AR, Kent DG and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    Haematopoietic stem cells drive blood production, but their population size and lifetime dynamics have not been quantified directly in humans. Here we identified 129,582 spontaneous, genome-wide somatic mutations in 140 single-cell-derived haematopoietic stem and progenitor colonies from a healthy 59-year-old man and applied population-genetics approaches to reconstruct clonal dynamics. Cell divisions from early embryogenesis were evident in the phylogenetic tree; all blood cells were derived from a common ancestor that preceded gastrulation. The size of the stem cell population grew steadily in early life, reaching a stable plateau by adolescence. We estimate the numbers of haematopoietic stem cells that are actively making white blood cells at any one time to be in the range of 50,000-200,000. We observed adult haematopoietic stem cell clones that generate multilineage outputs, including granulocytes and B lymphocytes. Harnessing naturally occurring mutations to report the clonal architecture of an organ enables the high-resolution reconstruction of somatic cell dynamics in humans.

    Funded by: Medical Research Council: MC_PC_12009, MC_PC_16040, MR/M008975/1, MR/M010392/1, MR/R009708/1, MR/S036113/1; Wellcome Trust

    Nature 2018;561;7724;473-478

  • pyseer: a comprehensive tool for microbial pangenome-wide association studies.

    Lees JA, Galardini M, Bentley SD, Weiser JN and Corander J

    Department of Microbiology, New York University School of Medicine, New York, NY, USA.

    Summary: Genome-wide association studies (GWAS) in microbes have different challenges to GWAS in eukaryotes. These have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results.

    Availability and implementation: pyseer is written in Python, is freely available, and can be installed through pip. Documentation and a tutorial are also available.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: NIAID NIH HHS: R01 AI038446, R01 AI105168; Wellcome Trust: 098051

    Bioinformatics (Oxford, England) 2018;34;24;4310-4312

  • Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study.

    Lees JA, Kendall M, Parkhill J, Colijn C, Bentley SD and Harris SR

    Infection Genomics, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made.

    Methods: We simulated data from a defined "true tree" using a realistic evolutionary model. We built phylogenies from this data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree.

    Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other.

    Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.

    Wellcome open research 2018;3;33
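
    The abstract above does not name the two measures used to compare reconstructed trees with the true tree. As an assumed example of such a measure, the Python sketch below computes the Robinson-Foulds distance, a widely used topology metric that counts the leaf bipartitions found in one tree but not the other. The nested-tuple tree encoding and the function names are hypothetical choices made for this illustration.

```python
def leaves(tree):
    """Collect the leaf names of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial leaf bipartitions induced by internal branches."""
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Splits that isolate a single leaf (or nothing) occur in every tree.
        if len(below) > 1 and len(other) > 1:
            splits.add(frozenset((below, other)))
        return below

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Count the bipartitions present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon trees, for illustration only.
true_tree = (("A", "B"), ("C", ("D", "E")))
estimate = (("A", "C"), ("B", ("D", "E")))
print(robinson_foulds(true_tree, estimate))  # one split differs in each tree
```

    This distance reflects topology only and ignores branch lengths, which is one reason a study might pair it with a second, branch-length-aware measure.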

  • Genetics of HbA1c: a case study in clinical translation.

    Leong A and Wheeler E

    Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA.

    Glycated hemoglobin (HbA1c) measures the amount of glucose in the blood in the previous 2-3 months and is used to test whether an individual has diabetes (HbA1c≥6.5%), or how well they are managing their diabetes. Genome-wide association studies have successfully identified multiple genomic loci influencing HbA1c, through both glycemic (factors that affect blood glucose levels) and erythrocytic (factors that affect the red blood cell) pathways. Inaccuracies in HbA1c, due to non-glycemic variants, could lead to suboptimal care or adverse health consequences. A recently published example is the erythrocytic variant (rs1050828) in G6PD, which leads to the artificial lowering of HbA1c and missed diagnosis of diabetes using current thresholds. In this review we will discuss recent insights into the genetic etiology of HbA1c, and how these can translate to the clinic.

    Current opinion in genetics & development 2018;50;79-85

  • BCL11B mutations in patients affected by a neurodevelopmental disorder with reduced type 2 innate lymphoid cells.

    Lessel D, Gehbauer C, Bramswig NC, Schluth-Bolard C, Venkataramanappa S, van Gassen KLI, Hempel M, Haack TB, Baresic A, Genetti CA, Funari MFA, Lessel I, Kuhlmann L, Simon R, Liu P, Denecke J, Kuechler A, de Kruijff I, Shoukier M, Lek M, Mullen T, Lüdecke HJ, Lerario AM, Kobbe R, Krieger T, Demeer B, Lebrun M, Keren B, Nava C, Buratti J, Afenjar A, Shinawi M, Guillen Sacoto MJ, Gauthier J, Hamdan FF, Laberge AM, Campeau PM, Louie RJ, Cathey SS, Prinz I, Jorge AAL, Terhal PA, Lenhard B, Wieczorek D, Strom TM, Agrawal PB, Britsch S, Tolosa E and Kubisch C

    Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.

    The transcription factor BCL11B is essential for development of the nervous and the immune system, and Bcl11b deficiency results in structural brain defects, reduced learning capacity, and impaired immune cell development in mice. However, the precise role of BCL11B in humans is largely unexplored, except for a single patient with a BCL11B missense mutation, affected by multisystem anomalies and profound immune deficiency. Using massively parallel sequencing we identified 13 patients bearing heterozygous germline alterations in BCL11B. Notably, all of them are affected by global developmental delay with speech impairment and intellectual disability; however, none displayed overt clinical signs of immune deficiency. Six frameshift mutations, two nonsense mutations, one missense mutation, and two chromosomal rearrangements resulting in diminished BCL11B expression, arose de novo. A further frameshift mutation was transmitted from a similarly affected mother. Interestingly, the most severely affected patient harbours a missense mutation within a zinc-finger domain of BCL11B, probably affecting the DNA-binding structural interface, similar to the recently published patient. Furthermore, the most C-terminally located premature termination codon mutation fails to rescue the progenitor cell proliferation defect in hippocampal slice cultures from Bcl11b-deficient mice. Concerning the role of BCL11B in the immune system, extensive immune phenotyping of our patients revealed alterations in the T cell compartment and lack of peripheral type 2 innate lymphoid cells (ILC2s), consistent with the findings described in Bcl11b-deficient mice. Unsupervised analysis of 102 T lymphocyte subpopulations showed that the patients clearly cluster apart from healthy children, further supporting the common aetiology of the disorder. Taken together, we show here that mutations leading either to BCL11B haploinsufficiency or to a truncated BCL11B protein clinically cause a non-syndromic neurodevelopmental delay. In addition, we suggest that missense mutations affecting specific sites within zinc-finger domains might result in distinct and more severe clinical outcomes.

    Funded by: Medical Research Council: MC_UP_1102/1; NHGRI NIH HHS: UM1 HG008900; NIAMS NIH HHS: R01 AR068429; NICHD NIH HHS: U19 HD077671

    Brain : a journal of neurology 2018;141;8;2299-2311

  • Earth BioGenome Project: Sequencing life for the future of life.

    Lewin HA, Robinson GE, Kress WJ, Baker WJ, Coddington J, Crandall KA, Durbin R, Edwards SV, Forest F, Gilbert MTP, Goldstein MM, Grigoriev IV, Hackett KJ, Haussler D, Jarvis ED, Johnson WE, Patrinos A, Richards S, Castilla-Rubio JC, van Sluys MA, Soltis PS, Xu X, Yang H and Zhang G

    Department of Evolution and Ecology, University of California, Davis, CA 95616.

    Increasing our understanding of Earth's biodiversity and responsibly stewarding its resources are among the most crucial scientific and social challenges of the new millennium. These challenges require fundamental new knowledge of the organization, evolution, functions, and interactions among millions of the planet's organisms. Herein, we present a perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity over a period of 10 years. The outcomes of the EBP will inform a broad range of major issues facing humanity, such as the impact of climate change on biodiversity, the conservation of endangered species and ecosystems, and the preservation and enhancement of ecosystem services. We describe hurdles that the project faces, including data-sharing policies that ensure a permanent, freely available resource for future scientific discovery while respecting access and benefit sharing guidelines of the Nagoya Protocol. We also describe scientific and organizational challenges in executing such an ambitious project, and the structure proposed to achieve the project's goals. The far-reaching potential benefits of creating an open digital repository of genomic information for life on Earth can be realized only by a coordinated international effort.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: 207492/Z/17/Z

    Proceedings of the National Academy of Sciences of the United States of America 2018;115;17;4325-4333

  • Whole exome sequencing in adult-onset hearing loss reveals a high load of predicted pathogenic variants in known deafness-associated genes and identifies new candidate genes.

    Lewis MA, Nolan LS, Cadge BA, Matthews LJ, Schulte BA, Dubno JR, Steel KP and Dawson SJ

    Wolfson Centre for Age-Related Diseases, King's College London, WC2R 2LS, London, UK.

    Background: Deafness is a highly heterogeneous disorder with over 100 genes known to underlie human non-syndromic hearing impairment. However, many more remain undiscovered, particularly those involved in the most common form of deafness: adult-onset progressive hearing loss. Despite several genome-wide association studies of adult hearing status, it remains unclear whether the genetic architecture of this common sensory loss consists of multiple rare variants each with large effect size or many common susceptibility variants each with small to medium effects. As next generation sequencing is now being utilised in clinical diagnosis, our aim was to explore the viability of diagnosing the genetic cause of hearing loss using whole exome sequencing in individual subjects as in a clinical setting.

    Methods: We performed exome sequencing of thirty patients selected for distinct phenotypic sub-types from well-characterised cohorts of 1479 people with adult-onset hearing loss.

    Results: Every individual carried predicted pathogenic variants in at least ten deafness-associated genes; similar findings were obtained from an analysis of the 1000 Genomes Project data unselected for hearing status. We have identified putative causal variants in known deafness genes and several novel candidate genes, including NEDD4 and NEFH that were mutated in multiple individuals.

    Conclusions: The high frequency of predicted-pathogenic variants detected in known deafness-associated genes was unexpected and has significant implications for current diagnostic sequencing in deafness. Our findings suggest that in a clinic setting, efforts should be made to a) confirm key sequence results by Sanger sequencing, b) assess segregations of variants and phenotypes within the family if at all possible, and c) use caution in applying current pathogenicity prediction algorithms for diagnostic purposes. We conclude that there may be a high number of pathogenic variants affecting hearing in the ageing population, including many in known deafness-associated genes. Our findings of frequent predicted-pathogenic variants in both our hearing-impaired sample and in the larger 1000 Genomes Project sample unselected for auditory function suggest that the reference population for interpreting variants for this very common disorder should be a population of people with good hearing for their age rather than an unselected population.

    Funded by: Medical Research Council: 098051, G0300212, MC_QA137918, MR/N012119/1; NCATS NIH HHS: UL1 TR000062, UL1 TR001450; NIDCD NIH HHS: P50 DC000422; Wellcome Trust: 100669

    BMC medical genomics 2018;11;1;77

  • Mutant calreticulin knockin mice develop thrombocytosis and myelofibrosis without a stem cell self-renewal advantage.

    Li J, Prins D, Park HJ, Grinfeld J, Gonzalez-Arias C, Loughran S, Dovey OM, Klampfl T, Bennett C, Hamilton TL, Pask DC, Sneade R, Williams M, Aungier J, Ghevaert C, Vassiliou GS, Kent DG and Green AR

    Cambridge Institute for Medical Research and Wellcome Trust/Medical Research Council Stem Cell Institute and.

    Somatic mutations in the endoplasmic reticulum chaperone calreticulin (CALR) are detected in approximately 40% of patients with essential thrombocythemia (ET) and primary myelofibrosis (PMF). Multiple different mutations have been reported, but all result in a +1-bp frameshift and generate a novel protein C terminus. In this study, we generated a conditional mouse knockin model of the most common CALR mutation, a 52-bp deletion. The mutant novel human C-terminal sequence is integrated into the otherwise intact mouse CALR gene and results in mutant CALR expression under the control of the endogenous mouse locus. CALRdel/+ mice develop a transplantable ET-like disease with marked thrombocytosis, which is associated with increased and morphologically abnormal megakaryocytes and increased numbers of phenotypically defined hematopoietic stem cells (HSCs). Homozygous CALRdel/del mice developed extreme thrombocytosis accompanied by features of MF, including leukocytosis, reduced hematocrit, splenomegaly, and increased bone marrow reticulin. CALRdel/+ HSCs were more proliferative in vitro, but neither CALRdel/+ nor CALRdel/del displayed a competitive transplantation advantage in primary or secondary recipient mice. These results demonstrate the consequences of heterozygous and homozygous CALR mutations and provide a powerful model for dissecting the pathogenesis of CALR-mutant ET and PMF.

    Funded by: British Heart Foundation: FS/09/039/27788; Cancer Research UK; Medical Research Council: MC_PC_12009; Wellcome Trust

    Blood 2018;131;6;649-661

  • Genome-wide CRISPR-KO Screen Uncovers mTORC1-Mediated Gsk3 Regulation in Naive Pluripotency Maintenance and Dissolution.

    Li M, Yu JSL, Tilgner K, Ong SH, Koike-Yusa H and Yusa K

    Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The genetic basis of naive pluripotency maintenance and loss is a central question in embryonic stem cell biology. Here, we deploy CRISPR-knockout-based screens in mouse embryonic stem cells to interrogate this question through a genome-wide, non-biased approach using the Rex1GFP reporter as a phenotypic readout. This highly sensitive and efficient method identified genes in diverse biological processes and pathways. We uncovered a key role for negative regulators of mTORC1 in the maintenance of, and exit from, naive pluripotency and provided an integrated account of how mTORC1 activity influences naive pluripotency through Gsk3. Our study therefore reinforces Gsk3 as the central node and provides a comprehensive, data-rich resource that will improve our understanding of mechanisms regulating pluripotency and stimulate avenues for further mechanistic studies.

    Funded by: Wellcome Trust

    Cell reports 2018;24;2;489-502

  • Organoid cultures recapitulate esophageal adenocarcinoma heterogeneity providing a model for clonality studies and precision therapeutics.

    Li X, Francies HE, Secrier M, Perner J, Miremadi A, Galeano-Dalmau N, Barendt WJ, Letchford L, Leyden GM, Goffin EK, Barthorpe A, Lightfoot H, Chen E, Gilbert J, Noorani A, Devonshire G, Bower L, Grantham A, MacRae S, Grehan N, Wedge DC, Fitzgerald RC and Garnett MJ

    MRC Cancer Unit, University of Cambridge, Cambridge, CB2 0XZ, UK.

    Esophageal adenocarcinoma (EAC) incidence is increasing while 5-year survival rates remain less than 15%. A lack of experimental models has hampered progress. We have generated clinically annotated EAC organoid cultures that recapitulate the morphology and the genomic and transcriptomic landscape of the primary tumor, including point mutations, copy number alterations, and mutational signatures. Karyotyping of organoid cultures has confirmed polyclonality reflecting the clonal architecture of the primary tumor. Furthermore, subclones underwent clonal selection associated with driver gene status. Medium-throughput drug sensitivity testing demonstrates the potential of targeting receptor tyrosine kinases and downstream mediators. EAC organoid cultures provide a pre-clinical tool for studies of clonal evolution and precision therapeutics.

    Funded by: Cancer Research UK (CRUK): C44943/A22536, RG66287; DH | National Institute for Health Research (NIHR): RG67258; EIF | Stand Up To Cancer (SU2C): SU2C-AACR-DT1213; Medical Research Council: MC_UU_12022/2; Medical Research Council (MRC): RG84369; Wellcome Trust: 102696

    Nature communications 2018;9;1;2983

  • Genome Analyses of >200,000 Individuals Identify 58 Loci for Chronic Inflammation and Highlight Pathways that Link Inflammation and Complex Disorders.

    Ligthart S, Vaez A, Võsa U, Stathopoulou MG, de Vries PS, Prins BP, Van der Most PJ, Tanaka T, Naderi E, Rose LM, Wu Y, Karlsson R, Barbalic M, Lin H, Pool R, Zhu G, Macé A, Sidore C, Trompet S, Mangino M, Sabater-Lleal M, Kemp JP, Abbasi A, Kacprowski T, Verweij N, Smith AV, Huang T, Marzi C, Feitosa MF, Lohman KK, Kleber ME, Milaneschi Y, Mueller C, Huq M, Vlachopoulou E, Lyytikäinen LP, Oldmeadow C, Deelen J, Perola M, Zhao JH, Feenstra B, LifeLines Cohort Study, Amini M, CHARGE Inflammation Working Group, Lahti J, Schraut KE, Fornage M, Suktitipat B, Chen WM, Li X, Nutile T, Malerba G, Luan J, Bak T, Schork N, Del Greco M F, Thiering E, Mahajan A, Marioni RE, Mihailov E, Eriksson J, Ozel AB, Zhang W, Nethander M, Cheng YC, Aslibekyan S, Ang W, Gandin I, Yengo L, Portas L, Kooperberg C, Hofer E, Rajan KB, Schurmann C, den Hollander W, Ahluwalia TS, Zhao J, Draisma HHM, Ford I, Timpson N, Teumer A, Huang H, Wahl S, Liu Y, Huang J, Uh HW, Geller F, Joshi PK, Yanek LR, Trabetti E, Lehne B, Vozzi D, Verbanck M, Biino G, Saba Y, Meulenbelt I, O'Connell JR, Laakso M, Giulianini F, Magnusson PKE, Ballantyne CM, Hottenga JJ, Montgomery GW, Rivadineira F, Rueedi R, Steri M, Herzig KH, Stott DJ, Menni C, Frånberg M, St Pourcain B, Felix SB, Pers TH, Bakker SJL, Kraft P, Peters A, Vaidya D, Delgado G, Smit JH, Großmann V, Sinisalo J, Seppälä I, Williams SR, Holliday EG, Moed M, Langenberg C, Räikkönen K, Ding J, Campbell H, Sale MM, Chen YI, James AL, Ruggiero D, Soranzo N, Hartman CA, Smith EN, Berenson GS, Fuchsberger C, Hernandez D, Tiesler CMT, Giedraitis V, Liewald D, Fischer K, Mellström D, Larsson A, Wang Y, Scott WR, Lorentzon M, Beilby J, Ryan KA, Pennell CE, Vuckovic D, Balkau B, Concas MP, Schmidt R, Mendes de Leon CF, Bottinger EP, Kloppenburg M, Paternoster L, Boehnke M, Musk AW, Willemsen G, Evans DM, Madden PAF, Kähönen M, Kutalik Z, Zoledziewska M, Karhunen V, Kritchevsky SB, Sattar N, Lachance G, Clarke R, Harris TB, Raitakari OT, Attia JR, van Heemst D, Kajantie E, Sorice R, Gambaro G, Scott RA, Hicks AA, Ferrucci L, Standl M, Lindgren CM, Starr JM, Karlsson M, Lind L, Li JZ, Chambers JC, Mori TA, de Geus EJCN, Heath AC, Martin NG, Auvinen J, Buckley BM, de Craen AJM, Waldenberger M, Strauch K, Meitinger T, Scott RJ, McEvoy M, Beekman M, Bombieri C, Ridker PM, Mohlke KL, Pedersen NL, Morrison AC, Boomsma DI, Whitfield JB, Strachan DP, Hofman A, Vollenweider P, Cucca F, Jarvelin MR, Jukema JW, Spector TD, Hamsten A, Zeller T, Uitterlinden AG, Nauck M, Gudnason V, Qi L, Grallert H, Borecki IB, Rotter JI, März W, Wild PS, Lokki ML, Boyle M, Salomaa V, Melbye M, Eriksson JG, Wilson JF, Penninx BWJH, Becker DM, Worrall BB, Gibson G, Krauss RM, Ciullo M, Zaza G, Wareham NJ, Oldehinkel AJ, Palmer LJ, Murray SS, Pramstaller PP, Bandinelli S, Heinrich J, Ingelsson E, Deary IJ, Mägi R, Vandenput L, van der Harst P, Desch KC, Kooner JS, Ohlsson C, Hayward C, Lehtimäki T, Shuldiner AR, Arnett DK, Beilin LJ, Robino A, Froguel P, Pirastu M, Jess T, Koenig W, Loos RJF, Evans DA, Schmidt H, Smith GD, Slagboom PE, Eiriksdottir G, Morris AP, Psaty BM, Tracy RP, Nolte IM, Boerwinkle E, Visvikis-Siest S, Reiner AP, Gross M, Bis JC, Franke L, Franco OH, Benjamin EJ, Chasman DI, Dupuis J, Snieder H, Dehghan A and Alizadeh BZ

    Department of Epidemiology, Erasmus University Medical Center, Rotterdam 3000 CA, the Netherlands.

    C-reactive protein (CRP) is a sensitive biomarker of chronic low-grade inflammation and is associated with multiple complex diseases. The genetic determinants of chronic inflammation remain largely unknown, and the causal role of CRP in several clinical outcomes is debated. We performed two genome-wide association studies (GWASs), on HapMap and 1000 Genomes imputed data, of circulating amounts of CRP by using data from 88 studies comprising 204,402 European individuals. Additionally, we performed in silico functional analyses and Mendelian randomization analyses with several clinical outcomes. The GWAS meta-analyses of CRP revealed 58 distinct genetic loci (p < 5 × 10<sup>-8</sup>). After adjustment for body mass index in the regression analysis, the associations at all except three loci remained. The lead variants at the distinct loci explained up to 7.0% of the variance in circulating amounts of CRP. We identified 66 gene sets that were organized in two substantially correlated clusters, one mainly composed of immune pathways and the other characterized by metabolic pathways in the liver. Mendelian randomization analyses revealed a causal protective effect of CRP on schizophrenia and a risk-increasing effect on bipolar disorder. Our findings provide further insights into the biology of inflammation and could lead to interventions for treating inflammation and its clinical consequences.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Medical Research Council: MC_UU_12013/4, MC_UU_12015/1, MR/J012165/1, MR/K002414/1, MR/K026992/1; NHLBI NIH HHS: R01 HL105756, R01 HL141399, U01 HL130114; NIA NIH HHS: P30 AG010129, P30 AG049638; NIDDK NIH HHS: P30 DK020572, R01 DK072193, U01 DK062370; NIH HHS: S10 OD020069

    American journal of human genetics 2018;103;5;691-706

  • Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci.

    Lilue J, Doran AG, Fiddes IT, Abrudan M, Armstrong J, Bennett R, Chow W, Collins J, Collins S, Czechanski A, Danecek P, Diekhans M, Dolle DD, Dunn M, Durbin R, Earl D, Ferguson-Smith A, Flicek P, Flint J, Frankish A, Fu B, Gerstein M, Gilbert J, Goodstadt L, Harrow J, Howe K, Ibarra-Soria X, Kolmogorov M, Lelliott CJ, Logan DW, Loveland J, Mathews CE, Mott R, Muir P, Nachtweide S, Navarro FCP, Odom DT, Park N, Pelan S, Pham SK, Quail M, Reinholdt L, Romoth L, Shirley L, Sisu C, Sjoberg-Herrera M, Stanke M, Steward C, Thomas M, Threadgold G, Thybert D, Torrance J, Wong K, Wood J, Yalcin B, Yang F, Adams DJ, Paten B and Keane TM

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

    We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.

    Funded by: NHGRI NIH HHS: U41 HG007234; NIH HHS: P40 OD011102, U42 OD010921; Wellcome Trust: 207492/Z/17/Z

    Nature genetics 2018;50;11;1574-1583

  • Genomic and transcriptomic comparisons of closely related malaria parasites differing in virulence and sequestration pattern.

    Lin JW, Reid AJ, Cunningham D, Böhme U, Tumwine I, Keller-Mclaughlin S, Sanders M, Berriman M and Langhorne J

    Malaria Immunology laboratory, Francis Crick Institute, London, NW1 1AT, UK.

    <b>Background:</b> Malaria parasite species differ greatly in the harm they do to humans. While <i>P. falciparum</i> kills hundreds of thousands per year, <i>P. vivax</i> kills much less often and <i>P. malariae</i> is relatively benign. Strains of the rodent malaria parasite <i>Plasmodium chabaudi</i> show phenotypic variation in virulence during infections of laboratory mice. This makes it an excellent species to study genes which may be responsible for this trait. By understanding the mechanisms which underlie differences in virulence we can learn how parasites adapt to their hosts and how we might prevent disease. <b>Methods:</b> Here we present a complete reference genome sequence for a more virulent <i>P. chabaudi</i> strain, PcCB, and perform a detailed comparison with the genome of the less virulent PcAS strain. <b>Results:</b> We found the greatest variation in the subtelomeric regions, in particular amongst the sequences of the <i>pir</i> gene family, which has been associated with virulence and establishment of chronic infection. Despite substantial variation at the sequence level, the repertoire of these genes has been largely maintained, highlighting the requirement for functional conservation as well as diversification in host-parasite interactions. However, a subset of <i>pir</i> genes, previously associated with increased virulence, were more highly expressed in PcCB, suggesting a role for this gene family in virulence differences between strains. We found that core genes involved in red blood cell invasion have been under positive selection and that the more virulent strain has a greater preference for reticulocytes, which has elsewhere been associated with increased virulence. <b>Conclusions:</b> These results provide the basis for a mechanistic understanding of the phenotypic differences between <i>Plasmodium chabaudi</i> strains, which might ultimately be translated into a better understanding of malaria parasites affecting humans.

    Funded by: Medical Research Council: MC_EX_G0901345

    Wellcome open research 2018;3;142

  • BraCeR: B-cell-receptor reconstruction and clonality inference from single-cell RNA-seq.

    Lindeman I, Emerton G, Mamanova L, Snir O, Polanski K, Qiao SW, Sollid LM, Teichmann SA and Stubbington MJT

    Centre for Immune Regulation, University of Oslo and Oslo University Hospital, Oslo, Norway.

    Nature methods 2018;15;8;563-565

  • Investigating the Campylobacter jejuni Transcriptional Response to Host Intestinal Extracts Reveals the Involvement of a Widely Conserved Iron Uptake System.

    Liu MM, Boinett CJ, Chan ACK, Parkhill J, Murphy MEP and Gaynor EC

    Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada.

    <i>Campylobacter jejuni</i> is a pathogenic bacterium that causes gastroenteritis in humans yet is a widespread commensal in wild and domestic animals, particularly poultry. Using RNA sequencing, we assessed <i>C. jejuni</i> transcriptional responses to medium supplemented with human fecal versus chicken cecal extracts and in extract-supplemented medium versus medium alone. <i>C. jejuni</i> exposed to extracts had altered expression of 40 genes related to iron uptake, metabolism, chemotaxis, energy production, and osmotic stress response. In human fecal versus chicken cecal extracts, <i>C. jejuni</i> displayed higher expression of genes involved in respiration (<i>fdhTU</i>) and in known or putative iron uptake systems (<i>cfbpA</i>, <i>ceuB</i>, <i>chuC</i>, and <i>CJJ81176_1649-1655</i> [here designated <i>1649-1655</i>]). The <i>1649-1655</i> genes and downstream overlapping gene <i>1656</i> were investigated further. Uncharacterized homologues of this system were identified in 33 diverse bacterial species representing 6 different phyla, 21 of which are associated with human disease. The <i>1649</i> and <i>1650</i> (<i>p19</i>) genes encode an iron transporter and a periplasmic iron binding protein, respectively; however, the role of the downstream <i>1651-1656</i> genes was unknown. A Δ<i>1651</i>-<i>1656</i> deletion strain had an iron-sensitive phenotype, consistent with a previously characterized Δ<i>p19</i> mutant, and showed reduced growth in acidic medium, increased sensitivity to streptomycin, and higher resistance to H<sub>2</sub>O<sub>2</sub> stress. In iron-restricted medium, the <i>1651-1656</i> and <i>p19</i> genes were required for optimal growth when using human fecal extracts as an iron source. Collectively, this implicates a function for the <i>1649-1656</i> gene cluster in <i>C.&nb