Sanger Institute - Publications 2015

Number of papers published in 2015: 641

  • Genetic Loci Associated with Allergic Sensitization in Lithuanians.

    Šaulienė I, Greičiuvienė J, Šukienė L, Juškevičiūtė N, Benner C, Zinkevičienė A, Ripatti S, Donner K and Kainov D

    Department of Environmental Research, Siauliai University, Siauliai, Lithuania.

    Allergic rhinitis (AR) is a common and complex disease. It is associated with environmental as well as genetic factors. Three recent genome-wide association studies (GWAS) reported altogether 47 single nucleotide polymorphisms (SNPs) associated with AR or allergic sensitization (AS) in Europeans and North Americans. Two follow-up studies in Swedish and Chinese populations replicated 15 associations. In these studies individuals were selected based on self-reported AR, or on AR/AS diagnosed using a blood IgE test or skin prick test (SPT), which were often performed without restriction to specific allergens. Here we performed a third replication study in Lithuanians. We used SPT and a carefully selected set of allergens prevalent in Lithuania, as well as the Illumina Core Exome chip for SNP detection. We genotyped 270 SPT-positive individuals (137 Betulaceae-, 174 Poaceae-, 199 Artemisia-, 70 Helianthus-, 22 Alternaria-, 22 Cladosporium-, 140 mites-, 95 cat- and 97 dog dander-sensitive cases) and 162 SPT-negative controls. We found altogether 13 known SNPs associated with AS (p ≤ 0.05). Three SNPs were found in Lithuanians sensitive to several allergens, and 10 SNPs were found in Lithuanians sensitive to a certain allergen. For the first time, SNP rs7775228:C was associated with patient sensitivity to dog allergens (F_A=0.269, F_U=0.180, P=0.008). Thus, careful assessment of AS allowed us to detect known genetic variants associated with AS/AR in a relatively small cohort of Lithuanians.

    PloS one 2015;10;7;e0134188
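
    The allele frequencies quoted in the abstract above can be turned into an effect size. The sketch below assumes that F_A and F_U denote the frequency of the tested allele in affected and unaffected individuals (the column naming used in PLINK association output); the function name is hypothetical and the calculation is not taken from the paper.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Frequencies reported in the abstract for rs7775228:C and dog-allergen sensitivity,
# read here as case (F_A) and control (F_U) allele frequencies (an assumption).
odds_ratio = allelic_odds_ratio(0.269, 0.180)
print(f"allelic odds ratio = {odds_ratio:.2f}")
```

    Under that reading the allele is roughly 1.7 times as common, on the odds scale, in dog-sensitive cases as in controls.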

  • A global reference for human genetic variation.

    1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA and Abecasis GR

    The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

    Funded by: CIHR: 136855; Biotechnology and Biological Sciences Research Council: BB/I021213/1, BB/I02593X/1; British Heart Foundation; European Research Council: 617306; Howard Hughes Medical Institute; Medical Research Council: G0801823; NCI NIH HHS: R01 CA166661, R01 CA172652, R01CA166661, R01CA172652; NHGRI NIH HHS: R01 HG002385, R01 HG002898, R01 HG004960, R01 HG005701, R01 HG006855, R01 HG007022, R01 HG007068, R01 HG007644, R01HG2385, R01HG2898, R01HG4960, R01HG5214, R01HG5701, R01HG6855, R01HG7068, R01HG7644, RC2 HG005552, RC2HG5552, U01 HG005211, U01 HG005214, U01 HG005715, U01 HG005718, U01 HG005728, U01 HG006513, U01HG5211, U01HG5214, U01HG5715, U01HG5718, U01HG5728, U01HG6513, U41 HG002371, U41 HG007497, U41 HG007635, U41HG7497, U41HG7635, U54 HG003067, U54 HG003079, U54 HG003273, U54HG3067, U54HG3079, U54HG3273; NHLBI NIH HHS: HHSN268201100040C, R01 HL087699, R01 HL104608, R01HL104608, R01HL87699, T32 HL094284, T32HL94284; NIAID NIH HHS: HHSN272201000025C; NIDDK NIH HHS: R01 DK075787; NIGMS NIH HHS: P01 GM099568, P01GM99568, R01 GM059290, R01 GM104390, R01 GM108805, R01 GM113657, R01GM104390, R01GM59290, T32 GM007790, T32GM7790; NIH HHS: DP2 OD006514, DP2OD6514, DP5 OD009154, DP5OD9154; Wellcome Trust: 089276/Z.09/Z, 090532/Z/09/Z, 090770, 095552/Z/11/Z, 095908, 096599, 100956, 102541, WT0855322/Z/08/Z, WT086084/Z/08/Z, WT090770/Z/09/Z, WT097307, WT098051, WT100956/Z/13/Z, WT109497

    Nature 2015;526;7571;68-74

  • Genomic expression catalogue of a global collection of BCG vaccine strains show evidence for highly diverged metabolic and cell-wall adaptations.

    Abdallah AM, Hill-Cawthorne GA, Otto TD, Coll F, Guerra-Assunção JA, Gao G, Naeem R, Ansari H, Malas TB, Adroub SA, Verboom T, Ummels R, Zhang H, Panigrahi AK, McNerney R, Brosch R, Clark TG, Behr MA, Bitter W and Pain A

    Pathogen Genomics Group, Biological, Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia.

    Although Bacillus Calmette-Guérin (BCG) vaccines against tuberculosis have been available for more than 90 years, their effectiveness has been hindered by variable protective efficacy and a lack of lasting memory responses. One factor contributing to this variability may be the diversity of the BCG strains that are used around the world, in part from genomic changes accumulated during vaccine production and their resulting differences in gene expression. We have compared the genomes and transcriptomes of a global collection of fourteen of the most widely used BCG strains at single base-pair resolution. We have also used quantitative proteomics to identify key differences in expression of proteins across five representative BCG strains of the four tandem duplication (DU) groups. We provide a comprehensive map of single nucleotide polymorphisms (SNPs), copy number variation and insertions and deletions (indels) across fourteen BCG strains. Genome-wide SNP characterization allowed the construction of a new and robust phylogenic genealogy of BCG strains. Transcriptional and proteomic profiling revealed a metabolic remodeling in BCG strains that may be reflected by altered immunogenicity and possibly vaccine efficacy. Together, these integrated-omic data represent the most comprehensive catalogue of genetic variation across a global collection of BCG strains.

    Funded by: Medical Research Council: MR/K000551/1

    Scientific reports 2015;5;15443

  • Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms.

    Abyzov A, Li S, Kim DR, Mohiyuddin M, Stütz AM, Parrish NF, Mu XJ, Clark W, Chen K, Hurles M, Korbel JO, Lam HY, Lee C and Gerstein MB

    Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, 200 1st Street SW, Rochester, Minnesota 55905, USA.

    Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence microinsertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These microinsertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.

    Funded by: NCI NIH HHS: P30 CA016672, P30 CA034196, R01 CA172652, R01CA172652; NHGRI NIH HHS: U41 HG007497, U41HG007497

    Nature communications 2015;6;7256

  • High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin.

    Achim K, Pettit JB, Saraiva LR, Gavriouchkina D, Larsson T, Arendt D and Marioni JC

    [1] European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK. [2] Developmental Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany.

    Understanding cell type identity in a multicellular organism requires the integration of gene expression profiles from individual cells with their spatial location in a particular tissue. Current technologies allow whole-transcriptome sequencing of spatially identified cells but lack the throughput needed to characterize complex tissues. Here we present a high-throughput method to identify the spatial origin of cells assayed by single-cell RNA-sequencing within a tissue of interest. Our approach is based on comparing complete, specificity-weighted mRNA profiles of a cell with positional gene expression profiles derived from a gene expression atlas. We show that this method allocates cells to precise locations in the brain of the marine annelid Platynereis dumerilii with a success rate of 81%. Our method is applicable to any system that has a reference gene expression database of sufficiently high resolution.

    Funded by: Wellcome Trust

    Nature biotechnology 2015;33;5;503-9

  • The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.

    Adams DJ, Doran AG, Lilue J and Keane TM

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.

    The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website. The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.

    Funded by: Medical Research Council: MR/L007428/1

    Mammalian genome : official journal of the International Mammalian Genome Society 2015;26;9-10;403-12

  • The genome-wide effects of ionizing radiation on mutation induction in the mammalian germline.

    Adewoye AB, Lindsay SJ, Dubrova YE and Hurles ME

    Department of Genetics, University of Leicester, Leicester LE1 7RH, UK.

    The ability to predict the genetic consequences of human exposure to ionizing radiation has been a long-standing goal of human genetics in the past 50 years. Here we present the results of an unbiased, comprehensive genome-wide survey of the range of germline mutations induced in laboratory mice after parental exposure to ionizing radiation and show that irradiation markedly alters the frequency and spectrum of de novo mutations. We show that the frequency of de novo copy number variants (CNVs) and insertion/deletion events (indels) is significantly elevated in offspring of exposed fathers. We also show that the spectrum of induced de novo single-nucleotide variants (SNVs) is strikingly different, with clustered mutations being significantly over-represented in the offspring of irradiated males. Our study highlights the specific classes of radiation-induced DNA lesions that evade repair and result in germline mutation and paves the way for similarly comprehensive characterizations of other germline mutagens.

    Funded by: Wellcome Trust: WT091106, WT098051

    Nature communications 2015;6;6684

  • Right Ventricular Epicardial Fibrosis in Mice With Sternal Segment Dislocation.

    Adissu HA, Medhanie GA, Morikawa L, White JK, Newbigging S and McKerlie C

    Centre for Modeling Human Disease, Toronto Centre for Phenogenomics, Toronto, ON, Canada; Physiology & Experimental Medicine Research Program, The Hospital for Sick Children, Toronto, ON, Canada; Department of Laboratory Medicine & Pathobiology, Faculty of Medicine, University of Toronto, Toronto, ON, Canada.

    We report coincident sternal segment dislocation and focally extensive right ventricular epicardial fibrosis observed during routine histopathology evaluation of C57BL/6N mice as part of a high throughput phenotyping screen conducted between 4 and 16 weeks of age. This retrospective case series study was conducted to determine whether cardiac fibrosis was a pathological consequence of sternal segment dislocation. We identified sternal segment dislocation in 51 of the total 1103 mice (4.6%) analyzed at 16 weeks of age. Males were more frequently affected. In all cases but 2, the dislocation occurred at the fourth intersternebral joint. In 42 of the 51 cases (82.4%), the dislocation was encased by regenerative cartilaginous callus that protruded internally into the thoracic cavity (intrathoracic callus) and/or externally to the outer aspect of the sternum (extrathoracic callus). Displacement of dislocated ends of the sternum into the thoracic cavity was present in 19 of 51 cases (36.5%). Coincident minimal or mild right ventricular epicardial and subepicardial fibrosis was observed in 22 of the 51 cases (43%) but was not observed in any of the mice in the absence of sternal segment dislocation. Our data suggest that right ventricular fibrosis was likely caused by direct injury of the right ventricle by the dislocated ends of the sternum and/or by intrathoracic callus that develops post dislocation. Potential pathogenesis for the sternal and cardiac lesions and their implication for the interpretation of phenotypes in mouse models of cardiopulmonary and skeletal disease are discussed.

    Funded by: NHGRI NIH HHS: U54 HG006364; NIH HHS: U42 OD011175; Wellcome Trust: 098051

    Veterinary pathology 2015;52;5;967-76

  • Antimicrobial Resistance Profiles and Diversity in Salmonella from Humans and Cattle, 2004-2011.

    Afema JA, Mather AE and Sischo WM

    Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Washington State University, Pullman, WA, USA.

    Analysis of long-term anti-microbial resistance (AMR) data is useful to understand source and transmission dynamics of AMR. We analysed 5124 human clinical isolates from Washington State Department of Health, 391 cattle clinical isolates from the Washington Animal Disease Diagnostic Laboratory and 1864 non-clinical isolates from foodborne disease research on dairies in the Pacific Northwest. Isolates were assigned profiles based on phenotypic resistance to 11 anti-microbials belonging to eight classes. Salmonella Typhimurium (ST), Salmonella Newport (SN) and Salmonella Montevideo (SM) were the most common serovars in both humans and cattle. Multinomial logistic regression showed ST and SN from cattle had greater probability of resistance to multiple classes of anti-microbials than ST and SN from humans (P < 0.0001). While these findings could be consistent with the belief that cattle are a source of resistant ST and SN for people, occurrence of profiles unique to cattle and not observed in temporally related human isolates indicates these profiles are circulating in cattle only. We used various measures to assess AMR diversity, conditional on the weighting of rare versus abundant profiles. AMR profile richness was greater in the common serovars from humans, although both source data sets were dominated by relatively few profiles. The greater profile richness in human Salmonella may be due to greater diversity of sources entering the human population compared to cattle or due to continuous evolution in the human environment. Also, AMR diversity was greater in clinical compared to non-clinical cattle Salmonella, and this could be due to anti-microbial selection pressure in diseased cattle that received treatment. The use of bootstrapping techniques showed that although there were shared profiles between humans and cattle, the expected and observed number of profiles was different, suggesting Salmonella and associated resistance from humans and cattle may not be wholly derived from a common population.

    Funded by: Wellcome Trust: 098051

    Zoonoses and public health 2015;62;7;506-17

  • Local evolutionary patterns of human respiratory syncytial virus derived from whole-genome sequencing.

    Agoti CN, Otieno JR, Munywoki PK, Mwihuri AG, Cane PA, Nokes DJ, Kellam P and Cotten M

    KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya.

    Human respiratory syncytial virus (RSV) is associated with severe childhood respiratory infections. A clear description of local RSV molecular epidemiology, evolution, and transmission requires detailed sequence data and can inform new strategies for virus control and vaccine development. We have generated 27 complete or nearly complete genomes of RSV from hospitalized children attending a rural coastal district hospital in Kilifi, Kenya, over a 10-year period using a novel full-genome deep-sequencing process. Phylogenetic analysis of the new genomes demonstrated the existence and cocirculation of multiple genotypes in both RSV A and B groups in Kilifi. Comparison of local versus global strains demonstrated that most RSV A variants observed locally in Kilifi were also seen in other parts of the world, while the Kilifi RSV B genomes encoded a high degree of variation that was not observed in other parts of the world. The nucleotide substitution rates for the individual open reading frames (ORFs) were highest in the regions encoding the attachment (G) glycoprotein and the NS2 protein. The analysis of RSV full genomes, compared to subgenomic regions, provided more precise estimates of the RSV sequence changes and revealed important patterns of RSV genomic variation and global movement. The novel sequencing method and the new RSV genomic sequences reported here expand our knowledge base for large-scale RSV epidemiological and transmission studies.

    Importance: The new RSV genomic sequences and the novel sequencing method reported here provide important data for understanding RSV transmission and vaccine development. Given the complex interplay between RSV A and RSV B infections, the existence of local RSV B evolution is an important factor in vaccine deployment.

    Funded by: Wellcome Trust: 077092, 084633, 092654, 100542, 102975

    Journal of virology 2015;89;7;3444-54

  • Successful Generation of Human Induced Pluripotent Stem Cell Lines from Blood Samples Held at Room Temperature for up to 48 hr.

    Agu CA, Soares FA, Alderton A, Patel M, Ansari R, Patel S, Forrest S, Yang F, Lineham J, Vallier L and Kirton CM

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The collection sites of human primary tissue samples and the receiving laboratories, where the human induced pluripotent stem cells (hIPSCs) are derived, are often not on the same site. Thus, the stability of samples prior to derivation constrains the distance between the collection site and the receiving laboratory. To investigate sample stability, we collected blood and held it at room temperature for 5, 24, or 48 hr before isolating peripheral blood mononuclear cells (PBMCs) and reprogramming into IPSCs. Additionally, PBMC samples at 5- and 48-hr time points were frozen in liquid nitrogen for 4 months and reprogrammed into IPSCs. hIPSC lines derived from all time points were pluripotent, displayed no marked difference in chromosomal aberration rates, and differentiated into three germ layers. Reprogramming efficiency at 24- and 48-hr time points was 3- and 10-fold lower, respectively, than at 5 hr; the freeze-thaw process of PBMCs resulted in no obvious change in reprogramming efficiency.

    Funded by: Medical Research Council: G0800784, G1000847; Wellcome Trust: WT098051, WT098503

    Stem cell reports 2015;5;4;660-71

  • Loss of PCLO function underlies pontocerebellar hypoplasia type III.

    Ahmed MY, Chioza BA, Rajab A, Schmitz-Abe K, Al-Khayat A, Al-Turki S, Baple EL, Patton MA, Al-Memar AY, Hurles ME, Partlow JN, Hill RS, Evrony GD, Servattalab S, Markianos K, Walsh CA, Crosby AH and Mochida GH

    From Monogenic Molecular Genetics (M.Y.A., B.A.C., E.L.B., A.H.C.), University of Exeter Medical School, RILD Wellcome Wolfson Centre, Royal Devon & Exeter NHS Foundation Trust, Exeter; Centre for Human Genetics (M.Y.A., B.A.C., E.L.B., M.A.P., A.H.C.), St. George's, University of London, UK; National Genetic Center (A.R.), Ministry of Health, Muscat, Sultanate of Oman; Division of Genetics and Genomics, Department of Medicine (K.S.-A., J.N.P., R.S.H., G.D.E., S.S., K.M., C.A.W., G.H.M.), Manton Center for Orphan Disease Research (K.S.-A., J.N.P., R.S.H., G.D.E., S.S., K.M., C.A.W., G.H.M.), and Howard Hughes Medical Institute (J.N.P., R.S.H., G.D.E., S.S., C.A.W.), Boston Children's Hospital; Departments of Pediatrics (K.S.-A., K.M., C.A.W., G.H.M.) and Neurology (C.A.W.), and Program in Biological and Biomedical Sciences (G.D.E.), Harvard Medical School, Boston; Program in Medical and Population Genetics (K.S.-A., K.M., C.A.W.), Broad Institute of MIT and Harvard University, Cambridge, MA; Department of Biology (A.A.-K.), College of Science, Sultan Qaboos University, Sultanate of Oman; Wellcome Trust Sanger Institute (S.A.-T., M.E.H.), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Department of Neurology (A.Y.A.-M.), Atkinson Morley Wing, St. George's Hospital, London, UK; and Pediatric Neurology Unit (G.H.M.), Department of Neurology, Massachusetts General Hospital, Boston, MA.

    Objective: To identify the genetic cause of pontocerebellar hypoplasia type III (PCH3).

    Methods: We studied the original reported pedigree of PCH3 and performed genetic analysis including genome-wide single nucleotide polymorphism genotyping, linkage analysis, whole-exome sequencing, and Sanger sequencing. Human fetal brain RNA sequencing data were then analyzed for the identified candidate gene.

    Results: The affected individuals presented with severe global developmental delay and seizures starting in the first year of life. Brain MRI of an affected individual showed diffuse atrophy of the cerebrum, cerebellum, and brainstem. Genome-wide single nucleotide polymorphism analysis confirmed the linkage to chromosome 7q we previously reported, and showed no other genomic areas of linkage. Whole-exome sequencing of 2 affected individuals identified a shared homozygous, nonsense variant in the PCLO (piccolo) gene. This variant segregated with the disease phenotype in the pedigree, was rare in the population, and was predicted to eliminate the PDZ and C2 domains in the C-terminus of the protein. RNA sequencing data of human fetal brain showed that PCLO was moderately expressed in the developing cerebral cortex.

    Conclusions: Here, we show that a homozygous, nonsense PCLO mutation underlies the autosomal recessive neurodegenerative disorder, PCH3. PCLO is a component of the presynaptic cytoskeletal matrix, and is thought to be involved in regulation of presynaptic proteins and synaptic vesicles. Our findings suggest that PCLO is crucial for the development and survival of a wide range of neuronal types in the human brain.

    Funded by: FIC NIH HHS: R21TW008223; Howard Hughes Medical Institute; Medical Research Council: G0700089, G1002279; NIGMS NIH HHS: T32 GM007753; NINDS NIH HHS: R01 NS035129, R01NS035129; Wellcome Trust: 099175/Z/12/Z

    Neurology 2015;84;17;1745-50

  • Principles of assembly reveal a periodic table of protein complexes.

    Ahnert SE, Marsh JA, Hernández H, Robinson CV and Teichmann SA

    Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, UK.

    Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering.

    Funded by: Medical Research Council: G1000819

    Science (New York, N.Y.) 2015;350;6266;aaa2245

  • Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families.

    Akawi N, McRae J, Ansari M, Balasubramanian M, Blyth M, Brady AF, Clayton S, Cole T, Deshpande C, Fitzgerald TW, Foulds N, Francis R, Gabriel G, Gerety SS, Goodship J, Hobson E, Jones WD, Joss S, King D, Klena N, Kumar A, Lees M, Lelliott C, Lord J, McMullan D, O'Regan M, Osio D, Piombo V, Prigmore E, Rajan D, Rosser E, Sifrim A, Smith A, Swaminathan GJ, Turnpenny P, Whitworth J, Wright CF, Firth HV, Barrett JC, Lo CW, FitzPatrick DR, Hurles ME and DDD study

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Discovery of most autosomal recessive disease-associated genes has involved analysis of large, often consanguineous multiplex families or small cohorts of unrelated individuals with a well-defined clinical condition. Discovery of new dominant causes of rare, genetically heterogeneous developmental disorders has been revolutionized by exome analysis of large cohorts of phenotypically diverse parent-offspring trios. Here we analyzed 4,125 families with diverse, rare and genetically heterogeneous developmental disorders and identified four new autosomal recessive disorders. These four disorders were identified by integrating Mendelian filtering (selecting probands with rare, biallelic and putatively damaging variants in the same gene) with statistical assessments of (i) the likelihood of sampling the observed genotypes from the general population and (ii) the phenotypic similarity of patients with recessive variants in the same candidate gene. This new paradigm promises to catalyze the discovery of novel recessive disorders, especially those with less consistent or nonspecific clinical presentations and those caused predominantly by compound heterozygous genotypes.

    Funded by: Medical Research Council: MC_PC_U127561093, MC_U127561093; NHLBI NIH HHS: U01 HL098180, U01-HL098180; Wellcome Trust: 091986, WT098051

    Nature genetics 2015;47;11;1363-9
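
    The Mendelian filtering step named in the abstract above (selecting probands with rare, biallelic and putatively damaging variants in the same gene) can be illustrated with a minimal sketch. The record layout, the function name, the gene and proband labels, and the 1% frequency cut-off are all hypothetical choices made for illustration; this is not the study's pipeline, and the statistical genotype and phenotype assessments are not shown.

```python
from collections import defaultdict


def biallelic_candidates(variants, max_allele_freq=0.01):
    """Map each gene to the probands carrying at least two rare, damaging alleles in it.

    Each variant is a tuple:
    (proband, gene, population_allele_freq, is_damaging, allele_copies)
    where allele_copies is 2 for a homozygous call and 1 for a heterozygous call.
    Phase is not checked here, so two heterozygous calls are only a putative
    compound heterozygote.
    """
    allele_count = defaultdict(int)
    for proband, gene, allele_freq, is_damaging, copies in variants:
        if is_damaging and allele_freq < max_allele_freq:
            allele_count[(proband, gene)] += copies

    hits = defaultdict(set)
    for (proband, gene), count in allele_count.items():
        if count >= 2:
            hits[gene].add(proband)
    return dict(hits)


# Made-up example records, not data from the study.
example_variants = [
    ("P1", "GENE_A", 0.001, True, 2),   # rare homozygous damaging variant
    ("P2", "GENE_A", 0.0005, True, 1),  # first of two heterozygous variants
    ("P2", "GENE_A", 0.002, True, 1),   # second heterozygous variant, same gene
    ("P3", "GENE_A", 0.2, True, 2),     # too common to pass the filter
    ("P4", "GENE_B", 0.001, False, 2),  # rare but not predicted damaging
]
candidates = biallelic_candidates(example_variants)
print(candidates)
```

    Genes that collect several probands in this way would then be the ones passed on to the statistical assessments described in the abstract.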

  • Transcriptional profiling of macrophages derived from monocytes and iPS cells identifies a conserved response to LPS and novel alternative transcription.

    Alasoo K, Martinez FO, Hale C, Gordon S, Powrie F, Dougan G, Mukhopadhyay S and Gaffney DJ

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Macrophages differentiated from human induced pluripotent stem cells (IPSDMs) are a potentially valuable new tool for linking genotype to phenotype in functional studies. However, at a genome-wide level these cells have remained largely uncharacterised. Here, we compared the transcriptomes of naïve and lipopolysaccharide (LPS) stimulated monocyte-derived macrophages (MDMs) and IPSDMs using RNA-Seq. The IPSDM and MDM transcriptomes were broadly similar and exhibited a highly conserved response to LPS. However, there were also significant differences in the expression of genes associated with antigen presentation and tissue remodelling. Furthermore, genes coding for multiple chemokines involved in neutrophil recruitment were more highly expressed in IPSDMs upon LPS stimulation. Additionally, analysing individual transcript expression identified hundreds of genes undergoing alternative promoter and 3' untranslated region usage following LPS treatment representing a previously under-appreciated level of regulation in the LPS response.

    Funded by: Wellcome Trust: 095688, 098051

    Scientific reports 2015;5;12524

  • Clock-like mutational processes in human somatic cells.

    Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    During the course of a lifetime, somatic cells acquire mutations. Different mutational processes may contribute to the mutations accumulated in a cell, with each imprinting a mutational signature on the cell's genome. Some processes generate mutations throughout life at a constant rate in all individuals, and the number of mutations in a cell attributable to these processes will be proportional to the chronological age of the person. Using mutations from 10,250 cancer genomes across 36 cancer types, we investigated clock-like mutational processes that have been operating in normal human cells. Two mutational signatures show clock-like properties. Both exhibit different mutation rates in different tissues. However, their mutation rates are not correlated, indicating that the underlying processes are subject to different biological influences. For one signature, the rate of cell division may influence its mutation rate. This study provides the first survey of clock-like mutational processes operating in human somatic cells.

    Funded by: Cancer Research UK: C609/A17257; Medical Research Council: MC_U105178808, MC_UU_12022/3; Wellcome Trust: 088340, 098051, WT088340MA, WT100183MA

    Nature genetics 2015;47;12;1402-7
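
    The abstract above states that, for a clock-like process, the number of mutations in a cell is proportional to the person's age. A minimal sketch of estimating such a rate is a least-squares fit of a line through the origin. The function name and the numbers are made up for illustration and are not taken from the study, which may have used a different statistical approach.

```python
def rate_through_origin(ages, mutation_counts):
    """Least-squares slope for the model count = rate * age (no intercept)."""
    numerator = sum(age * count for age, count in zip(ages, mutation_counts))
    denominator = sum(age * age for age in ages)
    return numerator / denominator


# Made-up illustrative values, not data from the study.
ages = [20, 40, 60, 80]
signature_mutations = [410, 790, 1220, 1590]
rate = rate_through_origin(ages, signature_mutations)
print(f"estimated rate: {rate:.1f} mutations per year")
```

    Fitting the same model separately per tissue would give the tissue-specific rates that the abstract compares.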

  • A mutational signature in gastric cancer suggests therapeutic strategies.

    Alexandrov LB, Nik-Zainal S, Siu HC, Leung SY and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.

    Targeting defects in the DNA repair machinery of neoplastic cells, for example, those due to inactivating BRCA1 and/or BRCA2 mutations, has been used for developing new therapies in certain types of breast, ovarian and pancreatic cancers. Recently, a mutational signature was associated with failure of double-strand DNA break repair by homologous recombination based on its high mutational burden in samples harbouring BRCA1 or BRCA2 mutations. In pancreatic cancer, all responders to platinum therapy exhibit this mutational signature including a sample that lacked any defects in BRCA1 or BRCA2. Here, we examine 10,250 cancer genomes across 36 types of cancer and demonstrate that, in addition to breast, ovarian and pancreatic cancers, gastric cancer is another cancer type that exhibits this mutational signature. Our results suggest that 7-12% of gastric cancers have defective double-strand DNA break repair by homologous recombination and may benefit from either platinum therapy or PARP inhibitors.

    Funded by: Wellcome Trust: 098051, WT100183MA

    Nature communications 2015;6;8683

  • A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing.

    Alioto TS, Buchhalter I, Derdak S, Hutter B, Eldridge MD, Hovig E, Heisler LE, Beck TA, Simpson JT, Tonon L, Sertier AS, Patch AM, Jäger N, Ginsbach P, Drews R, Paramasivam N, Kabbe R, Chotewutmontri S, Diessl N, Previti C, Schmidt S, Brors B, Feuerbach L, Heinold M, Gröbner S, Korshunov A, Tarpey PS, Butler AP, Hinton J, Jones D, Menzies A, Raine K, Shepherd R, Stebbings L, Teague JW, Ribeca P, Giner FC, Beltran S, Raineri E, Dabad M, Heath SC, Gut M, Denroche RE, Harding NJ, Yamaguchi TN, Fujimoto A, Nakagawa H, Quesada V, Valdés-Mas R, Nakken S, Vodák D, Bower L, Lynch AG, Anderson CL, Waddell N, Pearson JV, Grimmond SM, Peto M, Spellman P, He M, Kandoth C, Lee S, Zhang J, Létourneau L, Ma S, Seth S, Torrents D, Xi L, Wheeler DA, López-Otín C, Campo E, Campbell PJ, Boutros PC, Puente XS, Gerhard DS, Pfister SM, McPherson JD, Hudson TJ, Schlesner M, Lichter P, Eils R, Jones DT and Gut IG

    CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain.

    As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.

    Funded by: Canadian Institutes of Health Research; Cancer Research UK

    Nature communications 2015;6;10001

  • Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants.

    Allum F, Shao X, Guénard F, Simon MM, Busche S, Caron M, Lambourne J, Lessard J, Tandre K, Hedman ÅK, Kwan T, Ge B, Multiple Tissue Human Expression Resource Consortium, Rönnblom L, McCarthy MI, Deloukas P, Richmond T, Burgess D, Spector TD, Tchernof A, Marceau S, Lathrop M, Vohl MC, Pastinen T and Grundberg E

    [1] Department of Human Genetics, McGill University, 740 Docteur-Penfield Avenue, Montreal, Québec, Canada H3A 0G1 [2] McGill University and Genome Quebec Innovation Centre, 740 Docteur-Penfield Avenue, Montreal, Québec, Canada H3A 0G1.

    Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS.

    Funded by: Canadian Institutes of Health Research: EP1-120608, TEC-128093; Medical Research Council: G0600717, MC_UU_12012/1; Wellcome Trust: 081917, 081917/Z/07/Z, 090532, 095515, 100574

    Nature communications 2015;6;7211

  • Pathway-based factor analysis of gene expression data produces highly heritable phenotypes that associate with age.

    Anand Brown A, Ding Z, Viñuela A, Glass D, Parts L, Spector T, Winn J and Durbin R

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, United Kingdom; NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway.

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold ([Formula: see text]). These phenotypes are more heritable ([Formula: see text]) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.

    G3 (Bethesda, Md.) 2015;5;5;839-47

  • Points to consider in the development of seed stocks of pluripotent stem cells for clinical applications: International Stem Cell Banking Initiative (ISCBI).

    Andrews PW, Baker D, Benvinisty N, Miranda B, Bruce K, Brüstle O, Choi M, Choi YM, Crook JM, de Sousa PA, Dvorak P, Freund C, Firpo M, Furue MK, Gokhale P, Ha HY, Han E, Haupt S, Healy L, Hei DJ, Hovatta O, Hunt C, Hwang SM, Inamdar MS, Isasi RM, Jaconi M, Jekerle V, Kamthorn P, Kibbey MC, Knezevic I, Knowles BB, Koo SK, Laabi Y, Leopoldo L, Liu P, Lomax GP, Loring JF, Ludwig TE, Montgomery K, Mummery C, Nagy A, Nakamura Y, Nakatsuji N, Oh S, Oh SK, Otonkoski T, Pera M, Peschanski M, Pranke P, Rajala KM, Rao M, Ruttachuk R, Reubinoff B, Ricco L, Rooke H, Sipp D, Stacey GN, Suemori H, Takahashi TA, Takada K, Talib S, Tannenbaum S, Yuan BZ, Zeng F and Zhou Q

    Department of Biomedical Science, The University of Sheffield, Sheffield, UK.

    Funded by: Medical Research Council: G0300484, MR/L01324X/1

    Regenerative medicine 2015;10;2 Suppl;1-44

  • Selective impairment of methylation maintenance is the major cause of DNA methylation reprogramming in the early embryo.

    Arand J, Wossidlo M, Lepikhov K, Peat JR, Reik W and Walter J

    University of Saarland, FR 8.3, Biological Sciences, Genetics/Epigenetics, Campus A2.4, 66123 Saarbrücken, Germany; Departments of Pediatrics and Genetics, Stanford University School of Medicine, 265 Campus Drive, Stanford, CA 94305 USA.

    Background: DNA methylomes are extensively reprogrammed during mouse pre-implantation and early germ cell development. The main feature of this reprogramming is a genome-wide decrease in 5-methylcytosine (5mC). Standard high-resolution single-stranded bisulfite sequencing techniques do not allow discrimination of the underlying passive (replication-dependent) or active enzymatic mechanisms of 5mC loss. We approached this problem by generating high-resolution deep hairpin bisulfite sequencing (DHBS) maps, allowing us to follow the patterns of symmetric DNA methylation at CpG dyads on both DNA strands over single replications.

    Results: We compared DHBS maps of repetitive elements in the developing zygote, the early embryo, and primordial germ cells (PGCs) at defined stages of development. In the zygote, we observed distinct effects in paternal and maternal chromosomes. A significant loss of paternal DNA methylation was linked to replication and to an increase in continuous and dispersed hemimethylated CpG dyad patterns. Overall methylation levels at maternal copies remained largely unchanged, but showed an increased level of dispersed hemi-methylated CpG dyads. After the first cell cycle, the combined DHBS patterns of paternal and maternal chromosomes remained unchanged over the next three cell divisions. By contrast, in PGCs the DNA demethylation process was continuous, as seen by a consistent decrease in fully methylated CpG dyads over consecutive cell divisions.

    Conclusions: The main driver of DNA demethylation in germ cells and in the zygote is partial impairment of maintenance of symmetric DNA methylation at CpG dyads. In the embryo, this passive demethylation is restricted to the first cell division, whereas it continues over several cell divisions in germ cells. The dispersed patterns of CpG dyads in the early-cleavage embryo suggest a continuous partial (and to a low extent active) loss of methylation apparently compensated for by selective de novo methylation. We conclude that a combination of passive and active demethylation events counteracted by de novo methylation are involved in the distinct reprogramming dynamics of DNA methylomes in the zygote, the early embryo, and PGCs.

    Funded by: Wellcome Trust: 095645

    Epigenetics & chromatin 2015;8;1;1

  • Genes Regulated by Vitamin D in Bone Cells Are Positively Selected in East Asians.

    Arciero E, Biagini SA, Chen Y, Xue Y, Luiselli D, Tyler-Smith C, Pagani L and Ayub Q

    The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom.

    Vitamin D and folate are activated and degraded by sunlight, respectively, and the physiological processes they control are likely to have been targets of selection as humans expanded from Africa into Eurasia. We investigated signals of positive selection in gene sets involved in the metabolism, regulation and action of these two vitamins in worldwide populations sequenced by Phase I of the 1000 Genomes Project. Comparing allele frequency-spectrum-based summary statistics between these gene sets and matched control genes, we observed a selection signal specific to East Asians for a gene set associated with vitamin D action in bones. The selection signal was mainly driven by three genes: CXXC finger protein 1 (CXXC1), low density lipoprotein receptor-related protein 5 (LRP5) and runt-related transcription factor 2 (RUNX2). Examination of population differentiation and haplotypes allowed us to identify several candidate causal regulatory variants in each gene. Four of these candidate variants (one each in CXXC1 and RUNX2 and two in LRP5) had a >70% derived allele frequency in East Asians, but were present at lower (20-60%) frequency in Europeans as well, suggesting that the adaptation might have been part of a common response to climatic and dietary changes as humans expanded out of Africa, with implications for their role in vitamin D-dependent bone mineralization and osteoporosis insurgence. We also observed haplotype sharing between East Asians, Finns and an extinct archaic human (Denisovan) sample at the CXXC1 locus, which is best explained by incomplete lineage sorting.

    Funded by: Wellcome Trust: 098051

    PloS one 2015;10;12;e0146072

  • Artificial membrane-binding proteins stimulate oxygenation of stem cells during engineering of large cartilage tissue.

    Armstrong JPK, Shakur R, Horne JP, Dickinson SC, Armstrong CT, Lau K, Kadiwala J, Lowe R, Seddon A, Mann S, Anderson JLR, Perriman AW and Hollander AP

    Bristol Centre for Functional Nanomaterials, University of Bristol, Bristol BS8 1FD, UK.

    Restricted oxygen diffusion can result in central cell necrosis in engineered tissue, a problem that is exacerbated when engineering large tissue constructs for clinical application. Here we show that pre-treating human mesenchymal stem cells (hMSCs) with synthetic membrane-active myoglobin-polymer-surfactant complexes can provide a reservoir of oxygen capable of alleviating necrosis at the centre of hyaline cartilage. This is achieved through the development of a new cell functionalization methodology based on polymer-surfactant conjugation, which allows the delivery of functional proteins to the hMSC membrane. This new approach circumvents the need for cell surface engineering using protein chimerization or genetic transfection, and we demonstrate that the surface-modified hMSCs retain their ability to proliferate and to undergo multilineage differentiation. The functionalization technology is facile, versatile and non-disruptive, and in addition to tissue oxygenation, it should have far-reaching application in a host of tissue engineering and cell-based therapies.

    Funded by: Medical Research Council: MR/K015648, MR/K015648/1, MR/K015648/2

    Nature communications 2015;6;7405

  • Prevalence of dyslipidaemia and associated risk factors in a rural population in South-Western Uganda: a community based survey.

    Asiki G, Murphy GA, Baisley K, Nsubuga RN, Karabarinde A, Newton R, Seeley J, Young EH, Kamali A and Sandhu MS

    Medical Research Council/Uganda Virus Research Institute (MRC/UVRI), Uganda Research Unit on AIDS, Entebbe, Uganda.

    Background: The burden of dyslipidaemia is rising in many low income countries. However, there are few data on the prevalence of, or risk factors for, dyslipidaemia in Africa.

    Methods: In 2011, we used the WHO Stepwise approach to collect cardiovascular risk data within a general population cohort in rural south-western Uganda. Dyslipidaemia was defined by high total cholesterol (TC) ≥ 5.2 mmol/L or low high density lipoprotein cholesterol (HDL-C) <1 mmol/L in men, and <1.3 mmol/L in women. Logistic regression was used to explore correlates of dyslipidaemia.
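
    The case definition in the Methods can be written out as a small classifier. This is a sketch of the stated thresholds only; the function name, argument names and the "male"/"female" labels are assumptions made for illustration and do not come from the study.

```python
def classify_lipids(tc_mmol_l, hdl_mmol_l, sex):
    """Apply the abstract's thresholds to one participant.

    High TC: total cholesterol >= 5.2 mmol/L.
    Low HDL-C: < 1.0 mmol/L in men, < 1.3 mmol/L in women.
    Dyslipidaemia: either condition is met.
    """
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    high_tc = tc_mmol_l >= 5.2
    hdl_cutoff = 1.0 if sex == "male" else 1.3
    low_hdl = hdl_mmol_l < hdl_cutoff
    return {
        "high_tc": high_tc,
        "low_hdl": low_hdl,
        "dyslipidaemia": high_tc or low_hdl,
    }


# The same HDL-C value is classified differently by sex.
man = classify_lipids(4.0, 1.1, "male")
woman = classify_lipids(4.0, 1.1, "female")
```

    Because the HDL-C cut-off is sex-specific, an HDL-C of 1.1 mmol/L counts as low in a woman but not in a man, which matters when comparing prevalence between the sexes.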

    Results: Low HDL-C prevalence was 71.3% and high TC was 6.0%. In multivariate analysis, factors independently associated with low HDL-C among both men and women were: decreasing age, tribe (prevalence highest among Rwandese tribe), lower education, alcohol consumption (comparing current drinkers to never drinkers: men adjusted (a)OR=0.44, 95%CI=0.35-0.55; women aOR=0.51, 95%CI=0.41-0.64), consuming <5 servings of fruit/vegetable per day, daily vigorous physical activity (comparing those with none vs those with 5 days a week: men aOR=0.83 95%CI=0.67-1.02; women aOR=0.76, 95%CI=0.55-0.99), blood pressure (comparing those with hypertension to those with normal blood pressure: men aOR=0.57, 95%CI=0.43-0.75; women aOR=0.69, 95%CI=0.52-0.93) and HIV infection (HIV infected without ART vs. HIV negative: men aOR=2.45, 95%CI=1.53-3.94; women aOR=1.88, 95%CI=1.19-2.97). The odds of low HDL-C were also higher among men with high BMI or HbA1c ≤ 6%, and women who were single or with abdominal obesity. Among both men and women, high TC was independently associated with increasing age, non-Rwandese tribe, high waist circumference (men aOR=5.70, 95%CI=1.97-16.49; women aOR=1.58, 95%CI=1.10-2.28), hypertension (men aOR=3.49, 95%CI=1.74-7.00; women aOR=1.47, 95%CI=0.96-2.23) and HbA1c >6% (men aOR=3.00, 95%CI=1.37-6.59; women aOR=2.74, 95%CI=1.77-4.27). The odds of high TC were also higher among married men, and women with higher education or high BMI.

    Conclusion: Low HDL-C prevalence in this relatively young rural population is high whereas high TC prevalence is low. The consequences of dyslipidaemia in African populations remain unclear and prospective follow-up is required.

    Funded by: British Heart Foundation: PG/08/094/26019; Medical Research Council: G0801566, G0901213-92157, MC_U950080926

    PloS one 2015;10;5;e0126166

  • A Bayesian Approach to the Overlap Analysis of Epidemiologically Linked Traits.

    Asimit JL, Panoutsopoulou K, Wheeler E, Berndt SI, GIANT consortium, the arcOGEN consortium, Cordell HJ, Morris AP, Zeggini E and Barroso I

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Diseases co-occur in individuals more often than expected by chance, which may be explained by shared underlying genetic etiology. A common approach to genetic overlap analyses is to use summary genome-wide association study data to identify single-nucleotide polymorphisms (SNPs) that are associated with multiple traits at a selected P-value threshold. However, P-values do not account for differences in power, whereas Bayes' factors (BFs) do, and may be approximated using summary statistics. We use simulation studies to compare the power of frequentist and Bayesian approaches with overlap analyses, and to decide on appropriate thresholds for comparison between the two methods. It is empirically illustrated that BFs have the advantage over P-values of a decreasing type I error rate as study size increases for single-disease associations. Consequently, the overlap analysis of traits from different-sized studies encounters issues in fair P-value threshold selection, whereas BFs are adjusted automatically. Extensive simulations show that Bayesian overlap analyses tend to have higher power than those that assess association strength with P-values, particularly in low-power scenarios. Calibration tables between BFs and P-values are provided for a range of sample sizes, as well as an approximation approach for sample sizes that are not in the calibration table. Although P-values are sometimes thought more intuitive, these tables assist in removing the opaqueness of Bayesian thresholds and may also be used in the selection of a BF threshold to meet a certain type I error rate. An application of our methods is used to identify variants associated with both obesity and osteoarthritis.
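
    One widely used way to approximate a Bayes factor from summary statistics is Wakefield's approximate Bayes factor, which needs only an effect estimate and its standard error. The abstract does not say which approximation the authors used, so the sketch below is an illustration of the general idea rather than their method; the function names and the prior standard deviation of 0.2 are assumptions.

```python
import math


def approx_bayes_factor(beta, se, prior_sd=0.2):
    """Approximate Bayes factor in favour of association (H1 versus H0).

    beta: estimated effect (e.g. log odds ratio); se: its standard error;
    prior_sd: assumed prior SD of the true effect under H1 (hypothetical).
    """
    v = se ** 2          # sampling variance of the estimate
    w = prior_sd ** 2    # prior variance of the effect under H1
    z2 = (beta / se) ** 2
    shrink = w / (v + w)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z2 * shrink)


def p_value(beta, se):
    """Two-sided normal P-value for the same summary statistics."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Two hypothetical SNPs with the same z-score from studies of different size:
# the smaller study has the larger standard error.
bf_small_study = approx_bayes_factor(0.20, 0.05)
bf_large_study = approx_bayes_factor(0.04, 0.01)
```

    Both SNPs have the same P-value, yet the Bayes factor is lower for the larger study, where the same z-score corresponds to a much smaller effect. This is the sense in which a fixed BF threshold adjusts automatically to study size while a fixed P-value threshold does not.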

    Funded by: Arthritis Research UK; Medical Research Council: MR/K021486/1; Versus Arthritis: 20308; Wellcome Trust: 087436, WT098051

    Genetic epidemiology 2015;39;8;624-34

  • The Kalash genetic isolate: ancient divergence, drift, and selection.

    Ayub Q, Mezzavilla M, Pagani L, Haber M, Mohyuddin A, Khaliq S, Mehdi SQ and Tyler-Smith C

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The Kalash represent an enigmatic isolated population of Indo-European speakers who have been living for centuries in the Hindu Kush mountain ranges of present-day Pakistan. Previous Y chromosome and mitochondrial DNA markers provided no support for their claimed Greek descent following Alexander III of Macedon's invasion of this region, and analysis of autosomal loci provided evidence of a strong genetic bottleneck. To understand their origins and demography further, we genotyped 23 unrelated Kalash samples on the Illumina HumanOmni2.5M-8 BeadChip and sequenced one male individual at high coverage on an Illumina HiSeq 2000. Comparison with published data from ancient hunter-gatherers and European farmers showed that the Kalash share genetic drift with the Paleolithic Siberian hunter-gatherers and might represent an extremely drifted ancient northern Eurasian population that also contributed to European and Near Eastern ancestry. Since the split from other South Asian populations, the Kalash have maintained a low long-term effective population size (2,319-2,603) and experienced no detectable gene flow from their geographic neighbors in Pakistan or from other extant Eurasian populations. The mean time of divergence between the Kalash and other populations currently residing in this region was estimated to be 11,800 (95% confidence interval = 10,600-12,600) years ago, and thus they represent present-day descendants of some of the earliest migrants into the Indian sub-continent from West Asia.

    Funded by: Wellcome Trust: 098051

    American journal of human genetics 2015;96;5;775-83

  • Old Drugs To Treat Resistant Bugs: Methicillin-Resistant Staphylococcus aureus Isolates with mecC Are Susceptible to a Combination of Penicillin and Clavulanic Acid.

    Ba X, Harrison EM, Lovering AL, Gleadall N, Zadoks R, Parkhill J, Peacock SJ, Holden MT, Paterson GK and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    β-Lactam resistance in methicillin-resistant Staphylococcus aureus (MRSA) is mediated by the expression of an alternative penicillin-binding protein 2a (PBP2a) (encoded by mecA) with a low affinity for β-lactam antibiotics. Recently, a novel variant of mecA, known as mecC, was identified in MRSA isolates from both humans and animals. In this study, we demonstrate that mecC-encoded PBP2c does not mediate resistance to penicillin. Rather, broad-spectrum β-lactam resistance in MRSA strains carrying mecC (mecC-MRSA strains) is mediated by a combination of both PBP2c and the distinct β-lactamase encoded by the blaZ gene of strain LGA251 (blaZLGA251), which is part of mecC-encoding staphylococcal cassette chromosome mec (SCCmec) type XI. We further demonstrate that mecC-MRSA strains are susceptible to the combination of penicillin and the β-lactamase inhibitor clavulanic acid in vitro and that the same combination is effective in vivo for the treatment of experimental mecC-MRSA infection in wax moth larvae. Thus, we demonstrate how the distinct biological differences between mecA- and mecC-encoded PBP2a and PBP2c have the potential to be exploited as a novel approach for the treatment of mecC-MRSA infections.

    Funded by: Medical Research Council: G1001787/1

    Antimicrobial agents and chemotherapy 2015;59;12;7396-404

  • 5-Formylcytosine can be a stable DNA modification in mammals.

    Bachman M, Uribe-Lewis S, Yang X, Burgess HE, Iurlaro M, Reik W, Murrell A and Balasubramanian S

    [1] Department of Chemistry, University of Cambridge, Cambridge, UK. [2] Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.

    5-Formylcytosine (5fC) is a rare base found in mammalian DNA and thought to be involved in active DNA demethylation. Here, we show that developmental dynamics of 5fC levels in mouse DNA differ from those of 5-hydroxymethylcytosine (5hmC), and using stable isotope labeling in vivo, we show that 5fC can be a stable DNA modification. These results suggest that 5fC has functional roles in DNA that go beyond being a demethylation intermediate.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K010867/1; Cancer Research UK: C14303/A17197; Wellcome Trust: 095645, 099232, WT095645/Z/11/Z, WT099232

    Nature chemical biology 2015;11;8;555-7

  • Whole genome investigation of a divergent clade of the pathogen Streptococcus suis.

    Baig A, Weinert LA, Peters SE, Howell KJ, Chaudhuri RR, Wang J, Holden MT, Parkhill J, Langford PR, Rycroft AN, Wren BW, Tucker AW and Maskell DJ

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here, we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but their genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN, and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates with normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed a S. suis-like capsule region with variation in capsular gene sequences but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of S. suis species but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates is warranted to understand the diversity of the species.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G018553/1, BB/G019177/1, BB/G019274/1

    Frontiers in microbiology 2015;6;1191

  • Identification of a human synaptotagmin-1 mutation that perturbs synaptic vesicle cycling.

    Baker K, Gordon SL, Grozeva D, van Kogelenberg M, Roberts NY, Pike M, Blair E, Hurles ME, Chong WK, Baldeweg T, Kurian MA, Boyd SG, Cousin MA and Raymond FL

    Synaptotagmin-1 (SYT1) is a calcium-binding synaptic vesicle protein that is required for both exocytosis and endocytosis. Here, we describe a human condition associated with a rare variant in SYT1. The individual harboring this variant presented with an early onset dyskinetic movement disorder, severe motor delay, and profound cognitive impairment. Structural MRI was normal, but EEG showed extensive neurophysiological disturbances that included the unusual features of low-frequency oscillatory bursts and enhanced paired-pulse depression of visual evoked potentials. Trio analysis of whole-exome sequence identified a de novo SYT1 missense variant (I368T). Expression of rat SYT1 containing the equivalent human variant in WT mouse primary hippocampal cultures revealed that the mutant form of SYT1 correctly localizes to nerve terminals and is expressed at levels that are approximately equal to levels of endogenous WT protein. The presence of the mutant SYT1 slowed synaptic vesicle fusion kinetics, a finding that agrees with the previously demonstrated role for I368 in calcium-dependent membrane penetration. Expression of the I368T variant also altered the kinetics of synaptic vesicle endocytosis. Together, the clinical features, electrophysiological phenotype, and in vitro neuronal phenotype associated with this dominant negative SYT1 mutation highlight presynaptic mechanisms that mediate human motor control and cognitive development.

    Funded by: Medical Research Council: G1002117; Wellcome Trust: 100140, WT091310, WT098051

    The Journal of clinical investigation 2015;125;4;1670-8

  • Demystifying Escherichia coli pathovars.

    Baker KS

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2015;13;1;5

  • The Murray collection of pre-antibiotic era Enterobacteriacae: a unique research resource.

    Baker KS, Burnett E, McGregor H, Deheer-Graham A, Boinett C, Langridge GC, Wailan AM, Cain AK, Thomson NR, Russell JE and Parkhill J

    Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Studies of historical isolates inform on the evolution and emergence of important pathogens and phenotypes, including antimicrobial resistance. Crucial to studying antimicrobial resistance are isolates that predate the widespread clinical use of antimicrobials. The Murray collection of several hundred bacterial strains of pre-antibiotic era Enterobacteriaceae is an invaluable resource of historical strains from important pathogen groups. Studies performed on the Collection to date merely exemplify its potential, which will only be realised through the continued effort of many scientific groups. To enable that aim, we announce the public availability of the Murray collection through the National Collection of Type Cultures, and present associated metadata with whole genome sequence data for over half of the strains. Using this information we verify the metadata for the collection with regard to subgroup designations, equivalence groupings and plasmid content. We also present genomic analyses of population structure and determinants of mobilisable antimicrobial resistance to aid strain selection in future studies. This represents an invaluable public resource for the study of these important pathogen groups and the emergence and evolution of antimicrobial resistance.

    Funded by: Medical Research Council: G1100100, G1100100/1; Wellcome Trust: 106690/Z/14/Z

    Genome medicine 2015;7;97

  • Intercontinental dissemination of azithromycin-resistant shigellosis through sexual transmission: a cross-sectional study.

    Baker KS, Dallman TJ, Ashton PM, Day M, Hughes G, Crook PD, Gilbart VL, Zittermann S, Allen VG, Howden BP, Tomita T, Valcanis M, Harris SR, Connor TR, Sintchenko V, Howard P, Brown JD, Petty NK, Gouali M, Thanh DP, Keddy KH, Smith AM, Talukder KA, Faruque SM, Parkhill J, Baker S, Weill FX, Jenkins C and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Background: Shigellosis is an acute, severe bacterial colitis that, in high-income countries, is typically associated with travel to high-risk regions (Africa, Asia, and Latin America). Since the 1970s, shigellosis has also been reported as a sexually transmitted infection in men who have sex with men (MSM), in whom transmission is an important component of shigellosis epidemiology in high-income nations. We aimed to use sophisticated subtyping and international sampling to determine factors driving shigellosis emergence in MSM linked to an outbreak in the UK.

    Methods: We did a large-scale, cross-sectional genomic epidemiological study of shigellosis cases collected from 29 countries between December, 1995, and June 8, 2014. Focusing on an ongoing epidemic in the UK, we collected and whole-genome sequenced clinical isolates of Shigella flexneri serotype 3a from high-risk and low-risk regions, including cases associated with travel and sex between men. We examined relationships between geographical, demographic, and clinical patient data with the isolate antimicrobial susceptibility, genetic data, and inferred evolutionary relationships.

    Findings: We obtained 331 clinical isolates of S flexneri serotype 3a, including 275 from low-risk regions (44 from individuals who travelled to high-risk regions), 52 from high-risk regions, and four outgroup samples (ie, closely related, but genetically distinct isolates used to determine the root of the phylogenetic tree). We identified a recently emerged lineage of S flexneri 3a that has spread intercontinentally in less than 20 years throughout regions traditionally at low risk for shigellosis via sexual transmission in MSM. The lineage had acquired multiple antimicrobial resistance determinants, and prevailing sublineages were strongly associated with resistance to the macrolide azithromycin. Eight (4%) of 206 isolates from the MSM-associated lineage were obtained from patients who had previously provided an isolate; these serial isolations indicated atypical infection patterns (eg, reinfection).

    Interpretation: We identified transmission-facilitating behaviours and atypical course(s) of infection as precipitating factors in shigellosis-affected MSM. The intercontinental spread of antimicrobial-resistant shigella through established transmission routes emphasises the need for new approaches to tackle the public health challenge of sexually transmitted infections in MSM.

    Funding: Wellcome Trust (grant number 098051).

    Funded by: Wellcome Trust: 089276, 098051, 100087, 100087/Z/12/Z

    The Lancet. Infectious diseases 2015;15;8;913-21

  • Draft Genome Sequence of 24570, the Type Strain of Shigella flexneri.

    Baker KS, Parkhill J and Thomson NR

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Shigella flexneri is a diarrheal pathogen that causes a large disease burden worldwide. We sequenced the genome of the publicly available type strain (S. flexneri 2a strain 24570) of this bacterial species to increase its utility as a reference. We present genome assembly results and comparisons with other reference strains.

    Genome announcements 2015;3;3

  • Deep phylogenetic analysis of haplogroup G1 provides estimates of SNP and STR mutation rates on the human Y-chromosome and reveals migrations of Iranic speakers.

    Balanovsky O, Zhabagin M, Agdzhoyan A, Chukhryaeva M, Zaporozhchenko V, Utevska O, Highnam G, Sabitov Z, Greenspan E, Dibirova K, Skhalyakho R, Kuznetsova M, Koshel S, Yusupov Y, Nymadawa P, Zhumadilov Z, Pocheshkhova E, Haber M, Zalloua PA, Yepiskoposyan L, Dybo A, Tyler-Smith C and Balanovska E

    Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow, Russia.

    Y-chromosomal haplogroup G1 is a minor component of the overall gene pool of South-West and Central Asia but reaches up to 80% frequency in some populations scattered within this area. We have genotyped the G1-defining marker M285 in 27 Eurasian populations (n = 5,346), analyzed 367 M285-positive samples using 17 Y-STRs, and sequenced ~11 Mb of the Y-chromosome in 20 of these samples to an average coverage of 67X. This allowed detailed phylogenetic reconstruction. We identified five branches, all with high geographical specificity: G1-L1323 in Kazakhs, the closely related G1-GG1 in Mongols, G1-GG265 in Armenians and its distant brother clade G1-GG162 in Bashkirs, and G1-GG362 in West Indians. The haplotype diversity, which decreased from West Iran to Central Asia, allows us to hypothesize that this rare haplogroup could have been carried by the expansion of Iranic speakers northwards to the Eurasian steppe and via founder effects became a predominant genetic component of some populations, including the Argyn tribe of the Kazakhs. The remarkable agreement between genetic and genealogical trees of Argyns allowed us to calibrate the molecular clock using a historical date (1405 AD) of the most recent common genealogical ancestor. The mutation rate for Y-chromosomal sequence data obtained was 0.78×10⁻⁹ per bp per year, falling within the range of published rates. The mutation rate for Y-chromosomal STRs was 0.0022 per locus per generation, very close to the so-called genealogical rate. The "clan-based" approach to estimating the mutation rate provides a third, middle way between direct father-to-son comparisons and using archeologically known migrations, whose dates are subject to revision and of uncertain relationship to genetic events.

    Funded by: Wellcome Trust

    PloS one 2015;10;4;e0122968

  • Multidrug-resistant Salmonella enterica serotype Typhi, Gulf of Guinea Region, Africa.

    Baltazar M, Ngandjio A, Holt KE, Lepillet E, Pardos de la Gandara M, Collard JM, Bercion R, Nzouankeu A, Le Hello S, Dougan G, Fonkoua MC and Weill FX

    We identified 3 lineages among multidrug-resistant (MDR) Salmonella enterica serotype Typhi isolates in the Gulf of Guinea region in Africa during the 2000s. However, the MDR H58 haplotype, which predominates in southern Asia and Kenya, was not identified. MDR quinolone-susceptible isolates contained a 190-kb incHI1 pST2 plasmid or a 50-kb incN pST3 plasmid.

    Emerging infectious diseases 2015;21;4;655-9

  • A novel multiple-stage antimalarial agent that inhibits protein synthesis.

    Baragaña B, Hallyburton I, Lee MC, Norcross NR, Grimaldi R, Otto TD, Proto WR, Blagborough AM, Meister S, Wirjanata G, Ruecker A, Upton LM, Abraham TS, Almeida MJ, Pradhan A, Porzelle A, Martínez MS, Luksch T, Bolscher JM, Woodland A, Norval S, Zuccotto F, Thomas J, Simeons F, Stojanovski L, Osuna-Cabello M, Brock PM, Churcher TS, Sala KA, Zakutansky SE, Jiménez-Díaz MB, Sanz LM, Riley J, Basak R, Campbell M, Avery VM, Sauerwein RW, Dechering KJ, Noviyanti R, Campo B, Frearson JA, Angulo-Barturen I, Ferrer-Bazaga S, Gamo FJ, Wyatt PG, Leroy D, Siegl P, Delves MJ, Kyle DE, Wittlin S, Marfurt J, Price RN, Sinden RE, Winzeler EA, Charman SA, Bebrevska L, Gray DW, Campbell S, Fairlamb AH, Willis PA, Rayner JC, Fidock DA, Read KD and Gilbert IH

    Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dundee DD1 5EH, UK.

    There is an urgent need for new drugs to treat malaria, with broad therapeutic potential and novel modes of action, to widen the scope of treatment and to overcome emerging drug resistance. Here we describe the discovery of DDD107498, a compound with a potent and novel spectrum of antimalarial activity against multiple life-cycle stages of the Plasmodium parasite, with good pharmacokinetic properties and an acceptable safety profile. DDD107498 demonstrates potential to address a variety of clinical needs, including single-dose treatment, transmission blocking and chemoprotection. DDD107498 was developed from a screening programme against blood-stage malaria parasites; its molecular target has been identified as translation elongation factor 2 (eEF2), which is responsible for the GTP-dependent translocation of the ribosome along messenger RNA, and is essential for protein synthesis. This discovery of eEF2 as a viable antimalarial drug target opens up new possibilities for drug discovery.

    Funded by: Medical Research Council: MR/K010174/1; NIAID NIH HHS: R01 AI090141, R01 AI103058; Wellcome Trust: 079838, 091625, 098051, 100476

    Nature 2015;522;7556;315-20

  • Using human genetics to make new medicines.

    Barrett JC, Dunham I and Birney E

    Centre for Therapeutic Target Validation.

    Jeffrey Barrett, Ian Dunham and Ewan Birney discuss the initiatives of the newly founded Centre for Therapeutic Target Validation, including a range of approaches to use human genetics to inform drug discovery and make better medicines.

    Nature reviews. Genetics 2015;16;10;561-2

  • Phenotypic and functional analyses show stem cell-derived hepatocyte-like cells better mimic fetal rather than adult hepatocytes.

    Baxter M, Withey S, Harrison S, Segeritz CP, Zhang F, Atkinson-Dell R, Rowe C, Gerrard DT, Sison-Young R, Jenkins R, Henry J, Berry AA, Mohamet L, Best M, Fenwick SW, Malik H, Kitteringham NR, Goldring CE, Piper Hanley K, Vallier L and Hanley NA

    Centre for Endocrinology & Diabetes, Institute of Human Development, Faculty of Medical & Human Sciences, University of Manchester, Manchester Academic Health Science Centre, AV Hill Building, Oxford Road, Manchester, UK.

    Background & aims: Hepatocyte-like cells (HLCs), differentiated from pluripotent stem cells by the use of soluble factors, can model human liver function and toxicity. However, at present HLC maturity and whether any deficit represents a true fetal state or aberrant differentiation is unclear and compounded by comparison to potentially deteriorated adult hepatocytes. Therefore, we generated HLCs from multiple lineages, using two different protocols, for direct comparison with fresh fetal and adult hepatocytes.

    Methods: Protocols were developed for robust differentiation. Multiple transcript, protein and functional analyses compared HLCs to fresh human fetal and adult hepatocytes.

    Results: HLCs were comparable to those of other laboratories by multiple parameters. Transcriptional changes during differentiation mimicked human embryogenesis and showed more similarity to pericentral than periportal hepatocytes. Unbiased proteomics demonstrated greater proximity to liver than 30 other human organs or tissues. However, by comparison to fresh material, HLC maturity was proven by transcript, protein and function to be fetal-like and short of the adult phenotype. The expression of 81% phase 1 enzymes in HLCs was significantly upregulated and half were statistically not different from fetal hepatocytes. HLCs secreted albumin and metabolized testosterone (CYP3A) and dextrorphan (CYP2D6) like fetal hepatocytes. In seven bespoke tests, devised by principal components analysis to distinguish fetal from adult hepatocytes, HLCs from two different source laboratories consistently demonstrated fetal characteristics.

    Conclusions: HLCs from different sources are broadly comparable with unbiased proteomic evidence for faithful differentiation down the liver lineage. This current phenotype mimics human fetal rather than adult hepatocytes.

    Funded by: Wellcome Trust: 088566

    Journal of hepatology 2015;62;3;581-9

  • Cancer evolution: mathematical models and computational inference.

    Beerenwinkel N, Schwarz RF, Gerstung M and Markowetz F

    Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom.

    Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and drug resistance development. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inference about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationship between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps to understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy.

    Funded by: Cancer Research UK

    Systematic biology 2015;64;1;e1-25

  • Whole genome analysis to detect potential vaccine-induced changes on Shigella sonnei genome.

    Behar A, Fookes MC, Goren S, Thomson NR and Cohen D

    Department of Epidemiology and Preventive Medicine, School of Public Health, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel.

    Shigellosis or bacillary dysentery is endemic worldwide and is a significant cause of death in children less than five years of age in developing countries. There are no licensed Shigella vaccines and glycoconjugates are among the leading candidate vaccines against shigellosis today. We used whole genome sequence analysis (WGA) to find out whether immunization, with an investigational Shigella sonnei glycoconjugate, could induce selective pressure leading to changes in the genome of S. sonnei. An outbreak of culture-proven S. sonnei shigellosis which occurred immediately after vaccination in one of the cohorts of volunteers participating in a phase III trial of the vaccine in Israel created a unique condition in which the epidemic agent "co-existed" with the developing immune responses induced by the vaccine and natural infection among vaccinees who developed S. sonnei shigellosis. By comparing the whole genomes of S. sonnei isolated from vaccinees and from volunteers in the control group, we show at a very high sensitivity that a potent S. sonnei glycoconjugate that conferred 74% protective efficacy against the homologous disease did not induce changes in the genome of S. sonnei and in particular on the O-antigen gene cluster.

    Vaccine 2015;33;26;2978-83

  • Genome-Wide Analysis of Evolutionary Markers of Human Influenza A(H1N1)pdm09 and A(H3N2) Viruses May Guide Selection of Vaccine Strain Candidates.

    Belanov SS, Bychkov D, Benner C, Ripatti S, Ojala T, Kankainen M, Kai Lee H, Wei-Tze Tang J and Kainov DE

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.

    Here we analyzed whole-genome sequences of 3,969 influenza A(H1N1)pdm09 and 4,774 A(H3N2) strains that circulated during 2009-2015 in the world. The analysis revealed changes at 481 and 533 amino acid sites in proteins of influenza A(H1N1)pdm09 and A(H3N2) strains, respectively. Many of these changes were introduced as a result of random drift. However, there were 61 and 68 changes that were present in relatively large number of A(H1N1)pdm09 and A(H3N2) strains, respectively, that circulated during relatively long time. We named these amino acid substitutions evolutionary markers, as they seemed to contain valuable information regarding the viral evolution. Interestingly, influenza A(H1N1)pdm09 and A(H3N2) viruses acquired non-overlapping sets of evolutionary markers. We next analyzed these characteristic markers in vaccine strains recommended by the World Health Organization for the past five years. Our analysis revealed that vaccine strains carried only few evolutionary markers at antigenic sites of viral hemagglutinin (HA) and neuraminidase (NA). The absence of these markers at antigenic sites could affect the recognition of HA and NA by human antibodies generated in response to vaccinations. This could, in part, explain moderate efficacy of influenza vaccines during 2009-2014. Finally, we identified influenza A(H1N1)pdm09 and A(H3N2) strains, which contain all the evolutionary markers of influenza A strains circulated in 2015, and which could be used as vaccine candidates for the 2015/2016 season. Thus, genome-wide analysis of evolutionary markers of influenza A(H1N1)pdm09 and A(H3N2) viruses may guide selection of vaccine strain candidates.

    Genome biology and evolution 2015;7;12;3472-83

  • p53 mediates loss of hematopoietic stem cell function and lymphopenia in Mysm1 deficiency.

    Belle JI, Langlais D, Petrov JC, Pardo M, Jones RG, Gros P and Nijnik A

    Department of Physiology, Complex Traits Group, and.

    MYSM1 is a chromatin-binding transcriptional cofactor that deubiquitinates histone H2A. Studies of Mysm1-deficient mice have shown that it is essential for hematopoietic stem cell (HSC) function and lymphopoiesis. Human carriers of a rare MYSM1-inactivating mutation display similar lymphopoietic deficiencies. However, the mechanism by which MYSM1 regulates hematopoietic homeostasis remains unclear. Here, we show that Mysm1-deficiency results in p53 protein elevation in many hematopoietic cell types. p53 is a central regulator of cellular stress responses and HSC homeostasis. We thus generated double-knockout mice to assess a potential genetic interaction between Mysm1 and p53 in hematopoiesis. Mysm1(-/-)p53(-/-) mouse characterization showed a full rescue of Mysm1(-/-) developmental and hematopoietic defects. This included restoration of lymphopoiesis, and HSC numbers and functions. These results establish p53 activation as the driving mechanism for hematopoietic abnormalities in Mysm1 deficiency. Our findings may advance the understanding of p53 regulation in hematopoiesis and implicate MYSM1 as a potential p53 cofactor.

    Funded by: Canadian Institutes of Health Research: 123403; Wellcome Trust: 079643/Z/06/Z

    Blood 2015;125;15;2344-8

  • Transfer of scarlet fever-associated elements into the group A Streptococcus M1T1 clone.

    Ben Zakour NL, Davies MR, You Y, Chen JH, Forde BM, Stanton-Cook M, Yang R, Cui Y, Barnett TC, Venturini C, Ong CL, Tse H, Dougan G, Zhang J, Yuen KY, Beatson SA and Walker MJ

    Australian Infectious Diseases Research Centre, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia.

    The group A Streptococcus (GAS) M1T1 clone emerged in the 1980s as a leading cause of epidemic invasive infections worldwide, including necrotizing fasciitis and toxic shock syndrome. Horizontal transfer of mobile genetic elements has played a central role in the evolution of the M1T1 clone, with bacteriophage-encoded determinants DNase Sda1 and superantigen SpeA2 contributing to enhanced virulence and colonization respectively. Outbreaks of scarlet fever in Hong Kong and China in 2011, caused primarily by emm12 GAS, led to our investigation of the next most common cause of scarlet fever, emm1 GAS. Genomic analysis of 18 emm1 isolates from Hong Kong and 16 emm1 isolates from mainland China revealed the presence of mobile genetic elements associated with the expansion of emm12 scarlet fever clones in the M1T1 genomic background. These mobile genetic elements confer expression of superantigens SSA and SpeC, and resistance to tetracycline, erythromycin and clindamycin. Horizontal transfer of mobile DNA conferring multi-drug resistance and expression of a new superantigen repertoire in the M1T1 clone should trigger heightened public health awareness for the global dissemination of these genetic elements.

    Funded by: Wellcome Trust

    Scientific reports 2015;5;15877

  • PhyTB: Phylogenetic tree visualisation and sample positioning for M. tuberculosis.

    Benavente ED, Coll F, Furnham N, McNerney R, Glynn JR, Campino S, Pain A, Mohareb FR and Clark TG

    Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, UK.

    Background: Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights into the genomic variation underlying intra- and inter-strain diversity, thereby assisting with the classification and molecular barcoding of the bacteria. One roadblock to strain investigation is the lack of user-interactive solutions to interrogate and visualise variation within a phylogenetic tree setting.

    Results: We have developed a web-based tool called PhyTB to assist phylogenetic tree visualisation and identification of M. tuberculosis clade-informative polymorphism. Variant Call Format files can be uploaded to determine a sample position within the tree. A map view summarises the geographical distribution of alleles and strain-types. The utility of the PhyTB is demonstrated on sequence data from 1,601 M. tuberculosis isolates.

    Conclusion: PhyTB contextualises M. tuberculosis genomic variation within epidemiological, geographical and phylogenic settings. Further tool utility is possible by incorporating large variants and phenotypic data (e.g. drug-resistance profiles), and an assessment of genotype-phenotype associations. Source code is available to develop similar websites for other organisms.

    Funded by: Medical Research Council: MR/K020420/1; Wellcome Trust

    BMC bioinformatics 2015;16;155

  • Bat and pig IFN-induced transmembrane protein 3 restrict cell entry by influenza virus and lyssaviruses.

    Benfield CTO, Smith SE, Wright E, Wash RS, Ferrara F, Temperton NJ and Kellam P

    Department of Pathology and Pathogen Biology, The Royal Veterinary College, Hatfield, UK.

    IFN-induced transmembrane protein 3 (IFITM3) is a restriction factor that blocks cytosolic entry of numerous viruses that utilize acidic endosomal entry pathways. In humans and mice, IFITM3 limits influenza-induced morbidity and mortality. Although many IFITM3-sensitive viruses are zoonotic, whether IFITMs function as antiviral restriction factors in mammalian species other than humans and mice is unknown. Here, IFITM3 orthologues in the microbat (Myotis myotis) and pig (Sus scrofa domesticus) were identified using rapid amplification of cDNA ends. Amino acid residues known to be important for IFITM3 function were conserved in the pig and microbat orthologues. Ectopically expressed pig and microbat IFITM3 co-localized with transferrin (early endosomes) and CD63 (late endosomes/multivesicular bodies). Pig and microbat IFITM3 restricted cell entry mediated by multiple influenza haemagglutinin subtypes and lyssavirus glycoproteins. Expression of pig or microbat IFITM3 in A549 cells reduced influenza virus yields and nucleoprotein expression. Conversely, small interfering RNA knockdown of IFITM3 in pig NPTr cells and primary microbat cells enhanced virus replication, demonstrating that these genes are functional in their species of origin at endogenous levels. In summary, we showed that IFITMs function as potent broad-spectrum antiviral effectors in two mammals - pigs and bats - identified as major reservoirs for emerging viruses.

    Funded by: Medical Research Council: G1000413; Wellcome Trust: 098051

    The Journal of general virology 2015;96;Pt 5;991-1005

  • Genomic perspectives on the evolution and spread of bacterial pathogens.

    Bentley SD and Parkhill J

    The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Since the first complete sequencing of a free-living organism, Haemophilus influenzae, genomics has been used to probe both the biology of bacterial pathogens and their evolution. Single-genome approaches provided information on the repertoire of virulence determinants and host-interaction factors, and, along with comparative analyses, allowed the proposal of hypotheses to explain the evolution of many of these traits. These analyses suggested many bacterial pathogens to be of relatively recent origin and identified genome degradation as a key aspect of host adaptation. The advent of very-high-throughput sequencing has allowed for detailed phylogenetic analysis of many important pathogens, revealing patterns of global and local spread, and recent evolution in response to pressure from therapeutics and the human immune system. Such analyses have shown that bacteria can evolve and transmit very rapidly, with emerging clones showing adaptation and global spread over years or decades. The resolution achieved with whole-genome sequencing has shown considerable benefits in clinical microbiology, enabling accurate outbreak tracking within hospitals and across continents. Continued large-scale sequencing promises many further insights into genetic determinants of drug resistance, virulence and transmission in bacterial pathogens.

    Funded by: Wellcome Trust: 098051

    Proceedings. Biological sciences / The Royal Society 2015;282;1821;20150488

  • Oct-2 forms a complex with Oct-1 on the iNOS promoter and represses transcription by interfering with recruitment of RNA PolII by Oct-1.

    Bentrari F, Chantôme A, Knights A, Jeannin JF and Pance A

    EPHE Laboratory, Faculty of Medicine, University of Bourgogne, 7 Boulevard Jeanne D'Arc, 21033 Dijon, France.

    Oct-1 (POU2f1) and Oct-2 (POU2f2) are members of the POU family of transcription factors. They recognize the same DNA sequence but fulfil distinct functions: Oct-1 is ubiquitous and regulates a variety of genes while Oct-2 is restricted to B-cells and neurones. Here we examine the interplay and regulatory mechanisms of these factors to control the inducible nitric oxide synthase (iNOS, NOS2). Using two breast cancer cell lines as a comparative model, we found that MCF-7 express iNOS upon cytokine stimulation while MDA-MB-231 do not. Oct-1 is present in both cell lines but MDA-MB-231 also express high levels of Oct-2. Manipulation of Oct-2 expression in these cell lines demonstrates that it is directly responsible for the repression of iNOS in MDA-MB-231. In MCF-7 cells Oct-1 binds the iNOS promoter, recruits RNA PolII and triggers initiation of transcription. In MDA-MB-231 cells, both Oct-1 and Oct-2 bind the iNOS promoter, forming a higher-order complex which fails to recruit RNA PolII, and as a consequence iNOS transcription does not proceed. Unravelling the mechanisms of transcription factor activity is paramount to the understanding of gene expression patterns that determine cell behaviour.

    Nucleic acids research 2015;43;20;9757-65

  • Essential roles of methionine and S-adenosylmethionine in the autarkic lifestyle of Mycobacterium tuberculosis.

    Berney M, Berney-Meyer L, Wong KW, Chen B, Chen M, Kim J, Wang J, Harris D, Parkhill J, Chan J, Wang F and Jacobs WR

    Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, NY 10461.

    Multidrug resistance, strong side effects, and compliance problems in TB chemotherapy mandate new ways to kill Mycobacterium tuberculosis (Mtb). Here we show that deletion of the gene encoding homoserine transacetylase (metA) inactivates methionine and S-adenosylmethionine (SAM) biosynthesis in Mtb and renders this pathogen exquisitely sensitive to killing in immunocompetent or immunocompromised mice, leading to rapid clearance from host tissues. Mtb ΔmetA is unable to proliferate in primary human macrophages, and in vitro starvation leads to extraordinarily rapid killing with no appearance of suppressor mutants. Cell death of Mtb ΔmetA is faster than that of other auxotrophic mutants (i.e., tryptophan, pantothenate, leucine, biotin), suggesting a particularly potent mechanism of killing. Time-course metabolomics showed complete depletion of intracellular methionine and SAM. SAM depletion was consistent with a significant decrease in methylation at the DNA level (measured by single-molecule real-time sequencing) and with the induction of several essential methyltransferases involved in biotin and menaquinone biosynthesis, both of which are vital biological processes and validated targets of antimycobacterial drugs. Mtb ΔmetA could be partially rescued by biotin supplementation, confirming a multitarget cell death mechanism. The work presented here uncovers a previously unidentified vulnerability of Mtb: the incapacity to scavenge intermediates of SAM and methionine biosynthesis from the host. This vulnerability unveils an entirely new drug target space with the promise of rapid killing of the tubercle bacillus by a new mechanism of action.

    Funded by: Howard Hughes Medical Institute; NIAID NIH HHS: AI097548, AI26170, P01AI063537, R01 AI026170, R01 AI097548; PHS HHS: 098051

    Proceedings of the National Academy of Sciences of the United States of America 2015;112;32;10008-13

  • Prmt5: a guardian of the germline protects future generations.

    Berrens RV and Reik W

    Epigenetics Programme, The Babraham Institute, Cambridge, UK.

    Funded by: Wellcome Trust: 095645

    The EMBO journal 2015;34;6;689-90

  • Fucci2a mouse upgrades live cell cycle imaging.

    Bertero A and Vallier L

    Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute; Anne McLaren Laboratory for Regenerative Medicine and Department of Surgery; University of Cambridge; UK.

    Funded by: Medical Research Council: G0800784, G1000847

    Cell cycle (Georgetown, Tex.) 2015;14;7;948-9

  • Activin/nodal signaling and NANOG orchestrate human embryonic stem cell fate decisions by controlling the H3K4me3 chromatin mark.

    Bertero A, Madrigal P, Galli A, Hubner NC, Moreno I, Burks D, Brown S, Pedersen RA, Gaffney D, Mendjan S, Pauklin S and Vallier L

    Wellcome Trust-MRC Stem Cell Institute Anne McLaren Laboratory, Department of Surgery, University of Cambridge, Cambridge CB2 0SZ, United Kingdom.

    Stem cells can self-renew and differentiate into multiple cell types. These characteristics are maintained by the combination of specific signaling pathways and transcription factors that cooperate to establish a unique epigenetic state. Despite the broad interest of these mechanisms, the precise molecular controls by which extracellular signals organize epigenetic marks to confer multipotency remain to be uncovered. Here, we use human embryonic stem cells (hESCs) to show that the Activin-SMAD2/3 signaling pathway cooperates with the core pluripotency factor NANOG to recruit the DPY30-COMPASS histone modifiers onto key developmental genes. Functional studies demonstrate the importance of these interactions for correct histone 3 Lys4 trimethylation and also self-renewal and differentiation. Finally, genetic studies in mice show that Dpy30 is also necessary to maintain pluripotency in the pregastrulation embryo, thereby confirming the existence of similar regulations in vivo during early embryonic development. Our results reveal the mechanisms by which extracellular factors coordinate chromatin status and cell fate decisions in hESCs.

    Funded by: British Heart Foundation; Medical Research Council: G0800784, G1000847

    Genes & development 2015;29;7;702-17

  • Cross-species fertilization: the hamster egg receptor, Juno, binds the human sperm ligand, Izumo1.

    Bianchi E and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Fertilization is the culminating event in sexual reproduction and requires the recognition and fusion of the haploid sperm and egg to form a new diploid organism. Specificity in these recognition events is one reason why sperm and eggs from different species are not normally compatible. One notable exception is the unusual ability of zona-free eggs from the Syrian golden hamster (Mesocricetus auratus) to recognize and fuse with human sperm, a phenomenon that has been exploited to assess sperm quality in assisted fertility treatments. Following our recent finding that the interaction between the sperm and egg recognition receptors Izumo1 and Juno is essential for fertilization, we now demonstrate concordance between the ability of Izumo1 and Juno from different species to interact, and the ability of their isolated gametes to cross-fertilize each other in vitro. In particular, we show that Juno from the golden hamster can directly interact with human Izumo1. These data suggest that the interaction between Izumo1 and Juno plays an important role in cross-species gamete recognition, and may inform the development of improved prognostic tests that do not require the use of animals to guide the most appropriate fertility treatment for infertile couples.

    Funded by: Wellcome Trust: 098051

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2015;370;1661;20140101

  • The Ribosome Biogenesis Protein Nol9 Is Essential for Definitive Hematopoiesis and Pancreas Morphogenesis in Zebrafish.

    Bielczyk-Maczyńska E, Lam Hung L, Ferreira L, Fleischmann T, Weis F, Fernández-Pevida A, Harvey SA, Wali N, Warren AJ, Barroso I, Stemple DL and Cvejic A

    Department of Haematology, University of Cambridge, Cambridge, United Kingdom.

    Ribosome biogenesis is a ubiquitous and essential process in cells. Defects in ribosome biogenesis and function result in a group of human disorders, collectively known as ribosomopathies. In this study, we describe a zebrafish mutant with a loss-of-function mutation in nol9, a gene that encodes a non-ribosomal protein involved in rRNA processing. nol9(sa1022/sa1022) mutants have a defect in 28S rRNA processing. The nol9(sa1022/sa1022) larvae display hypoplastic pancreas, liver and intestine and have decreased numbers of hematopoietic stem and progenitor cells (HSPCs), as well as definitive erythrocytes and lymphocytes. In addition, ultrastructural analysis revealed signs of pathological processes occurring in endothelial cells of the caudal vein, emphasizing the complexity of the phenotype observed in nol9(sa1022/sa1022) larvae. We further show that both the pancreatic and hematopoietic deficiencies in nol9(sa1022/sa1022) embryos were due to impaired cell proliferation of respective progenitor cells. Interestingly, genetic loss of Tp53 rescued the HSPCs but not the pancreatic defects. In contrast, activation of mRNA translation via the mTOR pathway by L-Leucine treatment did not revert the erythroid or pancreatic defects. Together, we present the nol9(sa1022/sa1022) mutant, a novel zebrafish ribosomopathy model, which recapitulates key human disease characteristics. The use of this genetically tractable model will enhance our understanding of the tissue-specific mechanisms following impaired ribosome biogenesis in the context of an intact vertebrate.

    Funded by: Cancer Research UK: C45041/A14953; Medical Research Council: MC_U105161083, MR/L003368/1; Wellcome Trust: 084183/Z/07/Z

    PLoS genetics 2015;11;12;e1005677

  • Calcium Builds Strong Host-Parasite Interactions.

    Billker O and Rayner JC

    Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Apicomplexan parasite invasion of host cells is a multistep process, requiring coordinated events. In this issue of Cell Host & Microbe, Paul et al. (2015) and Philip and Waters (2015) leverage experimental genetics to show that the calcium-regulated protein phosphatase, calcineurin, regulates invasion in multiple parasite species.

    Cell host & microbe 2015;18;1;9-10

  • Human genomics: The end of the start for population sequencing.

    Birney E and Soranzo N

    European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge CB10 1SD, UK.

    Nature 2015;526;7571;52-3

  • TopBP1 interacts with BLM to maintain genome stability but is dispensable for preventing BLM degradation.

    Blackford AN, Nieminuszczy J, Schwab RA, Galanty Y, Jackson SP and Niedzwiedz W

    The Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK; The Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge CB2 1QN, UK.

    The Bloom syndrome helicase BLM and topoisomerase-IIβ-binding protein 1 (TopBP1) are key regulators of genome stability. It was recently proposed that BLM phosphorylation on Ser338 mediates its interaction with TopBP1, to protect BLM from ubiquitylation and degradation (Wang et al., 2013). Here, we show that the BLM-TopBP1 interaction does not involve Ser338 but instead requires BLM phosphorylation on Ser304. Furthermore, we establish that disrupting this interaction does not markedly affect BLM stability. However, BLM-TopBP1 binding is important for maintaining genome integrity, because in its absence cells display increased sister chromatid exchanges, replication origin firing and chromosomal aberrations. Therefore, the BLM-TopBP1 interaction maintains genome stability not by controlling BLM protein levels, but via another as-yet undetermined mechanism. Finally, we identify critical residues that mediate interactions between TopBP1 and MDC1, and between BLM and TOP3A/RMI1/RMI2. Taken together, our findings provide molecular insights into a key tumor suppressor and genome stability network.

    Funded by: Cancer Research UK: 11224, C6/A11224, C6946/A14492; Medical Research Council: G0902418; Wellcome Trust: WT092096

    Molecular cell 2015;57;6;1133-41

  • Population, genetic, and antigenic diversity of the apicomplexan Eimeria tenella and their relevance to vaccine development.

    Blake DP, Clark EL, Macdonald SE, Thenmozhi V, Kundu K, Garg R, Jatau ID, Ayoade S, Kawahara F, Moftah A, Reid AJ, Adebambo AO, Álvarez Zapata R, Srinivasa Rao AS, Thangaraj K, Banerjee PS, Dhinakar-Raj G, Raman M and Tomley FM

    Pathology and Pathogen Biology, Royal Veterinary College, North Mymms, Hertfordshire, AL9 7TA, United Kingdom.

    The phylum Apicomplexa includes serious pathogens of humans and animals. Understanding the distribution and population structure of these protozoan parasites is of fundamental importance to explain disease epidemiology and develop sustainable controls. Predicting the likely efficacy and longevity of subunit vaccines in field populations relies on knowledge of relevant preexisting antigenic diversity, population structure, the likelihood of coinfection by genetically distinct strains, and the efficiency of cross-fertilization. All four of these factors have been investigated for Plasmodium species parasites, revealing both clonal and panmictic population structures with exceptional polymorphism associated with immunoprotective antigens such as apical membrane antigen 1 (AMA1). For the coccidian Toxoplasma gondii only genomic diversity and population structure have been defined in depth so far; for the closely related Eimeria species, all four variables are currently unknown. Using Eimeria tenella, a major cause of the enteric disease coccidiosis, which exerts a profound effect on chicken productivity and welfare, we determined population structure, genotype distribution, and likelihood of cross-fertilization during coinfection and also investigated the extent of naturally occurring antigenic diversity for the E. tenella AMA1 homolog. Using genome-wide Sequenom SNP-based haplotyping, targeted sequencing, and single-cell genotyping, we show that in this coccidian the functionality of EtAMA1 appears to outweigh immune evasion. This result is in direct contrast to the situation in Plasmodium and most likely is underpinned by the biology of the direct and acute coccidian life cycle in the definitive host.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H009337

    Proceedings of the National Academy of Sciences of the United States of America 2015;112;38;E5343-50

  • What is "data sharing" and why should biomedical researchers embrace it?

    Bobrow M

    University of Cambridge, United Kingdom.

    Transplantation 2015;99;4;654-5

  • Characterization of Two Distinct Nucleosome Remodeling and Deacetylase (NuRD) Complex Assemblies in Embryonic Stem Cells.

    Bode D, Yu L, Tate P, Pardo M and Choudhary J

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Pluripotency and self-renewal, the defining properties of embryonic stem cells, are brought about by transcriptional programs involving an intricate network of transcription factors and chromatin remodeling complexes. The Nucleosome Remodeling and Deacetylase (NuRD) complex plays a crucial and dynamic role in the regulation of stemness and differentiation. Several NuRD-associated factors have been reported but how they are organized has not been investigated in detail. Here, we have combined affinity purification and blue native polyacrylamide gel electrophoresis followed by protein identification by mass spectrometry and protein correlation profiling to characterize the topology of the NuRD complex. Our data show that in mouse embryonic stem cells the NuRD complex is present as two distinct assemblies of differing topology with different binding partners. Cell cycle regulator Cdk2ap1 and transcription factor Sall4 associate only with the higher mass NuRD assembly. We further establish that only isoform Sall4a, and not Sall4b, associates with NuRD. By contrast, Suz12, a component of the PRC2 Polycomb repressor complex, associates with the lower mass entity. In addition, we identify and validate a novel NuRD-associated protein, Wdr5, a regulatory subunit of the MLL histone methyltransferase complex, which associates with both NuRD entities. Bioinformatic analyses of published target gene sets of these chromatin binding proteins are in agreement with these structural observations. In summary, this study provides an interesting insight into mechanistic aspects of NuRD function in stem cell biology. The relevance of our work has broader implications because of the ubiquitous nature of the NuRD complex. The strategy described here can be more broadly applicable to investigate the topology of the multiple complexes an individual protein can participate in.

    Funded by: Wellcome Trust: WT098051

    Molecular & cellular proteomics : MCP 2015;15;3;878-91

  • Complete Genome Sequence of Bordetella pertussis D420.

    Boinett CJ, Harris SR, Langridge GC, Trainor EA, Merkel TJ and Parkhill J

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Bordetella pertussis is the causative agent of whooping cough, a highly contagious, acute respiratory illness that has seen resurgence despite the use of vaccines. We present the complete genome sequence of a clinical strain of B. pertussis, D420, which is representative of a currently circulating clade of this pathogen.

    Funded by: Medical Research Council: G1100100

    Genome announcements 2015;3;3

  • Locally Confined Clonal Complexes of Mycobacterium ulcerans in Two Buruli Ulcer Endemic Regions of Cameroon.

    Bolz M, Bratschi MW, Kerber S, Minyem JC, Um Boock A, Vogel M, Bayi PF, Junghanss T, Brites D, Harris SR, Parkhill J, Pluschke G and Lamelas Cabello A

    Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland.

    Background: Mycobacterium ulcerans is the causative agent of the necrotizing skin disease Buruli ulcer (BU), which has been reported from over 30 countries worldwide. The majority of notified patients come from West African countries, such as Côte d'Ivoire, Ghana, Benin and Cameroon. All clinical isolates of M. ulcerans from these countries are closely related and their genomes differ only in a limited number of single nucleotide polymorphisms (SNPs).

    Methodology/principal findings: We performed a molecular epidemiological study with clinical isolates from patients from two distinct BU endemic regions of Cameroon, the Nyong and the Mapé river basins. Whole genome sequencing of the M. ulcerans strains from these two BU endemic areas revealed the presence of two phylogenetically distinct clonal complexes. The strains from the Nyong river basin were genetically more diverse and less closely related to the M. ulcerans strain circulating in Ghana and Benin than the strains causing BU in the Mapé river basin.

    Conclusions: Our comparative genomic analysis revealed that M. ulcerans clones diversify locally by the accumulation of SNPs. Case isolates coming from more recently emerging BU endemic areas, such as the Mapé river basin, may be less diverse than populations from longer standing disease foci, such as the Nyong river basin. Exchange of strains between distinct endemic areas seems to be rare and local clonal complexes can be easily distinguished by whole genome sequencing.

    PLoS neglected tropical diseases 2015;9;6;e0003802

  • Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency.

    Bone WP, Washington NL, Buske OJ, Adams DR, Davis J, Draper D, Flynn ED, Girdea M, Godfrey R, Golas G, Groden C, Jacobsen J, Köhler S, Lee EM, Links AE, Markello TC, Mungall CJ, Nehrebecky M, Robinson PN, Sincan M, Soldatos AG, Tifft CJ, Toro C, Trang H, Valkanas E, Vasilevsky N, Wahl C, Wolfe LA, Boerkoel CF, Brudno M, Haendel MA, Gahl WA and Smedley D

    Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA.

    Purpose: Medical diagnosis and molecular or biochemical confirmation typically rely on the knowledge of the clinician. Although this is very difficult in extremely rare diseases, we hypothesized that the recording of patient phenotypes in Human Phenotype Ontology (HPO) terms and computationally ranking putative disease-associated sequence variants improves diagnosis, particularly for patients with atypical clinical profiles.

    Methods: Using simulated exomes and the National Institutes of Health Undiagnosed Diseases Program (UDP) patient cohort and associated exome sequence, we tested our hypothesis using Exomiser. Exomiser ranks candidate variants based on patient phenotype similarity to (i) known disease-gene phenotypes, (ii) model organism phenotypes of candidate orthologs, and (iii) phenotypes of protein-protein association neighbors.

    Results: Benchmarking showed Exomiser ranked the causal variant as the top hit in 97% of known disease-gene associations and ranked the correct seeded variant in up to 87% when detectable disease-gene associations were unavailable. Using UDP data, Exomiser ranked the causative variant(s) within the top 10 variants for 11 previously diagnosed variants and achieved a diagnosis for 4 of 23 cases undiagnosed by clinical evaluation.

    Conclusion: Structured phenotyping of patients and computational analysis are effective adjuncts for diagnosing patients with genetic disorders.

    Funded by: NHGRI NIH HHS: HHSN268201300036C, U54 HG006370; NIH HHS: R24 OD011883; Wellcome Trust: 098051

    Genetics in medicine : official journal of the American College of Medical Genetics 2015;18;6;608-17

  • Canine Mammary Tumours Are Affected by Frequent Copy Number Aberrations, including Amplification of MYC and Loss of PTEN.

    Borge KS, Nord S, Van Loo P, Lingjærde OC, Gunnes G, Alnæs GI, Solvang HK, Lüders T, Kristensen VN, Børresen-Dale AL and Lingaas F

    Section of Genetics, Department of Basic Sciences and Aquatic Medicine, Faculty of Veterinary Medicine and Biosciences, Norwegian University of Life Sciences (NMBU), Oslo, Norway.

    Background: Copy number aberrations frequently occur during the development of many cancers. Such events affect dosage of involved genes and may cause further genomic instability and progression of cancer. In this survey, canine SNP microarrays were used to study 117 canine mammary tumours from 69 dogs.

    Results: We found a high occurrence of copy number aberrations in canine mammary tumours, losses being more frequent than gains. Increased frequency of aberrations and loss of heterozygosity were positively correlated with increased malignancy in terms of histopathological diagnosis. One of the most highly recurrently amplified regions harbored the MYC gene. PTEN was located to a frequently lost region and also homozygously deleted in five tumours. Thus, deregulation of these genes due to copy number aberrations appears to be an important event in canine mammary tumour development. Other potential contributors to canine mammary tumour pathogenesis are COL9A3, INPP5A, CYP2E1 and RB1. The present study also shows that a more detailed analysis of chromosomal aberrations associated with histopathological parameters may aid in identifying specific genes associated with canine mammary tumour progression.

    Conclusions: The high frequency of copy number aberrations is a prominent feature of canine mammary tumours as seen in other canine and human cancers. Our findings share several features with corresponding studies in human breast tumours and strengthen the dog as a suitable model organism for this disease.

    PloS one 2015;10;5;e0126371

  • Characterisation of a mobilisable plasmid conferring florfenicol and chloramphenicol resistance in Actinobacillus pleuropneumoniae.

    Bossé JT, Li Y, Atherton TG, Walker S, Williamson SM, Rogers J, Chaudhuri RR, Weinert LA, Holden MT, Maskell DJ, Tucker AW, Wren BW, Rycroft AN, Langford PR and BRaDP1T consortium

    Section of Paediatrics, Department of Medicine, Imperial College London, St. Mary's Campus, London, W2 1PG, UK.

    The complete nucleotide sequence of a 7.7kb mobilisable plasmid (pM3446F), isolated from a florfenicol resistant isolate of Actinobacillus pleuropneumoniae, showed extended similarity to plasmids found in other members of the Pasteurellaceae containing the floR gene as well as replication and mobilisation genes. Mobilisation into other Pasteurellaceae species confirmed that this plasmid can be transferred horizontally.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G018553/1, BB/G019177/1, BB/G019274/1

    Veterinary microbiology 2015;178;3-4;279-82

  • Identification of dfrA14 in two distinct plasmids conferring trimethoprim resistance in Actinobacillus pleuropneumoniae.

    Bossé JT, Li Y, Walker S, Atherton T, Fernandez Crespo R, Williamson SM, Rogers J, Chaudhuri RR, Weinert LA, Oshota O, Holden MT, Maskell DJ, Tucker AW, Wren BW, Rycroft AN, Langford PR and BRaDP1T Consortium

    Section of Paediatrics, Department of Medicine, Imperial College London, St Mary's Campus, London W2 1PG, UK.

    Objectives: The objective of this study was to determine the distribution and genetic basis of trimethoprim resistance in Actinobacillus pleuropneumoniae isolates from pigs in England.

    Methods: Clinical isolates collected between 1998 and 2011 were tested for resistance to trimethoprim and sulphonamide. The genetic basis of trimethoprim resistance was determined by shotgun WGS analysis and the subsequent isolation and sequencing of plasmids.

    Results: A total of 16 (out of 106) A. pleuropneumoniae isolates were resistant to both trimethoprim (MIC >32 mg/L) and sulfisoxazole (MIC ≥256 mg/L), and a further 32 were resistant only to sulfisoxazole (MIC ≥256 mg/L). Genome sequence data for the trimethoprim-resistant isolates revealed the presence of the dfrA14 dihydrofolate reductase gene. The distribution of plasmid sequences in multiple contigs suggested the presence of two distinct dfrA14-containing plasmids in different isolates, which was confirmed by plasmid isolation and sequencing. Both plasmids encoded mobilization genes, the sulphonamide resistance gene sul2, as well as dfrA14 inserted into strA, a streptomycin-resistance-associated gene, although the gene order differed between the two plasmids. One of the plasmids further encoded the strB streptomycin-resistance-associated gene.

    Conclusions: This is the first description of mobilizable plasmids conferring trimethoprim resistance in A. pleuropneumoniae and, to our knowledge, the first report of dfrA14 in any member of the Pasteurellaceae. The identification of dfrA14 conferring trimethoprim resistance in A. pleuropneumoniae isolates will facilitate PCR screens for resistance to this important antimicrobial.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1; Wellcome Trust: 098051

    The Journal of antimicrobial chemotherapy 2015;70;8;2217-22

  • Combined Proteomics and Transcriptomics Identifies Carboxypeptidase B1 and Nuclear Factor κB (NF-κB) Associated Proteins as Putative Biomarkers of Metastasis in Low Grade Breast Cancer.

    Bouchal P, Dvořáková M, Roumeliotis T, Bortlíček Z, Ihnatová I, Procházková I, Ho JT, Maryáš J, Imrichová H, Budinská E, Vyzula R, Garbis SD, Vojtěšek B and Nenutil R

    Masaryk Memorial Cancer Institute, Regional Centre for Applied Molecular Oncology, Brno, Czech Republic; Masaryk University, Faculty of Science, Department of Biochemistry, Brno, Czech Republic.

    Current prognostic factors are insufficient for precise risk-discrimination in breast cancer patients with low grade breast tumors, which, in disagreement with theoretical prognosis, occasionally form early lymph node metastasis. To identify markers for this group of patients, we employed iTRAQ-2DLC-MS/MS proteomics to 24 lymph node positive and 24 lymph node negative grade 1 luminal A primary breast tumors. Another group of 48 high-grade tumors (luminal B, triple negative, Her-2 subtypes) was also analyzed to investigate marker specificity for grade 1 luminal A tumors. From the total of 4405 proteins identified (FDR < 5%), the top 65 differentially expressed together with 30 previously identified and control markers were analyzed also at transcript level. Increased levels of carboxypeptidase B1 (CPB1), PDZ and LIM domain protein 2 (PDLIM2), and ring finger protein 25 (RNF25) were associated specifically with lymph node positive grade 1 tumors, whereas stathmin 1 (STMN1) and thymosin beta 10 (TMSB10) associated with aggressive tumor phenotype also in high grade tumors at both protein and transcript level. For CPB1, these differences were also observed by immunohistochemical analysis on tissue microarrays. Up-regulation of putative biomarkers in lymph node positive (versus negative) luminal A tumors was validated by gene expression analysis of an independent published data set (n = 343) for CPB1 (p = 0.00155), PDLIM2 (p = 0.02027) and RELA (p = 0.00015). Moreover, statistically significant connections with patient survival were identified in another public data set (n = 1678). Our findings indicate unique pro-metastatic mechanisms in grade 1 tumors that can include up-regulation of CPB1, activation of NF-κB pathway and changes in cell survival and cytoskeleton. These putative biomarkers have potential to identify the specific minor subpopulation of breast cancer patients with low grade tumors who are at higher than expected risk of recurrence and who would benefit from more intensive follow-up and may require more personalized therapy.

    Molecular & cellular proteomics : MCP 2015;14;7;1814-30

  • Disease progression despite protective HLA expression in an HIV-infected transmission pair.

    Brener J, Gall A, Batorsky R, Riddell L, Buus S, Leitman E, Kellam P, Allen T, Goulder P and Matthews PC

    Department of Paediatrics, Peter Medawar Building for Pathogen Research, University of Oxford, Oxford, OX1 3SY, UK.

    Background: The precise immune responses mediated by HLA class I molecules such as HLA-B*27:05 and HLA-B*57:01 that protect against HIV disease progression remain unclear. We studied a CRF01_AE clade HIV infected donor-recipient transmission pair in which the recipient expressed both HLA-B*27:05 and HLA-B*57:01.

    Results: Within 4.5 years of diagnosis, the recipient had progressed to meet criteria for antiretroviral therapy initiation. We employed ultra-deep sequencing of the full-length virus genome in both donor and recipient as an unbiased approach by which to identify specific viral mutations selected in association with progression. Using a heat map method to highlight differences in the viral sequences between donor and recipient, we demonstrated that the majority of the recipient's mutations outside of Env were within epitopes restricted by HLA-B*27:05 and HLA-B*57:01, including the well-studied Gag epitopes. The donor, who also expressed HLA alleles associated with disease protection, HLA-A*32:01/B*13:02/B*14:01, showed selection of mutations in parallel with disease progression within epitopes restricted by these protective alleles.

    Conclusions: These studies of full-length viral sequences in a transmission pair, both of whom expressed protective HLA alleles but nevertheless failed to control viremia, are consistent with previous reports pointing to the critical role of Gag-specific CD8+ T cell responses restricted by protective HLA molecules in maintaining immune control of HIV infection. The transmission of subtype CRF01_AE clade infection may have contributed to accelerated disease progression in this pair as a result of clade-specific sequence differences in immunodominant epitopes.

    Funded by: Medical Research Council: G0501777; NIAID NIH HHS: R01 AI046995; Wellcome Trust: 104748, WT 104748MA

    Retrovirology 2015;12;55

  • Complete genome sequence of BS49 and draft genome sequence of BS34A, Bacillus subtilis strains carrying Tn916.

    Browne HP, Anvar SY, Frank J, Lawley TD, Roberts AP and Smits WK

    Host-Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    Bacillus subtilis strains BS49 and BS34A, both derived from a common ancestor, carry one or more copies of Tn916, an extremely common mobile genetic element capable of transfer to and from a broad range of microorganisms. Here, we report the complete genome sequence of BS49 and the draft genome sequence of BS34A, which have repeatedly been used as donors to transfer Tn916, Tn916 derivatives or oriTTn916-containing plasmids to clinically important pathogens.

    FEMS microbiology letters 2015;362;3;1-4

  • Devising a Consensus Framework for Validation of Novel Human Coding Loci.

    Bruford EA, Lane L and Harrow J

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom.

    A report on the Wellcome Trust retreat on devising a consensus framework for the validation of novel human protein coding loci, held in Hinxton, U.K., May 11-13, 2015.

    Funded by: NHGRI NIH HHS: U41 HG003345, U41 HG007234, U41HG003345, U41HG007234; Wellcome Trust: 099129/Z/12/Z, WT098051

    Journal of proteome research 2015;14;12;4945-8

  • Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.

    Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, Teichmann SA, Marioni JC and Stegle O

    [1] Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany. [2] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Recent technical developments have enabled the transcriptomes of hundreds of cells to be assayed in an unbiased manner, opening up the possibility that new subpopulations of cells can be found. However, the effects of potential confounding factors, such as the cell cycle, on the heterogeneity of gene expression and therefore on the ability to robustly identify subpopulations remain unclear. We present and validate a computational approach that uses latent variable models to account for such hidden factors. We show that our single-cell latent variable model (scLVM) allows the identification of otherwise undetectable subpopulations of cells that correspond to different stages during the differentiation of naive T cells into T helper 2 cells. Our approach can be used not only to identify cellular subpopulations but also to tease apart different sources of gene expression heterogeneity in single-cell transcriptomes.

    Nature biotechnology 2015;33;2;155-60

  • αIIbβ3 variants defined by next-generation sequencing: predicting variants likely to cause Glanzmann thrombasthenia.

    Buitrago L, Rendon A, Liang Y, Simeoni I, Negri A, ThromboGenomics Consortium, Filizola M, Ouwehand WH and Coller BS

    Allen and Frances Adler Laboratory of Blood and Vascular Biology and.

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69-98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants.

    Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: MR/K023489/1; NCATS NIH HHS: UL1 TR000043; NHLBI NIH HHS: HL19278; NIA NIH HHS: R01 AG048022; NIMHD NIH HHS: R01 MD007880

    Proceedings of the National Academy of Sciences of the United States of America 2015;112;15;E1898-907

  • Population whole-genome bisulfite sequencing across two tissues highlights the environment as the principal source of human methylome variation.

    Busche S, Shao X, Caron M, Kwan T, Allum F, Cheung WA, Ge B, Westfall S, Simon MM, Multiple Tissue Human Expression Resource, Barrett A, Bell JT, McCarthy MI, Deloukas P, Blanchette M, Bourque G, Spector TD, Lathrop M, Pastinen T and Grundberg E

    Department of Human Genetics, McGill University, 740 Dr. Penfield Avenue, H3A 0G1, Montreal, Quebec, Canada.

    Background: CpG methylation variation is involved in human trait formation and disease susceptibility. Analyses within populations have been biased towards CpG-dense regions through the application of targeted arrays. We generate whole-genome bisulfite sequencing data for approximately 30 adipose and blood samples from monozygotic and dizygotic twins for the characterization of non-genetic and genetic effects at single-site resolution.

    Results: Purely invariable CpGs display a bimodal distribution with enrichment of unmethylated CpGs and depletion of fully methylated CpGs in promoter and enhancer regions. Population-variable CpGs account for approximately 15-20 % of total CpGs per tissue, are enriched in enhancer-associated regions and depleted in promoters, and single nucleotide polymorphisms at CpGs are a frequent confounder of extreme methylation variation. Differential methylation is primarily non-genetic in origin, with non-shared environment accounting for most of the variance. These non-genetic effects are mainly tissue-specific. Tobacco smoking is associated with differential methylation in blood with no evidence of this exposure impacting cell counts. Opposite to non-genetic effects, genetic effects of CpG methylation are shared across tissues and thus limit inter-tissue epigenetic drift. CpH methylation is rare, and shows similar characteristics of variation patterns as CpGs.

    Conclusions: Our study highlights the utility of low pass whole-genome bisulfite sequencing in identifying methylome variation beyond promoter regions, and suggests that targeting the population dynamic methylome of tissues requires assessment of understudied intergenic CpGs distal to gene promoters to reveal the full extent of inter-individual variation.

    Funded by: Wellcome Trust: 081917/Z/07/Z

    Genome biology 2015;16;290

  • The Matchmaker Exchange API: automating patient matching through the exchange of structured phenotypic and genotypic profiles.

    Buske OJ, Schiettecatte F, Hutton B, Dumitriu S, Misyura A, Huang L, Hartley T, Girdea M, Sobreira N, Mungall C and Brudno M

    Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Canada.

    Despite the increasing prevalence of clinical sequencing, the difficulty of identifying additional affected families is a key obstacle to solving many rare diseases. There may only be a handful of similar patients worldwide, and their data may be stored in diverse clinical and research databases. Computational methods are necessary to enable finding similar patients across the growing number of patient repositories and registries. We present the Matchmaker Exchange Application Programming Interface (MME API), a protocol and data format for exchanging phenotype and genotype profiles to enable matchmaking among patient databases, facilitate the identification of additional cohorts, and increase the rate with which rare diseases can be researched and diagnosed. We designed the API to be straightforward and flexible in order to simplify its adoption on a large number of data types and workflows. We also provide a public test data set, curated from the literature, to facilitate implementation of the API and development of new matching algorithms. The initial version of the API has been successfully implemented by three members of the Matchmaker Exchange and was immediately able to reproduce previously identified matches and generate several new leads currently being validated. The API is available at

    Funded by: Canadian Institutes of Health Research; NHGRI NIH HHS: 1U54HG006542, U54 HG006542

    Human mutation 2015;36;10;922-7

  • Talking welfare: the importance of a common language.

    Bussell J and Wells SE

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Ontologies describing mouse phenotypes and pathology are well established and becoming more universally used (Smith and Eppig in Mamm Genome 23:653, 2012; Scofield et al. in J Biomed Semant 4:18, 2013). However, the language used to describe and disseminate cage-side observations is less well developed. This article explores the hurdles to unifying a language and terminology, and introduces our initial attempt to do so.

    Mammalian genome : official journal of the International Mammalian Genome Society 2015;26;9-10;482-5

  • An O antigen capsule modulates bacterial pathogenesis in Shigella sonnei.

    Caboni M, Pédron T, Rossi O, Goulding D, Pickard D, Citiulo F, MacLennan CA, Dougan G, Thomson NR, Saul A, Sansonetti PJ and Gerke C

    Novartis Vaccines Institute for Global Health, Siena, Via Fiorentina, Italy.

    Shigella is the leading cause of dysentery worldwide. Together with several virulence factors employed for invasion, the presence and length of the O antigen (OAg) of the lipopolysaccharide (LPS) plays a key role in pathogenesis. S. flexneri 2a has a bimodal OAg chain length distribution regulated in a growth-dependent manner, whereas S. sonnei LPS comprises a monomodal OAg. Here we reveal that S. sonnei, but not S. flexneri 2a, possesses a high molecular weight, immunogenic group 4 capsule, characterized by structural similarity to LPS OAg. We found that a galU mutant of S. sonnei that is unable to produce a complete LPS with OAg attached can still assemble OAg material on the cell surface, but a galU mutant of S. flexneri 2a cannot. High molecular weight material not linked to the LPS was purified from S. sonnei and confirmed by NMR to contain the specific sugars of the S. sonnei OAg. Deletion of genes homologous to the group 4 capsule synthesis cluster, previously described in Escherichia coli, abolished the generation of the high molecular weight OAg material. This OAg capsule strongly affects the virulence of S. sonnei. Uncapsulated knockout bacteria were highly invasive in vitro and strongly inflammatory in the rabbit intestine. However, the lack of capsule reduced the ability of S. sonnei to resist complement-mediated killing and to spread from the gut to peripheral organs. In contrast, overexpression of the capsule decreased invasiveness in vitro and inflammation in vivo compared to the wild type. In conclusion, the data indicate that in S. sonnei expression of the capsule modulates bacterial pathogenesis resulting in balanced capabilities to invade and persist in the host environment.

    PLoS pathogens 2015;11;3;e1004749

  • Using genomics to combat infectious diseases on a global scale.

    Cain AK and Lees JA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    A report on the seventh annual Infectious Disease Genomics Conference, held in Hinxton, Cambridge, UK, 14-16 October 2015.

    Funded by: Medical Research Council: G1100100

    Genome biology 2015;16;250

  • Genomic Epidemiology of a Protracted Hospital Outbreak Caused by a Toxin A-Negative Clostridium difficile Sublineage PCR Ribotype 017 Strain in London, England.

    Cairns MD, Preston MD, Lawley TD, Clark TG, Stabler RA and Wren BW

    Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom; UCL Centre for Clinical Microbiology, University College London, London, United Kingdom; Public Health Laboratory London, Health Protection Agency, Division of Infection, The Royal London Hospital, London, United Kingdom.

    Clostridium difficile remains the leading cause of nosocomial diarrhea worldwide, which is largely considered to be due to the production of two potent toxins: TcdA and TcdB. However, PCR ribotype (RT) 017, one of five clonal lineages of human virulent C. difficile, lacks TcdA expression but causes widespread disease. Whole-genome sequencing was applied to 35 isolates from hospitalized patients with C. difficile infection (CDI) and two environmental ward isolates in London, England. The phylogenetic analysis of single nucleotide polymorphisms (SNPs) revealed a clonal cluster of temporally variable isolates from a single hospital ward at University Hospital Lewisham (UHL) that were distinct from other London hospital isolates. De novo assembled genomes revealed a 49-kbp putative conjugative transposon exclusive to this hospital clonal cluster which would not be revealed by current typing methodologies. This study identified three sublineages of C. difficile RT017 that are circulating in London. Similar to the notorious RT027 lineage, which has caused global outbreaks of CDI since 2001, the lineage of toxin-defective RT017 strains appears to be continually evolving. By utilization of WGS technologies to identify SNPs and the evolution of clonal strains, the transmission of outbreaks caused by near-identical isolates can be retraced and identified.

    Funded by: Medical Research Council: G1000214, MR/K000551/1, PF451; Wellcome Trust: 086418, 098051

    Journal of clinical microbiology 2015;53;10;3141-7

  • Pharmacogenomic agreement between two cancer cell line data sets.

    Cancer Cell Line Encyclopedia Consortium and Genomics of Drug Sensitivity in Cancer Consortium

    Large cancer cell line collections broadly capture the genomic diversity of human cancers and provide valuable insight into anti-cancer drug response. Here we show substantial agreement and biological consilience between drug sensitivity measurements and their associated genomic predictors from two publicly available large-scale pharmacogenomics resources: The Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer databases.

    Funded by: Cancer Research UK: A16629; NHGRI NIH HHS: 1U54HG006097-01, U54 HG006097; Wellcome Trust: 086357, 102696, 102696STRATTON

    Nature 2015;528;7580;84-7

  • Non-typhoidal Salmonella Typhimurium ST313 isolates that cause bacteremia in humans stimulate less inflammasome activation than ST19 isolates associated with gastroenteritis.

    Carden S, Okoro C, Dougan G and Monack D

    Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA.

    Salmonella is an enteric pathogen that causes a range of diseases in humans. Non-typhoidal Salmonella (NTS) serovars such as Salmonella enterica serovar Typhimurium generally cause a self-limiting gastroenteritis whereas typhoidal serovars cause a systemic disease, typhoid fever. However, S. Typhimurium isolates within the multi-locus sequence type ST313 have emerged in sub-Saharan Africa as a major cause of bacteremia in humans. The S. Typhimurium ST313 lineage is phylogenetically distinct from classical S. Typhimurium lineages, such as ST19, that cause zoonotic gastroenteritis worldwide. Previous studies have shown that the ST313 lineage has undergone genome degradation when compared to the ST19 lineage, similar to that observed for typhoidal serovars. Currently, little is known about phenotypic differences between ST313 isolates and other NTS isolates. We find that representative ST313 isolates invade non-phagocytic cells less efficiently than the classical ST19 isolates that are more commonly associated with gastroenteritis. In addition, ST313 isolates induce less Caspase-1-dependent macrophage death and IL-1β release than ST19 isolates. ST313 isolates also express relatively lower levels of mRNA of the genes encoding the SPI-1 effector sopE2 and the flagellin, fliC, providing possible explanations for the decrease in invasion and inflammasome activation. The ST313 isolates have invasion and inflammatory phenotypes that are intermediate; more invasive and inflammatory than Salmonella enterica serovar Typhi and less than ST19 isolates associated with gastroenteritis. This suggests that both phenotypically and at the genomic level ST313 isolates are evolving signatures that facilitate a systemic lifestyle in humans.

    Funded by: NIAID NIH HHS: AI08972, AI095396; Wellcome Trust

    Pathogens and disease 2015;73;4

  • High Throughput Sequencing Analysis of the Immunoglobulin Heavy Chain Gene from Flow-Sorted B Cell Sub-Populations Define the Dynamics of Follicular Lymphoma Clonal Evolution.

    Carlotti E, Wrench D, Rosignoli G, Marzec J, Sangaralingam A, Hazanov L, Michaeli M, Hallam S, Chaplin T, Iqbal S, Calaminici M, Young B, Mehr R, Campbell P, Fitzgibbon J and Gribben JG

    Centre for Haemato-Oncology, Barts Cancer Institute - a CR-UK Centre Of Excellence, Queen Mary University of London, London, United Kingdom.

    Understanding the dynamics of evolution of Follicular Lymphoma (FL) clones during disease progression is important for monitoring and targeting this tumor effectively. Genetic profiling of serial FL biopsies and examples of FL transmission following bone marrow transplant suggest that this disease may evolve by divergent evolution from a common ancestor cell. However, where this ancestor cell resides and how it evolves are still unclear. The analysis of the pattern of somatic hypermutation (SHM) of the immunoglobulin gene (Ig) is traditionally used for tracking the physiological clonal evolution of B cells within the germinal center. It allows discrimination between cells that have just entered the germinal center and display features of ancestor cells and B cells that keep re-circulating across different lymphoid organs. Here we investigated the pattern of somatic hypermutation of the heavy chain of the immunoglobulin gene (IgH-VH) in 4 flow-sorted B cell subpopulations belonging to different stages of differentiation, from sequential lymph node biopsies of cases displaying diverse patterns of evolution, using the GS-FLX Titanium sequencing platform. We observed an unexpectedly high level of clonality, with hundreds of distinct tumor subclones in the different subpopulations from the same sample, the majority detected at a frequency <10^-2. Using lineage tree analysis, we observed in all our FL and t-FL cases that the oligoclonal FL population was trapped in a narrow intermediate stage of maturation that maintains the capacity to undergo SHM but was unable to differentiate further. The presence of such a complex architecture highlights challenges currently encountered in finding a cure for this disease.

    Funded by: Cancer Research UK: C1574/A6806

    PloS one 2015;10;9;e0134833

  • Obesity, starch digestion and amylase: association between copy number variants at human salivary (AMY1) and pancreatic (AMY2) amylase genes.

    Carpenter D, Dhar S, Mitchell LM, Fu B, Tyson J, Shwan NA, Yang F, Thomas MG and Armour JA

    School of Life Sciences, Queen's Medical Centre, University of Nottingham, Nottingham NG7 2UH, UK.

    The human salivary amylase genes display extensive copy number variation (CNV), and recent work has implicated this variation in adaptation to starch-rich diets, and in association with body mass index. In this work, we use paralogue ratio tests, microsatellite analysis, read depth and fibre-FISH to demonstrate that human amylase CNV is not a smooth continuum, but is instead partitioned into distinct haplotype classes. There is a fundamental structural distinction between haplotypes containing odd or even numbers of AMY1 gene units, in turn coupled to CNV in pancreatic amylase genes AMY2A and AMY2B. Most haplotypes have one copy each of AMY2A and AMY2B and contain an odd number of copies of AMY1; consequently, most individuals have an even total number of AMY1. In contrast, haplotypes carrying an even number of AMY1 genes have rearrangements leading to CNVs of AMY2A/AMY2B. Read-depth and experimental data show that different populations harbour different proportions of these basic haplotype classes. In Europeans, the copy numbers of AMY1 and AMY2A are correlated, so that phenotypic associations caused by variation in pancreatic amylase copy number could be detected indirectly as weak association with AMY1 copy number. We show that the quantitative polymerase chain reaction (qPCR) assay previously applied to the high-throughput measurement of AMY1 copy number is less accurate than the measures we use and that qPCR data in other studies have been further compromised by systematic miscalibration. Our results uncover new patterns in human amylase variation and imply a potential role for AMY2 CNV in functional associations.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I006370/1; Wellcome Trust: WT098051

    Human molecular genetics 2015;24;12;3472-80

  • Defining the Roles of TcdA and TcdB in Localized Gastrointestinal Disease, Systemic Organ Damage, and the Host Response during Clostridium difficile Infections.

    Carter GP, Chakravorty A, Pham Nguyen TA, Mileto S, Schreiber F, Li L, Howarth P, Clare S, Cunningham B, Sambol SP, Cheknis A, Figueroa I, Johnson S, Gerding D, Rood JI, Dougan G, Lawley TD and Lyras D

    Department of Microbiology, Monash University, Victoria, Australia.

    Clostridium difficile is a leading cause of antibiotic-associated diarrhea, a significant animal pathogen, and a worldwide public health burden. Most disease-causing strains secrete two exotoxins, TcdA and TcdB, which are considered to be the primary virulence factors. Understanding the role that these toxins play in disease is essential for the rational design of urgently needed new therapeutics. However, their relative contributions to disease remain contentious. Using three different animal models, we show that TcdA(+) TcdB(-) mutants are attenuated in virulence in comparison to the wild-type (TcdA(+) TcdB(+)) strain, whereas TcdA(-) TcdB(+) mutants are fully virulent. We also show for the first time that TcdB alone is associated with both severe localized intestinal damage and systemic organ damage, suggesting that this toxin might be responsible for the onset of multiple organ dysfunction syndrome (MODS), a poorly characterized but often fatal complication of C. difficile infection (CDI). Finally, we show that TcdB is the primary factor responsible for inducing the in vivo host innate immune and inflammatory responses. Surprisingly, the animal infection model used was found to profoundly influence disease outcomes, a finding which has important ramifications for the validation of new therapeutics and future disease pathogenesis studies. Overall, our results show unequivocally that TcdB is the major virulence factor of C. difficile and provide new insights into the host response to C. difficile during infection. The results also highlight the critical nature of using appropriate and, when possible, multiple animal infection models when studying bacterial virulence mechanisms.

    Importance: Clostridium difficile is a leading cause of antibiotic-associated diarrhea and an important hospital pathogen. TcdA and TcdB are thought to be the primary virulence factors responsible for disease symptoms of C. difficile infections (CDI). However, the individual contributions of these toxins to disease remain contentious. Using three different animal models of infection, we show for the first time that TcdB alone causes severe damage to the gut, as well as systemic organ damage, suggesting that this toxin might be responsible for MODS, a serious but poorly understood complication of CDI. These findings provide important new insights into the host response to C. difficile during infection and should guide the rational development of urgently required nonantibiotic therapeutics for the treatment of CDI.

    Funded by: Medical Research Council: 93614; Wellcome Trust: 086418, 098051

    mBio 2015;6;3;e00551

  • Absence of heterozygosity due to template switching during replicative rearrangements.

    Carvalho CM, Pfundt R, King DA, Lindsay SJ, Zuccherato LW, Macville MV, Liu P, Johnson D, Stankiewicz P, Brown CW, DDD Study, Shaw CA, Hurles ME, Ira G, Hastings PJ, Brunner HG and Lupski JR

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Centro de Pesquisas René Rachou - FIOCRUZ, Belo Horizonte, MG 30190-002, Brazil.

    We investigated complex genomic rearrangements (CGRs) consisting of triplication copy-number variants (CNVs) that were accompanied by extended regions of copy-number-neutral absence of heterozygosity (AOH) in subjects with multiple congenital abnormalities. Molecular analyses provided observational evidence that in humans, post-zygotically generated CGRs can lead to regional uniparental disomy (UPD) due to template switches between homologs versus sister chromatids by using microhomology to prime DNA replication, a prediction of the replicative repair model, MMBIR. Our findings suggest that replication-based mechanisms might underlie the formation of diverse types of genomic alterations (CGRs and AOH) implicated in constitutional disorders.

    Funded by: NHGRI NIH HHS: U54 HG006542, U54HG006542; NIGMS NIH HHS: R01 GM080600, R01 GM106373, R01GM080600, R01GM106373; NINDS NIH HHS: R01 NS058529, R01NS058529; Wellcome Trust

    American journal of human genetics 2015;96;4;555-64

  • TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors.

    Cebola I, Rodríguez-Seguí SA, Cho CH, Bessa J, Rovira M, Luengo M, Chhatriwala M, Berry A, Ponsa-Cobas J, Maestro MA, Jennings RE, Pasquali L, Morán I, Castro N, Hanley NA, Gomez-Skarmeta JL, Vallier L and Ferrer J

    Department of Medicine, Imperial College London, London W12 0NN, United Kingdom.

    The genomic regulatory programmes that underlie human organogenesis are poorly understood. Pancreas development, in particular, has pivotal implications for pancreatic regeneration, cancer and diabetes. We have now characterized the regulatory landscape of embryonic multipotent progenitor cells that give rise to all pancreatic epithelial lineages. Using human embryonic pancreas and embryonic-stem-cell-derived progenitors we identify stage-specific transcripts and associated enhancers, many of which are co-occupied by transcription factors that are essential for pancreas development. We further show that TEAD1, a Hippo signalling effector, is an integral component of the transcription factor combinatorial code of pancreatic progenitor enhancers. TEAD and its coactivator YAP activate key pancreatic signalling mediators and transcription factors, and regulate the expansion of pancreatic progenitors. This work therefore uncovers a central role for TEAD and YAP as signal-responsive regulators of multipotent pancreatic progenitors, and provides a resource for the study of embryonic development of the human pancreas.

    Funded by: Department of Health; Medical Research Council: G0701448, G0800784, G1100420, MR/L02036X/1; Wellcome Trust: 088566, 101033, WT097820

    Nature cell biology 2015;17;5;615-626

  • The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes.

    Challis D, Antunes L, Garrison E, Banks E, Evani US, Muzny D, Poplin R, Gibbs RA, Marth G and Yu F

    Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.

    Background: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.

    Results: This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.

    Conclusions: In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.

    Funded by: NHGRI NIH HHS: 1R01HG008115, 1U01HG005211, 5U54HG003273, R01 HG008115, R01HG004719, U01HG006513

    BMC genomics 2015;16;143

  • Global reorganization of the nuclear landscape in senescent cells.

    Chandra T, Ewels PA, Schoenfelder S, Furlan-Magaril M, Wingett SW, Kirschner K, Thuret JY, Andrews S, Fraser P and Reik W

    Epigenetics Programme, The Babraham Institute, Cambridge CB22 3AT, UK; The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.

    Cellular senescence has been implicated in tumor suppression, development, and aging and is accompanied by large-scale chromatin rearrangements, forming senescence-associated heterochromatic foci (SAHF). However, how the chromatin is reorganized during SAHF formation is poorly understood. Furthermore, heterochromatin formation in senescence appears to contrast with loss of heterochromatin in Hutchinson-Gilford progeria. We mapped architectural changes in genome organization in cellular senescence using Hi-C. Unexpectedly, we find a dramatic sequence- and lamin-dependent loss of local interactions in heterochromatin. This change in local connectivity resolves the paradox of opposing chromatin changes in senescence and progeria. In addition, we observe a senescence-specific spatial clustering of heterochromatic regions, suggesting a unique second step required for SAHF formation. Comparison of embryonic stem cells (ESCs), somatic cells, and senescent cells shows a unidirectional loss in local chromatin connectivity, suggesting that senescence is an endpoint of the continuous nuclear remodelling process during differentiation.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K010867/1, BBS/E/B/000C0404; Wellcome Trust: 095645, 095645/Z/11/Z

    Cell reports 2015;10;4;471-83

  • Quantitative differences in developmental profiles of spontaneous activity in cortical and hippocampal cultures.

    Charlesworth P, Cotterill E, Morton A, Grant S and Eglen SJ

    Background: Neural circuits can spontaneously generate complex spatiotemporal firing patterns during development. This spontaneous activity is thought to help guide development of the nervous system. In this study, we had two aims. First, to characterise the changes in spontaneous activity in cultures of developing networks of either hippocampal or cortical neurons dissociated from mouse. Second, to assess whether there are any functional differences in the patterns of activity in hippocampal and cortical networks.

    Results: We used multielectrode arrays to record the development of spontaneous activity in cultured networks of either hippocampal or cortical neurons every 2 or 3 days for the first month after plating. Within a few days of culturing, networks exhibited spontaneous activity. This activity strengthened and then stabilised typically around 21 days in vitro. We quantified the activity patterns in hippocampal and cortical networks using 11 features. Three out of 11 features showed striking differences in activity between hippocampal and cortical networks: (1) interburst intervals are less variable in spike trains from hippocampal cultures; (2) hippocampal networks have higher correlations and (3) hippocampal networks generate more robust theta-bursting patterns. Machine-learning techniques confirmed that these differences in patterning are sufficient to classify recordings reliably at any given age as either hippocampal or cortical networks.

    Conclusions: Although cultured networks of hippocampal and cortical networks both generate spontaneous activity that changes over time, at any given time we can reliably detect differences in the activity patterns. We anticipate that this quantitative framework could have applications in many areas, including neurotoxicity testing and for characterising the phenotype of different mutant mice. All code and data relating to this report are freely available for others to use.

    Neural development 2015;10;1;1

  • Facilitating collaboration in rare genetic disorders through effective matchmaking in DECIPHER.

    Chatzimichali EA, Brent S, Hutton B, Perrett D, Wright CF, Bevan AP, Hurles ME, Firth HV and Swaminathan GJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.

    DECIPHER is a web-based platform for secure deposition, analysis, and sharing of plausibly pathogenic genomic variants from well-phenotyped patients suffering from genetic disorders. DECIPHER aids clinical interpretation of these rare sequence and copy-number variants by providing tools for variant analysis and identification of other patients exhibiting similar genotype-phenotype characteristics. DECIPHER also provides mechanisms to encourage collaboration among a global community of clinical centers and researchers, as well as exchange of information between clinicians and researchers within a consortium, to accelerate discovery and diagnosis. DECIPHER has contributed to matchmaking efforts by enabling the global clinical genetics community to identify many previously undiagnosed syndromes and new disease genes, and has facilitated the publication of over 700 peer-reviewed scientific publications since 2004. At the time of writing, DECIPHER contains anonymized data from ∼250 registered centers on more than 51,500 patients (∼18,000 patients with consent for data sharing and ∼25,000 anonymized records shared privately). In this paper, we describe salient features of the platform, with special emphasis on the tools and processes that aid interpretation, sharing, and effective matchmaking with other data held in the database and that make DECIPHER an invaluable clinical and research resource.

    Funded by: Wellcome Trust: WT077008

    Human mutation 2015;36;10;941-9

  • Phenotypic spectrum associated with PTCHD1 deletions and truncating mutations includes intellectual disability and autism spectrum disorder.

    Chaudhry A, Noor A, Degagne B, Baker K, Bok LA, Brady AF, Chitayat D, Chung BH, Cytrynbaum C, Dyment D, Filges I, Helm B, Hutchison HT, Jeng LJ, Laumonnier F, Marshall CR, Menzel M, Parkash S, Parker MJ, DDD Study, Raymond LF, Rideout AL, Roberts W, Rupps R, Schanze I, Schrander-Stumpel CT, Speevak MD, Stavropoulos DJ, Stevens SJ, Thomas ER, Toutain A, Vergano S, Weksberg R, Scherer SW, Vincent JB and Carter MT

    Department of Pediatrics, Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, Ontario, Canada.

    Studies of genomic copy number variants (CNVs) have identified genes associated with autism spectrum disorder (ASD) and intellectual disability (ID) such as NRXN1, SHANK2, SHANK3 and PTCHD1. Deletions have been reported in PTCHD1; however, there has been little information available regarding the clinical presentation of these individuals. Herein we present 23 individuals with PTCHD1 deletions or truncating mutations with detailed phenotypic descriptions. The results suggest that individuals with disruption of the PTCHD1 coding region may have subtle dysmorphic features including a long face, prominent forehead, puffy eyelids and a thin upper lip. They do not have a consistent pattern of associated congenital anomalies or growth abnormalities. They have mild to moderate global developmental delay, variable degrees of ID, and many have prominent behavioral issues. Over 40% of subjects have ASD or ASD-like behaviors. The only consistent neurological findings in our cohort are orofacial hypotonia and mild motor incoordination. Our findings suggest that hemizygous PTCHD1 loss of function causes an X-linked neurodevelopmental disorder with a strong propensity to autistic behaviors. Detailed neuropsychological studies are required to better define the cognitive and behavioral phenotype.

    Funded by: Canadian Institutes of Health Research: MOP-114592; Department of Health: 1332, 2205, 247; Wellcome Trust: 091986, 100140, WT098051

    Clinical genetics 2015;88;3;224-33

  • Truncation of POC1A associated with short stature and extreme insulin resistance.

    Chen JH, Segni M, Payne F, Huang-Doran I, Sleigh A, Adams C, UK10K Consortium, Savage DB, O'Rahilly S, Semple RK and Barroso I

    The University of Cambridge Metabolic Research Laboratories, Wellcome Trust-MRC Institute of Metabolic Science, Cambridge, UK; The National Institute for Health Research Cambridge Biomedical Research Centre, Cambridge, UK; Department of Pediatrics, Sapienza University, Rome, Italy; Metabolic Disease Group, Wellcome Trust Sanger Institute, Cambridge, UK; Wolfson Brain Imaging Centre, University of Cambridge, Cambridge, UK; National Institute for Health Research/Wellcome Trust Clinical Research Facility, Cambridge, UK.

    We describe a female proband with primordial dwarfism, skeletal dysplasia, facial dysmorphism, extreme dyslipidaemic insulin resistance and fatty liver associated with a novel homozygous frameshift mutation in POC1A, predicted to affect two of the three protein products of the gene. POC1A encodes a protein associated with centrioles throughout the cell cycle and implicated in both mitotic spindle and primary ciliary function. Three homozygous mutations affecting all isoforms of POC1A have recently been implicated in a similar syndrome of primordial dwarfism, although no detailed metabolic phenotypes were described. Primary cells from the proband we describe exhibited increased centrosome amplification and multipolar spindle formation during mitosis, but showed normal DNA content, arguing against mitotic skipping, cleavage failure or cell fusion. Despite evidence of increased DNA damage in cells with supernumerary centrosomes, no aneuploidy was detected. Extensive centrosome clustering both at mitotic spindles and in primary cilia mitigated the consequences of centrosome amplification, and primary ciliary formation was normal. Although further metabolic studies of patients with POC1A mutations are warranted, we suggest that POC1A may be added to ALMS1 and PCNT as examples of centrosomal or pericentriolar proteins whose dysfunction leads to extreme dyslipidaemic insulin resistance. Further investigation of links between these molecular defects and adipose tissue dysfunction is likely to yield insights into mechanisms of adipose tissue maintenance and regeneration that are critical to metabolic health.

    Funded by: Medical Research Council: G0701532, MC_UU_12012/5; NHLBI NIH HHS: RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010, UC2 HL102923, UC2 HL102924, UC2 HL102925, UC2 HL102926, UC2 HL103010; Wellcome Trust: 091551, 095515, 096599, 098498, 100574, WT091310, WT098051

    Journal of molecular endocrinology 2015;55;2;147-58

  • Plasmodium Infection Is Associated with Impaired Hepatic Dimethylarginine Dimethylaminohydrolase Activity and Disruption of Nitric Oxide Synthase Inhibitor/Substrate Homeostasis.

    Chertow JH, Alkaitis MS, Nardone G, Ikeda AK, Cunnington AJ, Okebe J, Ebonyi AO, Njie M, Correa S, Jayasooriya S, Casals-Pascual C, Billker O, Conway DJ, Walther M and Ackerman H

    Laboratory of Malaria and Vector Research, Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Maryland, United States of America.

    Inhibition of nitric oxide (NO) signaling may contribute to pathological activation of the vascular endothelium during severe malaria infection. Dimethylarginine dimethylaminohydrolase (DDAH) regulates endothelial NO synthesis by maintaining homeostasis between asymmetric dimethylarginine (ADMA), an endogenous NO synthase (NOS) inhibitor, and arginine, the NOS substrate. We carried out a community-based case-control study of Gambian children to determine whether ADMA and arginine homeostasis is disrupted during severe or uncomplicated malaria infections. Circulating plasma levels of ADMA and arginine were determined at initial presentation and 28 days later. Plasma ADMA/arginine ratios were elevated in children with acute severe malaria compared to 28-day follow-up values and compared to children with uncomplicated malaria or healthy children (p<0.0001 for each comparison). To test the hypothesis that DDAH1 is inactivated during Plasmodium infection, we examined DDAH1 in a mouse model of severe malaria. Plasmodium berghei ANKA infection inactivated hepatic DDAH1 via a post-transcriptional mechanism as evidenced by stable mRNA transcript number, decreased DDAH1 protein concentration, decreased enzyme activity, elevated tissue ADMA, elevated ADMA/arginine ratio in plasma, and decreased whole blood nitrite concentration. Loss of hepatic DDAH1 activity and disruption of ADMA/arginine homeostasis may contribute to severe malaria pathogenesis by inhibiting NO synthesis.

    Funded by: Medical Research Council: G0701427; Wellcome Trust: WT098051

    PLoS pathogens 2015;11;9;e1005119

  • SpeedSeq: ultra-fast personal genome analysis and interpretation.

    Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, Garrison EP, Marth GT, Quinlan AR and Hall IM

    McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA.

    SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.

    Funded by: NCI NIH HHS: T32 CA009109; NHGRI NIH HHS: R01 HG006693, R01HG006693, U54 HG003079; NIGMS NIH HHS: T32 GM007267; NIH HHS: DP2 OD006493, DP2OD006493-01

    Nature methods 2015;12;10;966-8

  • The complexity, challenges and benefits of comparing two transporter classification systems in TCDB and Pfam.

    Chiang Z, Vastermark A, Punta M, Coggill PC, Mistry J, Finn RD and Saier MH

    Transport systems comprise roughly 10% of all proteins in a cell, playing critical roles in many processes. Improving and expanding their classification is an important goal that can affect studies ranging from comparative genomics to potential drug target searches. It is not surprising that different classification systems for transport proteins have arisen, be it within a specialized database, focused on this functional class of proteins, or as part of a broader classification system for all proteins. Two such databases are the Transporter Classification Database (TCDB) and the Protein family (Pfam) database. As part of a long-term endeavor to improve consistency between the two classification systems, we have compared transporter annotations in the two databases to understand the rationale for differences and to improve both systems. Differences sometimes reflect the fact that one database has a particular transporter family while the other does not. Differing family definitions and hierarchical organizations were reconciled, resulting in recognition of 69 Pfam 'Domains of Unknown Function', which proved to be transport protein families to be renamed using TCDB annotations. Of over 400 potential new Pfam families identified from TCDB, 10% have already been added to Pfam, and TCDB has created 60 new entries based on Pfam data. This work, for the first time, reveals the benefits of comprehensive database comparisons and explains the differences between Pfam and TCDB.

    Funded by: NIGMS NIH HHS: GM077402; Wellcome Trust: WT077044/Z/05/Z

    Briefings in bioinformatics 2015;16;5;865-72

  • Discovery of a novel neuroprotectant, BHDPC, that protects against MPP+/MPTP-induced neuronal death in multiple experimental models.

    Chong CM, Ma D, Zhao C, Franklin RJ, Zhou ZY, Ai N, Li C, Yu H, Hou T, Sa F and Lee SM

    State Key Laboratory of Quality Research in Chinese Medicine and Institute of Chinese Medical Sciences, University of Macau, Macao, China.

    Progressive degeneration and death of neurons are the main causes of neurodegenerative disorders such as Parkinson's disease and Alzheimer's disease. Although some current medicines may temporarily improve their symptoms, no treatments can slow or halt the progression of neuronal death. In this study, a pyrimidine derivative, benzyl 7-(4-hydroxy-3-methoxyphenyl)-5-methyl-4,7-dihydrotetrazolo[1,5-a]pyrimidine-6-carboxylate (BHDPC), was found to dramatically attenuate the MPTP-induced death of dopaminergic neurons and improve behavioral movement deficits in zebrafish, supporting its potential neuroprotective activity in vivo. Further study in rat organotypic cerebellar cultures indicated that BHDPC was able to suppress MPP+-induced cell death of brain tissue slices ex vivo. BHDPC also protected human neuroblastoma SH-SY5Y cells against MPP+ toxicity by reversing abnormal changes in mitochondrial membrane potential and numerous apoptotic regulators. Western blotting analysis indicated that BHDPC was able to activate PKA/CREB survival signaling and further up-regulate Bcl2 expression. However, BHDPC failed to suppress MPP+-induced cytotoxicity and the increase of caspase 3 activity in the presence of the PKA inhibitor H89. Taken together, these results suggest that BHDPC is a potential neuroprotectant with prosurvival effects in multiple models of neurodegenerative disease in vitro, ex vivo, and in vivo.

    Free radical biology & medicine 2015;89;1057-66

  • Vibrio cholerae Serogroup O139: Isolation from Cholera Patients and Asymptomatic Household Family Members in Bangladesh between 2013 and 2014.

    Chowdhury F, Mather AE, Begum YA, Asaduzzaman M, Baby N, Sharmin S, Biswas R, Uddin MI, LaRocque RC, Harris JB, Calderwood SB, Ryan ET, Clemens JD, Thomson NR and Qadri F

    International Centre for Diarrhoeal Disease Research Bangladesh (icddr,b), Dhaka, Bangladesh.

    Background: Cholera is endemic in Bangladesh, with outbreaks reported annually. Currently, the majority of epidemic cholera reported globally is El Tor biotype Vibrio cholerae isolates of the serogroup O1. However, in Bangladesh, outbreaks attributed to V. cholerae serogroup O139 isolates, which fall within the same phylogenetic lineage as the O1 serogroup isolates, were seen between 1992 and 1993 and in 2002 to 2005. Since then, V. cholerae serogroup O139 has only been sporadically isolated in Bangladesh and is now rarely isolated elsewhere.

    Methods: Here, we present case histories of four cholera patients infected with V. cholerae serogroup O139 in 2013 and 2014 in Bangladesh. We comprehensively typed these isolates using conventional approaches, as well as by whole genome sequencing. Phenotypic typing and PCR confirmed that all four isolates belonged to the O139 serogroup.

    Findings: Whole genome sequencing revealed that three of the isolates were phylogenetically closely related to previously sequenced El Tor biotype, pandemic 7, toxigenic V. cholerae O139 isolates originating from Bangladesh and elsewhere. The fourth isolate was a non-toxigenic V. cholerae that, by conventional approaches, typed as O139 serogroup but was genetically divergent from previously sequenced pandemic 7 V. cholerae lineages belonging to the O139 or O1 serogroups.

    Conclusion: These results suggest that previously observed lineages of V. cholerae O139 persist in Bangladesh and can cause clinical disease and that a novel disease-causing non-toxigenic O139 isolate also occurs.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M014088/1; NIAID NIH HHS: R01 AI106878, R01AI106878, U01 AI058935, U01 AI077883, U01AI058935; PHS HHS: U01A1077883; Wellcome Trust: 098051

    PLoS neglected tropical diseases 2015;9;11;e0004183

  • The Bangladesh Risk of Acute Vascular Events (BRAVE) Study: objectives and design.

    Chowdhury R, Alam DS, Fakir II, Adnan SD, Naheed A, Tasmin I, Monower MM, Hossain F, Hossain FM, Rahman MM, Afrin S, Roy AK, Akter M, Sume SA, Biswas AK, Pennells L, Surendran P, Young RD, Spackman SA, Hasan K, Harshfield E, Sheikh N, Houghton R, Saleheen D, Howson JM, Butterworth AS, Cardiology Research Group, Raqib R, Majumder AA, Danesh J and Di Angelantonio E

    Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.

    During recent decades, Bangladesh has experienced a rapid epidemiological transition from communicable to non-communicable diseases. Coronary heart disease (CHD), with myocardial infarction (MI) as its main manifestation, is a major cause of death in the country. However, there is limited reliable evidence about its determinants in this population. The Bangladesh Risk of Acute Vascular Events (BRAVE) study is an epidemiological bioresource established to examine environmental, genetic, lifestyle and biochemical determinants of CHD among the Bangladeshi population. By early 2015, the ongoing BRAVE study had recruited over 5000 confirmed first-ever MI cases, and over 5000 controls "frequency-matched" by age and sex. For each participant, information has been recorded on demographic factors, lifestyle, socioeconomic, clinical, and anthropometric characteristics. A 12-lead electrocardiogram has been recorded. Biological samples have been collected and stored, including extracted DNA, plasma, serum and whole blood. Additionally, for the 3000 cases and 3000 controls initially recruited, genotyping has been done using the CardioMetabochip+ and the Exome+ arrays. The mean age (standard deviation) of MI cases is 53 (10) years, with 88% of cases being male and 46% aged 50 years or younger. The median interval between reported onset of symptoms and hospital admission is 5 h. Initial analyses indicate that Bangladeshis are genetically distinct from major non-South Asian ethnicities, as well as distinct from other South Asian ethnicities. The BRAVE study is well-placed to serve as a powerful resource to investigate current and future hypotheses relating to environmental, biochemical and genetic causes of CHD in an important but under-studied South Asian population.

    Funded by: British Heart Foundation: RG/13/13/30194; Medical Research Council: MR/L003120/1

    European journal of epidemiology 2015;30;7;577-87

  • Runs of Homozygosity: Association with Coronary Artery Disease and Gene Expression in Monocytes and Macrophages.

    Christofidou P, Nelson CP, Nikpay M, Qu L, Li M, Loley C, Debiec R, Braund PS, Denniff M, Charchar FJ, Arjo AR, Trégouët DA, Goodall AH, Cambien F, Ouwehand WH, Roberts R, Schunkert H, Hengstenberg C, Reilly MP, Erdmann J, McPherson R, König IR, Thompson JR, Samani NJ and Tomaszewski M

    Department of Cardiovascular Sciences, University of Leicester, Leicester LE3 9QP, UK.

    Runs of homozygosity (ROHs) are a recognized signature of recessive inheritance. Contributions of ROHs to the genetic architecture of coronary artery disease and regulation of gene expression in cells relevant to atherosclerosis are not known. Our combined analysis of 24,320 individuals from 11 populations of white European ethnicity showed an association between coronary artery disease and both the count and the size of ROHs. Individuals with coronary artery disease had approximately 0.63 (95% CI: 0.4-0.8) excess of ROHs when compared to coronary-artery-disease-free control subjects (p = 1.49 × 10⁻⁹). The average total length of ROHs was approximately 1,046.92 (95% CI: 634.4-1,459.5) kb greater in individuals with coronary artery disease than control subjects (p = 6.61 × 10⁻⁷). None of the identified individual ROHs was associated with coronary artery disease after correction for multiple testing. However, in aggregate burden analysis, ROHs favoring increased risk of coronary artery disease were much more common than those showing the opposite direction of association with coronary artery disease (p = 2.69 × 10⁻³³). Individual ROHs showed significant associations with monocyte and macrophage expression of genes in their close proximity: subjects with several individual ROHs showed significant differences in the expression of 44 mRNAs in monocytes and 17 mRNAs in macrophages when compared to subjects without those ROHs. This study provides evidence for an excess of homozygosity in coronary artery disease in outbred populations and suggests the potential biological relevance of ROHs in cells of importance to the pathogenesis of atherosclerosis.

    Funded by: British Heart Foundation: PG/12/9/29376; Department of Health: RP-PG-0310-1002; Wellcome Trust: 084183/Z/07/Z

    American journal of human genetics 2015;97;2;228-37

  • A high-resolution genomic analysis of multidrug-resistant hospital outbreaks of Klebsiella pneumoniae.

    Chung The H, Karkey A, Pham Thanh D, Boinett CJ, Cain AK, Ellington M, Baker KS, Dongol S, Thompson C, Harris SR, Jombart T, Le Thi Phuong T, Tran Do Hoang N, Ha Thanh T, Shretha S, Joshi S, Basnyat B, Thwaites G, Thomson NR, Rabaa MA and Baker S

    The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

    Multidrug-resistant (MDR) Klebsiella pneumoniae has become a leading cause of nosocomial infections worldwide. Despite its prominence, little is known about the genetic diversity of K. pneumoniae in resource-poor hospital settings. Through whole-genome sequencing (WGS), we reconstructed an outbreak of MDR K. pneumoniae occurring on high-dependency wards in a hospital in Kathmandu during 2012 with a case-fatality rate of 75%. The WGS analysis permitted the identification of two MDR K. pneumoniae lineages causing distinct outbreaks within the complex endemic K. pneumoniae. Using phylogenetic reconstruction and lineage-specific PCR, our data predicted a scenario in which K. pneumoniae, circulating for 6 months before the outbreak, underwent a series of ward-specific clonal expansions after the acquisition of genes facilitating virulence and MDR. We suggest that the early detection of a specific NDM-1 containing lineage in 2011 would have alerted the high-dependency ward staff to intervene. We argue that some form of real-time genetic characterisation, alongside clade-specific PCR during an outbreak, should be factored into future healthcare infection control practices in both high- and low-income settings.

    Funded by: Medical Research Council: G1100100, MR/K010174/1; Wellcome Trust: 089276, 095831, 100087

    EMBO molecular medicine 2015;7;3;227-39

  • Extending reference assembly models.

    Church DM, Schneider VA, Steinberg KM, Schatz MC, Quinlan AR, Chin CS, Kitts PA, Aken B, Marth GT, Hoffman MM, Herrero J, Mendoza ML, Durbin R and Flicek P

    The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.

    Funded by: Intramural NIH HHS; NHGRI NIH HHS: U41 HG007234, U41 HG007635, U41HG007234; Wellcome Trust: 095908

    Genome biology 2015;16;13

  • Inherited determinants of Crohn's disease and ulcerative colitis phenotypes: a genetic association study.

    Cleynen I, Boucher G, Jostins L, Schumm LP, Zeissig S, Ahmad T, Andersen V, Andrews JM, Annese V, Brand S, Brant SR, Cho JH, Daly MJ, Dubinsky M, Duerr RH, Ferguson LR, Franke A, Gearry RB, Goyette P, Hakonarson H, Halfvarson J, Hov JR, Huang H, Kennedy NA, Kupcinskas L, Lawrance IC, Lee JC, Satsangi J, Schreiber S, Théâtre E, van der Meulen-de Jong AE, Weersma RK, Wilson DC, International Inflammatory Bowel Disease Genetics Consortium, Parkes M, Vermeire S, Rioux JD, Mansfield J, Silverberg MS, Radford-Smith G, McGovern DP, Barrett JC and Lees CW

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Department of Clinical and Experimental Medicine, TARGID, KU Leuven, Leuven, Belgium.

    Background: Crohn's disease and ulcerative colitis are the two major forms of inflammatory bowel disease; treatment strategies have historically been determined by this binary categorisation. Genetic studies have identified 163 susceptibility loci for inflammatory bowel disease, mostly shared between Crohn's disease and ulcerative colitis. We undertook the largest genotype association study, to date, in widely used clinical subphenotypes of inflammatory bowel disease with the goal of further understanding the biological relations between diseases.

    Methods: This study included patients from 49 centres in 16 countries in Europe, North America, and Australasia. We applied the Montreal classification system of inflammatory bowel disease subphenotypes to 34,819 patients (19,713 with Crohn's disease, 14,683 with ulcerative colitis) genotyped on the Immunochip array. We tested for genotype-phenotype associations across 156,154 genetic variants. We generated genetic risk scores by combining information from all known inflammatory bowel disease associations to summarise the total load of genetic risk for a particular phenotype. We used these risk scores to test the hypothesis that colonic Crohn's disease, ileal Crohn's disease, and ulcerative colitis are all genetically distinct from each other, and to attempt to identify patients with a mismatch between clinical diagnosis and genetic risk profile.

    Findings: After quality control, the primary analysis included 29,838 patients (16,902 with Crohn's disease, 12,597 with ulcerative colitis). Three loci (NOD2, MHC, and MST1 3p21) were associated with subphenotypes of inflammatory bowel disease, mainly disease location (essentially fixed over time; median follow-up of 10·5 years). Little or no genetic association with disease behaviour (which changed dramatically over time) remained after conditioning on disease location and age at onset. The genetic risk score representing all known risk alleles for inflammatory bowel disease showed strong association with disease subphenotype (p=1·65 × 10⁻⁷⁸), even after exclusion of NOD2, MHC, and 3p21 (p=9·23 × 10⁻¹⁸). Predictive models based on the genetic risk score strongly distinguished colonic from ileal Crohn's disease. Our genetic risk score could also identify a small number of patients with discrepant genetic risk profiles who were significantly more likely to have a revised diagnosis after follow-up (p=6·8 × 10⁻⁴).

    Interpretation: Our data support a continuum of disorders within inflammatory bowel disease, much better explained by three groups (ileal Crohn's disease, colonic Crohn's disease, and ulcerative colitis) than by Crohn's disease and ulcerative colitis as currently defined. Disease location is an intrinsic aspect of a patient's disease, in part genetically determined, and the major driver to changes in disease behaviour over time.

    Funding: International Inflammatory Bowel Disease Genetics Consortium members funding sources (see Acknowledgments for full list).

    Funded by: AHRQ HHS: HS021747, R01 HS021747; Chief Scientist Office: ETM/137, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCI NIH HHS: P30 CA016359, R01 CA141743; NIAID NIH HHS: AI067068, U01 AI067068; NIDCR NIH HHS: U54 DE023789, U54DE023789-01; NIDDK NIH HHS: DK062413, DK062420, DK062422, DK062423, DK062429, DK062429-S1, DK062431, DK062432, DK076984, DK084554, P01 DK046763, P01DK046763, P30 DK043351, P30 DK089502, R03 DK076984, R21 DK084554, U01 DK062413, U01 DK062418, U01 DK062420, U01 DK062422, U01 DK062423, U01 DK062429, U01 DK062431, U01 DK062432; Wellcome Trust: 083948/Z/07/Z, 085475/B/08/Z, 085475/Z/08/Z, 098051, 098759

    Lancet (London, England) 2015;387;10014;156-67

  • Over-expression of Plk4 induces centrosome amplification, loss of primary cilia and associated tissue hyperplasia in the mouse.

    Coelho PA, Bury L, Shahbazi MN, Liakath-Ali K, Tate PH, Wormald S, Hindley CJ, Huch M, Archer J, Skarnes WC, Zernicka-Goetz M and Glover DM

    Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.

    To address the long-known relationship between supernumerary centrosomes and cancer, we have generated a transgenic mouse that permits inducible expression of the master regulator of centriole duplication, Polo-like-kinase-4 (Plk4). Over-expression of Plk4 from this transgene advances the onset of tumour formation that occurs in the absence of the tumour suppressor p53. Plk4 over-expression also leads to hyperproliferation of cells in the pancreas and skin that is enhanced in a p53 null background. Pancreatic islets become enlarged following Plk4 over-expression as a result of equal expansion of α- and β-cells, which exhibit centrosome amplification. Mice overexpressing Plk4 develop grey hair due to a loss of differentiated melanocytes and bald patches of skin associated with a thickening of the epidermis. This reflects an increase in proliferating cells expressing keratin 5 in the basal epidermal layer and the expansion of these cells into suprabasal layers. Such cells also express keratin 6, a marker for hyperplasia. This is paralleled by a decreased expression of later differentiation markers, involucrin, filaggrin and loricrin. Proliferating cells showed an increase in centrosome number and a loss of primary cilia, events that were mirrored in primary cultures of keratinocytes established from these animals. We discuss how repeated duplication of centrioles appears to prevent the formation of basal bodies leading to loss of primary cilia, disruption of signalling and thereby aberrant differentiation of cells within the epidermis. The absence of p53 permits cells with increased centrosomes to continue dividing, thus setting up a neoplastic state of error prone mitoses, a prerequisite for cancer development.

    Funded by: Cancer Research UK; Wellcome Trust: 104151

    Open biology 2015;5;12;150209

  • Genome-wide meta-analysis identifies six novel loci associated with habitual coffee consumption.

    Coffee and Caffeine Genetics Consortium, Cornelis MC, Byrne EM, Esko T, Nalls MA, Ganna A, Paynter N, Monda KL, Amin N, Fischer K, Renstrom F, Ngwa JS, Huikari V, Cavadino A, Nolte IM, Teumer A, Yu K, Marques-Vidal P, Rawal R, Manichaikul A, Wojczynski MK, Vink JM, Zhao JH, Burlutsky G, Lahti J, Mikkilä V, Lemaitre RN, Eriksson J, Musani SK, Tanaka T, Geller F, Luan J, Hui J, Mägi R, Dimitriou M, Garcia ME, Ho WK, Wright MJ, Rose LM, Magnusson PK, Pedersen NL, Couper D, Oostra BA, Hofman A, Ikram MA, Tiemeier HW, Uitterlinden AG, van Rooij FJ, Barroso I, Johansson I, Xue L, Kaakinen M, Milani L, Power C, Snieder H, Stolk RP, Baumeister SE, Biffar R, Gu F, Bastardot F, Kutalik Z, Jacobs DR, Forouhi NG, Mihailov E, Lind L, Lindgren C, Michaëlsson K, Morris A, Jensen M, Khaw KT, Luben RN, Wang JJ, Männistö S, Perälä MM, Kähönen M, Lehtimäki T, Viikari J, Mozaffarian D, Mukamal K, Psaty BM, Döring A, Heath AC, Montgomery GW, Dahmen N, Carithers T, Tucker KL, Ferrucci L, Boyd HA, Melbye M, Treur JL, Mellström D, Hottenga JJ, Prokopenko I, Tönjes A, Deloukas P, Kanoni S, Lorentzon M, Houston DK, Liu Y, Danesh J, Rasheed A, Mason MA, Zonderman AB, Franke L, Kristal BS, International Parkinson's Disease Genomics Consortium (IPDGC), North American Brain Expression Consortium (NABEC), UK Brain Expression Consortium (UKBEC), Karjalainen J, Reed DR, Westra HJ, Evans MK, Saleheen D, Harris TB, Dedoussis G, Curhan G, Stumvoll M, Beilby J, Pasquale LR, Feenstra B, Bandinelli S, Ordovas JM, Chan AT, Peters U, Ohlsson C, Gieger C, Martin NG, Waldenberger M, Siscovick DS, Raitakari O, Eriksson JG, Mitchell P, Hunter DJ, Kraft P, Rimm EB, Boomsma DI, Borecki IB, Loos RJ, Wareham NJ, Vollenweider P, Caporaso N, Grabe HJ, Neuhouser ML, Wolffenbuttel BH, Hu FB, Hyppönen E, Järvelin MR, Cupples LA, Franks PW, Ridker PM, van Duijn CM, Heiss G, Metspalu A, North KE, Ingelsson E, Nettleton JA, van Dam RM and Chasman DI

    [1] Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA [2] Department of Nutrition, Harvard School of Public Health, Boston, MA, USA.

    Coffee, a major dietary source of caffeine, is among the most widely consumed beverages in the world and has received considerable attention regarding health risks and benefits. We conducted a genome-wide (GW) meta-analysis of predominantly regular-type coffee consumption (cups per day) among up to 91,462 coffee consumers of European ancestry, with top single-nucleotide polymorphisms (SNPs) followed up in ~30,062 and 7,964 coffee consumers of European and African-American ancestry, respectively. Studies from both stages were combined in a trans-ethnic meta-analysis. Confirmed loci were examined for putative functional and biological relevance. Eight loci, including six novel loci, met GW significance (log10 Bayes factor (BF) > 5.64) with per-allele effect sizes of 0.03-0.14 cups per day. Six are located in or near genes potentially involved in pharmacokinetics (ABCG2, AHR, POR and CYP1A2) and pharmacodynamics (BDNF and SLC6A4) of caffeine. Two map to GCKR and MLXIPL genes related to metabolic traits but lacking known roles in coffee consumption. Enhancer and promoter histone marks populate the regions of many confirmed loci and several potential regulatory SNPs are highly correlated with the lead SNP of each. SNP alleles near GCKR, MLXIPL, BDNF and CYP1A2 that were associated with higher coffee consumption have previously been associated with smoking initiation, higher adiposity and fasting insulin and glucose but lower blood pressure and favorable lipid, inflammatory and liver enzyme profiles (P<5 × 10⁻⁸). Our genetic findings among European and African-American adults reinforce the role of caffeine in mediating habitual coffee consumption and may point to molecular mechanisms underlying inter-individual variability in pharmacological and health effects of coffee.

    Funded by: British Heart Foundation: RG/08/014/24067; Cancer Research UK: 14136; Medical Research Council: G1000143, MC_U106179471, MC_UP_A100_1003, MC_UU_12015/1, MC_UU_12015/5, MR/L003120/1; NEI NIH HHS: R01 EY015473; NHGRI NIH HHS: U01 HG004728; NIAAA NIH HHS: P60 AA011998; Wellcome Trust: 090532

    Molecular psychiatry 2015;20;5;647-56

  • PhenoMiner: from text to a database of phenotypes associated with OMIM diseases.

    Collier N, Groza T, Smedley D, Robinson PN, Oellrich A and Rebholz-Schuhmann D

    The University of Cambridge, Cambridge, CB3 9DB, UK; European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Scientific and clinical phenotypes reported in the experimental literature have been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggle with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13,636 phenotype candidates. We identified 28,155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-gene pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) are available as supplementary data and can be downloaded at under a Creative Commons Attribution 4.0 license. Database URL:

    Database : the journal of biological databases and curation 2015;2015

  • Concept selection for phenotypes and diseases using learn to rank.

    Collier N, Oellrich A and Groza T

    University of Cambridge, Cambridge, UK; European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

    Background: Phenotypes form the basis for determining the existence of a disease against the given evidence. Much of this evidence though remains locked away in text - scientific articles, clinical trial reports and electronic patient records (EPR) - where authors use the full expressivity of human language to report their observations.

    Results: In this paper we exploit a combination of off-the-shelf tools for extracting a machine-understandable representation of phenotypes and other related concepts that concern the diagnosis and treatment of diseases. These are tested against a gold standard EPR collection that has been annotated with Unified Medical Language System (UMLS) concept identifiers: the ShARE/CLEF 2013 corpus for disorder detection. We evaluate four pipelines as stand-alone systems and then attempt to optimise semantic-type based performance using several learn-to-rank (LTR) approaches - three pairwise and one listwise. We observed that whilst overall Apache cTAKES tended to outperform other stand-alone systems with strong recall (R = 0.57), precision was low (P = 0.09), leading to a low-to-moderate F1 measure (F1 = 0.16). Moreover, there is substantial variation in system performance across semantic types for disorders. For example, the concept Findings (T033) seemed to be very challenging for all systems. Combining systems within LTR improved F1 substantially (F1 = 0.24), particularly for Disease or syndrome (T047) and Anatomical abnormality (T190). Whilst recall is improved markedly, precision remains a challenge (P = 0.15, R = 0.59).

    Journal of biomedical semantics 2015;6;24

  • High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping.

    Collins JE, Wali N, Sealy IM, Morris JA, White RJ, Leonard SR, Jackson DK, Jones MC, Smerdon NC, Zamora J, Dooley CM, Carruthers SN, Barrett JC, Stemple DL and Busch-Nentwich EM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: We present a genome-wide messenger RNA (mRNA) sequencing technique that converts small amounts of RNA from many samples into molecular phenotypes. It encompasses all steps from sample preparation to sequence analysis and is applicable to baseline profiling or perturbation measurements.

    Results: Multiplex sequencing of transcript 3' ends identifies differential transcript abundance independent of gene annotation. We show that increasing biological replicate number while maintaining the total amount of sequencing identifies more differentially abundant transcripts.

    Conclusions: This method can be implemented on polyadenylated RNA from any organism with an annotated reference genome and in any laboratory with access to Illumina sequencing.

    Funded by: Wellcome Trust: WT098051

    BMC genomics 2015;16;578

  • Entomological Monitoring and Evaluation: Diverse Transmission Settings of ICEMR Projects Will Require Local and Regional Malaria Elimination Strategies.

    Conn JE, Norris DE, Donnelly MJ, Beebe NW, Burkot TR, Coulibaly MB, Chery L, Eapen A, Keven JB, Kilama M, Kumar A, Lindsay SW, Moreno M, Quinones M, Reimer LJ, Russell TL, Smith DL, Thomas MB, Walker ED, Wilson ML and Yan G

    The Wadsworth Center, New York State Department of Health, Albany, New York; Department of Biomedical Sciences, School of Public Health, State University of New York, Albany, New York; The W. Harry Feinstone Department of Molecular Microbiology and Immunology, The Johns Hopkins Malaria Research Institute, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland; Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom; Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom; The University of Queensland, Brisbane, Australia; Commonwealth Scientific and Industrial Research Organisation (CSIRO), Brisbane, Australia; James Cook University, Cairns, Australia; Malaria Research and Training Centre, Faculty of Medicine Pharmacy and Dentistry, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali; Department of Chemistry, University of Washington, Seattle, Washington; National Institute of Malaria Research, National Institute of Epidemiology Campus Chennai, Tamil Nadu, India; Papua New Guinea Institute of Medical Research, Madang, Papua New Guinea; Infectious Diseases Research Collaboration, Kampala, Uganda; National Institute of Malaria Research, Field Unit Goa, Goa, India; School of Biological and Biomedical Sciences, Durham University, Durham, United Kingdom; Division of Infectious Diseases, University of California, San Diego School of Medicine, La Jolla, California; George Palade Labs, University of California, San Diego School of Medicine, La Jolla, California; Public Health Department, Faculty of Medicine, National University of Colombia, Bogotá, Colombia; Papua New Guinea Institute of Medical Research, Goroka and Madang, Papua New Guinea; Pacific Malaria Initiative Support Centre, School of Population Health, University of Queensland, Herston, Australia; Australian Centre for Tropical and International Health, University of Queensland, Herston, Australia.

    The unprecedented global efforts for malaria elimination in the past decade have resulted in altered vectorial systems, vector behaviors, and bionomics. These changes combined with increasingly evident heterogeneities in malaria transmission require innovative vector control strategies in addition to the established practices of long-lasting insecticidal nets and indoor residual spraying. Integrated vector management will require focal and tailored vector control to achieve malaria elimination. This switch of emphasis from universal coverage to universal coverage plus additional interventions will be reliant on improved entomological monitoring and evaluation. In 2010, the National Institute of Allergy and Infectious Diseases (NIAID) established a network of malaria research centers termed ICEMRs (International Centers of Excellence for Malaria Research) expressly to develop this evidence base in diverse malaria endemic settings. In this article, we contrast the differing ecology and transmission settings across the ICEMR study locations. In South America, Africa, and Asia, vector biologists are already dealing with many of the issues of pushing to elimination such as highly focal transmission, proportionate increase in the importance of outdoor and crepuscular biting, vector species complexity, and "sub patent" vector transmission.

    Funded by: NIAID NIH HHS: U19 AI089674, U19 AI089702

    The American journal of tropical medicine and hygiene 2015;93;3 Suppl;28-41

  • Species-wide whole genome sequencing reveals historical global spread and recent local persistence in Shigella flexneri.

    Connor TR, Barker CR, Baker KS, Weill FX, Talukder KA, Smith AM, Baker S, Gouali M, Pham Thanh D, Jahan Azmi I, Dias da Silveira W, Semmler T, Wieler LH, Jenkins C, Cravioto A, Faruque SM, Parkhill J, Wook Kim D, Keddy KH and Thomson NR

    Cardiff School of Biosciences, Cardiff, United Kingdom.

    Shigella flexneri is the most common cause of bacterial dysentery in low-income countries. Despite this, S. flexneri remains largely unexplored from a genomic standpoint and is still described using a vocabulary based on serotyping reactions developed over half-a-century ago. Here we combine whole genome sequencing with geographical and temporal data to examine the natural history of the species. Our analysis subdivides S. flexneri into seven phylogenetic groups (PGs), each containing two-or-more serotypes and characterised by distinct virulence gene complement and geographic range. Within the S. flexneri PGs we identify geographically restricted sub-lineages that appear to have persistently colonised regions for many decades to over 100 years. Although we found abundant evidence of antimicrobial resistance (AMR) determinant acquisition, our dataset shows no evidence of subsequent intercontinental spread of antimicrobial resistant strains. The pattern of colonisation and AMR gene acquisition suggests that S. flexneri has a distinct life-cycle involving local persistence.

    Funded by: Medical Research Council: MR/L015080/1; Wellcome Trust: 089276, 098051, 100087

    eLife 2015;4;e07335

  • Aberrant splicing of genes involved in haemoglobin synthesis and impaired terminal erythroid maturation in SF3B1 mutated refractory anaemia with ring sideroblasts.

    Conte S, Katayama S, Vesterlund L, Karimi M, Dimitriou M, Jansson M, Mortera-Blanco T, Unneberg P, Papaemmanuil E, Sander B, Skoog T, Campbell P, Walfridsson J, Kere J and Hellström-Lindberg E

    Karolinska Institutet, Department of Medicine (Huddinge), Centre for Hematology and Regenerative Medicine, Stockholm, Sweden.

    Refractory anaemia with ring sideroblasts (RARS) is distinguished by hyperplastic inefficient erythropoiesis, aberrant mitochondrial ferritin accumulation and anaemia. Heterozygous mutations in the spliceosome gene SF3B1 are found in a majority of RARS cases. To explore the link between SF3B1 mutations and anaemia, we studied mutated RARS CD34(+) marrow cells with regard to transcriptome sequencing, splice patterns and mutational allele burden during erythroid differentiation. Transcriptome profiling during early erythroid differentiation revealed a marked up-regulation of genes involved in haemoglobin synthesis and in the oxidative phosphorylation process, and down-regulation of mitochondrial ABC transporters compared to normal bone marrow. Moreover, mis-splicing of genes involved in transcription regulation, particularly haemoglobin synthesis, was confirmed, indicating a compromised haemoglobinization during RARS erythropoiesis. In order to define the phase during which erythroid maturation of SF3B1 mutated cells is most affected, we assessed allele burden during erythroid differentiation in vitro and in vivo and found that SF3B1 mutated erythroblasts showed stable expansion until late erythroblast stage but that terminal maturation to reticulocytes was significantly reduced. In conclusion, SF3B1 mutated RARS progenitors display impaired splicing with potential downstream consequences for genes of key importance for haemoglobin synthesis and terminal erythroid differentiation.

    British journal of haematology 2015;171;4;478-90

  • Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue.

    Cooper CS, Eeles R, Wedge DC, Van Loo P, Gundem G, Alexandrov LB, Kremeyer B, Butler A, Lynch AG, Camacho N, Massie CE, Kay J, Luxton HJ, Edwards S, Kote-Jarai Z, Dennis N, Merson S, Leongamornlert D, Zamora J, Corbishley C, Thomas S, Nik-Zainal S, O'Meara S, Matthews L, Clark J, Hurst R, Mithen R, Bristow RG, Boutros PC, Fraser M, Cooke S, Raine K, Jones D, Menzies A, Stebbings L, Hinton J, Teague J, McLaren S, Mudie L, Hardy C, Anderson E, Joseph O, Goody V, Robinson B, Maddison M, Gamble S, Greenman C, Berney D, Hazell S, Livni N, ICGC Prostate Group, Fisher C, Ogden C, Kumar P, Thompson A, Woodhouse C, Nicol D, Mayer E, Dudderidge T, Shah NC, Gnanapragasam V, Voet T, Campbell P, Futreal A, Easton D, Warren AY, Foster CS, Stratton MR, Whitaker HC, McDermott U, Brewer DS and Neal DE

    Division of Genetics and Epidemiology, The Institute Of Cancer Research, London, UK.

    Genome-wide DNA sequencing was used to decrypt the phylogeny of multiple samples from distinct areas of cancer and morphologically normal tissue taken from the prostates of three men. Mutations were present at high levels in morphologically normal tissue distant from the cancer, reflecting clonal expansions, and the underlying mutational processes at work in morphologically normal tissue were also at work in cancer. Our observations demonstrate the existence of ongoing abnormal mutational processes, consistent with field effects, underlying carcinogenesis. This mechanism gives rise to extensive branching evolution and cancer clone mixing, as exemplified by the coexistence of multiple cancer lineages harboring distinct ERG fusions within a single cancer nodule. Subsets of mutations were shared either by morphologically normal and malignant tissues or between different ERG lineages, indicating earlier or separate clonal cell expansions. Our observations inform on the origin of multifocal disease and have implications for prostate cancer therapy in individual cases.

    Funded by: Cancer Research UK: 14835, A12758, C5047/A14835; Wellcome Trust

    Nature genetics 2015;47;4;367-372

  • Detection and correction of artefacts in estimation of rare copy number variants and analysis of rare deletions in type 1 diabetes.

    Cooper NJ, Shtir CJ, Smyth DJ, Guo H, Swafford AD, Zanda M, Hurles ME, Walker NM, Plagnol V, Cooper JD, Howson JM, Burren OS, Onengut-Gumuscu S, Rich SS and Todd JA

    JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Cambridge Biomedical Campus, Cambridge CB2 0XY, UK.

    Copy number variants (CNVs) have been proposed as a possible source of 'missing heritability' in complex human diseases. Two studies of type 1 diabetes (T1D) found null associations with common copy number polymorphisms, but CNVs of low frequency and high penetrance could still play a role. We used the Log-R-ratio intensity data from a dense single nucleotide polymorphism (SNP) array, ImmunoChip, to detect rare CNV deletions (rDELs) and duplications (rDUPs) in 6808 T1D cases, 9954 controls and 2206 families with T1D-affected offspring. Initial analyses detected CNV associations. However, these were shown to be false-positive findings, failing replication with polymerase chain reaction. We developed a pipeline of quality control (QC) tests that were calibrated using systematic testing of sensitivity and specificity. The case-control odds ratios (OR) of CNV burden on T1D risk resulting from this QC pipeline converged on unity, suggesting no global frequency difference in rDELs or rDUPs. There was evidence that deletions could impact T1D risk for a small minority of cases, with enrichment for rDELs longer than 400 kb (OR = 1.57, P = 0.005). There were also 18 de novo rDELs detected in affected offspring but none for unaffected siblings (P = 0.03). No specific CNV regions showed robust evidence for association with T1D, although frequencies were lower than expected (most less than 0.1%), substantially reducing statistical power, which was examined in detail. We present an R-package, plumbCNV, which provides an automated approach for QC and detection of rare CNVs that can facilitate equivalent analyses of large-scale SNP array datasets.

    Funded by: NIDDK NIH HHS: U01 DK-062418; Wellcome Trust: 091157, 100140

    Human molecular genetics 2015;24;6;1774-90

  • A novel X-linked trichothiodystrophy associated with a nonsense mutation in RNF113A.

    Corbett MA, Dudding-Byth T, Crock PA, Botta E, Christie LM, Nardo T, Caligiuri G, Hobson L, Boyle J, Mansour A, Friend KL, Crawford J, Jackson G, Vandeleur L, Hackett A, Tarpey P, Stratton MR, Turner G, Gécz J and Field M

    Neurogenetics Research Program, School of Paediatrics and Reproductive Health, University of Adelaide, Adelaide, Australia.

    Background: Trichothiodystrophy (TTD) is a group of rare autosomal recessive disorders that variably affect a wide range of organs derived from the neuroectoderm. The key diagnostic feature is sparse, brittle, sulfur deficient hair that has a 'tiger-tail' banding pattern under polarising light microscopy.

    Patients and methods: We describe two male cousins affected by TTD associated with microcephaly, profound intellectual disability, sparse brittle hair, aged appearance, short stature, facial dysmorphism, seizures, an immunoglobulin deficiency, multiple endocrine abnormalities, cerebellar hypoplasia and partial absence of the corpus callosum, in the absence of cellular photosensitivity and ichthyosis. Obligate female carriers showed 100% skewed X-chromosome inactivation. Linkage analysis, Sanger sequencing of 737 X-chromosome exons and whole exome sequencing were used to find the responsible gene and mutation.

    Results: Linkage analysis localised the disease allele to a 7.75 Mb interval from Xq23-q25. We identified a nonsense mutation in the highly conserved RNF113A gene (c.901 C>T, p.Q301*). The mutation segregated with the disease in the family and was not observed in over 100,000 control X chromosomes. The mutation markedly reduced RNF113A protein expression in extracts from lymphoblastoid cell lines derived from the affected individuals.

    Conclusions: The association of RNF113A mutation with non-photosensitive TTD identifies a new locus for these disorders on the X chromosome. The extended phenotype within this family includes panhypopituitarism, cutis marmorata and congenital short oesophagus.

    Journal of medical genetics 2015;52;4;269-74

  • International genome-wide meta-analysis identifies new primary biliary cirrhosis risk loci and targetable pathogenic pathways.

    Cordell HJ, Han Y, Mells GF, Li Y, Hirschfield GM, Greene CS, Xie G, Juran BD, Zhu D, Qian DC, Floyd JA, Morley KI, Prati D, Lleo A, Cusi D, Canadian-US PBC Consortium, Italian PBC Genetics Study Group, UK-PBC Consortium, Gershwin ME, Anderson CA, Lazaridis KN, Invernizzi P, Seldin MF, Sandford RN, Amos CI and Siminovitch KA

    Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne NE1 3BZ, UK.

    Primary biliary cirrhosis (PBC) is a classical autoimmune liver disease for which effective immunomodulatory therapy is lacking. Here we perform meta-analyses of discovery data sets from genome-wide association studies of European subjects (n=2,764 cases and 10,475 controls) followed by validation genotyping in an independent cohort (n=3,716 cases and 4,261 controls). We discover and validate six previously unknown risk loci for PBC (Pcombined<5 × 10(-8)) and use pathway analysis to identify JAK-STAT/IL12/IL27 signalling and cytokine-cytokine pathways, for which relevant therapies exist.

    Funded by: Canadian Institutes of Health Research: MOP74621; Medical Research Council: G0800460, MR/L001489/1; NCATS NIH HHS: UL1 TR001108; NCI NIH HHS: P30 CA023108; NIDDK NIH HHS: R01 DK080670, R01 DK091823, R01DK091823, R01DK80670; NIGMS NIH HHS: GM103534, P20 GM103534; Wellcome Trust: 085475, 085925, 087436, 090355

    Nature communications 2015;6;8019

  • Clostridium sordellii genome analysis reveals plasmid localized toxin genes encoded within pathogenicity loci.

    Couchman EC, Browne HP, Dunn M, Lawley TD, Songer JG, Hall V, Petrovska L, Vidor C, Awad M, Lyras D and Fairweather NF

    Department of Life Sciences, Centre for Molecular Bacteriology and Infection, Imperial College London, London, SW7 2AZ, UK.

    Background: Clostridium sordellii can cause severe infections in animals and humans, the latter associated with trauma, toxic shock and often-fatal gynaecological infections. Strains can produce two large clostridial cytotoxins (LCCs), TcsL and TcsH, related to those produced by Clostridium difficile, Clostridium novyi and Clostridium perfringens, but the genetic basis of toxin production remains uncharacterised.

    Results: Phylogenetic analysis of the genome sequences of 44 strains isolated from human and animal infections in the UK, US and Australia placed the species into four clades. Although all strains originated from animal or clinical disease, only 5 strains contained LCC genes: 4 strains contain tcsL alone and one strain contains tcsL and tcsH. Four toxin-positive strains were found within one clade. Where present, tcsL and tcsH were localised in a pathogenicity locus, similar to but distinct from that present in C. difficile. In contrast to C. difficile, where the LCCs are chromosomally localised, the C. sordellii tcsL and tcsH genes are localised on plasmids. Our data suggest gain and loss of entire toxigenic plasmids in addition to horizontal transfer of the pathogenicity locus. A high quality, annotated sequence of ATCC9714 reveals many putative virulence factors including neuraminidase, phospholipase C and the cholesterol-dependent cytolysin sordellilysin that are highly conserved between all strains studied.

    Conclusions: Genome analysis of C. sordellii reveals that the LCCs, the major virulence factors, are localised on plasmids. Many strains do not contain the LCC genes; it is probable that in several of these cases the plasmid has been lost upon laboratory subculture. Our data are consistent with LCCs being the primary virulence factors in the majority of infections, but LCC-negative strains may precipitate certain categories of infection. A high quality genome sequence reveals putative virulence factors whose role in virulence can be investigated.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 086418, 098051

    BMC genomics 2015;16;392

  • Targeted resequencing of the pericentromere of chromosome 2 linked to constitutional delay of growth and puberty.

    Cousminer DL, Leinonen JT, Sarin AP, Chheda H, Surakka I, Wehkalampi K, Ellonen P, Ripatti S, Dunkel L, Palotie A and Widén E

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.

    Constitutional delay of growth and puberty (CDGP) is the most common cause of pubertal delay. CDGP is defined as the proportion of the normal population who experience pubertal onset at least 2 SD later than the population mean, representing 2.3% of all adolescents. While adolescents with CDGP spontaneously enter puberty, they are at risk for short stature, decreased bone mineral density, and psychosocial problems. Genetic factors contribute heavily to the timing of puberty, but the vast majority of CDGP cases remain biologically unexplained, and there is no definitive test to distinguish CDGP from pathological absence of puberty during adolescence. Recently, we published a study identifying significant linkage between a locus at the pericentromeric region of chromosome 2 (chr 2) and CDGP in Finnish families. To investigate this region for causal variation, we sequenced chr 2 between the genomic coordinates of 79-124 Mb (genome build GRCh37) in the proband and affected parent of the 13 families contributing most to this linkage signal. One gene, DNAH6, harbored 6 protein-altering low-frequency variants (< 6% in the Finnish population) in 10 of the CDGP probands. We sequenced an additional 135 unrelated Finnish CDGP subjects and utilized the unique Sequencing Initiative Suomi (SISu) population reference exome set to show that while 5 of these variants were present in the CDGP set, they were also present in the Finnish population at similar frequencies. Additional variants in the targeted region could not be prioritized for follow-up, possibly due to gaps in sequencing coverage or lack of functional knowledge of non-genic genomic regions. Thus, despite having a well-characterized sample collection from a genetically homogeneous population with a large population-based reference sequence dataset, we were unable to pinpoint variation in the linked region predisposing delayed puberty. This study highlights the difficulties of detecting genetic variants under linkage regions for complex traits and suggests that advancements in annotation of gene function and regulatory regions of the genome will be critical for solving the genetic background of complex phenotypes like CDGP.

    PloS one 2015;10;6;e0128524
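
    The 2.3% figure in the abstract above corresponds to the upper tail of a normal distribution beyond 2 SD. The sketch below checks that arithmetic under the assumption that pubertal timing is normally distributed, as the definition implies; the helper name `upper_tail` is illustrative and not from the paper.

```python
import math


def upper_tail(z: float) -> float:
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Share of a normal population with onset at least 2 SD later than the mean.
fraction = upper_tail(2.0)
print(f"{100 * fraction:.1f}%")
```

    The result is roughly 0.023, matching the 2.3% of adolescents stated in the definition.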

  • Pathway and network analysis of cancer genomes.

    Creixell P, Reimand J, Haider S, Wu G, Shibata T, Vazquez M, Mustonen V, Gonzalez-Perez A, Pearson J, Sander C, Raphael BJ, Marks DS, Ouellette BFF, Valencia A, Bader GD, Boutros PC, Stuart JM, Linding R, Lopez-Bigas N, Stein LD and Mutation Consequences and Pathway Analysis Working Group of the International Cancer Genome Consortium

    Cellular Signal Integration Group (C-SIG), Technical University of Denmark, Lyngby, Denmark.

    Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been large interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations.

    Funded by: Canadian Institutes of Health Research; NCI NIH HHS: P30 CA008748, R01 CA180778, R01-CA180778, U24 CA143858, U24-CA143858; NHGRI NIH HHS: P41 HG003751, R01 HG007069; NIGMS NIH HHS: P41 GM103504, R01 GM109031

    Nature methods 2015;12;7;615-621

  • A switch in time.

    Crellen T and Iantorno S

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2015;13;4;190

  • The post-vaccine microevolution of invasive Streptococcus pneumoniae.

    Cremers AJ, Mobegi FM, de Jonge MI, van Hijum SA, Meis JF, Hermans PW, Ferwerda G, Bentley SD and Zomer AL

    Radboud university medical center, Laboratory of Pediatric Infectious Diseases, Nijmegen, The Netherlands.

    The 7-valent pneumococcal conjugated vaccine (PCV7) has affected the genetic population of Streptococcus pneumoniae in pediatric carriage. Little is known however about pneumococcal population genomics in adult invasive pneumococcal disease (IPD) under vaccine pressure. We sequenced and serotyped 349 strains of S. pneumoniae isolated from IPD patients in Nijmegen between 2001 and 2011. Introduction of PCV7 in the Dutch National Immunization Program in 2006 preluded substantial alterations in the IPD population structure caused by serotype replacement. No evidence could be found for vaccine induced capsular switches. We observed that after a temporary bottleneck in gene diversity after the introduction of PCV7, the accessory gene pool re-expanded mainly by genes already circulating pre-PCV7. In the post-vaccine genomic population a number of genes changed frequency, certain genes became overrepresented in vaccine serotypes, while others shifted towards non-vaccine serotypes. Whether these dynamics in the invasive pneumococcal population have truly contributed to invasiveness and manifestations of disease remains to be further elucidated. We suggest the use of whole genome sequencing for surveillance of pneumococcal population dynamics that could give a prospect on the course of disease, facilitating effective prevention and management of IPD.

    Scientific reports 2015;5;14952

  • Population genomic datasets describing the post-vaccine evolutionary epidemiology of Streptococcus pneumoniae.

    Croucher NJ, Finkelstein JA, Pelton SI, Parkhill J, Bentley SD, Lipsitch M and Hanage WP

    Department of Infectious Disease Epidemiology, Imperial College London, St Mary's Campus, London W2 1PG, UK.

    Streptococcus pneumoniae is a common nasopharyngeal commensal bacterium and an important human pathogen. Vaccines against a subset of pneumococcal antigenic diversity have reduced rates of disease, without changing the frequency of asymptomatic carriage, by altering the bacterial population structure. These changes can be studied in detail by using genome sequencing to characterise systematically-sampled collections of carried S. pneumoniae. This dataset consists of 616 annotated draft genomes of isolates collected from children during routine visits to primary care physicians in Massachusetts between 2001, shortly after the seven valent polysaccharide conjugate vaccine was introduced, and 2007. Also made available are a core genome alignment and phylogeny describing the overall population structure, clusters of orthologous protein sequences, software for inferring serotype from Illumina reads, and whole genome alignments for the analysis of closely-related sets of pneumococci. These data can be used to study both bacterial evolution and the epidemiology of a pathogen population under selection from vaccine-induced immunity.

    Funded by: Medical Research Council: MR/K010174/1; NIAID NIH HHS: R01 AI066304, R01AI066304; Wellcome Trust: 098051, 104169/Z/14/Z

    Scientific data 2015;2;150058

  • Selective and genetic constraints on pneumococcal serotype switching.

    Croucher NJ, Kagedan L, Thompson CM, Parkhill J, Bentley SD, Finkelstein JA, Lipsitch M and Hanage WP

    Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom; Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America.

    Streptococcus pneumoniae isolates typically express one of over 90 immunologically distinguishable polysaccharide capsules (serotypes), which can be classified into "serogroups" based on cross-reactivity with certain antibodies. Pneumococci can alter their serotype through recombinations affecting the capsule polysaccharide synthesis (cps) locus. Twenty such "serotype switching" events were fully characterised using a collection of 616 whole genome sequences from systematic surveys of pneumococcal carriage. Eleven of these were within-serogroup switches, representing a highly significant (p < 0.0001) enrichment based on the observed serotype distribution. Whereas the recombinations resulting in between-serogroup switches all spanned the entire cps locus, some of those that caused within-serogroup switches did not. However, higher rates of within-serogroup switching could not be fully explained either by more frequent, shorter recombinations or by genetic linkage to genes involved in β-lactam resistance. This suggested the observed pattern was a consequence of selection for preserving serogroup. Phenotyping of strains constructed to express different serotypes in common genetic backgrounds was used to test whether genotypes were physiologically adapted to particular serogroups. These data were consistent with epistatic interactions between the cps locus and the rest of the genome that were specific to serotype, but not serogroup, meaning they were unlikely to account for the observed distribution of capsule types. Exclusion of these genetic and physiological hypotheses suggested future work should focus on alternative mechanisms, such as host immunity spanning multiple serotypes within the same serogroup, which might explain the observed pattern.

    Funded by: Medical Research Council: MR/K010174/1; NIAID NIH HHS: R01 AI066304, R01AI066304; Wellcome Trust: 098051, 104169/Z/14/Z

    PLoS genetics 2015;11;3;e1005095

  • Improving the Sequence Ontology terminology for genomic variant annotation.

    Cunningham F, Moore B, Ruiz-Schultz N, Ritchie GR and Eilbeck K

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK.

    Background: The Genome Variant Format (GVF) uses the Sequence Ontology (SO) to enable detailed annotation of sequence variation. The annotation includes SO terms for the type of sequence alteration, the genomic features that are changed and the effect of the alteration. The SO maintains and updates the specification and provides the underlying ontological structure.

    Methods: A requirements analysis was undertaken to gather terms missing in the SO release at the time, but needed to adequately describe the effects of sequence alteration on a set of variant genomic annotations. We have extended and remodeled the SO to include and define all terms that describe the effect of variation upon reference genomic features in the Ensembl variation databases.

    Results: The new terminology was used to annotate the human reference genome with a set of variants from both COSMIC and dbSNP. A GVF file containing 170,853 sequence alterations was generated using the SO terminology to annotate the kinds of alteration, the effect of the alteration and the reference feature changed. There are four kinds of alteration and 24 kinds of effect seen in this dataset. (Ensembl Variation annotates 34 different SO consequence terms:

    Conclusions: We explain the updates to the Sequence Ontology to describe the effect of variation on existing reference features. We have provided a set of annotations using this terminology, and the well defined GVF specification. We have also provided a provisional exploration of this large annotation dataset.

    Funded by: NHGRI NIH HHS: R01 HG004341, U41 HG007234, U41 HG007823; NICHD NIH HHS: R01 HD074078; NLM NIH HHS: T15 LM007124; Wellcome Trust: 095908

    Journal of biomedical semantics 2015;6;32

  • Susceptibility to tuberculosis is associated with variants in the ASAP1 gene encoding a regulator of dendritic cell migration.

    Curtis J, Luo Y, Zenner HL, Cuchet-Lourenço D, Wu C, Lo K, Maes M, Alisaac A, Stebbings E, Liu JZ, Kopanitsa L, Ignatyeva O, Balabanova Y, Nikolayevskyy V, Baessmann I, Thye T, Meyer CG, Nürnberg P, Horstmann RD, Drobniewski F, Plagnol V, Barrett JC and Nejentsev S

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Human genetic factors predispose to tuberculosis (TB). We studied 7.6 million genetic variants in 5,530 people with pulmonary TB and in 5,607 healthy controls. In the combined analysis of these subjects and the follow-up cohort (15,087 TB patients and controls altogether), we found an association between TB and variants located in introns of the ASAP1 gene on chromosome 8q24 (P = 2.6 × 10(-11) for rs4733781; P = 1.0 × 10(-10) for rs10956514). Dendritic cells (DCs) showed high ASAP1 expression that was reduced after Mycobacterium tuberculosis infection, and rs10956514 was associated with the level of reduction of ASAP1 expression. The ASAP1 protein is involved in actin and membrane remodeling and has been associated with podosomes. The ASAP1-depleted DCs showed impaired matrix degradation and migration. Therefore, genetically determined excessive reduction of ASAP1 expression in M. tuberculosis-infected DCs may lead to their impaired migration, suggesting a potential mechanism of predisposition to TB.

    Funded by: Wellcome Trust: 076113, 085475, 088838, 088838/Z/09/Z, 090355, 095198, 095198/Z/10/Z

    Nature genetics 2015;47;5;523-527

  • Implications of multiple freeze-thawing on respiratory samples for culture-independent analyses.

    Cuthbertson L, Rogers GB, Walker AW, Oliver A, Hoffman LR, Carroll MP, Parkhill J, Bruce KD and van der Gast CJ

    NERC Centre for Ecology & Hydrology, Wallingford, OX10 8BB, UK; Institute of Pharmaceutical Science, Molecular Microbiology Research Laboratory, King's College London, London SE1 9NH, UK.

    Background: Best practice when performing culture-independent microbiological analysis of sputum samples involves their rapid freezing and storage at -80°C. However, accessing biobanked collections can mean that material has been passed through repeated freeze-thaw cycles. The aim of this study was to determine the impact of these cycles on microbial community profiles.

    Methods: Sputum was collected from eight adults with cystic fibrosis, and each sample was subjected to six freeze-thaw cycles. Following each cycle, an aliquot was removed and treated with propidium monoazide (PMA) prior to DNA extraction and 16S rRNA gene pyrosequencing.

    Results: The impact of freeze-thaw cycles was greatest on rare members of the microbiota, with variation beyond that detected with within-sample repeat analysis observed after three cycles.

    Conclusion: Four or more freeze-thaw cycles result in a significant distortion of microbiota profiles from CF sputum.

    Funded by: NHLBI NIH HHS: K02 HL105543, K02HL105543; NIDDK NIH HHS: P30 DK089507; Wellcome Trust: WT 098051

    Journal of cystic fibrosis : official journal of the European Cystic Fibrosis Society 2015;14;4;464-7


    Cybis GB, Sinsheimer JS, Bedford T, Mather AE, Lemey P and Suchard MA

    Federal University of Rio Grande do Sul.

    Understanding which phenotypic traits are consistently correlated throughout evolution is a highly pertinent problem in modern evolutionary biology. Here, we propose a multivariate phylogenetic latent liability model for assessing the correlation between multiple types of data, while simultaneously controlling for their unknown shared evolutionary history informed through molecular sequences. The latent formulation enables us to consider in a single model combinations of continuous traits, discrete binary traits, and discrete traits with multiple ordered and unordered states. Previous approaches have entertained a single data type generally along a fixed history, precluding estimation of correlation between traits and ignoring uncertainty in the history. We implement our model in a Bayesian phylogenetic framework, and discuss inference techniques for hypothesis testing. Finally, we showcase the method through applications to columbine flower morphology, antibiotic resistance in Salmonella, and epitope evolution in influenza.

    Funded by: European Research Council: 260864; NHGRI NIH HHS: R01 HG006139; NIAID NIH HHS: R01 AI107034

    The annals of applied statistics 2015;9;2;969-991

  • PAX5 is a tumor suppressor in mouse mutagenesis models of acute lymphoblastic leukemia.

    Dang J, Wei L, de Ridder J, Su X, Rust AG, Roberts KG, Payne-Turner D, Cheng J, Ma J, Qu C, Wu G, Song G, Huether RG, Schulman B, Janke L, Zhang J, Downing JR, van der Weyden L, Adams DJ and Mullighan CG

    Department of Pathology, St Jude Children's Research Hospital, Memphis, TN.

    Alterations of genes encoding transcriptional regulators of lymphoid development are a hallmark of B-progenitor acute lymphoblastic leukemia (B-ALL) and most commonly involve PAX5, encoding the DNA-binding transcription factor paired-box 5. The majority of PAX5 alterations in ALL are heterozygous, and key PAX5 target genes are expressed in leukemic cells, suggesting that PAX5 may be a haploinsufficient tumor suppressor. To examine the role of PAX5 alterations in leukemogenesis, we performed mutagenesis screens of mice heterozygous for a loss-of-function Pax5 allele. Both chemical and retroviral mutagenesis resulted in a significantly increased penetrance and reduced latency of leukemia, with a shift to B-lymphoid lineage. Genomic profiling identified a high frequency of secondary genomic mutations, deletions, and retroviral insertions targeting B-lymphoid development, including Pax5, and additional genes and pathways mutated in ALL, including tumor suppressors, Ras, and Janus kinase-signal transducer and activator of transcription signaling. These results show that in contrast to simple Pax5 haploinsufficiency, multiple sequential alterations targeting lymphoid development are central to leukemogenesis and contribute to the arrest in lymphoid maturation characteristic of ALL. This cross-species analysis also validates the importance of concomitant alterations of multiple cellular growth, signaling, and tumor suppression pathways in the pathogenesis of B-ALL.

    Funded by: Cancer Research UK: 13031; NCI NIH HHS: P30 CA016056, P30 CA021765; Wellcome Trust

    Blood 2015;125;23;3609-17

  • Processing of Plasmodium falciparum Merozoite Surface Protein MSP1 Activates a Spectrin-Binding Function Enabling Parasite Egress from RBCs.

    Das S, Hertrich N, Perrin AJ, Withers-Martinez C, Collins CR, Jones ML, Watermeyer JM, Fobes ET, Martin SR, Saibil HR, Wright GJ, Treeck M, Epp C and Blackman MJ

    The Francis Crick Institute, Mill Hill Laboratory, Mill Hill, London, NW7 1AA, UK.

    The malaria parasite Plasmodium falciparum replicates within erythrocytes, producing progeny merozoites that are released from infected cells via a poorly understood process called egress. The most abundant merozoite surface protein, MSP1, is synthesized as a large precursor that undergoes proteolytic maturation by the parasite protease SUB1 just prior to egress. The function of MSP1 and its processing are unknown. Here we show that SUB1-mediated processing of MSP1 is important for parasite viability. Processing modifies the secondary structure of MSP1 and activates its capacity to bind spectrin, a molecular scaffold protein that is the major component of the host erythrocyte cytoskeleton. Parasites expressing an inefficiently processed MSP1 mutant show delayed egress, and merozoites lacking surface-bound MSP1 display a severe egress defect. Our results indicate that interactions between SUB1-processed merozoite surface MSP1 and the spectrin network of the erythrocyte cytoskeleton facilitate host erythrocyte rupture to enable parasite egress.

    Funded by: Wellcome Trust: 098051

    Cell host & microbe 2015;18;4;433-44

  • Rfam: annotating families of non-coding RNA sequences.

    Daub J, Eberhardt RY, Tate JG and Burge SW

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.

    Funded by: Wellcome Trust: WT077044/Z/05/Z

    Methods in molecular biology (Clifton, N.J.) 2015;1269;349-63

  • It's diversity all the way down.

    David S and Hadfield J

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    This month's Genome Watch highlights how single-cell transcriptomic analysis of infected macrophages has provided insight into the diversity in host-pathogen interactions.

    Nature reviews. Microbiology 2015;13;12;740

  • Public perceptions of bacterial whole-genome sequencing for tuberculosis.

    Davies A, Scott S, Badger S, Török ME and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge, UK.

    The ability to sequence a bacterial genome in less than 1 day represents a step change for clinical microbiology. Genomic data can be used to investigate suspected outbreaks and rapidly to identify multidrug-resistant organisms. We held an open public debate to explore public understanding and perceptions of bacterial whole-genome sequencing (WGS), which we describe here.

    Funded by: Department of Health; Wellcome Trust

    Trends in genetics : TIG 2015;31;2;58-60

  • Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53949).

    Davies G, Armstrong N, Bis JC, Bressler J, Chouraki V, Giddaluru S, Hofer E, Ibrahim-Verbaas CA, Kirin M, Lahti J, van der Lee SJ, Le Hellard S, Liu T, Marioni RE, Oldmeadow C, Postmus I, Smith AV, Smith JA, Thalamuthu A, Thomson R, Vitart V, Wang J, Yu L, Zgaga L, Zhao W, Boxall R, Harris SE, Hill WD, Liewald DC, Luciano M, Adams H, Ames D, Amin N, Amouyel P, Assareh AA, Au R, Becker JT, Beiser A, Berr C, Bertram L, Boerwinkle E, Buckley BM, Campbell H, Corley J, De Jager PL, Dufouil C, Eriksson JG, Espeseth T, Faul JD, Ford I, Generation Scotland, Gottesman RF, Griswold ME, Gudnason V, Harris TB, Heiss G, Hofman A, Holliday EG, Huffman J, Kardia SL, Kochan N, Knopman DS, Kwok JB, Lambert JC, Lee T, Li G, Li SC, Loitfelder M, Lopez OL, Lundervold AJ, Lundqvist A, Mather KA, Mirza SS, Nyberg L, Oostra BA, Palotie A, Papenberg G, Pattie A, Petrovic K, Polasek O, Psaty BM, Redmond P, Reppermund S, Rotter JI, Schmidt H, Schuur M, Schofield PW, Scott RJ, Steen VM, Stott DJ, van Swieten JC, Taylor KD, Trollor J, Trompet S, Uitterlinden AG, Weinstein G, Widen E, Windham BG, Jukema JW, Wright AF, Wright MJ, Yang Q, Amieva H, Attia JR, Bennett DA, Brodaty H, de Craen AJ, Hayward C, Ikram MA, Lindenberger U, Nilsson LG, Porteous DJ, Räikkönen K, Reinvang I, Rudan I, Sachdev PS, Schmidt R, Schofield PR, Srikanth V, Starr JM, Turner ST, Weir DR, Wilson JF, van Duijn C, Launer L, Fitzpatrick AL, Seshadri S, Mosley TH and Deary IJ

    [1] Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK [2] Department of Psychology, University of Edinburgh, Edinburgh, UK.

    General cognitive function is substantially heritable across the human life course from adolescence to old age. We investigated the genetic contribution to variation in this important, health- and well-being-related trait in middle-aged and older adults. We conducted a meta-analysis of genome-wide association studies of 31 cohorts (N=53,949) in which the participants had undertaken multiple, diverse cognitive tests. A general cognitive function phenotype was tested for, and created in each cohort by principal component analysis. We report 13 genome-wide significant single-nucleotide polymorphism (SNP) associations in three genomic regions, 6q16.1, 14q12 and 19q13.32 (best SNP and closest gene, respectively: rs10457441, P=3.93 × 10(-9), MIR2113; rs17522122, P=2.55 × 10(-8), AKAP6; rs10119, P=5.67 × 10(-9), APOE/TOMM40). We report one gene-based significant association with the HMGN1 gene located on chromosome 21 (P=1 × 10(-6)). These genes have previously been associated with neuropsychiatric phenotypes. Meta-analysis results are consistent with a polygenic model of inheritance. To estimate SNP-based heritability, the genome-wide complex trait analysis procedure was applied to two large cohorts, the Atherosclerosis Risk in Communities Study (N=6617) and the Health and Retirement Study (N=5976). The proportion of phenotypic variation accounted for by all genotyped common SNPs was 29% (s.e.=5%) and 28% (s.e.=7%), respectively. Using polygenic prediction analysis, ~1.2% of the variance in general cognitive function was predicted in the Generation Scotland cohort (N=5487; P=1.5 × 10(-17)). In hypothesis-driven tests, there was significant association between general cognitive function and four genes previously associated with Alzheimer's disease: TOMM40, APOE, ABCG1 and MEF2C.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1, S18386; Chief Scientist Office: CZB/4/505, CZB/4/710, CZD/16/6/4, ETM/55; Medical Research Council: G0700704, MC_PC_U127561128, MR/K026992/1; NCATS NIH HHS: UL1 TR000124, UL1 TR000135; NIA NIH HHS: P30 AG010161, P50 AG005133, R01 AG017917, RF1 AG015819, U01 AG049505; NIDDK NIH HHS: P30 DK063491

    Molecular psychiatry 2015;20;2;183-92

  • CtIP tetramer assembly is required for DNA-end resection and repair.

    Davies OR, Forment JV, Sun M, Belotserkovskaya R, Coates J, Galanty Y, Demir M, Morton CR, Rzechorzek NJ, Jackson SP and Pellegrini L

    Department of Biochemistry, University of Cambridge, Cambridge, UK.

    Mammalian CtIP protein has major roles in DNA double-strand break (DSB) repair. Although it is well established that CtIP promotes DNA-end resection in preparation for homology-dependent DSB repair, the molecular basis for this function has remained unknown. Here we show by biophysical and X-ray crystallographic analyses that the N-terminal domain of human CtIP exists as a stable homotetramer. Tetramerization results from interlocking interactions between the N-terminal extensions of CtIP's coiled-coil region, which lead to a 'dimer-of-dimers' architecture. Through interrogation of the CtIP structure, we identify a point mutation that abolishes tetramerization of the N-terminal domain while preserving dimerization in vitro. Notably, we establish that this mutation abrogates CtIP oligomer assembly in cells, thus leading to strong defects in DNA-end resection and gene conversion. These findings indicate that the CtIP tetramer architecture described here is essential for effective DSB repair by homologous recombination.

    Funded by: Cancer Research UK: A11224, C6/A11224, C6946/A14492; European Research Council: 268536; Wellcome Trust: 084279, WT092096

    Nature structural & molecular biology 2015;22;2;150-7

  • Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics.

    de Angelis MH, Nicholson G, Selloum M, White J, Morgan H, Ramirez-Solis R, Sorg T, Wells S, Fuchs H, Fray M, Adams DJ, Adams NC, Adler T, Aguilar-Pimentel A, Ali-Hadji D, Amann G, André P, Atkins S, Auburtin A, Ayadi A, Becker J, Becker L, Bedu E, Bekeredjian R, Birling MC, Blake A, Bottomley J, Bowl M, Brault V, Busch DH, Bussell JN, Calzada-Wack J, Cater H, Champy MF, Charles P, Chevalier C, Chiani F, Codner GF, Combe R, Cox R, Dalloneau E, Dierich A, Di Fenza A, Doe B, Duchon A, Eickelberg O, Esapa CT, El Fertak L, Feigel T, Emelyanova I, Estabel J, Favor J, Flenniken A, Gambadoro A, Garrett L, Gates H, Gerdin AK, Gkoutos G, Greenaway S, Glasl L, Goetz P, Da Cruz IG, Götz A, Graw J, Guimond A, Hans W, Hicks G, Hölter SM, Höfler H, Hancock JM, Hoehndorf R, Hough T, Houghton R, Hurt A, Ivandic B, Jacobs H, Jacquot S, Jones N, Karp NA, Katus HA, Kitchen S, Klein-Rodewald T, Klingenspor M, Klopstock T, Lalanne V, Leblanc S, Lengger C, le Marchand E, Ludwig T, Lux A, McKerlie C, Maier H, Mandel JL, Marschall S, Mark M, Melvin DG, Meziane H, Micklich K, Mittelhauser C, Monassier L, Moulaert D, Muller S, Naton B, Neff F, Nolan PM, Nutter LM, Ollert M, Pavlovic G, Pellegata NS, Peter E, Petit-Demoulière B, Pickard A, Podrini C, Potter P, Pouilly L, Puk O, Richardson D, Rousseau S, Quintanilla-Fend L, Quwailid MM, Racz I, Rathkolb B, Riet F, Rossant J, Roux M, Rozman J, Ryder E, Salisbury J, Santos L, Schäble KH, Schiller E, Schrewe A, Schulz H, Steinkamp R, Simon M, Stewart M, Stöger C, Stöger T, Sun M, Sunter D, Teboul L, Tilly I, Tocchini-Valentini GP, Tost M, Treise I, Vasseur L, Velot E, Vogt-Weisenhorn D, Wagner C, Walling A, Weber B, Wendling O, Westerberg H, Willershäuser M, Wolf E, Wolter A, Wood J, Wurst W, Yildirim AÖ, Zeh R, Zimmer A, Zimprich A, EUMODIC Consortium, Holmes C, Steel KP, Herault Y, Gailus-Durner V, Mallon AM and Brown SD

    German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München German Research Center for Environmental Health (GmbH), München/Neuherberg, Germany.

    The function of the majority of genes in the mouse and human genomes remains unknown. The mouse embryonic stem cell knockout resource provides a basis for the characterization of relationships between genes and phenotypes. The EUMODIC consortium developed and validated robust methodologies for the broad-based phenotyping of knockouts through a pipeline comprising 20 disease-oriented platforms. We developed new statistical methods for pipeline design and data analysis aimed at detecting reproducible phenotypes with high power. We acquired phenotype data from 449 mutant alleles, representing 320 unique genes, of which half had no previous functional annotation. We captured data from over 27,000 mice, finding that 83% of the mutant lines are phenodeviant, with 65% demonstrating pleiotropy. Surprisingly, we found significant differences in phenotype annotation according to zygosity. New phenotypes were uncovered for many genes with previously unknown function, providing a powerful basis for hypothesis generation and further investigation in diverse systems.

    Funded by: Cancer Research UK: 13031; Medical Research Council: G0300212, MC_QA137918, MC_U142661184, MC_U142684171, MC_U142684172, MC_U142684175, MC_UP_1502/3; Wellcome Trust: 084655

    Nature genetics 2015;47;9;969-978

  • Understanding inflammatory bowel disease via immunogenetics.

    de Lange KM and Barrett JC

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1HH, United Kingdom.

    The major inflammatory bowel diseases, Crohn's disease and ulcerative colitis, are both debilitating disorders of the gastrointestinal tract, characterized by a dysregulated immune response to unknown environmental triggers. Both disorders have an important and overlapping genetic component, and much progress has been made in the last 20 years at elucidating some of the specific factors contributing to disease pathogenesis. Here we review our growing understanding of the immunogenetics of inflammatory bowel disease, from the twin studies that first implicated a role for the genome in disease susceptibility to the latest genome-wide association studies that have identified hundreds of associated loci. We consider the insight this offers into the biological mechanisms of the inflammatory bowel diseases, such as autophagy, barrier defence and T-cell differentiation signalling. We reflect on these findings in the context of other immune-related disorders, both common and rare. These observations include links both obvious, such as to pediatric colitis, and more surprising, such as to leprosy. As a changing picture of the underlying genetic architecture emerges, we turn to future directions for the study of complex human diseases such as these, including the use of next generation sequencing technologies for the identification of rarer risk alleles, and potential approaches for narrowing down associated loci to causal variants. We consider the implications of this work for translation into clinical practice, for example via early therapeutic hypotheses arising from our improved understanding of the biology of inflammatory bowel disease. Finally, we present potential opportunities to better understand environmental risk factors, such as the human microbiota in the context of immunogenetics.

    Funded by: Wellcome Trust: WT098051

    Journal of autoimmunity 2015;64;91-100

  • Elucidation of the RamA regulon in Klebsiella pneumoniae reveals a role in LPS regulation.

    De Majumdar S, Yu J, Fookes M, McAteer SP, Llobet E, Finn S, Spence S, Monahan A, Monaghan A, Kissenpfennig A, Ingram RJ, Bengoechea J, Gally DL, Fanning S, Elborn JS and Schneiders T

    Centre for Infection and Immunity, Belfast, United Kingdom; Division of Pathway and Infection Medicine, Edinburgh, United Kingdom.

    Klebsiella pneumoniae is a significant human pathogen, in part due to high rates of multidrug resistance. RamA is an intrinsic regulator in K. pneumoniae established to be important for the bacterial response to antimicrobial challenge; however, little is known about its possible wider regulatory role in this organism during infection. In this work, we demonstrate that RamA is a global transcriptional regulator that significantly perturbs the transcriptional landscape of K. pneumoniae, resulting in altered microbe-drug or microbe-host response. This is largely due to the direct regulation of 68 genes associated with a myriad of cellular functions. Importantly, RamA directly binds and activates the lpxC, lpxL-2 and lpxO genes associated with lipid A biosynthesis, thus resulting in modifications within the lipid A moiety of the lipopolysaccharide. RamA-mediated alterations decrease susceptibility to colistin E, polymyxin B and human cationic antimicrobial peptide LL-37. Increased RamA levels reduce K. pneumoniae adhesion and uptake into macrophages, which is supported by in vivo infection studies that demonstrate increased systemic dissemination of ramA-overexpressing K. pneumoniae. These data establish that RamA-mediated regulation directly perturbs microbial surface properties, including lipid A biosynthesis, which facilitate evasion from the innate host response. This highlights RamA as a global regulator that confers pathoadaptive phenotypes with implications for our understanding of the pathogenesis of Enterobacter, Salmonella and Citrobacter spp. that express orthologous RamA proteins.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0601199; Wellcome Trust: 098051

    PLoS pathogens 2015;11;1;e1004627

  • The PDZ-domain protein Whirlin facilitates mechanosensory signaling in mammalian proprioceptors.

    de Nooij JC, Simon CM, Simon A, Doobar S, Steel KP, Banks RW, Mentis GZ, Bewick GS and Jessell TM

    Departments of Neuroscience, and Biochemistry and Molecular Biophysics,

    Mechanoreception is an essential feature of many sensory modalities. Nevertheless, the mechanisms that govern the conversion of a mechanical force to distinct patterns of action potentials remain poorly understood. Proprioceptive mechanoreceptors reside in skeletal muscle and inform the nervous system of the position of body and limbs in space. We show here that Whirlin/Deafness autosomal recessive 31 (DFNB31), a PDZ-scaffold protein involved in vestibular and auditory hair cell transduction, is also expressed by proprioceptive sensory neurons (pSNs) in dorsal root ganglia in mice. Whirlin localizes to the peripheral sensory endings of pSNs and facilitates pSN afferent firing in response to muscle stretch. The requirement of Whirlin in both proprioceptors and hair cells suggests that accessory mechanosensory signaling molecules define common features of mechanoreceptive processing across sensory systems.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0601253; NINDS NIH HHS: R01NS078375; Wellcome Trust: 098051

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2015;35;7;3073-84

  • Whole-genome sequencing confirms that Burkholderia pseudomallei multilocus sequence types common to both Cambodia and Australia are due to homoplasy.

    De Smet B, Sarovich DS, Price EP, Mayo M, Theobald V, Kham C, Heng S, Thong P, Holden MT, Parkhill J, Peacock SJ, Spratt BG, Jacobs JA, Vandamme P and Currie BJ

    Department of Clinical Sciences, Institute of Tropical Medicine, Antwerp, Belgium; Laboratory of Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium.

    Burkholderia pseudomallei isolates with shared multilocus sequence types (STs) have not been isolated from different continents. We identified two STs shared between Australia and Cambodia. Whole-genome analysis revealed substantial diversity within STs, correctly identified the Asian or Australian origin, and confirmed that these shared STs were due to homoplasy.

    Funded by: Wellcome Trust: 089472, 098051

    Journal of clinical microbiology 2015;53;1;323-6

  • Genome-wide studies of verbal declarative memory in nondemented older people: the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium.

    Debette S, Ibrahim Verbaas CA, Bressler J, Schuur M, Smith A, Bis JC, Davies G, Wolf C, Gudnason V, Chibnik LB, Yang Q, deStefano AL, de Quervain DJ, Srikanth V, Lahti J, Grabe HJ, Smith JA, Priebe L, Yu L, Karbalai N, Hayward C, Wilson JF, Campbell H, Petrovic K, Fornage M, Chauhan G, Yeo R, Boxall R, Becker J, Stegle O, Mather KA, Chouraki V, Sun Q, Rose LM, Resnick S, Oldmeadow C, Kirin M, Wright AF, Jonsdottir MK, Au R, Becker A, Amin N, Nalls MA, Turner ST, Kardia SL, Oostra B, Windham G, Coker LH, Zhao W, Knopman DS, Heiss G, Griswold ME, Gottesman RF, Vitart V, Hastie ND, Zgaga L, Rudan I, Polasek O, Holliday EG, Schofield P, Choi SH, Tanaka T, An Y, Perry RT, Kennedy RE, Sale MM, Wang J, Wadley VG, Liewald DC, Ridker PM, Gow AJ, Pattie A, Starr JM, Porteous D, Liu X, Thomson R, Armstrong NJ, Eiriksdottir G, Assareh AA, Kochan NA, Widen E, Palotie A, Hsieh YC, Eriksson JG, Vogler C, van Swieten JC, Shulman JM, Beiser A, Rotter J, Schmidt CO, Hoffmann W, Nöthen MM, Ferrucci L, Attia J, Uitterlinden AG, Amouyel P, Dartigues JF, Amieva H, Räikkönen K, Garcia M, Wolf PA, Hofman A, Longstreth WT, Psaty BM, Boerwinkle E, DeJager PL, Sachdev PS, Schmidt R, Breteler MM, Teumer A, Lopez OL, Cichon S, Chasman DI, Grodstein F, Müller-Myhsok B, Tzourio C, Papassotiropoulos A, Bennett DA, Ikram MA, Deary IJ, van Duijn CM, Launer L, Fitzpatrick AL, Seshadri S, Mosley TH and Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium

    Department of Neurology, Boston University School of Medicine, Boston, Massachusetts; Institut National de la Santé et de la Recherche Médicale, Epidemiology, University of Bordeaux; Department of Neurology, University Hospital of Bordeaux, Bordeaux, France.

    Background: Memory performance in older persons can reflect genetic influences on cognitive function and dementing processes. We aimed to identify genetic contributions to verbal declarative memory in a community setting.

    Methods: We conducted genome-wide association studies for paragraph or word list delayed recall in 19 cohorts from the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium, comprising 29,076 dementia- and stroke-free individuals of European descent, aged ≥45 years. Replication of suggestive associations (p < 5 × 10(-6)) was sought in 10,617 participants of European descent, 3811 African-Americans, and 1561 young adults.

    Results: rs4420638, near APOE, was associated with poorer delayed recall performance in discovery (p = 5.57 × 10(-10)) and replication cohorts (p = 5.65 × 10(-8)). This association was stronger for paragraph than word list delayed recall and in the oldest persons. Two associations with specific tests, in subsets of the total sample, reached genome-wide significance in combined analyses of discovery and replication (rs11074779 [HS3ST4], p = 3.11 × 10(-8), and rs6813517 [SPOCK3], p = 2.58 × 10(-8)) near genes involved in immune response. A genetic score combining 58 independent suggestive memory risk variants was associated with increasing Alzheimer disease pathology in 725 autopsy samples. Association of memory risk loci with gene expression in 138 human hippocampus samples showed cis-associations with WDR48 and CLDN5, both related to ubiquitin metabolism.

    Conclusions: This largest study to date exploring the genetics of memory function in ~40,000 older individuals revealed genome-wide associations and suggested an involvement of immune and ubiquitin pathways.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1, S18386; Chief Scientist Office: CZB/4/505, CZB/4/710, ETM/55; Intramural NIH HHS: Z01 AG007270-08, Z01 AG007380-02; Medical Research Council: G0700704, MC_PC_U127561128, MR/K026992/1; NCATS NIH HHS: UL1 TR000124, UL1 TR000135, UL1TR000124; NCI NIH HHS: 5UO1CA098233, CA047988, CA134958, CA49449, CA50385, CA55075, CA65725, CA67262, CA87969; NCRR NIH HHS: UL1RR025005; NEI NIH HHS: EY015473, EY09611; NHGRI NIH HHS: HG004728, U01HG004399, U01HG004402; NHLBI NIH HHS: HL043851, HL054457, HL054464, HL054481, HL071917, HL080467, HL34594, HL35464, HL87660, K99HL098459, N01-HC-25195, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086, N02-HL-6-4278, R01 HL070825, R01 HL120393, R01HL086694, R01HL087641, R01HL087652, R01HL103612, R01HL105756, R01HL120393, R01HL59367, R01HL70825, U01 HL096917, U01HL080295; NIA NIH HHS: AG031287, AG033193, AG08122, AG16495, K08AG34290, K25AG41906, N01-AG-12100, P30 AG010161, P30AG10161, R01 AG008122, R01 AG017917, R01 AG033193, R01AG023629, R01AG05133, R01AG15819, R01AG17917, R01AG20098, R01AG30146, RF1 AG015819, U01 AG049505; NIDDK NIH HHS: DK058845, DK063491, DK070756, P30 DK063491, R01 DK084350; NINDS NIH HHS: NS041558, NS17950, U01 NS041588; PHS HHS: HHSN268200625226C, HHSN268200800007C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, HHSN268201200036C; Wellcome Trust: WT089062

    Biological psychiatry 2015;77;8;749-63

  • Simultaneous impairment of neuronal and metabolic function of mutated gephyrin in a patient with epileptic encephalopathy.

    Dejanovic B, Djémié T, Grünewald N, Suls A, Kress V, Hetsch F, Craiu D, Zemel M, Gormley P, Lal D, EuroEPINOMICS Dravet working group, Myers CT, Mefford HC, Palotie A, Helbig I, Meier JC, De Jonghe P, Weckhuysen S and Schwarz G

    Department of Chemistry, Institute of Biochemistry, University of Cologne, Cologne, Germany.

    Synaptic inhibition is essential for shaping the dynamics of neuronal networks, and aberrant inhibition plays an important role in neurological disorders. Gephyrin is a central player at inhibitory postsynapses, directly binds and organizes GABAA and glycine receptors (GABAARs and GlyRs), and is thereby indispensable for normal inhibitory neurotransmission. Additionally, gephyrin catalyzes the synthesis of the molybdenum cofactor (MoCo) in peripheral tissue. We identified a de novo missense mutation (G375D) in the gephyrin gene (GPHN) in a patient with epileptic encephalopathy resembling Dravet syndrome. Although stably expressed and correctly folded, gephyrin-G375D was non-synaptically localized in neurons and acted dominant-negatively on the clustering of wild-type gephyrin leading to a marked decrease in GABAAR surface expression and GABAergic signaling. We identified a decreased binding affinity between gephyrin-G375D and the receptors, suggesting that Gly375 is essential for gephyrin-receptor complex formation. Surprisingly, gephyrin-G375D was also unable to synthesize MoCo and activate MoCo-dependent enzymes. Thus, we describe a missense mutation that affects both functions of gephyrin and suggest that the identified defect at GABAergic synapses is the mechanism underlying the patient's severe phenotype.

    Funded by: NHLBI NIH HHS: HL113315; NICHD NIH HHS: U54 HD083091

    EMBO molecular medicine 2015;7;12;1580-94

  • Minimal morphological criteria for defining bone marrow dysplasia: a basis for clinical implementation of WHO classification of myelodysplastic syndromes.

    Della Porta MG, Travaglino E, Boveri E, Ponzoni M, Malcovati L, Papaemmanuil E, Rigolin GM, Pascutto C, Croci G, Gianelli U, Milani R, Ambaglio I, Elena C, Ubezio M, Da Via' MC, Bono E, Pietra D, Quaglia F, Bastia R, Ferretti V, Cuneo A, Morra E, Campbell PJ, Orazi A, Invernizzi R, Cazzola M and Rete Ematologica Lombarda (REL) Clinical Network

    [1] Department of Hematology Oncology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy [2] Department of Internal Medicine, University of Pavia, Pavia, Italy.

    The World Health Organization classification of myelodysplastic syndromes (MDS) is based on morphological evaluation of marrow dysplasia. We performed a systematic review of cytological and histological data from 1150 patients with peripheral blood cytopenia. We analyzed the frequency and discriminant power of single morphological abnormalities. A score to define minimal morphological criteria associated to the presence of marrow dysplasia was developed. This score showed high sensitivity/specificity (>90%), acceptable reproducibility and was independently validated. The severity of granulocytic and megakaryocytic dysplasia significantly affected survival. A close association was found between ring sideroblasts and SF3B1 mutations, and between severe granulocytic dysplasia and mutation of ASXL1, RUNX1, TP53 and SRSF2 genes. In myeloid neoplasms with fibrosis, multilineage dysplasia, hypolobulated/multinucleated megakaryocytes and increased CD34+ progenitors in the absence of JAK2, MPL and CALR gene mutations were significantly associated with a myelodysplastic phenotype. In myeloid disorders with marrow hypoplasia, granulocytic and/or megakaryocytic dysplasia, increased CD34+ progenitors and chromosomal abnormalities are consistent with a diagnosis of MDS. The proposed morphological score may be useful to evaluate the presence of dysplasia in cases without a clearly objective myelodysplastic phenotype. The integration of cytological and histological parameters improves the identification of MDS cases among myeloid disorders with fibrosis and hypocellularity.

    Funded by: Wellcome Trust: 088340

    Leukemia 2015;29;1;66-75

  • High-throughput analysis of gene essentiality and sporulation in Clostridium difficile.

    Dembek M, Barquist L, Boinett CJ, Cain AK, Mayho M, Lawley TD, Fairweather NF and Fagan RP

    Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany.

    Clostridium difficile is the most common cause of antibiotic-associated intestinal infections and a significant cause of morbidity and mortality. Infection with C. difficile requires disruption of the intestinal microbiota, most commonly by antibiotic usage. Therapeutic intervention largely relies on a small number of broad-spectrum antibiotics, which further exacerbate intestinal dysbiosis and leave the patient acutely sensitive to reinfection. Development of novel targeted therapeutic interventions will require a detailed knowledge of essential cellular processes, which represent attractive targets, and species-specific processes, such as bacterial sporulation. Our knowledge of the genetic basis of C. difficile infection has been hampered by a lack of genetic tools, although recent developments have made some headway in addressing this limitation. Here we describe the development of a method for rapidly generating large numbers of transposon mutants in clinically important strains of C. difficile. We validated our transposon mutagenesis approach in a model strain of C. difficile and then generated a comprehensive transposon library in the highly virulent epidemic strain R20291 (027/BI/NAP1) containing more than 70,000 unique mutants. Using transposon-directed insertion site sequencing (TraDIS), we have identified a core set of 404 essential genes, required for growth in vitro. We then applied this technique to the process of sporulation, an absolute requirement for C. difficile transmission and pathogenesis, identifying 798 genes that are likely to impact spore production. The data generated in this study will form a valuable resource for the community and inform future research on this important human pathogen.

    Importance: Clostridium difficile is a common cause of potentially fatal intestinal infections in hospital patients, particularly those who have been treated with antibiotics. Our knowledge of this bacterium has been hampered by a lack of tools for dissecting the organism. We have developed a method to study the function of every gene in the bacterium simultaneously. Using this tool, we have identified a set of 404 genes that are required for growth of the bacteria in the laboratory. C. difficile also produces a highly resistant spore that can survive in the environment for a long time and is a requirement for transmission of the bacteria between patients. We have applied our genetic tool to identify all of the genes required for production of a spore. All of these genes represent attractive targets for new drugs to treat infection.

    Funded by: Medical Research Council: G0800170, G1100100; Wellcome Trust: 089875/Z/09/A, 098051

    mBio 2015;6;2;e02383

  • Epigenome-wide association study (EWAS) of BMI, BMI change and waist circumference in African American adults identifies multiple replicated loci.

    Demerath EW, Guan W, Grove ML, Aslibekyan S, Mendelson M, Zhou YH, Hedman ÅK, Sandling JK, Li LA, Irvin MR, Zhi D, Deloukas P, Liang L, Liu C, Bressler J, Spector TD, North K, Li Y, Absher DM, Levy D, Arnett DK, Fornage M, Pankow JS and Boerwinkle E

    Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN 55454, USA.

    Obesity is an important component of the pathophysiology of chronic diseases. Identifying epigenetic modifications associated with elevated adiposity, including DNA methylation variation, may point to genomic pathways that are dysregulated in numerous conditions. The Illumina 450K Bead Chip array was used to assay DNA methylation in leukocyte DNA obtained from 2097 African American adults in the Atherosclerosis Risk in Communities (ARIC) study. Mixed-effects regression models were used to test the association of methylation beta value with concurrent body mass index (BMI) and waist circumference (WC), and BMI change, adjusting for batch effects and potential confounders. Replication using whole-blood DNA from 2377 White adults in the Framingham Heart Study and CD4+ T cell DNA from 991 Whites in the Genetics of Lipid Lowering Drugs and Diet Network Study was followed by testing using adipose tissue DNA from 648 women in the Multiple Tissue Human Expression Resource cohort. Seventy-six BMI-related probes, 164 WC-related probes and 8 BMI change-related probes passed the threshold for significance in ARIC (P < 1 × 10⁻⁷; Bonferroni), including probes in the recently reported HIF3A, CPT1A and ABCG1 regions. Replication using blood DNA was achieved for 37 BMI probes and 1 additional WC probe. Sixteen of these also replicated in adipose tissue, including 15 novel methylation findings near genes involved in lipid metabolism, immune response/cytokine signaling and other diverse pathways, including LGALS3BP, KDM2B, PBX1 and BBS2, among others. Adiposity traits are associated with DNA methylation at numerous CpG sites that replicate across studies despite variation in tissue type, ethnicity and analytic approaches.

    Funded by: Department of Health; Intramural NIH HHS; NHLBI NIH HHS: 5RC2HL102419, N01-HC-25195, U01HL072524-04; PHS HHS: HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust

    Human molecular genetics 2015;24;15;4464-79

  • The utility of transposon mutagenesis for cancer studies in the era of genome editing.

    DeNicola GM, Karreth FA, Adams DJ and Wong CC

    Meyer Cancer Center, Weill Cornell Medical College, New York, NY, 10021, USA.

    The use of transposons as insertional mutagens to identify cancer genes in mice has generated a wealth of information over the past decade. Here, we discuss recent major advances in transposon-mediated insertional mutagenesis screens and compare this technology with other screening strategies.

    Funded by: Cancer Research UK; Wellcome Trust

    Genome biology 2015;16;229

  • Cardiomyocytes from human pluripotent stem cells: From laboratory curiosity to industrial biomedical platform.

    Denning C, Borgdorff V, Crutchley J, Firth KS, George V, Kalra S, Kondrashov A, Hoang MD, Mosqueira D, Patel A, Prodanov L, Rajamohan D, Skarnes WC, Smith JG and Young LE

    Department of Stem Cell Biology, Centre for Biomolecular Sciences, University of Nottingham, NG7 2RD, United Kingdom.

    Cardiomyocytes from human pluripotent stem cells (hPSCs-CMs) could revolutionise biomedicine. Global burden of heart failure will soon reach USD $90bn, while unexpected cardiotoxicity underlies 28% of drug withdrawals. Advances in hPSC isolation, Cas9/CRISPR genome engineering and hPSC-CM differentiation have improved patient care, progressed drugs to clinic and opened a new era in safety pharmacology. Nevertheless, predictive cardiotoxicity using hPSC-CMs ranges from failure to almost total success. Since this likely relates to cell immaturity, efforts are underway to use biochemical and biophysical cues to improve many of the ~30 structural and functional properties of hPSC-CMs towards those seen in adult CMs. Other developments needed for widespread hPSC-CM utility include subtype specification, cost reduction of large scale differentiation and elimination of the phenotyping bottleneck. This review will consider these factors in the evolution of hPSC-CM technologies, as well as their integration into high content industrial platforms that assess structure, mitochondrial function, electrophysiology, calcium transients and contractility. This article is part of a Special Issue entitled: Cardiomyocyte Biology: Integration of Developmental and Environmental Cues in the Heart edited by Marcus Schaub and Hughes Abriel.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E006159/1; British Heart Foundation: PG/09/027/27141; Medical Research Council: G0801098, G113/30; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/K000225/1

    Biochimica et biophysica acta 2015

  • WAC loss-of-function mutations cause a recognisable syndrome characterised by dysmorphic features, developmental delay and hypotonia and recapitulate 10p11.23 microdeletion syndrome.

    DeSanto C, D'Aco K, Araujo GC, Shannon N, DDD Study, Vernon H, Rahrig A, Monaghan KG, Niu Z, Vitazka P, Dodd J, Tang S, Manwaring L, Martir-Negron A, Schnur RE, Juusola J, Schroeder A, Pan V, Helbig KL, Friedman B and Shinawi M

    Division of Genetics and Genomic Medicine, Department of Pediatrics, Washington University School of Medicine, St Louis, Missouri, USA.

    Background: Rare de novo mutations have been implicated as a significant cause of idiopathic intellectual disability. Large deletions encompassing 10p11.23 have been implicated in developmental delay, behavioural abnormalities and dysmorphic features, but the genotype-phenotype correlation was not delineated. Mutations in WAC have been recently reported in large screening cohorts of patients with intellectual disability or autism, but no full phenotypic characterisation was described.

    Methods: Clinical and molecular characterisation of six patients with loss-of-function WAC mutations identified by whole exome sequencing was performed. Clinical data were obtained by retrospective chart review, parental interviews, direct patient interaction and formal neuropsychological evaluation.

    Results: Five heterozygous de novo WAC mutations were identified in six patients. Three of the mutations were nonsense, and two were frameshift; all are predicted to cause loss of function either through nonsense-mediated mRNA decay or protein truncation. Clinical findings included developmental delay (6/6), hypotonia (6/6), behavioural problems (5/6), eye abnormalities (5/6), constipation (5/6), feeding difficulties (4/6), seizures (2/6) and sleep problems (2/6). All patients exhibited common dysmorphic features, including broad/prominent forehead, synophrys and/or bushy eyebrows, depressed nasal bridge and bulbous nasal tip. Posteriorly rotated ears, hirsutism, deep-set eyes, thin upper lip, inverted nipples, hearing loss and branchial cleft anomalies were also noted.

    Conclusions: Our case series shows that loss-of-function mutations in WAC cause a recognisable genetic syndrome characterised by a neurocognitive phenotype and facial dysmorphism. Our data strongly suggest that WAC haploinsufficiency is responsible for most of the phenotypic features associated with deletions encompassing 10p11.23.

    Funded by: Wellcome Trust: WT098051

    Journal of medical genetics 2015;52;11;754-61

  • Uncovering the genomic heterogeneity of multifocal breast cancer.

    Desmedt C, Fumagalli D, Pietri E, Zoppoli G, Brown D, Nik-Zainal S, Gundem G, Rothé F, Majjaj S, Garuti A, Carminati E, Loi S, Van Brussel T, Boeckx B, Maetens M, Mudie L, Vincent D, Kheddoumi N, Serra L, Massa I, Ballestrero A, Amadori D, Salgado R, de Wind A, Lambrechts D, Piccart M, Larsimont D, Campbell PJ and Sotiriou C

    Breast Cancer Translational Research Laboratory, Université Libre de Bruxelles, Institut Jules Bordet, Boulevard de Waterloo 121, Brussels, Belgium.

    Multifocal breast cancer (MFBC), defined as multiple synchronous unilateral lesions of invasive breast cancer, is relatively frequent and has been associated with more aggressive features than unifocal cancer. Here, we aimed to investigate the genomic heterogeneity between MFBC lesions sharing similar histopathological parameters. Characterization of different lesions from 36 patients with ductal MFBC involved the identification of non-silent coding mutations in 360 protein-coding genes (171 tumour and 36 matched normal samples). We selected only patients with lesions presenting the same grade, ER, and HER2 status. Mutations were classified as 'oncogenic' in the case of recurrent substitutions reported in COSMIC or truncating mutations affecting tumour suppressor genes. All mutations identified in a given patient were further interrogated in all samples from that patient through deep resequencing using an orthogonal platform. Whole-genome rearrangement screen was further conducted in 8/36 patients. Twenty-four patients (67%) had substitutions/indels shared by all their lesions, of which 11 carried the same mutations in all lesions, and 13 had lesions with both common and private mutations. Three-quarters of those 24 patients shared oncogenic variants. The remaining 12 patients (33%) did not share any substitution/indels, with inter-lesion heterogeneity observed for oncogenic mutation(s) in genes such as PIK3CA, TP53, GATA3, and PTEN. Genomically heterogeneous lesions tended to be further apart in the mammary gland than homogeneous lesions. Genome-wide analyses of a limited number of patients identified a common somatic background in all studied MFBCs, including those with no mutation in common between the lesions. To conclude, as the number of molecular targeted therapies increases and trials driven by genomic screening are ongoing, our findings highlight the presence of genomic inter-lesion heterogeneity in one-third, despite similar pathological features. This implies that deeper molecular characterization of all MFBC lesions is warranted for the adequate management of those cancers.

    The Journal of pathology 2015;236;4;457-66

  • G6PD gene variants and its association with malaria in a Sri Lankan population.

    Dewasurendra RL, Rockett KA, Fernando SD, Carter R, Kwiatkowski DP, Karunaweera ND and MalariaGEN Consortium

    Department of Parasitology, Faculty of Medicine, University of Colombo, 25, Kynsey Road, Colombo 08, Sri Lanka.

    Background: Glucose-6-phosphate dehydrogenase (G6PD) is an enzyme that plays an important role in many cellular functions. Deficiency of this enzyme results from point mutations in the coding region of the G6PD gene. G6PD-deficiency is important in malaria, as certain anti-malarial drugs could induce haemolysis in such patients and mutations in this gene may influence the susceptibility or resistance to the disease. Detailed information on genetic variations in the G6PD gene for Sri Lankan populations is yet to be revealed. This study describes a set of G6PD mutations present in a Sri Lankan population and their association with uncomplicated malaria.

    Methods: DNA was extracted from 1,051 individuals. Sixty-eight SNPs in the region of the G6PD gene were genotyped. A database created during the 1992-1993 malaria epidemic for the same individuals was used to assess the associations between the G6PD SNPs and parasite density or disease severity of uncomplicated malaria infections. Linkage disequilibrium for SNPs and haplotype structures were identified.

    Results: Seventeen genetic variants were polymorphic in this population. The mutant allele was the major allele in 9 SNPs. Common G6PD variants already described in Asians or South-Asians seemed to be absent or rare in this population. Both the severity of disease in uncomplicated malaria infections and parasitaemia were significantly lower in males infected with Plasmodium falciparum carrying the ancestral allele of rs915942 compared to those carrying the mutant allele. The parasite density of males infected with P. falciparum was significantly lower also in those who possessed the mutant alleles of rs5986877, rs7879049 and rs7053878. Two haplotype blocks were identified, where the recombination rates were higher in males with no history of malaria when compared to those who have experienced the disease in the past.

    Conclusions: This is the most detailed survey of G6PD SNPs in a Sri Lankan population undertaken so far that enabled novel description of single nucleotide polymorphisms within the G6PD gene. A few of the genetic variations identified demonstrated a tendency to be associated with either disease severity or parasite density in uncomplicated disease in males. Known G6PD gene polymorphisms already described from elsewhere were either absent or rare in the local study population.

    Funded by: Medical Research Council: G0600230, G0600718, MR/M006212/1; Wellcome Trust: 075491/Z04, 090532/Z/09/Z, 090770, 090770/Z/09/Z, WT077383/Z/05/Z

    Malaria journal 2015;14;93

  • Cyclic Regulation of Sensory Perception by a Female Hormone Alters Behavior.

    Dey S, Chamero P, Pru JK, Chien MS, Ibarra-Soria X, Spencer KR, Logan DW, Matsunami H, Peluso JJ and Stowers L

    Department of Molecular and Cellular Neuroscience, The Scripps Research Institute, La Jolla, CA 92037, USA.

    Females may display dramatically different behavior depending on their state of ovulation. This is thought to occur through sex-specific hormones acting on behavioral centers in the brain. Whether incoming sensory activity also differs across the ovulation cycle to alter behavior has not been investigated. Here, we show that female mouse vomeronasal sensory neurons (VSNs) are temporarily and specifically rendered "blind" to a subset of male-emitted pheromone ligands during diestrus yet fully detect and respond to the same ligands during estrus. VSN silencing occurs through the action of the female sex-steroid progesterone. Not all VSNs are targeted for silencing; those detecting cat ligands remain continuously active irrespective of the estrous state. We identify the signaling components that account for the capacity of progesterone to target specific subsets of male-pheromone responsive neurons for inactivation. These findings indicate that internal physiology can selectively and directly modulate sensory input to produce state-specific behavior.

    Funded by: NCRR NIH HHS: R21 RR030264; NICHD NIH HHS: T32 HD040372; NIDCD NIH HHS: DC006885, DC009413, DC010857, DC012095, R01 DC006885, R01 DC009413, R01 DC010857, R01 DC012095; NIGMS NIH HHS: T32 GM007754; Wellcome Trust: 098051

    Cell 2015;161;6;1334-44

  • A Synergistic Interaction between Chk1- and MK2 Inhibitors in KRAS-Mutant Cancer.

    Dietlein F, Kalb B, Jokic M, Noll EM, Strong A, Tharun L, Ozretić L, Künstlinger H, Kambartel K, Randerath WJ, Jüngst C, Schmitt A, Torgovnick A, Richters A, Rauh D, Siedek F, Persigehl T, Mauch C, Bartkova J, Bradley A, Sprick MR, Trumpp A, Rad R, Saur D, Bartek J, Wolf J, Büttner R, Thomas RK and Reinhardt HC

    Department I of Internal Medicine, University Hospital Cologne, Weyertal 115B, 50931 Cologne, Germany; CECAD, University of Cologne, Weyertal 115B, 50931 Cologne, Germany.

    KRAS is one of the most frequently mutated oncogenes in human cancer. Despite substantial efforts, no clinically applicable strategy has yet been developed to effectively treat KRAS-mutant tumors. Here, we perform a cell-line-based screen and identify strong synergistic interactions between cell-cycle checkpoint-abrogating Chk1- and MK2 inhibitors, specifically in KRAS- and BRAF-driven cells. Mechanistically, we show that KRAS-mutant cancer displays intrinsic genotoxic stress, leading to tonic Chk1- and MK2 activity. We demonstrate that simultaneous Chk1- and MK2 inhibition leads to mitotic catastrophe in KRAS-mutant cells. This actionable synergistic interaction is validated using xenograft models, as well as distinct Kras- or Braf-driven autochthonous murine cancer models. Lastly, we show that combined checkpoint inhibition induces apoptotic cell death in KRAS- or BRAF-mutant tumor cells directly isolated from patients. These results strongly recommend simultaneous Chk1- and MK2 inhibition as a therapeutic strategy for the treatment of KRAS- or BRAF-driven cancers.

    Cell 2015;162;1;146-59

  • Post-translational protein modifications in malaria parasites.

    Doerig C, Rayner JC, Scherf A and Tobin AB

    Department of Microbiology, Faculty of Biomedical and Psychological Sciences, Monash University, Wellington Road, Clayton, Victoria 3800, Australia.

    Post-translational modifications play crucial parts in regulating protein function and thereby control several fundamental aspects of eukaryotic biology, including cell signalling, protein trafficking, epigenetic control of gene expression, cell-cell interactions, and cell proliferation and differentiation. In this Review, we discuss protein modifications that have been shown to have a key role in malaria parasite biology and pathogenesis. We focus on phosphorylation, acetylation, methylation and lipidation. We provide an overview of the biological significance of these modifications and discuss prospects and progress in antimalarial drug discovery based on the inhibition of the enzymes that mediate these modifications.

    Funded by: Wellcome Trust: 098051

    Nature reviews. Microbiology 2015;13;3;160-72

  • Disruption of SF3B1 results in deregulated expression and splicing of key genes and pathways in myelodysplastic syndrome hematopoietic stem and progenitor cells.

    Dolatshad H, Pellagatti A, Fernandez-Mercado M, Yip BH, Malcovati L, Attwood M, Przychodzen B, Sahgal N, Kanapin AA, Lockstone H, Scifo L, Vandenberghe P, Papaemmanuil E, Smith CW, Campbell PJ, Ogawa S, Maciejewski JP, Cazzola M, Savage KI and Boultwood J

    LLR Molecular Haematology Unit, NDCLS, RDM, University of Oxford, Oxford, UK.

    The splicing factor SF3B1 is the most commonly mutated gene in the myelodysplastic syndrome (MDS), particularly in patients with refractory anemia with ring sideroblasts (RARS). We investigated the functional effects of SF3B1 disruption in myeloid cell lines: SF3B1 knockdown resulted in growth inhibition, cell cycle arrest and impaired erythroid differentiation and deregulation of many genes and pathways, including cell cycle regulation and RNA processing. MDS is a disorder of the hematopoietic stem cell and we thus studied the transcriptome of CD34+ cells from MDS patients with SF3B1 mutations using RNA sequencing. Genes significantly differentially expressed at the transcript and/or exon level in SF3B1 mutant compared with wild-type cases include genes that are involved in MDS pathogenesis (ASXL1 and CBL), iron homeostasis and mitochondrial metabolism (ALAS2, ABCB7 and SLC25A37) and RNA splicing/processing (PRPF8 and HNRNPD). Many genes regulated by a DNA damage-induced BRCA1-BCLAF1-SF3B1 protein complex showed differential expression/splicing in SF3B1 mutant cases. This is the first study to determine the target genes of SF3B1 mutation in MDS CD34+ cells. Our data indicate that SF3B1 has a critical role in MDS by affecting the expression and splicing of genes involved in specific cellular processes/pathways, many of which are relevant to the known RARS pathophysiology, suggesting a causal link.

    Funded by: Medical Research Council: G0900747 91070; Wellcome Trust: 088340, 090532/Z/09/Z

    Leukemia 2015;29;5;1092-103

  • Leucophore identity is more gold than silver.

    Dooley CM

    Pigment cell & melanoma research 2015;28;2;131

  • A PfRH5-based vaccine is efficacious against heterologous strain blood-stage Plasmodium falciparum infection in aotus monkeys.

    Douglas AD, Baldeviano GC, Lucas CM, Lugo-Roman LA, Crosnier C, Bartholdson SJ, Diouf A, Miura K, Lambert LE, Ventocilla JA, Leiva KP, Milne KH, Illingworth JJ, Spencer AJ, Hjerrild KA, Alanine DG, Turner AV, Moorhead JT, Edgel KA, Wu Y, Long CA, Wright GJ, Lescano AG and Draper SJ

    Jenner Institute, University of Oxford, Oxford OX3 7DQ, UK.

    Antigenic diversity has posed a critical barrier to vaccine development against the pathogenic blood-stage infection of the human malaria parasite Plasmodium falciparum. To date, only strain-specific protection has been reported by trials of such vaccines in nonhuman primates. We recently showed that P. falciparum reticulocyte binding protein homolog 5 (PfRH5), a merozoite adhesin required for erythrocyte invasion, is highly susceptible to vaccine-inducible strain-transcending parasite-neutralizing antibody. In vivo efficacy of PfRH5-based vaccines has not previously been evaluated. Here, we demonstrate that PfRH5-based vaccines can protect Aotus monkeys against a virulent vaccine-heterologous P. falciparum challenge and show that such protection can be achieved by a human-compatible vaccine formulation. Protection was associated with anti-PfRH5 antibody concentration and in vitro parasite-neutralizing activity, supporting the use of this in vitro assay to predict the in vivo efficacy of future vaccine candidates. These data suggest that PfRH5-based vaccines have potential to achieve strain-transcending efficacy in humans.

    Funded by: FIC NIH HHS: 2D43 TW007393, D43 TW007393; Intramural NIH HHS; Medical Research Council: G1000527; Wellcome Trust: 089455/2/09/z, 092873/z/10/z, 098051

    Cell host & microbe 2015;17;1;130-9

  • Comparison of genomic signatures of selection on Plasmodium falciparum between different regions of a country with high malaria endemicity.

    Duffy CW, Assefa SA, Abugri J, Amoako N, Owusu-Agyei S, Anyorigiya T, MacInnis B, Kwiatkowski DP, Conway DJ and Awandare GA

    Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, London, WC1E 7HT, UK.

    Background: Genome wide sequence analyses of malaria parasites from widely separated areas of the world have identified contrasting population structures and signatures of selection. To compare relatively closely situated but ecologically contrasting regions within an endemic African country, population samples of Plasmodium falciparum clinical isolates were collected in Ghana from Kintampo in the central forest-savannah area, and Navrongo in a drier savannah area ~350 km to the north with more seasonally-restricted transmission. Parasite DNA was sequenced and paired-end reads mapped to the P. falciparum reference genome.

    Results: High coverage genome wide sequence data for 85 different clinical isolates enabled analysis of 121,712 single nucleotide polymorphisms (SNPs). The local populations had similar proportions of mixed genotype infections, similar SNP allele frequency distributions, and eleven chromosomal regions had elevated integrated haplotype scores (|iHS|) in both. A between-population Rsb metric comparing extended haplotype homozygosity indicated a stronger signal within Kintampo for one of these regions (on chromosome 14) and in Navrongo for two of these regions (on chromosomes 10 and 13). At least one gene in each of these identified regions is a potential target of locally varying selection. The candidates include genes involved in parasite development in mosquitoes, members of variant-expressed multigene families, and a leading vaccine-candidate target of immunity.

    Conclusions: Against a background of very similar population structure and selection signatures in the P. falciparum populations of Ghana, three narrow genomic regions showed evidence indicating local differences in historical timing or intensity of selection. Sampling of closely situated populations across heterogeneous environments has potential to refine the mapping of important loci under temporally or spatially varying selection.

    Funded by: Medical Research Council: G0600718, G1100123, MR/M006212/1; Wellcome Trust: 090770, 090770/Z/09/Z

    BMC genomics 2015;16;527

  • Avianbase: a community resource for bird genomics.

    Eöry L, Gilbert MT, Li C, Li B, Archibald A, Aken BL, Zhang G, Jarvis E, Flicek P and Burt DW

    Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I025328/1, BB/I025360/2, BB/I025506/1; Wellcome Trust: 095908, 098051

    Genome biology 2015;16;21

  • Genetic and phenotypic differentiation of an Andean intermediate altitude population.

    Eichstaedt CA, Antão T, Cardona A, Pagani L, Kivisild T and Mormina M

    Division of Biological Anthropology, University of Cambridge, Cambridge, Cambridgeshire, UK; Centre for Pulmonary Hypertension, Thoraxclinic at the University Hospital Heidelberg, Heidelberg, Baden-Württemberg, Germany.

    Highland populations living permanently under hypobaric hypoxia have been the subject of extensive research because of the relevance of their physiological adaptations for the understanding of human health and disease. In this context, what is considered high altitude is a matter of interpretation and while the adaptive processes at high altitude (above 3000 m) are well documented, the effects of moderate altitude (below 3000 m) on the phenotype are less well established. In this study, we compare physiological and anthropometric characteristics as well as genetic variations in two Andean populations: the Calchaquíes (2300 m) and neighboring Collas (3500 m). We compare their phenotype and genotype to the sea-level Wichí population. We measured physiological (heart rate, oxygen saturation, respiration rate, and lung function) as well as anthropometric traits (height, sitting height, weight, forearm, and tibia length). We conducted genome-wide genotyping on a subset of the sample (n = 74) and performed various scans for positive selection. At the phenotypic level (n = 179), increased lung capacity stood out in both Andean groups, whereas a growth reduction in distal limbs was only observed at high altitude. At the genome level, Calchaquíes revealed strong signals around PRKG1, suggesting that the nitric oxide pathway may be a target of selection. PRKG1 was highlighted by one of four selection tests among the top five genes using the population branch statistic. Selection test results for Collas were reported previously. Overall, our study shows that some phenotypic and genetic differentiation occurs at intermediate altitude in response to moderate lifelong selection pressures.

    Physiological reports 2015;3;5

  • Positive selection of AS3MT to arsenic water in Andean populations.

    Eichstaedt CA, Antao T, Cardona A, Pagani L, Kivisild T and Mormina M

    Division of Biological Anthropology, University of Cambridge, Cambridge CB2 1QH, Cambridgeshire, UK; Center for Pulmonary Hypertension, Thoraxclinic at the University Hospital Heidelberg, 69126 Heidelberg, Baden-Württemberg, Germany.

    Arsenic is a carcinogen associated with skin lesions and cardiovascular diseases. The Colla population from the Puna region in Northwest Argentina is exposed to levels of arsenic in drinking water exceeding the recommended maximum by a factor of 20. Yet, they have thrived in this challenging environment for thousands of years and therefore we hypothesize strong selection signatures in genes involved in arsenic metabolism. We analyzed genome-wide genotype data for 730,000 loci in 25 Collas, considering 24 individuals of the neighbouring Calchaquíes and 24 Wichí from the Gran Chaco region in the Argentine province of Salta as control groups. We identified a strong signal of positive selection in the main arsenic methyltransferase AS3MT gene, which has been previously associated with lower concentrations of the most toxic product of arsenic metabolism, monomethylarsonic acid. This study confirms recent studies reporting selection signals in the AS3MT gene, albeit using different samples, tests and control populations.

    Mutation research 2015;780;97-102

  • Emergent and evolving antimicrobial resistance cassettes in community-associated fusidic acid and meticillin-resistant Staphylococcus aureus.

    Ellington MJ, Reuter S, Harris SR, Holden MT, Cartwright EJ, Greaves D, Gerver SM, Hope R, Brown NM, Török ME, Parkhill J, Köser CU and Peacock SJ

    Public Health England, Microbiology Services Division, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QW, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Fusidic acid is a topical and systemic antimicrobial used for the treatment of staphylococcal infections in hospitals and the community. Sales of fusidic acid and resistance rates among meticillin-resistant Staphylococcus aureus (MRSA) doubled between 1990 and 2001. For the following decade, fusidic acid resistance rates among isolates from Addenbrooke's Hospital (Cambridge, UK) were compared with national resistance rates from MRSA bacteraemia surveillance data and with antimicrobial sales data. Sales of fusidic acid remained relatively constant between 2002 and 2012, whilst fusidic acid resistance increased two- and four-fold in MRSA bacteraemias nationally and in MRSA isolates from Cambridge, respectively. A subgroup of MRSA resistant only to fusidic acid increased after 2006 by 5-fold amongst bacteraemias nationally and 17-fold (to 7.7% in 2012) amongst Cambridge MRSA isolates. All of the available local isolates from 2011 to 2012 (n=23) were acquired in the community, were not related epidemiologically and belonged to multilocus sequence typing (MLST) groups ST1, 5, 8, 45 or 149 as revealed from analysis of whole-genome sequence data. All harboured the fusC gene on one of six distinct staphylococcal cassette chromosome (SCC) elements, four of which were dual-resistance chimeras that encoded β-lactam and fusidic acid resistance. In summary, fusidic acid-resistant MRSA increased in prevalence during the 2000s with notable rises after 2006. The development of chimeric cassettes that confer dual resistance to β-lactams and fusidic acid demonstrates that the genetics underpinning resistance in community-associated MRSA are evolving.

    Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office; Department of Health; Medical Research Council; Wellcome Trust

    International journal of antimicrobial agents 2015;45;5;477-84

  • A minimally invasive, lentiviral based method for the rapid and sustained genetic manipulation of renal tubules.

    Espana-Agusti J, Tuveson DA, Adams DJ and Matakidou A

    Department of Oncology, University of Cambridge, CRUK Cambridge Institute, Cambridge, CB2 0RE, UK.

    The accelerated discovery of disease-related genes emerging from genomic studies has strained the capacity of traditional genetically engineered mouse models (GEMMs) to provide in-vivo validation. Direct, somatic, genetic engineering approaches allow for accelerated and flexible genetic manipulation and represent an attractive alternative to GEMMs. In this study we investigated the feasibility, safety and efficiency of a minimally invasive, lentiviral based approach for the sustained in-vivo modification of renal tubular epithelial cells. Using ultrasound guidance, reporter vectors were directly injected into the mouse renal parenchyma. We observed transgene expression confined to the renal cortex (specifically proximal and distal tubules) and sustained beyond 2 months post injection. Furthermore, we demonstrate the ability of this methodology to induce long-term, in-vivo knockdown of candidate genes either through somatic recombination of floxed alleles or by direct delivery of specific shRNA sequences. This study demonstrates that ultrasound-guided injection of lentiviral vectors provides a safe and efficient method for the genetic manipulation of renal tubules, representing a quick and versatile alternative to GEMMs for the functional characterisation of disease-related genes.

    Funded by: Cancer Research UK: 12177, 13031, C37839/A12177

    Scientific reports 2015;5;11061

  • Structural variation on the human Y chromosome from population-scale resequencing.

    Espinosa JR, Ayub Q, Chen Y, Xue Y and Tyler-Smith C

    Chris Tyler-Smith, The Wellcome Trust Sanger Institute, Hinxton, Cambs. CB10 1SA, UK.

    Aim: To investigate the information about Y-structural variants (SVs) in the general population that could be obtained by low-coverage whole-genome sequencing.

    Methods: We investigated SVs on the male-specific portion of the Y chromosome in the 70 individuals from Africa, Europe, or East Asia sequenced as part of the 1000 Genomes Pilot project, using data from this project and from additional studies on the same samples. We applied a combination of read-depth and read-pair methods to discover candidate Y-SVs, followed by validation using information from the literature, independent sequence and single nucleotide polymorphism-chip data sets, and polymerase chain reaction experiments.

    Results: We validated 19 Y-SVs, 2 of which were novel. Non-reference allele counts ranged from 1 to 64. The regions richest in variation were the heterochromatic segments near the centromere or the DYZ19 locus, followed by the ampliconic regions, but some Y-SVs were also present in the X-transposed and X-degenerate regions. In all, 5 of the 27 protein-coding gene families on the Y chromosome varied in copy number.

    Conclusions: We confirmed that Y-SVs were readily detected from low-coverage sequence data and were abundant on the chromosome. We also reported both common and rare Y-SVs that are novel.

    Funded by: Wellcome Trust: 098051

    Croatian medical journal 2015;56;3;194-207

  • Genetic Stabilization of the Drug-Resistant PMEN1 Pneumococcus Lineage by Its Distinctive DpnIII Restriction-Modification System.

    Eutsey RA, Powell E, Dordel J, Salter SJ, Clark TA, Korlach J, Ehrlich GD and Hiller NL

    Center of Excellence in Biofilm Research, Allegheny Health Network, Pittsburgh, Pennsylvania, USA.

    The human pathogen Streptococcus pneumoniae (pneumococcus) exhibits a high degree of genomic diversity and plasticity. Isolates with high genomic similarity are grouped into lineages that undergo homologous recombination at variable rates. PMEN1 is a pandemic, multidrug-resistant lineage. Heterologous gene exchange between PMEN1 and non-PMEN1 isolates is directional, with extensive gene transfer from PMEN1 strains and only modest transfer into PMEN1 strains. Restriction-modification (R-M) systems can restrict horizontal gene transfer, yet most pneumococcal strains code for either the DpnI or DpnII R-M system and neither limits homologous recombination. Our comparative genomic analysis revealed that PMEN1 isolates code for DpnIII, a third R-M system syntenic to the other Dpn systems. Characterization of DpnIII demonstrated that the endonuclease cleaves unmethylated double-stranded DNA at the tetramer sequence 5' GATC 3', and the cognate methylase is a C5 cytosine-specific DNA methylase. We show that DpnIII decreases the frequency of recombination under in vitro conditions, such that the number of transformants is lower for strains transformed with unmethylated DNA than in those transformed with cognately methylated DNA. Furthermore, we have identified two PMEN1 isolates where the DpnIII endonuclease is disrupted, and phylogenetic work by Croucher and colleagues suggests that these strains have accumulated genomic differences at a higher rate than other PMEN1 strains. We propose that the R-M locus is a major determinant of genetic acquisition; the resident R-M system governs the extent of genome plasticity.

    Importance: Pneumococcus is one of the most important community-acquired bacterial pathogens. Pneumococcal strains can develop resistance to antibiotics and to serotype vaccines by acquiring genes from other strains or species. Thus, genomic plasticity is associated with strain adaptability and pneumococcal success. PMEN1 is a widespread and multidrug-resistant highly pathogenic pneumococcal lineage, which has evolved over the past century and displays a relatively stable genome. In this study, we characterize DpnIII, a restriction-modification (R-M) system that limits recombination. DpnIII is encountered in the PMEN1 lineage, where it replaces other R-M systems that do not decrease plasticity. Our hypothesis is that this genomic region, where different pneumococcal lineages code for variable R-M systems, plays a role in the fine-tuning of the extent of genomic plasticity. It is possible that well-adapted lineages such as PMEN1 have a mechanism to increase genomic stability, rather than foster genomic plasticity.

    Funded by: NIAID NIH HHS: R01 AI080935; NIDCD NIH HHS: R00-DC-011322; Wellcome Trust: 098051

    mBio 2015;6;3;e00173

  • Deubiquitinase MYSM1 Is Essential for Normal Fetal Liver Hematopoiesis and for the Maintenance of Hematopoietic Stem Cells in Adult Bone Marrow.

    Förster M, Belle JI, Petrov JC, Ryder EJ, Clare S and Nijnik A

    Department of Physiology, McGill University, Montreal, Quebec, Canada.

    MYSM1 is a chromatin-interacting deubiquitinase recently shown to be essential for hematopoietic stem cell (HSC) function and normal progression of hematopoiesis in both mice and humans. However, it remains unknown whether the loss of function in Mysm1-deficient HSCs is due to the essential role of MYSM1 in establishing the HSC pool during development or due to a continuous requirement for MYSM1 in adult HSCs. In this study, we address these questions for the first time: first, by performing a detailed analysis of hematopoiesis in the fetal livers of Mysm1-knockout mice, and second, by assessing the effects of an inducible Mysm1 ablation on adult HSC functions. Our data indicate that MYSM1 is essential for normal HSC function and progression of hematopoiesis in the fetal liver. Furthermore, the inducible knockout model demonstrates a continuous requirement for MYSM1 to maintain HSC functions and antagonize p53 activation in adult bone marrow. These studies advance our understanding of the role of MYSM1 in HSC biology, and provide new insights into the human hematopoietic failure syndrome resulting from MYSM1 deficiency.

    Funded by: Canadian Institutes of Health Research: 123403; Wellcome Trust: 098051

    Stem cells and development 2015;24;16;1865-77

  • Rapid emergence of multidrug resistant, H58-lineage Salmonella typhi in Blantyre, Malawi.

    Feasey NA, Gaskell K, Wong V, Msefula C, Selemani G, Kumwenda S, Allain TJ, Mallewa J, Kennedy N, Bennett A, Nyirongo JO, Nyondo PA, Zulu MD, Parkhill J, Dougan G, Gordon MA and Heyderman RS

    Malawi Liverpool Wellcome Trust Clinical Research Programme, University of Malawi College of Medicine, Blantyre, Malawi; Liverpool School of Tropical Medicine, Liverpool, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Introduction: Between 1998 and 2010, S. Typhi was an uncommon cause of bloodstream infection (BSI) in Blantyre, Malawi and it was usually susceptible to first-line antimicrobial therapy. In 2011 an increase in a multidrug resistant (MDR) strain was detected through routine bacteriological surveillance conducted at Queen Elizabeth Central Hospital (QECH).

    Methods: Longitudinal trends in culture-confirmed Typhoid admissions at QECH were described between 1998 and 2014. A retrospective review of patient case notes was conducted, focusing on clinical presentation, prevalence of HIV and case-fatality. Isolates of S. Typhi were sequenced and the phylogeny of Typhoid in Blantyre was reconstructed and placed in a global context.

    Results: Between 1998 and 2010, there was a mean of 14 microbiological diagnoses of Typhoid/year at QECH, of which 6.8% were MDR. This increased to 67 in 2011 and 782 in 2014, at which time 97% were MDR. The disease predominantly affected children and young adults (median age 11 [IQR 6-21] in 2014). The prevalence of HIV in adult patients was 16.7% [8/48], similar to that of the general population (17.8%). Overall, the case fatality rate was 2.5% (3/94). Complications included anaemia, myocarditis, pneumonia and intestinal perforation. 112 isolates were sequenced and the phylogeny demonstrated the introduction and clonal expansion of the H58 lineage of S. Typhi.

    Conclusions: Since 2011, there has been a rapid increase in the incidence of multidrug resistant, H58-lineage Typhoid in Blantyre. This is one of a number of reports of the re-emergence of Typhoid in Southern and Eastern Africa. There is an urgent need to understand the reservoirs and transmission of disease and how to arrest this regional increase.

    Funded by: Wellcome Trust: 098051, 101113/Z/13/Z084, WT092152MA

    PLoS neglected tropical diseases 2015;9;4;e0003748

  • Three Epidemics of Invasive Multidrug-Resistant Salmonella Bloodstream Infection in Blantyre, Malawi, 1998-2014.

    Feasey NA, Masesa C, Jassi C, Faragher EB, Mallewa J, Mallewa M, MacLennan CA, Msefula C, Heyderman RS and Gordon MA

    Liverpool School of Tropical Medicine, United Kingdom; Malawi Liverpool Wellcome Trust Clinical Research Programme.

    Background: The Malawi Liverpool Wellcome Trust Clinical Research Programme (MLW) has routinely collected specimens for blood culture from febrile patients, and cerebrospinal fluid from patients with suspected meningitis, presenting to Queen Elizabeth Central Hospital (QECH), Blantyre, Malawi, since 1998.

    Methods: We present bloodstream infection (BSI) and meningitis surveillance data from 1998 to 2014. Automated blood culture, manual speciation, serotyping, and antimicrobial susceptibility testing were performed at MLW. Population data for minimum-incidence estimates in urban Blantyre were drawn from published estimates.

    Results: Between 1998 and 2014, 167,028 blood cultures were taken from adult and pediatric medical patients presenting to QECH; Salmonella Typhi was isolated on 2054 occasions (1.2%) and nontyphoidal Salmonella (NTS) serovars were isolated 10,139 times (6.1%), of which 8017 (79.1%) were Salmonella Typhimurium and 1608 (15.8%) were Salmonella Enteritidis. There were 392 cases of NTS meningitis and 9 cases of Salmonella Typhi meningitis. There have been 3 epidemics of Salmonella BSI in Blantyre; Salmonella Enteritidis from 1999 to 2002, Salmonella Typhimurium from 2002 to 2008, and Salmonella Typhi, which began in 2011 and was ongoing in 2014. Multidrug resistance has emerged in all 3 serovars and is seen in the overwhelming majority of isolates, while resistance to third-generation cephalosporins and fluoroquinolones is currently uncommon but has been identified.

    Conclusions: Invasive Salmonella disease in Malawi is dynamic and not clearly attributable to a single risk factor, although all 3 epidemics were associated with multidrug resistance. To inform nonvaccine and vaccine interventions, reservoirs of disease and modes of transmission require further investigation.

    Funded by: Wellcome Trust: 101113/Z/13/Z, WT092152MA

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2015;61 Suppl 4;S363-71

  • Partitioning heritability by functional annotation using genome-wide association summary statistics.

    Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh PR, Anttila V, Xu H, Zang C, Farh K, Ripke S, Day FR, ReproGen Consortium, Schizophrenia Working Group of the Psychiatric Genomics Consortium, RACI Consortium, Purcell S, Stahl E, Lindstrom S, Perry JR, Okada Y, Raychaudhuri S, Daly MJ, Patterson N, Neale BM and Price AL

    Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.

    Funded by: Medical Research Council: MC_U106179472, MC_UU_12015/2; NCI NIH HHS: R03 CA173785, R21 CA182821; NHGRI NIH HHS: R01 HG006399, U01 HG0070033; NIAMS NIH HHS: R01 AR063759; NIGMS NIH HHS: F32 GM106584, T32 GM007748, T32 GM007753; NIMH NIH HHS: R01 MH101244; Wellcome Trust: WT098051

    Nature genetics 2015;47;11;1228-35

  • The value of monitoring to control evolving populations.

    Fischer A, Vázquez-García I and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Populations can evolve to adapt to external changes. The capacity to evolve and adapt makes successful treatment of infectious diseases and cancer difficult. Indeed, therapy resistance has become a key challenge for global health. Therefore, ideas of how to control evolving populations to overcome this threat are valuable. Here we use the mathematical concepts of stochastic optimal control to study what is needed to control evolving populations. Following established routes to calculate control strategies, we first study how a polymorphism can be maintained in a finite population by adaptively tuning selection. We then introduce a minimal model of drug resistance in a stochastically evolving cancer cell population and compute adaptive therapies. When decisions are based in this manner on monitoring the response of the tumor, they can outperform established therapy paradigms. For both case studies, we demonstrate the importance of high-resolution monitoring of the target population to achieve a given control objective, thus quantifying the intuition that to control, one must monitor.

    Funded by: Wellcome Trust: 097678, 098051

    Proceedings of the National Academy of Sciences of the United States of America 2015;112;4;1007-12

  • Divergent mitochondrial respiratory chains in phototrophic relatives of apicomplexan parasites.

    Flegontov P, Michálek J, Janouškovec J, Lai DH, Jirků M, Hajdušková E, Tomčala A, Otto TD, Keeling PJ, Pain A, Oborník M and Lukeš J

    Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic; Life Science Research Centre, Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic

    Four respiratory complexes and ATP-synthase represent central functional units in mitochondria. In some mitochondria and derived anaerobic organelles, a few or all of these respiratory complexes have been lost during evolution. We show that the respiratory chain of Chromera velia, a phototrophic relative of parasitic apicomplexans, lacks complexes I and III, making it a uniquely reduced aerobic mitochondrion. In Chromera, putative lactate:cytochrome c oxidoreductases are predicted to transfer electrons from lactate to cytochrome c, rendering complex III unnecessary. The mitochondrial genome of Chromera has the smallest known protein-coding capacity of all mitochondria, encoding just cox1 and cox3 on heterogeneous linear molecules. In contrast, another photosynthetic relative of apicomplexans, Vitrella brassicaformis, retains the same set of genes as apicomplexans and dinoflagellates (cox1, cox3, and cob).

    Funded by: Canadian Institutes of Health Research

    Molecular biology and evolution 2015;32;5;1115-31

  • Genetics in PSC: what do the "risk genes" teach us?

    Folseraas T, Liaskou E, Anderson CA and Karlsen TH

    Norwegian PSC Research Center, Department of Transplantation Medicine, Division of Cancer Medicine, Surgery and Transplantation, Oslo University Hospital Rikshospitalet, 4950 Nydalen, 0424, Oslo, Norway.

    A role of genetics in primary sclerosing cholangitis (PSC) development is now firmly established. A total of 16 risk genes have been reported at highly robust ("genome-wide") significance levels, and ongoing efforts suggest that the list will ultimately be considerably longer. Importantly, this genetic risk pool so far accounts for less than 10% of an estimated overall PSC susceptibility. The relative importance of genetic versus environmental factors (including gene-gene and gene-environment interactions) in remaining aspects of PSC pathogenesis is unknown, and study designs other than genome-wide association studies are needed to explore these aspects. For some of the loci, e.g. HLA and FUT2, distinct interacting environmental factors may exist, and working from the genetic associations may prove one valid path for determining the specific nature of environmental triggers. So far the biological implications for PSC risk genes are typically merely hypothesized based on previously published literature, and there is therefore a strong need for dedicated translational studies to determine their roles within the specific disease context of PSC. Apparently, most risk loci seem to be involved in a subset of biological pathways for which genetic associations exist in a multitude of immune-mediated diseases, accounting for both inflammatory bowel disease as well as prototypical autoimmunity. In the present article, we will survey the current knowledge on PSC genetics with a particular emphasis on the pathophysiological insight potentially gained from genetic risk loci involved in this profound immunogenetic pleiotropy.

    Clinical reviews in allergy & immunology 2015;48;2-3;154-64

  • Interaction of Salmonella enterica Serovar Typhimurium with Intestinal Organoids Derived from Human Induced Pluripotent Stem Cells.

    Forbester JL, Goulding D, Vallier L, Hannan N, Hale C, Pickard D, Mukhopadhyay S and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    The intestinal mucosa forms the first line of defense against infections mediated by enteric pathogens such as salmonellae. Here we exploited intestinal "organoids" (iHOs) generated from human induced pluripotent stem cells (hIPSCs) to explore the interaction of Salmonella enterica serovar Typhimurium with iHOs. Imaging and RNA sequencing were used to analyze these interactions, and clear changes in transcriptional signatures were detected, including altered patterns of cytokine expression after the exposure of iHOs to bacteria. S. Typhimurium microinjected into the lumen of iHOs was able to invade the epithelial barrier, with many bacteria residing within Salmonella-containing vacuoles. An S. Typhimurium invA mutant defective in the Salmonella pathogenicity island 1 invasion apparatus was less capable of invading the iHO epithelium. Hence, we provide evidence that hIPSC-derived organoids are a promising model of the intestinal epithelium for assessing interactions with enteric pathogens.

    Funded by: Medical Research Council: G0701448, G0800784, G1000847; Wellcome Trust: 100891

    Infection and immunity 2015;83;7;2926-34

  • A flow cytometry-based method to simplify the analysis and quantification of protein association to chromatin in mammalian cells.

    Forment JV and Jackson SP

    [1] The Wellcome Trust/Cancer Research UK (CRUK) Gurdon Institute, University of Cambridge, Cambridge, UK. [2] Department of Biochemistry, University of Cambridge, Cambridge, UK. [3] , University of Cambridge, Cambridge, UK.

    Protein accumulation on chromatin has traditionally been studied using immunofluorescence microscopy or biochemical cellular fractionation followed by western immunoblot analysis. As a way to improve the reproducibility of this kind of analysis, to make it easier to quantify and to allow a streamlined application in high-throughput screens, we recently combined a classical immunofluorescence microscopy detection technique with flow cytometry. In addition to the features described above, and by combining it with detection of both DNA content and DNA replication, this method allows unequivocal and direct assignment of cell cycle distribution of protein association to chromatin without the need for cell culture synchronization. Furthermore, it is relatively quick (takes no more than a working day from sample collection to quantification), requires less starting material compared with standard biochemical fractionation methods and overcomes the need for flat, adherent cell types that are required for immunofluorescence microscopy.

    Funded by: Cancer Research UK: 11224, A11224, A14492, C6/A11224, C6946/A14492; European Research Council: 268536; Wellcome Trust: 092096, WT092096

    Nature protocols 2015;10;9;1297-307

  • When two is not enough: a CtIP tetramer is required for DNA repair by Homologous Recombination.

    Forment JV, Jackson SP and Pellegrini L

    The Gurdon Institute, University of Cambridge, Cambridge, UK.

    Homologous recombination (HR) is central to the repair of double-strand DNA breaks that occur in S/G2 phases of the cell cycle. HR relies on the CtIP protein (Ctp1 in fission yeast, Sae2 in budding yeast) for resection of DNA ends, a key step in generating the 3'-DNA overhangs that are required for the HR strand-exchange reaction. Although much has been learned about the biological importance of CtIP in DNA repair, our mechanistic insight into its molecular functions remains incomplete. It has been recently discovered that CtIP and Ctp1 share a conserved tetrameric architecture that is mediated by their N-terminal domains and is critical for their function in HR. The specific arrangement of protein chains in the CtIP/Ctp1 tetramer indicates that an ability to bridge DNA ends might be an important feature of CtIP/Ctp1 function, establishing an intriguing similarity with the known ability of the MRE11-RAD50-NBS1 complex to link DNA ends. Although the exact mechanism of action remains to be elucidated, the remarkable evolutionary conservation of CtIP/Ctp1 tetramerisation clearly points to its crucial role in HR.

    Funded by: Cancer Research UK: 11224; Wellcome Trust: 104641

    Nucleus (Austin, Tex.) 2015;6;5;344-8

  • Systematic discovery of probiotics.

    Forster SC and Lawley TD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Nature biotechnology 2015;33;1;47-9

  • HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    Forster SC, Browne HP, Kumar N, Hunt M, Denise H, Mitchell A, Finn RD and Lawley TD

    Host Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK; Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton 3168, Australia; Department of Molecular and Translational Sciences, Monash University, Clayton 3800, Australia

    The Human Pan-Microbe Communities (HPMC) database provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/M011755/1; Medical Research Council: 1091097; Wellcome Trust: 098051

    Nucleic acids research 2015;44;D1;D604-9

  • MicroRNA as Type I Interferon-Regulated Transcripts and Modulators of the Innate Immune Response.

    Forster SC, Tate MD and Hertzog PJ

    Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, VIC, Australia; Department of Molecular and Translational Sciences, Monash University, Clayton, VIC, Australia; Host-Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Hinxton, UK.

    Type I interferons (IFNs) are an important family of cytokines that regulate innate and adaptive immune responses to pathogens, in cancer and inflammatory diseases. While the regulation and role of protein-coding genes involved in these responses are well characterized, the role of non-coding microRNAs in the IFN responses is less well understood. We review the emerging picture of microRNA regulation of the IFN response at the transcriptional and post-transcriptional level. This response forms an important regulatory loop; several microRNAs target transcripts encoding components at many steps of the type I IFN response, both production and action, at the receptor, signaling, transcription factor, and regulated gene level. Not only do IFNs regulate positive signaling molecules but also negative regulators such as SOCS1. In total, 36 microRNAs are reported as IFN regulated. Given this apparent multipronged targeting of the IFN response by microRNAs and their well-characterized capacity to "buffer" responses in other situations, improved sequencing and microRNA targeting technologies will facilitate the elucidation of the broader regulatory networks of microRNA in this important biological context, and their therapeutic and diagnostic potential.

    Frontiers in immunology 2015;6;334

  • What role could organoids play in the personalization of cancer treatment?

    Francies HE and Garnett MJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Cancer treatments are increasingly being targeted to specific patient populations based on the molecular and genetic features of their tumor, so-called precision or personalized cancer medicine. Preclinical cancer models are essential tools for cancer research, but unfortunately our current models often fail to effectively represent patient tumors and can be poorly predictive of clinical responses. In this perspective, we discuss the use of new in vitro 3D cell models called 'organoids' as preclinical cancer models in the context of other commonly used models, namely cancer cell lines and patient-derived xenografts. We consider the relative strengths and limitations of each model, and discuss how organoid culture models could facilitate the personalization of cancer medicine.

    Funded by: Wellcome Trust: 102696

    Pharmacogenomics 2015;16;14;1523-6

  • TLR signaling modulates side effects of anticancer therapy in the small intestine.

    Frank M, Hennenberg EM, Eyking A, Rünzi M, Gerken G, Scott P, Parkhill J, Walker AW and Cario E

    Division of Gastroenterology and Hepatology, University Hospital of Essen, D-45147 Essen, Germany; Medical School, University of Duisburg-Essen, D-45122 Essen, Germany.

    Intestinal mucositis represents the most common complication of intensive chemotherapy, which has a severe adverse impact on quality of life of cancer patients. However, the precise pathophysiology remains to be clarified, and there is so far no successful therapeutic intervention. In this study, we investigated the role of innate immunity through TLR signaling in modulating genotoxic chemotherapy-induced small intestinal injury in vitro and in vivo. Genetic deletion of TLR2, but not MD-2, in mice resulted in severe chemotherapy-induced intestinal mucositis in the proximal jejunum with villous atrophy, accumulation of damaged DNA, CD11b(+)-myeloid cell infiltration, and significant gene alterations in xenobiotic metabolism, including a decrease in ABCB1/multidrug resistance (MDR)1 p-glycoprotein (p-gp) expression. Functionally, stimulation of TLR2 induced synthesis and drug efflux activity of ABCB1/MDR1 p-gp in murine and human CD11b(+)-myeloid cells, thus inhibiting chemotherapy-mediated cytotoxicity. Conversely, TLR2 activation failed to protect small intestinal tissues genetically deficient in MDR1A against DNA-damaging drug-induced apoptosis. Gut microbiota depletion by antibiotics led to increased susceptibility to chemotherapy-induced mucosal injury in wild-type mice, which was suppressed by administration of a TLR2 ligand, preserving ABCB1/MDR1 p-gp expression. Findings were confirmed in a preclinical model of human chemotherapy-induced intestinal mucositis using duodenal biopsies by demonstrating that TLR2 activation limited the toxic-inflammatory reaction and maintained assembly of the drug transporter p-gp. In conclusion, this study identifies a novel molecular link between innate immunity and xenobiotic metabolism. TLR2 acts as a central regulator of xenobiotic defense via the multidrug transporter ABCB1/MDR1 p-gp. Targeting TLR2 may represent a novel therapeutic approach in chemotherapy-induced intestinal mucositis.

    Funded by: Wellcome Trust: 098051

    Journal of immunology (Baltimore, Md. : 1950) 2015;194;4;1983-95

  • Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction.

    Frankish A, Uszczynska B, Ritchie GR, Gonzalez JM, Pervouchine D, Petryszak R, Mudge JM, Fonseca N, Brazma A, Guigo R and Harrow J

    Background: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.

    Results: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome.

    Conclusions: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.

    Funded by: NHGRI NIH HHS: U41 HG007000, U41 HG007234, U54 HG007004; Wellcome Trust: WT098051

    BMC genomics 2015;16 Suppl 8;S2

  • Ptpn22 and Cd2 Variations Are Associated with Altered Protein Expression and Susceptibility to Type 1 Diabetes in Nonobese Diabetic Mice.

    Fraser HI, Howlett S, Clark J, Rainbow DB, Stanford SM, Wu DJ, Hsieh YW, Maine CJ, Christensen M, Kuchroo V, Sherman LA, Podolin PL, Todd JA, Steward CA, Peterson LB, Bottini N and Wicker LS

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom.

    By congenic strain mapping using autoimmune NOD.C57BL/6J congenic mice, we demonstrated previously that the type 1 diabetes (T1D) protection associated with the insulin-dependent diabetes (Idd)10 locus on chromosome 3, originally identified by linkage analysis, was in fact due to three closely linked Idd loci: Idd10, Idd18.1, and Idd18.3. In this study, we define two additional Idd loci--Idd18.2 and Idd18.4--within the boundaries of this cluster of disease-associated genes. Idd18.2 is 1.31 Mb and contains 18 genes, including Ptpn22, which encodes a phosphatase that negatively regulates T and B cell signaling. The human ortholog of Ptpn22, PTPN22, is associated with numerous autoimmune diseases, including T1D. We, therefore, assessed Ptpn22 as a candidate for Idd18.2; resequencing of the NOD Ptpn22 allele revealed 183 single nucleotide polymorphisms with the C57BL/6J (B6) allele--6 exonic and 177 intronic. Functional studies showed higher expression of full-length Ptpn22 RNA and protein, and decreased TCR signaling in congenic strains with B6-derived Idd18.2 susceptibility alleles. The 953-kb Idd18.4 locus contains eight genes, including the candidate Cd2. The CD2 pathway is associated with the human autoimmune disease, multiple sclerosis, and mice with NOD-derived susceptibility alleles at Idd18.4 have lower CD2 expression on B cells. Furthermore, we observed that susceptibility alleles at Idd18.2 can mask the protection provided by Idd10/Cd101 or Idd18.1/Vav3 and Idd18.3. In summary, we describe two new T1D loci, Idd18.2 and Idd18.4, candidate genes within each region, and demonstrate the complex nature of genetic interactions underlying the development of T1D in the NOD mouse model.

    Funded by: NIAID NIH HHS: AI15416, N01 AI015416, P01 AI039671, P01AI039671, R01 AI070544, R01AI070544, U01 AI070351, U01AI070351; Wellcome Trust: 091157, 100140

    Journal of immunology (Baltimore, Md. : 1950) 2015;195;10;4841-52

  • Immunofluorescence Analysis and Diagnosis of Primary Ciliary Dyskinesia with Radial Spoke Defects.

    Frommer A, Hjeij R, Loges NT, Edelbusch C, Jahnke C, Raidt J, Werner C, Wallmeier J, Große-Onnebrink J, Olbrich H, Cindrić S, Jaspers M, Boon M, Memari Y, Durbin R, Kolb-Kokocinski A, Sauer S, Marthin JK, Nielsen KG, Amirav I, Elias N, Kerem E, Shoseyov D, Haeffner K and Omran H

    Department of General Pediatrics, University Children's Hospital Muenster, Muenster, Germany.

    Primary ciliary dyskinesia (PCD) is a genetically heterogeneous recessive disorder caused by several distinct defects in genes responsible for ciliary beating, leading to defective mucociliary clearance often associated with randomization of left/right body asymmetry. Individuals with PCD caused by defective radial spoke (RS) heads are difficult to diagnose owing to lack of gross ultrastructural defects and absence of situs inversus. Thus far, most mutations identified in human radial spoke genes (RSPH) are loss-of-function mutations, and missense variants have been rarely described. We studied the consequences of different RSPH9, RSPH4A, and RSPH1 mutations on the assembly of the RS complex to improve diagnostics in PCD. We report 21 individuals with PCD (16 families) with biallelic mutations in RSPH9, RSPH4A, and RSPH1, including seven novel mutations comprising missense variants, and performed high-resolution immunofluorescence analysis of human respiratory cilia. Missense variants are frequent genetic defects in PCD with RS defects. Absence of RSPH4A due to mutations in RSPH4A results in deficient axonemal assembly of the RS head components RSPH1 and RSPH9. RSPH1 mutant cilia, lacking RSPH1, fail to assemble RSPH9, whereas RSPH9 mutations result in axonemal absence of RSPH9, but do not affect the assembly of the other head proteins, RSPH1 and RSPH4A. Interestingly, our results were identical in individuals carrying loss-of-function mutations, missense variants, or one amino acid deletion. Immunofluorescence analysis can improve diagnosis of PCD in patients with loss-of-function mutations as well as missense variants. RSPH4A is the core protein of the RS head.

    American journal of respiratory cell and molecular biology 2015;53;4;563-73

  • Principles Governing A-to-I RNA Editing in the Breast Cancer Transcriptome.

    Fumagalli D, Gacquer D, Rothé F, Lefort A, Libert F, Brown D, Kheddoumi N, Shlien A, Konopka T, Salgado R, Larsimont D, Polyak K, Willard-Gallo K, Desmedt C, Piccart M, Abramowicz M, Campbell PJ, Sotiriou C and Detours V

    Breast Cancer Translational Research Laboratory, Jules Bordet Institute, Université Libre de Bruxelles (ULB), Boulevard de Waterloo, 125-1000 Brussels, Belgium.

    Little is known about how RNA editing operates in cancer. Transcriptome analysis of 68 normal and cancerous breast tissues revealed that the editing enzyme ADAR acts uniformly, on the same loci, across tissues. In controlled ADAR expression experiments, the editing frequency increased at all loci with ADAR expression levels according to the logistic model. Loci-specific "editabilities," i.e., propensities to be edited by ADAR, were quantifiable by fitting the logistic function to dose-response data. The editing frequency was increased in tumor cells in comparison to normal controls. Type I interferon response and ADAR DNA copy number together explained 53% of ADAR expression variance in breast cancers. ADAR silencing using small hairpin RNA lentivirus transduction in breast cancer cell lines led to less cell proliferation and more apoptosis. A-to-I editing is a pervasive, yet reproducible, source of variation that is globally controlled by 1q amplification and inflammation, both of which are highly prevalent among human cancers.

    Cell reports 2015;13;2;277-89
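
    The abstract above describes editing frequency rising with ADAR expression according to a logistic model, with per-locus parameters obtained by fitting the logistic function to dose-response data. The following is a minimal sketch of such a fit, not the authors' procedure. It assumes a two-parameter logistic that saturates at an editing frequency of 1, uses synthetic noise-free data, and fits by least squares on the logit scale. The names `logistic`, `fit_logistic`, `x0` and `k` are hypothetical.

```python
import math


def logistic(x, x0, k):
    """Predicted editing frequency at expression level x (assumed to saturate at 1)."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def fit_logistic(xs, freqs):
    """Estimate midpoint x0 and slope k by least squares on the logit scale.

    logit(f) = k * x - k * x0 is linear in x, so ordinary least squares on
    (x, logit(f)) recovers k as the slope and x0 from the intercept.
    Every frequency must lie strictly between 0 and 1.
    """
    ys = [math.log(f / (1.0 - f)) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    intercept = mean_y - k * mean_x
    x0 = -intercept / k
    return x0, k


# Synthetic dose-response data for one hypothetical locus (not from the paper).
expression = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
true_x0, true_k = 2.5, 1.2
frequency = [logistic(x, true_x0, true_k) for x in expression]

x0_hat, k_hat = fit_logistic(expression, frequency)
```

    With real, noisy measurements a nonlinear least-squares fit on the original scale would usually be preferred, since the logit transform inflates errors for frequencies near 0 or 1.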

  • CHD2 variants are a risk factor for photosensitivity in epilepsy.

    Galizia EC, Myers CT, Leu C, de Kovel CG, Afrikanova T, Cordero-Maldonado ML, Martins TG, Jacmin M, Drury S, Krishna Chinthapalli V, Muhle H, Pendziwiat M, Sander T, Ruppert AK, Møller RS, Thiele H, Krause R, Schubert J, Lehesjoki AE, Nürnberg P, Lerche H, EuroEPINOMICS CoGIE Consortium, Palotie A, Coppola A, Striano S, Gaudio LD, Boustred C, Schneider AL, Lench N, Jocic-Jakubi B, Covanis A, Capovilla G, Veggiotti P, Piccioli M, Parisi P, Cantonetti L, Sadleir LG, Mullen SA, Berkovic SF, Stephani U, Helbig I, Crawford AD, Esguerra CV, Kasteleijn-Nolst Trenité DG, Koeleman BP, Mefford HC, Scheffer IE and Sisodiya SM

    NIHR Biomedical Research Centre Department of Clinical and Experimental Epilepsy, UCL Institute of Neurology, National Hospital for Neurology and Neurosurgery, Queen Square, London, UK; Epilepsy Society, Bucks, UK.

    Photosensitivity is a heritable abnormal cortical response to flickering light, manifesting as particular electroencephalographic changes, with or without seizures. Photosensitivity is prominent in a very rare epileptic encephalopathy due to de novo CHD2 mutations, but is also seen in epileptic encephalopathies due to other gene mutations. We determined whether CHD2 variation underlies photosensitivity in common epilepsies, specific photosensitive epilepsies and individuals with photosensitivity without seizures. We studied 580 individuals with epilepsy and either photosensitive seizures or abnormal photoparoxysmal response on electroencephalography, or both, and 55 individuals with photoparoxysmal response but no seizures. We compared CHD2 sequence data to publicly available data from 34 427 individuals, not enriched for epilepsy. We investigated the role of unique variants seen only once in the entire data set. We sought CHD2 variants in 238 exomes from familial genetic generalized epilepsies, and in other public exome data sets. We identified 11 unique variants in the 580 individuals with photosensitive epilepsies and 128 unique variants in the 34 427 controls: unique CHD2 variation is over-represented in cases overall (P = 2.17 × 10(-5)). Among epilepsy syndromes, there was over-representation of unique CHD2 variants (3/36 cases) in the archetypal photosensitive epilepsy syndrome, eyelid myoclonia with absences (P = 3.50 × 10(-4)). CHD2 variation was not over-represented in photoparoxysmal response without seizures. Zebrafish larvae with chd2 knockdown were tested for photosensitivity. Chd2 knockdown markedly enhanced mild innate zebrafish larval photosensitivity. CHD2 mutation is the first identified cause of the archetypal generalized photosensitive epilepsy syndrome, eyelid myoclonia with absences. Unique CHD2 variants are also associated with photosensitivity in common epilepsies. CHD2 does not encode an ion channel, opening new avenues for research into human cortical excitability.

    Funded by: NICHD NIH HHS: U54 HD083091; NINDS NIH HHS: R56 NS069605, R56NS69605; Wellcome Trust: 084730

    Brain : a journal of neurology 2015;138;Pt 5;1198-207

  • Bugs full of viruses.

    Gall A

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2015;13;5;253

  • Viral fossils.

    Gall A

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    This month's Genome Watch examines how the increased availability of mammalian genomes provides new insights into the interactions of endogenous retroviruses with other viruses and various hosts.

    Nature reviews. Microbiology 2015;14;2;66

  • Recurrent ETNK1 mutations in atypical chronic myeloid leukemia.

    Gambacorti-Passerini CB, Donadoni C, Parmiani A, Pirola A, Redaelli S, Signore G, Piazza V, Malcovati L, Fontana D, Spinelli R, Magistroni V, Gaipa G, Peronaci M, Morotti A, Panuzzo C, Saglio G, Usala E, Kim DW, Rea D, Zervakis K, Viniou N, Symeonidis A, Becker H, Boultwood J, Campiotti L, Carrabba M, Elli E, Bignell GR, Papaemmanuil E, Campbell PJ, Cazzola M and Piazza R

    Department of Health Sciences, University of Milano-Bicocca, Monza, Italy; Hematology and Clinical Research Unit, San Gerardo Hospital, Monza, Italy.

    Despite the recent identification of recurrent SETBP1 mutations in atypical chronic myeloid leukemia (aCML), a complete description of the somatic lesions responsible for the onset of this disorder is still lacking. To find additional somatic abnormalities in aCML, we performed whole-exome sequencing on 15 aCML cases. In 2 cases (13.3%), we identified somatic missense mutations in the ETNK1 gene. Targeted resequencing on 515 hematological clonal disorders revealed the presence of ETNK1 variants in 6 (8.8%) of 68 aCML and 2 (2.6%) of 77 chronic myelomonocytic leukemia samples. These mutations clustered in a small region of the kinase domain, encoding for H243Y and N244S (1/8 H243Y; 7/8 N244S). They were all heterozygous and present in the dominant clone. The intracellular phosphoethanolamine/phosphocholine ratio was, on average, 5.2-fold lower in ETNK1-mutated samples (P < .05). Similar results were obtained using myeloid TF1 cells transduced with ETNK1 wild type, ETNK1-N244S, and ETNK1-H243Y, where the intracellular phosphoethanolamine/phosphocholine ratio was significantly lower in ETNK1-N244S (0.76 ± 0.07) and ETNK1-H243Y (0.37 ± 0.02) than in ETNK1-WT (1.37 ± 0.32; P = .01 and P = .0008, respectively), suggesting that ETNK1 mutations may inhibit the catalytic activity of the enzyme. In summary, our study shows for the first time the evidence of recurrent somatic ETNK1 mutations in the context of myeloproliferative/myelodysplastic disorders.

    Funded by: Wellcome Trust: 088340

    Blood 2015;125;3;499-503

  • The structure of an endogenous Drosophila centromere reveals the prevalence of tandemly repeated sequences able to form i-motifs.

    Garavís M, Méndez-Lago M, Gabelica V, Whitehead SL, González C and Villasante A

    Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, Nicolás Cabrera 1, 28049 Madrid, Spain.

    Centromeres are the chromosomal loci at which spindle microtubules attach to mediate chromosome segregation during mitosis and meiosis. In most eukaryotes, centromeres are made up of highly repetitive DNA sequences (satellite DNA) interspersed with middle repetitive DNA sequences (transposable elements). Despite the efforts to establish complete genomic sequences of eukaryotic organisms, the so-called 'finished' genomes are not actually complete because the centromeres have not been assembled due to the intrinsic difficulties in constructing both physical maps and complete sequence assemblies of long stretches of tandemly repetitive DNA. Here we show the first molecular structure of an endogenous Drosophila centromere and the ability of the C-rich dodeca satellite strand to form dimeric i-motifs. The finding of i-motif structures in simple and complex centromeric satellite DNAs leads us to suggest that these centromeric sequences may have been selected not by their primary sequence but by their ability to form noncanonical secondary structures.

    Scientific reports 2015;5;13307

  • Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci.

    Gaulton KJ, Ferreira T, Lee Y, Raimondo A, Mägi R, Reschen ME, Mahajan A, Locke A, Rayner NW, Robertson N, Scott RA, Prokopenko I, Scott LJ, Green T, Sparso T, Thuillier D, Yengo L, Grallert H, Wahl S, Frånberg M, Strawbridge RJ, Kestler H, Chheda H, Eisele L, Gustafsson S, Steinthorsdottir V, Thorleifsson G, Qi L, Karssen LC, van Leeuwen EM, Willems SM, Li M, Chen H, Fuchsberger C, Kwan P, Ma C, Linderman M, Lu Y, Thomsen SK, Rundle JK, Beer NL, van de Bunt M, Chalisey A, Kang HM, Voight BF, Abecasis GR, Almgren P, Baldassarre D, Balkau B, Benediktsson R, Blüher M, Boeing H, Bonnycastle LL, Bottinger EP, Burtt NP, Carey J, Charpentier G, Chines PS, Cornelis MC, Couper DJ, Crenshaw AT, van Dam RM, Doney AS, Dorkhan M, Edkins S, Eriksson JG, Esko T, Eury E, Fadista J, Flannick J, Fontanillas P, Fox C, Franks PW, Gertow K, Gieger C, Gigante B, Gottesman O, Grant GB, Grarup N, Groves CJ, Hassinen M, Have CT, Herder C, Holmen OL, Hreidarsson AB, Humphries SE, Hunter DJ, Jackson AU, Jonsson A, Jørgensen ME, Jørgensen T, Kao WH, Kerrison ND, Kinnunen L, Klopp N, Kong A, Kovacs P, Kraft P, Kravic J, Langford C, Leander K, Liang L, Lichtner P, Lindgren CM, Lindholm E, Linneberg A, Liu CT, Lobbens S, Luan J, Lyssenko V, Männistö S, McLeod O, Meyer J, Mihailov E, Mirza G, Mühleisen TW, Müller-Nurasyid M, Navarro C, Nöthen MM, Oskolkov NN, Owen KR, Palli D, Pechlivanis S, Peltonen L, Perry JR, Platou CG, Roden M, Ruderfer D, Rybin D, van der Schouw YT, Sennblad B, Sigurðsson G, Stančáková A, Steinbach G, Storm P, Strauch K, Stringham HM, Sun Q, Thorand B, Tikkanen E, Tonjes A, Trakalo J, Tremoli E, Tuomi T, Wennauer R, Wiltshire S, Wood AR, Zeggini E, Dunham I, Birney E, Pasquali L, Ferrer J, Loos RJ, Dupuis J, Florez JC, Boerwinkle E, Pankow JS, van Duijn C, Sijbrands E, Meigs JB, Hu FB, Thorsteinsdottir U, Stefansson K, Lakka TA, Rauramaa R, Stumvoll M, Pedersen NL, Lind L, Keinanen-Kiukaanniemi SM, Korpi-Hyövälti E, Saaristo TE, Saltevo J, Kuusisto J, Laakso M, Metspalu A, Erbel R, Jöcke KH, Moebus S, Ripatti S, Salomaa V, Ingelsson E, Boehm BO, Bergman RN, Collins FS, Mohlke KL, Koistinen H, Tuomilehto J, Hveem K, Njølstad I, Deloukas P, Donnelly PJ, Frayling TM, Hattersley AT, de Faire U, Hamsten A, Illig T, Peters A, Cauchi S, Sladek R, Froguel P, Hansen T, Pedersen O, Morris AD, Palmer CN, Kathiresan S, Melander O, Nilsson PM, Groop LC, Barroso I, Langenberg C, Wareham NJ, O'Callaghan CA, Gloyn AL, Altshuler D, Boehnke M, Teslovich TM, McCarthy MI, Morris AP and DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    We performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in or near KCNQ1. 'Credible sets' of the variants most likely to drive each distinct signal mapped predominantly to noncoding sequence, implying that association with T2D is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine mapping implicated rs10830963 as driving T2D association. We confirmed that the T2D risk allele for this SNP increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease.

    Funded by: British Heart Foundation: PG/11/4/28645, RG/08/008/25291, RG/14/5/30893; Intramural NIH HHS: Z01 HG000024-13; Medical Research Council: G0000649, G0601261, G116/165, MC_U106179471, MC_U106179472, MC_UU_12015/1, MC_UU_12015/2, MR/L02036X/1; NCI NIH HHS: CA055075, P01 CA055075; NCRR NIH HHS: UL1 RR025005, UL1RR025005; NHGRI NIH HHS: 1Z01HG000024, N01HG65403, U01 HG004399, U01 HG004402, U01HG004399, U01HG004402, Z01 HG000024; NHLBI NIH HHS: HHSN268201100005C, HHSN268201100005G, HHSN268201100005I, HHSN268201100006C, HHSN268201100007C, HHSN268201100007I, HHSN268201100008C, HHSN268201100008I, HHSN268201100009C, HHSN268201100009I, HHSN268201100010C, HHSN268201100011C, HHSN268201100011I, HHSN268201100012C, N01HC25195, N02HL64278, R01 HL059367, R01 HL086694, R01 HL087641, R01HL086694, R01HL087641, R01HL59367; NIA NIH HHS: AG028555, AG04563, AG08724, AG08861, AG10175, R01 AG010175, R01 AG028555; NIDDK NIH HHS: DK085545, DK098032, DK58845, K24 DK080140, K24DK080140, P30 DK020572, R01 DK058845, R01 DK062370, R01 DK072193, R01 DK073490, R01 DK078616, R01 DK093757, R01 DK098032, R01 DK101478, R01DK062370, R01DK072193, R01DK073490, R01DK078616, U01 DK062370, U01 DK078616, U01 DK085526, U01 DK085545, U01DK085526; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; WHI NIH HHS: N01 HC025195, N01 HG065403, N02 HL64278; Wellcome Trust: 072960, 076113, 083270, 083948, 086596, 090367, 090532, 095101, 098017, 098051, 098381, 098395, 101033, GR072960

    Nature genetics 2015;47;12;1415-25
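
    The abstract above refers to 'credible sets' of the variants most likely to drive each association signal. A common way to build such a set, sketched here under stated assumptions rather than as the study's own pipeline, is to rank variants by posterior probability of being causal and keep adding them until the cumulative probability reaches a chosen coverage. The function name `credible_set`, the variant labels, the probabilities and the 0.95 default coverage are all hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest list of variants whose posteriors sum to at least `coverage`.

    `posteriors` maps a variant label to its posterior probability of driving
    the signal; values are renormalised so that they sum to 1.
    """
    total = sum(posteriors.values())
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    cumulative = 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return chosen


# Made-up posterior probabilities for one hypothetical association signal.
example = {
    "variant_a": 0.62,
    "variant_b": 0.30,
    "variant_c": 0.05,
    "variant_d": 0.02,
    "variant_e": 0.01,
}

set_95 = credible_set(example, coverage=0.95)
set_90 = credible_set(example, coverage=0.90)
```

    A smaller credible set indicates that fine mapping has narrowed the signal to fewer candidate variants. In this toy example the 90% set holds two variants and the 95% set holds three.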

  • A modified R-type bacteriocin specifically targeting Clostridium difficile prevents colonization of mice without affecting gut microbiota diversity.

    Gebhart D, Lok S, Clare S, Tomas M, Stares M, Scholl D, Donskey CJ, Lawley TD and Govoni GR

    AvidBiotics Corp., South San Francisco, California, USA.

    Clostridium difficile is a leading cause of nosocomial infections worldwide and has become an urgent public health threat requiring immediate attention. Epidemic lineages of the BI/NAP1/027 strain type have emerged and spread through health care systems across the globe over the past decade. Limiting person-to-person transmission and eradicating C. difficile, especially the BI/NAP1/027 strain type, from health care facilities are difficult due to the abundant shedding of spores that are impervious to most interventions. Effective prophylaxis for C. difficile infection (CDI) is lacking. We have genetically modified a contractile R-type bacteriocin ("diffocin") from C. difficile strain CD4 to kill BI/NAP1/027-type strains for this purpose. The natural receptor binding protein (RBP) responsible for diffocin targeting was replaced with a newly discovered RBP identified within a prophage of a BI/NAP1/027-type target strain by genome mining. The resulting modified diffocins (a.k.a. Avidocin-CDs), Av-CD291.1 and Av-CD291.2, were stable and killed all 16 tested BI/NAP1/027-type strains. Av-CD291.2 administered in drinking water survived passage through the mouse gastrointestinal (GI) tract, did not detectably alter the mouse gut microbiota or disrupt natural colonization resistance to C. difficile or the vancomycin-resistant Enterococcus faecium (VREF), and prevented antibiotic-induced colonization of mice inoculated with BI/NAP1/027-type spores. Given the high incidence and virulence of the pathogen, preventing colonization by BI/NAP1/027-type strains and limiting their transmission could significantly reduce the occurrence of the most severe CDIs. This modified diffocin represents a prototype of an Avidocin-CD platform capable of producing targetable, precision anti-C. difficile agents that can prevent and potentially treat CDIs without disrupting protective indigenous microbiota.

    Importance: Treatment and prevention strategies for bacterial diseases rely heavily on traditional antibiotics, which impose strong selection for resistance and disrupt protective microbiota. One consequence has been an upsurge of opportunistic pathogens, such as Clostridium difficile, that exploit antibiotic-induced disruptions in gut microbiota to proliferate and cause life-threatening diseases. We have developed alternative agents that utilize contractile bactericidal protein complexes (R-type bacteriocins) to kill specific C. difficile pathogens. Efficacy in a preclinical animal study indicates these molecules warrant further development as potential prophylactic agents to prevent C. difficile infections in humans. Since these agents do not detectably alter the indigenous gut microbiota or colonization resistance in mice, we believe they will be safe to administer as a prophylactic to block transmission in high-risk environments without rendering patients susceptible to enteric infection after cessation of treatment.

    Funded by: NIAID NIH HHS: 1R43 AI098186; Wellcome Trust: WT 098051

    mBio 2015;6;2

  • An interactive genome browser of association results from the UK10K cohorts project.

    Geihs M, Yan Y, Walter K, Huang J, Memari Y, Min JL, Mead D, UK10K Consortium, Hubbard TJ, Timpson NJ, Down TA and Soranzo N

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton CB10 1HH, UK.

    High-throughput sequencing technologies survey genetic variation at genome scale and are increasingly used to study the contribution of rare and low-frequency genetic variants to human traits. As part of the Cohorts arm of the UK10K project, genetic variants called from low-read depth (average 7×) whole genome sequencing of 3621 cohort individuals were analysed for statistical associations with 64 different phenotypic traits of biomedical importance. Here, we describe a novel genome browser based on the Biodalliance platform developed to provide interactive access to the association results of the project.

    Availability and implementation: The browser is available online. Source code for the Biodalliance platform is available under a BSD license; source code for the LD-display plugin and backend is also available.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K015427/1; Medical Research Council: MC_PC_15018, MC_UU_12013/1, MC_UU_12013/3; Wellcome Trust: 102215, WT091310, WT098051

    Bioinformatics (Oxford, England) 2015;31;24;4029-31

  • Meta-analysis of Genome-wide Association Studies for Neuroticism, and the Polygenic Association With Major Depressive Disorder.

    Genetics of Personality Consortium, de Moor MH, van den Berg SM, Verweij KJ, Krueger RF, Luciano M, Arias Vasquez A, Matteson LK, Derringer J, Esko T, Amin N, Gordon SD, Hansell NK, Hart AB, Seppälä I, Huffman JE, Konte B, Lahti J, Lee M, Miller M, Nutile T, Tanaka T, Teumer A, Viktorin A, Wedenoja J, Abecasis GR, Adkins DE, Agrawal A, Allik J, Appel K, Bigdeli TB, Busonero F, Campbell H, Costa PT, Davey Smith G, Davies G, de Wit H, Ding J, Engelhardt BE, Eriksson JG, Fedko IO, Ferrucci L, Franke B, Giegling I, Grucza R, Hartmann AM, Heath AC, Heinonen K, Henders AK, Homuth G, Hottenga JJ, Iacono WG, Janzing J, Jokela M, Karlsson R, Kemp JP, Kirkpatrick MG, Latvala A, Lehtimäki T, Liewald DC, Madden PA, Magri C, Magnusson PK, Marten J, Maschio A, Medland SE, Mihailov E, Milaneschi Y, Montgomery GW, Nauck M, Ouwens KG, Palotie A, Pettersson E, Polasek O, Qian Y, Pulkki-Råback L, Raitakari OT, Realo A, Rose RJ, Ruggiero D, Schmidt CO, Slutske WS, Sorice R, Starr JM, St Pourcain B, Sutin AR, Timpson NJ, Trochet H, Vermeulen S, Vuoksimaa E, Widen E, Wouda J, Wright MJ, Zgaga L, Porteous D, Minelli A, Palmer AA, Rujescu D, Ciullo M, Hayward C, Rudan I, Metspalu A, Kaprio J, Deary IJ, Räikkönen K, Wilson JF, Keltikangas-Järvinen L, Bierut LJ, Hettema JM, Grabe HJ, van Duijn CM, Evans DM, Schlessinger D, Pedersen NL, Terracciano A, McGue M, Penninx BW, Martin NG and Boomsma DI

    Department of Clinical Child and Family Studies, VU University Amsterdam, Amsterdam, the Netherlands; Department of Methods, VU University Amsterdam, Amsterdam, the Netherlands; Department of Biological Psychology, VU University Amsterdam, Amsterdam, the Ne.

    Importance: Neuroticism is a pervasive risk factor for psychiatric conditions. It genetically overlaps with major depressive disorder (MDD) and is therefore an important phenotype for psychiatric genetics. The Genetics of Personality Consortium has created a resource for genome-wide association analyses of personality traits in more than 63,000 participants (including MDD cases).

    Objectives: To identify genetic variants associated with neuroticism by performing a meta-analysis of genome-wide association results based on 1000 Genomes imputation; to evaluate whether common genetic variants as assessed by single-nucleotide polymorphisms (SNPs) explain variation in neuroticism by estimating SNP-based heritability; and to examine whether SNPs that predict neuroticism also predict MDD.

    Design, setting, and participants: Genome-wide association meta-analysis of 30 cohorts with genome-wide genotype, personality, and MDD data from the Genetics of Personality Consortium. The study included 63,661 participants from 29 discovery cohorts and 9786 participants from a replication cohort. Participants came from Europe, the United States, or Australia. Analyses were conducted between 2012 and 2014.

    Main outcomes and measures: Neuroticism scores harmonized across all 29 discovery cohorts by item response theory analysis, and clinical MDD case-control status in 2 of the cohorts.

    Results: A genome-wide significant SNP was found on 3p14 in MAGI1 (rs35855737; P = 9.26 × 10(-9) in the discovery meta-analysis). This association was not replicated (P = .32), but the SNP was still genome-wide significant in the meta-analysis of all 30 cohorts (P = 2.38 × 10(-8)). Common genetic variants explain 15% of the variance in neuroticism. Polygenic scores based on the meta-analysis of neuroticism in 27 cohorts significantly predicted neuroticism (1.09 × 10(-12) < P < .05) and MDD (4.02 × 10(-9) < P < .05) in the 2 other cohorts.

    Conclusions and relevance: This study identifies a novel locus for neuroticism. The variant is located in a known gene that has been associated with bipolar disorder and schizophrenia in previous studies. In addition, the study shows that neuroticism is influenced by many genetic variants of small effect that are either common or tagged by common variants. These genetic variants also influence MDD. Future studies should confirm the role of the MAGI1 locus for neuroticism and further investigate the association of MAGI1 and the polygenic association to a range of other psychiatric disorders that are phenotypically correlated with neuroticism.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Chief Scientist Office: CZB/4/505, CZD/16/6/4, ETM/55; Medical Research Council: G0700704, MC_PC_15018, MC_PC_U127561128, MC_UU_12013/3, MR/K026992/1; NIDA NIH HHS: R01 DA013240, R01 DA024417, R01 DA036216, R37 DA005147; Wellcome Trust: 104036

    JAMA psychiatry 2015;72;7;642-50

  • Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes.

    Gerstung M, Pellagatti A, Malcovati L, Giagounidis A, Porta MG, Jädersten M, Dolatshad H, Verma A, Cross NC, Vyas P, Killick S, Hellström-Lindberg E, Cazzola M, Papaemmanuil E, Campbell PJ and Boultwood J

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Cancer is a genetic disease, but two patients rarely have identical genotypes. Similarly, patients differ in their clinicopathological parameters, but how genotypic and phenotypic heterogeneity are interconnected is not well understood. Here we build statistical models to disentangle the effect of 12 recurrently mutated genes and 4 cytogenetic alterations on gene expression, diagnostic clinical variables and outcome in 124 patients with myelodysplastic syndromes. Overall, one or more genetic lesions correlate with expression levels of ~20% of all genes, explaining 20-65% of observed expression variability. Differential expression patterns vary between mutations and reflect the underlying biology, such as aberrant polycomb repression for ASXL1 and EZH2 mutations or perturbed gene dosage for copy-number changes. In predicting survival, genomic, transcriptomic and diagnostic clinical variables all have utility, with the largest contribution from the transcriptome. Similar observations are made on the TCGA acute myeloid leukaemia cohort, confirming the general trends reported here.

    Funded by: NCI NIH HHS: P30 CA013330; Wellcome Trust: 077012/Z/05/Z, 088340, WT088340MA

    Nature communications 2015;6;5901

  • Genetic susceptibility to invasive Salmonella disease.

    Gilchrist JJ, MacLennan CA and Hill AV

    Wellcome Trust Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford OX3 7BN, UK.

    Invasive Salmonella disease, in the form of enteric fever and invasive non-typhoidal Salmonella (iNTS) disease, causes substantial morbidity and mortality in children and adults in the developing world. The study of genetic variations in humans and mice that influence susceptibility of the host to Salmonella infection provides important insights into immunity to Salmonella. In this Review, we discuss data that have helped to elucidate the host genetic determinants of human enteric fever and iNTS disease, alongside data from the mouse model of Salmonella infection. Considered together, these studies provide a detailed picture of the immunobiology of human invasive Salmonella disease.

    Nature reviews. Immunology 2015;15;7;452-63

  • Combinations of PARP Inhibitors with Temozolomide Drive PARP1 Trapping and Apoptosis in Ewing's Sarcoma.

    Gill SJ, Travers J, Pshenichnaya I, Kogera FA, Barthorpe S, Mironenko T, Richardson L, Benes CH, Stratton MR, McDermott U, Jackson SP and Garnett MJ

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Ewing's sarcoma is a malignant pediatric bone tumor with a poor prognosis for patients with metastatic or recurrent disease. Ewing's sarcoma cells are acutely hypersensitive to poly (ADP-ribose) polymerase (PARP) inhibition and this is being evaluated in clinical trials, although the mechanism of hypersensitivity has not been directly addressed. PARP inhibitors have efficacy in tumors with BRCA1/2 mutations, which confer deficiency in DNA double-strand break (DSB) repair by homologous recombination (HR). This drives dependence on PARP1/2 due to their function in DNA single-strand break (SSB) repair. PARP inhibitors are also cytotoxic through inhibiting PARP1/2 auto-PARylation, blocking PARP1/2 release from substrate DNA. Here, we show that PARP inhibitor sensitivity in Ewing's sarcoma cells is not through an apparent defect in DNA repair by HR, but through hypersensitivity to trapped PARP1-DNA complexes. This drives accumulation of DNA damage during replication, ultimately leading to apoptosis. We also show that the activity of PARP inhibitors is potentiated by temozolomide in Ewing's sarcoma cells and is associated with enhanced trapping of PARP1-DNA complexes. Furthermore, through mining of large-scale drug sensitivity datasets, we identify a subset of glioma, neuroblastoma and melanoma cell lines as hypersensitive to the combination of temozolomide and PARP inhibition, potentially identifying new avenues for therapeutic intervention. These data provide insights into the anti-cancer activity of PARP inhibitors with implications for the design of treatment for Ewing's sarcoma patients with PARP inhibitors.

    Funded by: Cancer Research UK: 11224, C6946/A14492; Wellcome Trust: WT092096

    PloS one 2015;10;10;e0140988

  • The epigenetic regulators CBP and p300 facilitate leukemogenesis and represent therapeutic targets in acute myeloid leukemia.

    Giotopoulos G, Chan WI, Horton SJ, Ruau D, Gallipoli P, Fowler A, Crawley C, Papaemmanuil E, Campbell PJ, Göttgens B, Van Deursen JM, Cole PA and Huntly BJ

    [1] Department of Haematology, Cambridge Institute for Medical Research and Addenbrookes Hospital, University of Cambridge, Cambridge, UK [2] Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK.

    Growing evidence links abnormal epigenetic control to the development of hematological malignancies. Accordingly, inhibition of epigenetic regulators is emerging as a promising therapeutic strategy. The acetylation status of lysine residues in histone tails is one of a number of epigenetic post-translational modifications that alter DNA-templated processes, such as transcription, to facilitate malignant transformation. Although histone deacetylases are already being clinically targeted, the role of histone lysine acetyltransferases (KAT) in malignancy is less well characterized. We chose to study this question in the context of acute myeloid leukemia (AML), where, using in vitro and in vivo genetic ablation and knockdown experiments in murine models, we demonstrate a role for the epigenetic regulators CBP and p300 in the induction and maintenance of AML. Furthermore, using selective small molecule inhibitors of their lysine acetyltransferase activity, we validate CBP/p300 as therapeutic targets in vitro across a wide range of human AML subtypes. We proceed to show that growth retardation occurs through the induction of transcriptional changes that induce apoptosis and cell-cycle arrest in leukemia cells and finally demonstrate the efficacy of the KAT inhibitors in decreasing clonogenic growth of primary AML patient samples. Taken together, these data suggest that CBP/p300 are promising therapeutic targets across multiple subtypes in AML. Oncogene advance online publication, 20 April 2015; doi:10.1038/onc.2015.92.

    Oncogene 2015

  • A novel mouse model identifies cooperating mutations and therapeutic targets critical for chronic myeloid leukemia progression.

    Giotopoulos G, van der Weyden L, Osaki H, Rust AG, Gallipoli P, Meduri E, Horton SJ, Chan WI, Foster D, Prinjha RK, Pimanda JE, Tenen DG, Vassiliou GS, Koschmieder S, Adams DJ and Huntly BJ

    Department of Haematology, Cambridge Institute for Medical Research and Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 0XY, England, UK; Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 1TN, England, UK.

    The introduction of highly selective ABL-tyrosine kinase inhibitors (TKIs) has revolutionized therapy for chronic myeloid leukemia (CML). However, TKIs are only efficacious in the chronic phase of the disease and effective therapies for TKI-refractory CML, or after progression to blast crisis (BC), are lacking. Whereas the chronic phase of CML is dependent on BCR-ABL, additional mutations are required for progression to BC. However, the identity of these mutations and the pathways they affect are poorly understood, hampering our ability to identify therapeutic targets and improve outcomes. Here, we describe a novel mouse model that allows identification of mechanisms of BC progression in an unbiased and tractable manner, using transposon-based insertional mutagenesis on the background of chronic phase CML. Our BC model is the first to faithfully recapitulate the phenotype, cellular and molecular biology of human CML progression. We report a heterogeneous and unique pattern of insertions identifying known and novel candidate genes and demonstrate that these pathways drive disease progression and provide potential targets for novel therapeutic strategies. Our model greatly informs the biology of CML progression and provides a potent resource for the development of candidate therapies to improve the dismal outcomes in this highly aggressive disease.

    Funded by: Cancer Research UK: 13031; Medical Research Council: MR/M010392/1; Wellcome Trust: 095663; Worldwide Cancer Research: 14-1069

    The Journal of experimental medicine 2015;212;10;1551-69

  • Genetic stability of pneumococcal isolates during 35 days of human experimental carriage.

    Gladstone RA, Gritzfeld JF, Coupland P, Gordon SB and Bentley SD

    Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, UK.

    Background: Pneumococcal carriage is a reservoir for transmission and a precursor to pneumococcal disease. The experimental human pneumococcal carriage model provides a useful tool to aid vaccine licensure through the measurement of vaccine efficacy against carriage (VEcol). Documentation of the genetic stability of the experimental human pneumococcal carriage model is important to further strengthen confidence in its safety and conclusions, enabling it to further facilitate vaccine licensure through providing evidence of VEcol.

    Methods: 229 isolates were sequenced from 10 volunteers in whom experimental human pneumococcal carriage was established, sampled over a period of 35 days. Multiple isolates from within a single volunteer at a single time provided a deep resolution for detecting variation. HiSeq data from the isolates were mapped against a PacBio reference of the inoculum to call variable sites.

    Results: The observed variation between experimental carriage isolates was minimal with the maximum SNP distance between any isolate and the reference being 3 SNPs.

    Conclusion: The low-level variation described provides evidence for the stability of the experimental human pneumococcal carriage model over 35 days, which can be reliably and confidently used to measure VEcol and aid future progression of pneumococcal vaccination.

    Funded by: Medical Research Council: MR/M011569/1; Wellcome Trust: 098051

    Vaccine 2015;33;29;3342-5

  • Five winters of pneumococcal serotype replacement in UK carriage following PCV introduction.

    Gladstone RA, Jefferies JM, Tocheva AS, Beard KR, Garley D, Chong WW, Bentley SD, Faust SN and Clarke SC

    Faculty of Medicine and Institute for Life Sciences, University of Southampton, UK.

    The seven-valent pneumococcal conjugate vaccine (PCV7) was added to the UK national immunisation programme in September 2006. PCV13 replaced PCV7 in April 2010. As carriage precedes disease cases this study collected carried pneumococci from children each winter from 2006/7 to 2010/11 over PCV introduction. Conventional microbiology and whole genome sequencing were utilised to characterise pneumococcal strains. Overall prevalence of pneumococcal carriage remained stable. Vaccine serotypes (VT) decreased (p<0.0001) with concomitant increases in non-vaccine serotypes (NVT). In winter 2010/11 only one isolate of PCV7 VT was observed (6B). PCV13 unique VTs decreased between winters immediately preceding and following PCV13 introduction (p=0.04). Significant decreases for VTs 6B, 19F, 23F (PCV7) and 6A (PCV13) and increases for NVT 21, 23B, 33F and 35F were detected. The serotype replacement was accompanied by parallel changes in genotype prevalence for associated sequence types with clonal expansion contributing to replacement. By winter 2010/11, serotype coverage of PCV7 and PCV13 was 1% and 11% respectively. VT replacement was observed for PCV7 and PCV13 serotypes. Conjugate vaccine design and use requires continuous monitoring and revision.

    Vaccine 2015;33;17;2015-21

  • Whole Genome Sequencing Shows a Low Proportion of Tuberculosis Disease Is Attributable to Known Close Contacts in Rural Malawi.

    Glynn JR, Guerra-Assunção JA, Houben RM, Sichali L, Mzembe T, Mwaungulu LK, Mwaungulu JN, McNerney R, Khan P, Parkhill J, Crampin AC and Clark TG

    Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, United Kingdom.

    Background: The proportion of tuberculosis attributable to transmission from close contacts is not well known. Comparison of the genome of strains from index patients and prior contacts allows transmission to be confirmed or excluded.

    Methods: In Karonga District, Malawi, all tuberculosis patients are asked about prior contact with others with tuberculosis. All available strains from culture-positive patients were sequenced. Up to 10 single nucleotide polymorphisms between index patients and their prior contacts were allowed for confirmation, and ≥ 100 for exclusion. The population attributable fraction was estimated from the proportion of confirmed transmissions and the proportion of patients with contacts.

    Results: From 1997-2010 there were 1907 new culture-confirmed tuberculosis patients, of whom 32% reported at least one family contact and an additional 11% had at least one other contact; 60% of contacts had smear-positive disease. Among case-contact pairs with sequences available, transmission was confirmed from 38% (62/163) smear-positive prior contacts and 0/17 smear-negative prior contacts. Confirmed transmission was more common in those related to the prior contact (42.4%, 56/132) than in non-relatives (19.4%, 6/31, p = 0.02), and in those with more intense contact, to younger index cases, and in more recent years. The proportion of tuberculosis attributable to known contacts was estimated to be 9.4% overall.

    Conclusions: In this population known contacts only explained a small proportion of tuberculosis cases. Even those with a prior family contact with smear positive tuberculosis were more likely to have acquired their infection elsewhere.

    Funded by: Wellcome Trust: 096249/Z/11/B

    PloS one 2015;10;7;e0132840

  • Monoclonal Antibodies of a Diverse Isotype Induced by an O-Antigen Glycoconjugate Vaccine Mediate In Vitro and In Vivo Killing of African Invasive Nontyphoidal Salmonella.

    Goh YS, Clare S, Micoli F, Saul A, Mastroeni P and MacLennan CA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Singapore Immunology Network, Agency for Science, Technology and Research, Biopolis, Singapore.

    Nontyphoidal Salmonella (NTS), particularly Salmonella enterica serovars Typhimurium and Enteritidis, is responsible for a major global burden of invasive disease with high associated case-fatality rates. We recently reported the development of a candidate O-antigen-CRM197 glycoconjugate vaccine against S. Typhimurium. Here, using a panel of mouse monoclonal antibodies generated by the vaccine, we examined the relative efficiency of different antibody isotypes specific for the O:4 antigen of S. Typhimurium to effect in vitro and in vivo killing of the invasive African S. Typhimurium strain D23580. All O:4-specific antibody isotypes could mediate cell-free killing and phagocytosis of S. Typhimurium by mouse blood cells. Opsonization of Salmonella with O:4-specific IgA, IgG1, IgG2a, and IgG2b, but not IgM, resulted in cell-dependent bacterial killing. At high concentrations, O:4-specific antibodies inhibited both cell-free complement-mediated and cell-dependent opsonophagocytic killing of S. Typhimurium in vitro. Using passive immunization in mice, the O:4-specific antibodies provided in vivo functional activity by decreasing the bacterial load in the blood and tissues, with IgG2a and IgG2b being the most effective isotypes. In conclusion, an O-antigen-CRM197 glycoconjugate vaccine can induce O-antigen-specific antibodies of different isotypes that exert in vitro and in vivo killing of S. Typhimurium.

    Funded by: Medical Research Council: G0001245; Wellcome Trust

    Infection and immunity 2015;83;9;3722-31

  • A genome-scale vector resource enables high-throughput reverse genetic screening in a malaria parasite.

    Gomes AR, Bushell E, Schwach F, Girling G, Anar B, Quail MA, Herd C, Pfander C, Modrzynska K, Rayner JC and Billker O

    Wellcome Trust Sanger Institute, Hinxton Cambridge CB10 1SA, UK.

    The genome-wide identification of gene functions in malaria parasites is hampered by a lack of reverse genetic screening methods. We present a large-scale resource of barcoded vectors with long homology arms for effective modification of the Plasmodium berghei genome. Cotransfecting dozens of vectors into the haploid blood stages creates complex pools of barcoded mutants, whose competitive fitness can be measured during infection of a single mouse using barcode sequencing (barseq). To validate the utility of this resource, we rescreen the P. berghei kinome, using published kinome screens for comparison. We find that several protein kinases function redundantly in asexual blood stages and confirm the targetability of kinases cdpk1, gsk3, tkl3, and PBANKA_082960 by genotyping cloned mutants. Thus, parallel phenotyping of barcoded mutants unlocks the power of reverse genetic screening for a malaria parasite and will enable the systematic identification of genes essential for in vivo parasite growth and transmission.

    Funded by: Medical Research Council: G0501670; Wellcome Trust: 098051

    Cell host & microbe 2015;17;3;404-413

  • Novel loci associated with usual sleep duration: the CHARGE Consortium Genome-Wide Association Study.

    Gottlieb DJ, Hek K, Chen TH, Watson NF, Eiriksdottir G, Byrne EM, Cornelis M, Warby SC, Bandinelli S, Cherkas L, Evans DS, Grabe HJ, Lahti J, Li M, Lehtimäki T, Lumley T, Marciante KD, Pérusse L, Psaty BM, Robbins J, Tranah GJ, Vink JM, Wilk JB, Stafford JM, Bellis C, Biffar R, Bouchard C, Cade B, Curhan GC, Eriksson JG, Ewert R, Ferrucci L, Fülöp T, Gehrman PR, Goodloe R, Harris TB, Heath AC, Hernandez D, Hofman A, Hottenga JJ, Hunter DJ, Jensen MK, Johnson AD, Kähönen M, Kao L, Kraft P, Larkin EK, Lauderdale DS, Luik AI, Medici M, Montgomery GW, Palotie A, Patel SR, Pistis G, Porcu E, Quaye L, Raitakari O, Redline S, Rimm EB, Rotter JI, Smith AV, Spector TD, Teumer A, Uitterlinden AG, Vohl MC, Widen E, Willemsen G, Young T, Zhang X, Liu Y, Blangero J, Boomsma DI, Gudnason V, Hu F, Mangino M, Martin NG, O'Connor GT, Stone KL, Tanaka T, Viikari J, Gharib SA, Punjabi NM, Räikkönen K, Völzke H, Mignot E and Tiemeier H

    VA Boston Healthcare System, Boston, MA, USA.

    Usual sleep duration is a heritable trait correlated with psychiatric morbidity, cardiometabolic disease and mortality, although little is known about the genetic variants influencing this trait. A genome-wide association study (GWAS) of usual sleep duration was conducted using 18 population-based cohorts totaling 47 180 individuals of European ancestry. Genome-wide significant association was identified at two loci. The strongest is located on chromosome 2, in an intergenic region 35- to 80-kb upstream from the thyroid-specific transcription factor PAX8 (lowest P=1.1 × 10(-9)). This finding was replicated in an African-American sample of 4771 individuals (lowest P=9.3 × 10(-4)). The strongest combined association was at rs1823125 (P=1.5 × 10(-10), minor allele frequency 0.26 in the discovery sample, 0.12 in the replication sample), with each copy of the minor allele associated with a sleep duration 3.1 min longer per night. The alleles associated with longer sleep duration were associated in previous GWAS with a more favorable metabolic profile and a lower risk of attention deficit hyperactivity disorder. Understanding the mechanisms underlying these associations may help elucidate biological mechanisms influencing sleep duration and its association with psychiatric, metabolic and cardiovascular disease.

    Funded by: CCR NIH HHS: RC2ARO58973; Canadian Institutes of Health Research: FRN-CCT-83028; Intramural NIH HHS; NCATS NIH HHS: UL1 TR000124, UL1TR000124; NCI NIH HHS: CA055075, CA087969, CA40356, CA98233; NCRR NIH HHS: 1UL1RR025011, UL1 RR024140, UL1RR025005, UL1RR033176; NHGRI NIH HHS: HG004399, U01HG004402; NHLBI NIH HHS: HL080295, HL087652, HL105756, HL35464, N01 HC025195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-55222, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85086, N01-HC-85239, N01‐HC‐25195, N02‐HL‐6‐4278, R01 HL070837, R01 HL070838, R01 HL070839, R01 HL070841, R01 HL070842, R01 HL070847, R01 HL070848, R01 HL071194, R01 HL113338, R01HL086694, R01HL087641, R01HL59367, R01HL62252; NIA NIH HHS: 1R01AG030474-01A1, 1R01AG032098-01A1, AG023629, N01-AG-62101, N01-AG-62103, N01-AG-62106, R01 AG005394, R01 AG005407, R01 AG026720, R01 AG027576, R01AG027574, U01 AG18197, U01-AG027810; NIAAA NIH HHS: AA07535, AA10249, AA11998, AA13320, AA13321, AA13326, AA14041; NIAMS NIH HHS: R01 AR35582, R01 AR35583, R01 AR35584, U01 AR066160, U01 AR45580, U01 AR45583, U01 AR45614, U01 AR45632, U01 AR45647, U01 AR45654; NIDDK NIH HHS: DK058845, DK063491, DK070756, P30 DK063491; NIMH NIH HHS: 1RC2 MH089995-01, 1RC2MH089951-01, MH66206, U24 MH068457-06; NINDS NIH HHS: NS23724; PHS HHS: R01D0042157-01A; Wellcome Trust: 079771, 099194, WT089062

    Molecular psychiatry 2015;20;10;1232-9

  • High-density mapping of the MHC identifies a shared role for HLA-DRB1*01:03 in inflammatory bowel diseases and heterozygous advantage in ulcerative colitis.

    Goyette P, Boucher G, Mallon D, Ellinghaus E, Jostins L, Huang H, Ripke S, Gusareva ES, Annese V, Hauser SL, Oksenberg JR, Thomsen I, Leslie S, International Inflammatory Bowel Disease Genetics Consortium, Australia and New Zealand IBDGC, Belgium IBD Genetics Consortium, Italian Group for IBD Genetic Consortium, NIDDK Inflammatory Bowel Disease Genetics Consortium, United Kingdom IBDGC, Wellcome Trust Case Control Consortium, Quebec IBD Genetics Consortium, Daly MJ, Van Steen K, Duerr RH, Barrett JC, McGovern DP, Schumm LP, Traherne JA, Carrington MN, Kosmoliaptsis V, Karlsen TH, Franke A and Rioux JD

    Research Center, Montreal Heart Institute, Montreal, Quebec, Canada.

    Genome-wide association studies of the related chronic inflammatory bowel diseases (IBD) known as Crohn's disease and ulcerative colitis have shown strong evidence of association to the major histocompatibility complex (MHC). This region encodes a large number of immunological candidates, including the antigen-presenting classical human leukocyte antigen (HLA) molecules. Studies in IBD have indicated that multiple independent associations exist at HLA and non-HLA genes, but they have lacked the statistical power to define the architecture of association and causal alleles. To address this, we performed high-density SNP typing of the MHC in >32,000 individuals with IBD, implicating multiple HLA alleles, with a primary role for HLA-DRB1*01:03 in both Crohn's disease and ulcerative colitis. Noteworthy differences were observed between these diseases, including a predominant role for class II HLA variants and heterozygous advantage observed in ulcerative colitis, suggesting an important role of the adaptive immune response in the colonic environment in the pathogenesis of IBD.

    Funded by: AHRQ HHS: HS021747, R01 HS021747; CCR NIH HHS: HHSN261200800001C; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Department of Health: NIHR-RP-R3-12-026; Medical Research Council: G0600329, G0800675; NCATS NIH HHS: UL1 TR000005; NCI NIH HHS: CA141743, HHSN261200800001E, R01 CA141743; NIAID NIH HHS: U01 AI067068, U19 AI067152; NIDCR NIH HHS: U54 DE023789, U54 DE023789-01; NIDDK NIH HHS: P01 DK046763, P01 DK046763-19, P30 DK043351, R01 DK064869, U01 DK062413, U01 DK062420, U01 DK062423, U01 DK062429, U01 DK062429-14, U01 DK062431, U01 DK062432; NINDS NIH HHS: R01 NS049477; PHS HHS: 1U19 A1067152, HHSN261200800001E; Wellcome Trust: 102974, WT098051

    Nature genetics 2015;47;2;172-9

  • Genome-wide mapping of Hif-1α binding sites in zebrafish.

    Greenald D, Jeyakani J, Pelster B, Sealy I, Mathavan S and van Eeden FJ

    Bateson Centre, Department of Biomedical Science, The University of Sheffield, Western Bank, Sheffield, UK.

    Background: Hypoxia Inducible Factor (HIF) regulates a cascade of transcriptional events in response to decreased oxygenation, acting from the cellular to the physiological level. This response is evolutionarily conserved, allowing the use of zebrafish (Danio rerio) as a model for studying the hypoxic response. Activation of the hypoxic response can be achieved in zebrafish by homozygous null mutation of the von Hippel-Lindau (vhl) tumour suppressor gene. Previous work from our lab has focused on the phenotypic characterisation of this mutant, establishing the links between vhl mutation, the hypoxic response and cancer. To further develop fish as a model for studying hypoxic signalling, we examine the transcriptional profile of the vhl mutant with respect to Hif-1α. As our approach uses embryos consisting of many cell types, it has the potential to uncover additional HIF regulated genes that have escaped detection in analogous mammalian cell culture studies.

    Results: We performed high-density oligonucleotide microarray analysis of the gene expression changes in von Hippel-Lindau mutant zebrafish, which identified up-regulation of well-known hypoxia response genes and down-regulation of genes primarily involved in lipid processing. To identify the dependency of these transcriptional changes on HIF, we undertook Chromatin Immunoprecipitation linked next generation sequencing (ChIP-seq) for the transcription factor Hypoxia Inducible Factor 1α (HIF-1α). We identified HIF-1α binding sites across the genome, with binding sites showing enrichment for an RCGTG motif, showing conservation with the mammalian hypoxia response element.

    Conclusions: Transcriptome analysis of vhl mutant embryos detected activation of key hypoxia response genes seen in human cell models of hypoxia, but also suppression of many genes primarily involved in lipid processing. ChIP-seq analysis of Hif-1α binding sites unveiled an unprecedented number of loci, with a high proportion containing a canonical hypoxia response element. Whether these sites are functional remains unknown, nevertheless their frequent location near transcriptional start sites suggests functionality, and will allow for investigation into the potential hypoxic regulation of genes in their vicinity. We expect that our data will be an excellent starting point for analysis of both fish and mammalian gene regulation by HIF.

    BMC genomics 2015;16;923

  • Automatic concept recognition using the human phenotype ontology reference and test suite corpora.

    Groza T, Köhler S, Doelken S, Collier N, Oellrich A, Smedley D, Couto FM, Baynam G, Zankl A and Robinson PN

    School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, National Institute of Informatics, Hitotsubashi, Tokyo, Japan, Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal, Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia, School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia, Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia, Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia, Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia, Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany.

    Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suites. The gold standard and test suites corpora are available from Database URL:

    Funded by: NHGRI NIH HHS: 1 U54 HG006370-01, U54 HG006370; Wellcome Trust: 098051

    Database : the journal of biological databases and curation 2015;2015

  • Targeted Next-Generation Sequencing Analysis of 1,000 Individuals with Intellectual Disability.

    Grozeva D, Carss K, Spasic-Boskovic O, Tejada MI, Gecz J, Shaw M, Corbett M, Haan E, Thompson E, Friend K, Hussain Z, Hackett A, Field M, Renieri A, Stevenson R, Schwartz C, Floyd JA, Bentham J, Cosgrove C, Keavney B, Bhattacharya S, Italian X-linked Mental Retardation Project, UK10K Consortium, GOLD Consortium, Hurles M and Raymond FL

    Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, United Kingdom.

    To identify genetic causes of intellectual disability (ID), we screened a cohort of 986 individuals with moderate to severe ID for variants in 565 known or candidate ID-associated genes using targeted next-generation sequencing. Likely pathogenic rare variants were found in ∼11% of the cases (113 variants in 107/986 individuals: ∼8% of the individuals had a likely pathogenic loss-of-function [LoF] variant, whereas ∼3% had a known pathogenic missense variant). Variants in SETD5, ATRX, CUL4B, MECP2, and ARID1B were the most common causes of ID. This study assessed the value of sequencing a cohort of probands to provide a molecular diagnosis of ID, without the availability of DNA from both parents for de novo sequence analysis. This modeling is clinically relevant as 28% of all UK families with dependent children are single parent households. In conclusion, to diagnose patients with ID in the absence of parental DNA, we recommend investigation of all LoF variants in known genes that cause ID and assessment of a limited list of proven pathogenic missense variants in these genes. This will provide 11% additional diagnostic yield beyond the 10%-15% yield from array CGH alone.

    Funded by: Wellcome Trust: 100140, WT091310

    Human mutation 2015;36;12;1197-204

  • Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area.

    Guerra-Assunção JA, Crampin AC, Houben RM, Mzembe T, Mallard K, Coll F, Khan P, Banda L, Chiwaya A, Pereira RP, McNerney R, Fine PE, Parkhill J, Clark TG and Glynn JR

    Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom.

    To improve understanding of the factors influencing tuberculosis transmission and the role of pathogen variation, we sequenced all available specimens from patients diagnosed over 15 years in a whole district in Malawi. Mycobacterium tuberculosis lineages were assigned and transmission networks constructed, allowing ≤10 single nucleotide polymorphisms (SNPs) difference. We defined disease as due to recent infection if the network-determined source was within 5 years, and assessed transmissibility from forward transmissions resulting in disease. High-quality sequences were available for 1687 disease episodes (72% of all culture-positive episodes): 66% of patients linked to at least one other patient. The between-patient mutation rate was 0.26 SNPs/year (95% CI 0.21-0.31). We showed striking differences by lineage in the proportion of disease due to recent transmission and in transmissibility (highest for lineage-2 and lowest for lineage-1) that were not confounded by immigration, HIV status or drug resistance. Transmissions resulting in disease decreased markedly over time.

    Funded by: Wellcome Trust: 096249/Z/11/B, 098610, 100714, 101594

    eLife 2015;4

  • Recurrence due to relapse or reinfection with Mycobacterium tuberculosis: a whole-genome sequencing approach in a large, population-based cohort with a high HIV infection prevalence and active follow-up.

    Guerra-Assunção JA, Houben RM, Crampin AC, Mzembe T, Mallard K, Coll F, Khan P, Banda L, Chiwaya A, Pereira RP, McNerney R, Harris D, Parkhill J, Clark TG and Glynn JR

    Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine.

    Background: Recurrent tuberculosis is a major health burden and may be due to relapse with the original strain or reinfection with a new strain.

    Methods: In a population-based study in northern Malawi, patients with tuberculosis diagnosed from 1996 to 2010 were actively followed after the end of treatment. Whole-genome sequencing with approximately 100-fold coverage was performed on all available cultures. Results of IS6110 restriction fragment-length polymorphism analyses were available for cultures performed up to 2008.

    Results: Based on our data, a difference of ≤10 single-nucleotide polymorphisms (SNPs) was used to define relapse, and a difference of >100 SNPs was used to define reinfection. There was no evidence of mixed infections among those classified as reinfections. Of 1471 patients, 139 had laboratory-confirmed recurrences: 55 had relapse, and 20 had reinfection; for 64, the type of recurrence was unclassified. Almost all relapses occurred in the first 2 years. Human immunodeficiency virus infection was associated with reinfection but not relapse. Relapses were associated with isoniazid resistance, treatment before 2007, and lineage-3 strains. We identified several gene variants associated with relapse. Lineage-2 (Beijing) was overrepresented and lineage-1 underrepresented among the reinfecting strains (P = .004).

    Conclusions: While some of the factors determining recurrence depend on the patient and their treatment, differences in the Mycobacterium tuberculosis genome appear to have a role in both relapse and reinfection.

    Funded by: Wellcome Trust: 096249/Z/11/B

    The Journal of infectious diseases 2015;211;7;1154-63

  • The evolutionary history of lethal metastatic prostate cancer.

    Gundem G, Van Loo P, Kremeyer B, Alexandrov LB, Tubio JMC, Papaemmanuil E, Brewer DS, Kallio HML, Högnäs G, Annala M, Kivinummi K, Goody V, Latimer C, O'Meara S, Dawson KJ, Isaacs W, Emmert-Buck MR, Nykter M, Foster C, Kote-Jarai Z, Easton D, Whitaker HC, ICGC Prostate Group, Neal DE, Cooper CS, Eeles RA, Visakorpi T, Campbell PJ, McDermott U, Wedge DC and Bova GS

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    Cancers emerge from an ongoing Darwinian evolutionary process, often leading to multiple competing subclones within a single primary tumour. This evolutionary process culminates in the formation of metastases, which is the cause of 90% of cancer-related deaths. However, despite its clinical importance, little is known about the principles governing the dissemination of cancer cells to distant organs. Although the hypothesis that each metastasis originates from a single tumour cell is generally supported, recent studies using mouse models of cancer demonstrated the existence of polyclonal seeding from and interclonal cooperation between multiple subclones. Here we sought definitive evidence for the existence of polyclonal seeding in human malignancy and to establish the clonal relationship among different metastases in the context of androgen-deprived metastatic prostate cancer. Using whole-genome sequencing, we characterized multiple metastases arising from prostate tumours in ten patients. Integrated analyses of subclonal architecture revealed the patterns of metastatic spread in unprecedented detail. Metastasis-to-metastasis spread was found to be common, either through de novo monoclonal seeding of daughter metastases or, in five cases, through the transfer of multiple tumour clones between metastatic sites. Lesions affecting tumour suppressor genes usually occur as single events, whereas mutations in genes involved in androgen receptor signalling commonly involve multiple, convergent events in different metastases. Our results elucidate in detail the complex patterns of metastatic spread and further our understanding of the development of resistance to androgen-deprivation therapy in prostate cancer.

    Funded by: Cancer Research UK: A12758, A14835; Intramural NIH HHS; NCI NIH HHS: CA92234, R01 CA092234; Wellcome Trust: 077012

    Nature 2015;520;7547;353-357

  • Fine mapping in the MHC region accounts for 18% additional genetic risk for celiac disease.

    Gutierrez-Achury J, Zhernakova A, Pulit SL, Trynka G, Hunt KA, Romanos J, Raychaudhuri S, van Heel DA, Wijmenga C and de Bakker PI

    Department of Genetics, University Medical Center, University of Groningen, Groningen, the Netherlands.

    Although dietary gluten is the trigger for celiac disease, risk is strongly influenced by genetic variation in the major histocompatibility complex (MHC) region. We fine mapped the MHC association signal to identify additional risk factors independent of the HLA-DQA1 and HLA-DQB1 alleles and observed five new associations that account for 18% of the genetic risk. Taking these new loci together with the 57 known non-MHC loci, genetic variation can now explain up to 48% of celiac disease heritability.

    Funded by: NIAMS NIH HHS: 1R01AR062886, R01 AR062886, R01 AR063759, UH2 AR067677; NIGMS NIH HHS: U01 GM092691

    Nature genetics 2015;47;6;577-8

  • Deficiency of ECHS1 causes mitochondrial encephalopathy with cardiac involvement.

    Haack TB, Jackson CB, Murayama K, Kremer LS, Schaller A, Kotzaeridou U, de Vries MC, Schottmann G, Santra S, Büchner B, Wieland T, Graf E, Freisinger P, Eggimann S, Ohtake A, Okazaki Y, Kohda M, Kishita Y, Tokuzawa Y, Sauer S, Memari Y, Kolb-Kokocinski A, Durbin R, Hasselmann O, Cremer K, Albrecht B, Wieczorek D, Engels H, Hahn D, Zink AM, Alston CL, Taylor RW, Rodenburg RJ, Trollmann R, Sperl W, Strom TM, Hoffmann GF, Mayr JA, Meitinger T, Bolognini R, Schuelke M, Nuoffer JM, Kölker S, Prokisch H and Klopstock T

    Institute of Human Genetics, Technische Universität München 81675, Munich, Germany; Institute of Human Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health 85764, Neuherberg, Germany.

    Objective: Short-chain enoyl-CoA hydratase (ECHS1) is a multifunctional mitochondrial matrix enzyme that is involved in the oxidation of fatty acids and essential amino acids such as valine. Here, we describe the broad phenotypic spectrum and pathobiochemistry of individuals with autosomal-recessive ECHS1 deficiency.

    Methods: Using exome sequencing, we identified ten unrelated individuals carrying compound heterozygous or homozygous mutations in ECHS1. Functional investigations in patient-derived fibroblast cell lines included immunoblotting, enzyme activity measurement, and a palmitate loading assay.

    Results: Patients showed a heterogeneous phenotype with disease onset in the first year of life and course ranging from neonatal death to survival into adulthood. The most prominent clinical features were encephalopathy (10/10), deafness (9/9), epilepsy (6/9), optic atrophy (6/10), and cardiomyopathy (4/10). Serum lactate was elevated and brain magnetic resonance imaging showed white matter changes or a Leigh-like pattern resembling disorders of mitochondrial energy metabolism. Analysis of patients' fibroblast cell lines (6/10) provided further evidence for the pathogenicity of the respective mutations by showing reduced ECHS1 protein levels and reduced 2-enoyl-CoA hydratase activity. While serum acylcarnitine profiles were largely normal, in vitro palmitate loading of patient fibroblasts revealed increased butyrylcarnitine, unmasking the functional defect in mitochondrial β-oxidation of short-chain fatty acids. Urinary excretion of 2-methyl-2,3-dihydroxybutyrate - a potential derivative of acryloyl-CoA in the valine catabolic pathway - was significantly increased, indicating impaired valine oxidation.

    Interpretation: In conclusion, we define the phenotypic spectrum of a new syndrome caused by ECHS1 deficiency. We speculate that both the β-oxidation defect and the block in l-valine metabolism, with accumulation of toxic methacrylyl-CoA and acryloyl-CoA, contribute to the disorder that may be amenable to metabolic treatment approaches.

    Funded by: Department of Health: NIHR-HCS-D12-03-04; Medical Research Council: MR/K000608/1; Wellcome Trust: 096919

    Annals of clinical and translational neurology 2015;2;5;492-509

  • Genetic evidence for an origin of the Armenians from Bronze Age mixing of multiple populations.

    Haber M, Mezzavilla M, Xue Y, Comas D, Gasparini P, Zalloua P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    The Armenians are a culturally isolated population who historically inhabited a region in the Near East bounded by the Mediterranean and Black seas and the Caucasus, but remain under-represented in genetic studies and have a complex history including a major geographic displacement during World War I. Here, we analyse genome-wide variation in 173 Armenians and compare them with 78 other worldwide populations. We find that Armenians form a distinctive cluster linking the Near East, Europe, and the Caucasus. We show that Armenian diversity can be explained by several mixtures of Eurasian populations that occurred between ~3000 and ~2000 BCE, a period characterized by major population migrations after the domestication of the horse, appearance of chariots, and the rise of advanced civilizations in the Near East. However, genetic signals of population mixture cease after ~1200 BCE when Bronze Age civilizations in the Eastern Mediterranean world suddenly and violently collapsed. Armenians have since remained isolated and genetic structure within the population developed ~500 years ago when Armenia was divided between the Ottomans and the Safavid Empire in Iran. Finally, we show that Armenians have higher genetic affinity to Neolithic Europeans than other present-day Near Easterners, and that 29% of Armenian ancestry may originate from an ancestral population that is best represented by Neolithic Europeans.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2015;24;6;931-6

  • Detection of livestock-associated meticillin-resistant Staphylococcus aureus CC398 in retail pork, United Kingdom, February 2015.

    Hadjirin NF, Lay EM, Paterson GK, Harrison EM, Peacock SJ, Parkhill J, Zadoks RN and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    Livestock-associated meticillin-resistant Staphylococcus aureus belonging to clonal complex 398 (LA-MRSA CC398) is an important cause of zoonotic infections in many countries. Here, we describe the isolation of LA-MRSA CC398 from retail meat samples of United Kingdom (UK) farm origin. Our findings indicate that this lineage is probably established in UK pig farms and demonstrate a potential pathway for the transmission of LA-MRSA CC398 from livestock to humans in the UK.

    Funded by: Medical Research Council: G1001787, G1001787/1; Wellcome Trust: 079643

    Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin 2015;20;24

  • Disease insights through cross-species phenotype comparisons.

    Haendel MA, Vasilevsky N, Brush M, Hochheiser HS, Jacobsen J, Oellrich A, Mungall CJ, Washington N, Köhler S, Lewis SE, Robinson PN and Smedley D

    University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, OR, USA.

    New sequencing technologies have ushered in a new era for diagnosis and discovery of new causative mutations for rare diseases. However, the sheer numbers of candidate variants that require interpretation in an exome or genomic analysis are still a challenging prospect. A powerful approach is the comparison of the patient's set of phenotypes (phenotypic profile) to known phenotypic profiles caused by mutations in orthologous genes associated with these variants. The most abundant source of relevant data for this task is available through the efforts of the Mouse Genome Informatics group and the International Mouse Phenotyping Consortium. In this review, we highlight the challenges in comparing human clinical phenotypes with mouse phenotypes and some of the solutions that have been developed by members of the Monarch Initiative. These tools allow the identification of mouse models for known disease-gene associations that may otherwise have been overlooked, as well as the prioritization of candidate genes for novel associations. The culmination of these efforts is the Exomiser software package that allows clinical researchers to analyse patient exomes in the context of variant frequency and predicted pathogenicity as well as the phenotypic similarity of the patient to any given candidate orthologous gene.

    Funded by: NHGRI NIH HHS: 1 U54 HG006370-01; NIH HHS: #5R24OD011883, R24 OD011883; Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 2015;26;9-10;548-55

  • Draft Genome Sequence of the Streptococcus pneumoniae Avery Strain A66.

    Hahn C, Harrison EM, Parkhill J, Holmes MA and Paterson GK

    School of Biological, Biomedical and Environmental Sciences, University of Hull, Hull, United Kingdom.

    We have used HiSeq 2000 technology to generate a draft genome sequence of Streptococcus pneumoniae strain A66. This is a common study strain used in investigations of pneumococcal bacterium-host interactions and was used in the seminal genetic studies of Avery et al.

    Genome announcements 2015;3;3

  • Induced pluripotent stem cell derived macrophages as a cellular system to study salmonella and other pathogens.

    Hale C, Yeung A, Goulding D, Pickard D, Alasoo K, Powrie F, Dougan G and Mukhopadhyay S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    A number of pathogens, including several human-restricted organisms, persist and replicate within macrophages (Mφs) as a key step in pathogenesis. The mechanisms underpinning such host-restricted intracellular adaptations are poorly understood, in part, due to a lack of appropriate model systems. Here we explore the potential of human induced pluripotent stem cell derived macrophages (iPSDMs) to study such pathogen interactions. We show iPSDMs express a panel of established Mφ-specific markers, produce cytokines, and polarise into classical and alternative activation states in response to IFN-γ and IL-4 stimulation, respectively. iPSDMs also efficiently phagocytosed inactivated bacterial particles as well as live Salmonella Typhi and S. Typhimurium and were able to kill these pathogens. We conclude that iPSDMs can support productive Salmonella infection and propose this as a flexible system to study host/pathogen interactions. Furthermore, iPSDMs can provide a flexible and practical cellular platform for assessing host responses in multiple genetic backgrounds.

    Funded by: Wellcome Trust: 095688

    PloS one 2015;10;5;e0124307

  • Generation of Distal Airway Epithelium from Multipotent Human Foregut Stem Cells.

    Hannan NR, Sampaziotis F, Segeritz CP, Hanley NA and Vallier L

    Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, Wellcome Trust-Medical Research Council Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom.

    Collectively, lung diseases are one of the largest causes of premature death worldwide and represent a major focus in the field of regenerative medicine. Despite significant progress, only a few stem cell platforms are currently available for cell-based therapy, disease modeling, and drug screening in the context of pulmonary disorders. Human foregut stem cells (hFSCs) represent an advantageous progenitor cell type that can be used to amplify large quantities of cells for regenerative medicine applications and can be derived from any human pluripotent stem cell line. Here, we further demonstrate the application of hFSCs by generating a near homogeneous population of early pulmonary endoderm cells coexpressing NKX2.1 and FOXP2. These progenitors are then able to form cells that are representative of distal airway epithelium that express NKX2.1, GATA6, and cystic fibrosis transmembrane conductance regulator (CFTR) and secrete SFTPC. This culture system can be applied to hFSCs carrying the CFTR mutation ΔF508, enabling the development of an in vitro model for cystic fibrosis. This platform is compatible with drug screening and functional validations of small molecules, which can reverse the phenotype associated with CFTR mutation. This is the first demonstration that multipotent endoderm stem cells can differentiate not only into both liver and pancreatic cells but also into lung endoderm. Furthermore, our study establishes a new approach for the generation of functional lung cells that can be used for disease modeling as well as for drug screening and the study of lung development.

    Funded by: Medical Research Council; Wellcome Trust: WT088566, WT097820

    Stem cells and development 2015;24;14;1680-90

  • As Clear as Mud? Determining the Diversity and Prevalence of Prophages in the Draft Genomes of Estuarine Isolates of Clostridium difficile.

    Hargreaves KR, Otieno JR, Thanki A, Blades MJ, Millard AD, Browne HP, Lawley TD and Clokie MR

    Department of Infection, Immunity and Inflammation, University of Leicester, United Kingdom; Department of Ecology and Evolutionary Biology, University of Arizona.

    The bacterium Clostridium difficile is a significant cause of nosocomial infections worldwide. The pathogenic success of this organism can be attributed to its flexible genome which is characterized by the exchange of mobile genetic elements, and by ongoing genome evolution. Despite its pathogenic status, C. difficile can also be carried asymptomatically, and has been isolated from natural environments such as water and sediments where multiple strain types (ribotypes) are found in close proximity. These include ribotypes which are associated with disease, as well as those that are less commonly isolated from patients. Little is known about the genomic content of strains in such reservoirs in the natural environment. In this study, draft genomes have been generated for 13 C. difficile isolates from estuarine sediments including clinically relevant and environmental associated types. To identify the genetic diversity within this strain collection, whole-genome comparisons were performed using the assemblies. The strains are highly genetically diverse with regards to the C. difficile "mobilome," which includes transposons and prophage elements. We identified a novel transposon-like element in two R078 isolates. Multiple, related and unrelated, prophages were detected in isolates across ribotype groups, including two novel prophage elements and those related to the transducing phage φC2. The susceptibility of these isolates to lytic phage infection was tested using a panel of characterized phages found from the same locality. In conclusion, estuarine sediments are a source of genetically diverse C. difficile strains with a complex network of prophages, which could contribute to the emergence of new strains in clinics.

    Funded by: Medical Research Council: G0700855; Wellcome Trust: 098051, WT086418MA, WT100542AIA

    Genome biology and evolution 2015;7;7;1842-55

  • Genome specialization and decay of the strangles pathogen, Streptococcus equi, is driven by persistent infection.

    Harris SR, Robinson C, Steward KF, Webb KS, Paillot R, Parkhill J, Holden MT and Waller AS

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Strangles, the most frequently diagnosed infectious disease of horses worldwide, is caused by Streptococcus equi. Despite its prevalence, the global diversity and mechanisms underlying the evolution of S. equi as a host-restricted pathogen remain poorly understood. Here, we define the global population structure of this important pathogen and reveal a population replacement in the late 19th or early 20th Century. Our data reveal a dynamic genome that continues to mutate and decay, but also to amplify and acquire genes despite the organism having lost its natural competence and become host-restricted. The lifestyle of S. equi within the horse is defined by short-term acute disease, strangles, followed by long-term infection. Population analysis reveals evidence of convergent evolution in isolates from post-acute disease samples as a result of niche adaptation to persistent infection within a host. Mutations that lead to metabolic streamlining and the loss of virulence determinants are more frequently found in persistent isolates, suggesting that the pathogenic potential of S. equi reduces as a consequence of long-term residency within the horse post-acute disease. An example of this is the deletion of the equibactin siderophore locus that is associated with iron acquisition, which occurs exclusively in persistent isolates, and renders S. equi significantly less able to cause acute disease in the natural host. We identify several loci that may similarly be required for the full virulence of S. equi, directing future research toward the development of new vaccines against this host-restricted pathogen.

    Funded by: Wellcome Trust: 098051

    Genome research 2015;25;9;1360-71

  • Hierarchical Bayesian model for rare variant association analysis integrating genotype uncertainty in human sequence data.

    He L, Pitkäniemi J, Sarin AP, Salomaa V, Sillanpää MJ and Ripatti S

    Department of Public Health, Hjelt Institute, University of Helsinki, Helsinki, Finland.

    Next-generation sequencing (NGS) has led to the study of rare genetic variants, which possibly explain the missing heritability for complex diseases. Most existing methods for rare variant (RV) association detection do not account for the common presence of sequencing errors in NGS data. The errors can largely affect the power and perturb the accuracy of association tests due to rare observations of minor alleles. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage-based Bayesian variable selection. It allows for flexibility in handling neutral and protective RVs with measurement error, and is robust enough for detecting causal RVs with a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve the optimal statistical power. We demonstrate that sequencing error does significantly affect the findings, and our proposed model can take advantage of it to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods, such as sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low-density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in LDLR gene with three carriers in 1,155 individuals, which is missed by both SKAT and Granvil.

    Genetic epidemiology 2015;39;2;89-100

  • Potent organo-osmium compound shifts metabolism in epithelial ovarian cancer cells.

    Hearn JM, Romero-Canelón I, Munro AF, Fu Y, Pizarro AM, Garnett MJ, McDermott U, Carragher NO and Sadler PJ

    Warwick Systems Biology Centre, University of Warwick, Coventry CV4 7AL, United Kingdom; Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom.

    The organometallic "half-sandwich" compound [Os(η⁶-p-cymene)(4-(2-pyridylazo)-N,N-dimethylaniline)I]PF₆ is 49× more potent than the clinical drug cisplatin in the 809 cancer cell lines that we screened and is a candidate drug for cancer therapy. We investigate the mechanism of action of compound 1 in A2780 epithelial ovarian cancer cells. Whole-transcriptome sequencing identified three missense mutations in the mitochondrial genome of this cell line, coding for ND5, a subunit of complex I (NADH dehydrogenase) in the electron transport chain. ND5 is a proton pump, helping to maintain the coupling gradient in mitochondria. The identified mutations correspond to known protein variants (p.I257V, p.N447S, and p.L517P), not reported previously in epithelial ovarian cancer. Time-series RNA sequencing suggested that osmium-exposed A2780 cells undergo a metabolic shunt from glycolysis to oxidative phosphorylation, where defective machinery, associated with mutations in complex I, could enhance activity. Downstream events, measured by time-series reverse-phase protein microarrays, high-content imaging, and flow cytometry, showed a dramatic increase in mitochondrially produced reactive oxygen species (ROS) and subsequent DNA damage with up-regulation of ATM, p53, and p21 proteins. In contrast to platinum drugs, exposure to this organo-osmium compound does not cause significant apoptosis within a 72-h period, highlighting a different mechanism of action. Superoxide production in ovarian, lung, colon, breast, and prostate cancer cells exposed to three other structurally related organo-Os(II) compounds correlated with their antiproliferative activity. DNA damage caused indirectly, through selective ROS generation, may provide a more targeted approach to cancer therapy and a concept for next-generation metal-based anticancer drugs that combat platinum resistance.

    Funded by: Biotechnology and Biological Sciences Research Council: 324594; Wellcome Trust: 086357, 102696

    Proceedings of the National Academy of Sciences of the United States of America 2015;112;29;E3800-5

  • Genomic epidemiology of age-associated meningococcal lineages in national surveillance: an observational cohort study.

    Hill DM, Lucidarme J, Gray SJ, Newbold LS, Ure R, Brehony C, Harrison OB, Bray JE, Jolley KA, Bratcher HB, Parkhill J, Tang CM, Borrow R and Maiden MC

    Department of Zoology, University of Oxford, Oxford, UK.

    Background: Invasive meningococcal disease (IMD) is a worldwide health issue that is potentially preventable with vaccination. In view of its sporadic nature and the high diversity of Neisseria meningitidis, epidemiological surveillance incorporating detailed isolate characterisation is crucial for effective control and understanding the evolving epidemiology of IMD. The Meningitis Research Foundation Meningococcus Genome Library (MRF-MGL) exploits whole-genome sequencing (WGS) for this purpose and presents data on a comprehensive and coherent IMD isolate collection from England and Wales via the internet. We assessed the contribution of these data to investigating IMD epidemiology.

    Methods: WGS data were obtained for all 899 IMD isolates available for England and Wales in epidemiological years 2010-11 and 2011-12. The data had been annotated at 1720 loci, analysed, and disseminated online. Information was also available on meningococcal population structure and vaccine (Bexsero, GlaxoSmithKline, Brentford, Middlesex, UK) antigen variants, which enabled the investigation of IMD-associated genotypes over time and by patients' age groups. Population genomic analyses were done with a hierarchical gene-by-gene approach.

    Findings: The methods used by MRF-MGL efficiently characterised IMD isolates and information was provided in plain language. At least 20 meningococcal lineages were identified, three of which (hyperinvasive clonal complexes 41/44 [lineage 3], 269 [lineage 2], and 23 [lineage 23]) were responsible for 528 (59%) of IMD isolates. Lineages were highly diverse and showed evidence of extensive recombination. Specific lineages were associated with IMD in particular age groups, with notable diversity in the youngest and oldest individuals. The increased incidence of IMD from 1984 to 2010 in England and Wales was due to successive and concurrent epidemics of different lineages. Genetically, 74% of isolates were characterised as encoding group B capsules: 16% group Y, 6% group W, and 3% group C. Exact peptide matches for individual Bexsero vaccine antigens were present in up to 26% of isolates.

    Interpretation: The MRF-MGL represents an effective, broadly applicable model for the storage, analysis, and dissemination of WGS data that can facilitate real-time genomic pathogen surveillance. The data revealed information crucial to effective deployment and assessment of vaccines against N meningitidis.

    Funding: Meningitis Research Foundation, Wellcome Trust, Public Health England, European Union.

    Funded by: Wellcome Trust: 087622, 104992

    The Lancet. Infectious diseases 2015;15;12;1420-8

  • Discovery of a polyomavirus in European badgers (Meles meles) and the evolution of host range in the family Polyomaviridae.

    Hill SC, Murphy AA, Cotten M, Palser AL, Benson P, Lesellier S, Gormley E, Richomme C, Grierson S, Bhuachalla DN, Chambers M, Kellam P, Boschiroli ML, Ehlers B, Jarvis MA and Pybus OG

    Department of Zoology, University of Oxford, UK.

    Polyomaviruses infect a diverse range of mammalian and avian hosts, and are associated with a variety of symptoms. However, it is unknown whether the viruses are found in all mammalian families and the evolutionary history of the polyomaviruses is still unclear. Here, we report the discovery of a novel polyomavirus in the European badger (Meles meles), which to our knowledge represents the first polyomavirus to be characterized in the family Mustelidae, and within a European carnivoran. Although the virus was discovered serendipitously in the supernatant of a cell culture inoculated with badger material, we subsequently confirmed its presence in wild badgers. The European badger polyomavirus was tentatively named Meles meles polyomavirus 1 (MmelPyV1). The genome is 5187 bp long and encodes proteins typical of polyomaviruses. Phylogenetic analyses including all known polyomavirus genomes consistently group MmelPyV1 with California sea lion polyomavirus 1 across all regions of the genome. Further evolutionary analyses revealed phylogenetic discordance amongst polyomavirus genome regions, possibly arising from evolutionary rate heterogeneity, and a complex association between polyomavirus phylogeny and host taxonomic groups.

    Funded by: Wellcome Trust

    The Journal of general virology 2015;96;Pt 6;1411-1422

  • Pharmacogenomics of hypertension: a genome‐wide, placebo‐controlled cross‐over study, using four classes of antihypertensive drugs.

    Hiltunen TP, Donner KM, Sarin AP, Saarela J, Ripatti S, Chapman AB, Gums JG, Gong Y, Cooper-DeHoff RM, Frau F, Glorioso V, Zaninello R, Salvi E, Glorioso N, Boerwinkle E, Turner ST, Johnson JA and Kontula KK

    Background: Identification of genetic markers of antihypertensive drug responses could assist in individualization of hypertension treatment.

    Methods and results: We conducted a genome-wide association study to identify gene loci influencing the responsiveness of 228 male patients to 4 classes of antihypertensive drugs. The Genetics of Drug Responsiveness in Essential Hypertension (GENRES) study is a double-blind, placebo-controlled cross-over study where each subject received amlodipine, bisoprolol, hydrochlorothiazide, and losartan, each as a monotherapy, in a randomized order. Replication analyses were performed in 4 studies with patients of European ancestry (PEAR Study, N=386; GERA I and II Studies, N=196 and N=198; SOPHIA Study, N=372). We identified 3 single-nucleotide polymorphisms within the ACY3 gene that showed associations with bisoprolol response reaching genome-wide significance (P<5×10^-8); however, this could not be replicated in the PEAR Study using atenolol. In addition, 39 single-nucleotide polymorphisms showed P values of 10^-5 to 10^-7. The 20 top-associated single-nucleotide polymorphisms were different for each antihypertensive drug. None of these top single-nucleotide polymorphisms co-localized with the panel of >40 genes identified in genome-wide association studies of hypertension. Replication analyses of GENRES results provided suggestive evidence for a missense variant (rs3814995) in the NPHS1 (nephrin) gene influencing losartan response, and for 2 variants influencing hydrochlorothiazide response, located within or close to the ALDH1A3 (rs3825926) and CLIC5 (rs321329) genes.

    Conclusions: These data provide some evidence for a link between biology of the glomerular protein nephrin and antihypertensive action of angiotensin receptor antagonists and encourage additional studies on aldehyde dehydrogenase–mediated reactions in antihypertensive drug action.

    Funded by: NCATS NIH HHS: UL1 TR000064, UL1 TR000135, UL1 TR000454, UL1TR000064; NHLBI NIH HHS: R01 HL053335, R01 HL074735; NIGMS NIH HHS: U01 GM074492, U01-GM074492

    Journal of the American Heart Association 2015;4;1;e001521

  • Cardiology: Race for healthy hearts.

    Hitz MP and Andelfinger G

    Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK, and at the University Hospital Schleswig-Holstein and the Christian-Albrechts University, Kiel, Germany.

    Nature 2015;520;7546;160-1

  • WGE: a CRISPR database for genome engineering.

    Hodgkins A, Farne A, Perera S, Grego T, Parry-Smith DJ, Skarnes WC and Iyer V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The rapid development of CRISPR-Cas9-mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks. WGE is open, extensible and can be set up to compute and present CRISPR sites for any genome.

    Availability and implementation: The WGE database is freely available at

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Funded by: NHGRI NIH HHS: 1 U54 HG006370-01; Wellcome Trust

    Bioinformatics (Oxford, England) 2015;31;18;3078-80

  • Palmitoylation and palmitoyl-transferases in Plasmodium parasites.

    Hodson N, Invergo B, Rayner JC and Choudhary JS

    Malaria Programme, Wellcome Trust Sanger Institute.

    Protein post-translational modifications (PTM) are commonly used to regulate biological processes. Protein S-acylation is an enzymatically regulated reversible modification that has been shown to modulate protein localization, activity and membrane binding. Proteome-scale discovery on Plasmodium falciparum schizonts has revealed a complement of more than 400 palmitoylated proteins, including those essential for host invasion and drug resistance. The wide regulatory effect on this species is endorsed by the presence of 12 proteins containing the conserved DHHC-CRD (DHHC motif within a cysteine-rich domain) that is associated with palmitoyl-transferase activity. Genetic interrogation of these enzymes in Apicomplexa has revealed essentiality and distinct localization at cellular compartments; these features are species specific and are not observed in yeast. It is clear that palmitoylation has an elaborate role in Plasmodium biology and opens intriguing questions on the functional consequence of this group of acylation modifications and how the protein S-acyl transferases (PATs) orchestrate molecular events.

    Funded by: Wellcome Trust: 079643/Z/06/Z

    Biochemical Society transactions 2015;43;2;240-5

  • Dynamics of immunoglobulin sequence diversity in HIV-1 infected individuals.

    Hoehn KB, Gall A, Bashford-Rogers R, Fidler SJ, Kaye S, Weber JN, McClure MO, SPARTAC Trial Investigators, Kellam P and Pybus OG

    Department of Zoology, University of Oxford, Oxford, UK.

    Advances in immunoglobulin (Ig) sequencing technology are leading to new perspectives on immune system dynamics. Much research in this nascent field has focused on resolving immune responses to viral infection. However, the dynamics of B-cell diversity in early HIV infection, and in response to anti-retroviral therapy, are still poorly understood. Here, we investigate these dynamics through bulk Ig sequencing of samples collected over 2 years from a group of eight HIV-1 infected patients, five of whom received anti-retroviral therapy during the first half of the study period. We applied previously published methods for visualizing and quantifying B-cell sequence diversity, including the Gini index, and compared their efficacy to alternative measures. While we found significantly greater clonal structure in HIV-infected patients versus healthy controls, within HIV patients, we observed no significant relationships between statistics of B-cell clonal expansion and clinical variables such as viral load and CD4(+) count. Although there are many potential explanations for this, we suggest that important factors include poor sampling resolution and complex B-cell dynamics that are difficult to summarize using simple summary statistics. Importantly, we find a significant association between observed Gini indices and sequencing read depth, and we conclude that more robust analytical methods and a closer integration of experimental and theoretical work is needed to further our understanding of B-cell repertoire diversity during viral infection.
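    As an illustration of the Gini index mentioned above, the following is a minimal sketch of the standard Gini coefficient applied to clone sizes. The abstract does not state the exact formulation the authors used, so the function name `gini_index` and the choice of clone sizes (reads per clone) as input are assumptions made here for illustration only.

```python
from typing import Sequence


def gini_index(clone_sizes: Sequence[float]) -> float:
    """Standard Gini coefficient of a list of non-negative clone sizes.

    0 means all clones are the same size (no clonal dominance); values
    approaching 1 mean a few clones hold almost all of the sequences.
    Hypothetical helper, not the authors' published implementation.
    """
    sizes = sorted(clone_sizes)  # ascending order is required by the formula
    n = len(sizes)
    total = sum(sizes)
    if n == 0 or total == 0:
        raise ValueError("need at least one clone with a non-zero size")
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n, ranks from 1
    weighted = sum(rank * size for rank, size in enumerate(sizes, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    print(gini_index([5, 5, 5, 5]))  # evenly sized clones
    print(gini_index([0, 0, 0, 4]))  # one clone holds every read
```

    Because the coefficient depends on how many clones are observed at all, comparing it across samples sequenced to different depths can be misleading, which is consistent with the association with read depth that the abstract reports.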

    Funded by: Medical Research Council: MR/L006588/1; Wellcome Trust: 069598/Z/02/Z, 090532/Z/09/Z, 104748

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2015;370;1676

  • Key challenges for the creation and maintenance of specialist protein resources.

    Holliday GL, Bairoch A, Bagos PG, Chatonnet A, Craik DJ, Finn RD, Henrissat B, Landsman D, Manning G, Nagano N, O'Donovan C, Pruitt KD, Rawlings ND, Saier M, Sowdhamini R, Spedding M, Srinivasan N, Vriend G, Babbitt PC and Bateman A

    Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, 94158.

    As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, thus increasing the importance of good access to these data that vary from the supplemental material of individual articles, all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to their databases for a focused topic or family of proteins, using specialized approaches that are not feasible in the major reference databases. Many are labors of love, run by a single lab with little or no dedicated funding and there are many challenges to building and maintaining them. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on August 11 and 12, 2014. During this meeting some common key challenges involved in creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues impact our resources and illustrate ways in which our working together could enhance their accuracy, currency, and overall value.

    Funded by: British Heart Foundation: RG/13/5/30112; Intramural NIH HHS: Z01 LM000071-13; NHGRI NIH HHS: U41 HG002273, U41 HG007822, U41HG007822, U41HG02273; NIGMS NIH HHS: P20GM103446, R01 GM060595, R01 GM60595, R01GM080646; NLM NIH HHS: G08LM010720; Wellcome Trust: 077044, 099156

    Proteins 2015;83;6;1005-13

  • Genome Sequence of Acinetobacter baumannii Strain A1, an Early Example of Antibiotic-Resistant Global Clone 1.

    Holt KE, Hamidian M, Kenyon JJ, Wynn MT, Hawkey J, Pickard D and Hall RM

    Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Melbourne, Australia.

    Acinetobacter baumannii isolate A1 was recovered in the United Kingdom in 1982 and belongs to global clone 1 (GC1). Here, we present its complete 3.91-Mbp genome sequence, generated via a combination of short-read sequencing (Illumina), long-read sequencing (PacBio), and manual finishing.

    Genome announcements 2015;3;2

  • Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health.

    Holt KE, Wertheim H, Zadoks RN, Baker S, Whitehouse CA, Dance D, Jenney A, Connor TR, Hsu LY, Severin J, Brisse S, Cao H, Wilksch J, Gorrie C, Schultz MB, Edwards DJ, Nguyen KV, Nguyen TV, Dao TT, Mensink M, Minh VL, Nhu NT, Schultsz C, Kuntaman K, Newton PN, Moore CE, Strugnell RA and Thomson NR

    Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC 3010, Australia; Department of Microbiology and Immunology, The University of Melbourne, Parkville, VIC 3010, Australia.

    Klebsiella pneumoniae is now recognized as an urgent threat to human health because of the emergence of multidrug-resistant strains associated with hospital outbreaks and hypervirulent strains associated with severe community-acquired infections. K. pneumoniae is ubiquitous in the environment and can colonize and infect both plants and animals. However, little is known about the population structure of K. pneumoniae, so it is difficult to recognize or understand the emergence of clinically important clones within this highly genetically diverse species. Here we present a detailed genomic framework for K. pneumoniae based on whole-genome sequencing of more than 300 human and animal isolates spanning four continents. Our data provide genome-wide support for the splitting of K. pneumoniae into three distinct species, KpI (K. pneumoniae), KpII (K. quasipneumoniae), and KpIII (K. variicola). Further, for K. pneumoniae (KpI), the entity most frequently associated with human infection, we show the existence of >150 deeply branching lineages including numerous multidrug-resistant or hypervirulent clones. We show K. pneumoniae has a large accessory genome approaching 30,000 protein-coding genes, including a number of virulence functions that are significantly associated with invasive community-acquired disease in humans. In our dataset, antimicrobial resistance genes were common among human carriage isolates and hospital-acquired infections, which generally lacked the genes associated with invasive disease. The convergence of virulence and resistance genes potentially could lead to the emergence of untreatable invasive K. pneumoniae infections; our data provide the whole-genome framework against which to track the emergence of such threats.

    Funded by: Wellcome Trust: 089275/H/09/Z, 089276, 098051, 100087

    Proceedings of the National Academy of Sciences of the United States of America 2015;112;27;E3574-81

  • Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer.

    Hong MK, Macintyre G, Wedge DC, Van Loo P, Patel K, Lunke S, Alexandrov LB, Sloggett C, Cmero M, Marass F, Tsui D, Mangiola S, Lonie A, Naeem H, Sapre N, Phal PM, Kurganovs N, Chin X, Kerger M, Warren AY, Neal D, Gnanapragasam V, Rosenfeld N, Pedersen JS, Ryan A, Haviv I, Costello AJ, Corcoran NM and Hovens CM

    [1] Department of Surgery, Division of Urology, Royal Melbourne Hospital and University of Melbourne, Parkville 3050, Victoria, Australia [2] The Epworth Prostate Centre, Epworth Hospital, Richmond 3121, Victoria, Australia.

    Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. Analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.

    Funded by: Cancer Research UK: C14303/A17197

    Nature communications 2015;6;6605

  • Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Horikoshi M, Mägi R, van de Bunt M, Surakka I, Sarin AP, Mahajan A, Marullo L, Thorleifsson G, Hägg S, Hottenga JJ, Ladenvall C, Ried JS, Winkler TW, Willems SM, Pervjakova N, Esko T, Beekman M, Nelson CP, Willenborg C, Wiltshire S, Ferreira T, Fernandez J, Gaulton KJ, Steinthorsdottir V, Hamsten A, Magnusson PK, Willemsen G, Milaneschi Y, Robertson NR, Groves CJ, Bennett AJ, Lehtimäki T, Viikari JS, Rung J, Lyssenko V, Perola M, Heid IM, Herder C, Grallert H, Müller-Nurasyid M, Roden M, Hypponen E, Isaacs A, van Leeuwen EM, Karssen LC, Mihailov E, Houwing-Duistermaat JJ, de Craen AJ, Deelen J, Havulinna AS, Blades M, Hengstenberg C, Erdmann J, Schunkert H, Kaprio J, Tobin MD, Samani NJ, Lind L, Salomaa V, Lindgren CM, Slagboom PE, Metspalu A, van Duijn CM, Eriksson JG, Peters A, Gieger C, Jula A, Groop L, Raitakari OT, Power C, Penninx BW, de Geus E, Smit JH, Boomsma DI, Pedersen NL, Ingelsson E, Thorsteinsdottir U, Stefansson K, Ripatti S, Prokopenko I, McCarthy MI, Morris AP and ENGAGE Consortium

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom.

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

    Funded by: Medical Research Council: G0500539, G0600705, G0902313; NHLBI NIH HHS: 5R01HL087679; NIAAA NIH HHS: AA-00145, AA-09203, AA-12502, AA15416, K02AA018755; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RC2 MH089995-01, 1RC2MH089951-01, 1RL1MH083268-01, MH081802, U24 MH068457-06; PHS HHS: R01D0042157-01A; Wellcome Trust: 098381, GR069224, WT064890, WT090532, WT098017

    PLoS genetics 2015;11;7;e1005230

  • A Library of Plasmodium vivax Recombinant Merozoite Proteins Reveals New Vaccine Candidates and Protein-Protein Interactions.

    Hostetler JB, Sharma S, Bartholdson SJ, Wright GJ, Fairhurst RM and Rayner JC

    Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America.

    Background: A vaccine targeting Plasmodium vivax will be an essential component of any comprehensive malaria elimination program, but major gaps in our understanding of P. vivax biology, including the protein-protein interactions that mediate merozoite invasion of reticulocytes, hinder the search for candidate antigens. Only one ligand-receptor interaction has been identified, that between P. vivax Duffy Binding Protein (PvDBP) and the erythrocyte Duffy Antigen Receptor for Chemokines (DARC), and strain-specific immune responses to PvDBP make it a complex vaccine target. To broaden the repertoire of potential P. vivax merozoite-stage vaccine targets, we exploited a recent breakthrough in expressing full-length ectodomains of Plasmodium proteins in a functionally-active form in mammalian cells and initiated a large-scale study of P. vivax merozoite proteins that are potentially involved in reticulocyte binding and invasion.

    Methodology/principal findings: We selected 39 P. vivax proteins that are predicted to localize to the merozoite surface or invasive secretory organelles, some of which show homology to P. falciparum vaccine candidates. Of these, we were able to express 37 full-length protein ectodomains in a mammalian expression system, which has been previously used to express P. falciparum invasion ligands such as PfRH5. To establish whether the expressed proteins were correctly folded, we assessed whether they were recognized by antibodies from Cambodian patients with acute vivax malaria. IgG from these samples showed at least a two-fold change in reactivity over naïve controls in 27 of 34 antigens tested, and the majority showed heat-labile IgG immunoreactivity, suggesting the presence of conformation-sensitive epitopes and native tertiary protein structures. Using a method specifically designed to detect low-affinity, extracellular protein-protein interactions, we confirmed a predicted interaction between P. vivax 6-cysteine proteins P12 and P41, further suggesting that the proteins are natively folded and functional. This screen also identified two novel protein-protein interactions, between P12 and PVX_110945, and between MSP3.10 and MSP7.1, the latter of which was confirmed by surface plasmon resonance.

    Conclusions/significance: We produced a new library of recombinant full-length P. vivax ectodomains, established that the majority of them contain tertiary structure, and used them to identify predicted and novel protein-protein interactions. As well as identifying new interactions for further biological studies, this library will be useful in identifying P. vivax proteins with vaccine potential, and studying P. vivax malaria pathogenesis and immunity.

    Trial registration: NCT00663546.

    Funded by: Medical Research Council: MR/J002283/1; Wellcome Trust: 090851

    PLoS neglected tropical diseases 2015;9;12;e0004264

  • B56δ-related protein phosphatase 2A dysfunction identified in patients with intellectual disability.

    Houge G, Haesen D, Vissers LE, Mehta S, Parker MJ, Wright M, Vogt J, McKee S, Tolmie JL, Cordeiro N, Kleefstra T, Willemsen MH, Reijnders MR, Berland S, Hayman E, Lahat E, Brilstra EH, van Gassen KL, Zonneveld-Huijssoon E, de Bie CI, Hoischen A, Eichler EE, Holdhus R, Steen VM, Døskeland SO, Hurles ME, FitzPatrick DR and Janssens V

    Here we report inherited dysregulation of protein phosphatase activity as a cause of intellectual disability (ID). De novo missense mutations in 2 subunits of serine/threonine (Ser/Thr) protein phosphatase 2A (PP2A) were identified in 16 individuals with mild to severe ID, long-lasting hypotonia, epileptic susceptibility, frontal bossing, mild hypertelorism, and downslanting palpebral fissures. PP2A comprises catalytic (C), scaffolding (A), and regulatory (B) subunits that determine subcellular anchoring, substrate specificity, and physiological function. Ten patients had mutations within a highly conserved acidic loop of the PPP2R5D-encoded B56δ regulatory subunit, with the same E198K mutation present in 6 individuals. Five patients had mutations in the PPP2R1A-encoded scaffolding Aα subunit, with the same R182W mutation in 3 individuals. Some Aα cases presented with large ventricles, causing macrocephaly and hydrocephalus suspicion, and all cases exhibited partial or complete corpus callosum agenesis. Functional evaluation revealed that mutant A and B subunits were stable and uncoupled from phosphatase activity. Mutant B56δ was A and C binding-deficient, while mutant Aα subunits bound B56δ well but were unable to bind C or bound a catalytically impaired C, suggesting a dominant-negative effect where mutant subunits hinder dephosphorylation of B56δ-anchored substrates. Moreover, mutant subunit overexpression resulted in hyperphosphorylation of GSK3β, a B56δ-regulated substrate. This effect was in line with clinical observations, supporting a correlation between the ID degree and biochemical disturbance.

    Funded by: Department of Health; NICHD NIH HHS: U54 HD083091; NIMH NIH HHS: R01 MH101221; Wellcome Trust: WT098051

    The Journal of clinical investigation 2015;125;8;3051-62

  • Using population data for assessing next-generation sequencing performance.

    Houniet DT, Rahman TJ, Al Turki S, Hurles ME, Xu Y, Goodship J, Keavney B and Santibanez Koref M

    Oxford Gene Technology, Begbroke Science Park, Oxford, Oxfordshire, OX5 1PF, Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway NE1 3BZ, Newcastle upon Tyne and The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Motivation: During the past 4 years, whole-exome sequencing has become a standard tool for finding rare variants causing Mendelian disorders. In that time, there has also been a proliferation of both sequencing platforms and approaches to analyse their output. This requires approaches to assess the performance of different methods. Traditionally, criteria such as comparison with microarray data or a number of known polymorphic sites have been used. Here we expand such approaches, developing a maximum likelihood framework and using it to estimate the sensitivity and specificity of whole-exome sequencing data.

    Results: Using whole-exome sequencing data for a panel of 19 individuals, we show that estimated sensitivity and specificity are similar to those calculated using microarray data as a reference. We explore the effect of frequency misspecification arising from using an inappropriately selected population and find that, although the estimates are affected, the rankings across procedures remain the same.

    Availability and implementation: An implementation using Perl and R can be found at (Username: igm101; Password: Z1z1nts).

    Funded by: British Heart Foundation: FS/10/008/28146; Wellcome Trust: WT098051

    Bioinformatics (Oxford, England) 2015;31;1;56-61

  • Using optical mapping data for the improvement of vertebrate genome assemblies.

    Howe K and Wood JM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Optical mapping is a technology that gathers long-range information on genome sequences similar to ordered restriction digest maps. Because it is not subject to cloning, amplification, hybridisation or sequencing bias, it is ideally suited to the improvement of fragmented genome assemblies that can no longer be improved by classical methods. In addition, its low cost and rapid turnaround make it equally useful during the scaffolding process of de novo assembly from high throughput sequencing reads. We describe how optical mapping has been used in practice to produce high quality vertebrate genome assemblies. In particular, we detail the efforts undertaken by the Genome Reference Consortium (GRC), which maintains the reference genomes for human, mouse, zebrafish and chicken, and uses different optical mapping platforms for genome curation.

    GigaScience 2015;4;10

  • Development of a Multiplex PCR Assay for Rapid Molecular Serotyping of Haemophilus parasuis.

    Howell KJ, Peters SE, Wang J, Hernandez-Garcia J, Weinert LA, Luan SL, Chaudhuri RR, Angen Ø, Aragon V, Williamson SM, Parkhill J, Langford PR, Rycroft AN, Wren BW, Maskell DJ, Tucker AW and BRaDP1T Consortium

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    Haemophilus parasuis causes Glässer's disease and pneumonia in pigs. Indirect hemagglutination (IHA) is typically used to serotype this bacterium, distinguishing 15 serovars with some nontypeable isolates. The capsule loci of the 15 reference strains have been annotated, and significant genetic variation was identified between serovars, with the exception of serovars 5 and 12. A capsule locus and in silico serovar were identified for all but two nontypeable isolates in our collection of >200 isolates. Here, we describe the development of a multiplex PCR, based on variation within the capsule loci of the 15 serovars of H. parasuis, for rapid molecular serotyping. The multiplex PCR (mPCR) distinguished between all previously described serovars except 5 and 12, which were detected by the same pair of primers. The detection limit of the mPCR was 4.29 × 10^5 ng/μl bacterial genomic DNA, and high specificity was indicated by the absence of reactivity against closely related commensal Pasteurellaceae and other bacterial pathogens of pigs. A subset of 150 isolates from a previously sequenced H. parasuis collection was used to validate the mPCR with 100% accuracy compared to the in silico results. In addition, the two in silico-nontypeable isolates were typeable using the mPCR. A further 84 isolates were analyzed by mPCR and compared to the IHA serotyping results with 90% concordance (excluding those that were nontypeable by IHA). The mPCR was faster, more sensitive, and more specific than IHA, enabling the differentiation of 14 of the 15 serovars of H. parasuis.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G003203/1, BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1

    Journal of clinical microbiology 2015;53;12;3812-21

  • Evolutionary dynamics of methicillin-resistant Staphylococcus aureus within a healthcare system.

    Hsu LY, Harris SR, Chlebowicz MA, Lindsay JA, Koh TH, Krishnan P, Tan TY, Hon PY, Grubb WB, Bentley SD, Parkhill J, Peacock SJ and Holden MT

    National University Health System, 1E Kent Ridge Road, NUHS Tower Block Level 10, Singapore, 119228, Singapore.

    Background: In the past decade, several countries have seen gradual replacement of endemic multi-resistant healthcare-associated methicillin-resistant Staphylococcus aureus (MRSA) with clones that are more susceptible to antibiotic treatment. One example is Singapore, where MRSA ST239, the dominant clone since molecular profiling of MRSA began in the mid-1980s, has been replaced by ST22 isolates belonging to EMRSA-15, a recently emerged pandemic lineage originating from Europe.

    Results: We investigated the population structure of MRSA in Singaporean hospitals spanning three decades, using whole genome sequencing. Applying Bayesian phylogenetic methods we report that prior to the introduction of ST22, the ST239 MRSA population in Singapore originated from multiple introductions from the surrounding region; it was frequently transferred within the healthcare system resulting in a heterogeneous hospital population. Following the introduction of ST22 around the beginning of the millennium, this clone spread rapidly through Singaporean hospitals, supplanting the endemic ST239 population. Coalescent analysis revealed that although the genetic diversity of ST239 initially decreased as ST22 became more dominant, from 2007 onwards the genetic diversity of ST239 began to increase once more, which was not associated with the emergence of a sub-clone of ST239. Comparative genomic analysis of the accessory genome of the extant ST239 population identified that the Arginine Catabolic Mobile Element arose multiple times, thereby introducing genes associated with enhanced skin colonization into this population.

    Conclusions: Our results clearly demonstrate that, alongside clinical practice and antibiotic usage, competition between clones also has an important role in driving the evolution of nosocomial pathogen populations.

    Funded by: Medical Research Council: G1000803; Wellcome Trust: 098051

    Genome biology 2015;16;81

  • Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.

    Huang J, Howie B, McCarthy S, Memari Y, Walter K, Min JL, Danecek P, Malerba G, Trabetti E, Zheng HF, UK10K Consortium, Gambaro G, Richards JB, Durbin R, Timpson NJ, Marchini J and Soranzo N

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.

    Funded by: CIHR; British Heart Foundation: PG/13/66/30442, RG/10/13/28570, RG/10/17/28553; Department of Health; Medical Research Council: G0800509, MC_PC_15018, MC_UU_12013/1, MC_UU_12013/3, MC_UU_12015/1, MR/L010305/1; Wellcome Trust: 091551, 095515, 095564, 096599, 098497, 098498, 100140, 100574, 102215, WT091310, WT098051

    Nature communications 2015;6;8111

  • Modulation of genetic associations with serum urate levels by body-mass-index in humans.

    Huffman JE, Albrecht E, Teumer A, Mangino M, Kapur K, Johnson T, Kutalik Z, Pirastu N, Pistis G, Lopez LM, Haller T, Salo P, Goel A, Li M, Tanaka T, Dehghan A, Ruggiero D, Malerba G, Smith AV, Nolte IM, Portas L, Phipps-Green A, Boteva L, Navarro P, Johansson A, Hicks AA, Polasek O, Esko T, Peden JF, Harris SE, Murgia F, Wild SH, Tenesa A, Tin A, Mihailov E, Grotevendt A, Gislason GK, Coresh J, D'Adamo P, Ulivi S, Vollenweider P, Waeber G, Campbell S, Kolcic I, Fisher K, Viigimaa M, Metter JE, Masciullo C, Trabetti E, Bombieri C, Sorice R, Döring A, Reischl E, Strauch K, Hofman A, Uitterlinden AG, Waldenberger M, Wichmann HE, Davies G, Gow AJ, Dalbeth N, Stamp L, Smit JH, Kirin M, Nagaraja R, Nauck M, Schurmann C, Budde K, Farrington SM, Theodoratou E, Jula A, Salomaa V, Sala C, Hengstenberg C, Burnier M, Mägi R, Klopp N, Kloiber S, Schipf S, Ripatti S, Cabras S, Soranzo N, Homuth G, Nutile T, Munroe PB, Hastie N, Campbell H, Rudan I, Cabrera C, Haley C, Franco OH, Merriman TR, Gudnason V, Pirastu M, Penninx BW, Snieder H, Metspalu A, Ciullo M, Pramstaller PP, van Duijn CM, Ferrucci L, Gambaro G, Deary IJ, Dunlop MG, Wilson JF, Gasparini P, Gyllensten U, Spector TD, Wright AF, Hayward C, Watkins H, Perola M, Bochud M, Kao WH, Caulfield M, Toniolo D, Völzke H, Gieger C, Köttgen A and Vitart V

    Medical Research Council (MRC) Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Edinburgh, United Kingdom.

    We tested for interactions between body mass index (BMI) and common genetic variants affecting serum urate levels, genome-wide, in up to 42569 participants. Both stratified genome-wide association (GWAS) analyses, in lean, overweight and obese individuals, and regression-type analyses in a non-BMI-stratified overall sample were performed. The former did not uncover any novel locus with a major main effect, but supported modulation of effects for some known and potentially new urate loci. The latter highlighted a SNP at RBFOX3 reaching genome-wide significant level (effect size 0.014, 95% CI 0.008-0.02, P_inter = 2.6 × 10^-8). Two top loci in interaction term analyses, RBFOX3 and ERO1LB-EDARADD, also displayed suggestive differences in main effect size between the lean and obese strata. All top-ranking loci for urate effect differences between BMI categories were novel and most had small magnitude but opposite direction effects between strata. They include the locus RBMS1-TANK (men, P_diff(lean-overweight) = 4.7 × 10^-8), a region that has been associated with several obesity-related traits, and TSPYL5 (men, P_diff(lean-overweight) = 9.1 × 10^-8), regulating adipocyte-produced estradiol. The top-ranking known urate locus was ABCG2, the strongest known gout risk locus, with an effect halved in obese compared to lean men (P_diff(lean-obese) = 2 × 10^-4). Finally, pathway analysis suggested a role for N-glycan biosynthesis as a prominent urate-associated pathway in the lean stratum. These results illustrate a potentially powerful way to monitor changes occurring in an obesogenic environment.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Cancer Research UK: 12076, C348/A12076, C348/A6361; Medical Research Council: G0000657-53203, G0700704, G9521010, G9521010D, MC_PC_U127527198, MC_PC_U127561128, MR/K018647/1, MR/K026992/1; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: R01HL086694, R01HL087641, R01HL59367; NIA NIH HHS: N01-AG-12100; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust: 095831

    PloS one 2015;10;3;e0119752

  • IVA: accurate de novo assembly of RNA virus genomes.

    Hunt M, Gall A, Ong SH, Brener J, Ferns B, Goulder P, Nastouli E, Keane JA, Kellam P and Otto TD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Motivation: An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods.

    Results: We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers.

    Availability and implementation: The software runs under Linux, has the GPLv3 licence and is freely available from

    Funded by: Wellcome Trust: 098051

    Bioinformatics (Oxford, England) 2015;31;14;2374-6

  • Circlator: automated circularization of genome assemblies using long sequencing reads.

    Hunt M, Silva ND, Otto TD, Parkhill J, Keane JA and Harris SR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at .

    Funded by: Wellcome Trust: 098051

    Genome biology 2015;16;294

  • Inter-individual variability contrasts with regional homogeneity in the human brain DNA methylome.

    Illingworth RS, Gruenewald-Schneider U, De Sousa D, Webb S, Merusi C, Kerr AR, James KD, Smith C, Walker R, Andrews R and Bird AP

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, Midlothian, EH9 3BF, UK.

    The possibility that alterations in DNA methylation are mechanistic drivers of development, aging and susceptibility to disease is widely acknowledged, but evidence remains patchy or inconclusive. Of particular interest in this regard is the brain, where it has been reported that DNA methylation impacts on neuronal activity, learning and memory, drug addiction and neurodegeneration. Until recently, however, little was known about the 'landscape' of the human brain methylome. Here we assay 1.9 million CpGs in each of 43 brain samples representing different individuals and brain regions. The cerebellum was a consistent outlier compared to all other regions, and showed over 16 000 differentially methylated regions (DMRs). Unexpectedly, the sequence characteristics of hypo- and hypermethylated domains in cerebellum were distinct. In contrast, very few DMRs distinguished regions of the cortex, limbic system and brain stem. Inter-individual DMRs were readily detectable in these regions. These results lead to the surprising conclusion that, with the exception of cerebellum, DNA methylation patterns are more homogeneous between different brain regions from the same individual, than they are for a single brain region between different individuals. This finding suggests that DNA sequence composition, not developmental status, is the principal determinant of the human brain DNA methylome.

    Funded by: Medical Research Council: G0800026; Wellcome Trust: 077224

    Nucleic acids research 2015;43;2;732-44

  • INFRAFRONTIER--providing mutant mouse resources as research tools for the international scientific community.

    INFRAFRONTIER Consortium

    The laboratory mouse is a key model organism to investigate mechanism and therapeutics of human disease. The number of targeted genetic mouse models of disease is growing rapidly due to high-throughput production strategies employed by the International Mouse Phenotyping Consortium (IMPC) and the development of new, more efficient genome engineering techniques such as CRISPR-based systems. We have previously described the European Mouse Mutant Archive (EMMA) resource and how this international infrastructure provides archiving and distribution worldwide for mutant mouse strains. EMMA has since evolved into INFRAFRONTIER, the pan-European research infrastructure for the systemic phenotyping, archiving and distribution of mouse disease models. Here we describe new features including improved search for mouse strains, support for new embryonic stem cell resources, access to training materials via a comprehensive knowledgebase and the promotion of innovative analytical and diagnostic techniques.

    Funded by: Medical Research Council: MC_U142684172

    Nucleic acids research 2015;43;Database issue;D1171-5

  • Spatial enhancer clustering and regulation of enhancer-proximal genes by cohesin.

    Ing-Simmons E, Seitan VC, Faure AJ, Flicek P, Carroll T, Dekker J, Fisher AG, Lenhard B and Merkenschlager M

    Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom; Computational Regulatory Genomics Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom;

    In addition to mediating sister chromatid cohesion during the cell cycle, the cohesin complex associates with CTCF and with active gene regulatory elements to form long-range interactions between its binding sites. Genome-wide chromosome conformation capture had shown that cohesin's main role in interphase genome organization is in mediating interactions within architectural chromosome compartments, rather than specifying compartments per se. However, it remains unclear how cohesin-mediated interactions contribute to the regulation of gene expression. We have found that the binding of CTCF and cohesin is highly enriched at enhancers and in particular at enhancer arrays or "super-enhancers" in mouse thymocytes. Using local and global chromosome conformation capture, we demonstrate that enhancer elements associate not just in linear sequence, but also in 3D, and that spatial enhancer clustering is facilitated by cohesin. The conditional deletion of cohesin from noncycling thymocytes preserved enhancer position, H3K27ac, H3K4me1, and enhancer transcription, but weakened interactions between enhancers. Interestingly, ∼50% of deregulated genes reside in the vicinity of enhancer elements, suggesting that cohesin regulates gene expression through spatial clustering of enhancer elements. We propose a model for cohesin-dependent gene regulation in which spatial clustering of enhancer elements acts as a unified mechanism for both enhancer-promoter "connections" and "insulation."

    Funded by: Medical Research Council: MC_U120027516; NHGRI NIH HHS: R01 HG003143; Wellcome Trust

    Genome research 2015;25;4;504-13

  • Cardiometabolic effects of genetic upregulation of the interleukin 1 receptor antagonist: a Mendelian randomisation analysis.

    Interleukin 1 Genetics Consortium

    Background: To investigate potential cardiovascular and other effects of long-term pharmacological interleukin 1 (IL-1) inhibition, we studied genetic variants that produce inhibition of IL-1, a master regulator of inflammation.

    Methods: We created a genetic score combining the effects of alleles of two common variants (rs6743376 and rs1542176) that are located upstream of IL1RN, the gene encoding the IL-1 receptor antagonist (IL-1Ra; an endogenous inhibitor of both IL-1α and IL-1β); both alleles increase soluble IL-1Ra protein concentration. We compared effects on inflammation biomarkers of this genetic score with those of anakinra, the recombinant form of IL-1Ra, which has previously been studied in randomised trials of rheumatoid arthritis and other inflammatory disorders. In primary analyses, we investigated the score in relation to rheumatoid arthritis and four cardiometabolic diseases (type 2 diabetes, coronary heart disease, ischaemic stroke, and abdominal aortic aneurysm; 453,411 total participants). In exploratory analyses, we studied the relation of the score to many disease traits and to 24 other disorders of proposed relevance to IL-1 signalling (746,171 total participants).

    Findings: For each IL1RN minor allele inherited, serum concentrations of IL-1Ra increased by 0.22 SD (95% CI 0.18-0.25; 12.5%; p = 9.3 × 10^-33), concentrations of interleukin 6 decreased by 0.02 SD (-0.04 to -0.01; -1.7%; p = 3.5 × 10^-3), and concentrations of C-reactive protein decreased by 0.03 SD (-0.04 to -0.02; -3.4%; p = 7.7 × 10^-14). We noted the effects of the genetic score on these inflammation biomarkers to be directionally concordant with those of anakinra. The allele count of the genetic score had roughly log-linear, dose-dependent associations with both IL-1Ra concentration and risk of coronary heart disease. For people who carried four IL-1Ra-raising alleles, the odds ratio for coronary heart disease was 1.15 (1.08-1.22; p = 1.8 × 10^-6) compared with people who carried no IL-1Ra-raising alleles; the per-allele odds ratio for coronary heart disease was 1.03 (1.02-1.04; p = 3.9 × 10^-10). Per-allele odds ratios were 0.97 (0.95-0.99; p = 9.9 × 10^-4) for rheumatoid arthritis, 0.99 (0.97-1.01; p = 0.47) for type 2 diabetes, 1.00 (0.98-1.02; p = 0.92) for ischaemic stroke, and 1.08 (1.04-1.12; p = 1.8 × 10^-5) for abdominal aortic aneurysm. In exploratory analyses, we observed per-allele increases in concentrations of proatherogenic lipids, including LDL-cholesterol, but no clear evidence of association for blood pressure, glycaemic traits, or any of the 24 other disorders studied. Modelling suggested that the observed increase in LDL-cholesterol could account for about a third of the association observed between the genetic score and increased coronary risk.

    Interpretation: Human genetic data suggest that long-term dual IL-1α/β inhibition could increase cardiovascular risk and, conversely, reduce the risk of development of rheumatoid arthritis. The cardiovascular risk might, in part, be mediated through an increase in proatherogenic lipid concentrations.

    Funded by: British Heart Foundation: RG/08/014/24067, SP/09/002; Cancer Research UK: 10589, 12076; European Research Council: 268834; Medical Research Council: G0502131, G0800270, MC_PC_U127527198, MC_UU_12013/1, MC_UU_12013/3, MR/L003120/1; NCATS NIH HHS: UL1 TR000427; NHGRI NIH HHS: U01 HG006389; NHLBI NIH HHS: K23 HL114724, R01 HL105756, R01 HL120393; NIAMS NIH HHS: P30 AR047363, RC1 AR058587; Wellcome Trust: 095198

    The lancet. Diabetes & endocrinology 2015;3;4;243-53

  • A Semi-Supervised Approach for Refining Transcriptional Signatures of Drug Response and Repositioning Predictions.

    Iorio F, Shrestha RL, Levin N, Boilot V, Garnett MJ, Saez-Rodriguez J and Draviam VM

    European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, United Kingdom.

    We present a novel strategy to identify drug-repositioning opportunities. The starting point of our method is the generation of a signature summarising the consensual transcriptional response of multiple human cell lines to a compound of interest (namely the seed compound). This signature can be derived from data in existing databases, such as the connectivity-map, and it is used in the first instance to query a network interlinking all the connectivity-map compounds, based on the similarity of their transcriptional responses. This provides a drug neighbourhood, composed of compounds predicted to share some effects with the seed one. The original signature is then refined by systematically reducing its overlap with the transcriptional responses induced by drugs in this neighbourhood that are known to share a secondary effect with the seed compound. Finally, the drug network is queried again with the resulting refined signatures and the whole process is carried on for a number of iterations. Drugs in the final refined neighbourhood are then predicted to exert the principal mode of action of the seed compound. We illustrate our approach using paclitaxel (a microtubule stabilising agent) as seed compound. Our method predicts that glipizide and splitomicin perturb microtubule function in human cells: a result that could not be obtained through standard signature matching methods. In agreement, we find that glipizide and splitomicin reduce interphase microtubule growth rates and transiently increase the percentage of mitotic cells, consistent with our prediction. Finally, we validated the refined signatures of paclitaxel response by mining a large drug screening dataset, showing that human cancer cell lines whose basal transcriptional profile is anti-correlated to them are significantly more sensitive to paclitaxel and docetaxel.

    Funded by: Cancer Research UK: 9787; Wellcome Trust: 102696

    PloS one 2015;10;10;e0139446

  • Early maturation and distinct tau pathology in induced pluripotent stem cell-derived neurons from patients with MAPT mutations.

    Iovino M, Agathou S, González-Rueda A, Del Castillo Velasco-Herrera M, Borroni B, Alberici A, Lynch T, O'Dowd S, Geti I, Gaffney D, Vallier L, Paulsen O, Káradóttir RT and Spillantini MG

    Department of Clinical Neurosciences, Clifford Allbutt Building, University of Cambridge, Cambridge, UK.

    Tauopathies, such as Alzheimer's disease, some cases of frontotemporal dementia, corticobasal degeneration and progressive supranuclear palsy, are characterized by aggregates of the microtubule-associated protein tau, which are linked to neuronal death and disease development and can be caused by mutations in the MAPT gene. Six tau isoforms are present in the adult human brain and they differ by the presence of 3 (3R) or 4 (4R) C-terminal repeats. Only the shortest 3R isoform is present in foetal brain. MAPT mutations found in human disease affect tau binding to microtubules or the 3R:4R isoform ratio by altering exon 10 splicing. We have differentiated neurons from induced pluripotent stem cells derived from fibroblasts of controls and patients with N279K and P301L MAPT mutations. Induced pluripotent stem cell-derived neurons recapitulate developmental tau expression, showing the adult brain tau isoforms after several months in culture. Both N279K and P301L neurons exhibit earlier electrophysiological maturation and altered mitochondrial transport compared to controls. Specifically, the N279K neurons show abnormally premature developmental 4R tau expression, including changes in the 3R:4R isoform ratio and AT100-hyperphosphorylated tau aggregates, while P301L neurons are characterized by contorted processes with varicosity-like structures, some containing both alpha-synuclein and 4R tau. The previously unreported faster maturation of MAPT mutant human neurons, the developmental expression of 4R tau and the morphological alterations may contribute to disease development.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0301152, G0800784; NIA NIH HHS: P30 AG010133; Wellcome Trust: 091543/Z/10/Z, 098051

    Brain : a journal of neurology 2015;138;Pt 11;3345-59

  • Discovery and Characterization of Human-Urine Utilization by Asymptomatic-Bacteriuria-Causing Streptococcus agalactiae.

    Ipe DS, Ben Zakour NL, Sullivan MJ, Beatson SA, Ulett KB, Benjamin WH, Davies MR, Dando SJ, King NP, Cripps AW, Schembri MA, Dougan G and Ulett GC

    School of Medical Sciences, Menzies Health Institute Queensland, Griffith University, Gold Coast Campus, QLD, Australia.

    Streptococcus agalactiae causes both symptomatic cystitis and asymptomatic bacteriuria (ABU); however, growth characteristics of S. agalactiae in human urine have not previously been reported. Here, we describe a phenotype of robust growth in human urine observed in ABU-causing S. agalactiae (ABSA) that was not seen among uropathogenic S. agalactiae (UPSA) strains isolated from patients with acute cystitis. In direct competition assays using pooled human urine inoculated with equal numbers of a prototype ABSA strain, designated ABSA 1014, and any one of several UPSA strains, measurement of the percentage of each strain recovered over time showed a markedly superior fitness of ABSA 1014 for urine growth. Comparative phenotype profiling of ABSA 1014 and UPSA strain 807, isolated from a patient with acute cystitis, using metabolic arrays of >2,500 substrates and conditions revealed unique and specific l-malic acid catabolism in ABSA 1014 that was absent in UPSA 807. Whole-genome sequencing also revealed divergence in malic enzyme-encoding genes between the strains predicted to impact the activity of the malate metabolic pathway. Comparative growth assays in urine comparing wild-type ABSA and gene-deficient mutants that were functionally inactivated for the malic enzyme metabolic pathway by targeted disruption of the maeE or maeK gene in ABSA demonstrated attenuated growth of the mutants in normal human urine as well as synthetic human urine containing malic acid. We conclude that some S. agalactiae strains can grow in human urine, and this relates in part to malic acid metabolism, which may affect the persistence or progression of S. agalactiae ABU.

    Infection and immunity 2015;84;1;307-19

  • Off-target mutations are rare in Cas9-modified mice.

    Iyer V, Shen B, Zhang W, Hodgkins A, Keane T, Huang X and Skarnes WC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Funded by: Medical Research Council: MR/L007428/1; Wellcome Trust

    Nature methods 2015;12;6;479

  • Global Gene Expression Profiling through the Complete Life Cycle of Trypanosoma vivax.

    Jackson AP, Goyard S, Xia D, Foth BJ, Sanders M, Wastling JM, Minoprio P and Berriman M

    Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom.

    The parasitic flagellate Trypanosoma vivax is a cause of animal trypanosomiasis across Africa and South America. The parasite has a digenetic life cycle, passing between mammalian hosts and insect vectors, and a series of developmental forms adapted to each life cycle stage. Each point in the life cycle presents radically different challenges to parasite metabolism and physiology and distinct host interactions requiring remodeling of the parasite cell surface. Transcriptomic and proteomic studies of the related parasites T. brucei and T. congolense have shown how gene expression is regulated during their development. New methods for in vitro culture of the T. vivax insect stages have allowed us to describe global gene expression throughout the complete T. vivax life cycle for the first time. We combined transcriptomic and proteomic analysis of each life stage using RNA-seq and mass spectrometry respectively, to identify genes with patterns of preferential transcription or expression. While T. vivax conforms to a pattern of highly conserved gene expression found in other African trypanosomes, (e.g. developmental regulation of energy metabolism, restricted expression of a dominant variant antigen, and expression of 'Fam50' proteins in the insect mouthparts), we identified significant differences in gene expression affecting metabolism in the fly and a suite of T. vivax-specific genes with predicted cell-surface expression that are preferentially expressed in the mammal ('Fam29, 30, 42') or the vector ('Fam34, 35, 43'). T. vivax differs significantly from other African trypanosomes in the developmentally-regulated proteins likely to be expressed on its cell surface and thus, in the structure of the host-parasite interface. These unique features may yet explain the species differences in life cycle and could, in the form of bloodstream-stage proteins that do not undergo antigenic variation, provide targets for therapy.

    Funded by: Wellcome Trust: WT 097826/Z/11/A, WT 098051

    PLoS neglected tropical diseases 2015;9;8;e0003975

  • Mouse slc9a8 mutants exhibit retinal defects due to retinal pigmented epithelium dysfunction.

    Jadeja S, Barnard AR, McKie L, Cross SH, White JK, Sanger Mouse Genetics Project, Robertson M, Budd PS, MacLaren RE and Jackson IJ

    MRC Human Genetics Unit, MRC Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom.

    Purpose: As part of a large scale systematic screen to determine the effects of gene knockout mutations in mice, a retinal phenotype was found in mice lacking the Slc9a8 gene, encoding the sodium/hydrogen ion exchange protein NHE8. We aimed to characterize the mutant phenotype and the role of sodium/hydrogen ion exchange in retinal function.

    Methods: Detailed histology characterized the pathological consequences of Slc9a8 mutation, and retinal function was assessed by electroretinography (ERG). A conditional allele was used to identify the cells in which NHE8 function is critical for retinal function, and mutant cells analyzed for the effect of the mutation on endosomes.

    Results: Histology of mutant retinas reveals a separation of photoreceptors from the RPE and infiltration by macrophages. There is a small reduction in photoreceptor length and a mislocalization of visual pigments. The ERG testing reveals a deficit in rod and cone pathway function. The RPE shows abnormal morphology, and mutation of Slc9a8 in only RPE cells recapitulates the mutant phenotype. The NHE8 protein localizes to endosomes, and mutant cells have much smaller recycling endosomes.

    Conclusions: The NHE8 protein is required in the RPE to maintain correct regulation of endosomal volume and/or pH which is essential for the cellular integrity and subsequent function of RPE.

    Funded by: Medical Research Council: MC_PC_U127561112; Wellcome Trust: 079643

    Investigative ophthalmology & visual science 2015;56;5;3015-26

  • The presence of prolines in the flanking region of an immunodominant HIV-2 gag epitope influences the quality and quantity of the epitope generated.

    Jallow S, Leligdowicz A, Kramer HB, Onyango C, Cotten M, Wright C, Whittle HC, McMichael A, Dong T, Kessler BM and Rowland-Jones SL

    Radcliffe Department of Medicine, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Headington, Oxford, UK.

    Both the recognition of HIV-infected cells and the immunogenicity of candidate CTL vaccines depend on the presentation of a peptide epitope at the cell surface, which in turn depends on intracellular antigen processing. Differential antigen processing may be responsible for the differences in both the quality and the quantity of epitopes produced, influencing the immunodominance hierarchy of viral epitopes. Previously, we showed that the magnitude of the HIV-2 gag-specific T-cell response is inversely correlated with plasma viral load, particularly when responses are directed against an epitope, 165DRFYKSLRA173, within the highly conserved Major Homology Region of gag-p26. We also showed that the presence of three proline residues, at positions 119, 159 and 178 of gag-p26, was significantly correlated with low viral load. Since this proline motif was also associated with stronger gag-specific CTL responses, we investigated the impact of these prolines on proteasomal processing of the protective 165DRFYKSLRA173 epitope. Our data demonstrate that the 165DRFYKSLRA173 epitope is most efficiently processed from precursors that contain two flanking proline residues, found naturally in low viral-load patients. Superior antigen processing and enhanced presentation may account for the link between infection with HIV-2 encoding the "PPP-gag" sequence and both strong gag-specific CTL responses and lower viral load.

    Funded by: Medical Research Council: G0801751; Wellcome Trust: 084655

    European journal of immunology 2015;45;8;2232-42

  • Looking at Beijing's skyline.

    Jamrozy D and Kallonen T

    Nature reviews. Microbiology 2015;13;9;528

  • Identification of a novel human rhinovirus C type by antibody capture VIDISCA-454.

    Jazaeri Farsani SM, Oude Munnink BB, Canuti M, Deijs M, Cotten M, Jebbink MF, Verhoeven J, Kellam P, Loens K, Goossens H, Ieven M and van der Hoek L

    Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam 1105 AZ, the Netherlands.

    Causative agents for more than 30 percent of respiratory infections remain unidentified, suggesting that unknown respiratory pathogens might be involved. In this study, antibody capture VIDISCA-454 (virus discovery cDNA-AFLP combined with Roche 454 high-throughput sequencing) resulted in the discovery of a novel type of rhinovirus C (RV-C). The virus has an RNA genome of at least 7054 nt and carries the characteristics of rhinovirus C species. The gene encoding viral protein 1, which is used for typing, has only 81% nucleotide sequence identity with the closest known RV-C type, and, therefore, the virus represents the first member of a novel type, named RV-C54.

    Viruses 2015;7;1;239-51

  • The genomic and phenotypic diversity of Schizosaccharomyces pombe.

    Jeffares DC, Rallis C, Rieux A, Speed D, Převorovský M, Mourier T, Marsellach FX, Iqbal Z, Lau W, Cheng TM, Pracana R, Mülleder M, Lawson JL, Chessel A, Bala S, Hellenthal G, O'Fallon B, Keane T, Simpson JT, Bischof L, Tomiczek B, Bitton DA, Sideri T, Codlin S, Hellberg JE, van Trigt L, Jeffery L, Li JJ, Atkinson S, Thodberg M, Febrer M, McLay K, Drou N, Brown W, Hayles J, Carazo Salas RE, Ralser M, Maniatis N, Balding DJ, Balloux F, Durbin R and Bähler J

    Department of Genetics, Evolution and Environment, University College London, London, UK.

    Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the usefulness of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, finding moderate genetic diversity (π = 3 × 10(-3) substitutions/site) and weak global population structure. We estimate that dispersal of S. pombe began during human antiquity (∼340 BCE), and ancestors of these strains reached the Americas at ∼1623 CE. We quantified 74 traits, finding substantial heritable phenotypic diversity. We conducted 223 genome-wide association studies, with 89 traits showing at least one association. The most significant variant for each trait explained 22% of the phenotypic variance on average, with indels having larger effects than SNPs. This analysis represents a rich resource to examine genotype-phenotype relationships in a tractable model.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H005854/1, BB/H008802/1, BB/K006320/1; Cancer Research UK; European Research Council: 260801, 260809; Medical Research Council: G0901388, MR/L012561/1; Wellcome Trust: 093735, 093917, 095598, 095598/Z/11/Z, 098051, 098386, RG 093735/Z/10/Z

    Nature genetics 2015;47;3;235-41

  • Recessive nephrocerebellar syndrome on the Galloway-Mowat syndrome spectrum is caused by homozygous protein-truncating mutations of WDR73.

    Jinks RN, Puffenberger EG, Baple E, Harding B, Crino P, Fogo AB, Wenger O, Xin B, Koehler AE, McGlincy MH, Provencher MM, Smith JD, Tran L, Al Turki S, Chioza BA, Cross H, Harlalka GV, Hurles ME, Maroofian R, Heaps AD, Morton MC, Stempak L, Hildebrandt F, Sadowski CE, Zaritsky J, Campellone K, Morton DH, Wang H, Crosby A and Strauss KA

    Department of Biology and Biological Foundations of Behaviour Program, Franklin and Marshall College, Lancaster, PA 17604, USA.

    We describe a novel nephrocerebellar syndrome on the Galloway-Mowat syndrome spectrum among 30 children (ages 1.0 to 28 years) from diverse Amish demes. Children with nephrocerebellar syndrome had progressive microcephaly, visual impairment, stagnant psychomotor development, abnormal extrapyramidal movements and nephrosis. Fourteen died between ages 2.7 and 28 years, typically from renal failure. Post-mortem studies revealed (i) micrencephaly without polymicrogyria or heterotopia; (ii) atrophic cerebellar hemispheres with stunted folia, profound granule cell depletion, Bergmann gliosis, and signs of Purkinje cell deafferentation; (iii) selective striatal cholinergic interneuron loss; and (iv) optic atrophy with delamination of the lateral geniculate nuclei. Renal tissue showed focal and segmental glomerulosclerosis and extensive effacement and microvillus transformation of podocyte foot processes. Nephrocerebellar syndrome mapped to 700 kb on chromosome 15, which contained a single novel homozygous frameshift variant (WDR73 c.888delT; p.Phe296Leufs*26). WDR73 protein is expressed in human cerebral cortex, hippocampus, and cultured embryonic kidney cells. It is concentrated at mitotic microtubules and interacts with α-, β-, and γ-tubulin, heat shock proteins 70 and 90 (HSP-70; HSP-90), and the carbamoyl phosphate synthetase 2/aspartate transcarbamylase/dihydroorotase multi-enzyme complex. Recombinant WDR73 p.Phe296Leufs*26 and p.Arg256Profs*18 proteins are truncated, unstable, and show increased interaction with α- and β-tubulin and HSP-70/HSP-90. Fibroblasts from patients homozygous for WDR73 p.Phe296Leufs*26 proliferate poorly in primary culture and senesce early. Our data suggest that in humans, WDR73 interacts with mitotic microtubules to regulate cell cycle progression, proliferation and survival in brain and kidney. We extend the Galloway-Mowat syndrome spectrum with the first description of diencephalic and striatal neuropathology.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G1001931, G1002279; NIDDK NIH HHS: DK064614, DK1068306, DK1069274, R01 DK068306; NIGMS NIH HHS: R01 GM107441

    Brain : a journal of neurology 2015;138;Pt 8;2173-90

  • Allele variants of enterotoxigenic Escherichia coli heat-labile toxin are globally transmitted and associated with colonization factors.

    Joffré E, von Mentzer A, Abd El Ghany M, Oezguen N, Savidge T, Dougan G, Svennerholm AM and Sjöling Å

    Department of Microbiology and Immunology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; Institute of Molecular Biology and Biotechnology, Universidad Mayor de San Andrés, La Paz, Bolivia.

    Enterotoxigenic Escherichia coli (ETEC) is a significant cause of morbidity and mortality in the developing world. ETEC-mediated diarrhea is orchestrated by heat-labile toxin (LT) and heat-stable toxins (STp and STh), acting in concert with a repertoire of more than 25 colonization factors (CFs). LT, the major virulence factor, induces fluid secretion after delivery of a monomeric ADP-ribosylase (LTA) and its pentameric carrier B subunit (LTB). A study of ETEC isolates from humans in Brazil reported the existence of natural LT variants. In the present study, analysis of predicted amino acid sequences showed that the LT amino acid polymorphisms are associated with a geographically and temporally diverse set of 192 clinical ETEC strains and identified 12 novel LT variants. Twenty distinct LT amino acid variants were observed in the globally distributed strains, and phylogenetic analysis showed these to be associated with different CF profiles. Notably, the most prevalent LT1 allele variants were correlated with major ETEC lineages expressing CS1 + CS3 or CS2 + CS3, and the most prevalent LT2 allele variants were correlated with major ETEC lineages expressing CS5 + CS6 or CFA/I. LTB allele variants generally exhibited more-stringent amino acid sequence conservation (2 substitutions identified) than LTA allele variants (22 substitutions identified). The functional impact of LT1 and LT2 polymorphisms on virulence was investigated by measuring total-toxin production, secretion, and stability using GM1-enzyme-linked immunosorbent assays (GM1-ELISA) and in silico protein modeling. Our data show that LT2 strains produce 5-fold more toxin than LT1 strains (P < 0.001), which may suggest greater virulence potential for this genetic variant. Our data suggest that functionally distinct LT-CF variants with increased fitness have persisted during the evolution of ETEC and have spread globally.

    Funded by: PHS HHS: R01 NIAID AI0094001

    Journal of bacteriology 2015;197;2;392-403

  • Systems genetics identifies Sestrin 3 as a regulator of a proconvulsant gene network in human epileptic hippocampus.

    Johnson MR, Behmoaras J, Bottolo L, Krishnan ML, Pernhorst K, Santoscoy PLM, Rossetti T, Speed D, Srivastava PK, Chadeau-Hyam M, Hajji N, Dabrowska A, Rotival M, Razzaghi B, Kovac S, Wanisch K, Grillo FW, Slaviero A, Langley SR, Shkura K, Roncon P, De T, Mattheisen M, Niehusmann P, O'Brien TJ, Petrovski S, von Lehe M, Hoffmann P, Eriksson J, Coffey AJ, Cichon S, Walker M, Simonato M, Danis B, Mazzuferi M, Foerch P, Schoch S, De Paola V, Kaminski RM, Cunliffe VT, Becker AJ and Petretto E

    Division of Brain Sciences, Imperial College London, Hammersmith Hospital Campus, Burlington Danes Building, London W12 0NN, UK.

    Gene-regulatory network analysis is a powerful approach to elucidate the molecular processes and pathways underlying complex disease. Here we employ systems genetics approaches to characterize the genetic regulation of pathophysiological pathways in human temporal lobe epilepsy (TLE). Using surgically acquired hippocampi from 129 TLE patients, we identify a gene-regulatory network genetically associated with epilepsy that contains a specialized, highly expressed transcriptional module encoding proconvulsive cytokines and Toll-like receptor signalling genes. RNA sequencing analysis in a mouse model of TLE using 100 epileptic and 100 control hippocampi shows the proconvulsive module is preserved across species, specific to the epileptic hippocampus and upregulated in chronic epilepsy. In the TLE patients, we map the trans-acting genetic control of this proconvulsive module to Sestrin 3 (SESN3), and demonstrate that SESN3 positively regulates the module in macrophages, microglia and neurons. Morpholino-mediated Sesn3 knockdown in zebrafish confirms the regulation of the transcriptional module, and attenuates chemically induced behavioural seizures in vivo.

    Funded by: Medical Research Council: MC_U120088464, MC_U120097112, MR/L001578/1, MR/L012561/1, MR/M004716/1; Wellcome Trust: 079643, WT066056

    Nature communications 2015;6;6031

  • Genomic and Proteomic Studies on the Mode of Action of Oxaboroles against the African Trypanosome.

    Jones DC, Foth BJ, Urbaniak MD, Patterson S, Ong HB, Berriman M and Fairlamb AH

    School of Life Sciences, University of Dundee, Dundee, United Kingdom.

    SCYX-7158, an oxaborole, is currently in Phase I clinical trials for the treatment of human African trypanosomiasis. Here we investigate possible modes of action against Trypanosoma brucei using orthogonal chemo-proteomic and genomic approaches. SILAC-based proteomic studies using an oxaborole analogue immobilised onto a resin were used either in competition with a soluble oxaborole or an immobilised inactive control to identify thirteen proteins common to both strategies. Cell-cycle analysis of cells incubated with sub-lethal concentrations of an oxaborole identified a subtle but significant accumulation of G2 and >G2 cells. Given the possibility of compromised DNA fidelity, we investigated long-term exposure of T. brucei to oxaboroles by generating resistant cell lines in vitro. Resistance proved more difficult to generate than for drugs currently used in the field, and in one of our three cell lines was unstable. Whole-genome sequencing of the resistant cell lines revealed single nucleotide polymorphisms in 66 genes and several large-scale genomic aberrations. The absence of a simple consistent mechanism among resistant cell lines and the diverse list of binding partners from the proteomic studies suggest a degree of polypharmacology that should reduce the risk of resistance to this compound class emerging in the field. The combined genetic and chemical biology approaches have provided lists of candidates to be investigated for more detailed information on the mode of action of this promising new drug class.

    Funded by: Wellcome Trust: 079838, 097945, 098051

    PLoS neglected tropical diseases 2015;9;12;e0004299

  • Characterization of plasmids in extensively drug-resistant Acinetobacter strains isolated in India and Pakistan.

    Jones LS, Carvalho MJ, Toleman MA, White PL, Connor TR, Mushtaq A, Weeks JL, Kumarasamy KK, Raven KE, Török ME, Peacock SJ, Howe RA and Walsh TR

    Cardiff University, Cardiff School of Medicine, University Hospital of Wales, Cardiff, United Kingdom; Public Health Wales Microbiology Cardiff, University Hospital of Wales, Cardiff, United Kingdom.

    The blaNDM-1 gene is associated with extensive drug resistance in Gram-negative bacteria. This probably spread to Enterobacteriaceae from Acinetobacter spp., and we characterized plasmids associated with blaNDM-1 in Acinetobacter spp. to gain insight into their role in this dissemination. Four clinical NDM-1-producing Acinetobacter species strains from India and Pakistan were investigated. A plasmid harboring blaNDM-1, pNDM-40-1, was characterized by whole-genome sequencing of Acinetobacter bereziniae CHI-40-1 and comparison with related plasmids. The presence of similar plasmids in strains from Pakistan was sought by PCR and sequencing of amplicons. Conjugation frequency was tested and stability of pNDM-40-1 investigated by real-time PCR of isolates passaged with and without antimicrobial selection pressure. A. bereziniae and Acinetobacter haemolyticus strains contained plasmids similar to the pNDM-BJ01-like plasmids identified in Acinetobacter spp. in China. The backbone of pNDM-40-1 was almost identical to that of pNDM-BJ01-like plasmids, but the transposon harboring blaNDM-1, Tn125, contained two short deletions. Escherichia coli and Acinetobacter pittii transconjugants were readily obtained. Transconjugants retained pNDM-40-1 after a 14-day passage experiment, although stability was greater with meropenem selection. Fragments of pNDM-BJ01-like plasmid backbones are found near blaNDM-1 in some genetic contexts from Enterobacteriaceae, suggesting that cross-genus transfer has occurred. pNDM-BJ01-like plasmids have been described in isolates originating from a wide geographical region in southern Asia. In vitro data on plasmid transfer and stability suggest that these plasmids could have contributed to the spread of blaNDM-1 into Enterobacteriaceae.

    Funded by: Canadian Institutes of Health Research; Medical Research Council: G1000803, G1100135

    Antimicrobial agents and chemotherapy 2015;59;2;923-9

  • Directional dominance on stature and cognition in diverse human populations.

    Joshi PK, Esko T, Mattsson H, Eklund N, Gandin I, Nutile T, Jackson AU, Schurmann C, Smith AV, Zhang W, Okada Y, Stančáková A, Faul JD, Zhao W, Bartz TM, Concas MP, Franceschini N, Enroth S, Vitart V, Trompet S, Guo X, Chasman DI, O'Connel JR, Corre T, Nongmaithem SS, Chen Y, Mangino M, Ruggiero D, Traglia M, Farmaki AE, Kacprowski T, Bjonnes A, van der Spek A, Wu Y, Giri AK, Yanek LR, Wang L, Hofer E, Rietveld CA, McLeod O, Cornelis MC, Pattaro C, Verweij N, Baumbach C, Abdellaoui A, Warren HR, Vuckovic D, Mei H, Bouchard C, Perry JRB, Cappellani S, Mirza SS, Benton MC, Broeckel U, Medland SE, Lind PA, Malerba G, Drong A, Yengo L, Bielak LF, Zhi D, van der Most PJ, Shriner D, Mägi R, Hemani G, Karaderi T, Wang Z, Liu T, Demuth I, Zhao JH, Meng W, Lataniotis L, van der Laan SW, Bradfield JP, Wood AR, Bonnefond A, Ahluwalia TS, Hall LM, Salvi E, Yazar S, Carstensen L, de Haan HG, Abney M, Afzal U, Allison MA, Amin N, Asselbergs FW, Bakker SJL, Barr RG, Baumeister SE, Benjamin DJ, Bergmann S, Boerwinkle E, Bottinger EP, Campbell A, Chakravarti A, Chan Y, Chanock SJ, Chen C, Chen YI, Collins FS, Connell J, Correa A, Cupples LA, Smith GD, Davies G, Dörr M, Ehret G, Ellis SB, Feenstra B, Feitosa MF, Ford I, Fox CS, Frayling TM, Friedrich N, Geller F, Scotland G, Gillham-Nasenya I, Gottesman O, Graff M, Grodstein F, Gu C, Haley C, Hammond CJ, Harris SE, Harris TB, Hastie ND, Heard-Costa NL, Heikkilä K, Hocking LJ, Homuth G, Hottenga JJ, Huang J, Huffman JE, Hysi PG, Ikram MA, Ingelsson E, Joensuu A, Johansson Å, Jousilahti P, Jukema JW, Kähönen M, Kamatani Y, Kanoni S, Kerr SM, Khan NM, Koellinger P, Koistinen HA, Kooner MK, Kubo M, Kuusisto J, Lahti J, Launer LJ, Lea RA, Lehne B, Lehtimäki T, Liewald DCM, Lind L, Loh M, Lokki ML, London SJ, Loomis SJ, Loukola A, Lu Y, Lumley T, Lundqvist A, Männistö S, Marques-Vidal P, Masciullo C, Matchan A, Mathias RA, Matsuda K, Meigs JB, Meisinger C, Meitinger T, Menni C, Mentch FD, Mihailov E, Milani L, Montasser ME, Montgomery GW, Morrison A, Myers RH, Nadukuru R, Navarro P, Nelis M, Nieminen MS, Nolte IM, O'Connor GT, Ogunniyi A, Padmanabhan S, Palmas WR, Pankow JS, Patarcic I, Pavani F, Peyser PA, Pietilainen K, Poulter N, Prokopenko I, Ralhan S, Redmond P, Rich SS, Rissanen H, Robino A, Rose LM, Rose R, Sala C, Salako B, Salomaa V, Sarin AP, Saxena R, Schmidt H, Scott LJ, Scott WR, Sennblad B, Seshadri S, Sever P, Shrestha S, Smith BH, Smith JA, Soranzo N, Sotoodehnia N, Southam L, Stanton AV, Stathopoulou MG, Strauch K, Strawbridge RJ, Suderman MJ, Tandon N, Tang ST, Taylor KD, Tayo BO, Töglhofer AM, Tomaszewski M, Tšernikova N, Tuomilehto J, Uitterlinden AG, Vaidya D, van Hylckama Vlieg A, van Setten J, Vasankari T, Vedantam S, Vlachopoulou E, Vozzi D, Vuoksimaa E, Waldenberger M, Ware EB, Wentworth-Shields W, Whitfield JB, Wild S, Willemsen G, Yajnik CS, Yao J, Zaza G, Zhu X, Project TBJ, Salem RM, Melbye M, Bisgaard H, Samani NJ, Cusi D, Mackey DA, Cooper RS, Froguel P, Pasterkamp G, Grant SFA, Hakonarson H, Ferrucci L, Scott RA, Morris AD, Palmer CNA, Dedoussis G, Deloukas P, Bertram L, Lindenberger U, Berndt SI, Lindgren CM, Timpson NJ, Tönjes A, Munroe PB, Sørensen TIA, Rotimi CN, Arnett DK, Oldehinkel AJ, Kardia SLR, Balkau B, Gambaro G, Morris AP, Eriksson JG, Wright MJ, Martin NG, Hunt SC, Starr JM, Deary IJ, Griffiths LR, Tiemeier H, Pirastu N, Kaprio J, Wareham NJ, Pérusse L, Wilson JG, Girotto G, Caulfield MJ, Raitakari O, Boomsma DI, Gieger C, van der Harst P, Hicks AA, Kraft P, Sinisalo J, Knekt P, Johannesson M, Magnusson PKE, Hamsten A, Schmidt R, Borecki IB, Vartiainen E, Becker DM, Bharadwaj D, Mohlke KL, Boehnke M, van Duijn CM, Sanghera DK, Teumer A, Zeggini E, Metspalu A, Gasparini P, Ulivi S, Ober C, Toniolo D, Rudan I, Porteous DJ, Ciullo M, Spector TD, Hayward C, Dupuis J, Loos RJF, Wright AF, Chandak GR, Vollenweider P, Shuldiner A, Ridker PM, Rotter JI, Sattar N, Gyllensten U, North KE, Pirastu M, Psaty BM, Weir DR, Laakso M, Gudnason V, Takahashi A, Chambers JC, Kooner JS, Strachan DP, Campbell H, Hirschhorn JN, Perola M, Polašek O and Wilson JF

    Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, Scotland.

    Homozygosity has long been associated with rare, often devastating, Mendelian disorders, and Darwin was one of the first to recognize that inbreeding reduces evolutionary fitness. However, the effect of the more distant parental relatedness that is common in modern human populations is less well understood. Genomic data now allow us to investigate the effects of homozygosity on traits of public health importance by observing contiguous homozygous segments (runs of homozygosity), which are inferred to be homozygous along their complete length. Given the low levels of genome-wide homozygosity prevalent in most human populations, information is required on very large numbers of people to provide sufficient power. Here we use runs of homozygosity to study 16 health-related quantitative traits in 354,224 individuals from 102 cohorts, and find statistically significant associations between summed runs of homozygosity and four complex traits: height, forced expiratory lung volume in one second, general cognitive ability and educational attainment (P < 1 × 10(-300), 2.1 × 10(-6), 2.5 × 10(-10) and 1.8 × 10(-10), respectively). In each case, increased homozygosity was associated with decreased trait value, equivalent to the offspring of first cousins being 1.2 cm shorter and having 10 months' less education. Similar effect sizes were found across four continental groups and populations with different degrees of genome-wide homozygosity, providing evidence that homozygosity, rather than confounding, directly contributes to phenotypic variance. Contrary to earlier reports in substantially smaller samples, no evidence was seen of an influence of genome-wide homozygosity on blood pressure and low density lipoprotein cholesterol, or ten other cardio-metabolic traits. Since directional dominance is predicted for traits under directional evolutionary selection, this study provides evidence that increased stature and cognitive function have been positively selected in human evolution, whereas many important risk factors for late-onset complex diseases may not have been.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1, SAG09977; British Heart Foundation: RG/2001004/12869; Chief Scientist Office: CZB/4/276, CZB/4/505, CZB/4/710, CZD/16/6, CZD/16/6/2, CZD/16/6/3, CZD/16/6/4, ETM/55; Department of Health: BARCVBRU-2012-1, RP-PG-0407-10371, SRF/01/010; European Research Council: 250157, 280559, 323195; Medical Research Council: G0601966, G0700704, G0700931, G0701863, G1001799, G9521010, G9815508, MC_PC_15018, MC_PC_U127561128, MC_PC_U127592696, MC_U106179471, MC_U106179472, MC_U127561128, MC_UU_12013/1, MC_UU_12013/3, MC_UU_12015/1, MC_UU_12015/2, MR/K002414/1, MR/K006584/1, MR/K026992/1, MR/N01104X/1; NCATS NIH HHS: UL1 TR000124; NCI NIH HHS: UM1 CA182913; NHGRI NIH HHS: R01 HG002899; NHLBI NIH HHS: R01 HL055673, R01 HL077612, R01 HL085197, R01 HL091357, R01 HL104135, R01 HL117078, U01 HL072524; NIA NIH HHS: P30 AG010129, P30 AG017265, R01 AG008122, R01 AG033193, U01 AG009740, U01 AG049505; NICHD NIH HHS: R01 HD056465; NIDCD NIH HHS: R03 DC013373; NIDDK NIH HHS: P30 DK020572, P30 DK063491, R01 DK072193, R01 DK075787, R01 DK078616, R01 DK089256, R01 DK093757, U01 DK062370, U01 DK078616; NIMHD NIH HHS: P20 MD006899; NINDS NIH HHS: R01 NS017950; Wellcome Trust: 068545, 072856, 072960, 079771, 084723, 098051, 099194, 105022

    Nature 2015;523;7561;459-462

  • Frequent somatic transfer of mitochondrial DNA into the nuclear genome of human cancer cells.

    Ju YS, Tubio JM, Mifsud W, Fu B, Davies HR, Ramakrishna M, Li Y, Yates L, Gundem G, Tarpey PS, Behjati S, Papaemmanuil E, Martin S, Fullam A, Gerstung M, ICGC Prostate Cancer Working Group, ICGC Bone Cancer Working Group, ICGC Breast Cancer Working Group, Nangalia J, Green AR, Caldas C, Borg Å, Tutt A, Lee MT, van't Veer LJ, Tan BK, Aparicio S, Span PN, Martens JW, Knappskog S, Vincent-Salomon A, Børresen-Dale AL, Eyfjörd JE, Myklebost O, Flanagan AM, Foster C, Neal DE, Cooper C, Eeles R, Bova SG, Lakhani SR, Desmedt C, Thomas G, Richardson AL, Purdie CA, Thompson AM, McDermott U, Yang F, Nik-Zainal S, Campbell PJ and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Mitochondrial genomes are separated from the nuclear genome for most of the cell cycle by the nuclear double membrane, intervening cytoplasm, and the mitochondrial double membrane. Despite these physical barriers, we show that somatically acquired mitochondrial-nuclear genome fusion sequences are present in cancer cells. Most occur in conjunction with intranuclear genomic rearrangements, and the features of the fusion fragments indicate that nonhomologous end joining and/or replication-dependent DNA double-strand break repair are the dominant mechanisms involved. Remarkably, mitochondrial-nuclear genome fusions occur at a similar rate per base pair of DNA as interchromosomal nuclear rearrangements, indicating the presence of a high frequency of contact between mitochondrial and nuclear DNA in some somatic cells. Transmission of mitochondrial DNA to the nuclear genome occurs in neoplastically transformed cells, but we do not exclude the possibility that some mitochondrial-nuclear DNA fusions observed in cancer occurred years earlier in normal somatic cells.

    Funded by: Cancer Research UK: 12765, 14835, C5047/A14835; Wellcome Trust

    Genome research 2015;25;6;814-24

  • Early insights into the potential of the Oxford Nanopore MinION for the detection of antimicrobial resistance genes.

    Judge K, Harris SR, Reuter S, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Objectives: Genome sequencing will be increasingly used in the clinical setting to tailor antimicrobial prescribing and inform infection control outbreaks. A recent technological innovation that could reduce the delay between pathogen sampling and data generation is single molecule sequencing. An example of this technology, which is undergoing evaluation through an early access programme, is the Oxford Nanopore MinION.

    Methods: We undertook a feasibility study on six clinically significant pathogens, comparing the MinION to the Illumina MiSeq and PacBio RSII platforms. Genomic DNA was prepared and sequenced using the MinION as instructed by the manufacturer, and Illumina MiSeq and PacBio sequencing was performed using established methods.

    Results: An evaluation of the accuracy of the MinION based on sequencing of an MRSA isolate showed that error rates were higher in the MinION reads, but that the reads provided even coverage across the entire genome length. The MinION detected all of the expected carbapenemases and ESBL genes in five Gram-negative isolates and the mecA gene in an MRSA isolate.

    Conclusions: The MinION can detect the presence of acquired resistance genes, but improvements in accuracy are needed so that antimicrobial resistance associated with mutations in chromosomal genes can be identified.

    Funded by: Department of Health; Wellcome Trust: 098051, 098600

    The Journal of antimicrobial chemotherapy 2015;70;10;2775-8

  • Drug-resistance mechanisms and tuberculosis drugs.

    Köser CU, Javid B, Liddell K, Ellington MJ, Feuerriegel S, Niemann S, Brown NM, Burman WJ, Abubakar I, Ismail NA, Moore D, Peacock SJ and Török ME

    Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 0QW, UK.

    Funded by: Department of Health: HICF-T5-342; Wellcome Trust: 098600, WT098600

    Lancet (London, England) 2015;385;9965;305-7

  • Homozygous loss-of-function variants in European cosmopolitan and isolate populations.

    Kaiser VB, Svinti V, Prendergast JG, Chau YY, Campbell A, Patarcic I, Barroso I, Joshi PK, Hastie ND, Miljkovic A, Taylor MS, Generation Scotland, UK10K, Enroth S, Memari Y, Kolb-Kokocinski A, Wright AF, Gyllensten U, Durbin R, Rudan I, Campbell H, Polašek O, Johansson Å, Sauer S, Porteous DJ, Fraser RM, Drake C, Vitart V, Hayward C, Semple CA and Wilson JF

    MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine and

    Homozygous loss of function (HLOF) variants provide a valuable window on gene function in humans, as well as an inventory of the human genes that are not essential for survival and reproduction. All humans carry at least a few HLOF variants, but the exact number of inactivated genes that can be tolerated is currently unknown, as are the phenotypic effects of losing function for most human genes. Here, we make use of 1432 whole exome sequences from five European populations to expand the catalogue of known human HLOF mutations; after stringent filtering of variants in our dataset, we identify a total of 173 HLOF mutations, 76 (44%) of which have not been observed previously. We find that population isolates are particularly well suited to surveys of novel HLOF genes because individuals in such populations carry extensive runs of homozygosity, which we show are enriched for novel, rare HLOF variants. Further, we make use of extensive phenotypic data to show that most HLOFs, ascertained in population-based samples, appear to have little detectable effect on the phenotype. On the contrary, we document several genes directly implicated in disease that seem to tolerate HLOF variants. Overall, HLOF genes are enriched for olfactory receptor function and are expressed in testes more often than expected, consistent with reduced purifying selection and incipient pseudogenisation.

    Funded by: Chief Scientist Office: CZB/4/276, CZB/4/438, CZB/4/710, CZD/16/6, CZD/16/6/4; Medical Research Council: G0900740, MC_PC_U127561128, MC_PC_U127597124, MR/K001744/1; Wellcome Trust: 100140, WT091310

    Human molecular genetics 2015;24;19;5464-74

  • High multiple carriage and emergence of Streptococcus pneumoniae vaccine serotype variants in Malawian children.

    Kamng'ona AW, Hinds J, Bar-Zeev N, Gould KA, Chaguza C, Msefula C, Cornick JE, Kulohoma BW, Gray K, Bentley SD, French N, Heyderman RS and Everett DB

    Microbes, Immunity and Vaccines, Malawi Liverpool Wellcome Trust Clinical Research Programme, Blantyre, Malawi.

    Background: Carriage of either single or multiple pneumococcal serotypes (multiple carriage) is a prerequisite for developing invasive pneumococcal disease. However, despite the reported high rates of pneumococcal carriage in Malawi, no data on carriage of multiple serotypes has been reported previously. Our study provides the first description of the prevalence of multiple pneumococcal carriage in Malawi.

    Methods: The study was conducted in Blantyre and Karonga districts in Malawi, from 2008 to 2012. We recruited 116 children aged 0-13 years. These children were either HIV-infected (N = 44) or uninfected (N = 72). Nasopharyngeal samples were collected using sterile swabs. Pneumococcal serotypes in the samples were identified by microarray. Strains that could not be typed by microarray were sequenced to characterise possible genetic alterations within the capsular polysaccharide (CPS) locus.

    Results: The microarray identified 179 pneumococcal strains (from 116 subjects), encompassing 43 distinct serotypes and non-typeable (NT) strains. Forty per cent (46/116) of children carried multiple serotypes. Carriage of vaccine type (VT) strains was higher (p = 0.028) in younger (0-2 years) children (71 %, 40/56) compared to older (3-13 years) children (50 %, 30/60). Genetic variations within the CPS locus of known serotypes were observed in 19 % (34/179) of the strains identified. The variants included 13-valent pneumococcal conjugate vaccine (PCV13) serotypes 6B and 19A, and the polysaccharide vaccine serotype 20. Serotype 6B variants were the most frequently isolated (47 %, 16/34). Unlike the wild type, the CPS locus of the 6B variants contained an insertion of the licD-family phosphotransferase gene. The CPS locus of 19A- and 20-variants contained an inversion in the sugar-biosynthesis (rmlD) gene and a 717 bp deletion within the transferase (whaF) gene, respectively.

    Conclusions: The high multiple carriage in Malawian children provides opportunities for genetic exchange through horizontal gene transfer. This may potentially lead to CPS locus variants and vaccine escape. Variants reported here occurred naturally; however, PCV13 introduction could exacerbate the CPS genetic variations. Further studies are therefore recommended to assess the invasive potential of these variants and establish whether PCV13 would offer cross-protection. We have shown that younger children (0-2 years) are a reservoir of VT serotypes, which makes them an ideal target for vaccination.

    Funded by: Wellcome Trust

    BMC infectious diseases 2015;15;234

  • SETD2 loss-of-function promotes renal cancer branched evolution through replication stress and impaired DNA repair.

    Kanu N, Grönroos E, Martinez P, Burrell RA, Yi Goh X, Bartkova J, Maya-Mendoza A, Mistrík M, Rowan AJ, Patel H, Rabinowitz A, East P, Wilson G, Santos CR, McGranahan N, Gulati S, Gerlinger M, Birkbak NJ, Joshi T, Alexandrov LB, Stratton MR, Powles T, Matthews N, Bates PA, Stewart A, Szallasi Z, Larkin J, Bartek J and Swanton C

    UCL Cancer Institute, Paul O'Gorman Building, London, UK.

    Defining mechanisms that generate intratumour heterogeneity and branched evolution may inspire novel therapeutic approaches to limit tumour diversity and adaptation. SETD2 (Su(var), Enhancer of zeste, Trithorax-domain containing 2) trimethylates histone-3 lysine-36 (H3K36me3) at sites of active transcription and is mutated in diverse tumour types, including clear cell renal carcinomas (ccRCCs). Distinct SETD2 mutations have been identified in spatially separated regions in ccRCC, indicative of intratumour heterogeneity. In this study, we have addressed the consequences of SETD2 loss-of-function through an integrated bioinformatics and functional genomics approach. We find that bi-allelic SETD2 aberrations are not associated with microsatellite instability in ccRCC. SETD2 depletion in ccRCC cells revealed aberrant and reduced nucleosome compaction and chromatin association of the key replication proteins minichromosome maintenance complex component (MCM7) and DNA polymerase δ hindering replication fork progression, and failure to load lens epithelium-derived growth factor and the Rad51 homologous recombination repair factor at DNA breaks. Consistent with these data, we observe chromosomal breakpoint locations are biased away from H3K36me3 sites in SETD2 wild-type ccRCCs relative to tumours with bi-allelic SETD2 aberrations and that H3K36me3-negative ccRCCs display elevated DNA damage in vivo. These data suggest a role for SETD2 in maintaining genome integrity through nucleosome stabilization, suppression of replication stress and the coordination of DNA repair.

    Oncogene 2015;34;46;5699-708

  • Evaluation of conjunctival swab sampling in the diagnosis of canine leishmaniasis: A two-year follow-up study in Çukurova Plain, Turkey.

    Karakuş M, Töz S, Ertabaklar H, Paşa S, Atasoy A, Arserim SK, Ölgen MK, Alkan MZ, Durrant C and Özbel Y

    Ege University Faculty of Medicine, Department of Parasitology, Bornova, İzmir, Turkey.

    The diagnosis of canine leishmaniasis (CanL) in symptomatic and asymptomatic dogs is a very important and problematic public health issue in Turkey. A longitudinal study was carried out on dogs in selected villages in the Çukurova Plain in Turkey, from July 2011 to June 2013, where cutaneous (CL) and visceral (VL) leishmaniasis are endemic. The study aimed to determine the prevalence of CanL and to evaluate the early diagnostic performance of the non-invasive conjunctival swab nested PCR (CS n-PCR) test in comparison with the Indirect Fluorescent Antibody Test (IFAT). The consecutive blood and CS samples from a representative number of dogs (80-100 dogs per survey) were collected in a cohort of 6 villages located in the area. Clinical symptoms, demographic and physical features of each dog were noted and lymph node aspiration samples were obtained from selected dogs with lymphadenopathy. In four surveys during the period, a total of 338 sets (blood and CS) of samples from 206 dogs were obtained, such that 83 dogs were sampled more than once. In the cross-sectional analysis, the CanL prevalence was found to be 27.18% (between 7.14% and 39.13%) by IFAT and 41.74% (between 29.03% and 46.66%) by CS n-PCR. The isolated strains were identified as Leishmania infantum MON-1 (n=9) and MON-98 (n=2) by MLEE analysis. Genetic studies targeting the Hsp70 and ITS1 regions performed on 11 dog isolates also showed two clear separate groups. According to IFAT results, 24 of the 83 dogs sampled more than once showed seroconversion (n=19) or a four-fold increase in Ab titers (n=5), while 17 were positive in the initial screening. Forty-two dogs stayed negative during the whole period. The natural Leishmania exposure rate was detected as 31.14% in the study area. CS n-PCR only detected Leishmania infection earlier than IFAT in 8 dogs. No statistical difference was found after the analysis of demographical and physical data. The results indicated that (i) circulation of the dog population is very common in settlements in the Çukurova Plain, but the disease prevalence is high and stable, (ii) the performance of CS n-PCR for detecting Leishmania-dog contact is higher than IFAT, and (iii) some of the parasites isolated from dogs have different zymodemes and/or genotypes from previous human and sand fly isolates, suggesting the probability of two different cycles of leishmaniasis in this particular area. This hypothesis should be supported by future studies targeting vectors and reservoirs.

    Veterinary parasitology 2015;214;3-4;295-302

  • Antimicrobial resistance and management of invasive Salmonella disease.

    Kariuki S, Gordon MA, Feasey N and Parry CM

    Centre for Microbiology Research, Kenya Medical Research Institute, PO Box 43640-00100, Nairobi, Kenya; The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Invasive Salmonella infections (typhoidal and non-typhoidal) cause a huge burden of illness estimated at nearly 3.4 million cases and over 600,000 deaths annually, especially in resource-limited settings. Invasive non-typhoidal Salmonella (iNTS) infections are particularly important in immunosuppressed populations, especially in sub-Saharan Africa, causing a mortality of 20-30% in vulnerable children below 5 years of age. In these settings, where routine surveillance for antimicrobial resistance is rare or non-existent, reports of 50-75% multidrug resistance (MDR) in NTS are common, including strains of NTS also resistant to fluoroquinolones and 3rd generation cephalosporins. Typhoid (enteric) fever caused by Salmonella Typhi and Salmonella Paratyphi A remains a major public health problem in many parts of Asia and Africa. Currently over a third of isolates in many endemic areas are MDR, and diminished susceptibility or resistance to fluoroquinolones, the drugs of choice for MDR cases over the last decade, is an increasing problem. The situation is particularly worrying in resource-limited settings where the few remaining effective antimicrobials are either unavailable or altogether too expensive to be afforded by either the general public or by public health services. Although the prudent use of effective antimicrobials, improved hygiene and sanitation and the discovery of new antimicrobial agents may offer hope for the management of invasive Salmonella infections, it is essential to consider other interventions including the wider use of WHO recommended typhoid vaccines and the acceleration of trials for novel iNTS vaccines. The main objective of this review is to describe existing data on the prevalence and epidemiology of antimicrobial resistant invasive Salmonella infections and how this affects the management of these infections, especially in endemic developing countries.

    Funded by: NIAID NIH HHS: 1R01AI099525, R01 AI099525; Wellcome Trust

    Vaccine 2015;33 Suppl 3;C21-9

  • Ceftriaxone-resistant Salmonella enterica serotype Typhimurium sequence type 313 from Kenyan patients is associated with the blaCTX-M-15 gene on a novel IncHI2 plasmid.

    Kariuki S, Okoro C, Kiiru J, Njoroge S, Omuse G, Langridge G, Kingsley RA, Dougan G and Revathi G

    Centre for Microbiology Research, Kenya Medical Research Institute, Nairobi, Kenya; The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Multidrug-resistant bacteria pose a major challenge to the clinical management of infections in resource-poor settings. Although nontyphoidal Salmonella (NTS) bacteria cause predominantly enteric self-limiting illness in developed countries, NTS is responsible for a huge burden of life-threatening bloodstream infections in sub-Saharan Africa. Here, we characterized nine S. Typhimurium isolates from an outbreak involving patients who initially failed to respond to ceftriaxone treatment at a referral hospital in Kenya. These Salmonella enterica serotype Typhimurium isolates were resistant to ampicillin, chloramphenicol, cefuroxime, ceftriaxone, aztreonam, cefepime, sulfamethoxazole-trimethoprim, and cefpodoxime. Resistance to β-lactams, including to ceftriaxone, was associated with carriage of a combination of blaCTX-M-15, blaOXA-1, and blaTEM-1 genes. The genes encoding resistance to heavy-metal ions were borne on the novel IncHI2 plasmid pKST313, which also carried a pair of class 1 integrons. All nine isolates formed a single clade within S. Typhimurium ST313, the major clone of an ongoing invasive NTS epidemic in the region. This emerging ceftriaxone-resistant clone may pose a major challenge in the management of invasive NTS in sub-Saharan Africa.

    Funded by: NIAID NIH HHS: 1R01AI099525; Wellcome Trust

    Antimicrobial agents and chemotherapy 2015;59;6;3133-9

  • A recent bottleneck of Y chromosome diversity coincides with a global change in culture.

    Karmin M, Saag L, Vicente M, Wilson Sayres MA, Järve M, Talas UG, Rootsi S, Ilumäe AM, Mägi R, Mitt M, Pagani L, Puurand T, Faltyskova Z, Clemente F, Cardona A, Metspalu E, Sahakyan H, Yunusbayev B, Hudjashov G, DeGiorgio M, Loogväli EL, Eichstaedt C, Eelmets M, Chaubey G, Tambets K, Litvinov S, Mormina M, Xue Y, Ayub Q, Zoraqi G, Korneliussen TS, Akhatova F, Lachance J, Tishkoff S, Momynaliev K, Ricaut FX, Kusuma P, Razafindrazaka H, Pierron D, Cox MP, Sultana GN, Willerslev R, Muller C, Westaway M, Lambert D, Skaro V, Kovačevic L, Turdikulova S, Dalimova D, Khusainova R, Trofimova N, Akhmetova V, Khidiyatova I, Lichman DV, Isakova J, Pocheshkhova E, Sabitov Z, Barashkov NA, Nymadawa P, Mihailov E, Seng JW, Evseeva I, Migliano AB, Abdullah S, Andriadze G, Primorac D, Atramentova L, Utevska O, Yepiskoposyan L, Marjanovic D, Kushniarevich A, Behar DM, Gilissen C, Vissers L, Veltman JA, Balanovska E, Derenko M, Malyarchuk B, Metspalu A, Fedorova S, Eriksson A, Manica A, Mendez FL, Karafet TM, Veeramah KR, Bradman N, Hammer MF, Osipova LP, Balanovsky O, Khusnutdinova EK, Johnsen K, Remm M, Thomas MG, Tyler-Smith C, Underhill PA, Willerslev E, Nielsen R, Metspalu M, Villems R and Kivisild T

    Estonian Biocentre, Tartu, 51010, Estonia; Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, 51010, Estonia.

    It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50-100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192-307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47-52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H005854/1; European Research Council: 261213; NIGMS NIH HHS: R01 GM113657; Wellcome Trust: 098051

    Genome research 2015;25;4;459-66

  • Applying the ARRIVE Guidelines to an In Vivo Database.

    Karp NA, Meehan TF, Morgan H, Mason JC, Blake A, Kurbatova N, Smedley D, Jacobsen J, Mott RF, Iyer V, Matthews P, Melvin DG, Wells S, Flenniken AM, Masuya H, Wakana S, White JK, Lloyd KC, Reynolds CL, Paylor R, West DB, Svenson KL, Chesler EJ, de Angelis MH, Tocchini-Valentini GP, Sorg T, Herault Y, Parkinson H, Mallon AM and Brown SD

    Mouse Informatics Group, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    The Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines were developed to address the lack of reproducibility in biomedical animal studies and improve the communication of research findings. While intended to guide the preparation of peer-reviewed manuscripts, the principles of transparent reporting are also fundamental for in vivo databases. Here, we describe the benefits and challenges of applying the guidelines for the International Mouse Phenotyping Consortium (IMPC), whose goal is to produce and phenotype 20,000 knockout mouse strains in a reproducible manner across ten research centres. In addition to ensuring the transparency and reproducibility of the IMPC, the solutions to the challenges of applying the ARRIVE guidelines in the context of IMPC will provide a resource to help guide similar initiatives in the future.

    Funded by: Medical Research Council: MC_U142684171, MC_U142684172, MC_UP_1502/3; NCI NIH HHS: P30 CA034196; NHGRI NIH HHS: U54 HG006332, U54 HG006348, U54 HG006364, U54 HG006364-01, U54 HG006370, U54 HG006370-01, U54-HG006332, U54-HG006348, UM1 HG006348; NIH HHS: U42 OD011174, U42 OD011175, U42 OD012210, U42-OD11174, UM1 OD023221; Wellcome Trust: 083573/Z/07/Z, 090532/Z/09/Z, WT098051

    PLoS biology 2015;13;5;e1002151

  • The BRAF pseudogene functions as a competitive endogenous RNA and induces lymphoma in vivo.

    Karreth FA, Reschke M, Ruocco A, Ng C, Chapuy B, Léopold V, Sjoberg M, Keane TM, Verma A, Ala U, Tay Y, Wu D, Seitzer N, Velasco-Herrera Mdel C, Bothmer A, Fung J, Langellotto F, Rodig SJ, Elemento O, Shipp MA, Adams DJ, Chiarle R and Pandolfi PP

    Cancer Research Institute, Beth Israel Deaconess Cancer Center, Department of Medicine and Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA.

    Research over the past decade has suggested important roles for pseudogenes in physiology and disease. In vitro experiments demonstrated that pseudogenes contribute to cell transformation through several mechanisms. However, in vivo evidence for a causal role of pseudogenes in cancer development is lacking. Here, we report that mice engineered to overexpress either the full-length murine B-Raf pseudogene Braf-rs1 or its pseudo "CDS" or "3' UTR" develop an aggressive malignancy resembling human diffuse large B cell lymphoma. We show that Braf-rs1 and its human ortholog, BRAFP1, elicit their oncogenic activity, at least in part, as competitive endogenous RNAs (ceRNAs) that elevate BRAF expression and MAPK activation in vitro and in vivo. Notably, we find that transcriptional or genomic aberrations of BRAFP1 occur frequently in multiple human cancers, including B cell lymphomas. Our engineered mouse models demonstrate the oncogenic potential of pseudogenes and indicate that ceRNA-mediated microRNA sequestration may contribute to the development of cancer.

    Funded by: Cancer Research UK: 13031; NCI NIH HHS: CA170158-01, R01 CA170158; Wellcome Trust; Worldwide Cancer Research: 12-0216

    Cell 2015;161;2;319-32

  • Environmental marine pathogen isolation using mesocosm culture of sharpsnout seabream: striking genomic and morphological features of novel Endozoicomonas sp.

    Katharios P, Seth-Smith HM, Fehr A, Mateos JM, Qi W, Richter D, Nufer L, Ruetten M, Guevara Soto M, Ziegler U, Thomson NR, Schlapbach R and Vaughan L

    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Center for Marine Research, Heraklion, Crete, Greece.

    Aquaculture is a burgeoning industry, requiring diversification into new farmed species, which are often at risk from infectious disease. We used a mesocosm technique to investigate the susceptibility of sharpsnout seabream (Diplodus puntazzo) larvae to potential environmental pathogens in seawater compared to control borehole water. Fish exposed to seawater succumbed to epitheliocystis from 21 days post hatching, causing mortality in a quarter of the hosts. The pathogen responsible was not chlamydial, as is often found in epitheliocystis, but a novel species of the γ-proteobacterial genus Endozoicomonas. Detailed characterisation of this pathogen within the infectious lesions using high resolution fluorescent and electron microscopy showed densely packed rod shaped bacteria. A draft genome sequence of this uncultured bacterium was obtained from preserved material. Comparison with the genome of the Endozoicomonas elysicola type strain shows that the genome of Ca. Endozoicomonas cretensis is undergoing decay through loss of functional genes and insertion sequence expansion, often indicative of adaptation to a new niche or restriction to an alternative lifestyle. These results demonstrate the advantage of mesocosm studies for investigating the effect of environmental bacteria on susceptible hosts and provide an important insight into the genome dynamics of a novel fish pathogen.

    Funded by: Wellcome Trust: 098051

    Scientific reports 2015;5;17609

  • Trans-ancestry genome-wide association study identifies 12 genetic loci influencing blood pressure and implicates a role for DNA methylation.

    Kato N, Loh M, Takeuchi F, Verweij N, Wang X, Zhang W, Kelly TN, Saleheen D, Lehne B, Leach IM, Drong AW, Abbott J, Wahl S, Tan ST, Scott WR, Campanella G, Chadeau-Hyam M, Afzal U, Ahluwalia TS, Bonder MJ, Chen P, Dehghan A, Edwards TL, Esko T, Go MJ, Harris SE, Hartiala J, Kasela S, Kasturiratne A, Khor CC, Kleber ME, Li H, Yu Mok Z, Nakatochi M, Sapari NS, Saxena R, Stewart AFR, Stolk L, Tabara Y, Teh AL, Wu Y, Wu JY, Zhang Y, Aits I, Da Silva Couto Alves A, Das S, Dorajoo R, Hopewell JC, Kim YK, Koivula RW, Luan J, Lyytikäinen LP, Nguyen QN, Pereira MA, Postmus I, Raitakari OT, Scannell Bryan M, Scott RA, Sorice R, Tragante V, Traglia M, White J, Yamamoto K, Zhang Y, Adair LS, Ahmed A, Akiyama K, Asif R, Aung T, Barroso I, Bjonnes A, Braun TR, Cai H, Chang LC, Chen CH, Cheng CY, Chong YS, Collins R, Courtney R, Davies G, Delgado G, Do LD, Doevendans PA, Gansevoort RT, Gao YT, Grammer TB, Grarup N, Grewal J, Gu D, Wander GS, Hartikainen AL, Hazen SL, He J, Heng CK, Hixson JE, Hofman A, Hsu C, Huang W, Husemoen LLN, Hwang JY, Ichihara S, Igase M, Isono M, Justesen JM, Katsuya T, Kibriya MG, Kim YJ, Kishimoto M, Koh WP, Kohara K, Kumari M, Kwek K, Lee NR, Lee J, Liao J, Lieb W, Liewald DCM, Matsubara T, Matsushita Y, Meitinger T, Mihailov E, Milani L, Mills R, Mononen N, Müller-Nurasyid M, Nabika T, Nakashima E, Ng HK, Nikus K, Nutile T, Ohkubo T, Ohnaka K, Parish S, Paternoster L, Peng H, Peters A, Pham ST, Pinidiyapathirage MJ, Rahman M, Rakugi H, Rolandsson O, Ann Rozario M, Ruggiero D, Sala CF, Sarju R, Shimokawa K, Snieder H, Sparsø T, Spiering W, Starr JM, Stott DJ, Stram DO, Sugiyama T, Szymczak S, Tang WHW, Tong L, Trompet S, Turjanmaa V, Ueshima H, Uitterlinden AG, Umemura S, Vaarasmaki M, van Dam RM, van Gilst WH, van Veldhuisen DJ, Viikari JS, Waldenberger M, Wang Y, Wang A, Wilson R, Wong TY, Xiang YB, Yamaguchi S, Ye X, Young RD, Young TL, Yuan JM, Zhou X, Asselbergs FW, Ciullo M, Clarke R, Deloukas P, Franke A, Franks PW, Franks S, Friedlander Y, 
    Gross MD, Guo Z, Hansen T, Jarvelin MR, Jørgensen T, Jukema JW, Kähönen M, Kajio H, Kivimaki M, Lee JY, Lehtimäki T, Linneberg A, Miki T, Pedersen O, Samani NJ, Sørensen TIA, Takayanagi R, Toniolo D, BIOS-consortium, CARDIoGRAMplusC4D, LifeLines Cohort Study, InterAct Consortium, Ahsan H, Allayee H, Chen YT, Danesh J, Deary IJ, Franco OH, Franke L, Heijman BT, Holbrook JD, Isaacs A, Kim BJ, Lin X, Liu J, März W, Metspalu A, Mohlke KL, Sanghera DK, Shu XO, van Meurs JBJ, Vithana E, Wickremasinghe AR, Wijmenga C, Wolffenbuttel BHW, Yokota M, Zheng W, Zhu D, Vineis P, Kyrtopoulos SA, Kleinjans JCS, McCarthy MI, Soong R, Gieger C, Scott J, Teo YY, He J, Elliott P, Tai ES, van der Harst P, Kooner JS and Chambers JC

    Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan.

    We carried out a trans-ancestry genome-wide association and replication study of blood pressure phenotypes among up to 320,251 individuals of East Asian, European and South Asian ancestry. We find genetic variants at 12 new loci to be associated with blood pressure (P = 3.9 × 10⁻¹¹ to 5.0 × 10⁻²¹). The sentinel blood pressure SNPs are enriched for association with DNA methylation at multiple nearby CpG sites, suggesting that, at some of the loci identified, DNA methylation may lie on the regulatory pathway linking sequence variation to blood pressure. The sentinel SNPs at the 12 new loci point to genes involved in vascular smooth muscle (IGFBP3, KCNK3, PDE3A and PRDM6) and renal (ARHGAP24, OSR1, SLC22A7 and TBX2) function. The new and known genetic variants predict increased left ventricular mass, circulating levels of NT-proBNP, and cardiovascular and all-cause mortality (P = 0.04 to 8.6 × 10⁻⁶). Our results provide new evidence for the role of DNA methylation in blood pressure regulation.

    Funded by: AHRQ HHS: HS06516; Action on Hearing Loss: G51; Biotechnology and Biological Sciences Research Council: BB/F019394/1; British Heart Foundation: FS/14/55/30806, RG/07/008/23674, RG/08/014/24067, RG/13/2/30098, SP/04/002, SP/09/002; Chief Scientist Office: CZB/4/505, ETM/55; FIC NIH HHS: TW008288, TW05596; MRC: G0500539, G0600331, G0800270, G0902037; Medical Research Council: G0600331, G0601966, G0700931, G0800270, G0802782, G0902037, MC_U106179471, MC_UU_12013/1, MC_UU_12015/1, MR/K002414/1, MR/K013351/1, MR/K026992/1, MR/L003120/1, MR/L01341X/1, MR/M012638/1; NCATS NIH HHS: UL1 TR000439; NCI NIH HHS: R01CA144034, R01CA55069, R01CA80205, R01CA82729, R35CA53890, R37CA70867, UM1 CA173640, UM1 CA182910, UM1CA173640; NCRR NIH HHS: RR20649; NHGRI NIH HHS: N0T-HG-11-009; NHLBI NIH HHS: 5R01HL087679-02, HL085144, P01 HL076491, P01HL076491, P01HL098055, P20HL113452, R01 HL090682, R01HL087263, R01HL090682, R01HL103866, R01HL103931, R21 HL121429, U01HL072507; NIA NIH HHS: 5R01AG13196; NIDDK NIH HHS: DK078150, DK56350, R01DK082766; NIEHS NIH HHS: ES10126, R01ES021801; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; NLM NIH HHS: 2R01LM010098; Wellcome Trust: 084723, 084723/Z/08/Z

    Nature genetics 2015;47;11;1282-1293

  • 'Pop-Up' Governance: developing internal governance frameworks for consortia: the example of UK10K.

    Kaye J, Muddyman D, Smee C, Kennedy K, Bell J and UK10K

    HeLEX Centre, Nuffield Department of Population Health, University of Oxford, Oxford, UK.

    Innovations in information technologies have facilitated the development of new styles of research networks and forms of governance. This is evident in genomics where increasingly, research is carried out by large, interdisciplinary consortia focussing on a specific research endeavour. The UK10K project is an example of a human genomics consortium funded to provide insights into the genomics of rare conditions, and establish a community resource from generated sequence data. To achieve its objectives according to the agreed timetable, the UK10K project established an internal governance system to expedite the research and to deal with the complex issues that arose. The project's governance structure exemplifies a new form of network governance called 'pop-up' governance. 'Pop-up' because: it was put together quickly, existed for a specific period, was designed for a specific purpose, and was dismantled easily on project completion. In this paper, we use UK10K to describe how 'pop-up' governance works on the ground and how relational, hierarchical and contractual governance mechanisms are used in this new form of network governance.

    Funded by: Wellcome Trust: 096599

    Life sciences, society and policy 2015;11;10

  • The origins of malaria: there are more things in heaven and earth ….

    Keeling PJ and Rayner JC

    Department of Botany, Canadian Institute for Advanced Research, Evolutionary Biology Program, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.

    Malaria remains one of the most significant global public health burdens, with nearly half of the world's population at risk of infection. Malaria is not, however, a monolithic disease - it can be caused by multiple different parasite species of the Plasmodium genus, each of which can induce different symptoms and pathology, and which pose quite different challenges for control. Furthermore, malaria is in no way restricted to humans. There are Plasmodium species that have adapted to infect most warm-blooded vertebrate species, and the genus as a whole is both highly successful and highly diverse. How, where and when human malaria parasites originated from within this diversity has long been a subject of fascination and sometimes also controversy. The past decade has seen the publication of a number of important discoveries about malaria parasite origins, all based on the application of molecular diagnostic tools to new sources of samples. This review summarizes some of those recent discoveries and discusses their implication for our current understanding of the origin and evolution of the Plasmodium genus. The nature of these discoveries and the manner in which they are made are then used to lay out a series of opportunities and challenges for the next wave of parasite hunters.

    Funded by: NIAID NIH HHS: R01 AI58715; Wellcome Trust: 098051

    Parasitology 2015;142 Suppl 1;S16-25

  • Intra- and inter-tumor heterogeneity in a vemurafenib-resistant melanoma patient and derived xenografts.

    Kemper K, Krijgsman O, Cornelissen-Steijger P, Shahrabi A, Weeber F, Song JY, Kuilman T, Vis DJ, Wessels LF, Voest EE, Schumacher TN, Blank CU, Adams DJ, Haanen JB and Peeper DS

    Division of Molecular Oncology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    The development of targeted inhibitors, like vemurafenib, has greatly improved the clinical outcome of BRAF(V600E) metastatic melanoma. However, resistance to such compounds represents a formidable problem. Using whole-exome sequencing and functional analyses, we have investigated the nature and pleiotropy of vemurafenib resistance in a melanoma patient carrying multiple drug-resistant metastases. Resistance was caused by a plethora of mechanisms, all of which reactivated the MAPK pathway. In addition to three independent amplifications and an aberrant form of BRAF(V600E), we identified a new activating insertion in MEK1. This MEK1(T55delinsRT) mutation could be traced back to a fraction of the pre-treatment lesion and not only provided protection against vemurafenib but also promoted local invasion of transplanted melanomas. Analysis of patient-derived xenografts (PDX) from therapy-refractory metastases revealed that multiple resistance mechanisms were present within one metastasis. This heterogeneity, both inter- and intra-tumorally, caused an incomplete capture in the PDX of the resistance mechanisms observed in the patient. In conclusion, vemurafenib resistance in a single patient can be established through distinct events, which may be preexisting. Furthermore, our results indicate that PDX may not harbor the full genetic heterogeneity seen in the patient's melanoma.

    Funded by: Cancer Research UK: 13031

    EMBO molecular medicine 2015;7;9;1104-18

  • The nucleosome landscape of Plasmodium falciparum reveals chromatin architecture and dynamics of regulatory sequences.

    Kensche PR, Hoeijmakers WA, Toenhake CG, Bras M, Chappell L, Berriman M and Bártfai R

    Department of Molecular Biology, Radboud University, 6525GA Nijmegen, The Netherlands.

    In eukaryotes, the chromatin architecture has a pivotal role in regulating all DNA-associated processes and it is central to the control of gene expression. For Plasmodium falciparum, a causative agent of human malaria, the nucleosome positioning profile of regulatory regions deserves particular attention because of their extreme AT-content. With the aid of a highly controlled MNase-seq procedure we reveal how positioning of nucleosomes provides a structural and regulatory framework to the transcriptional unit by demarcating landmark sites (transcription/translation start and end sites). In addition, our analysis provides strong indications for the function of positioned nucleosomes in splice site recognition. Transcription start sites (TSSs) are bordered by a small nucleosome-depleted region, but lack the stereotypic downstream nucleosome arrays, highlighting a key difference in chromatin organization compared to model organisms. Furthermore, we observe transcription-coupled eviction of nucleosomes on strong TSSs during intraerythrocytic development and demonstrate that nucleosome positioning and dynamics can be predictive for the functionality of regulatory DNA elements. Collectively, the strong nucleosome positioning over splice sites and surrounding putative transcription factor binding sites highlights the regulatory capacity of the nucleosome landscape in this deadly human pathogen.

    Funded by: Wellcome Trust: WT 098051

    Nucleic acids research 2015;44;5;2110-24

  • Zebrafish Rab5 proteins and a role for Rab5ab in nodal signalling.

    Kenyon EJ, Campos I, Bull JC, Williams PH, Stemple DL and Clark MD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    The RAB5 gene family is the best characterised of all human RAB families and is essential for in vitro homotypic fusion of early endosomes. In recent years, the disruption or activation of Rab5 family proteins has been used as a tool to understand growth factor signal transduction in whole animal systems such as Drosophila melanogaster and zebrafish. In this study we have examined the functions of four rab5 genes in zebrafish. Disruption of rab5ab expression by antisense morpholino oligonucleotide (MO) knockdown abolishes nodal signalling in early zebrafish embryos, whereas overexpression of rab5ab mRNA leads to ectopic expression of markers that are normally downstream of nodal signalling. By contrast, MO disruption of other zebrafish rab5 genes shows little or no effect on expression of markers of dorsal organiser development. We conclude that rab5ab is essential for nodal signalling and organiser specification in the developing zebrafish embryo.

    Funded by: Wellcome Trust: 098051, WR077037/Z/05/Z, WT077047/Z/05/Z

    Developmental biology 2015;397;2;212-24

  • BCL11A is a triple-negative breast cancer gene with critical functions in stem and progenitor cells.

    Khaled WT, Choon Lee S, Stingl J, Chen X, Raza Ali H, Rueda OM, Hadi F, Wang J, Yu Y, Chin SF, Stratton M, Futreal A, Jenkins NA, Aparicio S, Copeland NG, Watson CJ, Caldas C and Liu P

    [1] Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK [2] Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, UK [3].

    Triple-negative breast cancer (TNBC) has poor prognostic outcome compared with other types of breast cancer. The molecular and cellular mechanisms underlying TNBC pathology are not fully understood. Here, we report that the transcription factor BCL11A is overexpressed in TNBC including basal-like breast cancer (BLBC) and that its genomic locus is amplified in up to 38% of BLBC tumours. Exogenous BCL11A overexpression promotes tumour formation, whereas its knockdown in TNBC cell lines suppresses their tumourigenic potential in xenograft models. In the DMBA-induced tumour model, Bcl11a deletion substantially decreases tumour formation, even in p53-null cells and inactivation of Bcl11a in established tumours causes their regression. At the cellular level, Bcl11a deletion causes a reduction in the number of mammary epithelial stem and progenitor cells. Thus, BCL11A has an important role in TNBC and normal mammary epithelial cells. This study highlights the importance of further investigation of BCL11A in TNBC-targeted therapies.

    Funded by: Wellcome Trust: 077186, 098051

    Nature communications 2015;6;5987

  • Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression.

    Kim JK, Kolodziejczyk AA, Ilicic T, Teichmann SA and Marioni JC

    European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.

    Single-cell RNA-sequencing (scRNA-seq) facilitates identification of new cell types and gene regulatory networks as well as dissection of the kinetics of gene expression and patterns of allele-specific expression. However, to facilitate such analyses, separating biological variability from the high level of technical noise that affects scRNA-seq protocols is vital. Here we describe and validate a generative statistical model that accurately quantifies technical noise with the help of external RNA spike-ins. Applying our approach to investigate stochastic allele-specific expression in individual cells, we demonstrate that a large fraction of stochastic allele-specific expression can be explained by technical noise, especially for lowly and moderately expressed genes: we predict that only 17.8% of stochastic allele-specific expression patterns are attributable to biological noise with the remainder due to technical noise.

    Funded by: Biotechnology and Biological Sciences Research Council

    Nature communications 2015;6;8687

  • A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data.

    Kim YJ, Lee J, Kim BJ, T2D-Genes Consortium and Park T

    Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, South Korea.

    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants.

    Results: In this study, we describe a combined imputation which uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples constructed using exome sequencing data from the T2D-GENES consortium and 5,349 sample genotype panels consisting of an exome chip and SNP chip. As a result, the combined approach increased imputation quality up to 11%, and genomic coverage for rare variants up to 117.7% (MAF < 1%), compared to imputation using the SNP chip alone. Also, we investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best performing approach was the combination of the study specific reference panel and the genotype panel of combined data.

    Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and genomic coverage of rare variants.

    Funded by: NIA NIH HHS: P30 AG038072, R01 AG046949; NIDDK NIH HHS: P30 DK020595, U01 DK085501, U01 DK085524, U01 DK085526, U01 DK085545, U01 DK085584

    BMC genomics 2015;16;1109

  • Mosaic structural variation in children with developmental disorders.

    King DA, Jones WD, Crow YJ, Dominiczak AF, Foster NA, Gaunt TR, Harris J, Hellens SW, Homfray T, Innes J, Jones EA, Joss S, Kulkarni A, Mansour S, Morris AD, Parker MJ, Porteous DJ, Shihab HA, Smith BH, Tatton-Brown K, Tolmie JL, Trzaskowski M, Vasudevan PC, Wakeling E, Wright M, Plomin R, Timpson NJ, Hurles ME and Deciphering Developmental Disorders Study

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Delineating the genetic causes of developmental disorders is an area of active investigation. Mosaic structural abnormalities, defined as copy number or loss of heterozygosity events that are large and present in only a subset of cells, have been detected in 0.2-1.0% of children ascertained for clinical genetic testing. However, the frequency among healthy children in the community is not well characterized, which, if known, could inform better interpretation of the pathogenic burden of this mutational category in children with developmental disorders. In a case-control analysis, we compared the rate of large-scale mosaicism between 1303 children with developmental disorders and 5094 children lacking developmental disorders, using an analytical pipeline we developed, and identified a substantial enrichment in cases (odds ratio = 39.4, P-value 1.073e-6). A meta-analysis that included frequency estimates among an additional 7000 children with congenital diseases yielded an even stronger statistical enrichment (P-value 1.784e-11). In addition, to maximize the detection of low-clonality events in probands, we applied a trio-based mosaic detection algorithm, which detected two additional events in probands, including an individual with genome-wide suspected chimerism. In total, we detected 12 structural mosaic abnormalities among 1303 children (0.9%). Given the burden of mosaicism detected in cases, we suspected that many of the events detected in probands were pathogenic. Scrutiny of the genotypic-phenotypic relationship of each detected variant assessed that the majority of events are very likely pathogenic. This work quantifies the burden of structural mosaicism as a cause of developmental disorders.

    Funded by: Chief Scientist Office: CZD/16/6, CZD/16/6/4; Medical Research Council: G0500079, G0901245, MC_PC_15018, MC_UU_12013/3, MC_UU_12013/8; Wellcome Trust: 102215, 102215/2/13/2, WT098051

    Human molecular genetics 2015;24;10;2733-45

  • Uric Acid and Cardiovascular Events: A Mendelian Randomization Study.

    Kleber ME, Delgado G, Grammer TB, Silbernagel G, Huang J, Krämer BK, Ritz E and März W

    Fifth Department of Medicine (Nephrology, Hypertensiology, Endocrinology, Diabetology, Rheumatology), Medical Faculty of Mannheim, University of Heidelberg, Mannheim, Germany.

    Obesity and diets rich in uric acid-raising components appear to account for the increased prevalence of hyperuricemia in Westernized populations. Prevalence rates of hypertension, diabetes mellitus, CKD, and cardiovascular disease are also increasing. We used Mendelian randomization to examine whether uric acid is an independent and causal cardiovascular risk factor. Serum uric acid was measured in 3315 patients of the Ludwigshafen Risk and Cardiovascular Health Study. We calculated a weighted genetic risk score (GRS) for uric acid concentration based on eight uric acid-regulating single nucleotide polymorphisms. Causal odds ratios and causal hazard ratios (HRs) were calculated using a two-stage regression estimate with the GRS as the instrumental variable to examine associations with cardiometabolic phenotypes (cross-sectional) and mortality (prospectively) by logistic regression and Cox regression, respectively. Our GRS was not consistently associated with any biochemical marker except for uric acid, arguing against pleiotropy. Uric acid was associated with a range of prevalent diseases, including coronary artery disease. Uric acid and the GRS were both associated with cardiovascular death and sudden cardiac death. In a multivariate model adjusted for factors including medication, causal HRs corresponding to each 1-mg/dl increase in genetically predicted uric acid concentration were significant for cardiovascular death (HR, 1.77; 95% confidence interval, 1.12 to 2.81) and sudden cardiac death (HR, 2.41; 95% confidence interval, 1.16 to 5.00). These results suggest that high uric acid is causally related to adverse cardiovascular outcomes, especially sudden cardiac death.

    Journal of the American Society of Nephrology : JASN 2015;26;11;2831-8

  • Design of a study to determine the impact of insecticide resistance on malaria vector control: a multi-country investigation.

    Kleinschmidt I, Mnzava AP, Kafy HT, Mbogo C, Bashir AI, Bigoga J, Adechoubou A, Raghavendra K, Knox TB, Malik EM, Nkuni ZJ, Bayoh N, Ochomo E, Fondjo E, Kouambeng C, Awono-Ambene HP, Etang J, Akogbeto M, Bhatt R, Swain DK, Kinyari T, Njagi K, Muthami L, Subramaniam K, Bradley J, West P, Massougbodji A, Okê-Sopoh M, Hounto A, Elmardi K, Valecha N, Kamau L, Mathenge E and Donnelly MJ

    MRC Tropical Epidemiology Group, Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK.

    Background: Progress in reducing the malaria disease burden through the substantial scale up of insecticide-based vector control in recent years could be reversed by the widespread emergence of insecticide resistance. The impact of insecticide resistance on the protective effectiveness of insecticide-treated nets (ITN) and indoor residual spraying (IRS) is not known. A multi-country study was undertaken in Sudan, Kenya, India, Cameroon and Benin to quantify the potential loss of epidemiological effectiveness of ITNs and IRS due to decreased susceptibility of malaria vectors to insecticides. The design of the study is described in this paper.

    Methods: Malaria disease incidence rates by active case detection in cohorts of children, and indicators of insecticide resistance in local vectors were monitored in each of approximately 300 separate locations (clusters) with high coverage of malaria vector control over multiple malaria seasons. Phenotypic and genotypic resistance was assessed annually. In two countries, Sudan and India, clusters were randomly assigned to receive universal coverage of ITNs only, or universal coverage of ITNs combined with high coverage of IRS. Association between malaria incidence and insecticide resistance, and protective effectiveness of vector control methods and insecticide resistance were estimated, respectively.

    Results: Cohorts have been set up in all five countries, and phenotypic resistance data have been collected in all clusters. In Sudan, Kenya, Cameroon and Benin data collection is due to be completed in 2015. In India data collection will be completed in 2016.

    Discussion: The paper discusses challenges faced in the design and execution of the study, the analysis plan, the strengths and weaknesses, and the possible alternatives to the chosen study design.

    Funded by: Medical Research Council: MR/K012126/1; Wellcome Trust: 092654

    Malaria journal 2015;14;282

  • Concomitant inactivation of the p53- and pRB- functional pathways predicts resistance to DNA damaging drugs in breast cancer in vivo.

    Knappskog S, Berge EO, Chrisanthar R, Geisler S, Staalesen V, Leirvaag B, Yndestad S, de Faveri E, Karlsen BO, Wedge DC, Akslen LA, Lilleng PK, Løkkevik E, Lundgren S, Østenstad B, Risberg T, Mjaaland I, Aas T and Lønning PE

    Section of Oncology, Department of Clinical Science, University of Bergen, Norway; Department of Oncology, Haukeland University Hospital, Bergen, Norway.

    Chemoresistance is the main obstacle to cancer cure. Contrasting studies focusing on single gene mutations, we hypothesize chemoresistance to be due to inactivation of key pathways affecting cellular mechanisms such as apoptosis, senescence, or DNA repair. In support of this hypothesis, we have previously shown inactivation of either TP53 or its key activators CHK2 and ATM to predict resistance to DNA damaging drugs in breast cancer better than TP53 mutations alone. Further, we hypothesized that redundant pathway(s) may compensate for loss of p53-pathway signaling and that these are inactivated as well in resistant tumour cells. Here, we assessed genetic alterations of the retinoblastoma gene (RB1) and its key regulators: Cyclin D and E as well as their inhibitors p16 and p27. In an exploratory cohort of 69 patients selected from two prospective studies treated with either doxorubicin monotherapy or 5-FU and mitomycin for locally advanced breast cancers, we found defects in the pRB-pathway to be associated with therapy resistance (p-values ranging from 0.001 to 0.094, depending on the cut-off value applied to p27 expression levels). Although statistically weaker, we observed confirmatory associations in a validation cohort from another prospective study (n = 107 patients treated with neoadjuvant epirubicin monotherapy; p-values ranging from 7.0 × 10^-4 to 0.001 in the combined data sets). Importantly, inactivation of the p53- and the pRB-pathways in concert predicted resistance to therapy more strongly than each of the two pathways assessed individually (exploratory cohort: p-values ranging from 3.9 × 10^-6 to 7.5 × 10^-3 depending on cut-off values applied to ATM and p27 mRNA expression levels). Again, similar findings were confirmed in the validation cohort, with p-values ranging from 6.0 × 10^-7 to 6.5 × 10^-5 in the combined data sets. Our findings strongly indicate that concomitant inactivation of the p53- and pRB- pathways predicts resistance towards anthracyclines and mitomycin in breast cancer in vivo.

    Molecular oncology 2015;9;8;1553-64

  • The technology and biology of single-cell RNA sequencing.

    Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC and Teichmann SA

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications.

    Molecular cell 2015;58;4;610-20

  • Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation.

    Kolodziejczyk AA, Kim JK, Tsang JC, Ilicic T, Henriksson J, Natarajan KN, Tuck AC, Gao X, Bühler M, Liu P, Marioni JC and Teichmann SA

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Embryonic stem cell (ESC) culture conditions are important for maintaining long-term self-renewal, and they influence cellular pluripotency state. Here, we report single cell RNA-sequencing of mESCs cultured in three different conditions: serum, 2i, and the alternative ground state a2i. We find that the cellular transcriptomes of cells grown in these conditions are distinct, with 2i being the most similar to blastocyst cells and including a subpopulation resembling the two-cell embryo state. Overall levels of intercellular gene expression heterogeneity are comparable across the three conditions. However, this masks variable expression of pluripotency genes in serum cells and homogeneous expression in 2i and a2i cells. Additionally, genes related to the cell cycle are more variably expressed in the 2i and a2i conditions. Mining of our dataset for correlations in gene expression allowed us to identify additional components of the pluripotency network, including Ptma and Zfp640, illustrating its value as a resource for future discovery.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 103977

    Cell stem cell 2015;17;4;471-85

  • A Novel Terminal-Repeat Retrotransposon in Miniature (TRIM) Is Massively Expressed in Echinococcus multilocularis Stem Cells.

    Koziol U, Radio S, Smircich P, Zarowiecki M, Fernández C and Brehm K

    Institute of Hygiene and Microbiology, University of Würzburg, Germany; Sección Bioquímica y Biología Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.

    Taeniid cestodes (including the human parasites Echinococcus spp. and Taenia solium) have very few mobile genetic elements (MGEs) in their genome, despite lacking a canonical PIWI pathway. The MGEs of these parasites are virtually unexplored, and nothing is known about their expression and silencing. In this work, we report the discovery of a novel family of small nonautonomous long terminal repeat retrotransposons (also known as terminal-repeat retrotransposons in miniature, TRIMs) which we have named ta-TRIM (taeniid TRIM). ta-TRIMs are only the second family of TRIM elements discovered in animals, and are likely the result of convergent reductive evolution in different taxonomic groups. These elements originated at the base of the taeniid tree and have expanded during taeniid diversification, including after the divergence of closely related species such as Echinococcus multilocularis and Echinococcus granulosus. They are massively expressed in larval stages, from a small proportion of full-length copies and from isolated terminal repeats that show transcriptional read-through into downstream regions, generating novel noncoding RNAs and transcriptional fusions to coding genes. In E. multilocularis, ta-TRIMs are specifically expressed in the germinative cells (the somatic stem cells) during asexual reproduction of metacestode larvae. This would provide a developmental mechanism for insertion of ta-TRIMs into cells that will eventually generate the adult germ line. Future studies of active and inactive ta-TRIM elements could give the first clues on MGE silencing mechanisms in cestodes.

    Genome biology and evolution 2015;7;8;2136-53

  • Clinical and molecular characterization of a novel PLIN1 frameshift mutation identified in patients with familial partial lipodystrophy.

    Kozusko K, Tsang V, Bottomley W, Cho YH, Gandotra S, Mimmack ML, Lim K, Isaac I, Patel S, Saudek V, O'Rahilly S, Srinivasan S, Greenfield JR, Barroso I, Campbell LV and Savage DB

    University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, UK.

    Perilipin 1 is a lipid droplet coat protein predominantly expressed in adipocytes, where it inhibits basal and facilitates stimulated lipolysis. Loss-of-function mutations in the PLIN1 gene were recently reported in patients with a novel subtype of familial partial lipodystrophy, designated as FPLD4. We now report the identification and characterization of a novel heterozygous frameshift mutation affecting the carboxy-terminus (439fs) of perilipin 1 in two unrelated families. The mutation cosegregated with a similar phenotype including partial lipodystrophy, severe insulin resistance and type 2 diabetes, extreme hypertriglyceridemia, and nonalcoholic fatty liver disease in both families. Poor metabolic control despite maximal medical therapy prompted two patients to undergo bariatric surgery, with remarkably beneficial consequences. Functional studies indicated that expression levels of the mutant protein were lower than wild-type protein, and in stably transfected preadipocytes the mutant protein was associated with smaller lipid droplets. Interestingly, unlike the previously reported 398 and 404 frameshift mutants, this variant binds and stabilizes ABHD5 expression but still fails to inhibit basal lipolysis as effectively as wild-type perilipin 1. Collectively, these findings highlight the physiological need for exquisite regulation of neutral lipid storage within adipocyte lipid droplets, as well as the possible metabolic benefits of bariatric surgery in this serious disease.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: MC_UU_12012/5; Wellcome Trust: 091551, 095515, 100574

    Diabetes 2015;64;1;299-310

  • CopywriteR: DNA copy number detection from off-target sequence data.

    Kuilman T, Velds A, Kemper K, Ranzani M, Bombardelli L, Hoogstraat M, Nevedomskaya E, Xu G, de Ruiter J, Lolkema MP, Ylstra B, Jonkers J, Rottenberg S, Wessels LF, Adams DJ, Peeper DS and Krijgsman O

    Division of Molecular Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.

    Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which eludes these problems by exploiting 'off-target' sequence reads. CopywriteR allows for extracting uniformly distributed copy number information, can be used without reference, and can be applied to sequencing data obtained from various techniques including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.

    Funded by: Cancer Research UK: 13031

    Genome biology 2015;16;49

  • Comparative Genomic Analysis of Meningitis- and Bacteremia-Causing Pneumococci Identifies a Common Core Genome.

    Kulohoma BW, Cornick JE, Chaguza C, Yalcin F, Harris SR, Gray KJ, Kiran AM, Molyneux E, French N, Parkhill J, Faragher BE, Everett DB, Bentley SD and Heyderman RS

    Malawi-Liverpool-Wellcome Trust Clinical Research Programme, University of Malawi College of Medicine, Blantyre, Malawi; Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom.

    Streptococcus pneumoniae is a nasopharyngeal commensal that occasionally invades normally sterile sites to cause bloodstream infection and meningitis. Although the pneumococcal population structure and evolutionary genetics are well defined, it is not clear whether pneumococci that cause meningitis are genetically distinct from those that do not. Here, we used whole-genome sequencing of 140 isolates of S. pneumoniae recovered from bloodstream infection (n = 70) and meningitis (n = 70) to compare their genetic contents. By fitting a double-exponential decaying-function model, we show that these isolates share a core of 1,427 genes (95% confidence interval [CI], 1,425 to 1,435 genes) and that there is no difference in the core genome or accessory gene content from these disease manifestations. Gene presence/absence alone therefore does not explain the virulence behavior of pneumococci that reach the meninges. Our analysis, however, supports the requirement of a range of previously described virulence factors and vaccine candidates for both meningitis- and bacteremia-causing pneumococci. This high-resolution view suggests that, despite considerable competency for genetic exchange, all pneumococci are under considerable pressure to retain key components advantageous for colonization and transmission and that these components are essential for access to and survival in sterile sites.

    Funded by: Wellcome Trust: 084679/Z/08/Z, 098051

    Infection and immunity 2015;83;10;4165-73

  • Fine-mapping cellular QTLs with RASQUAL and ATAC-seq.

    Kumasaka N, Knights AJ and Gaffney DJ

    Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, UK.

    When cellular traits are measured using high-throughput DNA sequencing, quantitative trait loci (QTLs) manifest as fragment count differences between individuals and allelic differences within individuals. We present RASQUAL (Robust Allele-Specific Quantitation and Quality Control), a new statistical approach for association mapping that models genetic effects and accounts for biases in sequencing data using a single, probabilistic framework. RASQUAL substantially improves fine-mapping accuracy and sensitivity relative to existing methods in RNA-seq, DNase-seq and ChIP-seq data. We illustrate how RASQUAL can be used to maximize association detection by generating the first map of chromatin accessibility QTLs (caQTLs) in a European population using ATAC-seq. Despite a modest sample size, we identified 2,707 independent caQTLs (at a false discovery rate of 10%) and demonstrated how RASQUAL and ATAC-seq can provide powerful information for fine-mapping gene-regulatory variants and for linking distal regulatory elements with gene promoters. Our results highlight how combining between-individual and allele-specific genetic signals improves the functional interpretation of noncoding variation.

    Funded by: Wellcome Trust: 098051

    Nature genetics 2015;48;2;206-13

  • Identification of protein complexes that bind to histone H3 combinatorial modifications using super-SILAC and weighted correlation network analysis.

    Kunowska N, Rotival M, Yu L, Choudhary J and Dillon N

    Gene Regulation and Chromatin Group, MRC Clinical Sciences Centre, Imperial College, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK.

    The large number of chemical modifications that are found on the histone proteins of eukaryotic cells form multiple complex combinations, which can act as recognition signals for reader proteins. We have used peptide capture in conjunction with super-SILAC quantification to carry out an unbiased high-throughput analysis of the composition of protein complexes that bind to histone H3K9/S10 and H3K27/S28 methyl-phospho modifications. The accurate quantification allowed us to perform weighted correlation network analysis (WGCNA) to obtain a systems-level view of the histone H3 tail interactome. The analysis reveals the underlying modularity of the histone reader network with members of nuclear complexes exhibiting very similar binding signatures, which suggests that many proteins bind to histones as part of pre-organized complexes. Our results identify a novel complex that binds to the double H3K9me3/S10ph modification, which includes Atrx, Daxx and members of the FACT complex. The super-SILAC approach allows comparison of binding to multiple peptides with different combinations of modifications and the resolution of the WGCNA is enhanced by maximizing the number of combinations that are compared. This makes it a useful approach for assessing the effects of changes in histone modification combinations on the composition and function of bound complexes.

    Funded by: Medical Research Council: MC_U120036884

    Nucleic acids research 2015;43;3;1418-32

  • PhenStat: A Tool Kit for Standardized Analysis of High Throughput Phenotypic Data.

    Kurbatova N, Mason JC, Morgan H, Meehan TF and Karp NA

    The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    The lack of reproducibility with animal phenotyping experiments is a growing concern among the biomedical community. One contributing factor is the inadequate description of statistical analysis methods that prevents researchers from replicating results even when the original data are provided. Here we present PhenStat, a freely available R package that provides a variety of statistical methods for the identification of phenotypic associations. The methods have been developed for high throughput phenotyping pipelines implemented across various experimental designs with an emphasis on managing temporal variation. PhenStat is targeted to two user groups: small-scale users who wish to interact and test data from large resources and large-scale users who require an automated statistical analysis pipeline. The software provides guidance to the user for selecting appropriate analysis methods based on the dataset and is designed to allow for additions and modifications as needed. The package was tested on mouse and rat data and is used by the International Mouse Phenotyping Consortium (IMPC). By providing raw data and the version of PhenStat used, resources like the IMPC give users the ability to replicate and explore results within their own computing environment.

    Funded by: NHGRI NIH HHS: 1 U54 HG006370-; Wellcome Trust: WT098051

    PloS one 2015;10;7;e0131274

  • Genetic Heritage of the Balto-Slavic Speaking Populations: A Synthesis of Autosomal, Mitochondrial and Y-Chromosomal Data.

    Kushniarevich A, Utevska O, Chuhryaeva M, Agdzhoyan A, Dibirova K, Uktveryte I, Möls M, Mulahasanovic L, Pshenichnov A, Frolova S, Shanko A, Metspalu E, Reidla M, Tambets K, Tamm E, Koshel S, Zaporozhchenko V, Atramentova L, Kučinskas V, Davydenko O, Goncharova O, Evseeva I, Churnosov M, Pocheshchova E, Yunusbayev B, Khusnutdinova E, Marjanović D, Rudan P, Rootsi S, Yankovsky N, Endicott P, Kassian A, Dybo A, Genographic Consortium, Tyler-Smith C, Balanovska E, Metspalu M, Kivisild T, Villems R and Balanovsky O

    Evolutionary Biology Group, Estonian Biocentre, Tartu, Estonia; Institute of Genetics and Cytology, National Academy of Sciences of Belarus, Minsk, Belarus.

    The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe in early medieval times. This expansion, mainly to East Europe and the northern Balkans, resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pools. Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: 'central-east European' for West and East Slavs, and 'south-east European' for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.

    Funded by: Wellcome Trust

    PloS one 2015;10;9;e0135820

  • Malaria genomics: tracking a diverse and evolving parasite population.

    Kwiatkowski D

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA and Oxford University, Henry Wellcome Building for Molecular Physiology, Old Road Campus, Headington, Oxford, OX3 7BN, UK

    Malaria parasites are continually evolving to evade the immune system and human attempts to control the disease. To eliminate malaria from regions where it is deeply entrenched we need ways of monitoring what is going on in the parasite population, detecting problematic changes as soon as they arise, and executing a prompt and effective response based on a deep understanding of this natural evolutionary process. Powerful new tools to address this problem are emerging from the fast-growing field of genomic epidemiology, driven by new sequencing technologies and computational methods that allow parasite genome variation to be studied in much greater detail and in many more samples than was previously considered possible. These new tools will provide a deep understanding of what is going on in the parasite population, generating actionable knowledge for strategic planning of control interventions, for monitoring their effects and steering them for greatest impact, and for raising the alert if things start to go wrong.

    Funded by: Medical Research Council: G0600718; Wellcome Trust: 090532/Z/09/Z, 090770, 098051

    International health 2015;7;2;82-4

  • Evolutionary Trade-Offs Underlie the Multi-faceted Virulence of Staphylococcus aureus.

    Laabei M, Uhlemann AC, Lowy FD, Austin ED, Yokoyama M, Ouadi K, Feil E, Thorpe HA, Williams B, Perkins M, Peacock SJ, Clarke SR, Dordel J, Holden M, Votintseva AA, Bowden R, Crook DW, Young BC, Wilson DJ, Recker M and Massey RC

    Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom.

    Bacterial virulence is a multifaceted trait where the interactions between pathogen and host factors affect the severity and outcome of the infection. Toxin secretion is central to the biology of many bacterial pathogens and is widely accepted as playing a crucial role in disease pathology. To understand the relationship between toxicity and bacterial virulence in greater depth, we studied two sequenced collections of the major human pathogen Staphylococcus aureus and found an unexpected inverse correlation between bacterial toxicity and disease severity. By applying a functional genomics approach, we identified several novel toxicity-affecting loci responsible for the wide range in toxic phenotypes observed within these collections. To understand the apparent higher propensity of low toxicity isolates to cause bacteraemia, we performed several functional assays, and our findings suggest that within-host fitness differences between high- and low-toxicity isolates in human serum are a contributing factor. As invasive infections, such as bacteraemia, limit the opportunities for onward transmission, highly toxic strains could gain an additional between-host fitness advantage, potentially contributing to the maintenance of toxicity at the population level. Our results clearly demonstrate how evolutionary trade-offs between toxicity, relative fitness, and transmissibility are critical for understanding the multifaceted nature of bacterial virulence.

    Funded by: Medical Research Council: G0800778, MC_PC_13058; NIAID NIH HHS: T32 AI100852; Wellcome Trust: 101237

    PLoS biology 2015;13;9;e1002229

  • The cytochrome P450 family in the parasitic nematode Haemonchus contortus.

    Laing R, Bartley DJ, Morrison AA, Rezansoff A, Martinelli A, Laing ST and Gilleard JS

    University of Glasgow, Glasgow, UK.

    Haemonchus contortus, a highly pathogenic and economically important parasitic nematode of sheep, is particularly adept at developing resistance to the anthelmintic drugs used in its treatment and control. The basis of anthelmintic resistance is poorly understood for many commonly used drugs with most research being focused on mechanisms involving drug targets or drug efflux. Altered or increased drug metabolism is a possible mechanism that has yet to receive much attention despite the clear role of xenobiotic metabolism in pesticide resistance in insects. The cytochrome P450s (CYPs) are a large family of drug-metabolising enzymes present in almost all living organisms, but for many years thought to be absent from parasitic nematodes. In this paper, we describe the CYP sequences encoded in the H. contortus genome and compare their expression in different parasite life-stages, sexes and tissues. We developed a novel real-time PCR approach based on partially assembled CYP sequences "tags" and confirmed findings in the subsequent draft genome with RNA-seq. Constitutive expression was highest in larval stages for the majority of CYPs, although higher expression was detected in the adult male or female for a small subset of genes. Many CYPs were expressed in the worm intestine. A number of H. contortus genes share high identity with Caenorhabditis elegans CYPs and the similarity in their expression profiles supports their classification as putative orthologues. Notably, H. contortus appears to lack the dramatic CYP subfamily expansions seen in C. elegans and other species, which are typical of CYPs with exogenous roles. However, a small group of H. contortus genes cluster with the C. elegans CYP34 and CYP35 subfamilies and may represent candidate xenobiotic metabolising genes in the parasite.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E018505/1; Canadian Institutes of Health Research: 230927; Wellcome Trust: 098051

    International journal for parasitology 2015;45;4;243-51

  • Investigation of GRIN2A in common epilepsy phenotypes.

    Lal D, Steinbrücker S, Schubert J, Sander T, Becker F, Weber Y, Lerche H, Thiele H, Krause R, Lehesjoki AE, Nürnberg P, Palotie A, Neubauer BA, Muhle H, Stephani U, Helbig I, Becker AJ, Schoch S, Hansen J, Dorn T, Hohl C, Lüscher N, Epicure consortium, EuroEPINOMICS-CoGIE consortium, von Spiczak S and Lemke JR

    Cologne Center for Genomics, University of Cologne, Cologne, Germany.

    Recently, mutations and deletions in the GRIN2A gene have been identified to predispose to benign and severe idiopathic focal epilepsies (IFE), revealing a higher incidence of GRIN2A alterations among the more severe phenotypes. This study aimed to explore the phenotypic boundaries of GRIN2A mutations by investigating patients with the two most common epilepsy syndromes: (i) idiopathic generalized epilepsy (IGE) and (ii) temporal lobe epilepsy (TLE). Whole exome sequencing data of 238 patients with IGE as well as Sanger sequencing of 84 patients with TLE were evaluated for GRIN2A sequence alterations. Two additional independent cohorts comprising 1469 IGE and 330 TLE patients were screened for structural deletions (>40kb) involving GRIN2A. Apart from a presumably benign, non-segregating variant in a patient with juvenile absence epilepsy, neither mutations nor deletions were detected in either cohort. These findings suggest that mutations in GRIN2A are preferentially involved in genetic variance of pediatric IFE and do not contribute significantly to either adult focal epilepsies such as TLE or generalized epilepsies.

    Epilepsy research 2015;115;95-9

  • Modelling the effects of mass drug administration on the molecular epidemiology of schistosomes.

    Lamberton PH, Crellen T, Cotton JA and Webster JP

    Department of Infectious Disease Epidemiology, School of Public Health, Faculty of Medicine, Imperial College London, St Mary's Campus, London, UK.

    As national governments scale up mass drug administration (MDA) programs aimed to combat neglected tropical diseases (NTDs), novel selection pressures on these parasites increase. To understand how parasite populations are affected by MDA and how to maximize the success of control programmes, it is imperative for epidemiological, molecular and mathematical modelling approaches to be combined. Modelling of parasite population genetic and genomic structure, particularly of the NTDs, has been limited through the availability of only a few molecular markers to date. The landscape of infectious disease research is being dramatically reshaped by next-generation sequencing technologies and our understanding of how repeated selective pressures are shaping parasite populations is radically altering. Genomics can provide high-resolution data on parasite population structure, and identify how loci may contribute to key phenotypes such as virulence and/or drug resistance. We discuss the incorporation of genetic and genomic data, focussing on the recently sequenced Schistosoma spp., into novel mathematical transmission models to inform our understanding of the impact of MDA and other control methods. We summarize what is known to date, the models that exist and how population genetics has given us an understanding of the effects of MDA on the parasites. We consider how genetic and genomic data have the potential to shape future research, highlighting key areas where data are lacking, and how future molecular epidemiology knowledge can aid understanding of transmission dynamics and the effects of MDA, ultimately informing public health policy makers of the best interventions for NTDs.

    Funded by: Medical Research Council; Wellcome Trust: 098051

    Advances in parasitology 2015;87;293-327

  • Variable alterations of the microbiota, without metabolic or immunological change, following faecal microbiota transplantation in patients with chronic pouchitis.

    Landy J, Walker AW, Li JV, Al-Hassi HO, Ronde E, English NR, Mann ER, Bernardo D, McLaughlin SD, Parkhill J, Ciclitira PJ, Clark SK, Knight SC and Hart AL

    [1] IBD Unit, Gastroenterology Dept. St Mark's Hospital, Harrow, London, UK [2] Antigen Presentation Research Group, Faculty of Medicine, Imperial College London, Northwick Park and St Mark's Campus, Harrow, UK.

    Faecal microbiota transplantation (FMT) is effective in the treatment of Clostridium difficile infection, where efficacy correlates with changes in microbiota diversity and composition. The effects of FMT on recipient microbiota in inflammatory bowel diseases (IBD) remain unclear. We assessed the effects of FMT on microbiota composition and function, mucosal immune response, and clinical outcome in patients with chronic pouchitis. Eight patients with chronic pouchitis (current PDAI ≥7) were treated with FMT via nasogastric administration. Clinical activity was assessed before and four weeks following FMT. Faecal coliform antibiotic sensitivities were analysed, and changes in pouch faecal and mucosal microbiota assessed by 16S rRNA gene pyrosequencing and (1)H NMR spectroscopy. Lamina propria dendritic cell phenotype and cytokine profiles were assessed by flow cytometric analysis and multiplex assay. Following FMT, there were variable shifts in faecal and mucosal microbiota composition and, in some patients, changes in proportional abundance of species suggestive of a "healthier" pouch microbiota. However, there were no significant FMT-induced metabolic or immunological changes, or beneficial clinical response. Given the lack of clinical response following FMT via a single nasogastric administration our results suggest that FMT/bacteriotherapy for pouchitis patients requires further optimisation.

    Funded by: Biotechnology and Biological Sciences Research Council: WMNIP33458; Wellcome Trust: WT098051

    Scientific reports 2015;5;12955

  • Genome watch: The chronicles of virus-host affairs.

    Langat P and Petrova V

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    This month's Genome Watch highlights a new large-scale serological platform for the simultaneous detection of multiple human viruses in a single drop of blood.

    Nature reviews. Microbiology 2015;13;8;460

  • Strain Selection for Generation of O-Antigen-Based Glycoconjugate Vaccines against Invasive Nontyphoidal Salmonella Disease.

    Lanzilao L, Stefanetti G, Saul A, MacLennan CA, Micoli F and Rondini S

    Sclavo Behring Vaccines Institute for Global Health S.r.l., a GSK company (formerly Novartis Vaccines Institute for Global Health S.r.l), Via Fiorentina 1, 53100, Siena, Italy.

    Nontyphoidal Salmonellae, principally S. Typhimurium and S. Enteritidis, are a major cause of invasive bloodstream infections in sub-Saharan Africa with no vaccine currently available. Conjugation of lipopolysaccharide O-antigen to a carrier protein constitutes a promising vaccination strategy. Here we describe a rational process to select the most appropriate isolates of Salmonella as source of O-antigen for developing a bivalent glycoconjugate vaccine. We screened a library of 30 S. Typhimurium and 21 S. Enteritidis in order to identify the most suitable strains for large scale O-antigen production and generation of conjugate vaccines. Initial screening was based on growth characteristics, safety profile of the isolates, O-antigen production, and O-antigen characteristics in terms of molecular size, O-acetylation and glucosylation level and position, as determined by phenol sulfuric assay, NMR, HPLC-SEC and HPAEC-PAD. Three animal isolates for each serovar were identified and used to synthesize candidate glycoconjugate vaccines, using CRM197 as carrier protein. The immunogenicity of these conjugates and the functional activity of the induced antibodies was investigated by ELISA, serum bactericidal assay and flow cytometry. S. Typhimurium O-antigen showed high structural diversity, including O-acetylation of rhamnose in a Malawian invasive strain generating a specific immunodominant epitope. S. Typhimurium conjugates provoked an anti-O-antigen response primarily against the O:5 determinant. O-antigen from S. Enteritidis was structurally more homogeneous than from S. Typhimurium, and no idiosyncratic antibody responses were detected for the S. Enteritidis conjugates. Of the three initially selected isolates, two S. Typhimurium (1418 and 2189) and two S. Enteritidis (502 and 618) strains generated glycoconjugates able to induce high specific antibody levels with high breadth of serovar-specific strain coverage, and were selected for use in vaccine production. The strain selection approach described is potentially applicable to the development of glycoconjugate vaccines against other bacterial pathogens.

    PloS one 2015;10;10;e0139847

  • Ape parasite origins of human malaria virulence genes.

    Larremore DB, Sundararaman SA, Liu W, Proto WR, Clauset A, Loy DE, Speede S, Plenderleith LJ, Sharp PM, Hahn BH, Rayner JC and Buckee CO

    Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, Massachusetts 02115, USA.

    Antigens encoded by the var gene family are major virulence factors of the human malaria parasite Plasmodium falciparum, exhibiting enormous intra- and interstrain diversity. Here we use network analysis to show that var architecture and mosaicism are conserved at multiple levels across the Laverania subgenus, based on var-like sequences from eight single-species and three multi-species Plasmodium infections of wild-living or sanctuary African apes. Using select whole-genome amplification, we also find evidence of multi-domain var structure and synteny in Plasmodium gaboni, one of the ape Laverania species most distantly related to P. falciparum, as well as a new class of Duffy-binding-like domains. These findings indicate that the modular genetic architecture and sequence diversity underlying var-mediated host-parasite interactions evolved before the radiation of the Laverania subgenus, long before the emergence of P. falciparum.

    Funded by: NIAID NIH HHS: P30 AI045008, R01 AI058715, R01 AI091595, R37 AI050529, T32 AI007532; NIGMS NIH HHS: R21 GM100207; Wellcome Trust: 090851, 095831

    Nature communications 2015;6;8368

  • Fgf and Esrrb integrate epigenetic and transcriptional networks that regulate self-renewal of trophoblast stem cells.

    Latos PA, Goncalves A, Oxley D, Mohammed H, Turro E and Hemberger M

    [1] Epigenetics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK [2] Centre for Trophoblast Research, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK.

    Esrrb (oestrogen-related receptor beta) is a transcription factor implicated in embryonic stem (ES) cell self-renewal, yet its knockout causes intrauterine lethality due to defects in trophoblast development. Here we show that in trophoblast stem (TS) cells, Esrrb is a downstream target of fibroblast growth factor (Fgf) signalling and is critical to drive TS cell self-renewal. In contrast to its occupancy of pluripotency-associated loci in ES cells, Esrrb sustains the stemness of TS cells by direct binding and regulation of TS cell-specific transcription factors including Elf5 and Eomes. To elucidate the mechanisms whereby Esrrb controls the expression of its targets, we characterized its TS cell-specific interactome using mass spectrometry. Unlike in ES cells, Esrrb interacts in TS cells with the histone demethylase Lsd1 and with the RNA Polymerase II-associated Integrator complex. Our findings provide new insights into both the general and context-dependent wiring of transcription factor networks in stem cells by master transcription factors.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    Nature communications 2015;6;7776

  • Connecting genotypes to medically relevant phenotypes in major vector mosquitoes.

    Current Opinion in Insect Science 2015;10;59-64

  • Genetic characterization of three qnrS1-harbouring multidrug-resistance plasmids and qnrS1-containing transposons circulating in Ho Chi Minh City, Vietnam.

    Le V, Nhu NT, Cerdeno-Tarraga A, Campbell JI, Tuyen HT, Nhu Tdo H, Tam PT, Schultsz C, Thwaites G, Thomson NR and Baker S

    [1] Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam [2] Division of Infectious Diseases, Department of Medicine, University of California, San Francisco, CA, USA.

    Plasmid-mediated quinolone resistance (PMQR) refers to a family of closely related genes that confer decreased susceptibility to fluoroquinolones. PMQR genes are generally associated with integrons and/or plasmids that carry additional antimicrobial resistance genes active against a range of antimicrobials. In Ho Chi Minh City (HCMC), Vietnam, we have previously shown a high frequency of PMQR genes within commensal Enterobacteriaceae. However, there are limited available sequence data detailing the genetic context in which the PMQR genes reside, and a lack of understanding of how these genes spread across the Enterobacteriaceae. Here, we aimed to determine the genetic background facilitating the spread and maintenance of qnrS1, the dominant PMQR gene circulating in HCMC. We sequenced three qnrS1-carrying plasmids in their entirety to understand the genetic context of these qnrS1-embedded plasmids and also the association of qnrS1-mediated quinolone resistance with other antimicrobial resistance phenotypes. Annotation of the three qnrS1-containing plasmids revealed a qnrS1-containing transposon with a closely related structure. We screened 112 qnrS1-positive commensal Enterobacteriaceae isolated in the community and in a hospital in HCMC to detect the common transposon structure. We found the same transposon structure to be present in 71.4 % (45/63) of qnrS1-positive hospital isolates and in 36.7 % (18/49) of qnrS1-positive isolates from the community. The resulting sequence analysis of the qnrS1 environment suggested that qnrS1 genes are widely distributed and are mobilized on elements with a common genetic background. Our data add additional insight into mechanisms that facilitate resistance to multiple antimicrobials in Gram-negative bacteria in Vietnam.

    Funded by: Wellcome Trust: 089276, 089276/2/09/2, 100087, 100087/Z/12/Z

    Journal of medical microbiology 2015;64;8;869-78

  • R-M systems go on the offensive.

    Lees J and Gladstone RA

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2015;13;3;131

  • A high-content platform to characterise human induced pluripotent stem cell lines.

    Leha A, Moens N, Meleckyte R, Culley OJ, Gervasio MK, Kerz M, Reimer A, Cain SA, Streeter I, Folarin A, Stegle O, Kielty CM, HipSci Consortium, Durbin R, Watt FM and Danovi D

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Induced pluripotent stem cells (iPSCs) provide invaluable opportunities for future cell therapies as well as for studying human development, modelling diseases and discovering therapeutics. In order to realise the potential of iPSCs, it is crucial to comprehensively characterise cells generated from large cohorts of healthy and diseased individuals. The human iPSC initiative (HipSci) is assessing a large panel of cell lines to define cell phenotypes, dissect inter- and intra-line and donor variability and identify its key determinant components. Here we report the establishment of a high-content platform for phenotypic analysis of human iPSC lines. In the described assay, cells are dissociated and seeded as single cells onto 96-well plates coated with fibronectin at three different concentrations. This method allows assessment of cell number, proliferation, morphology and intercellular adhesion. Altogether, our strategy delivers robust quantification of phenotypic diversity within complex cell populations facilitating future identification of the genetic, biological and technical determinants of variance. Approaches such as the one described can be used to benchmark iPSCs from multiple donors and create novel platforms that can readily be tailored for disease modelling and drug discovery.

    Funded by: Medical Research Council: MC_PC_12026, MR/K026666/1, MR/L022699/1; Wellcome Trust: 098503

    Methods (San Diego, Calif.) 2015;96;85-96

  • Regulatory Divergence of Transcript Isoforms in a Mammalian Model System.

    Leigh-Brown S, Goncalves A, Thybert D, Stefflova K, Watt S, Flicek P, Brazma A, Marioni JC and Odom DT

    University of Cambridge, Cancer Research UK - Cambridge Institute, Li Ka Shing Centre, Cambridge, United Kingdom.

    Phenotypic differences between species are driven by changes in gene expression and, by extension, by modifications in the regulation of the transcriptome. Investigation of mammalian transcriptome divergence has been restricted to analysis of bulk gene expression levels and gene-internal splicing. Using allele-specific expression analysis in inter-strain hybrids of Mus musculus, we determined the contribution of multiple cellular regulatory systems to transcriptome divergence, including: alternative promoter usage, transcription start site selection, cassette exon usage, alternative last exon usage, and alternative polyadenylation site choice. Between mouse strains, a fifth of genes have variations in isoform usage that contribute to transcriptomic changes, half of which alter encoded amino acid sequence. Virtually all divergence in isoform usage altered the post-transcriptional regulatory instructions in gene UTRs. Furthermore, most genes with isoform differences between strains contain changes originating from multiple regulatory systems. This result indicates widespread cross-talk and coordination exists among different regulatory systems. Overall, isoform usage diverges in parallel with and independently to gene expression evolution, and the cis and trans regulatory contribution to each differs significantly.

    Funded by: Cancer Research UK: 15603; Wellcome Trust: WT095908, WT098051

    PloS one 2015;10;9;e0137367

  • A detailed clinical and molecular survey of subjects with nonsyndromic USH2A retinopathy reveals an allelic hierarchy of disease-causing variants.

    Lenassi E, Vincent A, Li Z, Saihan Z, Coffey AJ, Steele-Stallard HB, Moore AT, Steel KP, Luxon LM, Héon E, Bitner-Glindzicz M and Webster AR

    UCL Institute of Ophthalmology and Moorfields Eye Hospital, University College of London, London, UK.

    Defects in USH2A cause both isolated retinal disease and Usher syndrome (ie, retinal disease and deafness). To gain insights into isolated/nonsyndromic USH2A retinopathy, we screened USH2A in 186 probands with recessive retinal disease and no hearing complaint in childhood (discovery cohort) and in 84 probands with recessive retinal disease (replication cohort). Detailed phenotyping, including retinal imaging and audiological assessment, was performed in individuals with two likely disease-causing USH2A variants. Further genetic testing, including screening for a deep-intronic disease-causing variant and large deletions/duplications, was performed in those with one likely disease-causing change. Overall, 23 of 186 probands (discovery cohort) were found to harbour two likely disease-causing variants in USH2A. Some of these variants were predominantly associated with nonsyndromic retinal degeneration ('retinal disease-specific'); these included the common c.2276 G>T, p.(Cys759Phe) mutation and five additional variants: c.2802 T>G, p.(Cys934Trp); c.10073 G>A, p.(Cys3358Tyr); c.11156 G>A, p.(Arg3719His); c.12295-3 T>A; and c.12575 G>A, p.(Arg4192His). An allelic hierarchy was observed in the discovery cohort and confirmed in the replication cohort. In nonsyndromic USH2A disease, retinopathy was consistent with retinitis pigmentosa and the audiological phenotype was variable. USH2A retinopathy is a common cause of nonsyndromic recessive retinal degeneration and has a different mutational spectrum to that observed in Usher syndrome. The following model is proposed: the presence of at least one 'retinal disease-specific' USH2A allele in a patient with USH2A-related disease results in the preservation of normal hearing. Careful genotype-phenotype studies such as this will become increasingly important, especially now that high-throughput sequencing is widely used in the clinical setting.

    Funded by: Wellcome Trust: 098051

    European journal of human genetics : EJHG 2015;23;10;1318-27

  • Genome-wide analyses identify KLF4 as an important negative regulator in T-cell acute lymphoblastic leukemia through directly inhibiting T-cell associated genes.

    Li W, Jiang Z, Li T, Wei X, Zheng Y, Wu D, Yang L, Chen S, Xu B, Zhong M, Jiang J, Hu Y, Su H, Zhang M, Huang X, Geng S, Weng J, Du X, Liu P, Li Y, Liu H, Yao Y and Li P

    Key Laboratory of Regenerative Biology, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, 190 Kaiyuan Avenue, Science Park, Guangzhou, Guangdong, 510530, China.

    Background: Kruppel-like factor 4 (KLF4) induces tumorigenesis or suppresses tumor growth in a tissue-dependent manner. However, the roles of KLF4 in hematological malignancies and the mechanisms of action are not fully understood.

    Methods: Inducible KLF4-overexpression Jurkat cell line combined with mouse models bearing cell-derived xenografts and primary T-cell acute lymphoblastic leukemia (T-ALL) cells from four patients were used to assess the functional role of KLF4 in T-ALL cells in vitro and in vivo. A genome-wide RNA-seq analysis was conducted to identify genes