Sanger Institute - Publications 2014
Number of papers published in 2014: 566
The population structure of Vibrio cholerae from the Chandigarh Region of Northern India.
Pathogen Genomics Laboratory, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
Background: Cholera infection continues to be a threat to global public health. The current cholera pandemic associated with Vibrio cholerae El Tor has now been ongoing for over half a century.
Methodology/principal findings: Thirty-eight V. cholerae El Tor isolates associated with a 2009 cholera outbreak in the Chandigarh region of India were characterised using a combination of microbiology, molecular typing and whole-genome sequencing. The genomic analysis indicated that two clones of V. cholerae circulated in the region and caused disease during this time. These clones fell into two distinct sub-clades that map independently onto wave 3 of the phylogenetic tree of seventh pandemic V. cholerae El Tor. Sequence analyses of the cholera toxin gene, the Vibrio seventh Pandemic Island II (VSPII) and the SXT element were consistent with the phylogenetic positions of the two clades on the El Tor tree. The clade 2 isolates, characterised by a drug-resistant profile and the expression of a distinct cholera toxin, are closely related to recent V. cholerae isolates from elsewhere, including Haiti, but fall on a distinct branch of the tree, showing that they represent independent outbreaks. Multi-Locus Sequence Typing (MLST) distinguished two sequence types among the 38 isolates, but these did not correspond to the clades defined by whole-genome sequencing. Multi-Locus Variable-length tandem-nucleotide repeat Analysis (MLVA) identified 16 distinct clusters.
Conclusions/significance: The use of whole-genome sequencing enabled the identification of two clones of V. cholerae that circulated during the 2009 Chandigarh outbreak. These clones harboured ICEVchHai1 elements of similar structure but differed mainly in the structure of the CTX phage and VSPII. The limited capacity of MLST and MLVA to discriminate between the clones that circulated in the 2009 Chandigarh outbreak highlights the value of whole-genome sequencing as a route to identifying further genetic markers for subtyping V. cholerae isolates.
Funded by: Wellcome Trust: 098051
PLoS neglected tropical diseases 2014;8;7;e2981
Towards a molecular systems model of coronary artery disease.
Medical Systems Biology, Department of Pathology and Department of Microbiology & Immunology, The University of Melbourne, Parkville, Victoria, 3010, Australia.
Coronary artery disease (CAD) is a complex disease driven by myriad interactions between genetic and environmental factors. Traditionally, studies have analyzed only one disease factor at a time, providing useful but limited understanding of the underlying etiology. Recent advances in cost-effective, high-throughput technologies, such as single nucleotide polymorphism (SNP) genotyping, exome/genome/RNA sequencing, gene expression microarrays, and metabolomics assays, have enabled the collection of millions of data points in many thousands of individuals. To make sense of such 'omics' data, effective analytical methods are needed. We review and highlight some of the main results in this area, focusing on integrative approaches that consider multiple modalities simultaneously. Such analyses have the potential to uncover the genetic basis of CAD, produce genomic risk scores (GRS) for disease prediction, disentangle the complex interactions underlying disease, and predict response to treatment.
Current cardiology reports 2014;16;6;488
Editorial overview: cancer genomics: kill it. Kill it dead.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK.
Funded by: Cancer Research UK: 13031
Current opinion in genetics & development 2014;24;v-vi
Histopathology reveals correlative and unique phenotypes in a high-throughput mouse phenotyping screen.
Centre for Modeling Human Disease, Toronto Centre for Phenogenomics, 25 Orde Street, Toronto, ON M5T 3H7, Canada.
The Mouse Genetics Project (MGP) at the Wellcome Trust Sanger Institute aims to generate and phenotype over 800 genetically modified mouse lines over the next 5 years to gain a better understanding of mammalian gene function and provide an invaluable resource to the scientific community for follow-up studies. Phenotyping includes the generation of a standardized biobank of paraffin-embedded tissues for each mouse line, but histopathology is not routinely performed. In collaboration with the Pathology Core of the Centre for Modeling Human Disease (CMHD) we report the utility of histopathology in a high-throughput primary phenotyping screen. Histopathology was assessed in an unbiased selection of 50 mouse lines with (n=30) or without (n=20) clinical phenotypes detected by the standard MGP primary phenotyping screen. Our findings revealed that histopathology added correlating morphological data in 19 of 30 lines (63.3%) in which the primary screen detected a phenotype. In addition, seven of the 50 lines (14%) presented significant histopathology findings that were not associated with or predicted by the standard primary screen. Three of these seven lines had no clinical phenotype detected by the standard primary screen. Incidental and strain-associated background lesions were present in all mutant lines with good concordance to wild-type controls. These findings demonstrate the complementary and unique contribution of histopathology to high-throughput primary phenotyping of mutant mice.
Funded by: NHGRI NIH HHS: U54 HG006364; NIH HHS: U42 OD011175; Wellcome Trust: 098051
Disease models & mechanisms 2014;7;5;515-24
H3Africa: a tipping point for a revolution in bioinformatics, genomics and health research in Africa.
Computational and Evolutionary Biology/Bioinformatics, Faculty of Life Sciences, University of Manchester, Manchester, UK ; Microbiology Unit, Department of Biological Sciences, Nasarawa State University, Keffi, Nigeria.
Background: A multi-million dollar research initiative involving the National Institutes of Health (NIH), Wellcome Trust and African scientists has been launched. The initiative, referred to as H3Africa, is an acronym that stands for Human Heredity and Health in Africa. Here, we outline what this initiative is set to achieve and the latest commitments of the key players as at October 2013.
Findings: The initiative has so far been awarded over $74 million in research grants. In the first set of awards, announced in 2012, the NIH granted $5 million a year for five years, while the Wellcome Trust provided at least $12 million to the research consortium over the same period. The Wellcome Trust also provided administrative support, scientific consultation and advanced training, all in collaboration with the African Society for Human Genetics. In the second set of awards, announced in October 2013, the NIH awarded the initiative 10 new grants worth up to $17 million over the next four years.
Conclusions: H3Africa is poised to transform the face of research in genomics, bioinformatics and health in Africa. The capacity of African scientists will be enhanced through training and the better research facilities that will be acquired. Research collaborations between Africa and the West will grow and all stakeholders, including the funding partners, African scientists, scientists across the globe, physicians and patients will be the eventual winners.
Source code for biology and medicine 2014;9;10
Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs.
Institut National de la Santé et de la Recherche Médicale (INSERM), U1043, Toulouse, France.
Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40-60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide a proof of principle for a new approach to genome-wide functional validation of transcription factor-SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for identifying functional variants associated with complex diseases.
Funded by: Canadian Institutes of Health Research
Molecular systems biology 2014;10;754
Human African trypanosomiasis research gets a boost: unraveling the tsetse genome.
Yale School of Public Health, Department of Epidemiology and Public Health, New Haven, Connecticut, United States of America.
Funded by: FIC NIH HHS: D43 TW007391; NIAID NIH HHS: R01 AI051584, R01 AI081774
PLoS neglected tropical diseases 2014;8;4;e2624
Rare variants in NR2F2 cause congenital heart defects in humans.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Department of Pathology, King Abdulaziz Medical City, P.O. Box 22490, Riyadh 11426, Saudi Arabia.
Congenital heart defects (CHDs) are the most common birth defect worldwide and are a leading cause of neonatal mortality. Nonsyndromic atrioventricular septal defects (AVSDs) are an important subtype of CHDs for which the genetic architecture is poorly understood. We performed exome sequencing in 13 parent-offspring trios and 112 unrelated individuals with nonsyndromic AVSDs and identified five rare missense variants (two of which arose de novo) in the highly conserved gene NR2F2, a very significant enrichment (p = 7.7 × 10^-7) compared to 5,194 control subjects. We identified three additional CHD-affected families with other variants in NR2F2 including a de novo balanced chromosomal translocation, a de novo substitution disrupting a splice donor site, and a 3 bp duplication that cosegregated in a multiplex family. NR2F2 encodes a pleiotropic developmental transcription factor, and decreased dosage of NR2F2 in mice has been shown to result in abnormal development of atrioventricular septa. Via luciferase assays, we showed that all six coding sequence variants observed in individuals significantly alter the activity of NR2F2 on target promoters.
Funded by: British Heart Foundation: PG/07/045/22690, RG/07/010/23676, RG/10/17/28553, RG/13/10/30376; Medical Research Council: MC_PC_U127561093, MC_U127561093; NHLBI NIH HHS: RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010, UC2 HL102923, UC2 HL102924, UC2 HL102925, UC2 HL102926, UC2 HL103010; Wellcome Trust: 090532, 100140, WT098051
American journal of human genetics 2014;94;4;574-85
Mutational signatures: the patterns of somatic mutations hidden in cancer genomes.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.
All cancers originate from a single cell that starts to behave abnormally because of somatic mutations acquired in its genome. Until recently, knowledge of the mutational processes that cause these somatic mutations was very limited. Recent advances in sequencing technologies and the development of novel mathematical approaches have made it possible to decipher the patterns of somatic mutations caused by different mutational processes. Here, we summarize our current understanding of mutational patterns and mutational signatures in light of both the somatic cell paradigm of cancer research and the recent developments in the field of cancer genomics.
Funded by: Wellcome Trust: 098051
Current opinion in genetics & development 2014;24;52-60
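The review above refers to patterns of somatic mutations without spelling out how they are tabulated. A convention widely used in the mutational-signatures literature (assumed here, not stated in the abstract) classifies each single-base substitution by its pyrimidine-referenced change (C>A, C>G, C>T, T>A, T>C, T>G) together with its 5' and 3' neighbours, giving 6 × 16 = 96 classes per sample. The minimal Python sketch below builds such a 96-class catalogue; the function names are hypothetical, and the downstream signature extraction (for example matrix factorisation across many samples) is not shown.

```python
from collections import Counter
from itertools import product

# Complement table used to flip purine-reference mutations onto the pyrimidine strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = {"A", "C", "G", "T"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_class(context: str, alt: str) -> str:
    """Classify a substitution from its + strand reference trinucleotide
    (5' base, mutated base, 3' base) and the alternate base."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or not set(context) <= BASES or alt not in BASES or alt == context[1]:
        raise ValueError(f"not a single-base substitution: {context} > {alt}")
    if context[1] in "AG":
        # Purine reference: report the same event on the opposite (pyrimidine) strand.
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_sbs96_classes() -> list[str]:
    """The 6 substitution types x 16 flanking-base pairs = 96 classes."""
    subs = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [f"{five}[{ref}>{alt}]{three}"
            for ref, alt in subs
            for five, three in product("ACGT", repeat=2)]


def mutation_catalogue(mutations) -> Counter:
    """Count (reference trinucleotide, alternate base) pairs into the 96 classes."""
    return Counter(sbs96_class(ctx, alt) for ctx, alt in mutations)


# Usage with made-up mutations: a G>A in TGA is the same event as a C>T in TCA.
catalogue = mutation_catalogue([("TGA", "A"), ("TCA", "T"), ("ACG", "T")])
```

Collapsing each purine-reference event onto the pyrimidine strand is what keeps the class count at 96 instead of 192, since a mutation and its reverse complement cannot be told apart from double-stranded DNA.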
Reading between the lines; understanding drug response in the post genomic era.
Cambridge Institute of Medical Research, University of Cambridge, Cambridge, UK; Cancer Genome Project, Wellcome Trust Sanger Institute, Cambridge, UK; Dept of Medical Oncology, Charing Cross Hospital, London, UK.
Following the fanfare of initial, often dramatic, success with small molecule inhibitors in the treatment of defined genomic subgroups, it can be argued that the extension of targeted therapeutics to the majority of patients with solid cancers has stalled. Despite encouraging FDA approval rates, the attrition rates of these compounds remain high in early-stage clinical studies, with single-agent studies repeatedly showing poor efficacy. In striking contrast, our understanding of the complexity of solid neoplasms has grown enormously following the publication of large-scale genomic and transcriptomic datasets from large collaborations such as the International Cancer Genome Consortium (ICGC http://www.icgc.org/) and The Cancer Genome Atlas (TCGA http://cancergenome.nih.gov/). However, there remains a clear disconnect between these rich datasets describing the genomic complexity of cancer, including both intra- and inter-tumour heterogeneity, and what a treating oncologist can consider to be a clinically "actionable" mutation profile. Our understanding of these data is in its infancy, and we still struggle to identify tumour characteristics that consistently predict therapeutic response to most small molecule inhibitors. This article explores recent studies of the patterns and impact of mutations in drug resistance and shows how we may use these data to reshape our thinking about biological pathways, critical dependencies and their therapeutic interruption.
Molecular oncology 2014;8;6;1112-9
Single nucleotide polymorphisms with cis-regulatory effects on long non-coding transcripts in human primary monocytes.
Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
We applied genome-wide allele-specific expression analysis to monocytes from 188 samples, purified from the white blood cells of healthy blood donors, to detect cis-acting genetic variation that regulates the expression of long non-coding RNAs. We analysed 8929 regions harboring genes for potential long non-coding RNAs, retrieved from ENCODE project data. Of these regions, 60% were annotated as intergenic, meaning that they do not overlap with protein-coding genes. Focusing on the intergenic regions, and using stringent analysis of the allele-specific expression data, we detected robust cis-regulatory SNPs in 258 of the 489 informative intergenic regions included in the analysis. The cis-regulatory SNPs that were significantly associated with allele-specific expression of long non-coding RNAs were enriched in enhancer regions marked by histone modifications as active or bivalent (poised) chromatin. Of the lncRNA regions regulated by cis-acting regulatory SNPs, 20% (n = 52) were co-regulated with the closest protein-coding gene. We compared the identified cis-regulatory SNPs with those in the catalog of SNPs identified by genome-wide association studies of human diseases and traits. This comparison identified 32 SNPs in loci from genome-wide association studies that displayed a strong association signal with allele-specific expression of non-coding RNAs in monocytes, with p-values ranging from 6.7 × 10^-7 to 9.5 × 10^-89. The identified cis-regulatory SNPs are associated with diseases of the immune system, such as multiple sclerosis and rheumatoid arthritis.
Funded by: British Heart Foundation: RG/09/012/28096
PloS one 2014;9;7;e102612
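Allele-specific expression analyses like the one above rest on a simple idea: at a heterozygous SNP within a transcript, RNA-sequencing reads from the two alleles should split roughly 50:50 unless a cis-acting variant alters expression of one copy. The abstract does not describe its "stringent analysis", so the Python sketch below shows only a generic first step (an assumption, not the authors' pipeline): an exact two-sided binomial test of reference versus alternate read counts in one sample. Real analyses must also handle reference-mapping bias, pooling across samples and multiple testing; the function names are hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count k."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance so floating-point ties are counted
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allelic_imbalance(ref_reads: int, alt_reads: int, alpha: float = 0.05):
    """Test one heterozygous transcribed SNP in one sample for departure from
    the 50:50 allelic ratio expected without cis-regulation.
    Returns (reference allele fraction, p-value, flagged at alpha)."""
    n = ref_reads + alt_reads
    p_value = binom_two_sided_p(ref_reads, n)
    return ref_reads / n, p_value, p_value < alpha


# Usage with made-up read counts: 30 reference vs 10 alternate reads.
ratio, p_value, imbalanced = allelic_imbalance(30, 10)
```

The exact test is used instead of a normal approximation because read depth at individual SNPs is often low, where the approximation is unreliable.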
One patient, two lesions, two oncogenic drivers of gastric cancer.
Deep-sequencing of a primary tumor and metastasis from a single patient, and functional validation in culture, reveals that TGFBR2 and FGFR2 act as drivers of gastric cancer.
Funded by: Cancer Research UK: 13031
Genome biology 2014;15;8;444
Plasmodium falciparum founder populations in western Cambodia have reduced artemisinin sensitivity in vitro.
Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA.
Reduced Plasmodium falciparum sensitivity to short-course artemisinin (ART) monotherapy manifests as a long parasite clearance half-life. We recently defined three parasite founder populations with long half-lives in Pursat, western Cambodia, where reduced ART sensitivity is prevalent. Using the ring-stage survival assay, we show that these founder populations have reduced ART sensitivity in vitro at the early ring stage of parasite development and that a genetically admixed population contains subsets of parasites with normal or reduced ART sensitivity.
Funded by: Intramural NIH HHS
Antimicrobial agents and chemotherapy 2014;58;8;4935-7
Whole Genome Sequencing of a Methicillin-Resistant Staphylococcus aureus Pseudo-Outbreak in a Professional Football Team.
Duke University Medical Center, Durham, North Carolina.
Two American football players on the same team were diagnosed with methicillin-resistant Staphylococcus aureus (MRSA) skin and soft tissue infections on the same day. Our investigation, including whole genome sequencing, confirmed that the players neither transmitted MRSA to one another nor acquired it from a single source within the training facility.
Funded by: Medical Research Council: G1000803; NIAID NIH HHS: K23 AI095357, K24 AI093969, R01 AI068804
Open forum infectious diseases 2014;1;3;ofu096
A syndromic form of Pierre Robin sequence is caused by 5q23 deletions encompassing FBN2 and PHAX.
MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.
Pierre Robin sequence (PRS) is an aetiologically distinct subgroup of cleft palate. We aimed to define the critical genomic interval from five different 5q22-5q31 deletions associated with PRS or PRS-associated features and to assess each gene within the region as a candidate for the PRS component of the phenotype. Clinical array-based comparative genomic hybridisation (aCGH) data were used to define a 2.08 Mb minimum region of overlap among four de novo deletions and one mother-son inherited deletion, each associated with at least one component of PRS. Commonly associated anomalies were talipes equinovarus (TEV), finger contractures and crumpled ear helices. Expression analysis of the orthologous genes within the PRS critical region in embryonic mice showed that the strongest candidate genes were FBN2 and PHAX. Targeted aCGH of the critical region and sequencing of these genes in a cohort of 25 PRS patients revealed no plausible disease-causing mutations. In conclusion, deletion of ∼2 Mb in the 5q23 region causes a clinically recognisable subtype of PRS. Haploinsufficiency for FBN2 accounts for the digital and auricular features. A possible critical region for TEV is distinct from, and telomeric to, the PRS region. The molecular basis of PRS in these cases remains undetermined, but haploinsufficiency for PHAX is a plausible mechanism.
Funded by: Medical Research Council: MC_PC_U127561093
European journal of medical genetics 2014;57;10;587-95
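The "minimum region of overlap" in the Pierre Robin sequence study above is the segment deleted in every patient: it runs from the latest deletion start to the earliest deletion end. The Python sketch below computes it, assuming all intervals lie on the same chromosome, use the same genome build and use inclusive coordinates. The `Deletion` type and the coordinates in the usage example are hypothetical and are not the study's data.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Deletion(NamedTuple):
    """A deletion interval on one chromosome (inclusive coordinates)."""
    sample: str
    start: int
    end: int


def minimum_region_of_overlap(deletions: Sequence[Deletion]) -> Optional[Tuple[int, int]]:
    """Interval shared by every deletion: latest start to earliest end.
    Returns None if the deletions have no common segment."""
    if not deletions:
        raise ValueError("need at least one deletion")
    start = max(d.start for d in deletions)
    end = min(d.end for d in deletions)
    return (start, end) if start <= end else None


# Usage with made-up coordinates (not from the study):
hypothetical = [
    Deletion("case_1", 100, 900),
    Deletion("case_2", 300, 1200),
    Deletion("case_3", 250, 700),
]
mro = minimum_region_of_overlap(hypothetical)  # (300, 700)
```

Genes lying entirely inside the returned interval are deleted in every case, which is why they become the first candidates to test, as FBN2 and PHAX were in the study.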
Nonsense mutations in the shelterin complex genes ACD and TERF2IP in familial melanoma.
Affiliations of authors: QIMR Berghofer Medical Research Institute, Brisbane, Australia (LGA, ALP, MG, PJ, JMP, JS, VB, SW, KDR, MSS, GWM, NGM, NKH); Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK (CDRE, TMK, DJA); Department of Clinical Genetics, Rigshospitalet, Copenhagen, Denmark (KW, AMG); Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK (MH, HSn, DTB, JANB); Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD (JC, KMB); Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología del Principado de Asturias (IUOPA) Universidad de Oviedo, Oviedo, Spain (VQ, AJR, CLO); Cancer Genomics Research Laboratory, NCI Frederick, SAIC-Frederick Inc., Frederick MD (XZ, KJ); Department of Dermatology, Leiden University Medical Centre, Leiden, the Netherlands (RvD, NAG); Department of Clinical Sciences Lund, Division of Oncology and Pathology, Lund University, Lund, Sweden (HO, CI, ÅB, GJ); Translational Genomics Institute, Phoenix, AZ (JMT); University of Sydney at Westmead Millennium Institute, Westmead, Sydney, NSW, Australia (EAH, HSc, GJM); Melanoma Institute Australia, North Sydney, NSW, Australia (EAH, HSc, GJM).
Background: The shelterin complex protects chromosomal ends by regulating how the telomerase complex interacts with telomeres. Following the recent finding in familial melanoma of inactivating germline mutations in POT1, encoding a member of the shelterin complex, we searched for mutations in the other five components of the shelterin complex in melanoma families.
Methods: Next-generation sequencing techniques were used to screen 510 melanoma families (with unknown genetic etiology) and control cohorts for mutations in the genes encoding the shelterin complex: ACD, TERF2IP, TERF1, TERF2, and TINF2. Maximum likelihood and LOD [logarithm (base 10) of odds] analyses were used. Mutation clustering was assessed with χ² and Fisher's exact tests. P values under .05 were considered statistically significant (one-tailed with Yates' correction).
Results: Six families had mutations in ACD and four families carried TERF2IP variants, which included nonsense mutations in both genes (p.Q320X and p.R364X, respectively) and point mutations that cosegregated with melanoma. Of five distinct mutations in ACD, four clustered in the POT1 binding domain, including p.Q320X. This clustering of novel mutations in the POT1 binding domain of ACD was statistically higher (P = .005) in melanoma probands compared with population control individuals (n = 6785), as were all novel and rare variants in both ACD (P = .040) and TERF2IP (P = .022). Families carrying ACD and TERF2IP mutations were also enriched with other cancer types, suggesting that these variants also predispose to a broader spectrum of cancers than just melanoma. Novel mutations were also observed in TERF1, TERF2, and TINF2, but these were not convincingly associated with melanoma.
Conclusions: Our findings add to the growing support for telomere dysregulation as a key process associated with melanoma susceptibility.
Funded by: Cancer Research UK: 10589, 13031; Medical Research Council: MR/L01629X/1
Journal of the National Cancer Institute 2014;107;2
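The Methods of the melanoma study above mention LOD analysis. In its simplest textbook form (two-point linkage with phase-known meioses), the LOD score at recombination fraction θ, given R recombinant and NR non-recombinant meioses, compares the likelihood of linkage with that of no linkage (θ = 1/2). The abstract does not give the study's actual likelihood model (penetrance, phase, pedigree structure), so this shows only the general form:

$$
\mathrm{LOD}(\theta) = \log_{10} \frac{\theta^{R}\,(1-\theta)^{NR}}{(1/2)^{R+NR}}
$$

For example, ten informative meioses with no recombinants (R = 0, NR = 10) give LOD(0) = 10 log₁₀ 2 ≈ 3.01, just above the conventional linkage threshold of 3.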
A molecular marker of artemisinin-resistant Plasmodium falciparum malaria.
Institut Pasteur, Parasite Molecular Immunology Unit, 75724 Paris Cedex 15, France; Centre National de la Recherche Scientifique, Unité de Recherche Associée 2581, 75724 Paris Cedex 15, France; Institut Pasteur, Genetics and Genomics of Insect Vectors Unit, 75724 Paris Cedex 15, France (F.A.); Institut Pasteur, Functional Genetics of Infectious Diseases Unit, 75724 Paris Cedex 15, France (J.B.); Centre de Physiopathologie de Toulouse-Purpan, Institut National de la Santé et de la Recherche Médicale UMR1043, Centre National de la Recherche Scientifique UMR5282, Université Toulouse III, 31024 Toulouse Cedex 3, France; Institut Pasteur, Unité de Biologie et Génétique du Paludisme, Team Malaria Targets and Drug Development, 75724 Paris Cedex 15, France (J.-C.B.).
Plasmodium falciparum resistance to artemisinin derivatives in southeast Asia threatens malaria control and elimination activities worldwide. To monitor the spread of artemisinin resistance, a molecular marker is urgently needed. Here, using whole-genome sequencing of an artemisinin-resistant parasite line from Africa and clinical parasite isolates from Cambodia, we associate mutations in the PF3D7_1343700 kelch propeller domain ('K13-propeller') with artemisinin resistance in vitro and in vivo. Mutant K13-propeller alleles cluster in Cambodian provinces where resistance is prevalent, and the increasing frequency of a dominant mutant K13-propeller allele correlates with the recent spread of resistance in western Cambodia. Strong correlations between the presence of a mutant allele, in vitro parasite survival rates and in vivo parasite clearance rates indicate that K13-propeller mutations are important determinants of artemisinin resistance. K13-propeller polymorphism constitutes a useful molecular marker for large-scale surveillance efforts to contain artemisinin resistance in the Greater Mekong Subregion and prevent its global spread.
Funded by: Intramural NIH HHS: Z01 AI001000-01; Medical Research Council: G0600718; Wellcome Trust: 090770/Z/09/Z, 098051
Lipoprotein(a) levels, genotype, and incident aortic valve stenosis: a prospective Mendelian randomization study and replication in a case-control cohort.
From the Montreal Heart Institute Research Center, Montreal, Quebec, Canada (B.J.A., M.-P.D., É.R., J.-C.T.); Department of Medicine, Faculty of Medicine, University of Montreal, Montreal, Quebec, Canada (B.J.A., M.-P.D., É.R., J.-C.T.); Department of Cardiology, Academic Medical Center, Amsterdam, The Netherlands (S.M.B.); MRC Epidemiology Unit (N.J.W.) and Department of Public Health and Primary Care (K.-T.K., M.S.S.), University of Cambridge, Cambridge, United Kingdom; and Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, United Kingdom (M.S.S.).
Background: Although a previous study has suggested that a genetic variant in the LPA region was associated with the presence of aortic valve stenosis (AVS), no prospective study has suggested a role for lipoprotein(a) levels in the pathophysiology of AVS. Our objective was to determine whether lipoprotein(a) levels and a common genetic variant that is strongly associated with lipoprotein(a) levels are associated with an increased risk of developing AVS.
Methods and results: Serum lipoprotein(a) levels were measured in 17 553 participants of the European Prospective Investigation into Cancer (EPIC)-Norfolk study. Among these participants, 118 developed AVS during a mean follow-up of 11.7 years. The rs10455872 genetic variant in LPA was genotyped in 14 735 study participants who also had lipoprotein(a) measurements, and in a replication study of 379 patients with echocardiography-confirmed AVS and 404 controls. In EPIC-Norfolk, compared with participants in the bottom lipoprotein(a) tertile, those in the top tertile had a higher risk of AVS (hazard ratio, 1.57; 95% confidence interval, 1.02-2.42) after adjustment for age, sex, and smoking. Compared with rs10455872 AA homozygotes, carriers of 1 or 2 G alleles were at increased risk of AVS (1 G allele: hazard ratio, 1.78; 95% confidence interval, 1.11-2.87; 2 G alleles: hazard ratio, 4.83; 95% confidence interval, 1.77-13.20). In the replication study, rs10455872 was also positively associated with AVS (odds ratio, 1.57; 95% confidence interval, 1.10-2.26).
Conclusions: Patients with high lipoprotein(a) levels are at increased risk for AVS. The rs10455872 variant, which is associated with higher lipoprotein(a) levels, is also associated with increased risk of AVS, suggesting that this association may be causal.
Funded by: Canadian Institutes of Health Research; Cancer Research UK; Medical Research Council: G0401527, G0801566, G1000143, MC_U106179471, MC_UU_12015/1
Circulation. Cardiovascular genetics 2014;7;3;304-10
Spread of artemisinin resistance in Plasmodium falciparum malaria.
The authors' affiliations are listed in the Appendix.
Background: Artemisinin resistance in Plasmodium falciparum has emerged in Southeast Asia and now poses a threat to the control and elimination of malaria. Mapping the geographic extent of resistance is essential for planning containment and elimination strategies.
Methods: Between May 2011 and April 2013, we enrolled 1241 adults and children with acute, uncomplicated falciparum malaria in an open-label trial at 15 sites in 10 countries (7 in Asia and 3 in Africa). Patients received oral artesunate at a dose of either 2 mg or 4 mg per kilogram of body weight per day for 3 days, followed by a standard 3-day course of artemisinin-based combination therapy. Parasite counts in peripheral-blood samples were measured every 6 hours, and parasite clearance half-lives were determined.
Results: The median parasite clearance half-lives ranged from 1.9 hours in the Democratic Republic of the Congo to 7.0 hours at the Thailand-Cambodia border. Slowly clearing infections (parasite clearance half-life >5 hours), strongly associated with single point mutations in the "propeller" region of the P. falciparum kelch protein gene on chromosome 13 (kelch13), were detected throughout mainland Southeast Asia from southern Vietnam to central Myanmar. The incidence of pretreatment and post-treatment gametocytemia was higher among patients with slow parasite clearance, suggesting greater potential for transmission. In western Cambodia, where artemisinin-based combination therapies are failing, the 6-day course of antimalarial therapy was associated with a cure rate of 97.7% (95% confidence interval, 90.9 to 99.4) at 42 days.
Conclusions: Artemisinin resistance in P. falciparum, which is now prevalent across mainland Southeast Asia, is associated with mutations in kelch13. Prolonged courses of artemisinin-based combination therapies are currently efficacious in areas where standard 3-day treatments are failing. (Funded by the U.K. Department for International Development and others; ClinicalTrials.gov number, NCT01350856.)
Funded by: Intramural NIH HHS; Wellcome Trust: 077166, 090532, 090770, 093956
The New England journal of medicine 2014;371;5;411-23
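The artemisinin-resistance trial above measured parasite counts every 6 hours and derived clearance half-lives, defining slow clearance as a half-life above 5 hours. In the clearance literature the half-life is ln 2 divided by the clearance rate constant, i.e. the negative slope of ln(parasite count) against time during the log-linear decline. The Python sketch below fits that slope by ordinary least squares, assuming (unlike dedicated tools such as the WWARN Parasite Clearance Estimator, which also detect lag and tail phases) that the whole series is log-linear. Function names and the example counts are hypothetical.

```python
import math
from typing import Sequence


def clearance_half_life(times_h: Sequence[float], counts: Sequence[float]) -> float:
    """Parasite clearance half-life (hours) from an ordinary least-squares fit of
    ln(parasite count) against time, assuming the whole series is log-linear."""
    if len(times_h) != len(counts) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive; drop undetectable (zero) counts first")
    ys = [math.log(c) for c in counts]
    n = len(ys)
    mean_t, mean_y = sum(times_h) / n, sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys)) / sxx
    if slope >= 0:
        raise ValueError("parasite counts are not declining")
    # Clearance rate constant k = -slope; half-life = ln(2) / k.
    return math.log(2) / -slope


def is_slow_clearing(half_life_h: float, threshold_h: float = 5.0) -> bool:
    """Slow clearance as defined in the abstract: half-life > 5 hours."""
    return half_life_h > threshold_h


# Usage with made-up counts sampled every 6 h that halve at each interval.
half_life = clearance_half_life([0, 6, 12, 18, 24], [80000, 40000, 20000, 10000, 5000])
slow = is_slow_clearing(half_life)
```

Fitting on the log scale turns exponential decline into a straight line, so a single slope summarises the whole clearance curve regardless of the starting parasitaemia.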
Draft genome sequences of the type strains of Shigella flexneri held at Public Health England: comparison of classical phenotypic and novel molecular assays with whole genome sequence.
Gastrointestinal Bacteria Reference Unit, Public Health England, 61 Colindale Ave, NW9 5HT London, England.
Background: Public Health England (PHE) holds a collection of Shigella flexneri Type strains isolated between 1949 and 1972 representing 15 established serotypes and one provisional type, E1037. In this study, the genomes of all 16 PHE Type strains were sequenced using the Illumina HiSeq platform. The relationship between core genome phylogeny and serotype was examined.
Results: The most common target gene for the detection of Shigella species in clinical PCR assays, ipaH, was detected in all genomes. The type-specific target genes were correctly identified in each genome sequence. In contrast to the S. flexneri serotype 5 strain described by Sun et al. (2012), the two PHE serotype 5 Type strains possessed an additional oac gene and were differentiated by the presence (serotype 5b) or absence (serotype 5a) of gtrX. The somatic antigen structure and phylogenetic relationships were broadly congruent for strains expressing serotype-specific antigens III, IV and V, but not for those expressing I and II. The whole-genome phylogeny of the 15 isolates sequenced showed that the serotype 6 Type strain was phylogenetically distinct from the other S. flexneri serotypes sequenced. The provisional serotype E1037 fell within the serotype 4 clade, being most closely related to the serotype 4a Type strain.
Conclusions: The S. flexneri genome sequences were used to evaluate phylogenetic relationships between Type strains and validate genotypic and phenotypic assays. The analysis confirmed that the PHE S. flexneri Type strains are phenotypically and genotypically distinct. Novel variants will continue to be added to this archive.
Gut pathogens 2014;6;1;7
estMOI: estimating multiplicity of infection using parasite deep sequencing data.
London School of Hygiene and Tropical Medicine, WC1E 7HT, London, UK, Wellcome Trust Sanger Institute, CB10 1SA, Hinxton, UK and Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Box 30096 BT3, Blantyre, Malawi.
Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallel sequencing technologies means that MOI can be detected genome-wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission.
Funded by: Medical Research Council; Wellcome Trust: 101113
Bioinformatics (Oxford, England) 2014;30;9;1292-4
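The core idea described in the estMOI abstract, counting distinct combinations of polymorphisms that co-occur on the same sequence reads, can be illustrated with a toy sketch. This is not the estMOI implementation: the read representation (a dict mapping SNP position to allele), the helper names `count_haplotypes` and `estimate_moi`, the use of SNP pairs, and the `min_support` read threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def count_haplotypes(reads, positions, min_support=2):
    """Count distinct allele combinations at `positions` among reads that
    cover all of those positions, keeping only combinations supported by
    at least `min_support` reads (a crude, hypothetical error filter)."""
    counts = {}
    for read in reads:
        if all(p in read for p in positions):
            combo = tuple(read[p] for p in positions)
            counts[combo] = counts.get(combo, 0) + 1
    return sum(1 for n in counts.values() if n >= min_support)


def estimate_moi(reads, snp_positions, set_size=2, min_support=2):
    """Toy MOI estimate: the largest number of distinct, well-supported
    haplotypes seen over any set of `set_size` SNP positions."""
    best = 0
    for positions in combinations(sorted(snp_positions), set_size):
        best = max(best, count_haplotypes(reads, positions, min_support))
    return max(best, 1) if reads else 0


# Hypothetical reads from three strains; each read maps SNP position -> allele.
strain_a = {100: "A", 150: "C", 180: "G"}
strain_b = {100: "A", 150: "T", 180: "G"}
strain_c = {100: "G", 150: "T", 180: "T"}
error_read = {100: "C", 150: "C", 180: "C"}  # seen once, so filtered out
reads = [strain_a, strain_a, strain_b, strain_b, strain_c, strain_c, error_read]
print(estimate_moi(reads, [100, 150, 180]))  # 3
```

No single SNP pair separates all three strains here (positions 100 and 180 see only two combinations), which is why the sketch takes the maximum over position sets rather than relying on one pair.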
Epistasis between the haptoglobin common variant and α+thalassemia influences risk of severe malaria in Kenyan children.
Department of Paediatrics, Oxford University Hospitals National Health Service Trust, University of Oxford, and.
Haptoglobin (Hp) scavenges free hemoglobin following malaria-induced hemolysis. Few studies have investigated the relationship between the common Hp variants and the risk of severe malaria, and their results are inconclusive. We conducted a case-control study of 996 children with severe Plasmodium falciparum malaria and 1220 community controls, genotyped for Hp, hemoglobin (Hb) S, and α+thalassemia. Hb S heterozygotes and α+thalassemia homozygotes were protected from severe malaria (odds ratio [OR], 0.12; 95% confidence interval [CI], 0.07-0.18 and OR, 0.69; 95% CI, 0.53-0.91, respectively). The risk of severe malaria also varied by Hp genotype: Hp2-1 was associated with the greatest protection against severe malaria and Hp2-2 with the greatest risk. Meta-analysis of the current and published studies suggests that Hp2-2 is associated with increased risk of severe malaria compared with Hp2-1. We found a significant interaction between Hp genotype and α+thalassemia in predicting risk of severe malaria: Hp2-1 in combination with heterozygous or homozygous α+thalassemia was associated with protection from severe malaria (OR, 0.73; 95% CI, 0.54-0.99 and OR, 0.48; 95% CI, 0.32-0.73, respectively), but α+thalassemia in combination with Hp2-2 was not protective. This epistatic interaction, together with the varying frequencies of α+thalassemia across Africa, may explain the inconsistent relationship between Hp genotype and malaria reported in previous studies.
Funded by: Arthritis Research UK; British Heart Foundation; Wellcome Trust: 090532, 090770, 091758, 092654
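The odds ratios and 95% confidence intervals quoted in the haptoglobin abstract can be illustrated with the textbook calculation for a single 2×2 case-control table, using Woolf's log-based interval. This is only a sketch: the counts below are hypothetical, and the study's published estimates presumably come from regression models (including the genotype interaction terms) rather than from raw 2×2 tables.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Assumes all four cells are non-zero; z = 1.96 gives a 95% interval."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: carriers vs non-carriers among cases and controls.
or_, lower, upper = odds_ratio_ci(10, 20, 30, 40)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An OR below 1 whose upper bound also stays below 1, as for the Hb S and α+thalassemia estimates above, is what the abstract reports as protection.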
Transcriptionally active chromatin recruits homologous recombination at DNA double-strand breaks.
Laboratoire de Biologie Cellulaire et Moléculaire du Contrôle de la Prolifération, Université de Toulouse, Université Paul Sabatier, Toulouse, France; CNRS, Laboratoire de Biologie Cellulaire et Moléculaire du Contrôle de la Prolifération, Toulouse, France.
Although both homologous recombination (HR) and nonhomologous end joining can repair DNA double-strand breaks (DSBs), the mechanisms by which one of these pathways is chosen over the other remain unclear. Here we show that transcriptionally active chromatin is preferentially repaired by HR. Using chromatin immunoprecipitation-sequencing (ChIP-seq) to analyze repair of multiple DSBs induced throughout the human genome, we identify an HR-prone subset of DSBs that recruit the HR protein RAD51, undergo resection and rely on RAD51 for efficient repair. These DSBs are located in actively transcribed genes and are targeted to HR repair via the transcription elongation-associated mark trimethylated histone H3 K36. Concordantly, depletion of SETD2, the main H3 K36 trimethyltransferase, severely impedes HR at such DSBs. Our study thereby demonstrates that the chromatin context in which a break occurs plays a primary role in DSB repair.
Funded by: Cancer Research UK: 11224, C6/A11226; Wellcome Trust: 092096
Nature structural & molecular biology 2014;21;4;366-74
Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.
We have investigated the evidence for positive selection in samples of African, European, and East Asian ancestry at 65 loci associated with susceptibility to type 2 diabetes (T2D) previously identified through genome-wide association studies. Selection early in human evolutionary history is predicted to lead to ancestral risk alleles shared between populations, whereas late selection would result in population-specific signals at derived risk alleles. By using a wide variety of tests based on the site frequency spectrum, haplotype structure, and population differentiation, we found no global signal of enrichment for positive selection when we considered all T2D risk loci collectively. However, in a locus-by-locus analysis, we found nominal evidence for positive selection at 14 of the loci. Selection favored the protective and risk alleles in similar proportions, rather than the risk alleles specifically as predicted by the thrifty gene hypothesis, and may not be related to any influence on diabetes. Overall, we conclude that past positive selection has not been a powerful influence driving the prevalence of T2D risk alleles.
Funded by: Wellcome Trust: 090532, 098051, 098381, WT090367MA
American journal of human genetics 2014;94;2;176-85
Novel mutations in penicillin-binding protein genes in clinical Staphylococcus aureus isolates that are methicillin resistant on susceptibility testing, but lack the mec gene.
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
Objectives: Methicillin-resistant Staphylococcus aureus (MRSA) is an important global health problem. MRSA resistance to β-lactam antibiotics is mediated by the mecA or mecC genes, which encode an alternative penicillin-binding protein (PBP) 2a that has a low affinity for β-lactam antibiotics. Detection of mec genes or PBP2a is regarded as the gold standard for the diagnosis of MRSA. We identified four MRSA isolates that lacked mecA or mecC genes, but were still phenotypically resistant to penicillinase-resistant β-lactam antibiotics.
Methods: The four human S. aureus isolates were investigated by whole genome sequencing and a range of phenotypic assays.
Results: We identified a number of amino acid substitutions present in the endogenous PBPs 1, 2 and 3 that were found in the resistant isolates but were absent in closely related susceptible isolates and which may be the basis of resistance. Of particular interest are three identical amino acid substitutions in PBPs 1, 2 and 3, occurring independently in isolates from at least two separate multilocus sequence types. Two different non-conservative substitutions were also present in the same amino acid of PBP1 in two isolates from two different sequence types.
Conclusions: This work suggests that phenotypically resistant MRSA could be misdiagnosed using molecular methods alone and provides evidence of alternative mechanisms for β-lactam resistance in MRSA that may need to be considered by diagnostic laboratories.
Funded by: Medical Research Council: G1001787, G1001787/1; Wellcome Trust
The Journal of antimicrobial chemotherapy 2014;69;3;594-7
TB or not TB? Genomic portraits provide answers.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2014;12;6;398
Poxviruses in bats … so what?
Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.
Poxviruses are important pathogens of man and numerous domestic and wild animal species. Cross-species (including zoonotic) poxvirus infections can have drastic consequences for the recipient host. Bats are a diverse order of mammals known to carry lethal viral zoonoses such as rabies, Hendra, Nipah, and SARS. Consequent targeted research is revealing bats to be infected with a rich diversity of novel viruses. Poxviruses have recently been identified in bats, in dramatically different settings. Here, we review the natural history of poxviruses in bats and highlight the relationship of the viruses to each other and their context in the Poxviridae family. In addition to considering the zoonotic potential of these viruses, we reflect on the broader implications of these findings, specifically the potential to explore and exploit this newfound relationship to study coevolution and cross-species transmission, together with fundamental aspects of poxvirus host tropism as well as bat virology and immunology.
Gene conversion violates the stepwise mutation model for microsatellites in Y-chromosomal palindromic repeats.
UMR5288 CNRS/UPS-AMIS-Université Paul Sabatier, Toulouse, France; Department of Genetics, University of Leicester, Leicester, UK.
The male-specific region of the human Y chromosome (MSY) contains eight large inverted repeats (palindromes), in which high sequence similarity between repeat arms is maintained by gene conversion. These palindromes also harbor microsatellites, which are considered to evolve via a stepwise mutation model (SMM). Here, we ask whether gene conversion between palindrome microsatellites contributes to their mutational dynamics. First, we study the duplicated tetranucleotide microsatellite DYS385a,b lying in palindrome P4. By comparing observed data with data simulated under an SMM within haplogroups, we show that observed heteroallelic combinations in which the modal repeat number difference between copies was large can give rise to homoallelic combinations with zero-repeat difference, equivalent to many single-step mutations. Such changes are unlikely to be generated under a strict SMM, suggesting the action of gene conversion. Second, we show that the intercopy repeat number difference for a large set of duplicated microsatellites in all palindromes in the MSY reference sequence is significantly reduced compared with that for nonpalindrome-duplicated microsatellites, suggesting that the former are characterized by unusual evolutionary dynamics. These observations indicate that gene conversion violates the SMM for microsatellites in palindromes, homogenizing copies within individual Y chromosomes but increasing overall haplotype diversity among chromosomes within related groups.
Funded by: Wellcome Trust: 087576
Human mutation 2014;35;5;609-17
Toward male individualization with rapidly mutating Y-chromosomal short tandem repeats.
Department of Forensic Molecular Biology, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands; Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department, Macleod, Victoria, Australia.
Relevant for various areas of human genetics, Y-chromosomal short tandem repeats (Y-STRs) are commonly used for testing close paternal relationships among individuals and populations, and for male lineage identification. However, even the widely used 17-loci Yfiler set cannot resolve individuals and populations completely. Here, 52 centers generated quality-controlled data of 13 rapidly mutating (RM) Y-STRs in 14,644 related and unrelated males from 111 worldwide populations. Strikingly, >99% of the 12,272 unrelated males were completely individualized. Haplotype diversity was extremely high (global: 0.9999985, regional: 0.99836-0.9999988). Haplotype sharing between populations was almost absent except for six (0.05%) of the 12,156 haplotypes. Haplotype sharing within populations was generally rare (0.8% nonunique haplotypes), significantly lower in urban (0.9%) than rural (2.1%) and highest in endogamous groups (14.3%). Analysis of molecular variance revealed 99.98% of variation within populations, 0.018% among populations within groups, and 0.002% among groups. Of the 2,372 newly and 156 previously typed male relative pairs, 29% were differentiated including 27% of the 2,378 father-son pairs. Relative to Yfiler, haplotype diversity was increased in 86% of the populations tested and overall male relative differentiation was raised by 23.5%. Our study demonstrates the value of RM Y-STRs in identifying and separating unrelated and related males and provides a reference database.
Funded by: Wellcome Trust: 087576
Human mutation 2014;35;8;1021-32
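Haplotype diversity and the proportion of individualized males, as reported in the RM Y-STR abstract, can both be computed from a list of haplotypes. The sketch below assumes Nei's unbiased estimator, H = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of the i-th distinct haplotype among n samples; the abstract does not state which estimator the study used, and the haplotypes (tuples of repeat counts) are invented for illustration.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared
    haplotype frequencies), for n sampled haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def fraction_individualized(haplotypes):
    """Fraction of samples whose haplotype occurs exactly once."""
    counts = Counter(haplotypes)
    return sum(1 for h in haplotypes if counts[h] == 1) / len(haplotypes)


# Hypothetical haplotypes: tuples of repeat counts at three RM Y-STRs.
sample = [(12, 30, 18), (12, 30, 18), (13, 31, 18), (14, 29, 19)]
print(haplotype_diversity(sample), fraction_individualized(sample))
```

When every sampled haplotype is unique, the estimator equals exactly 1, which is why the global value reported above (0.9999985) sits so close to that ceiling.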
Multi-species, multi-transcription factor binding highlights conserved control of tissue-specific biological pathways.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.
As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined the binding locations of four liver-essential transcription factors (TFs) in liver tissue from human, macaque, mouse, rat, and dog. Approximately two-thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and with disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.
Funded by: Cancer Research UK: 15603; European Research Council: 202218; Wellcome Trust: 095908, WT095908, WT098051
Genetic screens in mice for genome integrity maintenance and cancer predisposition.
The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK; Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Genome instability is a feature of nearly all cancers and can be exploited for therapy. In addition, a growing number of genome maintenance genes have been associated with developmental disorders. Efforts to understand the role of genome instability in these processes will be greatly facilitated by a more comprehensive understanding of their genetic network. We highlight recent genetic screens in model organisms that have assisted in the discovery of novel regulators of genome stability and focus on the contribution of mice as a model organism to understanding the role of genome instability during embryonic development, tumour formation and cancer therapy.
Funded by: Cancer Research UK: 11224, C20510/A12401, C6/A11224; Wellcome Trust: 092096
Current opinion in genetics & development 2014;24;1-7
A high-throughput in vivo micronucleus assay for genome instability screening in mice.
The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK; Maintenance of Genome Stability, The Wellcome Trust Sanger Institute, Genome Campus, Cambridge, UK.
We describe a sensitive, robust, high-throughput method for quantifying the formation of micronuclei, markers of genome instability, in mouse erythrocytes. Micronuclei are whole chromosomes or chromosome segments that have been separated from the nucleus. Other methods of detection rely on labor-intensive, microscopy-based techniques. Here we describe a 2-day, 96-well plate-based flow cytometric method of micronucleus scoring that is simple enough for a research technician experienced in flow cytometry to perform. Using 25 μl of blood, the assay detects low levels of genome instability that cannot be readily identified by classic phenotyping. By using this assay, we have screened >10,000 blood samples and discovered novel genes that contribute to vertebrate genome maintenance, as well as novel disease models and mechanisms of genome instability disorders. We discuss experimental design considerations, including statistical power calculation; we provide troubleshooting tips; and we discuss factors that contribute to experimental variability and to false-positive increases in the number of micronucleated red blood cells.
Funded by: Cancer Research UK: 11224, 12401, A11224, C20510/A12401, C6/A11224, C6946/A14492; European Research Council: 268536; Wellcome Trust: WT092096
Nature protocols 2014;10;1;205-15
Mutations in KPTN cause macrocephaly, neurodevelopmental delay, and seizures.
Monogenic Molecular Genetics, University of Exeter Medical School, St. Luke's Campus, Magdalen Road, Exeter EX1 2LU, UK.
The proper development of neuronal circuits during neuromorphogenesis and neuronal-network formation is critically dependent on a coordinated and intricate series of molecular and cellular cues and responses. Although the cortical actin cytoskeleton is known to play a key role in neuromorphogenesis, relatively little is known about the specific molecules important for this process. Using linkage analysis and whole-exome sequencing on samples from families from the Amish community of Ohio, we have demonstrated that mutations in KPTN, encoding kaptin, cause a syndrome typified by macrocephaly, neurodevelopmental delay, and seizures. Our immunofluorescence analyses in primary neuronal cell cultures showed that endogenous and GFP-tagged kaptin associate with dynamic actin cytoskeletal structures and that this association is lost upon introduction of the identified mutations. Taken together, our studies have identified kaptin alterations responsible for macrocephaly and neurodevelopmental delay and define kaptin as a molecule crucial for normal human neuromorphogenesis.
Funded by: Medical Research Council: G1001931, G1002279; Wellcome Trust: WT098051
American journal of human genetics 2014;94;1;87-94
Increased dihydroceramide/ceramide ratio mediated by defective expression of degs1 impairs adipocyte differentiation and function.
Metabolic Research Laboratories, Wellcome Trust-MRC Institute of Metabolic Science, Addenbrooke's Hospital, University of Cambridge, Cambridge, U.K.; Instituto Maimónides de Investigación Biomédica de Córdoba, Reina Sofia University Hospital, Córdoba, Spain.
Adipose tissue dysfunction is an important determinant of obesity-associated, lipid-induced metabolic complications. Ceramides are well-known mediators of lipid-induced insulin resistance in peripheral organs such as muscle. DEGS1 is the desaturase catalyzing the last step in the main ceramide biosynthetic pathway. Functional suppression of DEGS1 activity results in substantial changes in ceramide species likely to affect fundamental biological functions such as oxidative stress, cell survival, and proliferation. Here, we show that degs1 expression is specifically decreased in the adipose tissue of obese patients and murine models of genetic and nutritional obesity. Moreover, loss-of-function experiments using pharmacological or genetic ablation of DEGS1 in preadipocytes prevented adipogenesis and decreased lipid accumulation. This was associated with elevated oxidative stress, cellular death, and blockage of the cell cycle. These effects were coupled with increased dihydroceramide content. Finally, we validated in vivo that pharmacological inhibition of DEGS1 impairs adipocyte differentiation. These data identify DEGS1 as a new potential target to restore adipose tissue function and prevent obesity-associated metabolic disturbances.
Funded by: British Heart Foundation; Medical Research Council: G0600717, G0802051, MC_UU_12012/2, MC_UU_12012/5
A genome wide association study of mathematical ability reveals an association at chromosome 3q29, a locus associated with autism and learning difficulties: a preliminary study.
Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridgeshire, United Kingdom; CLASS Clinic, Cambridgeshire and Peterborough NHS Foundation Trust (CPFT), Cambridgeshire, United Kingdom.
Mathematical ability is heritable, but few studies have directly investigated its molecular genetic basis. Here we aimed to identify specific genetic contributions to variation in mathematical ability. We carried out a genome-wide association scan using pooled DNA in two groups of U.K. samples, based on end of secondary/high school national academic exam achievement: high (n = 419) versus low (n = 183) mathematical ability, while controlling for their verbal ability. Significant differences in allele frequencies between these groups were searched for in 906,600 SNPs using the Affymetrix GeneChip Human Mapping version 6.0 array. Twelve SNPs that met a threshold of p < 1.5 × 10⁻⁵ in the pooled association analysis were individually genotyped in 542 of the participants and analyzed to validate the initial associations (lowest p-value 1.14 × 10⁻⁶). In this analysis, one of the SNPs (rs789859) showed significant association after Bonferroni correction, and four (rs10873824, rs4144887, rs12130910, rs2809115) were nominally significant (lowest p-value 3.278 × 10⁻⁴). Three of the SNPs of interest are located within, or near to, known genes (FAM43A, SFT2D1, C14orf64). The SNP that showed the strongest association, rs789859, is located in a region on chromosome 3q29 that has been previously linked to learning difficulties and autism. rs789859 lies 1.3 kbp downstream of LSG1 and 700 bp upstream of FAM43A, mapping within the potential promoter/regulatory region of the latter. To our knowledge, this is only the second study to investigate the association of genetic variants with mathematical ability, and it highlights a number of interesting markers for future study.
PloS one 2014;9;5;e96374
Global population structure and evolution of Bordetella pertussis and their relationship with vaccination.
Bordetella pertussis causes pertussis, a respiratory disease that is most severe for infants. Vaccination was introduced in the 1950s, and in recent years a resurgence of disease has been observed worldwide, with significant mortality in infants. Possible causes for this include the switch from whole-cell vaccines (WCVs) to less effective acellular vaccines (ACVs), waning immunity, and pathogen adaptation. Pathogen adaptation is suggested by antigenic divergence between vaccine strains and circulating strains and by the emergence of strains with increased pertussis toxin production. We applied comparative genomics to a worldwide collection of 343 B. pertussis strains isolated between 1920 and 2010. The global phylogeny showed two deep branches; the larger of these contained 98% of all strains, and its expansion correlated temporally with the first descriptions of pertussis outbreaks in Europe in the 16th century. We found little evidence of recent geographical clustering of the strains within this lineage, suggesting rapid strain flow between countries. We observed that changes in genes encoding proteins implicated in protective immunity that are included in ACVs occurred after the introduction of WCVs but before the switch to ACVs. Furthermore, our analyses consistently suggested that virulence-associated genes and genes coding for surface-exposed proteins were involved in adaptation. However, many of the putative adaptive loci identified have a physiological role, and further studies of these loci may reveal less obvious ways in which B. pertussis and the host interact. This work provides insight into ways in which pathogens may adapt to vaccination and suggests ways to improve pertussis vaccines.
Importance: Whooping cough is mainly caused by Bordetella pertussis, and current vaccines are targeted against this organism. Recently, there have been increasing outbreaks of whooping cough, even where vaccine coverage is high. Analysis of the genomes of 343 B. pertussis isolates from around the world over the last 100 years suggests that the organism has emerged within the last 500 years, consistent with historical records. We show that global transmission of new strains is very rapid and that the worldwide population of B. pertussis is evolving in response to vaccine introduction, potentially enabling vaccine escape.
Funded by: NIGMS NIH HHS: R01 GM083113, R01 GM113681; Wellcome Trust: 098051
Capturing needles in haystacks: a comparison of B-cell receptor sequencing methods.
Background: Deep-sequencing methods are rapidly developing in the field of B-cell receptor (BCR) and T-cell receptor (TCR) diversity. These promise to revolutionise our understanding of adaptive immune dynamics, identify novel antibodies, and allow monitoring of minimal residual disease. However, different methods for BCR and TCR enrichment and amplification have been proposed. Here we perform the first systematic comparison between different methods of enrichment, amplification and sequencing for generating BCR and TCR repertoires using large sample numbers.
Results: Resampling from the same RNA or cDNA pool results in highly correlated and reproducible repertoires, but resampling low-frequency clones leads to stochastic variance. Repertoires generated by different sequencing methods (454 Roche and Illumina MiSeq) and amplification methods (multiplex PCR, 5' rapid amplification of cDNA ends (5'RACE), and RNA-capture) are highly correlated, and the resulting IgHV gene frequencies did not differ significantly between methods. Read length has an impact on captured repertoire structure, and ultimately full-length BCR sequences are most informative for repertoire analysis, as diversity outside of the CDR is very useful for phylogenetic analysis. Additionally, we show that RNA-based BCR repertoires are more informative than DNA-based ones.
Conclusions: Repertoires generated by different sequencing and amplification methods are consistent, but we show that read lengths, depths and error profiles should be considered in experimental design, and multiple sampling approaches could be employed to minimise stochastic sampling variation. This detailed investigation of immune repertoire sequencing methods is essential for informing basic and clinical research.
Funded by: Wellcome Trust: 095663
BMC immunology 2014;15;29
Considerations when investigating lncRNA function in vivo.
Andrew R Bassett is in the MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom.
Although a small number of the vast array of animal long non-coding RNAs (lncRNAs) have known effects on cellular processes examined in vitro, the extent of their contributions to normal cell processes throughout development, differentiation and disease remains, for the most part, unclear. Phenotypes arising from deletion of an entire genomic locus cannot be unequivocally attributed either to the loss of the lncRNA per se or to the associated loss of other overlapping DNA regulatory elements. Distinguishing cis- from trans-effects is also often problematic. We discuss the advantages and challenges associated with the current techniques for studying the in vivo function of lncRNAs in the light of different models of lncRNA molecular mechanism, and reflect on the design of experiments to mutate lncRNA loci. These considerations should assist in the further investigation of these transcriptional products of the genome.
Funded by: Cancer Research UK: 11832; European Research Council: 249869; Medical Research Council: G1000801, G1000801B, MC_U137761446, MC_UU_12009/4, MC_UU_12021/1; NCI NIH HHS: P30 CA045508; NHGRI NIH HHS: U54 HG007004, U54 HG007004-2; Wellcome Trust: 092076, 095606, 103768
A common control group - optimising the experiment design to maximise sensitivity.
Statistical Science Europe, GlaxoSmithKline Pharmaceuticals, Stevenage, United Kingdom.
Methods for choosing an appropriate sample size in animal experiments have received much attention in the statistical and biological literature. Due to ethical constraints, the number of animals used is always reduced where possible. However, as the number of animals decreases, the risk of obtaining inconclusive results increases. By using a more efficient experimental design we can, for a given number of animals, reduce this risk. In this paper two common cases are considered: planned comparisons of each treatment back to the control, and all pairwise comparisons between groups. By using theoretical and empirical techniques, we show that for studies in which all pairwise comparisons are made, the traditional balanced design, as suggested in the literature, maximises sensitivity. For studies that involve planned comparisons of the treatment groups back to the control group, which are inherently more sensitive due to the reduced multiple testing burden, sensitivity is maximised by increasing the number of animals in the control group while decreasing the number in the treated groups.
Funded by: NHGRI NIH HHS: 1U54 HG006370-01, U54 HG006370
PloS one 2014;9;12;e114872
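The common-control allocation result can be seen with a small search. Assuming equal variance in every group and k treated groups of equal size, the variance of each treatment-minus-control difference is proportional to 1/n_c + 1/n_t, which for a fixed total is minimised near n_c ≈ √k · n_t (the classic square-root allocation rule). This sketch ignores the multiplicity adjustment (e.g. Dunnett's critical value) and is not necessarily the paper's method; the function names are hypothetical.

```python
def variance_factor(n_control, n_treated):
    """Var(treated mean - control mean) / sigma^2, assuming equal variances."""
    return 1 / n_control + 1 / n_treated


def best_allocation(total, k):
    """Search designs with one control group and k equal treated groups that
    use exactly `total` animals; return (n_control, n_treated) minimising the
    variance of each treatment-vs-control difference."""
    if total < k + 1:
        raise ValueError("need at least one animal per group")
    best = None
    for n_treated in range(1, (total - 1) // k + 1):
        n_control = total - k * n_treated
        v = variance_factor(n_control, n_treated)
        if best is None or v < best[2]:
            best = (n_control, n_treated, v)
    return best[0], best[1]


print(best_allocation(40, 4))  # (12, 7)
print(variance_factor(12, 7) < variance_factor(8, 8))  # True
```

For 40 animals and four treatments, the search puts 12 animals in the control group and 7 in each treated group, close to the √4 = 2 controls per treated animal suggested by the square-root rule, and gives a smaller variance than the balanced 8-per-group design. With a single treatment (k = 1) it falls back to a balanced split.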
No evidence for genome-wide interactions on plasma fibrinogen by smoking, alcohol consumption and body mass index: results from meta-analyses of 80,607 subjects.
Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
Plasma fibrinogen is an acute-phase protein that plays an important role in the blood coagulation cascade and is strongly associated with smoking, alcohol consumption and body mass index (BMI). Genome-wide association studies (GWAS) have identified a variety of gene regions associated with elevated plasma fibrinogen concentrations. However, little is yet known about how associations between environmental factors and fibrinogen might be modified by genetic variation. Therefore, we conducted large-scale meta-analyses of genome-wide interaction studies to identify possible interactions of genetic variants and smoking status, alcohol consumption or BMI on fibrinogen concentration. The present study included 80,607 subjects of European ancestry from 22 studies. Genome-wide interaction analyses were performed separately in each study for about 2.6 million single nucleotide polymorphisms (SNPs) across the 22 autosomal chromosomes. For each SNP and risk factor, we performed a linear regression under an additive genetic model including an interaction term between SNP and risk factor. Interaction estimates were meta-analysed using a fixed-effects model. No genome-wide significant interaction with smoking status, alcohol consumption or BMI was observed in the meta-analyses. The most suggestive interaction was found for smoking and rs10519203, located in the LOC123688 region on chromosome 15, with a p value of 6.2 × 10⁻⁸. This large genome-wide interaction study including 80,607 participants found no strong evidence of interaction between genetic variants and smoking status, alcohol consumption or BMI on fibrinogen concentrations. Further studies are needed to yield deeper insight into the interplay between environmental factors and gene variants in the regulation of fibrinogen concentrations.
Funded by: Chief Scientist Office: CZB/4/710; NCI NIH HHS: P30 CA016672; NHLBI NIH HHS: N01-HC-25195, N02-HL-6-4278, R01 HL059367
PloS one 2014;9;12;e111156
Efficacy of a Plasmodium vivax malaria vaccine using ChAd63 and modified vaccinia Ankara expressing thrombospondin-related anonymous protein as assessed with transgenic Plasmodium berghei parasites.
The Jenner Institute, University of Oxford, Oxford, United Kingdom.
Plasmodium vivax is the world's most widely distributed malaria parasite and a potential cause of morbidity and mortality for approximately 2.85 billion people living mainly in Southeast Asia and Latin America. Despite this dramatic burden, very few vaccines have been assessed in humans. The clinically relevant vectors modified vaccinia virus Ankara (MVA) and the chimpanzee adenovirus ChAd63 are promising delivery systems for malaria vaccines due to their safety profiles and proven ability to induce protective immune responses against Plasmodium falciparum thrombospondin-related anonymous protein (TRAP) in clinical trials. Here, we describe the development of new recombinant ChAd63 and MVA vectors expressing P. vivax TRAP (PvTRAP) and show their ability to induce high antibody titers and T cell responses in mice. In addition, we report a novel way of assessing the efficacy of new candidate vaccines against P. vivax using a fully infectious transgenic Plasmodium berghei parasite expressing P. vivax TRAP, allowing studies of vaccine efficacy and protective mechanisms in rodents. Using this model, we found that both CD8+ T cells and antibodies mediated the protection against malaria induced by the virus-vectored vaccines. Our data indicate that ChAd63 and MVA expressing PvTRAP are good pre-erythrocytic-stage vaccine candidates with potential for future clinical application.
Funded by: Medical Research Council: G0501670, G0900084; Wellcome Trust: 090532, 095540, 097395, WT098051
Infection and immunity 2014;82;3;1277-86
Human post-mortem synapse proteome integrity screening for proteomic studies of postsynaptic complexes.
Background: Synapses are fundamental components of brain circuits and are disrupted in over 100 neurological and psychiatric diseases. The synapse proteome is physically organized into multiprotein complexes and polygenic mutations converge on postsynaptic complexes in schizophrenia, autism and intellectual disability. Directly characterising human synapses and their multiprotein complexes from post-mortem tissue is essential to understanding disease mechanisms. However, multiprotein complexes have not been directly isolated from human synapses and the feasibility of their isolation from post-mortem tissue is unknown.
Results: Here we establish a screening assay and criteria to identify post-mortem brain samples containing well-preserved synapse proteomes, revealing that neocortex samples are best preserved. We also develop a rapid method for the isolation of synapse proteomes from human brain, allowing large numbers of post-mortem samples to be processed in a short time frame. We perform the first purification and proteomic mass spectrometry analysis of MAGUK Associated Signalling Complexes (MASC) from neurosurgical and post-mortem tissue and find genetic evidence for their involvement in over seventy human brain diseases.
Conclusions: We have demonstrated that synaptic proteome integrity can be rapidly assessed from human post-mortem brain samples prior to its analysis with sophisticated proteomic methods. We have also shown that proteomics of synapse multiprotein complexes from well-preserved post-mortem tissue is possible, obtaining structures highly similar to those isolated from biopsy tissue. Finally, we have shown that MASC from human synapses are involved with over seventy brain disorders. These findings should have wide application in understanding the synaptic basis of psychiatric and other mental disorders.
Funded by: Medical Research Council: G0802238; Wellcome Trust
Molecular brain 2014;7;88
Genome sequencing of normal cells reveals developmental lineages and mutational processes.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
The somatic mutations present in the genome of a cell accumulate over the lifetime of a multicellular organism. These mutations can provide insights into the developmental lineage tree, the number of divisions that each cell has undergone and the mutational processes that have been operative. Here we describe the whole genomes of clonal lines derived from multiple tissues of healthy mice. Using somatic base substitutions, we reconstructed the early cell divisions of each animal, demonstrating the contributions of embryonic cells to adult tissues. Differences were observed between tissues in the numbers and types of mutations accumulated by each cell, which likely reflect differences in the number of cell divisions they have undergone and varying contributions of different mutational processes. If somatic mutation rates in humans are similar to those in mice, the results indicate that precise insights into development and mutagenesis of normal human cells will be possible.
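The idea of reading lineage from shared somatic mutations can be illustrated with a deliberately simplified sketch. Mutations carried by several clonal lines are assumed to have arisen in a common ancestral cell, so each shared mutation set defines a clade, and a consistent set of clades must be nested or disjoint. This assumes error-free, fully clonal variant calls and ignores the allele-fraction information a real analysis would use; it is not the authors' pipeline, and all clone and mutation names are hypothetical.

```python
from collections import defaultdict


def clades_from_mutations(clone_mutations):
    """Group clonal lines into clades defined by shared somatic mutations.

    clone_mutations: dict mapping a clone id to the set of mutation ids it carries.
    Returns a dict mapping each clade (frozenset of clone ids) to the set of
    mutations shared by exactly those clones. Private mutations are skipped.
    """
    carriers = defaultdict(set)
    for clone, mutations in clone_mutations.items():
        for m in mutations:
            carriers[m].add(clone)
    clades = defaultdict(set)
    for m, clones in carriers.items():
        if len(clones) > 1:  # shared, so assumed inherited from a common ancestral cell
            clades[frozenset(clones)].add(m)
    return dict(clades)


def is_tree_compatible(clades):
    """True if every pair of clades is nested or disjoint, i.e. fits one lineage tree."""
    sets = list(clades)
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


# Hypothetical clonal lines from one animal.
lines = {
    "clone_A": {"m1", "m2", "m5"},
    "clone_B": {"m1", "m2"},
    "clone_C": {"m1", "m3"},
    "clone_D": {"m4"},
}
clades = clades_from_mutations(lines)
print(clades, is_tree_compatible(clades))
```

Here m1 marks an early cell ancestral to clones A, B and C, and m2 a later division within that lineage; clone D descends from a different early cell.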
Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 092096, 098051, 104151, WT100183MA
A pathogenic mosaic TP53 mutation in two germ layers detected by next generation sequencing.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom; Department of Paediatrics, University of Cambridge, Cambridge, United Kingdom.
Background: Li-Fraumeni syndrome is caused by germline TP53 mutations and is clinically characterized by a predisposition to a range of cancers, most commonly sarcoma, brain tumours and leukemia. Pathogenic mosaic TP53 mutations have only rarely been described.
Methods and findings: We describe a 2-year-old child presenting with three separate cancers over a 6-month period: two soft tissue mesenchymal tumors and an aggressive metastatic neuroblastoma. As conventional testing of blood DNA by Sanger sequencing for mutations in TP53, ALK, and SDH was negative, whole exome sequencing of the blood DNA of the patient and both parents was performed to screen more widely for cancer predisposing mutations. In the patient's but not the parents' DNA we found a c.743 G>A, p.Arg248Gln (CCDS11118.1) TP53 mutation in 3-20% of sequencing reads, a level that would not generally be detectable by Sanger sequencing. Homozygosity for this mutation was detected in all tumor samples analyzed, and germline mosaicism was demonstrated by analysis of the child's newborn blood spot DNA. The occurrence of separate tumors derived from different germ layers suggests that this de novo mutation occurred early in embryogenesis, prior to gastrulation.
Conclusion: The case demonstrates pathogenic mosaicism, detected by next generation deep sequencing, that arose in the early stages of embryogenesis.
Funded by: Wellcome Trust
PloS one 2014;9;5;e96531
Recurrent PTPRB and PLCG1 mutations in angiosarcoma.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
Angiosarcoma is an aggressive malignancy that arises spontaneously or secondarily to ionizing radiation or chronic lymphoedema. Previous work has identified aberrant angiogenesis, including occasional somatic mutations in angiogenesis signaling genes, as a key driver of angiosarcoma. Here we employed whole-genome, whole-exome and targeted sequencing to study the somatic changes underpinning primary and secondary angiosarcoma. We identified recurrent mutations in two genes, PTPRB and PLCG1, which are intimately linked to angiogenesis. The endothelial phosphatase PTPRB, a negative regulator of vascular growth factor tyrosine kinases, harbored predominantly truncating mutations in 10 of 39 tumors (26%). PLCG1, a signal transducer of tyrosine kinases, encoded a recurrent, likely activating p.Arg707Gln missense variant in 3 of 34 cases (9%). Overall, 15 of 39 tumors (38%) harbored at least one driver mutation in angiogenesis signaling genes. Our findings inform and reinforce current therapeutic efforts to target angiogenesis signaling in angiosarcoma.
Funded by: Cancer Research UK: 11359; NCI NIH HHS: K08 CA160443, P30 CA016672; Wellcome Trust: 077012/Z/05/Z, 088340, 093867, 098051
Nature genetics 2014;46;4;376-379
Differential methylation of the TRPA1 promoter in pain sensitivity.
Chronic pain is a global public health problem, but the underlying molecular mechanisms are not fully understood. Here we examine genome-wide DNA methylation, first in 50 identical twins discordant for heat pain sensitivity and then in 50 further unrelated individuals. Whole-blood DNA methylation was characterized at 5.2 million loci by MeDIP sequencing and assessed longitudinally to identify differentially methylated regions associated with high or low pain sensitivity (pain DMRs). Nine meta-analysis pain DMRs show robust evidence for association (false discovery rate 5%) with the strongest signal in the pain gene TRPA1 (P=1.2 × 10(-13)). Several pain DMRs show longitudinal stability consistent with susceptibility effects, have similar methylation levels in the brain and altered expression in the skin. Our approach identifies epigenetic changes in both novel and established candidate genes that provide molecular insights into pain and may generalize to other complex traits.
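The abstract reports associations at a 5% false discovery rate but does not name the procedure used. The Benjamini-Hochberg step-up procedure is one widely used way to control the FDR, and the sketch below shows it as an assumption, not as the authors' method; the p values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # largest rank whose p value is under its threshold
    # Step-up: every hypothesis ranked at or below k_max is rejected.
    return sorted(order[:k_max])


# Hypothetical p values for five candidate regions.
print(benjamini_hochberg([0.001, 0.015, 0.035, 0.9, 0.6]))
```

Because the procedure steps up from the largest passing rank, a p value can be rejected even if it misses its own threshold, as long as a larger-ranked p value passes.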
Funded by: European Research Council: 250157; Wellcome Trust: 084071, 090532
Nature communications 2014;5;2978
Keeping 53BP1 out of focus in mitosis.
The Wellcome Trust and Cancer Research UK Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QN, England, UK.
A recent study published in Science reveals the mechanism and biological importance of DNA damage response abrogation in mitotic cells.
Funded by: Cancer Research UK: 11224; Wellcome Trust: 092096
Cell research 2014;24;7;781-2
Split reality for novel tick virus.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2014;12;7;464
The genome of the sparganosis tapeworm Spirometra erinaceieuropaei isolated from the biopsy of a migrating brain lesion.
Background: Sparganosis is an infection with a larval Diphyllobothriidea tapeworm. From a rare cerebral case presented at a clinic in the UK, DNA was recovered from a biopsy sample and used to determine the causative species as Spirometra erinaceieuropaei through sequencing of the cox1 gene. From the same DNA, we have produced a draft genome, the first of its kind for this species, and used it to perform a comparative genomics analysis and to investigate known and potential drug targets in this tapeworm.
Results: The 1.26 Gb draft genome of S. erinaceieuropaei is currently the largest reported for any flatworm. Through investigation of β-tubulin genes, we predict that S. erinaceieuropaei larvae are insensitive to the tapeworm drug albendazole. We find that many putative tapeworm drug targets are also present in S. erinaceieuropaei, allowing possible cross application of new drugs. In comparison to other sequenced tapeworm species we observe expansion of protease classes, and of Kunitz-type protease inhibitors. Expanded gene families in this tapeworm also include those that are involved in processes that add post-translational diversity to the protein landscape, intracellular transport, transcriptional regulation and detoxification.
Conclusions: The S. erinaceieuropaei genome begins to give us insight into an order of tapeworms previously uncharacterized at the genome-wide level. From a single clinical case we have begun to sketch a picture of the characteristics of these organisms. Finally, our work represents a significant technological achievement as we present a draft genome sequence of a rare tapeworm, and from a small amount of starting material.
Funded by: Medical Research Council: G0701652; Wellcome Trust: 098051
Genome biology 2014;15;11;510
A high-definition view of functional genetic variation from natural yeast genomes.
Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France.
The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.
Funded by: Wellcome Trust: WT077192/Z/05/Z
Molecular biology and evolution 2014;31;4;872-88
Izumo meets Juno: preventing polyspermy in fertilization.
Cell Surface Signalling Laboratory; Wellcome Trust Sanger Institute; Cambridge, UK.
Cell cycle (Georgetown, Tex.) 2014;13;13;2019-20
Juno is the egg Izumo receptor and is essential for mammalian fertilization.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.
Fertilization occurs when sperm and egg recognize each other and fuse to form a new, genetically distinct organism. The molecular basis of sperm-egg recognition is unknown, but is likely to require interactions between receptor proteins displayed on their surface. Izumo1 is an essential sperm cell-surface protein, but its receptor on the egg has not been described. Here we identify folate receptor 4 (Folr4) as the receptor for Izumo1 on the mouse egg, and propose to rename it Juno. We show that the Izumo1-Juno interaction is conserved within several mammalian species, including humans. Female mice lacking Juno are infertile and Juno-deficient eggs do not fuse with normal sperm. Rapid shedding of Juno from the oolemma after fertilization suggests a mechanism for the membrane block to polyspermy, ensuring eggs normally fuse with just a single sperm. Our discovery of an essential receptor pair at the nexus of conception provides opportunities for the rational development of new fertility treatments and contraceptives.
Funded by: Wellcome Trust: 098051
A loss of function screen of identified genome-wide association study Loci reveals new genes controlling hematopoiesis.
Department of Haematology, University of Cambridge, Cambridge, United Kingdom; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; NHS Blood and Transplant, Cambridge, United Kingdom.
The formation of mature cells by blood stem cells is very well understood at the cellular level, and we know many of the key transcription factors that control fate decisions. However, many upstream signalling and downstream effector processes are only partially understood. Genome-wide association studies (GWAS) have been particularly useful in providing new directions to dissect these pathways. A GWAS meta-analysis identified 68 genetic loci controlling platelet size and number. Only a quarter of those genes, however, are known regulators of hematopoiesis. To determine the function of the remaining genes, we performed a medium-throughput genetic screen in zebrafish using antisense morpholino oligonucleotides (MOs) to knock down protein expression, followed by histological analysis of selected genes using a wide panel of different hematopoietic markers. The information generated by the initial knockdown was used to profile phenotypes and to position candidate genes hierarchically in hematopoiesis. Further analysis of brd3a revealed its essential role in the differentiation, but not the maintenance and survival, of thrombocytes. Using this from-GWAS-to-function strategy, we have not only identified a series of genes that represent novel regulators of thrombopoiesis and hematopoiesis, but also provided, to our knowledge, the first example of a functional genetic screening strategy, a critical step toward obtaining biologically relevant functional data from GWA studies of blood cell traits.
Funded by: British Heart Foundation: RG/09/012/28096; PHS HHS: C45041/A14953
PLoS genetics 2014;10;7;e1004450
A quantitative 14-3-3 interaction screen connects the nuclear exosome targeting complex to the DNA damage response.
The Gurdon Institute, Department of Biochemistry, University of Cambridge, Cambridge CB2 1QN, United Kingdom; Genome Integrity Unit, Danish Cancer Society Research Centre, 2100 Copenhagen, Denmark.
RNA metabolism is altered following DNA damage, but the underlying mechanisms are not well understood. Through a 14-3-3 interaction screen for DNA damage-induced protein interactions in human cells, we identified protein complexes connected to RNA biology. These include the nuclear exosome targeting (NEXT) complex that regulates turnover of noncoding RNAs termed promoter upstream transcripts (PROMPTs). We show that the NEXT subunit RBM7 is phosphorylated upon DNA damage by the MAPKAPK2 kinase and establish that this mediates 14-3-3 binding and decreases PROMPT binding. These findings and our observation that cells lacking RBM7 display DNA damage hypersensitivity link PROMPT turnover to the DNA damage response.
Funded by: Cancer Research UK: 11224, C6/A11224, C6946/A14492; Wellcome Trust: 092096, WT092096
Genes & development 2014;28;18;1977-82
Recombination: genomic mix 'n' match.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Funded by: Medical Research Council: G1100100
Nature reviews. Microbiology 2014;12;12;795
Heterogeneity of genomic evolution and mutational profiles in multiple myeloma.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK; Department of Haematology, University of Cambridge, CIMR, Cambridge CB2 0XY, UK.
Multiple myeloma is an incurable plasma cell malignancy with a complex and incompletely understood molecular pathogenesis. Here we use whole-exome sequencing, copy-number profiling and cytogenetics to analyse 84 myeloma samples. Most cases have a complex subclonal structure and show clusters of subclonal variants, including subclonal driver mutations. Serial sampling reveals diverse patterns of clonal evolution, including linear evolution, differential clonal response and branching evolution. Diverse processes contribute to the mutational repertoire, including kataegis and somatic hypermutation, and their relative contribution changes over time. We find heterogeneity of mutational spectrum across samples, with few recurrent genes. We identify new candidate genes, including truncations of SP140, LTB, ROBO1 and clustered missense mutations in EGR1. The myeloma genome is heterogeneous across the cohort, and exhibits diversity in clonal admixture and in dynamics of evolution, which may impact prognostic stratification, therapeutic approaches and assessment of disease response to treatment.
Funded by: BLRD VA: I01 BX001584; NCI NIH HHS: P01 CA078378, P01 CA155258, P50 CA100707, R01 CA125711, T32 CA106209; PHS HHS: P01-155258, P01-78378, P50-100007, R01-124929, RCA125711C; Wellcome Trust: 077012/Z/05/Z, 088340
Nature communications 2014;5;2997
Characterization of gene mutations and copy number changes in acute myeloid leukemia using a rapid target enrichment protocol.
Cancer Genome Project, Wellcome Trust Sanger Institute, Cambridge, UK; Department of Haematology, University of Cambridge, UK; Department of Haematology, Addenbrooke's Hospital, Cambridge, UK.
Prognostic stratification is critical for making therapeutic decisions and maximizing survival of patients with acute myeloid leukemia. Advances in the genomics of acute myeloid leukemia have identified several recurrent gene mutations whose prognostic impact is being deciphered. We used HaloPlex target enrichment and Illumina-based next generation sequencing to study 24 recurrently mutated genes in 42 samples of acute myeloid leukemia with a normal karyotype. Read depth varied between and within genes for the same sample, but was predictable and highly consistent across samples. Consequently, we were able to detect copy number changes, such as an interstitial deletion of BCOR, three MLL partial tandem duplications, and a novel KRAS amplification. With regard to coding mutations, we identified likely oncogenic variants in 41 of 42 samples. NPM1 mutations were the most frequent, followed by FLT3, DNMT3A and TET2 mutations. NPM1 and FLT3 indels were detected with good efficiency. We also showed that DNMT3A mutations can persist post-chemotherapy, and in 2 cases studied at diagnosis and relapse, we were able to delineate the dynamics of tumor evolution and gain insights into the order of acquisition of variants. HaloPlex is a quick and reliable target enrichment method that can aid diagnosis and prognostic stratification of acute myeloid leukemia patients.
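Because read depth per region was reproducible across samples, a region whose depth falls well below (or above) what other samples show can signal a copy number change. The sketch below illustrates this generic reference-panel idea; it is an assumption, not the authors' pipeline or anything HaloPlex-specific, and the depths, pseudocount and call thresholds are hypothetical. For a diploid genome, a one-copy loss is expected near a log2 ratio of -1 and a one-copy gain near 0.58.

```python
import math
from statistics import median


def depth_log2_ratios(test_depths, reference_samples, pseudocount=0.5):
    """Per-region log2 ratio of a sample's normalized read depth to a reference panel.

    test_depths: read depths, one per targeted region.
    reference_samples: lists of depths for the same regions from other samples.
    Each sample is scaled by its total depth so overall yield cancels out; the
    median reference value captures each region's own (reproducible) efficiency.
    A small pseudocount avoids taking the log of zero.
    """
    def normalize(depths):
        padded = [d + pseudocount for d in depths]
        total = sum(padded)
        return [d / total for d in padded]

    test = normalize(test_depths)
    refs = [normalize(r) for r in reference_samples]
    return [math.log2(t / median(r[j] for r in refs)) for j, t in enumerate(test)]


def call_copy_number(log2_ratios, loss=-0.6, gain=0.4):
    """Label regions; the thresholds are illustrative and would need calibration."""
    calls = []
    for r in log2_ratios:
        if r <= loss:
            calls.append("loss")
        elif r >= gain:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


# Hypothetical depths for three regions; the second looks heterozygously deleted.
references = [[100, 200, 300], [110, 190, 310], [90, 210, 290]]
ratios = depth_log2_ratios([100, 100, 300], references)
print(call_copy_number(ratios))
```

Using the median of the reference panel, rather than the mean, keeps a single unusual reference sample from shifting the expected depth of a region.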
Funded by: Wellcome Trust: 088340, 095663
The Scramble conversion tool.
DNA Pipelines, Wellcome Trust Sanger Institute, Cambridgeshire, CB10 1SA, UK.
Motivation: The reference CRAM file format implementation is in Java. We present 'Scramble': a new C implementation of SAM, BAM and CRAM file I/O.
Results: The C implementation for CRAM is 1.5-1.7× slower than BAM at decoding but 1.8-2.6× faster at encoding. We see file size savings of 34-55%.
Availability and implementation: Source code is available at http://sourceforge.net/projects/staden/files/io_lib/ under the BSD software licence.
Funded by: Wellcome Trust: 098051
Bioinformatics (Oxford, England) 2014;30;19;2818-9
A genome-wide association study of anorexia nervosa.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; University of Split School of Medicine, Split, Croatia.
Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2907 cases with AN from 14 countries (15 sites) and 14 860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery data sets. Seventy-six (72 independent) single nucleotide polymorphisms were taken forward for in silico (two data sets) or de novo (13 data sets) replication genotyping in 2677 independent AN cases and 8629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication data sets comprised 5551 AN cases and 21 080 controls. AN subtype analyses (1606 AN restricting; 1445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01 × 10(-7)) in SOX2OT and rs17030795 (P=5.84 × 10(-6)) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76 × 10(-6)) between CUL3 and FAM124B and rs1886797 (P=8.05 × 10(-6)) near SPATA13. Comparing discovery with replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4 × 10(-6)), strongly suggesting that true findings exist but our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
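The direction-concordance argument rests on a simple null: if none of the signals were real, each replication effect would be equally likely to agree or disagree in sign with its discovery effect. A one-sided binomial sign test expresses this; the abstract does not state the exact test or how many independent SNPs entered it, so the sketch and its counts are illustrative assumptions, not a recomputation of the reported P value.

```python
from math import comb


def sign_test_p(concordant, total):
    """One-sided sign test: P(X >= concordant) for X ~ Binomial(total, 0.5).

    Under the null hypothesis of no true associations, each replication effect
    is equally likely to agree or disagree in direction with its discovery effect.
    """
    if not 0 <= concordant <= total:
        raise ValueError("concordant must lie between 0 and total")
    tail = sum(comb(total, k) for k in range(concordant, total + 1))
    return tail / 2 ** total


# Hypothetical counts: 8 of 10 effects in the same direction.
print(sign_test_p(8, 10))  # 56 / 1024 = 0.0546875
```

With only ten effects, 80% concordance is not convincing on its own; the same proportion across many more independent SNPs gives a far smaller tail probability, which is the logic behind the abstract's inference.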
Funded by: Chief Scientist Office: HSRU1; Medical Research Council: MR/J006742/1, MR/K500999/1; NCATS NIH HHS: UL1 TR000083, UL1TR000083; NCI NIH HHS: 3P50CA093459, 5P50CA097007, 5R01CA133996, P50 CA093459, P50 CA097007, R01 CA133996, U01 CA074783, U24 CA074783, UM1 CA167551; NCRR NIH HHS: U54 RR0252204-01; NIAAA NIH HHS: AA-00145, AA-09203, AA-12502, AA15416, K02 AA018755, K02AA018755, K05 AA000145, R01 AA009203, R01 AA012502, R01 AA015416, R37 AA012502; NICHD NIH HHS: K12 HD001441, K12HD001441; NIEHS NIH HHS: 5R01ES011740, R01 ES011740; NIMH NIH HHS: K01 MH100435, MH066117, MH066122, MH066145, MH066146, MH066147, MH066193, MH0662, MH066287, MH066288, MH066296, R01 MH066117, R01 MH066122, R01 MH066145, R01 MH066146, R01 MH066147, R01 MH066193, R01 MH066287, R01 MH066288, R01 MH066296, R01 MH078075, T32 MH076694; NINDS NIH HHS: F32 NS010045; Wellcome Trust: 088827, 090532, WT088827/Z/09, WT088984
Molecular psychiatry 2014;19;10;1085-94
Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis.
Department of Archaeological Sciences, University of Tübingen, Ruemelinstraße 23, 72070 Tübingen, Germany.
Modern strains of Mycobacterium tuberculosis from the Americas are closely related to those from Europe, supporting the assumption that human tuberculosis was introduced post-contact. This notion, however, is incompatible with archaeological evidence of pre-contact tuberculosis in the New World. Comparative genomics of modern isolates suggests that M. tuberculosis attained its worldwide distribution following human dispersals out of Africa during the Pleistocene epoch, although this has yet to be confirmed with ancient calibration points. Here we present three 1,000-year-old mycobacterial genomes from Peruvian human skeletons, revealing that a member of the M. tuberculosis complex caused human disease before contact. The ancient strains are distinct from known human-adapted forms and are most closely related to those adapted to seals and sea lions. Two independent dating approaches suggest a most recent common ancestor for the M. tuberculosis complex less than 6,000 years ago, which supports a Holocene dispersal of the disease. Our results implicate sea mammals as having played a role in transmitting the disease to humans across the ocean.
Funded by: European Research Council: 309540; Medical Research Council: MC_U117581288; NIAID NIH HHS: AI090928, R01 AI090928; Wellcome Trust: 098051
Multiplex PCR assay for unequivocal differentiation of Actinobacillus pleuropneumoniae serovars 1 to 3, 5 to 8, 10, and 12.
Section of Paediatrics, Department of Medicine, Imperial College London, St. Mary's Campus, London, United Kingdom.
An improved multiplex PCR, using redesigned primers targeting the serovar 3 capsule locus, which differentiates serovars 3, 6, and 8 Actinobacillus pleuropneumoniae isolates, is described. The new primers eliminate an aberrant serovar 3-indicative amplicon found in some serovar 6 clinical isolates. Furthermore, we have developed a new multiplex PCR for the detection of serovars 1 to 3, 5 to 8, 10, and 12 along with apxIV, thus extending the utility of this diagnostic PCR to cover a broader range of isolates.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G003203/1, BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1
Journal of clinical microbiology 2014;52;7;2380-5
Human immunodeficiency virus Tat associates with a specific set of cellular RNAs.
MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, London WC1E 6BT, UK.
Background: Human Immunodeficiency Virus 1 (HIV-1) exhibits a wide range of interactions with the host cell, but whether viral proteins interact with cellular RNA is not clear. A candidate interacting factor is the trans-activator of transcription (Tat) protein. Tat is required for expression of virus genes but activates transcription through an unusual mechanism: binding to an RNA stem-loop, the transactivation response element (TAR), together with the host elongation factor P-TEFb. HIV-1 Tat has also been shown to alter the expression of host genes during infection, contributing to viral pathogenesis, but whether Tat also interacts with cellular RNAs is unknown.
Results: Using RNA immunoprecipitation coupled with microarray analysis, we have discovered that HIV-1 Tat is associated with a specific set of human mRNAs in T cells. mRNAs bound by Tat share a stem-loop structural element and encode proteins with common biological roles. In contrast, we do not find evidence that Tat associates with microRNAs or the RNA-induced silencing complex (RISC). The interaction of Tat with cellular RNA requires an intact RNA binding domain and Tat RNA binding is linked to an increase in RNA abundance in cell lines and during infection of primary CD4+ T cells by HIV.
Conclusions: We conclude that Tat interacts with a specific set of human mRNAs in T cells, many of which show changes in abundance in response to Tat and HIV infection. This work uncovers a previously unrecognised interaction between HIV and its host that may contribute to viral alteration of the host cellular environment.
Funded by: Medical Research Council: G0600081, G0802068; Wellcome Trust
Expression of a single-chain human leukocyte antigen-DRA/DRB3*01:01 molecule and differential binding of a monoclonal antibody in the presence of specifically bound human platelet antigen-1a peptide.
Department of Haematology, University of Cambridge, Cambridge, UK.
Background: Studies show that 1 in 1200 neonates have a low platelet (PLT) count due to alloimmunization against human PLT antigen (HPA)-1a (β3-L33). This mainly occurs in HPA-1a-negative mothers who are positive for the human leukocyte antigen (HLA)-DRB3*01:01 allele, but only about one-third of these mothers will mount an effective alloimmune response. The development of specific treatment modalities requires that the mechanisms driving the maternal alloimmune response against the fetal PLTs be further explored. An antibody reagent that has a different binding affinity to HLA-DRA/DRB3*01:01 with and without the β3-L33 peptide would be a valuable reagent to study peptide presentation on maternal antigen-presenting cells.
Study design and methods: To identify such antibodies, HLA-DRA/DRB3*01:01 was recombinantly expressed in Drosophila S2 cells. To delineate the epitope of interesting antibodies, seven mutant HLA-DRA/DRB3*01:01 molecules were generated by site-directed mutagenesis introducing naturally occurring amino acid changes encoded by DRB3*02 and DRB3*03 alleles.
Results: The murine monoclonal antibody (MoAb) DA2 showed robust binding to recombinant HLA-DRA/DRB3*01:01 by enzyme-linked immunosorbent assay, but binding was reduced in the presence of β3-L33 peptide. The binding affinity of DA2 was greatly enhanced for the mutant HLA-DRA/DRB3*01:01 in which serine at position 60 of the β1 chain was replaced by tyrosine. Interestingly, the binding of DA2 to this mutant was not reduced by the presence of β3-L33 peptide.
Conclusion: The results of this study generate a molecular model of the interaction of the HLA-DRA/DRB3*01:01 molecule with MoAb DA2. This will inform functional studies with the recombinant Class II molecules.
Funded by: British Heart Foundation: RG/09/012/28096; Department of Health: RP-PG-0310-1002
Whole-exome sequencing in an extended family with myocardial infarction unmasks familial hypercholesterolemia.
Institute for Integrative and Experimental Genomics, University of Lübeck, 23562 Lübeck, Germany.
Background: Familial hypercholesterolemia (FH) is an autosomal-dominant disease leading to markedly elevated low-density lipoprotein (LDL) cholesterol levels and an increased risk of premature myocardial infarction (MI). Mutation carriers display variable LDL cholesterol levels, which may obscure the diagnosis. Using whole-exome sequencing, we examined a family in which multiple myocardial infarctions of unclear etiology occurred at a young age.
Methods: Whole-exome sequencing of three affected family members, validation of the identified variant by Sanger sequencing, and subsequent co-segregation analysis in the family.
Results: The index patient (LDL cholesterol 188 mg/dL) was referred for molecular-genetic investigation. He had undergone coronary artery bypass grafting (CABG) at the age of 59 years; 12 of 15 first-, second- and third-degree relatives were affected with coronary artery disease (CAD) and/or premature MI. We sequenced the whole exome of the patient and of two cousins with premature MI. After filtering, one potentially disease-causing variant remained in the LDL receptor (LDLR) gene, which we validated by Sanger sequencing (a nucleotide substitution in the acceptor splice site of exon 10, c.1359-1G > A). Sequencing of all family members available for genetic analysis revealed co-segregation of the variant with CAD (LOD 3.0) and with increased LDL cholesterol (>190 mg/dL) after correction for statin treatment (LOD 4.3). Interestingly, mutation carriers presented with highly variable corrected (183-354 mg/dL) and on-treatment (116-274 mg/dL) LDL cholesterol levels, such that the diagnosis of FH in this family was made only after the molecular-genetic analysis.
Conclusion: Even in families with unusual clustering of CAD, FH remains underdiagnosed, which underscores the need to implement systematic screening programs. Whole-exome sequencing may facilitate the identification of disease-causing variants in families with MI of unclear etiology and enable more timely preventive treatment of mutation carriers.
BMC cardiovascular disorders 2014;14;108
A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes.
Department of Zoology, University of Oxford, Oxford, UK.
Background: Highly parallel, 'second generation' sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. Providing the data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype, and used for analysis, is therefore necessary.
Results: The performance of de novo short-read assembly followed by automatic annotation using the pubMLST.org Neisseria database was assessed for 108 diverse, representative and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and less than 1% of reassembled genes had sequence discrepancies or were misassembled. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis, which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation across the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.
Conclusions: De novo assembly, combined with automated gene-by-gene annotation, generates high-quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease-causing lineages.
Funded by: Wellcome Trust: 087622
BMC genomics 2014;15;1138
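The core-genome rule used in this entry (loci present in at least 95% of the isolates) can be illustrated with a minimal Python sketch. It is not the Genome Comparator implementation: it assumes each isolate has already been reduced to a set of detected locus identifiers, and the function name `core_loci` and its `min_percent` parameter are hypothetical.

```python
from typing import Dict, List, Set


def core_loci(presence: Dict[str, Set[str]], min_percent: int = 95) -> List[str]:
    """Return loci detected in at least `min_percent` % of isolates.

    `presence` maps a (hypothetical) isolate ID to the set of locus IDs
    detected in that isolate's assembly.
    """
    n_isolates = len(presence)
    if n_isolates == 0:
        return []
    # Count in how many isolates each locus was detected.
    counts: Dict[str, int] = {}
    for loci in presence.values():
        for locus in loci:
            counts[locus] = counts.get(locus, 0) + 1
    # Integer comparison avoids floating-point rounding at the threshold:
    # count / n >= min_percent / 100  <=>  100 * count >= min_percent * n
    return sorted(
        locus for locus, count in counts.items()
        if 100 * count >= min_percent * n_isolates
    )


if __name__ == "__main__":
    # Toy data: 20 isolates; locus "a" in all, "b" in 19, "c" in 18.
    toy = {f"iso{i}": {"a"} for i in range(20)}
    for i in range(19):
        toy[f"iso{i}"].add("b")
    for i in range(18):
        toy[f"iso{i}"].add("c")
    print(core_loci(toy))  # ['a', 'b']
```

With 20 toy isolates, a locus must be found in at least 19 to pass the 95% cut-off, so `c` (18 of 20) is excluded.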
The genomic substrate for adaptive radiation in African cichlid fish.
Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
Funded by: Cancer Research UK: 11832; Medical Research Council: MC_U137761446, MC_UU_12021/1; NHGRI NIH HHS: U54 HG002045, U54 HG003067; NIDCR NIH HHS: 2R01DE019637-04, F30 DE023013, R01 DE019637; NINDS NIH HHS: R01 NS034950; Wellcome Trust
Phosphoinositide metabolism links cGMP-dependent protein kinase G to essential Ca²⁺ signals at key decision points in the life cycle of malaria parasites.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Many critical events in the Plasmodium life cycle rely on the controlled release of Ca²⁺ from intracellular stores to activate stage-specific Ca²⁺-dependent protein kinases. Using the motility of Plasmodium berghei ookinetes as a signalling paradigm, we show that the cyclic guanosine monophosphate (cGMP)-dependent protein kinase, PKG, maintains the elevated level of cytosolic Ca²⁺ required for gliding motility. We find that the same PKG-dependent pathway operates upstream of the Ca²⁺ signals that mediate activation of P. berghei gametocytes in the mosquito and egress of Plasmodium falciparum merozoites from infected human erythrocytes. Perturbations of PKG signalling in gliding ookinetes have a marked impact on the phosphoproteome, with a significant enrichment of in vivo regulated sites in multiple pathways including vesicular trafficking and phosphoinositide metabolism. A global analysis of cellular phospholipids demonstrates that in gliding ookinetes PKG controls phosphoinositide biosynthesis, possibly through the subcellular localisation or activity of lipid kinases. Similarly, phosphoinositide metabolism links PKG to egress of P. falciparum merozoites, where inhibition of PKG blocks hydrolysis of phosphatidylinositol (4,5)-bisphosphate. In the face of an increasing complexity of signalling through multiple Ca²⁺ effectors, PKG emerges as a unifying factor to control multiple cellular Ca²⁺ signals essential for malaria parasite development and transmission.
Funded by: Medical Research Council: G0501670, G10000779, G1000779; Wellcome Trust: 079643/Z/06/Z, WT093228, WT094752, WT098051
PLoS biology 2014;12;3;e1001806
Genetic interactions affecting human gene expression identified by variance association mapping.
Human Genetics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom; NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway.
Non-additive interaction between genetic variants, or epistasis, is a possible explanation for the gap between the heritability of complex traits and the variation explained by identified genetic loci. Interactions give rise to genotype-dependent variance, so identifying variance quantitative trait loci can be an intermediate step towards discovering both epistasis and gene-by-environment (GxE) effects. Using RNA-sequencing data from lymphoblastoid cell lines (LCLs) from the TwinsUK cohort, we identify a candidate set of 508 variance-associated SNPs. Exploiting the twin design, we show that GxE plays a role in ∼70% of these associations. Further investigation of these loci reveals 57 epistatic interactions that replicated in a smaller dataset, explaining on average 4.3% of phenotypic variance. In 24 cases, the interaction explains more variance than the additive contributions of the interacting loci. Using molecular phenotypes in this way may provide a route to uncovering genetic interactions underlying more complex traits.
DOI: http://dx.doi.org/10.7554/eLife.01381.001.
Cis and trans effects of human genomic variants on gene expression.
Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland; Institute of Genetics and Genomics in Geneva (iGE3), Geneva, Switzerland; Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
Gene expression is a heritable cellular phenotype that defines the function of a cell and can lead to disease when misregulated. To detect genetic variants affecting gene expression, we performed cis and trans association analyses of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) with gene expression measured in 869 lymphoblastoid cell lines from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. We found that 3,534 genes (false discovery rate (FDR) = 5%) are affected by an expression quantitative trait locus (eQTL) in cis and 48 genes are affected in trans. We observed that CNVs are more likely than SNPs to be eQTLs. In addition, we found that variants associated with complex traits and diseases are enriched for trans-eQTLs and that trans-eQTLs are enriched for cis-eQTLs. Because a variant affecting a gene both in cis and in trans suggests that the cis gene is functionally linked to expression of the trans gene, we looked specifically for trans effects of cis-eQTLs. We found that 26 cis-eQTLs are associated with 92 genes in trans, with the cis-eQTLs of the transcription factors BATF3 and HMX2 affecting the most genes. We then explored whether variation in the expression level of the cis genes causally affects the expression level of the trans genes, and identified several such causal relationships. This analysis shows that a large sample size allows the discovery of secondary effects of human genetic variation on gene expression, which can be used to construct short directed gene regulatory networks.
Funded by: Medical Research Council: G9815508, MC_PC_15018, MC_UU_12013/1, MC_UU_12013/4; Wellcome Trust: 102215
PLoS genetics 2014;10;7;e1004461
Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins.
Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland; Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland.
Understanding the genetic architecture of gene expression is an intermediate step in understanding the genetic architecture of complex diseases. RNA sequencing technologies have improved the quantification of gene expression and allow measurement of allele-specific expression (ASE). ASE is hypothesized to result from the direct effect of cis regulatory variants, but the causes of ASE have not yet been properly estimated. In this study, we take advantage of a sample of twins to measure the relative contributions of genetic and environmental effects to ASE, and we find substantial effects from gene × gene (G×G) and gene × environment (G×E) interactions. We propose a model in which ASE requires genetic variability in cis, that is, a sequence difference between the two alleles, but in which the magnitude of the ASE effect depends on trans genetic and environmental factors that interact with the cis genetic variants.
Funded by: Wellcome Trust: 098051
Nature genetics 2014;47;1;88-91
A multi-country outbreak of Salmonella Newport gastroenteritis in Europe associated with watermelon from Brazil, confirmed by whole genome sequencing: October 2011 to January 2012.
Gastrointestinal, Emerging and Zoonotic Infections, Centre for Infectious Disease Surveillance and Control, Public Health England, Colindale, London, United Kingdom.
In November 2011, the presence of Salmonella Newport in a ready-to-eat watermelon slice was confirmed as part of a local food survey in England. In late December 2011, cases of S. Newport were reported in England, Wales, Northern Ireland, Scotland, Ireland and Germany. During the outbreak, 63 confirmed cases of S. Newport were reported across all six countries, with isolates indistinguishable from the watermelon isolate by pulsed-field gel electrophoresis. A subset of outbreak isolates was whole-genome sequenced; these were identical to, or differed by a single nucleotide polymorphism from, the watermelon isolate. In total, 46 confirmed cases were interviewed, of whom 27 reported watermelon consumption. Further investigations confirmed that the outbreak was linked to the consumption of watermelon imported from Brazil. Although numerous Salmonella outbreaks associated with melons have been reported in the United States and elsewhere, this is the first of its kind in Europe. Expansion of the melon import market from Brazil represents a potential threat for future outbreaks. Whole-genome sequencing is rapidly becoming more accessible and can provide a compelling level of evidence of linkage between human cases and sources of infection, to support public health interventions in global food markets.
Funded by: Wellcome Trust: 079643, WT076964
Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin 2014;19;31;6-13
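The genomic comparison in this entry, where sequenced outbreak isolates were identical to, or one single nucleotide polymorphism away from, the watermelon isolate, rests on counting SNP differences between genomes. A minimal Python sketch follows, assuming the genomes have already been mapped to a common reference to give equal-length pseudo-alignments (the abstract does not describe the pipeline). `snp_distance` is a hypothetical helper, and ambiguous or gap positions are simply skipped.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different A/C/G/T bases.

    Positions with N, gaps or other ambiguity codes in either sequence are
    ignored, so they never count as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in unambiguous and b in unambiguous and a != b
    )


if __name__ == "__main__":
    food_isolate = "ACGTACGTAC"  # toy stand-in for the food isolate
    case_isolate = "ACGTACGTTC"  # toy case isolate with one substitution
    print(snp_distance(food_isolate, case_isolate))  # 1
```

Skipping ambiguous positions is a conservative choice: low-coverage sites cannot inflate the distance, although they can hide real differences.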
Genome-wide analysis of cold adaptation in indigenous Siberian populations.
Department of Archaeology and Anthropology, University of Cambridge, Cambridge, United Kingdom.
Following the dispersal out of Africa, where hominins evolved in warm environments for millions of years, our species has colonised different climate zones of the world, including high latitudes and cold environments. The extent to which human habitation in (sub-)Arctic regions has been enabled by cultural buffering, short-term acclimatization and genetic adaptations is not clearly understood. Present-day indigenous populations of Siberia show a number of phenotypic features, such as increased basal metabolic rate, low serum lipid levels and increased blood pressure, that have been attributed to adaptation to the extreme cold climate. In this study we introduce a dataset of 200 individuals from ten indigenous Siberian populations that were genotyped for 730,525 SNPs across the genome to identify genes and non-coding regions that have undergone unusually rapid change in allele frequency and long-range haplotype homozygosity in the recent past. At least three distinct population clusters could be identified among the Siberians, each of which showed a number of unique signals of selection. A region on chromosome 11 (chr11:66-69 Mb) showed the densest clustering of significant signals and also the strongest signals in all the different selection tests performed. We present a list of candidate cold adaptation genes that showed significant signals of positive selection, with the strongest signals associated with genes involved in energy regulation and metabolism (CPT1A, LRP5, THADA) and vascular smooth muscle contraction (PRKG1). By employing a new method that paints phased chromosome chunks by their ancestry, we distinguish local Siberian-specific long-range haplotype signals from those introduced by admixture.
Funded by: Wellcome Trust: 098051
PloS one 2014;9;5;e98076
Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication.
CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.
The genetic changes underlying the initial steps of animal domestication are still poorly understood. We generated a high-quality reference genome for the rabbit and compared it to resequencing data from populations of wild and domestic rabbits. We identified more than 100 selective sweeps specific to domestic rabbits but only a relatively small number of fixed (or nearly fixed) single-nucleotide polymorphisms (SNPs) for derived alleles. SNPs with marked allele frequency differences between wild and domestic rabbits were enriched for conserved noncoding sites. Enrichment analyses suggest that genes affecting brain and neuronal development have often been targeted during domestication. We propose that because of a truly complex genetic background, tame behavior in rabbits and other domestic animals evolved by shifts in allele frequencies at many loci, rather than by critical changes at only a few domestication loci.
Funded by: Intramural NIH HHS; NHGRI NIH HHS: U54 HG003067; Wellcome Trust: 095908, WT095908, WT098051
Science (New York, N.Y.) 2014;345;6200;1074-9
Exome sequencing improves genetic diagnosis of structural fetal abnormalities revealed by ultrasound.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
The genetic etiology of non-aneuploid fetal structural abnormalities is typically investigated by karyotyping and by array-based detection of microscopically detectable rearrangements and submicroscopic copy-number variants (CNVs), which collectively yield a pathogenic finding in up to 10% of cases. We propose that exome sequencing may substantially increase the identification of underlying etiologies. We performed exome sequencing on a cohort of 30 non-aneuploid fetuses and neonates (along with their parents) with diverse structural abnormalities first identified by prenatal ultrasound. We identified candidate pathogenic variants with a range of inheritance models and evaluated these in the context of detailed phenotypic information. We identified 35 de novo single-nucleotide variants (SNVs), small indels, deletions or duplications, of which three (accounting for 10% of the cohort) are highly likely to be causative. These are de novo missense variants in FGFR3 and COL2A1, and a de novo 16.8 kb deletion that includes most of OFD1. In five further cases (17%) we identified de novo or inherited recessive or X-linked variants in plausible candidate genes, which require additional validation to determine pathogenicity. Our diagnostic yield of 10% is comparable and supplementary to that of existing microarray testing for large chromosomal rearrangements and targeted CNV detection. The de novo nature of these events could enable couples to be counseled about their low recurrence risk. This study outlines a route to a substantial improvement in the diagnostic yield of prenatal genetic abnormalities through the application of next-generation sequencing.
Funded by: Wellcome Trust: WT098051
Human molecular genetics 2014;23;12;3269-77
Evolution and transmission of drug-resistant tuberculosis in a Russian population.
Public Health England (PHE) National Mycobacterium Reference Laboratory, Clinical TB and HIV Group, Blizard Institute, Queen Mary University of London, London, UK.
The molecular mechanisms determining the transmissibility and prevalence of drug-resistant tuberculosis in a population were investigated through whole-genome sequencing of 1,000 prospectively obtained patient isolates from Russia. Two-thirds belonged to the Beijing lineage, which was dominated by two homogeneous clades. Multidrug-resistant (MDR) genotypes were found in 48% of isolates overall and in 87% of the major clades. The most common rpoB mutation was associated with fitness-compensatory mutations in rpoA or rpoC, and a new intragenic compensatory substitution was identified. The proportion of MDR cases with extensively drug-resistant (XDR) tuberculosis was 16% overall, with 65% of MDR isolates harboring eis mutations, selected by kanamycin therapy, which may drive the expansion of strains with enhanced virulence. The combination of drug resistance and compensatory mutations displayed by the major clades confers clinical resistance without compromising fitness and transmissibility, showing that, in addition to weaknesses in the tuberculosis control program, biological factors drive the persistence and spread of MDR and XDR tuberculosis in Russia and beyond.
Funded by: Wellcome Trust: 095198, 095198/Z/10/Z, 098051
Nature genetics 2014;46;3;279-86
Whole-genome sequencing reveals clonal expansion of multiresistant Staphylococcus haemolyticus in European hospitals.
Department of Paediatrics, University Hospital of North Norway, Tromsø, Norway; Department of Clinical Medicine, UiT The Arctic University of Norway, Tromsø, Norway.
Objectives: Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation.
Methods: Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was assembled individually. Protein coding sequences (CDSs) were predicted, and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. Single nucleotide polymorphisms (SNPs) at variable sites in the core genome were used to construct a maximum likelihood phylogeny of the isolates.
Results: SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A-G), of which four (A-D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication.
Conclusions: The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen.
Funded by: Wellcome Trust: 098051
The Journal of antimicrobial chemotherapy 2014;69;11;2920-7
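The three-way categorisation of homologous genes described in the Methods of this entry can be illustrated with a minimal Python sketch. It assumes homologue clusters and their strain membership have already been computed (the abstract does not name the clustering tool); `GeneType` and `classify_clusters` are hypothetical names, not code from the study.

```python
from enum import Enum
from typing import Dict, Set


class GeneType(Enum):
    CORE = "Type I"          # homologues present in all strains
    UNIQUE_CORE = "Type II"  # homologues shared by only a subgroup of strains
    UNIQUE = "Type III"      # strain-specific CDSs


def classify_clusters(
    clusters: Dict[str, Set[str]], all_strains: Set[str]
) -> Dict[str, GeneType]:
    """Assign each homologue cluster to Type I, II or III by strain membership."""
    result: Dict[str, GeneType] = {}
    for cluster_id, strains in clusters.items():
        present = strains & all_strains
        if not present:
            raise ValueError(f"cluster {cluster_id} has no known strains")
        if present == all_strains:
            result[cluster_id] = GeneType.CORE
        elif len(present) == 1:
            result[cluster_id] = GeneType.UNIQUE
        else:
            result[cluster_id] = GeneType.UNIQUE_CORE
    return result


if __name__ == "__main__":
    strains = {"s1", "s2", "s3"}
    clusters = {"g1": {"s1", "s2", "s3"}, "g2": {"s1", "s2"}, "g3": {"s3"}}
    for cid, gtype in sorted(classify_clusters(clusters, strains).items()):
        print(cid, gtype.value)
```

The check for "all strains" comes first, so in a degenerate one-strain collection every gene is treated as core rather than strain-specific.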
Found in translation.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2014;12;4;238
Spinster homolog 2 (spns2) deficiency causes early onset progressive hearing loss.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom; Wolfson Centre for Age-Related Diseases, King's College London, London, United Kingdom.
Spinster homolog 2 (Spns2) acts as a Sphingosine-1-phosphate (S1P) transporter in zebrafish and mice, regulating heart development and lymphocyte trafficking, respectively. S1P is a biologically active lysophospholipid with multiple roles in signalling. The mechanism of action of Spns2 is still elusive in mammals. Here, we report that Spns2-deficient mice rapidly lost auditory sensitivity and endocochlear potential (EP) between 2 and 3 weeks of age. We found progressive degeneration of sensory hair cells in the organ of Corti, but the earliest defect was a decline in the EP, suggesting that dysfunction of the lateral wall was the primary lesion. In the lateral wall of adult mutants, we observed structural changes of marginal cell boundaries and of strial capillaries, and reduced expression of several key proteins involved in the generation of the EP (Kcnj10, Kcnq1, Gjb2 and Gjb6), but these changes were likely to be secondary. Permeability of the boundaries of the stria vascularis and of the strial capillaries appeared normal. We also found focal retinal degeneration and anomalies of retinal capillaries together with anterior eye defects in Spns2 mutant mice. Targeted inactivation of Spns2 in red blood cells, platelets, or lymphatic or vascular endothelial cells did not affect hearing, but targeted ablation of Spns2 in the cochlea using a Sox10-Cre allele produced an auditory phenotype similar to that of the original mutation, suggesting that local Spns2 expression is critical for hearing in mammals. These findings indicate that Spns2 is required for normal maintenance of the EP and hence for normal auditory function, and support a role for S1P signalling in hearing.
Funded by: Medical Research Council: G0300212, MC_PC_U127561112, MC_QA137918; NEI NIH HHS: K08EY020530, R01 EY018213; Wellcome Trust: 098051, 100669
PLoS genetics 2014;10;10;e1004688
A reduction in Ptprq associated with specific features of the deafness phenotype of the miR-96 mutant mouse diminuendo.
Wellcome Trust Sanger Institute, Cambridge, UK; Wolfson Centre for Age-Related Diseases, King's College London, Guy's Campus, London, SE1 1UL, UK.
miR-96 is a microRNA, a non-coding RNA gene which regulates a wide array of downstream genes. The miR-96 mouse mutant diminuendo exhibits deafness and arrested hair cell functional and morphological differentiation. We have previously shown that several genes are markedly downregulated in the diminuendo organ of Corti; one of these is Ptprq, a gene known to be important for maturation and maintenance of hair cells. In order to study the contribution that downregulation of Ptprq makes to the diminuendo phenotype, we carried out microarrays, scanning electron microscopy and single hair cell electrophysiology to compare diminuendo mutants (heterozygous and homozygous) with mice homozygous for a functional null allele of Ptprq. In terms of both morphology and electrophysiology, the auditory phenotype of mice lacking Ptprq resembles that of diminuendo heterozygotes, while diminuendo homozygotes are more severely affected. A comparison of transcriptomes indicates there is a broad similarity between diminuendo homozygotes and Ptprq-null mice. The reduction in Ptprq observed in diminuendo mice appears to be a major contributor to the morphological, transcriptional and electrophysiological phenotype, but does not account for the complete diminuendo phenotype.
Funded by: Action on Hearing Loss: G41; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 091895, 100669
The European journal of neuroscience 2014;39;5;744-56
Transcriptional diversity during lineage commitment of human blood progenitors.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identified extensive cell type-specific expression changes: 6,711 genes and 10,724 transcripts, enriched in non-protein-coding elements at early stages of differentiation. In addition, we found 7,881 novel splice junctions and 2,301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We experimentally demonstrated cell-specific isoform usage, identifying nuclear factor I/B (NFIB) as a regulator of the maturation of megakaryocytes, the platelet precursors. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine.
Funded by: British Heart Foundation: FS/12/27/29405, RG/09/012/28096, RG/09/12/28096, RP-PG-0310-1002; Cancer Research UK: C45041/A14953; Department of Health: RP-PG-0310-1002; Medical Research Council: MC_UP_0801/1, MR/J011711/1, MR/K006584/1, MR/K023489/1; Wellcome Trust: 082961, 082961/Z/07/Z, 084183/Z/07/Z, 095908, 100140, WT091310, WT098051
Science (New York, N.Y.) 2014;345;6204;1251033
Mutations in SGOL1 cause a novel cohesinopathy affecting heart and gut rhythm.
Department of Pediatrics, Centre Mère Enfants Soleil, Centre Hospitalier de l'Université (CHU) de Québec, Quebec City, Quebec, Canada.
The pacemaking activity of specialized tissues in the heart and gut results in lifelong rhythmic contractions. Here we describe a new syndrome characterized by Chronic Atrial and Intestinal Dysrhythmia, termed CAID syndrome, in 16 French Canadians and 1 Swede. We show that a single shared homozygous founder mutation in SGOL1, a component of the cohesin complex, causes CAID syndrome. Cultured dermal fibroblasts from affected individuals showed accelerated cell cycle progression, a higher rate of senescence and enhanced activation of TGF-β signaling. Karyotypes showed the typical railroad appearance of a centromeric cohesion defect. Tissues derived from affected individuals displayed pathological changes in both the enteric nervous system and smooth muscle. Morpholino-induced knockdown of sgol1 in zebrafish recapitulated the abnormalities seen in humans with CAID syndrome. Our findings identify CAID syndrome as a novel generalized dysrhythmia, suggesting a new role for SGOL1 and the cohesin complex in mediating the integrity of human cardiac and gut rhythm.
Funded by: Canadian Institutes of Health Research
Nature genetics 2014;46;11;1245-9
Generation and characterization of influenza A viruses with altered polymerase fidelity.
Li Ka Shing Faculty of Medicine, Centre of Influenza Research, School of Public Health, The University of Hong Kong, No. 21 Sassoon Road, Pokfulam, Hong Kong SAR, China.
Genetic diversity of influenza A viruses (IAV), acquired through the error-prone RNA-dependent RNA polymerase (RdRP) or through genetic reassortment, enables the perpetuation of IAV in humans through epidemics or pandemics. Here, to assess the biological significance of genetic diversity acquired through RdRP, we characterize an IAV fidelity variant derived from passaging a seasonal H3N2 virus in the presence of ribavirin, a purine analogue that increases guanosine-to-adenosine mutations. We demonstrate that a single PB1-V43I mutation increases selectivity to guanosine in A/Wuhan/359/95 (H3N2) and A/Vietnam/1203/04 (H5N1) viruses. The H5N1 PB1-V43I-recombinant virus replicates to titres comparable to those of the wild-type virus both in vitro and in mouse lungs. However, a decrease in viral population diversity at day 3 post inoculation is associated with tenfold reduced lethality and neurotropism in mice. Using a fidelity variant with reduced mutational frequency, we provide direct experimental evidence for the role of genetic diversity in IAV pathogenesis.
Funded by: NIAID NIH HHS: HHSN272201400006C, N01AI70005; PHS HHS: HHSN27220140006C; Wellcome Trust
Nature communications 2014;5;4794
Polygenic in vivo validation of cancer mutations using transposons.
The in vivo validation of cancer mutations and genes identified in cancer genomics is resource-intensive because of the low throughput of animal experiments. We describe a mouse model that allows multiple cancer mutations to be validated in each animal line. Animal lines are generated with multiple candidate cancer mutations using transposons. The candidate cancer genes are tagged and randomly expressed in somatic cells, allowing easy identification of the cancer genes involved in the generated tumours. This system presents a useful, generalised and efficient means for animal validation of cancer genes.
Funded by: Wellcome Trust
Genome biology 2014;15;9;455
Your gut microbiota are what you eat.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2014;12;1;8
Dense genomic sampling identifies highways of pneumococcal recombination.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Evasion of clinical interventions by Streptococcus pneumoniae occurs through selection of non-susceptible genomic variants. We report whole-genome sequencing of 3,085 pneumococcal carriage isolates from a 2.4-km² refugee camp. This sequencing provides unprecedented resolution of the process of recombination and its impact on population evolution. Genomic recombination hotspots show remarkable consistency between lineages, indicating common selective pressures acting at certain loci, particularly those associated with antibiotic resistance. Temporal changes in antibiotic consumption are reflected in changes in recombination trends, demonstrating rapid spread of resistance when selective pressure is high. The highest frequencies of receipt and donation of recombined DNA fragments were observed in non-encapsulated lineages, implying that this largely overlooked pneumococcal group, which is beyond the reach of current vaccines, may have a major role in genetic exchange and the adaptation of the species as a whole. These findings advance understanding of pneumococcal population dynamics and provide information for the design of future intervention strategies.
Funded by: Wellcome Trust: 098051
Nature genetics 2014;46;3;305-309
Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Traditional genetic association studies are very difficult in bacteria, because generally limited recombination leads to large linked haplotype blocks that confound the identification of causative variants. Beta-lactam antibiotic resistance in Streptococcus pneumoniae arises readily because the bacteria can quickly incorporate DNA fragments encompassing variants that make the transformed strains resistant. However, the causative mutations themselves are embedded within larger recombined blocks, and previous studies analysed only a limited number of isolates, leading to the description of "mosaic genes" as being responsible for resistance. When a large number of genomes from beta-lactam susceptible and non-susceptible strains are compared, the high frequency of recombination should break up these haplotype blocks and allow genetic association approaches to identify individual causative variants. Here, we performed a genome-wide association study to identify single nucleotide polymorphisms (SNPs) and indels that could confer beta-lactam non-susceptibility, using 3,085 Thai and 616 USA pneumococcal isolates as independent datasets for variant discovery. The large sample sizes allowed us to narrow the source of beta-lactam non-susceptibility from long recombinant fragments down to much smaller loci composed of discrete or linked SNPs. While some loci appear to be universal resistance determinants, contributing equally to non-susceptibility for at least two classes of beta-lactam antibiotics, others play a larger role in resistance to particular antibiotics. All of the identified loci have a highly non-uniform distribution in the populations. They are enriched not only in vaccine-targeted but also in non-vaccine-targeted lineages, which may raise clinical concerns. Identifying the single nucleotide polymorphisms that underlie resistance will be essential for the future use of genome sequencing to predict antibiotic sensitivity in clinical microbiology.
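In its simplest form, the association idea above asks whether carrying a given variant is over-represented among non-susceptible isolates. The sketch below is a generic two-sided Fisher's exact test on a 2×2 table, written with the standard library only. The abstract does not say which test the authors used, and a real bacterial GWAS must also correct for population structure (lineage effects) and multiple testing, which this sketch does not attempt. The counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: variant present / variant absent.
    Columns: non-susceptible / susceptible isolates.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so tables tied with the observed one count.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


# Hypothetical counts: 40 of 50 variant carriers are non-susceptible,
# versus 5 of 50 non-carriers.
print(fisher_exact_two_sided(40, 10, 5, 45))
```

Computing exact hypergeometric probabilities with `math.comb` avoids any third-party dependency; for genome-scale scans a vectorised library implementation would be the practical choice.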
Funded by: NIAID NIH HHS: R01 AI106786; Wellcome Trust: 083735/Z/07Z, 098051
PLoS genetics 2014;10;8;e1004547
Calreticulin mutations in myeloproliferative neoplasms and new methodology for their detection and monitoring.
The Center for the Study of Haematological Malignancies, Nicosia, Cyprus.
The diagnosis of the BCR-ABL-negative myeloproliferative neoplasms (MPN), namely polycythemia vera, essential thrombocythemia and primary myelofibrosis, has relied significantly on the detection of known causative mutations in the JAK2 or MPL genes, which account for the majority of MPN patients. However, around 30% of patients with MPN, primarily those with essential thrombocythemia or primary myelofibrosis, lack mutations in these two genes, making it difficult to reach a confident diagnosis in these cases. The recent discovery of frameshift mutations in CALR in approximately 70% of MPN patients lacking JAK2 and MPL mutations offers a reliable diagnostic marker for this group. A review of the current literature, plus unpublished data from our laboratory, shows that 55 different CALR insertion/deletion mutations have been identified so far in MPN patients. Among these 55 variants, a 52-base pair deletion and a 5-base pair insertion are by far the most prominent, representing 50% and 35%, respectively, of all cases with CALR mutations. In this paper, we describe a high-resolution melting (HRM) analysis and a TaqMan® real-time PCR (RQ-PCR) assay, and we propose a new clinical laboratory diagnostic algorithm for CALR mutation analysis. According to this algorithm, samples undergo front-line screening with HRM or fragment analysis, followed by the newly developed RQ-PCR to both discriminate and quantify the two most common mutations in the CALR gene.
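The approximate proportions quoted above can be combined in a quick back-of-the-envelope check of what they imply for the algorithm. This sketch only multiplies and adds the abstract's rounded figures, assuming they compose as stated (70% of the 30% JAK2/MPL-negative patients carry CALR frameshifts); it introduces no new data.

```python
# Approximate proportions quoted in the abstract.
lacking_jak2_mpl = 0.30   # share of MPN patients without JAK2/MPL mutations
calr_in_negatives = 0.70  # share of those JAK2/MPL-negative patients with CALR frameshifts
del52 = 0.50              # 52-bp deletion, share of CALR-mutated cases
ins5 = 0.35               # 5-bp insertion, share of CALR-mutated cases

# Share of all MPN patients expected to carry a CALR frameshift.
calr_overall = lacking_jak2_mpl * calr_in_negatives
# Share of CALR-mutated cases carrying one of the two variants the RQ-PCR targets.
rq_pcr_covered = del52 + ins5

print(f"CALR-mutated share of all MPN patients: ~{calr_overall:.0%}")
print(f"CALR cases with the 52-bp deletion or 5-bp insertion: {rq_pcr_covered:.0%}")
print(f"CALR cases with other (rarer) variants: {1 - rq_pcr_covered:.0%}")
```

With these inputs the script reports about 21% of all MPN patients as CALR-mutated, with 85% of those cases carrying one of the two common variants and 15% carrying one of the rarer ones.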
Funded by: Wellcome Trust: 095663
Annals of hematology 2014;94;3;399-408
International Glossina genome initiative 2004-2014: a driver for post-genomic era research on the African continent.
South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
Funded by: Wellcome Trust: WT 085775/Z/08/Z; World Health Organization: 001
PLoS neglected tropical diseases 2014;8;8;e3024
Generation of antigenic diversity in Plasmodium falciparum by structured rearrangement of Var genes during mitosis.
Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
The most polymorphic gene family in Plasmodium falciparum comprises the ∼60 var genes, distributed across parasite chromosomes both in the subtelomeres and in internal regions. They encode hypervariable surface proteins known as P. falciparum erythrocyte membrane protein 1 (PfEMP1), which are critical for pathogenesis and immune evasion. How var gene sequence diversity is generated is not yet fully understood. To address this, we constructed large clone trees and performed whole-genome sequence analysis to study the generation of novel var gene sequences in asexually replicating parasites. While single nucleotide polymorphisms (SNPs) were scattered across the genome, structural variants (deletions, duplications, translocations) were concentrated in and around var genes, with considerable variation in frequency between strains. Analysis of more than 100 recombination events involving var exon 1 revealed that the average nucleotide sequence identity of two recombining exons was only 63% (range: 52.7-72.4%), yet the crossovers were error-free and occurred such that the resulting sequence remained in frame and its domain architecture was preserved. Var exon 1, which encodes the immunologically exposed part of the protein, recombined in up to 0.2% of infected erythrocytes per life cycle in vitro. This high rate of var exon 1 recombination indicates that millions of new antigenic structures could potentially be generated each day in a single infected individual. We propose a model in which var gene sequence polymorphism is generated mainly during the asexual part of the life cycle.
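To see how a per-cycle recombination rate can translate into a daily figure, the sketch below scales the upper-bound rate quoted above by the number of infected erythrocytes and by the number of cycles per day. Two inputs are assumptions, not figures from the abstract: an asexual cycle length of about 48 hours, and the parasite burdens in the loop, which are hypothetical round numbers (real infections span several orders of magnitude). The output counts recombination events, not necessarily distinct or viable antigenic variants.

```python
# Upper-bound rate quoted in the abstract: var exon 1 recombined in up to 0.2%
# of infected erythrocytes per asexual life cycle (in vitro).
RECOMB_PER_CYCLE = 0.002
# ASSUMPTION: approximate length of the P. falciparum asexual blood-stage cycle.
CYCLE_HOURS = 48


def recombinants_per_day(infected_rbcs: float) -> float:
    """Upper-bound number of var exon 1 recombination events per day."""
    cycles_per_day = 24 / CYCLE_HOURS
    return infected_rbcs * RECOMB_PER_CYCLE * cycles_per_day


# HYPOTHETICAL parasite burdens, chosen only to show how the estimate scales.
for burden in (1e9, 1e10, 1e11):
    print(f"{burden:.0e} infected erythrocytes -> "
          f"~{recombinants_per_day(burden):.0e} recombination events per day")
```

Even the smallest hypothetical burden here gives on the order of a million events per day, which illustrates why a sub-percent per-cycle rate is consistent with the abstract's "millions" per day.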
Funded by: Medical Research Council: G0600718, MC_EX_MR/L100001/1; NIAID NIH HHS: R01 AI091595; Wellcome Trust: 090770, 098051
PLoS genetics 2014;10;12;e1004812
From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes.
Department of Anatomy, University of Otago, Dunedin, New Zealand. A.C.Clarke@warwick.ac.uk.
Background: Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users.
Results: Here we present an 'A to Z' protocol for obtaining complete human mitochondrial (mtDNA) genomes, from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small organellar genomes from other species, as well as nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling).
Conclusions: All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual 'modules' can be swapped out to suit available resources.
BMC genomics 2014;15;68
Adaptive introgression between Anopheles sibling species eliminates a major genomic island but not reproductive isolation.
Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK.
Adaptive introgression can provide novel genetic variation to fuel rapid evolutionary responses, though it may be counterbalanced by potential for detrimental disruption of the recipient genomic background. We examine the extent and impact of recent introgression of a strongly selected insecticide-resistance mutation (Vgsc-1014F) located within one of two exceptionally large genomic islands of divergence separating the Anopheles gambiae species pair. Here we show that transfer of the Vgsc mutation results in homogenization of the entire genomic island region (~1.5% of the genome) between species. Despite this massive disruption, introgression is clearly adaptive with a dramatic rise in frequency of Vgsc-1014F and no discernable impact on subsequent reproductive isolation between species. Our results show (1) how resilience of genomes to massive introgression can permit rapid adaptive response to anthropogenic selection and (2) that even extreme prominence of genomic islands of divergence can be an unreliable indicator of importance in speciation.
Funded by: NIAID NIH HHS: R01 AI082734, R01AI082734; Wellcome Trust: 090770, WT094960MA
Nature communications 2014;5;4248
A Selective Sweep on a Deleterious Mutation in CPT1A in Arctic Populations.
Department of Archaeology and Anthropology, University of Cambridge, Cambridge CB2 3QG, UK.
Arctic populations live in an environment characterized by extreme cold and the absence of plant foods for much of the year, and they are likely to have undergone genetic adaptations to these conditions during the time they have lived there. Genome-wide selection scans based on genotype data from native Siberians have previously highlighted a 3 Mb region of chromosome 11 containing 79 protein-coding genes as the strongest candidate for positive selection in Northeast Siberians. However, it was not possible to determine which of these genes might be driving the selection signal. Here, using whole-genome high-coverage sequence data, we identified the most likely causative variant as a nonsynonymous G>A transition (rs80356779; c.1436C>T [p.Pro479Leu] on the reverse strand) in CPT1A, a key regulator of mitochondrial long-chain fatty-acid oxidation. Remarkably, the derived allele is associated with hypoketotic hypoglycemia and high infant mortality, yet it occurs at high frequency in Canadian and Greenland Inuit and was also found at 68% frequency in our Northeast Siberian sample. We provide evidence of one of the strongest selective sweeps reported in humans; this sweep has driven the variant to high frequency in circum-Arctic populations within the last 6-23 ka despite its deleterious consequences, possibly because of the selective advantage it originally provided in the context of either a high-fat diet or a cold environment.
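The two descriptions of the CPT1A variant above, G>A and c.1436C>T "on the reverse strand", refer to the same base change seen from opposite DNA strands: complementing each allele of C>T gives G>A, and both are transitions (pyrimidine to pyrimidine, purine to purine). A minimal sketch of that conversion for single-base changes, assuming unambiguous A/C/G/T alleles; the function names are illustrative only.

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PURINES, PYRIMIDINES = frozenset("AG"), frozenset("CT")


def to_opposite_strand(ref: str, alt: str) -> tuple[str, str]:
    """Re-express a single-base change on the opposite strand.

    For a one-base allele, reverse-complementing reduces to complementing.
    """
    return ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


# c.1436C>T is annotated relative to the CPT1A coding (reverse) strand.
fwd = to_opposite_strand("C", "T")
print(f"c.1436C>T on the reverse strand = {fwd[0]}>{fwd[1]} on the forward strand")
print("transition:", is_transition(*fwd))
```

Multi-base alleles would also need to be reversed, not just complemented; the single-base case in the abstract does not require it.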
Funded by: Biotechnology and Biological Sciences Research Council: BB/H002731/1, BB/H005854/1; British Heart Foundation: PG/12/53/29714; European Research Council: 261213; Medical Research Council: G0600717, MC_G0802535, MC_UU_12012/2
American journal of human genetics 2014;95;5;584-589
KLF2 mutation is the most frequent somatic change in splenic marginal zone lymphoma and identifies a subset with distinct genotype.
Division of Molecular Histopathology, Department of Pathology, University of Cambridge, Cambridge, UK.
To characterise the genetics of splenic marginal zone lymphoma (SMZL), we performed whole exome sequencing of 16 cases and identified novel recurrent inactivating mutations in Kruppel-like factor 2 (KLF2), a gene whose deficiency was previously shown to cause splenic marginal zone hyperplasia in mice. KLF2 mutation was found in 40 (42%) of 96 SMZLs, but rarely in other B-cell lymphomas. The majority of KLF2 mutations were frameshift indels or nonsense changes, with missense mutations clustered in the C-terminal zinc finger domains. Functional assays showed that these mutations inactivated the ability of KLF2 to suppress NF-κB activation by TLR, BCR, BAFFR and TNFR signalling. Further extensive investigations revealed common and distinct genetic changes between SMZL with and without KLF2 mutation. IGHV1-2 rearrangement and 7q deletion were primarily seen in SMZL with KLF2 mutation, while MYD88 and TP53 mutations were found almost exclusively in those without KLF2 mutation. NOTCH2, TRAF3, TNFAIP3 and CARD11 mutations were observed in SMZL both with and without KLF2 mutation. Taken together, KLF2 mutation is the most common genetic change in SMZL and identifies a subset with a distinct genotype characterised by multiple genetic changes. These different genetic changes may deregulate various signalling pathways and generate cooperative oncogenic properties, thereby contributing to lymphomagenesis.
Funded by: Medical Research Council; Wellcome Trust: 095663
PolyTB: a genomic variation map for Mycobacterium tuberculosis.
Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, WC1E 7HT London, UK.
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is the second leading cause of death from an infectious disease worldwide. Recent advances in DNA sequencing now make it possible to generate whole-genome information from clinical isolates of the M. tuberculosis complex (MTBC). The identification of informative genetic variants, such as phylogenetic markers and those associated with drug resistance or virulence, will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which can be difficult and computationally intensive to process and to compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039; 63% non-synonymous; 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important metadata (e.g. in silico inferred strain types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as to examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.
Tuberculosis (Edinburgh, Scotland) 2014;94;3;346-54
Confident and sensitive phosphoproteomics using combinations of collision induced dissociation and electron transfer dissociation.
Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Unlabelled: We present a workflow using an ETD-optimised version of Mascot Percolator and a modified version of SLoMo (turbo-SLoMo) for the analysis of phosphoproteomic data. We have benchmarked this workflow against several database searching algorithms and phosphorylation site localisation tools, and show that it offers highly sensitive and confident phosphopeptide identification and site assignment with PSM-level statistics, enabling rigorous comparison of data acquisition methods. We analysed the Plasmodium falciparum schizont phosphoproteome using, for the first time, a data-dependent neutral loss-triggered ETD (DDNL) strategy, alongside a conventional decision-tree method. At a posterior error probability threshold of 0.01, similar numbers of PSMs were identified using both methods, with a 73% overlap in phosphopeptide identifications. The false discovery rate associated with spectral pairs in which DDNL CID/ETD identified the same phosphopeptide was <1%. Without any score filtering, 72% of the phosphorylation site assignments made using turbo-SLoMo were identical, and 99.8% of these cases were associated with a false localisation rate of <5%. We show that DDNL acquisition is a useful approach for phosphoproteomics and increases confidence in phosphopeptide identification without compromising sensitivity or duty cycle. Furthermore, the combination of Mascot Percolator and turbo-SLoMo represents a robust workflow for phosphoproteomic data analysis using CID and ETD fragmentation.
Biological significance: Protein phosphorylation is a ubiquitous post-translational modification that regulates protein function. Mass spectrometry-based approaches have revolutionised its large-scale analysis, but phosphorylation sites are often identified by single phosphopeptides and therefore require more rigorous data analysis to ensure that sites are identified with high confidence before follow-up experiments investigate their biological significance. The coverage and confidence of phosphoproteomic experiments can be enhanced by the use of multiple complementary fragmentation methods. Here we have benchmarked a pipeline for the analysis of phosphoproteomic data generated using CID and ETD fragmentation, and used it to demonstrate the utility of a data-dependent neutral loss-triggered ETD fragmentation strategy for high-confidence phosphopeptide identification and phosphorylation site localisation.
Funded by: Wellcome Trust: 079643/Z/06/Z
Journal of proteomics 2014;103;1-14
Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences.
Background: Population differentiation has proved to be effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes.
Results: We demonstrate that while sites with low differentiation represent sampling effects rather than balancing selection, sites showing extremely high population differentiation are enriched for positive selection events and that one half may be the result of classic selective sweeps. Among these, we rediscover known examples, where we actually identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively.
Conclusions: We identify known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.
Funded by: NCI NIH HHS: R01 CA166661; NHGRI NIH HHS: R01 HG002898, U01 HG006513, U41 HG007234; NIMHD NIH HHS: P20 MD006899; Wellcome Trust: 098051
Genome biology 2014;15;6;R88
Processed pseudogenes acquired somatically during cancer development.
Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
Cancer evolves by mutation, with somatic reactivation of retrotransposons being one such mutational process. Germline retrotransposition can cause processed pseudogenes, but whether this occurs somatically has not been evaluated. Here we screen sequencing data from 660 cancer samples for somatically acquired pseudogenes. We find 42 events in 17 samples, especially non-small cell lung cancer (5/27) and colorectal cancer (2/11). Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications (67%), consensus TTTTAA sites at insertion points, inverted rearrangements (21%), 5' truncation (74%) and polyA tails (88%). Transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes. In addition, a somatic pseudogene that integrated into the promoter and first exon of the tumour suppressor gene, MGA, abrogated expression from that allele. Thus, formation of processed pseudogenes represents a new class of mutation occurring during cancer development, with potentially diverse functional consequences depending on genomic context.
Funded by: Cancer Research UK; NCI NIH HHS: P01 CA155258; Wellcome Trust: 077012/Z/05/Z, 088340, 091730, 092002
Nature communications 2014;5;3644
Genomic identification of a novel co-trimoxazole resistance genotype and its prevalence amongst Streptococcus pneumoniae in Malawi.
Malawi-Liverpool-Wellcome Clinical Research Programme, University of Malawi, College of Medicine, Blantyre, Malawi.
Objectives: This study aimed to define the molecular basis of co-trimoxazole resistance in Malawian pneumococci under the dual selective pressure of widespread co-trimoxazole and sulfadoxine/pyrimethamine use.
Methods: We measured the trimethoprim and sulfamethoxazole MICs and analysed folA and folP nucleotide and translated amino acid sequences for 143 pneumococci isolated from carriage and invasive disease in Malawi (2002-08).
Results: Pneumococci were highly resistant to both trimethoprim and sulfamethoxazole (96%, 137/143). Sulfamethoxazole-resistant isolates showed a 3 or 6 bp insertion in the sulphonamide-binding site of folP. The trimethoprim-resistant isolates fell into three genotypic groups based on dihydrofolate reductase (encoded by folA) mutations: Ile-100-Leu (10%), the Ile-100-Leu substitution together with a residue 92 substitution (56%) and those with a novel uncharacterized resistance genotype (34%). The nucleotide sequence divergence and dN/dS of folA and folP remained stable from 2004 onwards.
Conclusions: S. pneumoniae exhibit almost universal co-trimoxazole resistance in vitro and in silico that we believe is driven by extensive co-trimoxazole and sulfadoxine/pyrimethamine use. More than one-third of pneumococci employ a novel mechanism of co-trimoxazole resistance. Resistance has now reached a point of stabilizing evolution. The use of co-trimoxazole to prevent pneumococcal infection in HIV/AIDS patients in sub-Saharan Africa should be re-evaluated.
Funded by: Wellcome Trust: 101113
The Journal of antimicrobial chemotherapy 2014;69;2;368-74
BioJS: an open source standard for biological visualisation - its status in 2014.
The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK.
BioJS is a community-based standard and repository of functional components to represent biological information on the web. The development of BioJS has been prompted by the growing need for bioinformatics visualisation tools to be easily shared, reused and discovered. Its modular architecture makes it easy for users to find a specific functionality without needing to know how it has been built, while components can be extended or created for implementing new functionality. The BioJS community of developers currently provides a range of functionality that is open access and freely available. A registry has been set up that categorises and provides installation instructions and testing facilities at http://www.ebi.ac.uk/tools/biojs/. The source code for all components is available for ready use at https://github.com/biojs/biojs.
Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.
Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
We have developed a full-genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with SLIM, a novel iterative algorithm implemented in Python, to identify sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1-infected patients. A quantitative assessment of the mammalian, plant and bacterial virus content of this compartment was generated, and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults but may be involved in the pathogenesis of HIV-1 infection. It will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.
Funded by: Wellcome Trust
PloS one 2014;9;4;e93269
Deep sequencing of norovirus genomes defines evolutionary patterns in an urban tropical setting.
The Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
Unlabelled: Norovirus is a highly transmissible infectious agent that causes epidemic gastroenteritis in susceptible children and adults. Norovirus infections can be severe and can be initiated by an exceptionally small number of viral particles. Detailed genome sequence data are useful for tracking norovirus transmission and evolution. To address this need, we have developed a whole-genome deep-sequencing method that generates entire genome sequences from small amounts of clinical specimens. This novel approach employs an algorithm for reverse transcription and PCR amplification primer design that uses all of the publicly available norovirus sequence data. Deep sequencing and de novo assembly were used to generate norovirus genomes from a large set of diarrheal patients attending three hospitals in Ho Chi Minh City, Vietnam, over a 2.5-year period. Positive-selection analysis and direct examination of protein changes in the virus over time identified codons under positive selection in the regions encoding proteins VP1, p48 (NS1-2) and p22 (NS4), expanding the known targets of norovirus evolutionary pressure.
Importance: The high transmissibility and rapid evolutionary rate of norovirus, combined with a short-lived host immune response, are thought to be the reasons why the virus causes the majority of pediatric viral diarrhea cases. The evolutionary patterns of this RNA virus have been described in detail for only a portion of the virus genome and never for a virus from an urban tropical setting. We provide a detailed sequence description of the noroviruses circulating in three Ho Chi Minh City hospitals over a 2.5-year period. This study identified patterns of virus change in known sites of host immune response and identified three additional regions of the virus genome under selection that were not previously recognized. In addition, the method described here provides a robust full-genome sequencing platform for community-based virus surveillance.
Funded by: Wellcome Trust: 100087, WT/093724
Journal of virology 2014;88;19;11056-69
Spread, circulation, and evolution of the Middle East respiratory syndrome coronavirus.
Unlabelled: The Middle East respiratory syndrome coronavirus (MERS-CoV) was first documented in the Kingdom of Saudi Arabia (KSA) in 2012 and, to date, has been identified in 180 cases with 43% mortality. In this study, we determined the MERS-CoV evolutionary rate, documented genetic variants of the virus and their distribution throughout the Arabian Peninsula, and identified the genome positions under positive selection; these features are important for monitoring the adaptation of MERS-CoV to human transmission and for identifying the source of infections. Respiratory samples from confirmed KSA MERS cases from May to September 2013 were subjected to whole-genome deep sequencing, and 32 complete or partial sequences (20 were ≥ 99% complete, 7 were 50 to 94% complete, and 5 were 27 to 50% complete) were obtained, bringing the total number of available MERS-CoV genomic sequences to 65. An evolutionary rate of 1.12 × 10⁻³ substitutions per site per year (95% credible interval [95% CI], 8.76 × 10⁻⁴ to 1.37 × 10⁻³) was estimated, placing the time to the most recent common ancestor in March 2012 (95% CI, December 2011 to June 2012). Only one MERS-CoV codon, spike 1020, located in a domain required for cell entry, is under strong positive selection. Four KSA MERS-CoV phylogenetic clades were found, with 3 clades apparently no longer contributing to current cases. The size of the population infected with MERS-CoV showed a gradual increase until June 2013, followed by a decline, possibly due to increased surveillance and infection control measures combined with a basic reproduction number (R0) for the virus that is less than 1.
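To put the estimated rate in more intuitive units, it can be multiplied by genome length to give substitutions per genome per year. The genome length used here, roughly 30,000 nucleotides, is an approximation supplied for illustration and is not a figure from the abstract; the rate and credible interval are those quoted above.

```python
RATE = 1.12e-3             # substitutions per site per year (abstract)
CI = (8.76e-4, 1.37e-3)    # 95% credible interval (abstract)
GENOME_LENGTH = 30_000     # ASSUMPTION: approximate MERS-CoV genome length in nt


def subs_per_genome_per_year(rate: float, length: int = GENOME_LENGTH) -> float:
    """Convert a per-site substitution rate into a whole-genome rate."""
    return rate * length


point = subs_per_genome_per_year(RATE)
low, high = (subs_per_genome_per_year(r) for r in CI)
print(f"~{point:.1f} substitutions/genome/year (95% CI ~{low:.1f} to {high:.1f})")
```

Under this assumed genome length, the point estimate corresponds to about 33.6 substitutions per genome per year (interval roughly 26.3 to 41.1); a different length assumption scales these numbers proportionally.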
Importance: MERS-CoV adaptation toward higher rates of sustained human-to-human transmission appears not to have occurred yet. While MERS-CoV transmission currently appears weak, careful monitoring of changes in MERS-CoV genomes and of the MERS epidemic should be maintained. The observation of phylogenetically related MERS-CoV in geographically diverse locations must be taken into account in efforts to identify the animal source and transmission of the virus.
Funded by: Wellcome Trust: 095831
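The evolutionary rate above is given per site per year; multiplying it by genome length converts it into the expected number of substitutions accumulating across a whole genome each year, under a simple strict-clock reading of the estimate. In this minimal sketch the genome length of about 30,000 nt is an approximate value assumed for MERS-CoV (it is not stated in the abstract), and the function name is hypothetical.

```python
def substitutions_per_genome_per_year(rate_per_site_per_year: float,
                                      genome_length_nt: int) -> float:
    """Expected substitutions per genome per year under a strict molecular clock."""
    return rate_per_site_per_year * genome_length_nt


# Point estimate and 95% credible-interval bounds taken from the abstract.
RATE = 1.12e-3
RATE_LOW, RATE_HIGH = 8.76e-4, 1.37e-3
# Assumption: approximate MERS-CoV genome length (~30 kb); not given in the abstract.
GENOME_LENGTH = 30_000

for label, rate in [("low", RATE_LOW), ("point", RATE), ("high", RATE_HIGH)]:
    print(f"{label}: {substitutions_per_genome_per_year(rate, GENOME_LENGTH):.1f}")
```

Under these assumptions the point estimate corresponds to roughly 34 substitutions per genome per year, and the credible interval to roughly 26 to 41.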
The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode.
Background: Globodera pallida is a devastating pathogen of potato crops, making it one of the most economically important plant parasitic nematodes. It is also an important model for the biology of cyst nematodes. Cyst nematodes and root-knot nematodes are the two most important plant parasitic nematode groups and together represent a global threat to food security.
Results: We present the complete genome sequence of G. pallida, together with transcriptomic data from most of the nematode life cycle, particularly focusing on the life cycle stages involved in root invasion and establishment of the biotrophic feeding site. Despite the relatively close phylogenetic relationship with root-knot nematodes, we find very different gene family content in the two groups and, in particular, extensive differences in the repertoire of effectors, including an enormous expansion of the SPRY domain protein family in G. pallida, which includes the SPRYSEC family of effectors. This highlights the distinct biology of cyst nematodes compared to the root-knot nematodes that were, until now, the only sedentary plant parasitic nematodes for which genome information was available. We also present in-depth descriptions of the repertoires of other genes likely to be important in understanding the unique biology of cyst nematodes and of potential drug targets and other targets for their control.
Conclusions: The data and analyses we present will be central in exploiting post-genomic approaches in the development of much-needed novel strategies for the control of G. pallida and related pathogens.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F000642/1, BB/F00334X/1, BB/G007071/1
Genome biology 2014;15;3;R43
Genome-wide association study of sexual maturation in males and females highlights a role for body mass and menarche loci in male puberty.
Institute for Molecular Medicine Finland (FIMM).
Little is known about genes regulating male puberty. Further, while many identified pubertal timing variants are associated with age at menarche (a late manifestation of puberty) and with body mass, little is known about these variants' relationship to pubertal initiation or tempo. To address these questions, we performed a genome-wide association meta-analysis in over 11 000 European samples with data on early pubertal traits: male genital and female breast development, measured on the Tanner scale. We report the first genome-wide significant locus for male sexual development, upstream of myocardin-like 2 (MKL2) (P = 8.9 × 10⁻⁹), and a menarche locus tagging a developmental pathway that links earlier puberty with reduced pubertal growth (P = 4.6 × 10⁻⁵) and short adult stature (P = 7.5 × 10⁻⁶) in both males and females. Furthermore, our results indicate that a proportion of menarche loci are important for pubertal initiation in both sexes. Consistent with epidemiological correlations between increased prepubertal body mass and earlier pubertal timing in girls, body mass index (BMI)-increasing alleles correlated with earlier breast development. In boys, some BMI-increasing alleles were associated with earlier, and others with delayed, sexual development; these genetic results mirror the controversy in epidemiological studies, some of which show opposing correlations between prepubertal BMI and male puberty. Our results contribute to our understanding of the pubertal initiation program in both sexes and indicate that, although mechanisms regulating pubertal onset in males and females may largely be shared, the relationship between body mass and pubertal timing in boys may be complex and requires further genetic studies.
Funded by: CCR NIH HHS: NIMH 1RC2MH089995-01; Canadian Institutes of Health Research: MOP-82893; Department of Health; Medical Research Council: G0000934, G9815508, MC_PC_15018, MC_U106179472, MC_UP_A620_1014, MC_UU_12011/1, MC_UU_12013/1, MC_UU_12013/3, MC_UU_12015/2; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: RC2 MH089951, RC2 MH089995; Wellcome Trust: 068545/Z/02, 076113, 090532, 092731, 098051, 098395, 102215
Human molecular genetics 2014;23;16;4452-64
Quantitation of malaria parasite-erythrocyte cell-cell interactions using optical tweezers.
Cavendish Laboratory, University of Cambridge, Cambridge, United Kingdom.
Erythrocyte invasion by Plasmodium falciparum merozoites is an essential step for parasite survival and hence the pathogenesis of malaria. Invasion has been studied intensively, but our cellular understanding has been limited by the fact that it occurs very rapidly: invasion is generally complete within 1 min, and shortly thereafter the merozoites, at least in in vitro culture, lose their invasive capacity. The rapid nature of the process, and hence the narrow time window in which measurements can be taken, have limited the tools available to quantitate invasion. Here we employ optical tweezers to study individual invasion events for what we believe is the first time, showing that newly released P. falciparum merozoites, delivered via optical tweezers to a target erythrocyte, retain their ability to invade. Even spent merozoites, which had lost the ability to invade, retain the ability to adhere to erythrocytes and can still induce transient local deformations of the erythrocyte membrane. We use this technology to measure the strength of the adhesive force between merozoites and erythrocytes, and to probe the cellular mode of action of known invasion inhibitory treatments. These data add to our understanding of the erythrocyte-merozoite interactions that occur during invasion, and demonstrate the power of optical tweezers technologies in unraveling the blood-stage biology of malaria.
Funded by: Wellcome Trust: 098051
Biophysical journal 2014;107;4;846-53
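For small displacements from its centre, an optical trap behaves like a Hookean spring, so the force it exerts on a trapped object is the calibrated trap stiffness multiplied by the object's displacement (F = κ·Δx). The abstract does not give calibration values or the exact force-measurement protocol, so the sketch below, with a hypothetical function name and hypothetical stiffness and displacement values, only illustrates how such a force estimate is obtained.

```python
def trap_force_pn(stiffness_pn_per_nm: float, displacement_nm: float) -> float:
    """Restoring force (pN) of an optical trap in its linear (Hookean) regime.

    Valid only for displacements small enough that the trap stays linear.
    """
    if stiffness_pn_per_nm <= 0:
        raise ValueError("trap stiffness must be positive")
    return stiffness_pn_per_nm * displacement_nm


# Hypothetical values: stiffness 0.05 pN/nm and a displacement of 120 nm
# from the trap centre at the moment an adhered merozoite detaches.
print(trap_force_pn(0.05, 120.0))
```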
Evidence for soft selective sweeps in the evolution of pneumococcal multidrug resistance and vaccine escape.
Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts; Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The multidrug-resistant Streptococcus pneumoniae Taiwan(19F)-14, or PMEN14, clone was first observed with a 19F serotype, which is targeted by the heptavalent polysaccharide conjugate vaccine (PCV7). However, "vaccine escape" PMEN14 isolates with a 19A serotype became an increasingly important cause of disease post-PCV7. Whole genome sequencing was used to characterize the recent evolution of 173 pneumococci of, or related to, PMEN14. This suggested that PMEN14 is a single lineage that originated in the late 1980s in parallel with the acquisition of multiple resistances by close relatives. One of the four detected serotype switches to 19A generated representatives of the sequence type (ST) 320 isolates that have been highly successful post-PCV7. A second produced an ST236 19A genotype with reduced resistance to β-lactams owing to alteration of pbp1a and pbp2x sequences through the same recombination that caused the change in serotype. A third, which generated a mosaic capsule biosynthesis locus, resulted in serotype 19A ST271 isolates. The rapid diversification through homologous recombination seen in the global collection was similarly observed in the absence of vaccination in a set of isolates from the Maela refugee camp in Thailand, a collection that also allowed variation to be observed within carriage through longitudinal sampling. This suggests that some pneumococcal genotypes generate a pool of standing variation that is sufficiently extensive to result in "soft" selective sweeps: the emergence of multiple mutants in parallel upon a change in selection pressure, such as vaccine introduction. The subsequent competition between these mutants makes this phenomenon difficult to detect without deep sampling of individual lineages.
Funded by: Wellcome Trust: 083735/Z/07/Z, 098051
Genome biology and evolution 2014;6;7;1589-602
Diversification of bacterial genome content through distinct mechanisms over different timescales.
Centre for Communicable Disease Dynamics, Harvard School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, USA; Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College, London W2 1PG, UK.
Bacterial populations often consist of multiple co-circulating lineages. Determining how such population structures arise requires understanding what drives bacterial diversification. Using 616 systematically sampled genomes, we show that Streptococcus pneumoniae lineages are typically characterized by combinations of infrequently transferred stable genomic islands: those moving primarily through transformation, along with integrative and conjugative elements and phage-related chromosomal islands. The only lineage containing extensive unique sequence corresponds to a set of atypical unencapsulated isolates that may represent a distinct species. However, prophage content is highly variable even within lineages, suggesting frequent horizontal transmission that would necessitate rapidly diversifying anti-phage mechanisms to prevent these viruses sweeping through populations. Correspondingly, two loci encoding Type I restriction-modification systems able to change their specificity over short timescales through intragenomic recombination are ubiquitous across the collection. Hence short-term pneumococcal variation is characterized by movement of phage and intragenomic rearrangements, with the slower transfer of stable loci distinguishing lineages.
Funded by: NIAID NIH HHS: R01 AI066304, R01AI066304; Wellcome Trust: 098051
Nature communications 2014;5;5471
Variable recombination dynamics during the emergence, transmission and 'disarming' of a multidrug-resistant pneumococcal clone.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Background: Pneumococcal β-lactam resistance was first detected in Iceland in the late 1980s, and subsequently peaked at almost 25% of clinical isolates in the mid-1990s largely due to the spread of the internationally-disseminated multidrug-resistant PMEN2 (or Spain6B-2) clone of Streptococcus pneumoniae.
Results: Analysis of whole genome sequences from an international collection of 189 isolates estimated that PMEN2 emerged around the late 1960s, developing resistance through multiple homologous recombinations and the acquisition of a Tn5253-type integrative and conjugative element (ICE). Two distinct clades entered Iceland in the 1980s, one of which had acquired a macrolide resistance cassette and was estimated by coalescent analysis to have risen sharply in prevalence. Transmission within the island appeared to emanate mainly from Reykjavík and the Southern Peninsula, and the bacteria evolved effectively clonally, mainly because a prophage disrupted a gene necessary for genetic transformation in many isolates. A subsequent decline in PMEN2's prevalence in Iceland coincided with a nationwide campaign that reduced dispensing of antibiotics to children in an attempt to limit its spread. Specific mutations causing inactivation or loss of ICE-borne resistance genes were identified from the genome sequences of isolates that reverted to drug-susceptible phenotypes around this time. Phylogenetic analysis revealed that some of these occurred on multiple occasions in parallel, suggesting they may have been at least temporarily advantageous. However, alteration of 'core' sequences associated with resistance was precluded by the absence of any substantial homologous recombination events.
Conclusions: PMEN2's clonal evolution was successful over the short-term in a limited geographical region, but its inability to alter major antigens or 'core' gene sequences associated with resistance may have prevented persistence over longer timespans.
Funded by: Wellcome Trust: 098051
BMC biology 2014;12;49
Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Center for Communicable Disease Dynamics, Harvard School of Public Health, 677 Longwood Avenue, Boston, MA 02115, USA; Department of Infectious Disease Epidemiology, Imperial College London, St. Mary's Campus, Norfolk Place, London W2 1PG, UK.
The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates' recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer, while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate that the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X.
Funded by: Medical Research Council: MR/K010174/1, MR/L015080/1; Wellcome Trust: 098051
Nucleic acids research 2014;43;3;e15
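The spatial-scanning idea behind Gubbins is that stretches of sequence where base substitutions are much denser than the genome-wide background are candidate recombination imports. The sketch below is a deliberately simplified, hypothetical illustration of that idea, not Gubbins' own code: it slides a fixed-width window over a single list of SNP positions and applies a one-sided binomial test (our choice for illustration) with a Bonferroni correction across windows. Gubbins itself works on substitutions reconstructed on each branch of the phylogeny and iterates between tree building and recombination detection, which this sketch omits. Function names, window size, step and significance level are all assumptions.

```python
import bisect
import math
from typing import List, Tuple


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), accumulated in log space for stability."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    tail = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        tail += math.exp(log_pmf)
    return min(tail, 1.0)


def scan_snp_density(snp_positions: List[int], genome_length: int,
                     window: int = 200, step: int = 50,
                     alpha: float = 0.05) -> List[Tuple[int, int, int]]:
    """Flag windows [start, end) whose SNP count exceeds the genome-wide background.

    Returns (start, end, snp_count) for each significant window.
    """
    if not snp_positions or genome_length < window:
        return []
    background = len(snp_positions) / genome_length  # SNPs per site
    if background >= 1.0:
        return []  # every site polymorphic: no window can stand out
    positions = sorted(snp_positions)
    starts = range(0, genome_length - window + 1, step)
    threshold = alpha / len(starts)  # Bonferroni correction across windows
    hits = []
    for start in starts:
        end = start + window
        # Count SNPs in [start, end) with two binary searches.
        count = bisect.bisect_left(positions, end) - bisect.bisect_left(positions, start)
        if binomial_upper_tail(count, window, background) < threshold:
            hits.append((start, end, count))
    return hits
```

For example, on a 10 kb sequence with 20 evenly spaced background SNPs plus a dense cluster of 20 SNPs between positions 5,010 and 5,105, only the windows overlapping the cluster are flagged.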
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has become a fully supported service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
Funded by: Biotechnology and Biological Sciences Research Council: BB/I025360/2, BB/I025506/1, BB/K009524/1, BB/L024225/1; NHGRI NIH HHS: U41 HG007234, U41HG007234; NICHD NIH HHS: 1R01HD074078; Wellcome Trust: 095908, WT098051
Nucleic acids research 2014;43;Database issue;D662-9
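Because the REST server speaks plain HTTP(S), it can be queried from any language without an Ensembl-specific client. A minimal Python sketch using only the standard library, assuming the `lookup/id` endpoint and JSON output requested through a `Content-Type` header; the helper names `lookup_url` and `lookup` are hypothetical, and calling `lookup` requires network access.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id: str, server: str = ENSEMBL_REST) -> str:
    """Build the URL of the REST 'lookup/id' endpoint for an Ensembl stable ID."""
    return f"{server}/lookup/id/{stable_id}"


def lookup(stable_id: str, server: str = ENSEMBL_REST) -> dict:
    """Fetch and decode the JSON record for a stable ID (needs network access)."""
    request = urllib.request.Request(
        lookup_url(stable_id, server),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `lookup("ENSG00000157764")` should return the record for human BRAF, with fields such as `display_name`, `seq_region_name`, `start` and `end`. Passing `server="https://grch37.rest.ensembl.org"` is assumed here to target the GRCh37 service instead.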
Time between collection and storage significantly influences bacterial sequence composition in sputum samples from cystic fibrosis respiratory infections.
NERC Centre for Ecology & Hydrology, Wallingford, United Kingdom; Institute of Pharmaceutical Science, Molecular Microbiology Research Laboratory, King's College London, London, United Kingdom.
Spontaneously expectorated sputum is traditionally used as the sampling method for the investigation of lower airway infections. While guidelines exist for the handling of these samples for culture-based diagnostic microbiology, there is no comparable consensus on their handling prior to culture-independent analysis. Because culture-independent approaches are increasingly incorporated into diagnostic microbiology, it is of critical importance to assess potential biases. The aim of this study was to assess the impact of delayed freezing on culture-independent microbiological analyses and to identify acceptable parameters for sample handling. Sputum samples from eight adult cystic fibrosis (CF) patients were collected and aliquoted into sterile Bijou bottles. Aliquots were stored at room temperature for increasing intervals, up to 72 h, before being frozen at -80 °C. Samples were treated with propidium monoazide to distinguish live from dead cells prior to DNA extraction, and 16S rRNA gene pyrosequencing was used to characterize their bacterial compositions. Over time, substantial variation was observed in samples with high-diversity bacterial communities, whereas little variation was observed in low-diversity communities dominated by recognized CF pathogens, regardless of time to freezing. Partitioning into common and rare species demonstrated that the rare species drove changes in similarity. Over the course of the study, the percentage abundance of anaerobes decreased significantly after 12 h at room temperature (P = 0.008). Failure to stabilize samples at -80 °C within 12 h of collection results in significant changes in the detected community composition.
Funded by: NHLBI NIH HHS: K02HL105543; NIDDK NIH HHS: P30 DK089507; Wellcome Trust: WT 098051
Journal of clinical microbiology 2014;52;8;3011-6
From genome-wide association study hits to new insights into experimental hematology.
Department of Haematology, University of Cambridge, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Despite significant improvements in our knowledge of the mechanisms of normal and pathological hematopoiesis, our current understanding is most likely an oversimplification of the complexity of regulatory networks at play. Thus, considerable efforts have been made to catalogue the total sum of germline alterations in individual genomes affecting human hematopoiesis. These efforts ultimately led to the discovery of a large number of new genes not previously implicated in blood formation. Although identification of novel genes is important in revealing the profiles of genetic variations associated with normal hematopoiesis, further functional studies are necessary to improve our understanding of the mechanism(s) involved in these processes. In this review, we summarize the knowledge gained from genome-wide association studies to elucidate the relationship between genetics and blood cell traits. We discuss the most important recent advances, with an emphasis on functional follow-up studies that have been particularly useful in providing an insight into novel regulatory processes that influence blood cell formation and function. We also discuss potential future directions and challenges in the field.
Funded by: Cancer Research UK: C45041/A14953
Experimental hematology 2014;42;8;630-6
Streptococcus agalactiae clones infecting humans were selected and fixed through the extensive use of tetracycline.
Institut Pasteur, Unité de Biologie des Bactéries Pathogènes à Gram-positif, Paris 75015, France.
Streptococcus agalactiae (Group B Streptococcus, GBS) is a commensal of the digestive and genitourinary tracts of humans that emerged as the leading cause of bacterial neonatal infections in Europe and North America during the 1960s. Due to the lack of epidemiological and genomic data, the reasons for this emergence are unknown. Here we show by comparative genome analysis and phylogenetic reconstruction of 229 isolates that the rise of human GBS infections corresponds to the selection and worldwide dissemination of only a few clones. The parallel expansion of the clones is preceded by the insertion of integrative and conjugative elements conferring tetracycline resistance (TcR). Thus, we propose that the use of tetracycline from 1948 onwards led in humans to the complete replacement of a diverse GBS population by only a few TcR clones particularly well adapted to their host, causing the observed emergence of GBS diseases in neonates.
Funded by: Wellcome Trust: 079643, 098051
Nature communications 2014;5;4544
Structure and computational analysis of a novel protein with metallopeptidase-like and circularly permuted winged-helix-turn-helix domains reveals a possible role in modified polysaccharide biosynthesis.
Joint Center for Structural Genomics, La Jolla, CA, USA.
Background: CA_C2195 from Clostridium acetobutylicum is a protein of unknown function. Sequence analysis predicted that part of the protein contained a metallopeptidase-related domain. There are over 200 homologs of similar size in large sequence databases such as UniProt, with pairwise sequence identities in the range of ~40-60%. CA_C2195 was chosen for crystal structure determination for structure-based function annotation of novel protein sequence space.
Results: The structure confirmed that CA_C2195 contained an N-terminal metallopeptidase-like domain. The structure revealed two extra domains: an α+β domain inserted in the metallopeptidase-like domain and a C-terminal circularly permuted winged-helix-turn-helix domain.
Conclusions: Based on our sequence and structural analyses using the crystal structure of CA_C2195, we provide a view into the possible functions of the protein. Using contextual information from gene-neighborhood analysis, we propose that, rather than being a peptidase, CA_C2195 and its homologs might play a role in biosynthesis of a modified cell-surface carbohydrate in conjunction with several sugar-modification enzymes. These results provide the groundwork for experimental verification of the function.
Funded by: Intramural NIH HHS; Medical Research Council: MC_U105192716; NIGMS NIH HHS: R01GM101457, U54 GM094586; Wellcome Trust: WT077044/Z/05/Z
BMC bioinformatics 2014;15;75
Emergence of scarlet fever Streptococcus pyogenes emm12 clones in Hong Kong is associated with toxin acquisition and multidrug resistance.
Australian Infectious Diseases Research Centre, School of Chemistry and Molecular Biosciences, University of Queensland, St. Lucia, Queensland, Australia; Wellcome Trust Sanger Institute, Hinxton, UK.
A scarlet fever outbreak began in mainland China and Hong Kong in 2011 (refs. 1-6). Macrolide- and tetracycline-resistant Streptococcus pyogenes emm12 isolates represent the majority of clinical cases. Recently, we identified two mobile genetic elements that were closely associated with emm12 outbreak isolates: the integrative and conjugative element ICE-emm12, encoding genes for tetracycline and macrolide resistance, and prophage ΦHKU.vir, encoding the superantigens SSA and SpeC, as well as the DNase Spd1 (ref. 4). Here we sequenced the genomes of 141 emm12 isolates, including 132 isolated in Hong Kong between 2005 and 2011. We found that the introduction of several ICE-emm12 variants, ΦHKU.vir and a new prophage, ΦHKU.ssa, occurred in three distinct emm12 lineages late in the twentieth century. Acquisition of ssa and transposable elements encoding multidrug resistance genes triggered the expansion of scarlet fever-associated emm12 lineages in Hong Kong. The occurrence of multidrug-resistant ssa-harboring scarlet fever strains should prompt heightened surveillance within China and abroad for the dissemination of these mobile genetic elements.
Funded by: Wellcome Trust: 100891
Nature genetics 2014;47;1;84-7
The correlation between reading and mathematics ability at age twelve has a substantial genetic component.
Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London WC1E 6BT, UK; King's College London, Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, London SE5 8AF, UK.
Dissecting how genetic and environmental influences impact on learning is helpful for maximizing numeracy and literacy. Here we show, using twin and genome-wide analysis, that there is a substantial genetic component to children's ability in reading and mathematics, and estimate that around one half of the observed correlation in these traits is due to shared genetic effects (so-called Generalist Genes). Thus, our results highlight the potential role of the learning environment in contributing to differences in a child's cognitive abilities at age twelve.
Funded by: European Research Council: 295366; Medical Research Council: G0000934, G0400126, G0901245, G19/2, G9815508, MC_PC_15018; NICHD NIH HHS: R01 HD068728; Wellcome Trust: 068545/Z/02, 075491/Z/04/B, 085475/B/08/Z, 090532, 090532/Z/09/Z, 095552, 097364/Z/11/Z, 102215, 85475/Z/08/Z, WT088984
Nature communications 2014;5;4204
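In a bivariate twin model, the phenotypic correlation between two traits can be split into a genetic part, √(h²₁)·√(h²₂)·r_G, where h² is each trait's heritability and r_G the genetic correlation, plus parts attributable to environmental influences. The abstract reports that roughly half of the reading-mathematics correlation is genetic but does not list the underlying heritabilities or genetic correlation, so the inputs below are hypothetical and chosen only to illustrate the arithmetic; the function name is likewise hypothetical.

```python
import math


def genetic_share_of_correlation(h2_trait1: float, h2_trait2: float,
                                 genetic_corr: float, phenotypic_corr: float) -> float:
    """Fraction of a phenotypic correlation explained by shared genetic effects.

    Genetic contribution = sqrt(h2_1) * sqrt(h2_2) * r_G (bivariate twin model).
    """
    if phenotypic_corr == 0:
        raise ValueError("phenotypic correlation must be non-zero")
    genetic_part = math.sqrt(h2_trait1) * math.sqrt(h2_trait2) * genetic_corr
    return genetic_part / phenotypic_corr


# Hypothetical inputs (not taken from the study): both heritabilities 0.5,
# genetic correlation 0.6, phenotypic correlation 0.6.
print(genetic_share_of_correlation(0.5, 0.5, 0.6, 0.6))
```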
Spatial and temporal diversity in genomic instability processes defines lung cancer evolution.
Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London WC1E 6BT, UK.
Spatial and temporal dissection of the genomic changes occurring during the evolution of human non-small cell lung cancer (NSCLC) may help elucidate the basis for its dismal prognosis. We sequenced 25 spatially distinct regions from seven operable NSCLCs and found evidence of branched evolution, with driver mutations arising before and after subclonal diversification. There was pronounced intratumor heterogeneity in copy number alterations, translocations, and mutations associated with APOBEC cytidine deaminase activity. Despite maintained carcinogen exposure, tumors from smokers showed a relative decrease in smoking-related mutations over time, accompanied by an increase in APOBEC-associated mutations. In tumors from former smokers, genome-doubling occurred within a smoking-signature context before subclonal diversification, which suggested that a long period of tumor latency had preceded clinical detection. The regionally separated driver mutations, coupled with the relentless and heterogeneous nature of the genome instability processes, are likely to confound treatment success in NSCLC.
Funded by: Cancer Research UK: A11590, A17786, A19310, A4688; Medical Research Council: G0902275; Wellcome Trust: 088340, 091730, 105104
Science (New York, N.Y.) 2014;346;6206;251-6
Genome sequencing of disease and carriage isolates of nontypeable Haemophilus influenzae identifies discrete population structure.
Novartis Vaccines, 53100 Siena, Italy.
One of the main hurdles to developing an effective and broadly protective vaccine against nonencapsulated isolates of Haemophilus influenzae (NTHi) is the genetic diversity of the species, which makes the identification of cross-protective candidate antigens extremely difficult. To assess whether a population structure of NTHi could be defined, we performed genome sequencing of a collection of diverse clinical isolates representative of both carriage and disease and of the diversity of the natural population. Analysis of the distribution of polymorphic sites in the core genome and of the composition of the accessory genome defined distinct evolutionary clades and supported a predominantly clonal evolution of NTHi, with the majority of genetic information transmitted vertically within lineages. A correlation was found between the population structure and the presence of selected surface-associated proteins and lipooligosaccharide structure, known to contribute to virulence. This high-resolution, genome-based population structure of NTHi provides the foundation for a better understanding of NTHi adaptation to the host, as well as its commensal and virulence behavior, that could facilitate intervention strategies against disease caused by this important human pathogen.
Proceedings of the National Academy of Sciences of the United States of America 2014;111;14;5439-44
Chromatin landscapes of retroviral and transposon integration profiles.
Computational Cancer Biology Group, Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands; Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands.
The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have biases that may strongly affect research outcomes. To address this issue, we generated very large datasets consisting of ~120,000 to ~180,000 unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons, and the Mouse Mammary Tumor Virus (MMTV). We analyzed ~80 (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. More distinct preferences were observed for the two transposons, with PB showing remarkable resemblance to bias profiles of the Murine Leukemia Virus. Furthermore, we present a model in which target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems and is defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are observed mainly at smaller scales and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7%-33% of CISs as putative false positives, which are likely not the result of the oncogenic selection process. Moreover, the results indicate that PB is better suited than SB to tagging oncogenes.
PLoS genetics 2014;10;4;e1004250
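Common Integration Sites are genomic regions hit by insertions in more tumours than expected by chance. As a deliberately simplified, hypothetical sketch, the function below flags fixed-size, non-overlapping windows that contain insertions from at least a minimum number of distinct tumours. The window size, threshold and function name are assumptions, and published CIS callers (for example Gaussian kernel convolution approaches) use statistical background models that this sketch omits.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Insertion = Tuple[str, str, int]  # (tumour_id, chromosome, position)


def call_cis(insertions: List[Insertion], window: int = 10_000,
             min_tumours: int = 3) -> List[Tuple[str, int, int, int]]:
    """Return (chromosome, start, end, n_tumours) for windows hit in >= min_tumours tumours."""
    tumours_per_window: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for tumour, chrom, pos in insertions:
        # Assign each insertion to a fixed-width window; a set counts each
        # tumour once, however many insertions it has in that window.
        tumours_per_window[(chrom, pos // window)].add(tumour)
    calls = []
    for (chrom, idx), tumours in sorted(tumours_per_window.items()):
        if len(tumours) >= min_tumours:
            calls.append((chrom, idx * window, (idx + 1) * window, len(tumours)))
    return calls
```

Because the windows do not overlap, a cluster straddling a window boundary can be split and missed. A fixed count threshold also treats every region as equally likely to be targeted, which is the assumption that the integration biases described above call into question.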
Synaptic, transcriptional and chromatin genes disrupted in autism.
The genetic architecture of autism spectrum disorder involves the interplay of common and rare variants and their impact on hundreds of genes. Using exome sequencing, here we show that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, plus a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic formation, transcriptional regulation and chromatin-remodelling pathways. These include voltage-gated ion channels regulating the propagation of action potentials, pacemaking and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodellers, most prominently those that mediate post-translational lysine methylation/demethylation modifications of histones.
Funded by: Howard Hughes Medical Institute; Medical Research Council: G0500870, MR/L010305/1; NCATS NIH HHS: UL1 TR000445, UL1TR000445; NCRR NIH HHS: 5UL1 RR024975, UL1 RR024975; NHGRI NIH HHS: T32 HG002295, U54 HG003067; NICHD NIH HHS: P30 HD015052, P30 HD15052, P50 HD055751; NIMH NIH HHS: MH077139, MH089482, MH095034, R01 MH061009, R01 MH077139, R01 MH083565, R01 MH089208, R01 MH089482, R01 MH094400, R01 MH095034, R01 MH095797, R01 MH097849, R01MH083565, R01MH089208, R37 MH057881, RC2 MH089952, RC2MH089952, U01 MH100209, U01 MH100229, U01 MH100233, U01 MH100239, U01MH100209, U01MH100229, U01MH100233, U01MH100239; NINDS NIH HHS: R01 NS073601; Wellcome Trust: 091986, WT091310, WT098051
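The abstract reports genes at FDR thresholds of 0.05 and 0.30 without naming the procedure, so the sketch below should not be read as the study's method. To illustrate what an FDR cutoff does, it implements the generic Benjamini-Hochberg step-up procedure, a swapped-in textbook technique: sort the m p-values, find the largest rank k with p_(k) ≤ (k/m)·q, and reject the k smallest. The function name and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies on or under the line rank/m * q.
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values.
p = [0.041, 0.001, 0.205, 0.039, 0.008, 0.06, 0.042, 0.074]
print(benjamini_hochberg(p, q=0.05))
print(benjamini_hochberg(p, q=0.30))
```

Relaxing q from 0.05 to 0.30, as in the second, larger tier of genes above, can only enlarge the rejected set.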
Knowing who to trust: exploring the role of 'ethical metadata' in mediating risk of harm in collaborative genomics research in Africa.
Department of Medicine, University of Cape Town, Anzio Road Observatory, Cape Town 7925, South Africa.
Background: The practice of making datasets publicly available for use by the wider scientific community has become firmly integrated in genomic science. One significant gap in the literature around data sharing concerns how it affects scientists' ability to preserve values and ethical standards that form an essential component of scientific collaborations. We conducted a qualitative sociological study examining the potential for harm to ethnic groups, and implications of such ethical concerns for data sharing. We focused our empirical work on the MalariaGEN Consortium, one of the first international collaborative genomics research projects in Africa.
Methods: We conducted a study in three MalariaGEN project sites in Kenya, the Gambia, and the United Kingdom. The study entailed analysis of project documents and 49 semi-structured interviews with fieldworkers, researchers and ethics committee members.
Results: Concerns about how best to address the potential for harm to ethnic groups in MalariaGEN crystallised in discussions about the development of a data sharing policy. Particularly concerning for researchers was how best to manage the sharing of genomic data outside of the original collaboration. Within MalariaGEN, genomic data is accompanied by information about the locations of sample collection, the limitations of consent and ethics approval, and the values and relations that accompanied sample collection. For interviewees, this information and context were of important ethical value in safeguarding against harmful uses of data, but are not customarily shared with secondary data users. This challenged the ability of primary researchers to protect against harmful uses of 'their' data.
Conclusion: We identified three protective mechanisms (trust, the existence of a shared morality, and detailed contextual understanding) that together might play an important role in preventing the use of genomic data in ways that could harm the ethnic groups included in the study. We suggest that the current practice of sharing datasets as isolated objects rather than as embedded within a particular scientific culture, without regard for the normative context within which samples were collected, may cause ethical tensions to emerge that could have been prevented or addressed had the 'ethical metadata' that accompanies genomic data also been shared.
Funded by: Medical Research Council: G0600718; Wellcome Trust: 090532, 090770, 090770/Z/09/Z, 091758, WT076934/Z/05/Z, WT077383/Z/05/Z, WT083326, WT087285, WT096527
BMC medical ethics 2014;15;62
Large-scale discovery of novel genetic causes of developmental disorders.
Despite three decades of successful, predominantly phenotype-driven discovery of the genetic causes of monogenic disorders, up to half of children with severe developmental disorders of probable genetic origin remain without a genetic diagnosis. Particularly challenging are those disorders rare enough to have eluded recognition as a discrete clinical entity, those with highly variable clinical manifestations, and those that are difficult to distinguish from other, very similar, disorders. Here we demonstrate the power of using an unbiased genotype-driven approach to identify subsets of patients with similar disorders. By studying 1,133 children with severe, undiagnosed developmental disorders, and their parents, using a combination of exome sequencing and array-based detection of chromosomal rearrangements, we discovered 12 novel genes associated with developmental disorders. These newly implicated genes increase the proportion of children who could be diagnosed by 10% (from 28% to 31%). Clustering of missense mutations in six of these newly implicated genes suggests that normal development is being perturbed by an activating or dominant-negative mechanism. Our findings demonstrate the value of adopting a comprehensive strategy, both genome-wide and nationwide, to elucidate the underlying causes of rare genetic disorders.
Funded by: Chief Scientist Office: CZD/16/6; Department of Health; Medical Research Council: MC_PC_U127561093; Wellcome Trust: 091986, 098395, 100140, WT098051
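The abstract infers an activating or dominant-negative mechanism from the clustering of missense mutations. The sketch below shows one generic way to ask whether mutations are more clustered than chance, using a permutation test. It is illustrative only and not the study's published test. The assumptions are that mutations fall uniformly along the protein under the null, that the statistic is the mean pairwise distance between mutated residues, and that the function names are hypothetical. A production test would also model position-specific mutability.

```python
import random
import statistics


def mean_pairwise_distance(positions):
    """Mean absolute distance over all pairs of mutated residue positions."""
    if len(positions) < 2:
        raise ValueError("need at least two mutations")
    n = len(positions)
    return statistics.mean(
        abs(positions[i] - positions[j]) for i in range(n) for j in range(i + 1, n)
    )


def clustering_p_value(positions, protein_length, n_perm=10_000, seed=0):
    """Empirical one-sided p-value for clustering (hypothetical sketch).

    Asks how often the same number of mutations, placed uniformly at random
    on residues 1..protein_length, is at least as tightly packed as observed.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(positions)
    hits = 0
    for _ in range(n_perm):
        simulated = [rng.randint(1, protein_length) for _ in positions]
        if mean_pairwise_distance(simulated) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Five mutations at residues 100 to 104 of a 1,000-residue protein give a very small p-value, while five mutations spread evenly along the same protein do not.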
Genome-wide association meta-analysis of human longevity identifies a novel locus conferring survival beyond 90 years of age.
Department of Molecular Epidemiology, Netherlands Consortium for Healthy Ageing.
The genetic contribution to the variation in human lifespan is ∼ 25%. Despite the large number of identified disease-susceptibility loci, it is not known which loci influence population mortality. We performed a genome-wide association meta-analysis of 7729 long-lived individuals of European descent (≥ 85 years) and 16 121 younger controls (<65 years) followed by replication in an additional set of 13 060 long-lived individuals and 61 156 controls. In addition, we performed a subset analysis in cases aged ≥ 90 years. We observed genome-wide significant association with longevity, as reflected by survival to ages beyond 90 years, at a novel locus, rs2149954, on chromosome 5q33.3 (OR = 1.10, P = 1.74 × 10(-8)). We also confirmed association of rs4420638 on chromosome 19q13.32 (OR = 0.72, P = 3.40 × 10(-36)), representing the TOMM40/APOE/APOC1 locus. In a prospective meta-analysis (n = 34 103), the minor allele of rs2149954 (T) on chromosome 5q33.3 associates with increased survival (HR = 0.95, P = 0.003). This allele has previously been reported to associate with low blood pressure in middle age. Interestingly, the minor allele (T) associates with decreased cardiovascular mortality risk, independent of blood pressure. We report on the first GWAS-identified longevity locus on chromosome 5q33.3 influencing survival in the general European population. The minor allele of this locus associates with low blood pressure in middle age, although the contribution of this allele to survival may be less dependent on blood pressure. Hence, the pleiotropic mechanisms by which this intragenic variation contributes to lifespan regulation remain to be elucidated.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F010966/1, BB/I020748/1; European Research Council: 230374; Medical Research Council: G0500997, G0601333, MR/J012165/1, MR/J50001X/1, MR/K006312/1; NHGRI NIH HHS: T32 HG002536, U41 HG007234; NIA NIH HHS: P01AG08761; NIDDK NIH HHS: U01DK066134; NIMH NIH HHS: MH081802; PHS HHS: NIMH U24 MH068457-06, R01D0042157-01A; Wellcome Trust: 084762, 085475, 087436
Human molecular genetics 2014;23;16;4420-32
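The abstract does not state which meta-analysis model was used. A common choice for combining per-study GWAS summary statistics at one SNP is the fixed-effect inverse-variance method, sketched below as an assumption. Each study contributes its log odds ratio weighted by the inverse of its squared standard error. The function name and inputs are hypothetical.

```python
import math


def inverse_variance_meta(log_ors, std_errs):
    """Fixed-effect inverse-variance meta-analysis for a single SNP (sketch).

    log_ors  -- per-study log odds ratios for the same effect allele
    std_errs -- matching standard errors
    Returns (pooled odds ratio, pooled standard error of log OR, two-sided p).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(pooled), pooled_se, p
```

Pooling two identical studies leaves the odds ratio unchanged and shrinks its standard error by a factor of √2. This is why meta-analysis increases power to reach genome-wide significance. Note that all studies must report effects for the same allele before pooling.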
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Department of Statistics, University of Oxford, Oxford OX1 3TG, UK.
A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.
Funded by: Medical Research Council: G0801823; NCI NIH HHS: R01 CA166661
Nature communications 2014;5;3934
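Genotype discordance, the accuracy measure cited above, can be computed as sketched below. The abstract does not define the exact variant of the metric used, so this sketch makes an assumption and offers two common forms. Plain discordance is the fraction of comparable sites where the estimated and validation genotypes differ. Non-reference discordance additionally drops sites where both are homozygous reference, so that the many easy reference calls do not dilute the error rate. Genotypes are encoded as alternate-allele dosages 0, 1 or 2, with `None` for a missing call. The function name is hypothetical.

```python
def genotype_discordance(called, truth, non_reference=False):
    """Fraction of comparable sites where two genotype vectors disagree.

    called, truth -- sequences of 0/1/2 alternate-allele dosages or None
    non_reference -- if True, ignore sites where both genotypes are 0
    """
    pairs = [(c, t) for c, t in zip(called, truth) if c is not None and t is not None]
    if non_reference:
        # Drop sites that are homozygous reference in both call sets.
        pairs = [(c, t) for c, t in pairs if not (c == 0 and t == 0)]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(c != t for c, t in pairs) / len(pairs)
```

With `called = [0, 1, 2, 1, None, 0]` and `truth = [0, 1, 1, 2, 2, 0]`, five sites are comparable and two disagree, so plain discordance is 0.4. Non-reference discordance is 2/3, because the two shared homozygous-reference sites are excluded.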
Phylogenetic studies of transmission dynamics in generalized HIV epidemics: an essential tool where the burden is greatest?
*Division of Infectious Diseases, University of North Carolina at Chapel Hill, Chapel Hill, NC; †Department of Microbiology, University of Washington, Seattle, WA; ‡Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom; §Wellcome Trust Sanger Institute, Cambridge, United Kingdom; ‖Division of Infection and Immunity, University College London, London, United Kingdom; ¶Wellcome Trust-Africa Centre for Health and Population Studies, University of KwaZulu-Natal, ZA; and #Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom.
Efficient and effective HIV prevention measures for generalized epidemics in sub-Saharan Africa have not yet been validated at the population level. Design and impact evaluation of such measures requires fine-scale understanding of local HIV transmission dynamics. The novel tools of HIV phylogenetics and molecular epidemiology may elucidate these transmission dynamics. Such methods have been incorporated into studies of concentrated HIV epidemics to identify proximate and determinant traits associated with ongoing transmission. However, applying similar phylogenetic analyses to generalized epidemics, including the design and evaluation of prevention trials, presents additional challenges. Here we review the scope of these methods and present examples of their use in concentrated epidemics in the context of prevention. Next, we describe the current uses for phylogenetics in generalized epidemics and discuss their promise for elucidating transmission patterns and informing prevention trials. Finally, we review logistic and technical challenges inherent to large-scale molecular epidemiological studies of generalized epidemics and suggest potential solutions.
Funded by: NCATS NIH HHS: KL2 TR000084, KL2TR000084; NIAID NIH HHS: P30 AI027757, P30 AI050410, P30AI027757; NIMHD NIH HHS: L60 MD005444; Wellcome Trust: 097410
Journal of acquired immune deficiency syndromes (1999) 2014;67;2;181-95
Mitochondrial genome sequencing in Mesolithic North East Europe unearths a new sub-clade within the broadly distributed human haplogroup C1.
Australian Centre for Ancient DNA, School of Earth and Environmental Sciences, University of Adelaide, Adelaide, South Australia, Australia.
The human mitochondrial haplogroup C1 has a broad global distribution but is extremely rare in Europe today. Recent ancient DNA evidence has demonstrated its presence in European Mesolithic individuals. Three individuals from the 7,500 year old Mesolithic site of Yuzhnyy Oleni Ostrov, Western Russia, could be assigned to haplogroup C1 based on mitochondrial hypervariable region I sequences. However, hypervariable region I data alone could not provide enough resolution to establish the phylogenetic relationship of these Mesolithic haplotypes with haplogroup C1 mitochondrial DNA sequences found today in populations of Europe, Asia and the Americas. In order to obtain high-resolution data and shed light on the origin of this European Mesolithic C1 haplotype, we target-enriched and sequenced the complete mitochondrial genome of one Yuzhnyy Oleni Ostrov C1 individual. The updated phylogeny of C1 haplogroups indicated that the Yuzhnyy Oleni Ostrov haplotype represents a new distinct clade, provisionally coined "C1f". We show that all three C1 carriers of Yuzhnyy Oleni Ostrov belong to this clade. No haplotype closely related to the C1f sequence could be found in the large current database of ancient and present-day mitochondrial genomes. Hence, we have discovered past human mitochondrial diversity that has not been observed in modern-day populations so far. The lack of positive matches in modern populations may be explained by under-sampling of rare modern C1 carriers or by demographic processes, population extinction or replacement, that may have impacted on populations of Northeast Europe since prehistoric times.
PloS one 2014;9;2;e87612
Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility.
To further understanding of the genetic basis of type 2 diabetes (T2D) susceptibility, we aggregated published meta-analyses of genome-wide association studies (GWAS), including 26,488 cases and 83,964 controls of European, east Asian, south Asian and Mexican and Mexican American ancestry. We observed a significant excess in the directional consistency of T2D risk alleles across ancestry groups, even at SNPs demonstrating only weak evidence of association. By following up the strongest signals of association from the trans-ethnic meta-analysis in an additional 21,491 cases and 55,647 controls of European ancestry, we identified seven new T2D susceptibility loci. Furthermore, we observed considerable improvements in the fine-mapping resolution of common variant association signals at several T2D susceptibility loci. These observations highlight the benefits of trans-ethnic GWAS for the discovery and characterization of complex trait loci and emphasize an exciting opportunity to extend insight into the genetic architecture and pathogenesis of human diseases across populations of diverse ancestry.
Funded by: British Heart Foundation: RG/08/008/25291, RG/08/014/24067; Canadian Institutes of Health Research; Chief Scientist Office: CZB/4/710; Medical Research Council: G0601261, G0801056, G1002084, G9521010, MC_PC_U127592696, MC_U106179471, MC_UP_A100_1003, MC_UU_12015/1, MC_UU_12015/2, MC_UU_12015/5, MR/K006584/1, MR/L003120/1; NCATS NIH HHS: UL1 TR001079; NHGRI NIH HHS: HG000376, R01 HG000376; NICHD NIH HHS: R24 HD050924; NIDDK NIH HHS: DK062370, DK073541, DK085501, DK085545, DK085584, K24 DK080140, R01 DK062370, R01 DK072193, R01 DK073541, R01 DK078616, R01 DK093757, U01 DK062370, U01 DK085501, U01 DK085545, U01 DK085584; NIGMS NIH HHS: T32 GM007753; Wellcome Trust: 090532, 095552, 098017, 098381, WT081682, WT085475, WT090367
Nature genetics 2014;46;3;234-44
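An "excess in directional consistency" is the kind of result a sign test captures. For SNPs with effect estimates in two ancestry groups, each aligned to the same risk allele, the test counts how often the effects share a sign. It then compares that count with the 50% expected if one group carries no information about the other. The sketch below is an assumption about how such a comparison can be made, not the paper's procedure. It handles two groups rather than all four, treats SNPs as independent (as after LD pruning), and uses a hypothetical function name.

```python
from math import comb


def sign_test_concordance(effects_a, effects_b):
    """One-sided binomial sign test for directional consistency (sketch).

    effects_a, effects_b -- per-SNP effect estimates aligned to the same allele
    Returns (n_concordant, n_tested, P(X >= n_concordant) for X ~ Bin(n, 0.5)).
    """
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    n = len(pairs)
    k = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p
```

Ten of ten SNPs agreeing in sign gives p = 1/1024. Five of ten agreeing gives p ≈ 0.62, no evidence of excess consistency. Because weakly associated SNPs contribute to the count, this test can detect shared signal that falls well short of genome-wide significance.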
DNA methylation and body-mass index: a genome-wide analysis.
Department of Cardiovascular Sciences, University of Leicester, Leicester, UK; National Institute for Health Research Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester, UK.
Background: Obesity is a major health problem that is determined by interactions between lifestyle and environmental and genetic factors. Although associations between several genetic variants and body-mass index (BMI) have been identified, little is known about epigenetic changes related to BMI. We undertook a genome-wide analysis of methylation at CpG sites in relation to BMI.
Methods: 479 individuals of European origin recruited by the Cardiogenics Consortium formed our discovery cohort. We typed their whole-blood DNA with the Infinium HumanMethylation450 array. After quality control, methylation levels were tested for association with BMI. Methylation sites showing an association with BMI at a false discovery rate q value of 0·05 or less were taken forward for replication in a cohort of 339 unrelated white patients of northern European origin from the MARTHA cohort. Sites that remained significant in this primary replication cohort were tested in a second replication cohort of 1789 white patients of European origin from the KORA cohort. We examined whether methylation levels at identified sites also showed an association with BMI in DNA from adipose tissue (n=635) and skin (n=395) obtained from white female individuals participating in the MuTHER study. Finally, we examined the association of methylation at BMI-associated sites with genetic variants and with gene expression.
Findings: 20 individuals from the discovery cohort were excluded from analyses after quality-control checks, leaving 459 participants. After adjustment for covariates, we identified an association (q value ≤0·05) between methylation at five probes across three different genes and BMI. The associations with three of these probes--cg22891070, cg27146050, and cg16672562, all of which are in intron 1 of HIF3A--were confirmed in both the primary and second replication cohorts. For every 0·1 increase in methylation β value at cg22891070, BMI was 3·6% (95% CI 2·4-4·9) higher in the discovery cohort, 2·7% (1·2-4·2) higher in the primary replication cohort, and 0·8% (0·2-1·4) higher in the second replication cohort. For the MuTHER cohort, methylation at cg22891070 was associated with BMI in adipose tissue (p=1·72 × 10(-5)) but not in skin (p=0·882). We observed a significant inverse correlation (p=0·005) between methylation at cg22891070 and expression of one HIF3A gene-expression probe in adipose tissue. Two single nucleotide polymorphisms--rs8102595 and rs3826795--had independent associations with methylation at cg22891070 in all cohorts. However, these single nucleotide polymorphisms were not significantly associated with BMI.
Interpretation: Increased BMI in adults of European origin is associated with increased methylation at the HIF3A locus in blood cells and in adipose tissue. Our findings suggest that perturbation of hypoxia inducible transcription factor pathways could have an important role in the response to increased weight in people.
Funding: The European Commission, National Institute for Health Research, British Heart Foundation, and Wellcome Trust.
Funded by: British Heart Foundation: RG/09/012/28096; Canadian Institutes of Health Research: MOP 86466; Department of Health: NF-SI-0611-10170, RP-PG-0310-1002; Wellcome Trust: 081917/Z/07/Z, 098051
Lancet (London, England) 2014;383;9933;1990-8
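Percentage effects such as "BMI was 3·6% higher for every 0·1 increase in methylation β value" arise naturally when the outcome is log-transformed. Assume a model of the form below, where α and γ are illustrative symbols and the abstract does not itself state the model. A fixed step in β then multiplies BMI by a constant factor, and the discovery-cohort estimate can be converted as follows:

```latex
\ln(\mathrm{BMI}) = \alpha + \gamma\,\beta + \text{covariates}
\;\Longrightarrow\;
\frac{\mathrm{BMI}(\beta + 0.1)}{\mathrm{BMI}(\beta)} = e^{0.1\gamma}

e^{0.1\gamma} = 1.036 \;\Longrightarrow\; \gamma = 10\ln(1.036) \approx 0.354

e^{0.3\gamma} = 1.036^{3} \approx 1.112
```

Under this assumed model, a 0·3 increase in β at cg22891070 would correspond to roughly 11% higher BMI in the discovery cohort. The same arithmetic applies to the smaller replication estimates.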
Open-source electronic data capture system offered increased accuracy and cost-effectiveness compared with paper methods in Africa.
International Health Research Group, Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge, CB1 8RN, United Kingdom; Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, United Kingdom.
Objectives: Existing electronic data capture options are often financially unfeasible in resource-poor settings or difficult to support technically in the field. To help facilitate large-scale multicenter studies in sub-Saharan Africa, the African Partnership for Chronic Disease Research (APCDR) has developed an open-source electronic questionnaire (EQ).
Study design and setting: To assess its relative validity, we compared the EQ against traditional pen-and-paper methods using 200 randomized interviews conducted in an ongoing type 2 diabetes case-control study in South Africa.
Results: During its 3-month validation, the EQ had a lower frequency of errors (EQ, 0.17 errors per 100 questions; paper, 0.73 errors per 100 questions; P-value ≤0.001), and a lower monetary cost per correctly entered question, compared with the pen-and-paper method. We found no marked difference in the average duration of the interview between methods (EQ, 5.4 minutes; paper, 5.6 minutes).
Conclusion: This validation study suggests that the EQ may offer increased accuracy, similar interview duration, and increased cost-effectiveness compared with paper-based data collection methods. The APCDR EQ software is freely available (https://github.com/apcdr/questionnaire).
Funded by: Medical Research Council: G0901213, MR/K013491/1
Journal of clinical epidemiology 2014;67;12;1358-63
Estimating telomere length from whole genome sequence data.
Genome Informatics, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.
Telomeres play a key role in replicative ageing and undergo age-dependent attrition in vivo. Here, we report a novel method, TelSeq, to measure average telomere length from whole genome or exome shotgun sequence data. In 260 leukocyte samples, we show that TelSeq results correlate with Southern blot measurements of the mean length of terminal restriction fragments (mTRFs) and capture age-dependent attrition about as well as mTRFs do.
Funded by: Medical Research Council: G0600717; NIA NIH HHS: R01AG030678; NICHD NIH HHS: R01HD071180; Wellcome Trust: 100140, WT091310, WT098051
Nucleic acids research 2014;42;9;e75
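The abstract does not describe how TelSeq works internally. The sketch below illustrates only the general idea behind read-count-based telomere estimation and is not the published implementation. It assumes that a read is telomeric if it carries at least k copies of the human telomeric repeat TTAGGG (or CCCTAA on the opposite strand). It also assumes the count is normalised by reads whose GC fraction lies in a narrow band, as a proxy for genome coverage. The values k = 7 and a 48 to 52% GC band are illustrative defaults. Converting the ratio to a length in kb would need a scaling constant, which is omitted here, and the function names are hypothetical.

```python
TELOMERE_REPEAT = "TTAGGG"
TELOMERE_REPEAT_RC = "CCCTAA"  # reverse complement, seen on the other strand


def gc_fraction(read):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    return (read.count("G") + read.count("C")) / len(read) if read else 0.0


def telomeric_abundance(reads, k=7, gc_low=0.48, gc_high=0.52):
    """Telomeric reads per read in a GC band (hypothetical sketch).

    A read is counted as telomeric if it holds >= k non-overlapping copies of
    the repeat on either strand; the denominator is the number of reads whose
    GC fraction lies in [gc_low, gc_high].
    """
    telomeric = 0
    gc_band = 0
    for read in reads:
        read = read.upper()
        if max(read.count(TELOMERE_REPEAT), read.count(TELOMERE_REPEAT_RC)) >= k:
            telomeric += 1
        if gc_low <= gc_fraction(read) <= gc_high:
            gc_band += 1
    if gc_band == 0:
        raise ValueError("no reads in the GC band; cannot normalise")
    return telomeric / gc_band
```

Normalising by a GC-matched read count, rather than by all reads, is one way to reduce the effect of GC-dependent coverage bias between sequencing runs. Longer telomeres then show up as a higher ratio.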
Quantitative genetics of CTCF binding reveal local sequence effects and different modes of X-chromosome association.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
Associating genetic variation with quantitative measures of gene regulation offers a way to bridge the gap between genotype and complex phenotypes. In order to identify quantitative trait loci (QTLs) that influence the binding of a transcription factor in humans, we measured binding of the multifunctional transcription and chromatin factor CTCF in 51 HapMap cell lines. We identified thousands of QTLs in which genotype differences were associated with differences in CTCF binding strength, hundreds of them confirmed by directly observable allele-specific binding bias. The majority of QTLs were either within 1 kb of the CTCF binding motif, or in linkage disequilibrium with a variant within 1 kb of the motif. On the X chromosome we observed three classes of binding sites: a minority class bound only to the active copy of the X chromosome, the majority class bound to both the active and inactive X, and a small set of female-specific CTCF sites associated with two non-coding RNA genes. In sum, our data reveal extensive genetic effects on CTCF binding, both direct and indirect, and identify a diversity of patterns of CTCF binding on the X chromosome.
Funded by: NCI NIH HHS: CA130075, R01 CA130075
PLoS genetics 2014;10;11;e1004798
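Allele-specific binding bias, used above to confirm QTLs, is commonly assessed at heterozygous sites. ChIP-seq reads supporting each allele are compared against the 50:50 split expected without bias, using a binomial test. The abstract does not say which test was used, so the exact two-sided binomial test below is an illustrative assumption, and the function names are hypothetical. It also ignores reference-mapping bias, which in practice must be corrected before testing.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k (small relative tolerance for floating-point ties).
    """
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allele_specific_binding_p(ref_reads, alt_reads):
    """Test a heterozygous site for unequal binding of the two alleles."""
    return binomial_two_sided_p(ref_reads, ref_reads + alt_reads)
```

A balanced site (10 reference and 10 alternate reads) gives p = 1. A site where all 20 reads carry the reference allele gives p = 2/2^20, about 1.9 × 10(-6).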
Identification of genes important for cutaneous function revealed by a large scale reverse genetic screen in the mouse.
Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Australia.
The skin is a highly regenerative organ which plays critical roles in protecting the body and sensing its environment. Consequently, morbidity and mortality associated with skin defects represent a significant health issue. To identify genes important in skin development and homeostasis, we have applied a high throughput, multi-parameter phenotype screen to the conditional targeted mutant mice generated by the Wellcome Trust Sanger Institute's Mouse Genetics Project (Sanger-MGP). A total of 562 different mouse lines were subjected to a variety of tests assessing cutaneous expression, macroscopic clinical disease, histological change, hair follicle cycling, and aberrant marker expression. Cutaneous lesions were associated with mutations in 23 different genes. Many of these were not previously associated with skin disease in the organ (Mysm1, Vangl1, Trpc4ap, Nom1, Sparc, Farp2, and Prkab1), while others were ascribed new cutaneous functions on the basis of the screening approach (Krt76, Lrig1, Myo5a, Nsun2, and Nf1). The integration of these skin specific screening protocols into the Sanger-MGP primary phenotyping pipelines marks the largest reported reverse genetic screen undertaken in any organ and defines approaches to maximise the productivity of future projects of this nature, while flagging genes for further characterisation.
Funded by: Medical Research Council: G0300212, MC_QA137918; NCI NIH HHS: P30 CA034196; NIAMS NIH HHS: AR063781, R01 AR049288, R01 AR056635; Wellcome Trust: 096540, 098051, 100669
PLoS genetics 2014;10;10;e1004705
Epidermal Wnt/β-catenin signaling regulates adipocyte differentiation via secretion of adipogenic factors.
Centre for Stem Cells and Regenerative Medicine, Kings College London, London SE1 9RT, United Kingdom.
It has long been recognized that the hair follicle growth cycle and oscillation in the thickness of the underlying adipocyte layer are synchronized. Although factors secreted by adipocytes are known to regulate the hair growth cycle, it is unclear whether the epidermis can regulate adipogenesis. We show that inhibition of epidermal Wnt/β-catenin signaling reduced adipocyte differentiation in developing and adult mouse dermis. Conversely, ectopic activation of epidermal Wnt signaling promoted adipocyte differentiation and hair growth. When the Wnt pathway was activated in the embryonic epidermis, there was a dramatic and premature increase in adipocytes in the absence of hair follicle formation, demonstrating that Wnt activation, rather than mature hair follicles, is required for adipocyte generation. Epidermal and dermal gene expression profiling identified keratinocyte-derived adipogenic factors that are induced by β-catenin activation. Wnt/β-catenin signaling-dependent secreted factors from keratinocytes promoted adipocyte differentiation in vitro, and we identified ligands for the bone morphogenetic protein and insulin pathways as proadipogenic factors. Our results indicate epidermal Wnt/β-catenin as a critical initiator of a signaling cascade that induces adipogenesis and highlight the role of epidermal Wnt signaling in synchronizing adipocyte differentiation with the hair growth cycle.
Funded by: Cancer Research UK; Department of Health; Medical Research Council; Wellcome Trust: 096540
Proceedings of the National Academy of Sciences of the United States of America 2014;111;15;E1501-9
Novel determinants of antibiotic resistance: identification of mutated loci in highly methicillin-resistant subpopulations of methicillin-resistant Staphylococcus aureus.
We identified mutated genes in highly resistant subpopulations of methicillin-resistant Staphylococcus aureus (MRSA) that are most likely responsible for the historic failure of the β-lactam family of antibiotics as therapeutic agents against these important pathogens. Such subpopulations are produced during growth of most clinical MRSA strains, including the four historically early MRSA isolates studied here. Chromosomal DNA was prepared from the highly resistant cells along with DNA from the majority of cells (poorly resistant cells) followed by full genome sequencing. In the highly resistant cells, mutations were identified in 3 intergenic sequences and 27 genes representing a wide range of functional categories. A common feature of these mutations appears to be their capacity to induce high-level β-lactam resistance and increased amounts of the resistance protein PBP2A in the bacteria. The observations fit a recently described model in which the ultimate controlling factor of the phenotypic expression of β-lactam resistance in MRSA is a RelA-mediated stringent response.
Importance: It has been well established that the level of antibiotic resistance (i.e., minimum concentration of a β-lactam antibiotic needed to inhibit growth) of a methicillin-resistant Staphylococcus aureus (MRSA) strain depends on the transcription and translation of the resistance protein PBP2A. Here we describe mutated loci in an additional novel set of genetic determinants that appear to be essential for the unusually high resistance levels typical of subpopulations of staphylococci that are produced with unique low frequency in most MRSA clinical isolates. We propose that mutations in these determinants can trigger induction of the stringent stress response which was recently shown to cause increased transcription/translation of the resistance protein PBP2A in parallel with the increased level of resistance.
Funded by: NCATS NIH HHS: UL1 TR000043-07S1; NIAID NIH HHS: 2 RO1 AI457838-14; Wellcome Trust: 098051
Salmonella enterica serovar Typhi and the pathogenesis of typhoid fever.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
Salmonella enterica serovar Typhi, the cause of typhoid, is host restricted to humans. S. Typhi has a monophyletic population structure, indicating that typhoid in humans is a relatively new disease. Antimicrobial usage is reshaping the current S. Typhi global population and may be driving the emergence of a specific haplotype, H58, that is well adapted to transmission in modern settings and is able to resist antimicrobial killing more efficiently than other S. Typhi. Evidence gathered through genomics and functional studies using the mouse and in vitro cell systems, together with clinical investigations, has provided insight into the mechanisms that underpin the pathogenesis of human typhoid and host restriction. Here we review the latest scientific advances in typhoid research and discuss how these novel approaches are changing our understanding of the disease.
Funded by: Wellcome Trust: 100087, 100087/Z/12/Z, 100891
Annual review of microbiology 2014;68;317-36
Neutralization of Plasmodium falciparum merozoites by antibodies against PfRH5.
Jenner Institute, University of Oxford, Oxford OX3 7DQ, United Kingdom;
There is intense interest in induction and characterization of strain-transcending neutralizing Ab against antigenically variable human pathogens. We have recently identified the human malaria parasite Plasmodium falciparum reticulocyte-binding protein homolog 5 (PfRH5) as a target of broadly neutralizing Abs, but there is little information regarding the functional mechanism(s) of Ab-mediated neutralization. In this study, we report that vaccine-induced polyclonal anti-PfRH5 Abs inhibit the tight attachment of merozoites to erythrocytes and are capable of blocking the interaction of PfRH5 with its receptor basigin. Furthermore, by developing anti-PfRH5 mAbs, we provide evidence of the following: 1) the ability to block the PfRH5-basigin interaction in vitro is predictive of functional activity, but absence of blockade does not predict absence of functional activity; 2) neutralizing mAbs bind spatially related epitopes on the folded protein, involving at least two defined regions of the PfRH5 primary sequence; 3) a brief exposure window of PfRH5 is likely to necessitate rapid binding of Ab to neutralize parasites; and 4) intact bivalent IgG contributes to but is not necessary for parasite neutralization. These data provide important insight into the mechanisms of broadly neutralizing anti-malaria Abs and further encourage anti-PfRH5-based malaria prevention efforts.
Funded by: Medical Research Council: G1000527, MC_U117532067, U117532067; Wellcome Trust: 089455, 089455/2/09/z, 092873, 092873/z/10/z, 098051
Journal of immunology (Baltimore, Md. : 1950) 2014;192;1;245-58
A strategy to identify dominant point mutant modifiers of a quantitative trait.
McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin-Madison, Madison, Wisconsin 53706; Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin 53706.
A central goal in the analysis of complex traits is to identify genes that modify a phenotype. Modifiers of a cancer phenotype may act either intrinsically or extrinsically on the salient cell lineage. Germline point mutagenesis by ethylnitrosourea can provide alleles for a gene of interest that include loss-, gain-, or alteration-of-function. Unlike strain polymorphisms, point mutations with heterozygous quantitative phenotypes are detectable in both essential and nonessential genes and are unlinked from other variants that might confound their identification and analysis. This report analyzes strategies seeking quantitative mutational modifiers of Apc(Min) in the mouse. To identify a quantitative modifier of a phenotype of interest, a cluster of test progeny is needed. The cluster size can be increased as necessary for statistical significance if the founder is a male whose sperm is cryopreserved. A second critical element in this identification is a mapping panel free of polymorphic modifiers of the phenotype, to enable low-resolution mapping followed by targeted resequencing to identify the causative mutation. Here, we describe the development of a panel of six "isogenic mapping partner lines" for C57BL/6J, carrying single-nucleotide markers introduced by mutagenesis. One such derivative, B6.SNVg, shown to be phenotypically neutral in combination with Apc(Min), is an appropriate mapping partner to locate induced mutant modifiers of the Apc(Min) phenotype. The evolved strategy can complement four current major initiatives in the genetic analysis of complex systems: the Genome-wide Association Study; the Collaborative Cross; the Knockout Mouse Project; and The Cancer Genome Atlas.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024, MR/L007428/1; NCI NIH HHS: 2P30CA014520, P30 CA014520, R37 CA063677, R37CA063677; NIEHS NIH HHS: T32 ES007015; Wellcome Trust: 098051
G3 (Bethesda, Md.) 2014;4;6;1113-21
Mosaic aneuploidy in Leishmania: the perspective of whole genome sequencing.
Unit of Molecular Parasitology, Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium; Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium.
Trends in parasitology 2014;30;12;554-5
Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT).
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.
Motivation: Over the last few years, methods based on suffix arrays using the Burrows-Wheeler Transform have been widely used for DNA sequence read matching and assembly. These provide very fast search algorithms, linear in the search pattern size, on a highly compressible representation of the dataset being searched. Meanwhile, algorithmic development for genotype data has concentrated on statistical methods for phasing and imputation, based on probabilistic matching to hidden Markov model representations of the reference data, which while powerful are much less computationally efficient. Here a theory of haplotype matching using suffix array ideas is developed, which should scale to much larger datasets than those currently handled by genotype algorithms.
Results: Given M sequences with N bi-allelic variable sites, an O(NM) algorithm is given to derive a representation of the data based on positional prefix arrays, which is termed the positional Burrows-Wheeler transform (PBWT). On large datasets, the run-length-encoded PBWT is more than a hundred times smaller than the raw data compressed with gzip. Using this representation, a method is given to find all maximal haplotype matches within the set in O(NM) time rather than the O(NM²) expected from naive pairwise comparison, together with a fast algorithm, empirically independent of M given sufficient memory for indexes, to find maximal matches between a new sequence and the set. The discussion includes some proposals about how these approaches could be used for imputation and phasing.
Funded by: Wellcome Trust: 098051
Bioinformatics (Oxford, England) 2014;30;9;1266-72
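To illustrate the positional prefix arrays described in the PBWT abstract above, the following Python sketch builds the orderings a_0 to a_N by stably partitioning the previous ordering on the allele at each site, which costs O(M) per site and O(NM) overall. It is a minimal sketch under stated assumptions (haplotypes given as equal-length lists of 0/1 alleles; all function names are hypothetical), not the published implementation, and it omits divergence arrays and match finding.

```python
from typing import List, Tuple


def pbwt_prefix_arrays(haps: List[List[int]]) -> List[List[int]]:
    """Return prefix arrays a_0..a_N for M haplotypes over N bi-allelic sites.

    a_k lists haplotype indices sorted by their reversed prefixes haps[i][:k].
    a_{k+1} is a stable partition of a_k by the allele at site k (0s first),
    so each step is O(M) and the whole construction is O(NM).
    """
    m = len(haps)
    n = len(haps[0]) if m else 0
    a = list(range(m))  # a_0: no sites seen yet, original order
    arrays = [a]
    for k in range(n):
        zeros = [i for i in a if haps[i][k] == 0]  # keeps relative order
        ones = [i for i in a if haps[i][k] == 1]
        a = zeros + ones
        arrays.append(a)
    return arrays


def pbwt_columns(haps: List[List[int]], arrays: List[List[int]]) -> List[List[int]]:
    """Column y_k of the transform: alleles at site k read in a_k order."""
    return [[haps[i][k] for i in arrays[k]] for k in range(len(arrays) - 1)]


def run_length(col: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode a column as (allele, run length) pairs."""
    runs: List[List[int]] = []
    for v in col:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, c) for v, c in runs]
```

Because haplotypes sharing long matches ending at a site sit next to each other in a_k, the columns y_k tend to contain long runs of identical alleles, which is what makes run-length encoding effective here.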
CYP6 P450 enzymes and ACE-1 duplication produce extreme and multiple insecticide resistance in the malaria mosquito Anopheles gambiae.
Vector Biology Department, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, United Kingdom; Centre Suisse de Recherches Scientifiques en Côte d'Ivoire, Abidjan, Cote d'Ivoire.
Malaria control relies heavily on pyrethroid insecticides, to which susceptibility is declining in Anopheles mosquitoes. To combat pyrethroid resistance, application of alternative insecticides is advocated for indoor residual spraying (IRS), and carbamates are increasingly important. Emergence of a very strong carbamate resistance phenotype in Anopheles gambiae from Tiassalé, Côte d'Ivoire, West Africa, is therefore a potentially major operational challenge, particularly because these malaria vectors now exhibit resistance to multiple insecticide classes. We investigated the genetic basis of resistance to the most commonly applied carbamate, bendiocarb, in An. gambiae from Tiassalé. Geographically replicated whole-genome microarray experiments identified elevated P450 enzyme expression as associated with bendiocarb resistance, most notably genes from the CYP6 subfamily. P450s were further implicated in resistance phenotypes because the synergist piperonyl butoxide (PBO) induced significantly elevated mortality to bendiocarb and also enhanced the action of pyrethroids and an organophosphate. Transgenic expression in Drosophila showed that CYP6P3 and especially CYP6M2 produce bendiocarb resistance; both genes also produced pyrethroid resistance, and CYP6M2 additionally produced DDT resistance. CYP6M2 can thus cause resistance to three distinct classes of insecticide, although the biochemical mechanism for carbamates is unclear because, in contrast to CYP6P3, recombinant CYP6M2 did not metabolise bendiocarb in vitro. Strongly bendiocarb-resistant mosquitoes also displayed elevated expression of the acetylcholinesterase ACE-1 gene, arising at least in part from gene duplication, which confers a survival advantage to carriers of additional copies of resistant ACE-1 G119S alleles. Our results are alarming for vector-based malaria control. Extreme carbamate resistance in Tiassalé An. gambiae results from coupling of over-expressed target site allelic variants with heightened CYP6 P450 expression, which also provides resistance across contrasting insecticides. Mosquito populations displaying such a diverse basis of extreme and cross-resistance are likely to be unresponsive to standard insecticide resistance management practices.
Funded by: NIAID NIH HHS: 1R01AI082734-01; Wellcome Trust: 093755
PLoS genetics 2014;10;3;e1004236
Accumulation of human-adapting mutations during circulation of A(H1N1)pdm09 influenza virus in humans in the United Kingdom.
Section of Virology, Faculty of Medicine, Imperial College London, London, United Kingdom.
The influenza pandemic that emerged in 2009 provided an unprecedented opportunity to study adaptation of a virus recently acquired from an animal source during human transmission. In the United Kingdom, the novel virus spread in three temporally distinct waves between 2009 and 2011. Phylogenetic analysis of complete viral genomes showed that mutations accumulated over time. Second- and third-wave viruses replicated more rapidly in human airway epithelial (HAE) cells than did the first-wave virus. In infected mice, weight loss varied between viral isolates from the same wave but showed no distinct pattern with wave and did not correlate with viral load in the mouse lungs or severity of disease in the human donor. However, second- and third-wave viruses induced less alpha interferon in the infected mouse lungs. NS1 protein, an interferon antagonist, had accumulated several mutations in second- and third-wave viruses. Recombinant viruses with the third-wave NS gene induced less interferon in human cells, but this alone did not account for increased virus fitness in HAE cells. Mutations in HA and NA genes in third-wave viruses caused increased binding to α-2,6-sialic acid and enhanced infectivity in human mucus. A recombinant virus with these two segments replicated more efficiently in HAE cells. A mutation in PA (N321K) enhanced polymerase activity of third-wave viruses and also provided a replicative advantage in HAE cells. Therefore, multiple mutations allowed incremental changes in viral fitness, which together may have contributed to the apparent increase in severity of A(H1N1)pdm09 influenza virus during successive waves.
Importance: Although most people infected with the 2009 pandemic influenza virus had mild or unapparent symptoms, some suffered severe and devastating disease. The reasons for this variability were unknown, but the numbers of severe cases increased during successive waves of human infection in the United Kingdom. To determine the causes of this variation, we studied genetic changes in virus isolates from individual hospitalized patients. There were no consistent differences between these viruses and those circulating in the community, but we found multiple evolutionary changes that in combination over time increased the virus's ability to infect human cells. These adaptations may explain the remarkable ability of A(H1N1)pdm09 virus to continue to circulate despite widespread immunity and the apparent increase in severity of influenza over successive waves of infection.
Funded by: Medical Research Council: G0600371, G0802752, G1000758, MC_G1001212; Wellcome Trust: 090382/Z/09/Z, 097117
Journal of virology 2014;88;22;13269-83
Geographic population structure analysis of worldwide human populations infers their biogeographical origins.
Department of Animal and Plant Sciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK; Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, Maryland 21205, USA.
The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS's accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have ramifications for genetic ancestry testing.
Funded by: NICHD NIH HHS: HD070996, R01 HD070996; NIGMS NIH HHS: GM068968, R01 GM068968; Wellcome Trust: 098051
Nature communications 2014;5;3513
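The accuracies quoted in the GPS abstract above (700 km, 50 km) are distances between a predicted and a true place of origin. The Python sketch below shows one common way to compute such distances, the haversine great-circle formula on a sphere of radius 6,371 km, plus the fraction of individuals placed within a given radius. The function names and the spherical-Earth assumption are ours; this is an evaluation helper, not part of the GPS algorithm.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def fraction_within(
    predicted: Sequence[Tuple[float, float]],
    true: Sequence[Tuple[float, float]],
    radius_km: float,
) -> float:
    """Fraction of individuals whose predicted (lat, lon) lies within radius_km of the true one."""
    hits = sum(great_circle_km(*p, *t) <= radius_km for p, t in zip(predicted, true))
    return hits / len(true)
```

For example, one degree of longitude along the equator comes out at about 111 km, so a prediction that far off would fall outside a 50 km village-level radius.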
Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes.
Unidad de Proteómica.
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before the Bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
Funded by: NHGRI NIH HHS: U41 HG007234
Human molecular genetics 2014;23;22;5866-78
Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
To systematically investigate the impact of immune stimulation upon regulatory variant activity, we exposed primary monocytes from 432 healthy Europeans to interferon-γ (IFN-γ) or differing durations of lipopolysaccharide and mapped expression quantitative trait loci (eQTLs). More than half of cis-eQTLs identified, involving hundreds of genes and associated pathways, are detected specifically in stimulated monocytes. Induced innate immune activity reveals multiple master regulatory trans-eQTLs including the major histocompatibility complex (MHC), coding variants altering enzyme and receptor function, an IFN-β cytokine network showing temporal specificity, and an interferon regulatory factor 2 (IRF2) transcription factor-modulated network. Induced eQTL are significantly enriched for genome-wide association study loci, identifying context-specific associations to putative causal genes including CARD9, ATM, and IRF8. Thus, applying pathophysiologically relevant immune stimuli assists resolution of functional genetic variants.
Funded by: European Research Council: 281824; Medical Research Council: 98082, G1001708; Wellcome Trust: 074318, 088891, 090532, 090532/Z/09/Z
Science (New York, N.Y.) 2014;343;6175;1246949
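The eQTL mapping summarized above asks, for each gene and nearby variant, whether expression depends on genotype, and repeats the question in each stimulation condition. The Python sketch below shows only the core single-variant test, an ordinary least-squares regression of expression on allele dosage (0, 1 or 2) with a t-statistic for the slope. The function name is hypothetical, and real eQTL pipelines typically add covariates, expression normalization and multiple-testing control, all omitted here.

```python
import math
from typing import Sequence, Tuple


def eqtl_test(dosage: Sequence[float], expr: Sequence[float]) -> Tuple[float, float]:
    """Fit expr ~ a + b * dosage by least squares.

    Returns (b, t): the per-allele effect on expression and its t-statistic
    (n - 2 degrees of freedom). Requires some genotype variation and
    non-zero residual variance.
    """
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(expr) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expr))
    b = sxy / sxx          # slope: change in expression per extra allele
    a = my - b * mx        # intercept
    rss = sum((y - a - b * x) ** 2 for x, y in zip(dosage, expr))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b, b / se
```

Running the same test separately on unstimulated and stimulated samples and comparing the results is one simple way to flag a variant whose effect appears only after stimulation.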
Low copy number of the salivary amylase gene predisposes to obesity.
Department of Genomics of Common Disease, Imperial College London, London, UK.
Common multi-allelic copy number variants (CNVs) appear enriched for phenotypic associations compared to their biallelic counterparts. Here we investigated the influence of gene dosage effects on adiposity through a CNV association study of gene expression levels in adipose tissue. We identified significant association of a multi-allelic CNV encompassing the salivary amylase gene (AMY1) with body mass index (BMI) and obesity, and we replicated this finding in 6,200 subjects. Increased AMY1 copy number was positively associated with both amylase gene expression (P = 2.31 × 10(-14)) and serum enzyme levels (P < 2.20 × 10(-16)), whereas reduced AMY1 copy number was associated with increased BMI (change in BMI per estimated copy = -0.15 (0.02) kg/m(2); P = 6.93 × 10(-10)) and obesity risk (odds ratio (OR) per estimated copy = 1.19, 95% confidence interval (CI) = 1.13-1.26; P = 1.46 × 10(-10)). The OR value of 1.19 per copy of AMY1 translates into about an eightfold difference in risk of obesity between subjects in the top (copy number > 9) and bottom (copy number < 4) 10% of the copy number distribution. Our study provides a first genetic link between carbohydrate metabolism and BMI and demonstrates the power of integrated genomic approaches beyond genome-wide association studies.
Funded by: Biotechnology and Biological Sciences Research Council: G20234; Department of Health: SRF/01/010; Medical Research Council: G1002084, G1002084/1, K2010-55X-11285-13, MR/K01353X/1; NHGRI NIH HHS: HG004221, P41 HG004221; NIGMS NIH HHS: GM081533, R01 GM081533; Wellcome Trust: 077006/Z/05/Z, 079534/z/06/z, 085555
Nature genetics 2014;46;5;492-7
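The AMY1 abstract above converts a per-copy odds ratio into a fold difference between the copy-number extremes. Assuming a multiplicative (log-linear) per-copy model, two groups differing by Δ copies differ in odds by OR^Δ. The abstract does not state which copy-number gap underlies "about an eightfold difference", so the Python sketch below only computes OR^Δ and, conversely, the Δ implied by a given fold change. With OR = 1.19, an eightfold difference corresponds to roughly 12 copies under this model. Function names are hypothetical.

```python
import math


def cumulative_or(or_per_copy: float, copy_difference: float) -> float:
    """Odds ratio across copy_difference copies, assuming a multiplicative per-copy effect."""
    return or_per_copy ** copy_difference


def copies_for_fold(or_per_copy: float, fold: float) -> float:
    """Copy-number difference that yields a given fold change in odds, same model."""
    return math.log(fold) / math.log(or_per_copy)
```

For instance, `cumulative_or(1.19, 12)` is about 8.06, and `copies_for_fold(1.19, 8)` is about 11.95.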
Workshops: a great way to enhance and supplement a degree.
H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria ; International Health Research Group, Dept of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom ; Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
As part of the International Society for Computational Biology Student Council (ISCB-SC), Regional Student Groups (RSGs) have helped organise workshops in the emerging fields of bioinformatics and computational biology. Workshops are a great way for students to gain hands-on experience and rapidly acquire knowledge in advanced research topics where curriculum-based education is yet to be developed. RSG workshops have improved dissemination of knowledge of the latest bioinformatics techniques and resources among student communities and young scientists, especially in developing nations. This article highlights some of the benefits and challenges encountered while running RSG workshops. Examples cover a variety of subjects, including introductory bioinformatics and advanced bioinformatics, as well as soft skills such as networking, career development, and socializing. The collective experience condensed in this article is a useful starting point for students wishing to organise their own tailor-made workshops.
PLoS computational biology 2014;10;2;e1003497
Computational biology and bioinformatics in Nigeria.
H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria; Human Genetics Department, Wellcome Trust Sanger Institute, Cambridge, United Kingdom; International Health Research Group, Department of Public Health & Primary Care, University of Cambridge, Cambridge, United Kingdom.
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. Its importance is rapidly gaining recognition in this developing country, and bioinformatics groups composed of biologists, computer scientists, and computer engineers are being formed at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.
Funded by: NHGRI NIH HHS: U41 HG006941, U41HG006941; Wellcome Trust
PLoS computational biology 2014;10;4;e1003516
Drug resistance in Salmonella enterica ser. Typhimurium bloodstream infection, Malawi.
Funded by: Medical Research Council: G1100100; Wellcome Trust: 100890
Emerging infectious diseases 2014;20;11;1957-9
Fibroblastic growth factor receptor 1 amplification in osteosarcoma is associated with poor response to neo-adjuvant chemotherapy.
Histopathology, London Sarcoma Service, Royal National Orthopaedic Hospital NHS Trust, Stanmore, Middlesex, HA7 4LP, U.K; UCL Cancer Institute, Huntley Street, London, WC1E 6BT, U.K.
Osteosarcoma, the most common primary bone sarcoma, is a genetically complex disease with no widely accepted biomarker to allow stratification of patients for treatment. After a recent report of one osteosarcoma cell line and one tumor exhibiting fibroblastic growth factor receptor 1 (FGFR1) gene amplification, the aim of this work was to assess the frequency of FGFR1 amplification in a larger cohort of osteosarcoma and to determine if this biomarker could be used for stratification of patients for treatment. A total of 352 osteosarcoma samples from 288 patients were analyzed for FGFR1 amplification by interphase fluorescence in situ hybridization. FGFR1 amplification was detected in 18.5% of patients whose tumors revealed a poor response to chemotherapy, and no patients whose tumors responded well to therapy harbored this genetic alteration. FGFR1 amplification is present disproportionately in the rarer histological variants of osteosarcoma. This study provides a rationale for inclusion of patients with osteosarcoma in clinical trials using FGFR kinase inhibitors.
Funded by: Wellcome Trust: 077012/Z/05/Z, WT088340MA
Cancer medicine 2014;3;4;980-7
Survival and differentiation of adenovirus-generated induced pluripotent stem cells transplanted into the rat striatum.
Program in Neuroscience, Field Neurosciences Laboratory for Restorative Neurology Brain Research and Integrative Neuroscience Center, Central Michigan University, Mount Pleasant, MI, USA.
Induced pluripotent stem cells (iPSCs) offer certain advantages over embryonic stem cells in cell replacement therapy for a variety of neurological disorders. However, reliable procedures whereby transplanted iPSCs can survive and differentiate into functional neurons without forming tumors have yet to be devised. Currently, retroviral or lentiviral methods are often used to reprogram somatic cells. Although these viruses have proven effective, tumors often form following in vivo transplantation, possibly due to the integration of the reprogramming genes. The goal of the current study was to develop a new approach, using an adenovirus for reprogramming cells, characterize the iPSCs in vitro, and test their safety, survivability, and ability to differentiate into region-appropriate neurons following transplantation into the rat brain. To this end, iPSCs were derived from bone marrow-derived mesenchymal stem cells and tail-tip fibroblasts using a single-cassette lentivirus or a combination of adenoviruses. Reprogramming efficiency and levels of pluripotency were compared using immunocytochemistry, flow cytometry, and real-time polymerase chain reaction. Our data indicate that adenoviral reprogramming of tail-tip fibroblasts is as efficient as the lentiviral reprogramming method we used. All generated iPSCs were also capable of differentiating into neuronal-like cells in vitro. To test in vivo survivability and the ability to differentiate into region-specific neurons in the absence of tumor formation, 400,000 iPSCs derived from tail-tip fibroblasts transfected with the adenovirus pair were transplanted into the striatum of adult, immune-competent rats. We observed that these iPSCs produced region-specific neuronal phenotypes, in the absence of tumor formation, at 90 days posttransplantation. These results suggest that adenovirus-generated iPSCs may provide a safe and viable means for neuronal replacement therapies.
Funded by: Medical Research Council: G0701448
Cell transplantation 2014;23;11;1407-23
Pfam: the protein families database.
HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA 20147 USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, OX1 3QX, UK, Institute of Biotechnology and Department of Biological and Environmental Sciences, University of Helsinki, PO Box 56 (Viikinkaari 5), 00014 Helsinki, Finland and Stockholm Bioinformatics Center, Swedish eScience Research Center, Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, PO Box 1031, SE-17121 Solna, Sweden.
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
Funded by: Howard Hughes Medical Institute
Nucleic acids research 2014;42;Database issue;D222-30
High-definition reconstruction of clonal composition in cancer.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
The extensive genetic heterogeneity of cancers can greatly affect therapy success due to the existence of subclonal mutations conferring resistance. However, the characterization of subclones in mixed-cell populations is computationally challenging due to the short length of sequence reads that are generated by current sequencing technologies. Here, we report cloneHD, a probabilistic algorithm for reconstructing subclones from data generated by high-throughput DNA sequencing: read depth, B-allele counts at germline heterozygous loci, and somatic mutation counts. The algorithm can exploit the added information present in correlated longitudinal or multiregion samples and takes into account correlations along genomes caused by events such as copy-number changes. We apply cloneHD to two case studies: a breast cancer sample and time-resolved samples of chronic lymphocytic leukemia, where we demonstrate that monitoring the response of a patient to therapy regimens is feasible. Our work provides new opportunities for tracking cancer development.
Funded by: Wellcome Trust: 097678, 098051, 101239, 101239/Z/13/Z
Cell reports 2014;7;5;1740-1752
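cloneHD, described above, is a probabilistic model that combines read depth, B-allele counts and somatic mutation counts along the genome. As a much simpler illustration of inferring a subclone's size from mutation counts, the Python sketch below finds the maximum-likelihood cell fraction f for a set of somatic SNVs from binomial read counts. It assumes a pure tumor sample, diploid copy-neutral regions and heterozygous mutations, so that the expected variant allele fraction is f/2. This is not the cloneHD algorithm, and the function name is hypothetical.

```python
import math
from typing import Sequence


def subclone_fraction_mle(alt: Sequence[int], depth: Sequence[int], grid_steps: int = 1000) -> float:
    """Grid-search MLE of the fraction of cells f carrying a set of somatic SNVs.

    Assumes each SNV is heterozygous in a diploid, copy-neutral region of a
    pure tumor, so the expected variant allele fraction is f / 2, and that
    alt-read counts are binomial given depth. The binomial coefficient does
    not depend on f and is dropped from the log-likelihood.
    """
    best_f, best_ll = 0.0, -math.inf
    for s in range(1, grid_steps):  # f strictly inside (0, 1) keeps logs finite
        f = s / grid_steps
        p = f / 2.0
        ll = sum(a * math.log(p) + (d - a) * math.log(1.0 - p) for a, d in zip(alt, depth))
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f
```

Under these assumptions the estimate is close to twice the pooled allele fraction; the explicit likelihood is kept so that further terms, such as local copy number, could be added.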
Biomarker profiling by nuclear magnetic resonance spectroscopy for the prediction of all-cause mortality: an observational study of 17,345 persons.
The Estonian Genome Center, University of Tartu, Tartu, Estonia.
Background: Early identification of ambulatory persons at high short-term risk of death could benefit targeted prevention. To identify biomarkers for all-cause mortality and enhance risk prediction, we conducted high-throughput profiling of blood specimens in two large population-based cohorts.
Methods and findings: 106 candidate biomarkers were quantified by nuclear magnetic resonance spectroscopy of non-fasting plasma samples from a random subset of the Estonian Biobank (n = 9,842; age range 18-103 y; 508 deaths during a median of 5.4 y of follow-up). Biomarkers for all-cause mortality were examined using stepwise proportional hazards models. Significant biomarkers were validated and incremental predictive utility assessed in a population-based cohort from Finland (n = 7,503; 176 deaths during 5 y of follow-up). Four circulating biomarkers predicted the risk of all-cause mortality among participants from the Estonian Biobank after adjusting for conventional risk factors: alpha-1-acid glycoprotein (hazard ratio [HR] 1.67 per 1-standard deviation increment, 95% CI 1.53-1.82, p = 5×10⁻³¹), albumin (HR 0.70, 95% CI 0.65-0.76, p = 2×10⁻¹⁸), very-low-density lipoprotein particle size (HR 0.69, 95% CI 0.62-0.77, p = 3×10⁻¹²), and citrate (HR 1.33, 95% CI 1.21-1.45, p = 5×10⁻¹⁰). All four biomarkers were predictive of cardiovascular mortality, as well as death from cancer and other nonvascular diseases. One in five participants in the Estonian Biobank cohort with a biomarker summary score within the highest percentile died during the first year of follow-up, indicating prominent systemic reflections of frailty. The biomarker associations all replicated in the Finnish validation cohort. Including the four biomarkers in a risk prediction score improved risk assessment for 5-y mortality (increase in C-statistics 0.031, p = 0.01; continuous reclassification improvement 26.3%, p = 0.001).
Conclusions: Biomarker associations with cardiovascular, nonvascular, and cancer mortality suggest novel systemic connectivities across seemingly disparate morbidities. The biomarker profiling improved prediction of the short-term risk of death from all causes above established risk factors. Further investigations are needed to clarify the biological mechanisms and the utility of these biomarkers for guiding screening and prevention.
Funded by: Medical Research Council; Wellcome Trust
PLoS medicine 2014;11;2;e1001606
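The reported gain in C-statistic (0.031) in the biomarker study above compares how well risk scores with and without the four biomarkers rank participants by time to death. The abstract does not specify the estimator; the Python sketch below implements Harrell's concordance index, one common choice for right-censored survival data, under a hypothetical function name. The improvement would then be the C of the extended score minus the C of the conventional-risk-factor score, computed on the same participants.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and
    times[i] < times[j]. It is concordant when subject i also has the higher
    predicted risk; tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, so an increase of 0.031 is a modest but measurable improvement in discrimination.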
Xenopus mutant reveals necessity of rax for specifying the eye field which otherwise forms tissue with telencephalic and diencephalic character.
Department of Biology, University of Virginia, Charlottesville, VA 22904, USA.
The retinal anterior homeobox (rax) gene encodes a transcription factor necessary for vertebrate eye development. rax transcription is initiated at the end of gastrulation in Xenopus, and is a key part of the regulatory network specifying anterior neural plate and retina. We describe here a Xenopus tropicalis rax mutant, the first mutant analyzed in detail from a reverse genetic screen. As in other vertebrates, this nonsense mutation results in eyeless animals, and is lethal peri-metamorphosis. Tissue normally fated to form retina in these mutants instead forms tissue with characteristics of diencephalon and telencephalon. This implies that a key role of rax, in addition to defining the eye field, is in preventing alternative forebrain identities. Our data highlight that brain and retina regions are not determined by the mid-gastrula stage but are by the neural plate stage. An RNA-Seq analysis and in situ hybridization assays for early gene expression in the mutant revealed that several key eye field transcription factors (e.g. pax6, lhx2 and six6) are not dependent on rax activity through neurulation. However, these analyses identified other genes either up- or down-regulated in mutant presumptive retinal tissue. Two neural patterning genes of particular interest that appear up-regulated in the rax mutant RNA-seq analysis are hesx1 and fezf2. These genes were not previously known to be regulated by rax. The normal function of rax is to partially repress their expression by an indirect mechanism in the presumptive retina region in wildtype embryos, thus accounting for the apparent up-regulation in the rax mutant. Knock-down experiments using antisense morpholino oligonucleotides directed against hesx1 and fezf2 show that failure to repress these two genes contributes to transformation of presumptive retinal tissue into non-retinal forebrain identities in the rax mutant.
Funded by: NEI NIH HHS: EY018000, EY022954, R01 EY017400, R01 EY018000, R01 EY022954, R01EY017400
Developmental biology 2014;395;2;317-30
Genome-wide association studies of glycaemic traits: A MAGICal journey
Frontiers in Diabetes 2014;23;42-57
Chromosome instability induced by Mps1 and p53 mutation generates aggressive lymphomas exhibiting aneuploidy-induced stress.
European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, NL-9713 AV, Groningen, The Netherlands; Department of Systems Biology, Harvard Medical School, Boston, MA 02115; Mouse Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.
Aneuploidy is a hallmark of human solid cancers that arises from errors in mitosis and results in gain and loss of oncogenes and tumor suppressors. Aneuploidy poses a growth disadvantage for cells grown in vitro, suggesting that cancer cells adapt to this burden. To understand better the consequences of aneuploidy in a rapidly proliferating adult tissue, we engineered a mouse in which chromosome instability was selectively induced in T cells. A loxP-flanked (floxed) mutation was introduced into the monopolar spindle 1 (Mps1) spindle-assembly checkpoint gene so that Cre-mediated recombination would create a truncated protein (Mps1(DK)) that retained the kinase domain but lacked the kinetochore-binding domain and thereby weakened the checkpoint. In a sensitized p53(+/-) background we observed that Mps1(DK/DK) mice suffered from rapid-onset acute lymphoblastic lymphoma. The tumors were highly aneuploid and exhibited a metabolic burden similar to that previously characterized in aneuploid yeast and cultured cells. The tumors nonetheless grew rapidly and were lethal within 3-4 mo after birth.
Funded by: NCI NIH HHS: CA084179, CA139980, P01 CA139980, P30-CA14051, R01 CA084179; Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2014;111;37;13427-32
COSMIC: exploring the world's knowledge of somatic mutations in human cancer.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.
COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk), is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. Our latest release (v70; Aug 2014) describes 2,002,811 coding point mutations in over one million tumor samples and across most human genes. To emphasize depth of knowledge on known cancer genes, mutation information is curated manually from the scientific literature, allowing very precise definitions of disease types and patient details. Combining almost 20,000 published studies gives substantial resolution of how mutations and phenotypes relate in human cancer, providing insights into the stratification of mutations and biomarkers across cancer patient populations. Conversely, our curation of cancer genomes (over 12,000) emphasizes knowledge breadth, driving discovery of unrecognized cancer-driving hotspots and molecular targets. Our high-resolution curation approach is globally unique, giving substantial insight into molecular biomarkers in human oncology. COSMIC also details more than six million noncoding mutations, 10,534 gene fusions, 61,299 genome rearrangements, 695,504 abnormal copy number segments and 60,119,787 abnormal expression variants. All these types of somatic mutation are annotated to both the human genome and each affected coding gene, then correlated across disease and mutation types.
Funded by: Wellcome Trust: 077012/Z/05/Z, 088340
Nucleic acids research 2014;43;Database issue;D805-11
Genome-wide association study of intracranial aneurysm identifies a new association on chromosome 7.
From the Indiana University School of Medicine, Indianapolis (T.F., D.L., D.K., J.M.); University Medical Center Utrecht, Utrecht, The Netherlands (F.v.H., G.R., Y.R.); Kuopio University Hospital, Kuopio, Finland (M.I.K., M.v.u.z.F., J.E.J.); University of Eastern Finland, Kuopio, Finland (M.I.K., M.v.u.z.F., J.E.J.); Massachusetts General Hospital and Harvard Medical School, Boston (M.I.K., A.P., S.R.); Broad Institute of Harvard and MIT, Cambridge, MA (M.I.K., A.P., S.R.); University of Sydney and Royal Prince Alfred Hospital, Sydney, Australia (C.S.A.); Mayo Clinic, Rochester, MN (R.D.B., J. Huston I.M.); Columbia University School of Medicine, New York, NY (E.S.C.); University of Helsinki, Helsinki, Finland (J.G.E., E.I.G.); Folkhälsan Research Center, Helsinki, Finland (J.G.E.); National Institute for Health and Welfare, Helsinki, Finland (J.G.E.); Vasa Central Hospital, Vasa, Finland (J.G.E.); Helsinki University Central Hospital, Helsinki, Finland (J.G.E., E.I.G., A.L., J. Hernesniemi, R.K., H.L., M.N.); University of Cincinnati, OH (M. Flaherty, D.K., C.J.M., L.S., D.W., J.B.); University of Texas Health Science Center at Houston (M. Fornage); Radboud University Medical Center, Nijmegen, The Netherlands (L.A.K., S.H.V.); University of California, San Francisco (N.K.); University of Mississippi Medical Center, Jackson (T.H.M.); Jagiellonian University Medical College, Krakow, Poland (M.M., J.P., A.S.); The Wellcome Trust Sanger Institute, Cambridge, United Kingdom (A.P.); University of Montreal, Montréal, Québec, Canada (G.R.); and University of Virginia School of Medicine, Charlottesville (B.B.W.).
Background and purpose: Genome-wide association studies have identified common variants that contribute to intracranial aneurysm (IA) susceptibility. However, it is clear that the variants identified to date do not account for the estimated genetic contribution to disease risk.
Methods: Initial analysis was performed in a discovery sample of 2617 IA cases and 2548 controls of white ancestry. Novel chromosomal regions meeting genome-wide significance were further tested for association in 2 independent replication samples: Dutch (717 cases; 3004 controls) and Finnish (799 cases; 2317 controls). A meta-analysis was performed to combine the results from the 3 studies for key chromosomal regions of interest.
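The abstract does not say which method was used to combine the three cohorts. As a hedged illustration only, the sketch below implements one common GWAS approach, a sample-size-weighted Z-score (Stouffer) meta-analysis. The function name `stouffer_meta_p`, the square-root-of-N weighting, and the toy inputs are assumptions for illustration; they are not the authors' pipeline or data.

```python
import math
from statistics import NormalDist

def stouffer_meta_p(p_values, sample_sizes, signs=None):
    """Combine two-sided P-values from independent studies (weighted Z method).

    p_values:     two-sided P-value per study
    sample_sizes: total N (cases + controls) per study, used as sqrt(N) weights
    signs:        +1/-1 effect direction per study (defaults to all +1)
    """
    if signs is None:
        signs = [1] * len(p_values)
    std_normal = NormalDist()
    num, weight_sq = 0.0, 0.0
    for p, n, s in zip(p_values, sample_sizes, signs):
        z = -std_normal.inv_cdf(p / 2)  # |z| corresponding to a two-sided P
        w = math.sqrt(n)
        num += w * s * z
        weight_sq += w * w
    z_meta = num / math.sqrt(weight_sq)
    return math.erfc(abs(z_meta) / math.sqrt(2))  # two-sided P for z_meta

# Toy example with hypothetical numbers (not the study's results):
print(stouffer_meta_p([1e-6, 0.01, 0.05], [5000, 3700, 3100]))
```

Because each study's Z-score is weighted by the square root of its sample size, consistent but modest signals in replication cohorts strengthen the combined evidence, while an opposite effect direction (a `-1` sign) pulls the combined statistic towards the null.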
Results: Genome-wide evidence of association was detected in the discovery sample on chromosome 9 (CDKN2BAS; rs10733376: P<1.0×10^-11), in a gene previously associated with IA. A novel region on chromosome 7, near HDAC9, was associated with IA (rs10230207; P=4.14×10^-8). This association replicated in the Dutch sample (P=0.01) but not in the Finnish sample (P=0.25). Meta-analysis of the 3 cohorts reached statistical significance (P=9.91×10^-10).
Conclusions: We detected a novel region associated with IA susceptibility that was replicated in an independent Dutch sample. This region on chromosome 7, which includes HDAC9, has previously been associated with ischemic stroke and its large-vessel occlusive subtype, suggesting a possible genetic link between this stroke subtype and IA.
Funded by: NCATS NIH HHS: UL1 TR001108; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: HL096814, HL096899, HL096902, HL096917, R01-HL70825, R01HL086694, R01HL087641, R01HL59367, U01 HL096812; NINDS NIH HHS: R01 NS039512, R01NS39512, R03NS083468; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C
Stroke; a journal of cerebral circulation 2014;45;11;3194-9
Genomics illuminates parasite biology.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2014;12;11;727
Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Whipworms are common soil-transmitted helminths that cause debilitating chronic infections in man. These nematodes are only distantly related to Caenorhabditis elegans and have evolved to occupy an unusual niche, tunneling through epithelial cells of the large intestine. We report here the whole-genome sequences of the human-infective Trichuris trichiura and the mouse laboratory model Trichuris muris. On the basis of whole-transcriptome analyses, we identify many genes that are expressed in a sex- or life stage-specific manner and characterize the transcriptional landscape of a morphological region with unique biological adaptations, namely, bacillary band and stichosome, found only in whipworms and related parasites. Using RNA sequencing data from whipworm-infected mice, we describe the regulated T helper 1 (TH1)-like immune response of the chronically infected cecum in unprecedented detail. In silico screening identified numerous new potential drug targets against trichuriasis. Together, these genomes and associated functional data elucidate key aspects of the molecular host-parasite interactions that define chronic whipworm infection.
Funded by: Wellcome Trust: 088862, 088862/Z/09/Z, 098051, 100290, 100714, WT083620MA, WT100290MA
Nature genetics 2014;46;7;693-700
Human and Vertebrate Analysis and Annotation Group, Wellcome Trust Sanger Institute, Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1HH, UK.
Historically, pseudogenes were believed to represent nonfunctional genomic fossils; however, there is emerging evidence that many of them could be biologically active. This possibility has ignited interest in pseudogene loci and made their high-quality annotation more pressing, as accurate knowledge of all pseudogenes in the human reference genome sequence facilitates confident functional analysis. GENCODE have undertaken the first genome-wide pseudogene assignment for protein-coding genes, combining large-scale manual annotation with computational pseudogene prediction pipelines. Multiple computational predictions provide an unbiased set of hints for manual annotators to investigate, both during first-pass annotation and as part of quality control (QC) to identify any potentially missing pseudogene loci. Where a pseudogene is identified, the extent of its homology to the parent locus is fully investigated by a manual annotator; a pseudogene model is built and assigned to one of eight pseudogene biotypes depending on the mechanism of creation and on the presence of locus-specific transcriptional or proteomic data. The resulting high-quality, information-rich set of pseudogenes has been integrated with ENCODE functional genomics data, specifically expression level, transcription factor and RNA polymerase II binding, and chromatin marks. In this way we have been able to identify some pseudogenes that possess conventional characteristics of functionality, as well as others with interesting patterns of partial activity, which might suggest that putatively inactive loci could be gaining a novel function, for example as long noncoding RNAs. The activity data associated with every pseudogene are stored in the psiDR resource.
Methods in molecular biology (Clifton, N.J.) 2014;1167;129-55
Mutation, clonal fitness and field change in epithelial carcinogenesis.
MRC Cancer Unit, University of Cambridge, Hutchison/MRC Research Centre, Box 197, Cambridge Biomedical Campus, Cambridge, CB2 0XZ, UK.
Developments in lineage tracing in mouse models have revealed how stem cells maintain normal squamous and glandular epithelia. Here we review recent quantitative studies tracing the fate of individual mutant stem cells which have uncovered how common oncogenic mutations alter cell behaviour, creating clones with a growth advantage that may persist long term. In the intestine this occurs by a mutant clone colonizing an entire crypt, whilst in the squamous oesophagus blocking differentiation creates clones that expand to colonize large areas of epithelium, a phenomenon known as field change. We consider the implications of these findings for early cancer evolution and the cancer stem cell hypothesis, and the prospects of targeted cancer prevention by purging mutant clones from normal-appearing epithelia.
Funded by: Cancer Research UK: 13031; Medical Research Council: MC_UU_12022/3
The Journal of pathology 2014;234;3;296-301
De novo mutations in schizophrenia implicate synaptic networks.
Division of Psychiatric Genomics in the Department of Psychiatry, and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
Inherited alleles account for most of the genetic risk for schizophrenia. However, new (de novo) mutations, in the form of large chromosomal copy number changes, occur in a small fraction of cases and disproportionally disrupt genes encoding postsynaptic proteins. Here we show that small de novo mutations, affecting one or a few nucleotides, are overrepresented among glutamatergic postsynaptic proteins comprising activity-regulated cytoskeleton-associated protein (ARC) and N-methyl-d-aspartate receptor (NMDAR) complexes. Mutations are additionally enriched in proteins that interact with these complexes to modulate synaptic strength, namely proteins regulating actin filament dynamics and those whose messenger RNAs are targets of fragile X mental retardation protein (FMRP). Genes affected by mutations in schizophrenia overlap those mutated in autism and intellectual disability, as do mutation-enriched synaptic pathways. Aligning our findings with a parallel case-control study, we demonstrate reproducible insights into aetiological mechanisms for schizophrenia and reveal pathophysiology shared with other neurodevelopmental disorders.
Funded by: Medical Research Council: G0800509, G0801418; NHGRI NIH HHS: R01HG005827; NIMH NIH HHS: 2 P50MH066392-05A1, R01MH071681, R01MH099126; Wellcome Trust: WT089062, WT098051
Complete Genome Sequence of the WHO International Standard for HIV-1 RNA Determined by Deep Sequencing.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The World Health Organization (WHO) International Standard for HIV-1 RNA nucleic acid assays was characterized by complete genome deep sequencing analysis. The entire coding sequence and flanking long terminal repeats (LTRs), including minority species, were assigned subtype B. This information will aid the design, development, and evaluation of HIV-1 RNA amplification assays.
Funded by: Medical Research Council: G0600007
Genome announcements 2014;2;1
27th International Mammalian Genome Conference meeting report.
National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, US.
Funded by: NHGRI NIH HHS: 2R13HG0002394, R13 HG002394
Mammalian genome : official journal of the International Mammalian Genome Society 2014;25;5-6;195-201
Comparison of TALE designer transcription factors and the CRISPR/dCas9 in regulation of gene expression by targeting enhancers.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
The transcription activator-like effectors (TALEs) and the RNA-guided clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein Cas9 utilize distinct molecular mechanisms for target-site recognition. Both proteins can be modified to carry additional functional domains that regulate expression of genomic loci in mammalian cells. In this study, we compared the two systems in activating and repressing the Oct4 and Nanog loci by targeting their enhancers. Although both efficiently activate luciferase reporters, the CRISPR/dCas9 system is much less potent in activating the endogenous loci and in reprogramming somatic cells to iPS cells. Nevertheless, repression by CRISPR/dCas9 is comparable to, or even better than, that by TALE repressors. We demonstrated that dCas9 binding physically interferes with the binding of native transcription factors at enhancers and, during gene activation, is less efficient at inducing active histone marks or recruiting activating complexes. This study thus highlights the merits and drawbacks of transcriptional regulation by each system. A combined approach using TALEs and CRISPR/dCas9 should provide an optimized solution for regulating genomic loci and for studying genetic elements such as enhancers in biological processes including somatic cell reprogramming and guided differentiation.
Funded by: Wellcome Trust: 098051
Nucleic acids research 2014;42;20;e155
The evolving role of cancer cell line-based screens to define the impact of cancer genomes on drug response.
Cancer Genome Project, Wellcome Trust Sanger Institute Hinxton, Cambridge, United Kingdom.
Over the last decade we have witnessed the convergence of two powerful experimental designs toward a common goal: defining the molecular subtypes that underpin the likelihood of a cancer patient responding to treatment in the clinic. The first of these 'experiments' has been the systematic sequencing of large numbers of cancer genomes through the International Cancer Genome Consortium and The Cancer Genome Atlas. This endeavour is beginning to yield a complete catalogue of the cancer genes that are critical for tumourigenesis, among which we will find tomorrow's biomarkers and drug targets. The second 'experiment' has been the use of large-scale biological models such as cancer cell lines to correlate mutations in cancer genes with drug sensitivity, so that rational clinical trials can be developed to test these hypotheses. It is at this intersection of cancer genome sequencing and biological models that there exists the opportunity to completely transform how we stratify cancer patients in the clinic for treatment.
Funded by: Cancer Research UK; Wellcome Trust
Current opinion in genetics & development 2014;24;114-9
Identification of new SNPs in native South American populations by resequencing the Y chromosome.
Institute of Legal Medicine and Forensic Sciences, Department of Forensic Genetics, Charité - Universitätsmedizin Berlin, Germany.
The Y-chromosomal genetic landscape of South America is relatively homogeneous. The majority of native Amerindian people are assigned to haplogroup Q and only a small percentage belongs to haplogroup C. With the aim of further differentiating the major Q lineages and thus obtaining new insights into the population history of South America, two individuals, both belonging to the sub-haplogroup Q-M3, were analyzed with next-generation sequencing. Several new candidate SNPs were evaluated and four were confirmed to be new, haplogroup Q-specific, and variable. One of the new SNPs, named MG2, identifies a new sub-haplogroup downstream of Q-M3; the other three (MG11, MG13, MG15) are upstream of Q-M3 but downstream of M242, and describe branches at the same phylogenetic positions as previously known SNPs in the samples tested. These four SNPs were typed in 100 individuals belonging to haplogroup Q.
Funded by: Wellcome Trust: 098051
Forensic science international. Genetics 2014;15;111-4
Subclonal variant calling with multiple samples and prior knowledge.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK, Department of Haematology, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK and Department of Haematology, University of Cambridge, Cambridge CB2 2XY, UK.
Motivation: Targeted resequencing of cancer genes in large cohorts of patients is important to understand the biological and clinical consequences of mutations. Cancers are often clonally heterogeneous, and the detection of subclonal mutations is important from a diagnostic point of view, but presents strong statistical challenges.
Results: Here we present a novel statistical approach for calling mutations from large cohorts of deeply resequenced cancer genes. These data allow for precisely estimating local error profiles and enable detecting mutations with high sensitivity and specificity. Our probabilistic method incorporates knowledge about the distribution of variants in terms of a prior probability. We show that our algorithm has a high accuracy of calling cancer mutations and demonstrate that the detected clonal and subclonal variants have important prognostic consequences.
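To make the role of the prior concrete, here is a minimal sketch of a Bayesian variant call at a single site. It assumes (i) the local sequencing error rate is estimated by pooling alternate-allele counts from the other samples in the cohort, (ii) under the no-variant hypothesis alternate reads are binomial with that error rate, and (iii) under the variant hypothesis the variant allele fraction is uniform on (0, 1). The function names, the pseudocounts and the default prior of 0.01 are illustrative assumptions; this is not a reimplementation of the published method.

```python
import math

def log_binom_pmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def variant_posterior(alt, depth, cohort_alt, cohort_depth, prior=0.01):
    """Posterior probability that a sample carries a (possibly subclonal) variant.

    alt, depth:               alternate-allele and total read counts in the sample
    cohort_alt, cohort_depth: the same counts at this site in the other samples
    prior:                    prior probability of a variant at this site (assumed)
    """
    # Local error profile: pooled alt fraction in the rest of the cohort,
    # with +1/+2 pseudocounts so the rate is never exactly 0 or 1.
    err = (sum(cohort_alt) + 1) / (sum(cohort_depth) + 2)
    log_l0 = log_binom_pmf(alt, depth, err)
    # Variant hypothesis: allele fraction f ~ Uniform(0, 1); integrating the
    # binomial likelihood over f gives exactly 1 / (depth + 1).
    log_l1 = -math.log(depth + 1)
    log_odds = math.log(prior) - math.log1p(-prior) + log_l1 - log_l0
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)

cohort_alt, cohort_depth = [5, 5, 5, 5], [5000] * 4  # toy data
print(variant_posterior(100, 2000, cohort_alt, cohort_depth))  # 5% VAF: close to 1
print(variant_posterior(2, 2000, cohort_alt, cohort_depth))    # noise level: close to 0
```

Because the error rate comes from the cohort rather than from the sample itself, a site that is noisy in every sample needs more alternate reads before it is called, and lowering `prior` for sites unlikely to be mutated raises the evidence required.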
Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, WT088340MA
Bioinformatics (Oxford, England) 2014;30;9;1198-204
Monitoring parasite diversity for malaria elimination in sub-Saharan Africa.
Noguchi Memorial Institute for Medical Research, Accra, Ghana.
The African continent continues to bear the greatest burden of malaria and the greatest diversity of parasites, mosquito vectors, and human victims. The evolutionary plasticity of malaria parasites and their vectors is a major obstacle to eliminating the disease. Of current concern is the recently reported emergence of resistance to the front-line drug, artemisinin, in South-East Asia in Plasmodium falciparum, which calls for preemptive surveillance of the African parasite population for genetic markers of emerging drug resistance. Here we describe the Plasmodium Diversity Network Africa (PDNA), which has been established across 11 countries in sub-Saharan Africa to ensure that African scientists are enabled to work together and to play a key role in the global effort for tracking and responding to this public health threat.
Funded by: Medical Research Council: G0600718, MC_EX_MR/K02440X/1; Wellcome Trust: 090532, 090770
Science (New York, N.Y.) 2014;345;6202;1297-8
Maturation of induced pluripotent stem cell derived hepatocytes by 3D-culture.
Wellcome Trust-Medical Research Council Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, United Kingdom ; Immunopathogenesis Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America.
Induced pluripotent stem cell derived hepatocytes (IPSC-Heps) have the potential to reduce the demand for a dwindling number of primary cells used in applications ranging from therapeutic cell infusions to in vitro toxicology studies. However, current differentiation protocols and culture methods produce cells with reduced functionality and fetal-like properties compared to adult hepatocytes. We report a culture method for the maturation of IPSC-Heps using three-dimensional (3D) collagen matrices compatible with high throughput screening. This culture method significantly increases functional maturation of IPSC-Heps towards an adult phenotype when compared to conventional 2D systems. Additionally, this approach spontaneously results in the presence of polarized structures necessary for drug metabolism and improves functional longevity to over 75 days. Overall, this research reveals a method to shift the phenotype of existing IPSC-Heps towards primary adult hepatocytes, allowing such cells to be a more relevant replacement for the current primary standard.
Funded by: Medical Research Council: G0701448; Wellcome Trust: 088566
PloS one 2014;9;1;e86372
Genomic Investigations unmask Mycoplasma amphoriforme, a new respiratory pathogen.
School of Medicine, University of St Andrews, United Kingdom.
Background: Mycoplasma amphoriforme has been associated with infection in patients with primary antibody deficiency (PAD). Little is known about the natural history of infection with this organism and its ability to be transmitted in the community.
Methods: The bacterial load was estimated in sequential sputum samples from 9 patients by quantitative polymerase chain reaction. The genomes of all available isolates, originating from patients in the United Kingdom, France, and Tunisia, were sequenced along with the type strain. Genomic data were assembled and annotated, and a high-resolution phylogenetic tree was constructed.
Results: By using high-resolution whole-genome sequencing (WGS) data, we show that patients can be chronically infected with M. amphoriforme manifesting as a relapsing-remitting bacterial load, interspersed by periods when the organism is undetectable. Importantly, we demonstrate transmission of strains within a clinical environment. Antibiotic resistance mutations accumulate in isolates taken from patients who received multiple courses of antibiotics.
Conclusions: Mycoplasma amphoriforme isolates form a closely related species responsible for a chronic relapsing and remitting infection in PAD patients in the United Kingdom and from immunocompetent patients in other countries. We provide strong evidence of transmission between patients attending the same clinic, suggesting that screening and isolation may be necessary for susceptible patients. This work demonstrates the critical role that WGS can play in rapidly unraveling the biology of a novel pathogen.
Funded by: Medical Research Council: G1000413; Wellcome Trust: 097831/Z/11/B, 098051
Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2014;60;3;381-8
Expression and replication studies to identify new candidate genes involved in normal hearing function.
Department of Medical Sciences, University of Trieste, Trieste, Italy.
Considerable progress has been made in identifying deafness genes, but little is known about the genetic basis of normal variation in hearing function. We recently carried out a genome-wide association study (GWAS) of quantitative hearing traits in southern European populations and found several SNPs with suggestive, but none with significant, association. In the current study, we followed up these SNPs using alternative approaches to investigate which of them might show a genuine association with auditory function. First, we generated a shortlist of 19 genes from the published GWAS results. Second, we carried out immunocytochemistry to examine expression of these 19 genes in the mouse inner ear. Twelve of them showed distinctive cochlear expression patterns: four were expressed only in sensory hair cells (Csmd1, Arsg, Slc16a6 and Gabrg3), one only in marginal cells of the stria vascularis (Dclk1), and the remaining seven (Ptprd, Grm8, GlyBP, Evi5, Rimbp2, Ank2, Cdh13) in multiple cochlear cell types. Third, we tested these 12 genes for replication of association in an independent set of samples from the Caucasus and Central Asia. Nine of them showed nominally significant association (p<0.05): 4 were replicated at the same SNP with the same effect direction, while the remaining 5 showed significant association in a gene-based test. Finally, to look for genotype-phenotype relationships, we analyzed the audiometric profiles of the three genotypes of the most strongly associated gene variants. Seven of the 9 replicated genes (CDH13, GRM8, ANK2, SLC16A6, ARSG, RIMBP2 and DCLK1) showed audiometric patterns that differed between genotypes, further supporting their role in hearing function. These data demonstrate the usefulness of this multistep approach in providing new insights into the molecular basis of hearing and may suggest new targets for treatment and prevention of hearing impairment.
Funded by: Medical Research Council: G0300212, MC_QA137918; Telethon: GGP09037; Wellcome Trust: 098051, 100669, WT100669MA
PloS one 2014;9;1;e85352
Genome-scale RNAi screens for high-throughput phenotyping in bloodstream-form African trypanosomes.
Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dundee, UK.
The ability to simultaneously assess every gene in a genome for a role in a particular process has obvious appeal. This protocol describes how to perform genome-scale RNAi library screens in bloodstream-form African trypanosomes, a family of parasites that causes lethal human and animal diseases and also serves as a model for studies on basic aspects of eukaryotic biology and evolution. We discuss strain assembly, screen design and implementation, the RNAi target sequencing approach and hit validation, and we provide a step-by-step protocol. A screen can yield from one to thousands of 'hits' associated with the phenotype of interest. The screening protocol itself takes 2 weeks or less to be completed, and high-throughput sequencing may also be completed within weeks. Pre- and post-screen strain assembly, validation and follow-up can take several months, depending on the type of screen and the number of hits analyzed.
Funded by: Medical Research Council: MR/K011987/1; Wellcome Trust: 085775/Z/08/Z, 093010/Z/10/Z, 100320, 100320/Z/12/Z, 100476
Nature protocols 2014;10;1;106-33
Fast randomization of large genomic datasets while preserving alteration counts.
Fondazione Bruno Kessler, I-38100 Povo (Trento), Italy, European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge CB10 1SD, UK, Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK and Universitat Pompeu Fabra, Barcelona 08003, Spain.
Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive.
Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirements with respect to existing implementations and bounds, while P-value computations remain equivalent. We therefore propose BiRewire for studying statistical properties of genomic datasets and other data that can be modeled as bipartite networks.
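The switching-algorithm described above can be sketched in a few lines. This is a minimal, unoptimized illustration that assumes the dataset is given as a set of (patient, gene) alteration pairs; it is not BiRewire's implementation, the function names are hypothetical, and the number of steps is left to the caller rather than set from the analytical bound reported in the paper.

```python
import random
from collections import Counter

def degrees(edges):
    """Patient-wise and gene-wise alteration counts of a bipartite edge set."""
    return Counter(p for p, _ in edges), Counter(g for _, g in edges)

def switching_randomize(edges, n_steps, seed=None):
    """Randomize a patient-gene bipartite network while preserving degrees."""
    rng = random.Random(seed)
    edge_list = list(set(edges))  # indexable, for uniform sampling of edges
    edge_set = set(edge_list)     # for O(1) membership tests
    if len(edge_list) < 2:
        return edge_set
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Propose (a, b), (c, d) -> (a, d), (c, b). Reject proposals that
        # would leave the network unchanged or create a duplicate edge.
        if a == c or b == d or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set

alterations = {("P1", "TP53"), ("P1", "KRAS"), ("P2", "TP53"),
               ("P3", "KRAS"), ("P3", "EGFR"), ("P4", "EGFR")}  # toy data
shuffled = switching_randomize(alterations, n_steps=1000, seed=0)
assert degrees(shuffled) == degrees(alterations)
```

Each accepted step swaps the genes of two alterations, so every patient keeps its number of altered genes and every gene keeps its number of altered patients, which is exactly the null model the abstract describes. In this sketch, rejected proposals still count towards `n_steps`, as in a standard Markov chain.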
Availability and implementation: BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
Funded by: Wellcome Trust: 102696
Bioinformatics (Oxford, England) 2014;30;17;i617-23
Nongenetic stochastic expansion of JAK2V617F-homozygous subclones in polycythemia vera?
Cambridge Institute for Medical Research and Wellcome Trust/Medical Research Council, Stem Cell Institute and Department of Haematology, University of Cambridge, Cambridge, United Kingdom Department of Haematology, Addenbrooke's Hospital, Cambridge, United Kingdom.
Funded by: Canadian Institutes of Health Research; Cancer Research UK; Medical Research Council; Wellcome Trust: 088340
Genomic epidemiology of Neisseria gonorrhoeae with reduced susceptibility to cefixime in the USA: a retrospective observational study.
Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; Division of Infectious Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Background: The emergence of Neisseria gonorrhoeae with decreased susceptibility to extended spectrum cephalosporins raises the prospect of untreatable gonorrhoea. In the absence of new treatments, efforts to slow the increasing incidence of resistant gonococcus require insight into the factors that contribute to its emergence and spread. We assessed the relatedness between isolates in the USA and reconstructed likely spread of lineages through different sexual networks.
Methods: We sequenced the genomes of 236 isolates of N gonorrhoeae collected by the Centers for Disease Control and Prevention's Gonococcal Isolate Surveillance Project (GISP) from sentinel public sexually transmitted disease clinics in the USA, including 118 (97%) of the isolates from 2009-10 in GISP with reduced susceptibility to cefixime (cef(RS)) and 118 cefixime-susceptible isolates from GISP matched as closely as possible by location, collection date, and sexual orientation. We assessed the association between antimicrobial resistance genotype and phenotype and correlated phylogenetic clustering with location and sexual orientation.
Findings: Mosaic penA XXXIV had a high positive predictive value for cef(RS). We found that two of the 118 cef(RS) isolates lacked a mosaic penA allele, and rechecking showed that these two were susceptible to cefixime. Of the 116 remaining cef(RS) isolates, 114 (98%) fell into two distinct lineages that have independently acquired mosaic penA allele XXXIV. A major lineage of cef(RS) strains spread eastward, predominantly through a sexual network of men who have sex with men. Eight of nine inferred transitions between sexual networks were introductions from men who have sex with men into the heterosexual population.
Interpretation: Genomic methods might aid efforts to slow the spread of antibiotic-resistant N gonorrhoeae through augmentation of gonococcal outbreak surveillance and identification of populations that could benefit from increased screening for asymptomatic infections.
Funded by: NIAID NIH HHS: 1-K08-AI104767-01, 5-K01-AI101010-02, K01 AI101010, K08 AI104767, R01 AI106786; NIGMS NIH HHS: U54 GM088558, U54GM088558; Wellcome Trust: 098051
The Lancet. Infectious diseases 2014;14;3;220-6
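The Findings report that mosaic penA XXXIV had a high positive predictive value (PPV) for cef(RS). PPV is the fraction of marker-positive isolates that are truly resistant, TP / (TP + FP). The sketch below computes it from (has_marker, resistant) pairs. The toy records and the function name are hypothetical and are not the study's data. The last line only checks the abstract's own arithmetic (114 of 116 isolates).

```python
"""Positive predictive value (PPV) of a genotypic marker for a phenotype.

Hypothetical sketch: each record is a (has_marker, resistant) pair, for
example (carries mosaic penA XXXIV, has the cef(RS) phenotype).
"""


def positive_predictive_value(records):
    """Return TP / (TP + FP), computed over marker-positive records only."""
    positives = [resistant for has_marker, resistant in records if has_marker]
    if not positives:
        raise ValueError("no marker-positive records; PPV is undefined")
    true_positives = sum(positives)  # True counts as 1, False as 0
    return true_positives / len(positives)


# Hypothetical toy data (not from the study): 9 true positives, 1 false
# positive, and 5 marker-negative susceptible isolates that do not affect PPV.
toy = [(True, True)] * 9 + [(True, False)] + [(False, False)] * 5
print(positive_predictive_value(toy))  # 0.9

# Arithmetic check on the abstract: 114 of the 116 remaining cef(RS) isolates.
print(f"{114 / 116:.1%}")  # 98.3%, reported as 98%
```

PPV ignores marker-negative isolates entirely. That is why it says nothing about sensitivity, for example how many resistant isolates lack the marker.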
Acute myeloid leukaemia: a paradigm for the clonal evolution of cancer?
Haematological Cancer Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Acute myeloid leukaemia (AML) is an uncontrolled clonal proliferation of abnormal myeloid progenitor cells in the bone marrow and blood. Advances in cancer genomics have revealed the spectrum of somatic mutations that give rise to human AML and drawn our attention to its molecular evolution and clonal architecture. It is now evident that most AML genomes harbour small numbers of mutations, which are acquired in a stepwise manner. This characteristic, combined with our ability to identify mutations in individual leukaemic cells and our detailed understanding of normal human and murine haematopoiesis, makes AML an excellent model for understanding the principles of cancer evolution. Furthermore, a better understanding of how AML evolves can help us devise strategies to improve the therapy and prognosis of AML patients. Here, we draw from recent advances in genomics, clinical studies and experimental models to describe the current knowledge of the clonal evolution of AML and its implications for the biology and treatment of leukaemias and other cancers.
Funded by: Wellcome Trust: 095663
Disease models & mechanisms 2014;7;8;941-51
De novo loss-of-function mutations in SETD5, encoding a methyltransferase in a 3p25 microdeletion syndrome critical region, cause intellectual disability.
Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK.
To identify further Mendelian causes of intellectual disability (ID), we screened a cohort of 996 individuals with ID for variants in 565 known or candidate genes by using a targeted next-generation sequencing approach. Seven loss-of-function (LoF) mutations were identified in SETD5, a gene predicted to encode a methyltransferase: four nonsense (c.1195A>T [p.Lys399(∗)], c.1333C>T [p.Arg445(∗)], c.1866C>G [p.Tyr622(∗)], and c.3001C>T [p.Arg1001(∗)]) and three frameshift (c.2177_2178del [p.Thr726Asnfs(∗)39], c.3771dup [p.Ser1258Glufs(∗)65], and c.3856del [p.Ser1286Leufs(∗)84]). All mutations were compatible with de novo dominant inheritance. The affected individuals had moderate to severe ID with additional variable features of brachycephaly; a prominent high forehead with synophrys or striking full and broad eyebrows; a long, thin, and tubular nose; long, narrow upslanting palpebral fissures; and large, fleshy low-set ears. Skeletal anomalies, including significant leg-length discrepancy, were a frequent finding in two individuals. Congenital heart defects, inguinal hernia, or hypospadias were also reported. Behavioral problems, including obsessive-compulsive disorder, hand flapping with ritualized behavior, and autism, were prominent features. SETD5 lies within the critical interval for 3p25 microdeletion syndrome. The individuals with SETD5 mutations showed phenotypic similarity to those previously reported with a deletion in 3p25, and thus loss of SETD5 might be sufficient to account for many of the clinical features observed in this condition. Our findings add to the growing evidence that mutations in genes encoding methyltransferases regulating histone modification are important causes of ID. This analysis provides sufficient evidence that rare de novo LoF mutations in SETD5 are a relatively frequent (0.7%) cause of ID.
Funded by: Wellcome Trust: 100140, WT091310
American journal of human genetics 2014;94;4;618-24
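The 0.7% figure is 7 SETD5 LoF mutations among 996 screened individuals. The sketch below reproduces that proportion and adds a Wilson score interval, a standard way to summarise the uncertainty of a binomial proportion. The interval is an illustration added here. The abstract does not report one, and the function name is hypothetical.

```python
"""Proportion of a screened cohort carrying a finding, with a Wilson interval.

Illustrative sketch: 7 of 996 gives the 0.7% in the abstract. The Wilson
score interval is a generic binomial summary, not a result from the paper.
"""
import math


def wilson_interval(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score method."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, centre - half_width, centre + half_width


p, lo, hi = wilson_interval(7, 996)
print(f"{p:.1%} (95% CI {lo:.2%} to {hi:.2%})")
```

The Wilson interval is used instead of the simple normal approximation because it behaves well when the count is small and the proportion is near zero, as it is here.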
Gray platelet syndrome: proinflammatory megakaryocytes and α-granule loss cause myelofibrosis and confer metastasis resistance in mice.
Department of Haematology, University of Cambridge, and National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom.
NBEAL2 encodes a multidomain scaffolding protein with a putative role in granule ontogeny in human platelets. Mutations in NBEAL2 underlie gray platelet syndrome (GPS), a rare inherited bleeding disorder characterized by a lack of α-granules within blood platelets and progressive bone marrow fibrosis. We present here a novel Nbeal2(-/-) murine model of GPS and demonstrate that the lack of α-granules is due to their loss from platelets/mature megakaryocytes (MKs) and not to impaired initial formation. We show that the lack of Nbeal2 confers a proinflammatory phenotype to the bone marrow MKs, which in combination with the loss of proteins from α-granules drives the development of bone marrow fibrosis. In addition, we demonstrate that α-granule deficiency impairs platelet function beyond hemostasis and that Nbeal2 deficiency has a protective effect against cancer metastasis.
Funded by: British Heart Foundation: FS/09/039/27788, FS/14/40/30921, FS09/039, RG/09/012/28096, RG/09/12/28096; Cancer Research UK: 13031; Department of Health: RP-PG-0310-1002; Wellcome Trust: WT098051
The African Genome Variation Project shapes medical genetics in Africa.
Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Department of Public Health and Primary Care, University of Cambridge, 2 Wort's Causeway, Cambridge, CB1 8RN, UK.
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Funded by: Intramural NIH HHS: Z01 HG200362, Z01 HG200362-01, ZIA HG200362-02, ZIA HG200362-03, ZIA HG200362-04, ZIA HG200362-05, ZIA HG200362-06; Medical Research Council: G0600718, G0801566, G0901213, G0901213-92157, G1001333, MC_UP_A900_1118, MR/K013491/1; NIMHD NIH HHS: P20 MD006899; Wellcome Trust: 090770, 100715, 100891, WT077383/Z/05/Z
A systematic review of definitions of extreme phenotypes of HIV control and progression.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton; Strangeways Research Laboratory, Department of Public Health and Primary Care, University of Cambridge, Wort's Causeway, Cambridge; Medical Research Council, Clinical Trials Unit, Aviation House, London, UK; Centre for the AIDS Programme of Research in South Africa (CAPRISA), Doris Duke Medical Research Institute, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa; Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford; Imperial College Healthcare NHS Trust, London; Cambridge University Hospitals NHS Foundation Trust, Department of Infectious Diseases, Addenbrooke's Hospital, Cambridge; King's College London, Weston Education Centre; Division of Infection and Immunity, University College London, London, UK.
The study of individuals at opposite ends of the HIV clinical spectrum can provide invaluable insights into HIV biology. Heterogeneity in criteria used to define these individuals can introduce inconsistencies in results from research and make it difficult to identify biological mechanisms underlying these phenotypes. In this systematic review, we formally quantified the heterogeneity in definitions used for terms referring to extreme phenotypes in the literature, and identified common definitions and components used to describe these phenotypes. We assessed 714 definitions of HIV extreme phenotypes in 501 eligible studies published between 1 January 2000 and 15 March 2012, and identified substantial variation among these. This heterogeneity may represent important differences in the biological endophenotypes and clinical progression profiles of the individuals selected by these definitions, suggesting the need for harmonized definitions. In this context, we were able to identify common components in existing definitions that may provide a framework for developing consensus definitions for these phenotypes in HIV infection.
Funded by: Medical Research Council: G0901213, MC_UU_12023/15; Wellcome Trust
AIDS (London, England) 2014;28;2;149-62
Use of whole-genus genome sequence data to develop a multilocus sequence typing tool that accurately identifies Yersinia isolates to the species and subspecies levels.
Pathogen Research Group, Nottingham Trent University, Nottingham, United Kingdom.
The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level with a resolution equal to that of whole-genome sequencing. Our assay is designed to subtype the important human-pathogenic species Yersinia enterocolitica accurately, to whole-genome resolution. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system to allow accurate and reproducible identification of Yersinia isolates to the species level, a process often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica.
Journal of clinical microbiology 2014;53;1;35-42
The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades.
Department of Genetics, University of Leicester, Leicester, United Kingdom.
Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.
Funded by: Wellcome Trust: 087576
Molecular biology and evolution 2014;32;3;661-73
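The abstract notes that unbiased SNP ascertainment yields branch lengths proportional to time, which lets node ages (TMRCAs) be estimated directly. A minimal sketch under a strict molecular clock follows. It averages the number of SNPs from a node to each descendant tip, then divides by the expected SNPs per year for the sequenced region. The per-site rate of 1e-9 per year and the SNP counts are assumptions for illustration. This is not necessarily the dating method or rate used in the study.

```python
"""Sketch: node age (TMRCA) from SNP counts under a strict molecular clock.

Assumption: every descendant lineage accumulates SNPs at the same constant
rate, so mean(SNPs from node to tips) / (rate per site per year * sites)
estimates the node age in years. The rate below is a placeholder.
"""
from statistics import mean


def tmrca_years(snps_per_lineage, mutation_rate_per_site_per_year, sites):
    """Estimate the age in years of the common ancestor of the given lineages."""
    if not snps_per_lineage:
        raise ValueError("need at least one descendant lineage")
    mean_snps = mean(snps_per_lineage)  # average SNPs from node to each tip
    snps_per_year = mutation_rate_per_site_per_year * sites
    return mean_snps / snps_per_year


# Hypothetical SNP counts from one node to each of four descendant tips.
counts = [30, 34, 28, 32]
# 3.7 Mb of MSY per male matches the study; the rate is an assumed value.
age = tmrca_years(counts, mutation_rate_per_site_per_year=1e-9, sites=3_700_000)
print(round(age))  # 8378
```

Averaging over all descendant tips dampens lineage-to-lineage noise. The subtle but significant branch-length differences reported between clades are exactly the kind of departure that a single strict-clock rate would not capture.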
A GC1 Acinetobacter baumannii isolate carrying AbaR3 and the aminoglycoside resistance transposon TnaphA6 in a conjugative plasmid.
School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.
Objectives: To locate the acquired antibiotic resistance genes, including the amikacin resistance transposon TnaphA6, in the genome of an Australian isolate belonging to Acinetobacter baumannii global clone 1 (GC1).
Methods: A multiply antibiotic-resistant GC1 isolate harbouring TnaphA6 was sequenced using Illumina HiSeq, and reads were used to generate a de novo assembly and determine multilocus sequence types (STs). PCR was used to assemble the AbaR chromosomal resistance island and a large plasmid carrying TnaphA6. Plasmid DNA sequences were compared with ones available in GenBank. Conjugation experiments were conducted.
Results: The A. baumannii GC1 isolate G7 was shown to include the AbaR3 antibiotic resistance island. It also contains an 8.7 kb cryptic plasmid, pAb-G7-1, and a 70,100 bp plasmid, pAb-G7-2, carrying TnaphA6. pAb-G7-2 belongs to the Aci6 Acinetobacter plasmid family. It encodes transfer functions and was shown to conjugate. Plasmids related to pAb-G7-2 were detected in further amikacin-resistant GC1 isolates using PCR. From the genome sequence, isolate G7 was ST1 (Institut Pasteur scheme) and ST231 (Oxford scheme). Using Oxford scheme PCR-based methods, however, the isolate was ST109; this discrepancy was traced to a single-base difference in the gpi segment analysed, which arose because the original primer sequences were included in that segment.
Conclusions: The multiply antibiotic-resistant GC1 isolate G7 carries most of its resistance genes in AbaR3 located in the chromosome. However, TnaphA6 is on a conjugative plasmid, pAb-G7-2. Primers developed to locate TnaphA6 in pAb-G7-2 will simplify the detection of plasmids related to pAb-G7-2 in A. baumannii isolates.
The Journal of antimicrobial chemotherapy 2014;69;4;955-8
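The Results attribute the ST231 versus ST109 discrepancy to a single base in the gpi segment. The toy caller below shows why one base is enough. MLST assigns allele numbers by exact sequence match, and the sequence type is looked up from the tuple of allele numbers. All sequences, allele numbers, ST labels and function names here are hypothetical. Real A. baumannii schemes type short fragments of seven housekeeping genes, not two.

```python
"""Toy MLST caller: locus sequences -> allele numbers -> sequence type (ST).

Everything here is hypothetical and reduced to two loci to show how a
single base change in one locus (gpi) changes the assigned ST.
"""

LOCI = ("gltA", "gpi")

# Hypothetical allele tables: locus -> {exact sequence: allele number}.
ALLELES = {
    "gltA": {"ACGTACGT": 1},
    "gpi": {"TTGACAGT": 1, "TTGACGGT": 2},  # alleles 1 and 2 differ at one base
}

# Hypothetical profile table: allele profile -> ST label.
PROFILES = {(1, 1): "ST-A", (1, 2): "ST-B"}


def allele_profile(sequences):
    """Exact-match each locus; None marks an allele missing from the table."""
    return tuple(ALLELES[locus].get(sequences[locus]) for locus in LOCI)


def sequence_type(sequences):
    """Return the ST for a known profile, or None for a novel or incomplete one."""
    return PROFILES.get(allele_profile(sequences))


from_genome = {"gltA": "ACGTACGT", "gpi": "TTGACAGT"}
from_pcr = {"gltA": "ACGTACGT", "gpi": "TTGACGGT"}  # one base differs in gpi
print(sequence_type(from_genome), sequence_type(from_pcr))  # ST-A ST-B
```

Because matching is exact, any base that differs inside the typed segment (including one contributed by a primer sequence) yields a different allele number and therefore a different ST.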
A conjugative plasmid carrying the carbapenem resistance gene blaOXA-23 in AbaR4 in an extensively resistant GC1 Acinetobacter baumannii isolate.
School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.
Objectives: To locate the acquired bla(OXA-23) carbapenem resistance gene in an Australian A. baumannii global clone 1 (GC1) isolate.
Methods: The genome of the extensively antibiotic-resistant GC1 isolate A85 harbouring bla(OXA-23) in Tn2006 was sequenced using Illumina HiSeq, and the reads were used to generate a de novo assembly. PCR was used to assemble relevant contigs. Sequences were compared with ones in GenBank. Conjugation experiments were conducted.
Results: The sporadic GC1 isolate A85, recovered in 2003, was extensively resistant, exhibiting resistance to imipenem, meropenem and ticarcillin/clavulanate, to cephalosporins and fluoroquinolones and to the older antibiotics gentamicin, kanamycin and neomycin, sulfamethoxazole, trimethoprim and tetracycline. Genes for resistance to older antibiotics are in the chromosome, in an AbaR3 resistance island. A second copy of the ampC gene in Tn6168 confers cephalosporin resistance and the gyrA and parC genes have mutations leading to fluoroquinolone resistance. An 86 335 bp repAci6 plasmid, pA85-3, carrying bla(OXA-23) in Tn2006 in AbaR4, was shown to transfer imipenem, meropenem and ticarcillin/clavulanate resistance into a susceptible recipient. A85 also contains two small cryptic plasmids of 2.7 and 8.7 kb. A85 is sequence type ST126 (Oxford scheme) and carries a novel KL15 capsule locus and the OCL3 outer core locus.
Conclusions: A85 represents a new GC1 lineage identified by the novel capsule locus but retains AbaR3 carrying genes for resistance to older antibiotics. Resistance to imipenem, meropenem and ticarcillin/clavulanate has been introduced into A85 by pA85-3, a repAci6 conjugative plasmid carrying Tn2006 in AbaR4.
Funded by: Wellcome Trust: 098051
The Journal of antimicrobial chemotherapy 2014;69;10;2625-8
Identification of a marker for two lineages within the GC1 clone of Acinetobacter baumannii.
School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.
Funded by: Wellcome Trust
The Journal of antimicrobial chemotherapy 2014;69;2;557-8
Efficient in vivo deletion of a large imprinted lncRNA by CRISPR/Cas9.
MOE Key Laboratory of Model Animal for Disease Study; Model Animal Research Center of Nanjing University; Nanjing, Jiangsu Province, PR China.
Recent genome-wide studies have revealed that the majority of the mouse genome is transcribed as non-coding RNAs (ncRNAs) and growing evidence supports the importance of ncRNAs in regulating gene expression and epigenetic processes. However, the low efficiency of conventional gene targeting strategies has hindered the functional study of ncRNAs in vivo, particularly in generating large fragment deletions of long non-coding RNAs (lncRNAs) with multiple expression variants. The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system has recently been applied as an efficient tool for engineering site-specific mutations of protein-coding genes in the genome. In this study, we explored the potential of using the CRISPR/Cas9 system to generate large genomic deletions of lncRNAs in mice. We developed an efficient one-step strategy to target the maternally expressed lncRNA, Rian, on chromosome 12 in mice. We showed that paired sgRNAs can precisely generate large deletions of up to 23 kb and that the deletion efficiency can be further improved, to up to 33%, by combining multiple sgRNAs. The deletion successfully abolished the expression of Rian from the maternally inherited allele, validating the biological relevance of the mutations in studying an imprinted locus. Mutation of Rian has differential effects on expression of nearby genes in different somatic tissues. Taken together, we have established a robust one-step method to engineer large deletions to knockout lncRNA genes with the CRISPR/Cas9 system. Our work will facilitate future functional studies of other lncRNAs in vivo.
RNA biology 2014;11;7;829-35
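The paired-sgRNA strategy deletes the segment between two Cas9 cut sites. The sketch below predicts that segment for SpCas9, which makes a blunt cut 3 bp 5' of an NGG PAM. It assumes both protospacers lie on the + strand and that the two breaks are rejoined cleanly, whereas real repair often leaves small indels at the junction. The sequences and function names are hypothetical and unrelated to the Rian guides.

```python
"""Predict the deletion produced by a pair of SpCas9 sgRNAs.

Assumptions: both 20-nt protospacers are on the + strand, each is directly
followed by an NGG PAM, SpCas9 cuts 3 bp 5' of the PAM, and the ends are
rejoined without extra indels. All sequences are hypothetical.
"""
import re


def cut_site(seq, protospacer):
    """0-based index of the cut; the break falls just before seq[index]."""
    start = seq.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found")
    pam_start = start + len(protospacer)
    pam = seq[pam_start:pam_start + 3]
    if re.fullmatch("[ACGT]GG", pam) is None:
        raise ValueError(f"no NGG PAM after protospacer (found {pam!r})")
    return pam_start - 3


def paired_deletion(seq, guide_a, guide_b):
    """Return (deleted length, sequence after clean end joining)."""
    left, right = sorted((cut_site(seq, guide_a), cut_site(seq, guide_b)))
    return right - left, seq[:left] + seq[right:]


guide_1 = "GATTACAGATTACAGATTAC"
guide_2 = "CCTTAAGGCCTTAAGGCCTT"
locus = "TTTT" + guide_1 + "TGG" + "A" * 50 + guide_2 + "AGG" + "TTTT"
size, edited = paired_deletion(locus, guide_1, guide_2)
print(size, len(locus), len(edited))  # 73 104 31
```

Sorting the two cut sites means the guides can be passed in either order. Placing guides on the minus strand would need reverse-complement handling, which this sketch omits.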
Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, British Columbia, Canada, and the Wellcome Trust Sanger Institute, Hinxton, UK.
Nature biotechnology 2014;32;1;66-8
Haptoglobin (HP) and Haptoglobin-related protein (HPR) copy number variation, natural selection, and trypanosomiasis.
Department of Genetics, University of Leicester, Leicester, UK.
Haptoglobin, coded by the HP gene, is a plasma protein that acts as a scavenger for free heme, and haptoglobin-related protein (coded by the HPR gene) forms part of the trypanolytic factor TLF-1, together with apolipoprotein L1 (ApoL1). We analyse the polymorphic small intragenic duplication of the HP gene, with alleles Hp1 and Hp2, in 52 populations, and find no evidence for natural selection either from extended haplotype analysis or from correlation with pathogen richness matrices. Using fiber-FISH, the paralog ratio test, and array-CGH data, we also confirm that the HPR gene is copy number variable, with duplication of the whole HPR gene at polymorphic frequencies in west and central Africa, up to an allele frequency of 15%. The geographical distribution of the HPR duplication allele overlaps the region where the pathogen causing chronic human African trypanosomiasis, Trypanosoma brucei gambiense, is endemic. The HPR duplication has occurred on one SNP haplotype, but there is no strong evidence of extended homozygosity, a characteristic of recent natural selection. The HPR duplication shows a slight, non-significant undertransmission to human African trypanosomiasis-affected children of unaffected parents in the Democratic Republic of Congo. However, taken together with alleles of APOL1, there is an overall significant undertransmission of putative protective alleles to human African trypanosomiasis-affected children.
Funded by: Medical Research Council: G0801123
Human genetics 2014;133;1;69-83
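Undertransmission of an allele from heterozygous parents to affected children is commonly tested with the transmission disequilibrium test (TDT). The abstract does not state which test the authors used, so the following is a generic, hedged sketch rather than their analysis. With b transmissions and c non-transmissions of the allele from heterozygous parents, the statistic (b - c)^2 / (b + c) is compared with a chi-square distribution on 1 degree of freedom. The counts and function name are hypothetical.

```python
"""Transmission disequilibrium test (TDT) for over- or under-transmission.

Generic sketch; counts are hypothetical. transmitted (b) and untransmitted
(c) are counted from heterozygous parents of affected children.
"""
import math


def tdt(transmitted, untransmitted):
    """Return (chi-square statistic, p-value) for the McNemar-type TDT."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    stat = (transmitted - untransmitted) ** 2 / total
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = tdt(transmitted=40, untransmitted=60)  # fewer transmissions: undertransmission
print(f"chi2={stat:.2f}, p={p:.4f}")  # chi2=4.00, p=0.0455
```

The direction of the effect comes from comparing b with c; the chi-square statistic itself is symmetric, so the same value results for 60 transmitted and 40 untransmitted.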
Restriction and recruitment-gene duplication and the origin and evolution of snake venom toxins.
School of Biological Sciences, Bangor University, United Kingdom.
Snake venom has been hypothesized to have originated and diversified through a process that involves duplication of genes encoding body proteins with subsequent recruitment of the copy to the venom gland, where natural selection acts to develop or increase toxicity. However, gene duplication is known to be a rare event in vertebrate genomes, and the recruitment of duplicated genes to a novel expression domain (neofunctionalization) is an even rarer process that requires the evolution of novel combinations of transcription factor binding sites in upstream regulatory regions. Therefore, although this hypothesis concerning the evolution of snake venom is very unlikely and should be regarded with caution, it is nonetheless often assumed to be established fact, hindering research into the true origins of snake venom toxins. To critically evaluate this hypothesis, we have generated transcriptomic data for body tissues and salivary and venom glands from five species of venomous and nonvenomous reptiles. Our comparative transcriptomic analysis of these data reveals that snake venom does not evolve through the hypothesized process of duplication and recruitment of genes encoding body proteins. Indeed, our results show that many proposed venom toxins are in fact expressed in a wide variety of body tissues, including the salivary gland of nonvenomous reptiles and that these genes have therefore been restricted to the venom gland following duplication, not recruited. Thus, snake venom evolves through the duplication and subfunctionalization of genes encoding existing salivary proteins. These results highlight the danger of the elegant and intuitive "just-so story" in evolutionary biology.
Genome biology and evolution 2014;6;8;2088-95
Testing the Toxicofera: comparative transcriptomics casts doubt on the single, early evolution of the reptile venom system.
School of Biological Sciences, Bangor University, Brambell Building, Deiniol Road, Bangor, Gwynedd LL57 2UW, United Kingdom.
The identification of apparently conserved gene complements in the venom and salivary glands of a diverse set of reptiles led to the development of the Toxicofera hypothesis, the single, early evolution of the venom system in reptiles. However, this hypothesis is based largely on relatively small-scale EST-based studies of only venom or salivary glands, and toxic effects have been assigned to only some putative Toxicoferan toxins in some species. We set out to examine the distribution of these proposed venom toxin transcripts in order to investigate to what extent the conservation of gene complements may reflect a bias in previous sampling efforts. Our quantitative transcriptomic analyses of venom and salivary glands and other body tissues in five species of reptile, together with the use of available RNA-Seq datasets for additional species, show that the majority of genes used to support the establishment and expansion of the Toxicofera are in fact expressed in multiple body tissues and most likely represent general maintenance or "housekeeping" genes. The apparent conservation of gene complements across the Toxicofera therefore reflects an artefact of incomplete tissue sampling. We therefore conclude that venom has evolved multiple times in reptiles.
Toxicon : official journal of the International Society on Toxinology 2014;92;140-56
Abundant and diverse clustered regularly interspaced short palindromic repeat spacers in Clostridium difficile strains and prophages target multiple phage types within this pathogen.
Department of Infection, Inflammation and Immunity, University of Leicester, Leicester, United Kingdom.
Clostridium difficile is an important human-pathogenic bacterium causing antibiotic-associated nosocomial infections worldwide. Mobile genetic elements and bacteriophages have helped shape C. difficile genome evolution. In many bacteria, phage infection may be controlled by a form of bacterial immunity called the clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) system. This uses acquired short nucleotide sequences (spacers) to target homologous sequences (protospacers) in phage genomes. C. difficile carries multiple CRISPR arrays, and in this paper we examine the relationships between the host- and phage-carried elements of the system. We detected multiple matches between spacers and regions in 31 C. difficile phage and prophage genomes. A subset of the spacers was located in prophage-carried CRISPR arrays. The CRISPR spacer profiles generated suggest that related phages would have similar host ranges. Furthermore, we show that C. difficile strains of the same ribotype could have either similar or divergent CRISPR contents. Both synonymous and nonsynonymous mutations in the protospacer sequences were identified, as well as differences in the protospacer adjacent motif (PAM), which could explain how phages escape this system. This paper illustrates how the distribution and diversity of CRISPR spacers in C. difficile, and its prophages, could modulate phage predation for this pathogen and impact upon its evolution and pathogenicity.
Importance: Clostridium difficile is a significant bacterial human pathogen which undergoes continual genome evolution, resulting in the emergence of new virulent strains. Phages are major facilitators of genome evolution in other bacterial species, and we use sequence analysis-based approaches in order to examine whether the CRISPR/Cas system could control these interactions across divergent C. difficile strains. The presence of spacer sequences in prophages that are homologous to phage genomes raises an extra level of complexity in this predator-prey microbial system. Our results demonstrate that the impact of phage infection in this system is widespread and that the CRISPR/Cas system is likely to be an important aspect of the evolutionary dynamics in C. difficile.
Funded by: Medical Research Council: G0700855
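The abstract describes matching spacers to protospacers in phage genomes and notes that protospacer mutations or PAM differences could let phages escape. A minimal scan illustrating both effects is sketched below. It slides the spacer along one strand, tolerates a set number of substitutions, and checks the 3 nt immediately 5' of each match against a caller-supplied PAM pattern. The PAM pattern "CC[AT]", the sequences and the function names are placeholders, not values from the study. Real searches would also scan the reverse complement.

```python
"""Scan a phage genome for protospacers matching a CRISPR spacer.

Sketch under stated assumptions: one strand only, up to max_mismatches
substitutions tolerated, and the pam_len nt immediately 5' of each match
tested against a placeholder PAM pattern.
"""
import re


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_protospacers(spacer, genome, pam_pattern, max_mismatches=1, pam_len=3):
    """Return (start, mismatches, pam_ok) for every window close to the spacer."""
    hits = []
    k = len(spacer)
    for start in range(len(genome) - k + 1):
        mismatches = hamming(spacer, genome[start:start + k])
        if mismatches <= max_mismatches:
            pam = genome[max(0, start - pam_len):start]
            pam_ok = re.fullmatch(pam_pattern, pam) is not None
            hits.append((start, mismatches, pam_ok))
    return hits


spacer = "ACGTTGCAAC"
# Hypothetical phage: an exact protospacer after a "CCA" PAM, then an escape
# variant with one substitution and a non-matching PAM.
genome = "GGG" + "CCA" + "ACGTTGCAAC" + "TTTT" + "GTA" + "ACGTAGCAAC" + "GG"
for hit in find_protospacers(spacer, genome, pam_pattern="CC[AT]"):
    print(hit)  # (6, 0, True) then (23, 1, False)
```

Reporting the mismatch count and PAM status separately, rather than filtering on them, keeps candidate escape variants visible for inspection.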
Modification of British Committee for Standards in Haematology diagnostic criteria for essential thrombocythaemia.
Department of Haematology, Guy's and St Thomas' Hospitals NHS Foundation Trust, London, UK. Claire.Harrison@gstt.nhs.uk.
Funded by: Medical Research Council: G84/6443
British journal of haematology 2014;167;3;421-3
A novel hybrid SCCmec-mecC region in Staphylococcus sciuri.
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
Objectives: Methicillin resistance in Staphylococcus spp. results from the expression of an alternative penicillin-binding protein 2a (encoded by mecA) with a low affinity for β-lactam antibiotics. Recently, a novel variant of mecA known as mecC (formerly mecALGA251) was identified in Staphylococcus aureus isolates from both humans and animals. In this study, we identified two Staphylococcus sciuri subsp. carnaticus isolates from bovine infections that harbour three different mecA homologues: mecA, mecA1 and mecC.
Methods: We subjected the two isolates to whole-genome sequencing to further understand the genetic context of the mec-containing region. We also used PCR and RT-PCR to investigate the excision and expression of the SCCmec element and mec genes, respectively.
Results: Whole-genome sequencing revealed a novel hybrid SCCmec region at the orfX locus consisting of a class E mec complex (mecI-mecR1-mecC1-blaZ) located immediately downstream of a staphylococcal cassette chromosome mec (SCCmec) type VII element. A second SCCmec attL site (attL2), which was imperfect, was present downstream of the mecC region. PCR analysis of stationary-phase cultures showed that both the SCCmec type VII element and a hybrid SCCmec-mecC element were capable of excision from the genome and forming a circular intermediate. Transcriptional analysis showed that mecC and mecA, but not mecA1, were expressed in liquid culture supplemented with oxacillin.
Conclusions: Overall, this study further highlights that a range of staphylococcal species harbour the mecC gene and supports the view that coagulase-negative staphylococci associated with animals may act as reservoirs of antibiotic resistance genes for more pathogenic staphylococcal species.
Funded by: Medical Research Council: G1001787, G1001787/1
The Journal of antimicrobial chemotherapy 2014;69;4;911-8
A shared population of epidemic methicillin-resistant Staphylococcus aureus 15 circulates in humans and companion animals.
Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.
Methicillin-resistant Staphylococcus aureus (MRSA) is a global human health problem causing infections in both hospitals and the community. Companion animals, such as cats, dogs, and horses, are also frequently colonized by MRSA and can become infected. We sequenced the genomes of 46 multilocus sequence type (ST) 22 MRSA isolates from cats and dogs in the United Kingdom and compared these to an extensive population framework of human isolates from the same lineage. Phylogenomic analyses showed that all companion animal isolates were interspersed throughout the epidemic MRSA-15 (EMRSA-15) pandemic clade and clustered with human isolates from the United Kingdom, with human isolates basal to those from companion animals, suggesting a human source for isolates infecting companion animals. A number of isolates from the same veterinary hospital clustered together, suggesting that as in human hospitals, EMRSA-15 isolates are readily transmitted in the veterinary hospital setting. Genome-wide association analysis did not identify any host-specific single nucleotide polymorphisms (SNPs) or virulence factors. However, isolates from companion animals were significantly less likely to harbor a plasmid encoding erythromycin resistance. When this plasmid was present in animal-associated isolates, it was more likely to contain mutations mediating resistance to clindamycin. This finding is consistent with the low levels of erythromycin and high levels of clindamycin used in veterinary medicine in the United Kingdom. This study supports the "one health" view of infectious diseases, that the pathogen pools of human and animal populations are intrinsically linked, and provides evidence that antibiotic usage in animal medicine is shaping the population of a major human pathogen.
Importance: Methicillin-resistant Staphylococcus aureus (MRSA) is a major problem in human medicine. Companion animals, such as cats, dogs, and horses, can also become colonized and infected by MRSA. Here, we demonstrate that a shared population of an important and globally disseminated lineage of MRSA can infect both humans and companion animals without undergoing host adaptation. This suggests that companion animals might act as a reservoir for human infections. We also show that the isolates from companion animals have differences in the presence of certain antibiotic resistance genes. This study furthers the "one health" view of infectious diseases by demonstrating that the pool of MRSA isolates in the human and animal populations is shared and highlights how different antibiotic usage patterns between human and veterinary medicine can shape the population of bacterial pathogens.
Funded by: Medical Research Council: G1001787, G1001787/1; Wellcome Trust: 098051
Using population isolates in genetic association studies.
The use of genetically isolated populations can empower next-generation association studies. In this review, we discuss the advantages of this approach and review study design and analytical considerations of genetic association studies focusing on isolates. We cite successful examples of using population isolates in association studies and outline potential ways forward.
Funded by: European Research Council: 280559; Wellcome Trust: 098051
Briefings in functional genomics 2014;13;5;371-7
Bayesian latent variable collapsing model for detecting rare variant interaction effect in twin study.
Department of Public Health, Hjelt Institute, University of Helsinki, Finland.
As more next-generation sequencing data are analyzed, researchers have confirmed that rare genetic variants are widespread among populations and likely play an important role in complex phenotypes. Recently, a handful of statistical models have been developed to analyze rare variant (RV) association in different study designs. However, because minor alleles occur so rarely in the data, appropriate statistical methods for detecting RV interaction effects remain difficult to develop. We propose a hierarchical Bayesian latent variable collapsing method (BLVCM), which circumvents these obstacles by parameterizing the signals of RVs with latent variables in a Bayesian framework and is tailored to twin data. The BLVCM can handle nonassociated variants, allow both protective and deleterious effects, capture SNP-SNP synergistic effects, provide estimates of gene-level and individual SNP contributions, and be applied to both independent and various twin designs. We assessed the statistical properties of the BLVCM using simulated data and found that it achieved greater power for interaction effect detection than Granvil and SKAT. As proof of practical application, the BLVCM was then applied to a twin study analysis of more than 20,000 gene regions to identify significant RVs associated with low-density lipoprotein cholesterol level. Some of the findings are consistent with previous studies, and we also identified novel gene regions with significant SNP-SNP synergistic effects.
Funded by: NIAAA NIH HHS: AA-00145, AA-09203, AA-12502, AA15416, K02AA018755
Genetic epidemiology 2014;38;4;310-24
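The BLVCM itself is a hierarchical Bayesian model and is not reproduced here. As a minimal illustration of the underlying "collapsing" idea only, the sketch below sums minor-allele counts at rare variants within one gene region into a single burden score per individual. The function name, the 1% MAF cut-off and the toy genotypes are assumptions for illustration; the BLVCM instead models the signals of these variants with latent variables.

```python
from typing import List


def rare_variant_burden(genotypes: List[List[int]], mafs: List[float],
                        maf_threshold: float = 0.01) -> List[int]:
    """Collapse rare variants in one gene region into a burden score.

    Hypothetical helper. genotypes[i][j] is the minor-allele count (0, 1 or 2)
    of individual i at variant j; mafs[j] is the minor allele frequency of
    variant j. Variants with MAF below maf_threshold (assumed 1%) count as rare.
    """
    if any(len(row) != len(mafs) for row in genotypes):
        raise ValueError("each genotype row must have one entry per variant")
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    # Sum each person's minor alleles over the rare variants only.
    return [sum(row[j] for j in rare) for row in genotypes]


# Hypothetical region: three individuals, three variants (two of them rare).
genotypes = [[0, 1, 0],
             [0, 0, 0],
             [2, 1, 1]]
mafs = [0.005, 0.20, 0.008]
print(rare_variant_burden(genotypes, mafs))  # [0, 0, 3]
```

A fixed sum like this forces all rare alleles in the region to act in the same direction, which is one of the limitations that latent-variable approaches are designed to relax.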
Mechanisms underlying mutational signatures in human cancers.
Science for Life Laboratory, Division of Translational Medicine and Chemical Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, S-171 21 Stockholm, Sweden.
The collective somatic mutations observed in a cancer are the outcome of multiple mutagenic processes that have been operative over the lifetime of a patient. Each process leaves a characteristic imprint on the cancer genome, a mutational signature, which is defined by the types of DNA damage and DNA repair processes that result in base substitutions, insertions and deletions, or structural variations. With the advent of whole-genome sequencing, researchers are identifying an increasing array of these signatures. Mutational signatures can be used as a physiological readout of the biological history of a cancer and also have potential use for distinguishing ongoing mutational processes from historical ones, thus possibly revealing new targets for anticancer therapies.
Funded by: Wellcome Trust: WT100183MA
Nature reviews. Genetics 2014;15;9;585-98
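Base-substitution signatures are commonly described on a 96-channel scheme: six substitution classes referenced to the pyrimidine (C or T) of the mutated base pair, times 16 combinations of 5′ and 3′ flanking bases. The review covers signatures more broadly (including indels and structural variants); the sketch below assumes this common 96-channel convention and, with hypothetical function names, only shows how individual substitutions are tallied into such a profile. Extracting signatures from many profiles (for example by non-negative matrix factorization) is a separate step not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 96 channels: 6 pyrimidine-referenced substitution classes x 16 flank pairs.
_SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"),
                  ("T", "A"), ("T", "C"), ("T", "G")]
ALL_CHANNELS = [f"{five}[{ref}>{alt}]{three}"
                for ref, alt in _SUBSTITUTIONS
                for five in "ACGT"
                for three in "ACGT"]


def sbs96_channel(context: str, alt: str) -> str:
    """Map a substitution to its channel label, e.g. ('TGA', 'T') -> 'T[C>A]A'.

    context is the reference trinucleotide with the mutated base in the middle.
    Substitutions at a purine (A/G) are re-expressed on the opposite strand so
    that every channel has a pyrimidine reference base.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context + alt) - set("ACGT") or context[1] == alt:
        raise ValueError(f"invalid substitution {context} -> {alt}")
    if context[1] in "AG":
        context = context.translate(_COMPLEMENT)[::-1]  # reverse complement
        alt = alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_profile(mutations: Iterable[Tuple[str, str]]) -> List[int]:
    """Count mutations per channel, returned in ALL_CHANNELS order."""
    counts = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
    return [counts[ch] for ch in ALL_CHANNELS]


# Hypothetical substitutions: G>T in TGA is the same event as C>A in TCA.
profile = mutation_profile([("TGA", "T"), ("TCA", "A"), ("ACG", "T")])
print(profile[ALL_CHANNELS.index("T[C>A]A")])  # 2
```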
Innate immunity. A Spaetzle-like role for nerve growth factor β in vertebrate immunity to Staphylococcus aureus.
Cambridge Institute for Medical Research, University of Cambridge, UK. Department of Medicine, University of Cambridge, UK.
Many key components of innate immunity to infection are shared between Drosophila and humans. However, the fly Toll ligand Spaetzle is not thought to have a vertebrate equivalent. We have found that the structurally related cystine-knot protein, nerve growth factor β (NGFβ), plays an unexpected Spaetzle-like role in immunity to Staphylococcus aureus infection in chordates. Deleterious mutations of either human NGFβ or its high-affinity receptor tropomyosin-related kinase receptor A (TRKA) were associated with severe S. aureus infections. NGFβ was released by macrophages in response to S. aureus exoproteins through activation of the NOD-like receptors NLRP3 and NLRP4; it enhanced phagocytosis and superoxide-dependent killing, stimulated proinflammatory cytokine production, and promoted calcium-dependent neutrophil recruitment. TrkA knockdown in zebrafish increased susceptibility to S. aureus infection, confirming an evolutionarily conserved role for NGFβ-TRKA signaling in pathogen-specific host immunity.
Funded by: Department of Health; Intramural NIH HHS; Medical Research Council: G0700091, G0701932, MR/K006312/1; National Centre for the Replacement, Refinement and Reduction of Animals in Research: NC/K500392/1; Wellcome Trust: 084953, 089981, 100140
Science (New York, N.Y.) 2014;346;6209;641-6
Optoactivation of locus ceruleus neurons evokes bidirectional changes in thermal nociception in rats.
School of Physiology and Pharmacology, University of Bristol, Bristol BS8 1TD, United Kingdom, Department of Anesthesia, University Hospitals Bristol, Bristol BS2 8HW, United Kingdom, Department of Information Physiology, National Institute for Physiological Sciences, Myodaiji, Okazaki 444-8787, Japan, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom, and Sorbonne Universités, Université Pierre et Marie Curie Paris 6, Unité Mixte de Recherche-Scientifique 8246, Neuroscience Paris Seine, Navigation Memory and Aging team, F-75005 Paris, France.
Pontospinal noradrenergic neurons are thought to form part of a descending endogenous analgesic system that exerts inhibitory influences on spinal nociception. Using optogenetic targeting, we tested the hypothesis that excitation of the locus ceruleus (LC) is antinociceptive. We transduced rat LC neurons by direct injection of a lentiviral vector expressing channelrhodopsin2 under the control of the PRS promoter. Subsequent optoactivation of the LC evoked repeatable, robust, antinociceptive (+4.7°C ± 1.0, p < 0.0001) or pronociceptive (-4.4°C ± 0.7, p < 0.0001) changes in hindpaw thermal withdrawal thresholds. Post hoc anatomical characterization of the distribution of transduced somata referenced against the position of the optical fiber and subsequent further functional analysis showed that antinociceptive actions were evoked from a distinct, ventral subpopulation of LC neurons. Therefore, the LC is capable of exerting potent, discrete, bidirectional influences on thermal nociception that are produced by specific subpopulations of noradrenergic neurons. This reflects an underlying functional heterogeneity of the influence of the LC on the processing of nociceptive information.
Funded by: British Heart Foundation; Wellcome Trust: 088373
The Journal of neuroscience : the official journal of the Society for Neuroscience 2014;34;12;4148-60
Prenatal exome sequencing for fetuses with structural abnormalities: the next step.
College of Women's and Children's Health & School of Clinical and Experimental Medicine, College of Medicine and Dentistry, University of Birmingham, Edgbaston, Birmingham, UK; Fetal Medicine Centre, Birmingham Women's Foundation Trust, Edgbaston, Birmingham, UK.
Ultrasound in obstetrics & gynecology : the official journal of the International Society of Ultrasound in Obstetrics and Gynecology 2014;45;1;4-9
Global phylogenomic analysis of nonencapsulated Streptococcus pneumoniae reveals a deep-branching classic lineage that is distinct from multiple sporadic lineages.
Institute for Infectious Diseases, University of Bern, Switzerland; Department of Infectious Diseases, Inselspital, Bern University Hospital and University of Bern, Switzerland.
The surrounding capsule of Streptococcus pneumoniae has been identified as a major virulence factor and is targeted by pneumococcal conjugate vaccines (PCV). However, nonencapsulated S. pneumoniae (non-Ec-Sp) have also been isolated globally, mainly in carriage studies. It is unknown whether non-Ec-Sp evolve sporadically, whether they have high rates of antibiotic nonsusceptibility, and whether they have a unique, specific gene content. Here, whole-genome sequencing of 131 non-Ec-Sp isolates sourced from 17 different locations around the world was performed. Results revealed a deep-branching classic lineage that is distinct from multiple sporadic lineages. The sporadic lineages clustered with a previously sequenced, global collection of encapsulated S. pneumoniae (Ec-Sp) isolates, while the classic lineage consists mainly of the frequently identified multilocus sequence types (STs) ST344 (n = 39) and ST448 (n = 40). All ST344 and nine ST448 isolates had high nonsusceptibility rates to β-lactams and other antimicrobials. Analysis of the accessory genome revealed that the classic non-Ec-Sp contained more mobile elements than Ec-Sp and sporadic non-Ec-Sp. Adherence assays with human epithelial cells for selected classic and sporadic non-Ec-Sp revealed that the presence of an integrative conjugative element (ICE) results in increased adherence to human epithelial cells (P = 0.005). In contrast, sporadic non-Ec-Sp lacking the ICE had greater growth in vitro, possibly resulting in improved fitness. In conclusion, non-Ec-Sp isolates from the classic lineage have evolved separately. They have spread globally, are well adapted to nasopharyngeal carriage and are able to coexist with Ec-Sp. Due to continued use of PCV, non-Ec-Sp may become more prevalent.
Funded by: NIAID NIH HHS: R01 AI106786, R01 AI106786-01; Wellcome Trust: 083735/Z/07
Genome biology and evolution 2014;6;12;3281-94
Medicine. Halting harmful helminths.
Institute of Biological, Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth SY23 3DA, UK.
Science (New York, N.Y.) 2014;346;6206;168-9
Trypsin- and chymotrypsin-like serine proteases in Schistosoma mansoni: 'the undiscovered country'.
Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Prague, Czech Republic.
Background: Blood flukes (Schistosoma spp.) are parasites that can survive for years or decades in the vasculature of permissive mammalian hosts, including humans. Proteolytic enzymes (proteases) are crucial for successful parasitism, including aspects of invasion, maturation and reproduction. Most attention has focused on the 'cercarial elastase' serine proteases that facilitate skin invasion by infective schistosome larvae, and the cysteine and aspartic proteases that worms use to digest the blood meal. Apart from the cercarial elastases, information regarding other S. mansoni serine proteases (SmSPs) is limited. To address this, we investigated SmSPs using genomic, transcriptomic, phylogenetic and functional proteomic approaches.
Methodology/principal findings: Genes encoding five distinct SmSPs, termed SmSP1 to SmSP5, some of which comprise disparate protein domains, were retrieved from the S. mansoni genome database and annotated. Reverse transcription quantitative PCR (RT-qPCR) in various schistosome developmental stages indicated complex expression patterns for SmSPs, including their constituent protein domains. SmSP2 stood apart, being massively expressed in schistosomula and adult stages. Phylogenetic analysis segregated SmSPs into diverse clusters of family S1 proteases. SmSP1 to SmSP4 are trypsin-like proteases, whereas SmSP5 is chymotrypsin-like. In agreement, trypsin-like activities were shown, using peptidyl fluorogenic substrates, to predominate in eggs, schistosomula and adults. SmSP5 is particularly novel in the phylogenetics of family S1 schistosome proteases, as it is part of a cluster of sequences that fill a gap between the highly divergent cercarial elastases and other family S1 proteases.
Conclusions/significance: Our series of post-genomics analyses clarifies the complexity of schistosome family S1 serine proteases and highlights their interrelationships, including the cercarial elastases and, not least, the identification of a 'missing-link' protease cluster, represented by SmSP5. A framework is now in place to guide the characterization of individual proteases, their stage-specific expression and their contributions to parasitism, in particular, their possible modulation of host physiology.
Funded by: NIDDK NIH HHS: P30 DK026743
PLoS neglected tropical diseases 2014;8;3;e2766
Obesity accelerates epigenetic aging of human liver.
Departments of Human Genetics, David Geffen School of Medicine, and Biostatistics, School of Public Health, University of California Los Angeles, CA 90095.
Because of the dearth of biomarkers of aging, it has been difficult to test the hypothesis that obesity increases tissue age. Here we use a novel epigenetic biomarker of aging (referred to as an "epigenetic clock") to study the relationship between high body mass index (BMI) and the DNA methylation ages of human blood, liver, muscle, and adipose tissue. A significant correlation between BMI and epigenetic age acceleration could only be observed for liver (r = 0.42, P = 6.8 × 10⁻⁴ in dataset 1 and r = 0.42, P = 1.2 × 10⁻⁴ in dataset 2). On average, epigenetic age increased by 3.3 years for each 10 BMI units. The detected age acceleration in liver is not associated with the Nonalcoholic Fatty Liver Disease Activity Score or any of its component traits after adjustment for BMI. The 279 genes that are underexpressed in older liver samples are highly enriched (1.2 × 10⁻⁹) with nuclear mitochondrial genes that play a role in oxidative phosphorylation and electron transport. The epigenetic age acceleration, which is not reversible in the short term after rapid weight loss induced by bariatric surgery, may play a role in liver-related comorbidities of obesity, such as insulin resistance and liver cancer.
Funded by: NIA NIH HHS: 5R01AG042511-02, R01 AG042511
Proceedings of the National Academy of Sciences of the United States of America 2014;111;43;15538-43
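The abstract reports a correlation between BMI and epigenetic age acceleration and an effect size in years per 10 BMI units. One common way to define age acceleration, assumed here rather than taken from this abstract, is the residual left after regressing DNA methylation (DNAm) age on chronological age. The sketch below, with hypothetical helper names and toy data, computes that residual, its Pearson correlation with BMI, and the regression slope rescaled to 10 BMI units. It does not implement the epigenetic clock itself, which is what produces the DNAm age estimates.

```python
from typing import List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def age_acceleration(dnam_age: List[float], chrono_age: List[float]) -> List[float]:
    """Residual of DNAm age after regressing it on chronological age (assumed definition)."""
    slope, intercept = ols_fit(chrono_age, dnam_age)
    return [d - (intercept + slope * c) for d, c in zip(dnam_age, chrono_age)]


# Hypothetical liver samples: ages in years, BMI in kg/m^2.
chrono = [40, 60, 40, 60]
bmi = [20, 20, 40, 40]
dnam = [36.7, 56.7, 43.3, 63.3]

accel = age_acceleration(dnam, chrono)
r = pearson_r(bmi, accel)
years_per_10_bmi = 10 * ols_fit(bmi, accel)[0]
print(f"r = {r:.2f}, {years_per_10_bmi:.1f} years per 10 BMI units")
```

In these toy numbers, DNAm age was set to chronological age plus 0.33 years per BMI unit above the mean, so the sketch recovers about 3.3 years per 10 BMI units only by construction; real data would also be noisy and would need the significance testing reported in the abstract.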
Host genetics of Epstein-Barr virus infection, latency and disease.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK; Division of Biological Anthropology, Department of Archaeology and Anthropology, University of Cambridge, Cambridge, UK.
Epstein-Barr virus (EBV) infects 95% of the adult population and is the cause of infectious mononucleosis. It is also associated with 1% of cancers worldwide, such as nasopharyngeal carcinoma, Hodgkin's lymphoma and Burkitt's lymphoma. Human and cancer genetic studies are now major tools for identifying gene variants associated with many cancers, including nasopharyngeal carcinoma and Hodgkin's lymphoma. Host genetics is also important in infectious disease; however, there have been no large-scale efforts towards understanding the contribution that human genetic variation plays in primary EBV infection and latency. This review covers 25 years of studies into host genetic susceptibility to EBV infection and disease, from candidate gene studies, to the first genome-wide association study of EBV antibody response, and an EBV-status stratified genome-wide association study of Hodgkin's lymphoma. Although many genes are implicated in EBV-related disease, studies are often small, not replicated or not followed up in a different disease. Larger, appropriately powered genomic studies to understand the host response to EBV will be needed to move our understanding of the biology of EBV infection beyond the handful of genes currently identified. Fifty years after the discovery of EBV and its identification as a human oncogenic virus, a glimpse of the future is offered by the first whole-genome and whole-exome studies, revealing new human genes at the heart of the host-EBV interaction.
Funded by: Medical Research Council: G0900209; Wellcome Trust: 098051
Reviews in medical virology 2014;25;2;71-84
Host genetic variants and gene expression patterns associated with Epstein-Barr virus copy number in lymphoblastoid cell lines.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Division of Biological Anthropology, Department of Archaeology and Anthropology, University of Cambridge, Cambridge, United Kingdom.
Lymphoblastoid cell lines (LCLs) are commonly used in molecular genetics: they supplied DNA for the HapMap and 1000 Genomes Projects, are used to test chemotherapeutic agents, and form the basis of a number of population genetics studies of gene expression. The process of transforming human B cells into LCLs requires Epstein-Barr virus (EBV), a double-stranded DNA virus that, through B-cell immortalisation, maintains an episomal virus genome in every cell of an LCL at variable copy numbers. Previous studies have reported that EBV alters host-gene expression and that EBV copy number may be under host genetic control. We performed a genome-wide association study of EBV genome copy number in LCLs and found the phenotype to be highly heritable, although no individual SNPs achieved a significant association with EBV copy number. The expression of two host genes (CXCL16 and AGL) was positively correlated and expression of ADARB2 was negatively correlated with EBV copy number in a genotype-independent manner. This study shows an association between EBV copy number and the gene expression profile of LCLs, and suggests that EBV copy number should be considered as a covariate in future studies of host gene expression in LCLs.
Funded by: Medical Research Council: G0900209; Wellcome Trust: 098051
PloS one 2014;9;10;e108384
Different waves and directions of Neolithic migrations in the Armenian Highland.
Laboratory of Ethnogenomics, Institute of Molecular Biology NAS RA, 7 Hasratyan Str., Yerevan, Armenia.
Background: The peopling of Europe and the nature of the Neolithic agricultural migration, a primary issue in the modern human colonization of the globe, are still widely debated. At present, much uncertainty is associated with the reconstruction of the migration routes of the first farmers from the Near East. In this context, the hospitable climatic conditions and key geographic position of the Armenian Highland suggest that it may have served as a conduit for several waves of expansion of the first agriculturalists from the Near East to Europe and the North Caucasus.
Results: Here, we assess Y-chromosomal distribution in six geographically distinct populations of Armenians that roughly represent the extent of historical Armenia. Using the general haplogroup structure and the specific lineages representing putative genetic markers of the Neolithic Revolution, haplogroups R1b1a2, J2, and G, we identify distinct patterns of genetic affinity between the populations of the Armenian Highland and the neighboring ones north and west from this area.
Conclusions: Based on the results obtained, we offer new insight into the different routes and waves of Neolithic expansion of the first farmers through the Armenian Highland. We detected at least two principal migratory directions: (1) westward along the coastline of the Mediterranean Sea and (2) northward to the North Caucasus.
Investigative genetics 2014;5;1;15
The use of genome wide association methods to investigate pathogenicity, population structure and serovar in Haemophilus parasuis.
Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, UK.
Background: Haemophilus parasuis is the etiologic agent of Glässer's disease in pigs and causes devastating losses to the farming industry. Whilst some hyper-virulent isolates have been described, the relationship between genetics and disease outcome has been only partially established. In particular, there is weak correlation between serovar and disease phenotype. We sequenced the genomes of 212 isolates of H. parasuis and have used this to describe the pan-genome and to correlate this with clinical and carrier status, as well as with serotype.
Results: Recombination and population structure analyses identified five groups with very high rates of recombination, separated into two clades of H. parasuis with no signs of recombination between them. We used genome-wide association methods, including discriminant analysis of principal components (DAPC) and generalised linear modelling (GLM), to look for genetic determinants of this population partition, serovar and pathogenicity. We identified genes from the accessory genome that were significantly associated with phenotype, including potential serovar-specific genes such as capsule genes, and 48 putative virulence factors that differed significantly between the clinical and non-clinical isolates. We also show that the presence of many previously suggested virulence factors is not an appropriate marker of virulence.
Conclusions: These genes will inform the generation of new molecular diagnostics and vaccines and the refinement of existing typing schemes. They also show the importance of the accessory genome of a diverse species when investigating the relationship between genotypes and phenotypes.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G003203/1, BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1
BMC genomics 2014;15;1179
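The study used DAPC and generalised linear models to relate accessory-genome content to phenotype. As a deliberately simpler, swapped-in technique, the sketch below tests one gene's presence or absence against clinical versus non-clinical origin with a two-sided Fisher's exact test. The function name and the counts are hypothetical, and a genome-wide scan over many genes would also need a multiple-testing correction and, ideally, adjustment for population structure such as the clades described in the Results.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: gene present / gene absent. Columns: clinical / non-clinical isolates.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance treats floating-point near-ties as ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: gene present in 8 of 9 clinical and 2 of 7 non-clinical isolates.
print(fisher_exact_two_sided(8, 2, 1, 5))
```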
Genome-wide association study for circulating tissue plasminogen activator levels and functional follow-up implicates endothelial STXBP5 and STX2.
From National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA (J.H., A.D.J., C.J.O.); Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD (J.H., A.D.J., C.J.O.); MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, Scotland, United Kingdom (J.E.H., V.V., A.F.W., C.H.); The Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine and Dentistry, Rochester, NY (M.Y., C.J.L.); Departments of Cardiology (S.T., J.W.J.), Gerontology and Geriatrics (S.T., A.J.M.d.C., R.G.J.W.), and Molecular Epidemiology (P.E.S.), Leiden University Medical Center, the Netherlands; Department of Cardiology, Division of Heart and Lungs, University Medical Center Utrecht, Utrecht, the Netherlands (F.W.A.); Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, the Netherlands (F.W.A.); Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, United Kingdom (F.W.A.); Cardiovascular Genetics and Genomics Group, Atherosclerosis Research Unit, Department of Medicine (M.S.-L., L.F., P.E., A.H.), Karolinska Institutet, Karolinska University Hospital, Solna, Stockholm, Sweden; INSERM UMRS 937, Pierre et Marie Curie University, Paris, France (D.-A.T., V.T., T.O.M., F.C.); ICAN Institute for Cardiometabolism and Nutrition, Paris, France (D.-A.T., V.T., F.C.); Departments of Public Health Sciences (W.M.C., B.B.W., F.C.) and Biochemistry and Molecular Genetics (M.M.S.), Center for Public Health Genomics, University of Virginia, Charlottesville, VA; Departments of Epidemiology (N.L.S., B.M.P., B.M.), Medicine (B.M.P., J.C.B.), and Health Services (B.M.P.), University of Washington, Seattle, WA; Group Health Research Institute, Group Health Cooperative, Seattle, WA (N.L.S., B.M.P.); Seattle Epidemiologic Research and Information Center, VA Office of Research and
Objective: Tissue plasminogen activator (tPA), a serine protease, catalyzes the conversion of plasminogen to plasmin, the major enzyme responsible for endogenous fibrinolysis. In some populations, elevated plasma levels of tPA have been associated with myocardial infarction and other cardiovascular diseases. We conducted a meta-analysis of genome-wide association studies to identify novel correlates of circulating levels of tPA.
Approach and results: Fourteen cohort studies with tPA measures (N=26 929) contributed to the meta-analysis. Three loci were significantly associated with circulating tPA levels (P<5.0×10⁻⁸). The first locus is on 6q24.3, with the lead single nucleotide polymorphism (SNP; rs9399599; P=2.9×10⁻¹⁴) within STXBP5. The second locus is on 8p11.21. The lead SNP (rs3136739; P=1.3×10⁻⁹) is intronic to POLB and <200 kb away from PLAT, the gene encoding tPA. We identified a nonsynonymous SNP (rs2020921) in modest linkage disequilibrium with rs3136739 (r²=0.50) within exon 5 of PLAT (P=2.0×10⁻⁸). The third locus is on 12q24.33, with the lead SNP (rs7301826; P=1.0×10⁻⁹) within intron 7 of STX2. We further found evidence for the association of lead SNPs in STXBP5 and STX2 with expression levels of the respective transcripts. In in vitro cell studies, silencing STXBP5 decreased the release of tPA from vascular endothelial cells, whereas silencing STX2 increased the tPA release. Through an in silico lookup, we found no associations of the 3 lead SNPs with coronary artery disease or stroke.
Conclusions: We identified 3 loci associated with circulating tPA levels, the PLAT region, STXBP5, and STX2. Our functional studies implicate a novel role for STXBP5 and STX2 in regulating tPA release.
Funded by: Chief Scientist Office: CZB/4/710; Intramural NIH HHS: Z99 HL999999, ZIA HL006002-07; Medical Research Council: G0000934, MC_PC_U127561128; NCATS NIH HHS: UL1 TR000124, UL1 TR001079, UL1TR000124; NCRR NIH HHS: M01 RR00052, RR018787, UL1RR033176; NHGRI NIH HHS: U01 HG005157, U01 HG005160; NHLBI NIH HHS: 1U01 HL072518, HL080295, HL087652, HL105756, HL65234, HL67466, N01 HC025195, P01 HL56091, P01 HL65608, R01 HL074061, R01 HL78635, R01-HL093029, R01HL59684, U01 HL096917; NIA NIH HHS: AG-023629, AG-027058, AG-15928, AG-20098, AG023629, AG033193, AG08122, R01 AG008122, R01 AG033193; NIDDK NIH HHS: DK063491, K24 DK080140, P30 DK063491, U01 DK062418; NINDS NIH HHS: NS17950, R01 NS34447; NLM NIH HHS: LM010098
Arteriosclerosis, thrombosis, and vascular biology 2014;34;5;1093-101
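The abstract describes rs2020921 as being in "modest" linkage disequilibrium with rs3136739 (r² = 0.50). The standard r² is the squared correlation between alleles at two loci, r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), where D = p_AB - p_A p_B. An r² of 1 means one SNP perfectly tags the other, while 0.50 means it captures half of the allelic variance. The sketch below computes r² from phased haplotype frequencies; the function name and example frequencies are illustrative assumptions, not values from the study.

```python
def ld_r2(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> float:
    """Squared allelic correlation r^2 between two biallelic SNPs.

    Inputs are the four haplotype frequencies (alleles A/a at the first SNP,
    B/b at the second); they must sum to 1 and both SNPs must be polymorphic.
    """
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab          # frequency of allele A at SNP 1
    p_B = f_AB + f_aB          # frequency of allele B at SNP 2
    if min(p_A, 1 - p_A, p_B, 1 - p_B) <= 0:
        raise ValueError("both SNPs must be polymorphic")
    D = f_AB - p_A * p_B       # linkage disequilibrium coefficient
    return D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical haplotype frequencies for two SNPs with allele frequencies of 0.5.
print(ld_r2(0.4, 0.1, 0.1, 0.4))
```

With these frequencies D = 0.15 and r² = 0.0225 / 0.0625 = 0.36.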
Using ancestry-informative markers to identify fine structure across 15 populations of European origin.
The Wellcome Trust Sanger Institute (WTSI), Hinxton, UK.
The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2907 cases from 15 different populations of European origin genotyped on the Illumina 670K chip. We compared methods for identifying population stratification and suggest a list of markers that may help to counter this problem. It is usual to identify population structure in such studies using only common variants with minor allele frequency (MAF) >5%; we find that this may result in highly informative SNPs being discarded, and suggest that all SNPs with MAF >1% may be used instead. We established informative axes of variation identified via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations, some studied for the first time at this scale. Finally, we investigated the substructure within each of these 15 populations and identified SNPs that help capture hidden stratification. This work can inform the design of association studies, and the interpretation of their results, in international consortia.
Funded by: British Heart Foundation: RG/09/012/28096; Department of Health: RP-PG-0310-1002; Medical Research Council: MR/J006742/1, MR/J500355/1, MR/K500999/1; NIA NIH HHS: U19 AG023122; NIMH NIH HHS: K01 MH100435; Wellcome Trust: 090532, 098051
European journal of human genetics : EJHG 2014;22;10;1190-200
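The abstract recommends keeping all SNPs with MAF > 1%, rather than only those with MAF > 5%, when inferring population structure by principal component analysis. The sketch below shows where such a threshold enters a basic genotype PCA. Scaling each SNP by sqrt(2p(1 - p)) is a common choice assumed here, not necessarily the one used in the study; the function name, parameters and simulated data are hypothetical, and real pipelines typically also prune SNPs in linkage disequilibrium first.

```python
import numpy as np


def pca_on_genotypes(G: np.ndarray, maf_min: float = 0.01, n_pcs: int = 2):
    """Principal components of a genotype matrix after a MAF filter.

    G has shape (individuals, SNPs) with entries 0/1/2 (allele counts).
    SNPs with minor allele frequency <= maf_min are dropped; the rest are
    centred and scaled by sqrt(2p(1-p)) before a singular value decomposition.
    Returns (pc_scores, kept_snp_mask).
    """
    p = G.mean(axis=0) / 2.0                  # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)
    keep = maf > maf_min
    X = G[:, keep].astype(float)
    pk = p[keep]
    X = (X - 2.0 * pk) / np.sqrt(2.0 * pk * (1.0 - pk))
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs], keep


# Hypothetical data: two groups of 20 people with very different allele frequencies.
rng = np.random.default_rng(0)
freqs = np.concatenate([np.full((20, 200), 0.1), np.full((20, 200), 0.9)])
G = rng.binomial(2, freqs)
G = np.column_stack([G, np.zeros(40, dtype=int)])  # add one monomorphic SNP
pcs, kept = pca_on_genotypes(G)
```

Lowering `maf_min` from 0.05 to 0.01 keeps more of the less common SNPs, which, per the abstract, can be highly informative about fine-scale structure.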
Whole exome sequencing in family trios reveals de novo mutations in PURA as a cause of severe neurodevelopmental delay and learning disability.
Wessex Clinical Genetics Service, Princess Anne Hospital, Southampton, UK.
Background: De novo mutations are emerging as an important cause of neurocognitive impairment, and whole exome sequencing of case-parent trios is a powerful way of detecting them. Here, we report the findings in four such trios.
Methods: The Deciphering Developmental Disorders study is using whole exome sequencing in family trios to investigate children with severe, sporadic, undiagnosed developmental delay. Three of our patients were ascertained from the first 1133 children to have been investigated through this large-scale study. Case 4 was a phenotypically isolated case recruited into an undiagnosed rare disorders sequencing study.
Results: Protein-altering de novo mutations in PURA were identified in four subjects. They include two different frameshifts, one in-frame deletion and one missense mutation. PURA encodes Pur-α, a highly conserved multifunctional protein that has an important role in normal postnatal brain development in animal models. The associated human phenotype of de novo heterozygous mutations in this gene is variable, but moderate to severe neurodevelopmental delay and learning disability are common to all. Neonatal hypotonia, early feeding difficulties and seizures, or 'seizure-like' movements, were also common. Additionally, it is suspected that anterior pituitary dysregulation may be within the spectrum of this disorder. Psychomotor developmental outcomes appear variable between patients, and we propose a possible genotype-phenotype correlation, with disruption of Pur repeat III resulting in a more severe phenotype.
Conclusions: These findings provide definitive evidence for the role of PURA in causing a variable syndrome of neurodevelopmental delay, learning disability, neonatal hypotonia, feeding difficulties, abnormal movements and epilepsy in humans, and help clarify the role of PURA in the previously described 5q31.3 microdeletion phenotype.
Funded by: Department of Health; Wellcome Trust: WT098051
Journal of medical genetics 2014;51;12;806-13
A comprehensive evaluation of assembly scaffolding tools.
Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.
Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data.
Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.
Funded by: Wellcome Trust: 082130/Z/07/Z, 098051
Genome biology 2014;15;3;R42
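The abstract warns that erroneous scaffold joins can inflate reported assembly statistics. N50, a widely reported contiguity statistic (it is not named in the abstract, so using it here is an assumption), illustrates the point: it rewards joins without checking whether they are correct. The minimal sketch below computes N50 from a list of sequence lengths; the function name and lengths are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


contigs = [100, 80, 50, 30, 20]
print(n50(contigs))               # 80
# One scaffolding join (correct or not) that merges the two longest contigs:
print(n50([180, 50, 30, 20]))     # 180
```

Merging the two longest sequences more than doubles N50 in this toy case whether or not the join is real, which is why the evaluation compares correct and missed joins rather than contiguity alone.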
Insertional mutagenesis and deep profiling reveals gene hierarchies and a Myc/p53-dependent bottleneck in lymphomagenesis.
Centre for Virus Research, Institute of Infection, Immunity and Inflammation, College of Medicine, Veterinary Medicine and Life Sciences, University of Glasgow, Glasgow, United Kingdom.
Retroviral insertional mutagenesis (RIM) is a powerful tool for cancer genomics that was combined in this study with deep sequencing (RIM/DS) to facilitate a comprehensive analysis of lymphoma progression. Transgenic mice expressing two potent collaborating oncogenes in the germ line (CD2-MYC, -Runx2) develop rapid-onset tumours that can be accelerated and rendered polyclonal by neonatal Moloney murine leukaemia virus (MoMLV) infection. RIM/DS analysis of 28 polyclonal lymphomas identified 771 common insertion sites (CISs) defining a 'progression network' that encompassed a remarkably large fraction of known MoMLV target genes, with further strong indications of oncogenic selection above the background of MoMLV integration preference. Progression driven by RIM was characterised as a Darwinian process of clonal competition engaging proliferation control networks downstream of cytokine and T-cell receptor signalling. Enhancer-mode activation accounted for the most efficiently selected CIS target genes, including Ccr7 as the most prominent of a set of chemokine receptors driving paracrine growth stimulation and lymphoma dissemination. Another large target gene subset, including candidate tumour suppressors, was disrupted by intragenic insertions. A second RIM/DS screen comparing lymphomas of wild-type and parental transgenics showed that CD2-MYC tumours are almost entirely dependent on activation of Runx family genes, in strong preference to other potent Myc-collaborating genes (Gfi1, Notch1). Ikzf1 was identified as a novel collaborating gene for Runx2 and illustrated the interface between integration preference and oncogenic selection. Lymphoma target genes for MoMLV can be classified into (a) a small set of master regulators that confer self-renewal, overcoming p53 and other failsafe pathways, and (b) a large group of progression genes that control autonomous proliferation in transformed cells.
These findings provide insights into retroviral biology, human cancer genetics and the safety of vector-mediated gene therapy.
Funded by: Cancer Research UK: 11951, 13031; Medical Research Council: G0801822; Wellcome Trust
PLoS genetics 2014;10;2;e1004167
Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment.
Wellcome Trust Sanger Institute, Cambridge, UK.
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose a benchmark according to the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.
Methods in molecular biology (Clifton, N.J.) 2014;1079;59-73
The genomic basis of vomeronasal-mediated behaviour.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
The vomeronasal organ (VNO) is a chemosensory subsystem found in the nose of most mammals. It is principally tasked with detecting pheromones and other chemical signals that initiate innate behavioural responses. The VNO expresses subfamilies of vomeronasal receptors (VRs) in a cell-specific manner: each sensory neuron expresses just one or two receptors and silences all the other receptor genes. VR genes vary greatly in number within mammalian genomes, from no functional genes in some primates to many hundreds in rodents. They bind semiochemicals, some of which are also encoded in gene families that are coexpanded in species with correspondingly large VR repertoires. Protein and peptide cues that activate the VNO tend to be expressed in exocrine tissues in sexually dimorphic, and sometimes individually variable, patterns. Few chemical ligand-VR-behaviour relationships have been fully elucidated to date, largely due to technical difficulties in working with large, homologous gene families with high sequence identity. However, analysis of mouse lines with mutations in genes involved in ligand-VR signal transduction has revealed that the VNO mediates a range of social behaviours, including male-male and maternal aggression, sexual attraction, lordosis, and selective pregnancy termination, as well as interspecific responses such as avoidance and defensive behaviours. The unusual logic of VR expression now offers an opportunity to map the specific neural circuits that drive these behaviours.
Funded by: Wellcome Trust: 098051
Mammalian genome : official journal of the International Mammalian Genome Society 2014;25;1-2;75-86
The olfactory transcriptomes of mice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The olfactory (OR) and vomeronasal receptor (VR) repertoires are collectively encoded by 1700 genes and pseudogenes in the mouse genome. Most OR and VR genes were identified by comparative genomic techniques and therefore, in many of those cases, only their protein coding sequences are defined. Some also lack experimental support, due in part to the similarity between them and their monogenic, cell-specific expression in olfactory tissues. Here we use deep RNA sequencing, expression microarray and quantitative RT-PCR in both the vomeronasal organ and whole olfactory mucosa to quantify their full transcriptomes in multiple male and female mice. We find evidence of expression for all VR, and almost all OR genes that are annotated as functional in the reference genome, and use the data to generate over 1100 new, multi-exonic, significantly extended receptor gene annotations. We find that OR and VR genes are neither equally nor randomly expressed, but have reproducible distributions of abundance in both tissues. The olfactory transcriptomes are only minimally different between males and females, suggesting altered gene expression at the periphery is unlikely to underpin the striking sexual dimorphism in olfactory-mediated behavior. Finally, we present evidence that hundreds of novel, putatively protein-coding genes are expressed in these highly specialized olfactory tissues, and carry out a proof-of-principle validation. Taken together, these data provide a comprehensive, quantitative catalog of the genes that mediate olfactory perception and pheromone-evoked behavior at the periphery.
Funded by: Wellcome Trust: 098051
PLoS genetics 2014;10;9;e1004593
Deletions of chromosomal regulatory boundaries are associated with congenital disease.
Background: Recent data from genome-wide chromosome conformation capture analysis indicate that the human genome is divided into conserved megabase-sized self-interacting regions called topological domains. These topological domains form the regulatory backbone of the genome and are separated by regulatory boundary elements or barriers. Copy-number variations can potentially alter the topological domain architecture by deleting or duplicating the barriers and thereby allowing enhancers from neighboring domains to ectopically activate genes causing misexpression and disease, a mutational mechanism that has recently been termed enhancer adoption.
Results: We use the Human Phenotype Ontology database to relate the phenotypes of 922 deletion cases recorded in the DECIPHER database to monogenic diseases associated with genes in or adjacent to the deletions. We identify combinations of tissue-specific enhancers and genes that lie adjacent to a deletion and are associated with phenotypes in the corresponding tissue, where that phenotype matched the one observed in the deletion case. We compare this computationally with a gene-dosage pathomechanism that attempts to explain the deletion phenotype based on haploinsufficiency of genes located within the deletions. Up to 11.8% of the deletions could be best explained by enhancer adoption or a combination of enhancer adoption and gene-dosage effects.
Conclusions: Our results suggest that enhancer adoption caused by deletions of regulatory boundaries may contribute to a substantial minority of copy-number variation phenotypes and should thus be taken into account in their medical interpretation.
Funded by: NCI NIH HHS: T32 CA009337; NIH HHS: 5R24OD011883, R24 OD011883
Genome biology 2014;15;9;423
Removal of reprogramming transgenes improves the tissue reconstitution potential of keratinocytes generated from human induced pluripotent stem cells.
Department of Dermatology and Department of Social and Environmental Medicine, Graduate School of Medicine, Osaka University, Osaka, Japan; Department of Dermatology, Graduate School of Medicine, Tokyo Medical and Dental University, Tokyo, Japan; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Department of Embryonic Stem Cell Research, Institute for Frontier Medical Sciences, Kyoto University, Kyoto, Japan; Department of Reproductive Biology, Center for Regenerative Medicine, National Center for Child Health and Development, Tokyo, Japan.
Human induced pluripotent stem cell (hiPSC) lines have great therapeutic potential because customized cells and organs can be induced from them. Assessment of residual reprogramming factors after the generation of hiPSC lines is required, but an ideal system has been lacking. Here, we generated hiPSC lines from normal human dermal fibroblasts using a piggyBac transposon bearing reprogramming transgenes, followed by removal of the transposon by the transposase. This allowed us to compare the phenotypes of transgene-residual and transgene-free hiPSCs of the same genetic background. The transgene-residual hiPSCs, in which the transcription levels of the reprogramming transgenes were eventually suppressed, were quite similar to the transgene-free hiPSCs in a pluripotent state. However, after differentiation into keratinocytes, clear differences were observed. Morphological, functional, and molecular analyses, including single-cell gene expression profiling, revealed that keratinocytes from transgene-free hiPSC lines were more similar to normal human keratinocytes than those from transgene-residual hiPSC lines, which may be partly explained by reactivation of residual transgenes upon induction of keratinocyte differentiation. These results suggest that transgene-free hiPSC lines should be chosen for therapeutic purposes.
Stem cells translational medicine 2014;3;9;992-1001
A complete view of the genetic diversity of the Escherichia coli O-antigen biosynthesis gene cluster.
Department of Animal and Grassland Sciences, Faculty of Agriculture, University of Miyazaki, Miyazaki 889-2192, Japan.
The O antigen constitutes the outermost part of the lipopolysaccharide layer in Gram-negative bacteria. The chemical composition and structure of the O antigen show high levels of variation even within a single species, which manifests as serological diversity. Here, we present a complete sequence set for the O-antigen biosynthesis gene clusters (O-AGCs) from all 184 recognized Escherichia coli O serogroups. By comparing these sequences, we identified 161 well-defined O-AGCs. Based on the wzx/wzy or wzm/wzt gene sequences, 37 serogroups were placed into 16 groups, in addition to 145 singletons. Furthermore, phylogenetic analysis of all the E. coli O-serogroup reference strains revealed that nearly one-quarter of the 184 serogroups were found in the ST10 lineage, which may have a unique genetic background allowing a more successful exchange of O-AGCs. Our data provide a complete view of the genetic diversity of O-AGCs in E. coli, showing a stronger association between host phylogenetic lineage and O-serogroup diversification than previously recognized. These data will be a valuable basis for developing a systematic molecular O-typing scheme that will allow traditional typing approaches to be linked to genomic exploration of E. coli diversity.
DNA research : an international journal for rapid publication of reports on genes and genomes 2014;22;1;101-7
Genome evolution and plasticity of Serratia marcescens, an important multidrug-resistant nosocomial pathogen.
Interdisciplinary Research Organization, University of Miyazaki, Japan. Present address: Department of Animal and Grassland Sciences, Faculty of Agriculture, University of Miyazaki, Japan.
Serratia marcescens is an important nosocomial pathogen that can cause an array of infections, most notably of the urinary tract and bloodstream. In nature, it is found in many environmental niches, and is capable of infecting plants and animals. The emergence and spread of multidrug-resistant strains producing extended-spectrum or metallo beta-lactamases now pose a threat to public health worldwide. Here we report the complete genome sequences of two carefully selected S. marcescens strains, a multidrug-resistant clinical isolate (strain SM39) and an insect isolate (strain Db11). Our comparative analyses reveal the core genome of S. marcescens and define the potential metabolic capacity, virulence, and multidrug resistance of this species. We show a remarkable intraspecies genetic diversity, both at the sequence level and with regard to genome flexibility, which may reflect the diversity of niches inhabited by members of this species. A broader analysis with other Serratia species identifies a set of approximately 3,000 genes that characterize the genus. Within this apparent genetic diversity, we identified many genes implicated in the high virulence potential and antibiotic resistance of SM39, including the metallo beta-lactamase and multiple other drug resistance determinants carried on plasmid pSMC1. We further show that pSMC1 is most closely related to plasmids circulating in Pseudomonas species. Our data will provide a valuable basis for future studies on S. marcescens and new insights into the genetic mechanisms that underlie the emergence of pathogens highly resistant to multiple antimicrobial agents.
Funded by: Wellcome Trust
Genome biology and evolution 2014;6;8;2096-110
Identifying selection in the within-host evolution of influenza using viral sequence data.
Department of Genetics, University of Cambridge, Cambridge, United Kingdom.
The within-host evolution of influenza is a vital component of its epidemiology. A question of particular interest is the role that selection plays in shaping the viral population over the course of a single infection. Here we describe a method to measure selection acting upon the influenza virus within an individual host, based upon time-resolved genome sequence data from an infection. Analysing sequence data from a transmission study conducted in pigs, describing part of the haemagglutinin gene (HA1) of an influenza virus, we find signatures of non-neutrality in six of a total of sixteen infections. We find evidence for both positive and negative selection acting upon specific alleles, while in three cases, the data suggest the presence of time-dependent selection. In one infection we observe what is potentially a specific immune response against the virus; a non-synonymous mutation in an epitope region of the virus is found to be under initially positive, then strongly negative selection. Crucially, given the lack of homologous recombination in influenza, our method accounts for linkage disequilibrium between nucleotides at different positions in the haemagglutinin gene, allowing for the analysis of populations in which multiple mutations are present at any given time. Our approach offers a new insight into the dynamics of influenza infection, providing a detailed characterisation of the forces that underlie viral evolution.
Funded by: Wellcome Trust: 098051, 101239, 101239/Z/13/Z
PLoS computational biology 2014;10;7;e1003755
Genome sequence of the tsetse fly (Glossina morsitans): vector of African trypanosomiasis.
Tsetse flies are the sole vectors of human African trypanosomiasis throughout sub-Saharan Africa. Both sexes of adult tsetse feed exclusively on blood and contribute to disease transmission. Notable differences between tsetse and other disease vectors include obligate microbial symbioses, viviparous reproduction, and lactation. Here, we describe the sequence and annotation of the 366-megabase Glossina morsitans morsitans genome. Analysis of the genome and the 12,308 predicted protein-encoding genes led to multiple discoveries, including chromosomal integrations of bacterial (Wolbachia) genome sequences, a family of lactation-specific proteins, reduced complement of host pathogen recognition proteins, and reduced olfaction/chemosensory associated genes. These genome data provide a foundation for research into trypanosomiasis prevention and yield important insights with broad implications for multiple aspects of tsetse biology.
Funded by: FIC NIH HHS: D43 TW007391, R03 TW008413, R03 TW009444; Medical Research Council: MR/K002279/1; NCI NIH HHS: F32 CA091768; NHGRI NIH HHS: U54 HG003079; NIAID NIH HHS: R01 AI051584, R01 AI081774; Wellcome Trust: 085775/Z/08/Z, 098051
Science (New York, N.Y.) 2014;344;6182;380-6
Jannovar: a java library for exome annotation.
Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany.
Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar.
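The abstract notes that Jannovar uses an interval tree to find every transcript affected by a given variant. As a rough illustration of that kind of lookup, here is a minimal sketch of a centered interval tree answering "which transcripts contain this position?" queries. It is written in Python rather than Java and is not Jannovar's implementation; the `Transcript` record, the 0-based half-open coordinates, and the transcript names are hypothetical assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    """Hypothetical transcript record: 0-based, half-open span [start, end)."""
    name: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start >= self.end:
            raise ValueError("transcript must have start < end")


@dataclass
class Node:
    center: int
    by_start: List[Transcript]     # intervals spanning center, ascending start
    by_end_desc: List[Transcript]  # same intervals, descending end
    left: Optional["Node"] = None   # intervals ending at or before center
    right: Optional["Node"] = None  # intervals starting after center


def build(intervals: List[Transcript]) -> Optional[Node]:
    """Build a centered interval tree, splitting on the median start."""
    if not intervals:
        return None
    starts = sorted(t.start for t in intervals)
    center = starts[len(starts) // 2]
    # The median-start interval always lands in `here`, so recursion shrinks.
    here = [t for t in intervals if t.start <= center < t.end]
    left = [t for t in intervals if t.end <= center]
    right = [t for t in intervals if t.start > center]
    return Node(
        center=center,
        by_start=sorted(here, key=lambda t: t.start),
        by_end_desc=sorted(here, key=lambda t: t.end, reverse=True),
        left=build(left),
        right=build(right),
    )


def query(node: Optional[Node], pos: int) -> List[Transcript]:
    """Return all transcripts whose span contains the variant position."""
    hits: List[Transcript] = []
    while node is not None:
        if pos < node.center:
            # Every interval here ends after center > pos; only start matters.
            for t in node.by_start:
                if t.start > pos:
                    break
                hits.append(t)
            node = node.left
        elif pos > node.center:
            # Every interval here starts at or before center < pos; check end.
            for t in node.by_end_desc:
                if t.end <= pos:
                    break
                hits.append(t)
            node = node.right
        else:
            # All intervals stored here contain center; subtrees cannot.
            hits.extend(node.by_start)
            break
    return hits


if __name__ == "__main__":
    transcripts = [
        Transcript("TX_A", 100, 500),
        Transcript("TX_B", 450, 900),
        Transcript("TX_C", 1000, 1200),
    ]
    tree = build(transcripts)
    print(sorted(t.name for t in query(tree, 470)))  # ['TX_A', 'TX_B']
```

Each query visits one root-to-leaf path and stops scanning a node's sorted lists as soon as no further interval can match, which is what makes interval trees attractive for annotating many variants against a large transcript set.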
Human mutation 2014;35;5;548-55
The evolutionary dynamics of variant antigen genes in Babesia reveal a history of genomic innovation underlying host-parasite interaction.
Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, Liverpool Science Park Ic2, 146 Brownlow Hill, Liverpool L3 5RF, UK.
Babesia spp. are tick-borne, intraerythrocytic hemoparasites that use antigenic variation to resist host immunity, through sequential modification of the parasite-derived variant erythrocyte surface antigen (VESA) expressed on the infected red blood cell surface. We identified the genomic processes driving antigenic diversity in genes encoding VESA (ves1) through comparative analysis within and between three Babesia species (B. bigemina, B. divergens and B. bovis). Ves1 structure diverges rapidly after speciation, notably through the evolution of shortened forms (ves2) from the 5' ends of canonical ves1 genes. Phylogenetic analyses show that ves1 genes are routinely transposed between loci, whereas ves2 genes are not. Similarly, analysis of sequence mosaicism shows that recombination drives variation in ves1 sequences, but less so for ves2, indicating the adoption of different mechanisms for variation of the two families. Proteomic analysis of the B. bigemina PR isolate shows that two dominant VESA1 proteins are expressed in the population, whereas numerous VESA2 proteins are co-expressed, consistent with differential transcriptional regulation of each family. Hence, VESA2 proteins are abundant and previously unrecognized elements of Babesia biology, with evolutionary dynamics consistently different from those of VESA1, suggesting that their functions are distinct.
Funded by: Medical Research Council: MR/K002279/1; NIAID NIH HHS: R01 AI055864; Wellcome Trust: 097826/Z/11/A, 098051
Nucleic acids research 2014;42;11;7113-31
Human stem cells for craniomaxillofacial reconstruction.
Anne McLaren Laboratory for Regenerative Medicine, Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom.
Human stem cell research represents an exceptional opportunity for regenerative medicine and the surgical reconstruction of the craniomaxillofacial complex. The correct architecture and function of the vastly diverse tissues of this important anatomical region are critical for life-supportive processes, the delivery of senses, social interaction, and aesthetics. Craniomaxillofacial tissue loss is commonly associated with inflammatory responses of the surrounding tissue, significant scarring, disfigurement and, inevitably, psychological sequelae. The in vitro production of fully functional cells for skin, muscle, cartilage, bone, and neurovascular tissue formation from human stem cells may one day provide novel materials for the reconstructive surgeon operating on patients with both hard and soft tissue deficit due to cancer, congenital disease, or trauma. However, the clinical translation of human stem cell technology, including the application of human pluripotent stem cells (hPSCs) in novel regenerative therapies, faces several hurdles that must be overcome to permit safe and effective use in patients. The basic biology of hPSCs remains to be fully elucidated, and concerns of tumorigenicity need to be addressed prior to the development of cell transplantation treatments. Furthermore, functional comparison of in vitro-generated tissues with their in vivo counterparts will be necessary to confirm their maturity and suitability for application in reconstructive surgery. Here, we provide an overview of human stem cells in disease modeling, drug screening, and therapeutics, while also discussing the application of regenerative medicine for craniomaxillofacial tissue deficit and surgical reconstruction.
Funded by: Medical Research Council: G0701448
Stem cells and development 2014;23;13;1437-51
Histone deacetylase (HDAC) 1 and 2 are essential for accurate cell division and the pluripotency of embryonic stem cells.
Department of Biochemistry, University of Leicester, Leicester LE1 9HN, United Kingdom.
Histone deacetylases 1 and 2 (HDAC1/2) form the core catalytic components of corepressor complexes that modulate gene expression. In most cell types, deletion of both Hdac1 and Hdac2 is required to generate a discernible phenotype, suggesting their activity is largely redundant. We have therefore generated an ES cell line in which Hdac1 and Hdac2 can be inactivated simultaneously. Loss of HDAC1/2 resulted in a 60% reduction in total HDAC activity and a loss of cell viability. Cell death is dependent upon cell cycle progression, because differentiated, nonproliferating cells retain their viability. Furthermore, we observe increased mitotic defects, chromatin bridges, and micronuclei, suggesting HDAC1/2 are necessary for accurate chromosome segregation. Consistent with a critical role in the regulation of gene expression, microarray analysis of Hdac1/2-deleted cells reveals 1,708 differentially expressed genes. Of particular significance for the maintenance of stem cell self-renewal, we detected a reduction in the expression of the pluripotency transcription factors Oct4, Nanog, Esrrb, and Rex1. HDAC1/2 activity is regulated through binding of an inositol tetraphosphate molecule (IP4) sandwiched between the HDAC and its cognate corepressor. This raises the important question of whether IP4 regulates the activity of the complex in cells. By rescuing the viability of double-knockout cells, we demonstrate for the first time (to our knowledge) that mutations that abolish IP4 binding reduce the activity of HDAC1/2 in vivo. Our data indicate that HDAC1/2 have essential and pleiotropic roles in cellular proliferation and regulate stem cell self-renewal by maintaining expression of key pluripotency transcription factors.
Funded by: Biotechnology and Biological Sciences Research Council: BB/J009598/1; Medical Research Council: G0600135, MR/J009202/1; Wellcome Trust: 085408, 100237; Worldwide Cancer Research: 13-0042
Proceedings of the National Academy of Sciences of the United States of America 2014;111;27;9840-5
A novel RCE1 isoform is required for H-Ras plasma membrane localization and is regulated by USP17.
School of Pharmacy, Queen's University Belfast, McClay Research Building, 97 Lisburn Road, Belfast BT9 7BL, U.K.
Processing of the 'CaaX' motif found on the C-termini of many proteins, including the proto-oncogene Ras, requires the ER (endoplasmic reticulum)-resident protease RCE1 (Ras-converting enzyme 1) and is necessary for the proper localization and function of many of these 'CaaX' proteins. In the present paper, we report that several mammalian species have a novel isoform (isoform 2) of RCE1 resulting from an alternate splice site and producing an N-terminally truncated protein. We demonstrate that both RCE1 isoform 1 and the newly identified isoform 2 are required to reinstate proper H-Ras processing and thus plasma membrane localization in RCE1-null cells. In addition, we show that the deubiquitinating enzyme USP17 (ubiquitin-specific protease 17), previously shown to modulate RCE1 activity, can regulate the abundance and localization of isoform 2. Furthermore, we show that isoform 2 is ubiquitinated on Lys43 and deubiquitinated by USP17. Collectively, the findings of the present study indicate that RCE1 isoform 2 is required for proper 'CaaX' processing and that USP17 can regulate this via its modulation of RCE1 isoform 2 ubiquitination.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F013647/1
The Biochemical journal 2014;457;2;289-300
Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi.
Background: Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range.
Results: Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism.
Conclusions: The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions.
Funded by: NIAID NIH HHS: AI042361, AI073685, AI073745, AI078183, AI080799, AI094289, AI095842, AI099528, AI105575, AI29746, AI77680, R01 AI073745, R01 AI078183, R01 AI080799, R01 AI095842, R37 AI029746
Genome biology 2014;15;9;459
The sheep genome illuminates biology of the rumen and lipid metabolism.
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China. Commonwealth Scientific and Industrial Research Organisation Animal Food and Health Sciences, St Lucia, QLD 4067, Australia. College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
Sheep (Ovis aries) are a major source of meat, milk, and fiber in the form of wool and represent a distinct class of animals that have a specialized digestive organ, the rumen, that carries out the initial digestion of plant material. We have developed and analyzed a high-quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants compared with nonruminant animals.
Funded by: Biotechnology and Biological Sciences Research Council: BB/1025360/1, BB/I025328/1, BB/I025360/1, BB/I025506/1; NHGRI NIH HHS: U54 HG003273; Wellcome Trust: 095908, 098051, WT095908, WT098051
Science (New York, N.Y.) 2014;344;6188;1168-73
Open science and community norms: Data retention and publication moratoria policies in genomics projects
Medical Law International 2014;12;2;92-120
OutbreakTools: a new platform for disease outbreak analysis using the R software.
MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom.
The investigation of infectious disease outbreaks relies on the analysis of increasingly complex and diverse data, which offer new prospects for gaining insights into disease transmission processes and informing public health policies. However, the potential of such data can only be harnessed using a number of different, complementary approaches and tools, and a unified platform for the analysis of disease outbreaks is still lacking. In this paper, we present the new R package OutbreakTools, which aims to provide a basis for outbreak data management and analysis in R. OutbreakTools is developed by a community of epidemiologists, statisticians, modellers and bioinformaticians, and implements classes and methods for storing, handling and visualizing outbreak data. It includes real and simulated outbreak datasets. Together with a number of tools for infectious disease epidemiology recently made available in R, OutbreakTools contributes to the emergence of a new, free and open-source platform for the analysis of disease outbreaks.
Funded by: Medical Research Council: G0800596, G0801822, MC_U105260556, MR/J013862/1, MR/J01432X/1, MR/K010174/1; NIAID NIH HHS: UM1 AI068619; Wellcome Trust: 099202/Z/12/Z, WR/094527, WR092311MF
RNA-seq analysis of host and viral gene expression highlights interaction between varicella zoster virus and keratinocyte differentiation.
Division of Infection and Immunity, University College London, London, United Kingdom.
Varicella zoster virus (VZV) is the etiological agent of chickenpox and shingles, diseases characterized by epidermal skin blistering. Using a calcium-induced keratinocyte differentiation model, we investigated the interaction between epidermal differentiation and VZV infection. RNA-seq analysis showed that VZV infection has a profound effect on differentiating keratinocytes, altering the normal process of epidermal gene expression to generate a signature that resembles patterns of gene expression seen in both heritable and acquired skin-blistering disorders. Further investigation by real-time PCR, protein analysis and electron microscopy revealed that VZV selectively reduced expression of particular suprabasal cytokeratins and desmosomal proteins, leading to disruption of epidermal structure and function. These changes were accompanied by an upregulation of kallikreins and serine proteases. Taken together, these results indicate that VZV infection promotes blistering and desquamation of the epidermis, both of which are necessary for viral spread and pathogenesis. At the same time, analysis of the viral transcriptome provided evidence that VZV gene expression was significantly increased following calcium treatment of keratinocytes. Using reporter viruses and immunohistochemistry, we confirmed that VZV gene and protein expression in skin is linked with cellular differentiation. These studies highlight the intimate host-pathogen interaction following VZV infection of skin and provide insight into the mechanisms by which VZV remodels the epidermal environment to promote its own replication and spread.
Funded by: Medical Research Council: G0501446, G0700814, G0900950, G9721629; NEI NIH HHS: EY08098; NINDS NIH HHS: NS064022; Wellcome Trust: 081703/B/06/Z
PLoS pathogens 2014;10;1;e1003896
Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
Recent sequencing studies have extensively explored the somatic alterations present in the nuclear genomes of cancers. Although mitochondria control energy metabolism and apoptosis, the origins and impact of cancer-associated mutations in mtDNA are unclear. In this study, we analyzed somatic alterations in mtDNA from 1675 tumors. We identified 1907 somatic substitutions, which exhibited dramatic replicative strand bias, predominantly C > T and A > G on the mitochondrial heavy strand. This strand-asymmetric signature differs from those found in nuclear cancer genomes but matches the inferred germline process shaping primate mtDNA sequence content. A number of mtDNA mutations showed considerable heterogeneity across tumor types. Missense mutations were selectively neutral and often gradually drifted towards homoplasmy over time. In contrast, mutations resulting in protein truncation underwent negative selection and were almost exclusively heteroplasmic. Our findings indicate that the endogenous mutational mechanism has a far greater impact than any external mutagen in mitochondria and is fundamentally linked to mtDNA replication.
Funded by: Medical Research Council: G0900871, G1000729, MR/K000608/1; NCI NIH HHS: P01 CA155258; Wellcome Trust: 088340, 095663, 096919, 101876
The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.
Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Lawrence Berkeley National Laboratory, Mail Stop 84R0171, Berkeley, CA 94720, USA, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Department of Medical Genetics, Cambridge University Addenbrooke's Hospital, Cambridge CB2 2QQ, UK, Université Paul Sabatier, Faculté de Chirurgie Dentaire, CHU Toulouse, France, Centre for Genomic Medicine, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre (MAHSC), Manchester, UK, Centre for Genomic Medicine, Institute of Human Development, Faculty of Medical and Human Sciences, University of Manchester, MAHSC, Manchester M13 9WL, UK, Institute of Genetic Medicine, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK, Department of Computer Science, University of Toronto, Ontario, Canada, Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada, Department of Clinical Genetics, Leeds Teaching Hospitals NHS Trust, Leeds LS2 9NS, UK, MRC Human Genetics Unit, MRC Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK, The Jackson Laboratory, Bar Harbor, ME 04609, USA, Center for Molecular and Vascular Biology, University of Leuven, Belgium, Department of Neuropediatrics, University Medical Center Schleswig-Holstein, Kiel Campus, 24105 Kiel, Germany, NE Thames Genetics Service, Great Ormond Street Hospital, London WC1N 3JH, UK, Drexel University College of Medicine, Philadelphia, PA 19102, USA, Department of Haematology, University of Cambridge and NHS Blood and Transplant Cambridge, CB2 0PT Cambridge, UK, Autism and Developmental Medicine Institute, Geisinger Health System
The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
Funded by: NIH HHS: R24 OD011883
Nucleic acids research 2014;42;1;D966-74
Clinical interpretation of CNVs with cross-species phenotype data.
Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Berlin, Germany.
Background: Clinical evaluation of CNVs identified via techniques such as array comparative genomic hybridisation (aCGH) involves inspecting lists of known and unknown duplications and deletions with the goal of distinguishing pathogenic from benign CNVs. A key step in this process is comparing the individual's phenotypic abnormalities with those of Mendelian disorders associated with the genes affected by the CNV. However, because little is often known about these human genes, model organism phenotype data could serve as an additional source of information. Currently, almost 6000 genes in mouse and zebrafish are associated with a phenotype in the model organism when knocked out, but no disease is known to be caused by mutations in the human ortholog. Yet searching model organism databases and comparing model organism phenotypes with patient phenotypes, in order to identify novel disease genes and to evaluate CNVs medically, is hindered by the difficulty of integrating phenotype information across species and by the lack of appropriate software tools.
Methods: Here, we present an integrated ranking scheme based on phenotypic matching, degree of overlap with known benign or pathogenic CNVs and the haploinsufficiency score for the prioritisation of CNVs responsible for a patient's clinical findings.
Results: We show that this scheme leads to significant improvements compared with rankings that do not exploit phenotypic information. We provide a software tool called PhenogramViz, which supports phenotype-driven interpretation of aCGH findings based on multiple data sources, including the integrated cross-species phenotype ontology Uberpheno, in order to visualise gene-to-phenotype relations.
Conclusions: Integrating and visualising cross-species phenotype information on the affected genes may help in routine diagnostics of CNVs.
Funded by: NIH HHS: 5R24OD011883, R24 OD011883
Journal of medical genetics 2014;51;11;766-72
Comment on: characterization of the embB gene in Mycobacterium tuberculosis isolates from Barcelona and rapid detection of main mutations related to ethambutol resistance using a low-density DNA array.
Department of Medicine, University of Cambridge, Cambridge, UK.
The Journal of antimicrobial chemotherapy 2014;69;8;2298-9
Genetic diversity within Mycobacterium tuberculosis complex impacts on the accuracy of genotypic pyrazinamide drug-susceptibility assay.
Department of Medicine, University of Cambridge, Cambridge, United Kingdom.
Funded by: Wellcome Trust
Tuberculosis (Edinburgh, Scotland) 2014;94;4;451-3
Whole-genome sequencing to control antimicrobial resistance.
Department of Medicine, University of Cambridge, Cambridge, UK.
Following recent improvements in sequencing technologies, whole-genome sequencing (WGS) is positioned to become an essential tool in the control of antibiotic resistance, a major threat in modern healthcare. WGS has already found numerous applications in this area, ranging from the development of novel antibiotics and diagnostic tests through to antibiotic stewardship of currently available drugs via surveillance and the elucidation of the factors that allow the emergence and persistence of resistance. Numerous proof-of-principle studies have also highlighted the value of WGS as a tool for day-to-day infection control and, for some pathogens, as a primary diagnostic tool to detect antibiotic resistance. However, appropriate data analysis platforms will need to be developed before routine WGS can be introduced on a large scale.
Funded by: Department of Health; Wellcome Trust: WT098600
Trends in genetics : TIG 2014;30;9;401-7
A transcriptional switch underlies commitment to sexual development in malaria parasites.
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA; Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA (B.F.C.K.); Department of Molecular Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, State College, Pennsylvania 16802, USA (V.M.C., M.L.).
The life cycles of many parasites involve transitions between disparate host species, requiring these parasites to go through multiple developmental stages adapted to each of these specialized niches. Transmission of malaria parasites (Plasmodium spp.) from humans to the mosquito vector requires differentiation from asexual stages replicating within red blood cells into non-dividing male and female gametocytes. Although gametocytes were first described in 1880, our understanding of the molecular mechanisms involved in commitment to gametocyte formation is extremely limited, and disrupting this critical developmental transition remains a long-standing goal. Here we show that expression levels of the DNA-binding protein PfAP2-G correlate strongly with levels of gametocyte formation. Using independent forward and reverse genetics approaches, we demonstrate that PfAP2-G function is essential for parasite sexual differentiation. By combining genome-wide PfAP2-G cognate motif occurrence with global transcriptional changes resulting from PfAP2-G ablation, we identify early gametocyte genes as probable targets of PfAP2-G and show that their regulation by PfAP2-G is critical for their wild-type level expression. In asexual blood-stage parasites, pfap2-g appears to be among a set of epigenetically silenced loci prone to spontaneous activation. Stochastic activation presents a simple mechanism for a low baseline of gametocyte production. Overall, these findings identify PfAP2-G as a master regulator of sexual-stage development in malaria parasites and mark the first discovery of a transcriptional switch controlling a differentiation decision in protozoan parasites.
Funded by: Biotechnology and Biological Sciences Research Council; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600718, J005398; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50 GM071508, P50GM071508, T32 GM007388; Wellcome Trust: 090532, 090532/Z/09/Z, 090770, 094752, 098051
Detecting Break Points of Insertions and Deletions from Paired-end Short Reads
Next-generation Sequencing: Current Technologies and Applications 2014
K13-propeller polymorphisms in Plasmodium falciparum parasites from sub-Saharan Africa.
KEMRI/United States Army Medical Research Unit-Kenya, Kisumu.
Mutations in the Plasmodium falciparum K13-propeller domain have recently been shown to be important determinants of artemisinin resistance in Southeast Asia. This study investigated the prevalence of K13-propeller polymorphisms across sub-Saharan Africa. A total of 1212 P. falciparum samples collected from 12 countries were sequenced. None of the K13-propeller mutations previously reported in Southeast Asia were found, but 22 unique mutations were detected, of which 7 were nonsynonymous. Allele frequencies ranged between 1% and 3%. Three mutations were observed in more than one country, and A578S was present in parasites from 5 countries. This study provides the baseline prevalence of K13-propeller mutations in sub-Saharan Africa.
Funded by: Medical Research Council: G0600718, MC_EX_MR/K02440X/1, MR/M006212/1; Wellcome Trust: 090770, G0600718
The Journal of infectious diseases 2014;211;8;1352-5
Antibacterial resistance in sub-Saharan Africa: an underestimated emergency.
Centre for Microbiology Research, Kenya Medical Research Institute, Nairobi, Kenya; Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Antibacterial resistance-associated infections are known to increase morbidity, mortality, and cost of treatment, and to potentially put others in the community at higher risk of infections. In high-income countries, where the burden of infectious diseases is relatively modest, resistance to first-line antibacterial agents is usually overcome by use of second- and third-line agents. However, in developing countries where the burden of infectious diseases is high, patients with antibacterial-resistant infections may be unable to obtain or afford effective second-line treatments. In sub-Saharan Africa (SSA), the situation is aggravated by poor hygiene, unreliable water supplies, civil conflicts, and increasing numbers of immunocompromised people, such as those with HIV, which facilitate both the evolution of resistant pathogens and their rapid spread in the community. Because of limited capacity for disease detection and surveillance, the burden of illnesses due to treatable bacterial infections, their specific etiologies, and the awareness of antibacterial resistance are less well established in most of SSA, and therefore the ability to mitigate their consequences is significantly limited.
Funded by: NIAID NIH HHS: R01 AI099525; Wellcome Trust: 100891
Annals of the New York Academy of Sciences 2014;1323;43-55
Natural selection and infectious disease in human populations.
Center for Systems Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA.
The ancient biological 'arms race' between microbial pathogens and humans has shaped genetic variation in modern populations, and this has important implications for the growing field of medical genomics. As humans migrated throughout the world, populations encountered distinct pathogens, and natural selection increased the prevalence of alleles that are advantageous in the new ecosystems in both host and pathogens. This ancient history now influences human infectious disease susceptibility and microbiome homeostasis, and contributes to common diseases that show geographical disparities, such as autoimmune and metabolic disorders. Using new high-throughput technologies, analytical methods and expanding public data resources, the investigation of natural selection is leading to new insights into the function and dysfunction of human biology.
Funded by: Medical Research Council: G19/9; Wellcome Trust: 090532, 090770
Nature reviews. Genetics 2014;15;6;379-93
Impact of temporal variation on design and analysis of mouse knockout phenotyping studies.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
A significant challenge facing high-throughput phenotyping of in-vivo knockout mice is ensuring phenotype calls are robust and reliable. Central to this problem is selecting an appropriate statistical analysis that models both the experimental design (the workflow and the way control mice are selected for comparison with knockout animals) and the sources of variation. Recently we proposed a mixed model suitable for small batch-oriented studies, where controls are not phenotyped concurrently with mutants. Here we evaluate this method both for its sensitivity to detect phenotypic effects and for its control of false positives, across a range of workflows used at mouse phenotyping centers. We found that sensitivity and control of false positives depend on the workflow. We show that phenotypes in control mice fluctuate unexpectedly between batches, and this can inflate the false positive rate of phenotype calls when only a small number of batches are tested, because the effect of the knockout then becomes confounded with temporal fluctuations in control mice. This effect was observed in both behavioural and physiological assays. Based on this analysis, we recommend two approaches (each a workflow with an accompanying control strategy) and associated analyses that should be robust for use in high-throughput phenotyping pipelines. Our results show the importance of modelling all sources of variability in high-throughput phenotyping studies.
Funded by: Cancer Research UK: 13031; Medical Research Council: MR/L007428/1; NHGRI NIH HHS: 1 U54 HG006370-01, U54 HG006370; Wellcome Trust: 083573/Z/07/Z, 090532/Z/09/Z, WT098051
PloS one 2014;9;10;e111239
Kdm3a lysine demethylase is an Hsp90 client required for cytoskeletal rearrangements during spermatogenesis.
MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom; Edinburgh Cancer Research UK Centre, Institute of Genetics and Molecular Medicine, Western General Hospital, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1HH, United Kingdom; Biomedical Sciences Research Complex Mass Spectrometry and Proteomics Facility, University of St. Andrews, St. Andrews, Fife KY16 9ST, United Kingdom.
The lysine demethylase Kdm3a (Jhdm2a, Jmjd1a) is required for male fertility, sex determination, and metabolic homeostasis through its nuclear role in chromatin remodeling. Many histone-modifying enzymes have additional nonhistone substrates, as well as nonenzymatic functions, contributing to the full spectrum of events underlying their biological roles. We present two Kdm3a mouse models that exhibit cytoplasmic defects that may account in part for the globozoospermia phenotype reported previously. Electron microscopy revealed an abnormal acrosome and manchette and the absence of the implantation fossa at the caudal end of the nucleus in mice without Kdm3a demethylase activity, which affected cytoplasmic structures required to elongate the sperm head. We describe a new, enzymatically active Kdm3a isoform and show that the subcellular distribution, protein levels, and lysine demethylation activity of Kdm3a depended on Hsp90. We show that Kdm3a localizes to cytoplasmic structures of maturing spermatids affected in Kdm3a mutant mice, which in turn display altered fractionation of β-actin and γ-tubulin. Kdm3a is therefore a multifunctional Hsp90 client protein that participates directly in the regulation of cytoskeletal components.
Funded by: Medical Research Council: MC_PC_U127527199, MC_PC_U127561112, MC_PC_U127580973, MC_U127527199
Molecular biology of the cell 2014;25;8;1216-33
Managing clinically significant findings in research: the UK10K example.
Nuffield Department of Population Health, HeLEX - Centre for Health, Law and Emerging Technologies, University of Oxford, Oxford, UK.
Recent advances in sequencing technology allow data on the human genome to be generated more quickly and in greater detail than ever before. Such detail includes findings that may be of significance to the health of the research participant involved. Although research studies generally do not feed back information on clinically significant findings (CSFs) to participants, this stance is increasingly being questioned. There may be difficulties and risks in feeding clinically significant information back to research participants; however, the UK10K consortium sought to address these by creating a detailed management pathway. This was not intended to create any obligation upon the researchers to feed back any CSFs they discovered. Instead, it provides a mechanism to ensure that any such findings can be passed on to the participant where appropriate. This paper describes this mechanism and the specific criteria that must be fulfilled in order for a finding and participant to qualify for feedback. This mechanism could be used by future research consortia, and may also assist in the development of sound principles for dealing with CSFs.
Funded by: Medical Research Council: G0500870, MC_PC_15018, MC_PC_U127561093, MC_UU_12013/3, MR/L010305/1; Wellcome Trust: 091310, 092731, 096599, 098498, 100140, 102215, WT096599/2/11/Z
European journal of human genetics : EJHG 2014;22;9;1100-4
Identification of structural variation in mouse genomes.
Wellcome Trust Sanger Institute Hinxton, Cambridge, UK.
Structural variation is variation in the structure of DNA regions that affects DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and its association with human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of the molecular architecture and functional consequences of structural variation.
Funded by: Medical Research Council: G0800024, MR/L007428/1
Frontiers in genetics 2014;5;192
Expression of phosphofructokinase in skeletal muscle is influenced by genetic variation and associated with insulin sensitivity.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, U.K.
Using an integrative approach in which genetic variation, gene expression, and clinical phenotypes are assessed in relevant tissues may help functionally characterize the contribution of genetics to disease susceptibility. We sought to identify genetic variation influencing skeletal muscle gene expression (expression quantitative trait loci [eQTLs]) as well as expression associated with measures of insulin sensitivity. We investigated associations between 3,799,401 genetic variants and expression of >7,000 genes from three cohorts (n = 104). We identified 287 genes with cis-acting eQTLs (false discovery rate [FDR] <5%; P < 1.96 × 10⁻⁵) and 49 expression-insulin sensitivity phenotype associations (i.e., fasting insulin, homeostasis model assessment-insulin resistance, and BMI) (FDR <5%; P = 1.34 × 10⁻⁴). One of these associations, fasting insulin/phosphofructokinase (PFKM), overlaps with an eQTL. Furthermore, the expression of PFKM, a rate-limiting enzyme in glycolysis, was nominally associated with glucose uptake in skeletal muscle (P = 0.026; n = 42) and overexpressed (Bonferroni-corrected P = 0.03) in skeletal muscle of patients with T2D (n = 102) compared with normoglycemic controls (n = 87). The PFKM eQTL (rs4547172; P = 7.69 × 10⁻⁶) was nominally associated with glucose uptake, glucose oxidation rate, intramuscular triglyceride content, and metabolic flexibility (P = 0.016-0.048; n = 178). We explored eQTL results using published data from genome-wide association studies (DIAGRAM and MAGIC), and a proxy for the PFKM eQTL (rs11168327; r² = 0.75) was nominally associated with T2D (DIAGRAM P = 2.7 × 10⁻³). Taken together, our analysis highlights PFKM as a potential regulator of skeletal muscle insulin sensitivity.
Funded by: Wellcome Trust: 081917/Z/07/Z, 086596/Z/08/Z, 090532
Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease.
Funded by: NHGRI NIH HHS: U41 HG007234, U54 HG007004; NIGMS NIH HHS: P01 GM085354, R01 GM083337
Proceedings of the National Academy of Sciences of the United States of America 2014;111;33;E3366
Defining functional DNA elements in the human genome.
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
Funded by: NCI NIH HHS: P30 CA008748, P30 CA045508; NHGRI NIH HHS: R01 HG003143, R01 HG004037, U41 HG007234, U54 HG006996, U54 HG006997; NIA NIH HHS: R01 AG016379; NIGMS NIH HHS: R01 GM083337
Proceedings of the National Academy of Sciences of the United States of America 2014;111;17;6131-8
The impact of different DNA extraction kits and laboratories upon the assessment of human gut microbiota composition by 16S rRNA gene sequencing.
Gastrointestinal Unit, Centre for Genomic and Experimental Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom.
Introduction: Determining bacterial community structure in fecal samples through DNA sequencing is an important facet of intestinal health research. The impact of different commercially available DNA extraction kits upon bacterial community structures has received relatively little attention. The aim of this study was to analyze bacterial communities in volunteer and inflammatory bowel disease (IBD) patient fecal samples extracted using widely used DNA extraction kits in established gastrointestinal research laboratories.
Methods: Fecal samples from two healthy volunteers (H3 and H4) and two relapsing IBD patients (I1 and I2) were investigated. DNA extraction was undertaken using MoBio Powersoil and MP Biomedicals FastDNA SPIN Kit for Soil DNA extraction kits. PCR amplification for pyrosequencing of bacterial 16S rRNA genes was performed in both laboratories on all samples. Hierarchical clustering of sequencing data was done using the Yue and Clayton similarity coefficient.
Results: DNA extracted using the FastDNA kit and the MoBio kit gave median DNA concentrations of 475 (interquartile range [IQR] 228-561) and 22 (IQR 9-36) ng/µL respectively (p<0.0001). Hierarchical clustering of sequence data by Yue and Clayton coefficient revealed four clusters. Samples from individuals H3 and I2 clustered by individual; however, samples from patient I1 extracted with the MoBio kit clustered with samples from healthy volunteer H4 rather than with the other I1 samples. Linear modelling on relative abundance of common bacterial families revealed significant differences between kits; samples extracted with MoBio Powersoil showed significantly increased Bacteroidaceae, Ruminococcaceae and Porphyromonadaceae, and lower Enterobacteriaceae, Lachnospiraceae, Clostridiaceae, and Erysipelotrichaceae (p<0.05).
Conclusion: This study demonstrates significant differences in DNA yield and bacterial DNA composition when comparing DNA extracted from the same fecal sample with different extraction kits. This highlights the importance of ensuring that samples in a study are prepared with the same method, and the need for caution when cross-comparing studies that use different methods.
Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800675; Wellcome Trust: 097943, 102974, WT076964, WT097943MA
PloS one 2014;9;2;e88982
Insertions in the OCL1 locus of Acinetobacter baumannii lead to shortened lipooligosaccharides.
School of Molecular Bioscience, The University of Sydney, New South Wales, Australia.
Genomes of 82 Acinetobacter baumannii isolates belonging to global clones 1 (GC1) and 2 (GC2) were sequenced, and different forms of the locus predicted to direct synthesis of the outer core (OC) of the lipooligosaccharide were identified. OCL1 was present in all GC2 genomes, whereas GC1 isolates carried OCL1, OCL3 or a new locus, OCL5. Three mutants in which an insertion sequence (ISAba1 or ISAba23) interrupted OCL1 were identified. Isolates with OCL1 intact produced only lipooligosaccharide, while the mutants produced lipooligosaccharide of reduced molecular weight. Thus, the assignment of the OC locus as the locus responsible for synthesis of the OC is correct.
Funded by: Wellcome Trust: 098051
Research in microbiology 2014;165;6;472-5
Ensembl Genomes 2013: scaling up access to genome-wide data.
The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Wellcome Trust Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK, Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA and USDA-ARS, Cornell University, Ithaca, NY, 14853, USA.
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets), including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools that allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in the future.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F19793/1, BB/H531519/1, BB/I001077/1, BB/I008071/1, BB/I00I0077/1, BB/J00328X/1, BB/J017299/1; Wellcome Trust: 090548/B/09/Z, 095831
Nucleic acids research 2014;42;Database issue;D546-52
Cancer mouse models: past, present and future.
Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, UK.
The development of and advances in gene targeting technology over the past three decades have facilitated the generation of cancer mouse models that recapitulate features of human malignancies. These models have been, and still remain, instrumental in revealing the complexities of human cancer biology. However, they will need to evolve in the post-genomic era of cancer research. In this review we highlight some of the key developments over the past decades and discuss the new possibilities for cancer mouse models in the light of powerful emerging gene-manipulation tools.
Seminars in cell & developmental biology 2014;27;54-60
A novel method for detecting uniparental disomy from trio genotypes identifies a significant excess in children with developmental disorders.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, United Kingdom.
Exome sequencing of parent-offspring trios is a popular strategy for identifying causative genetic variants in children with rare diseases. The method draws its strength from inheritance information, which facilitates de novo variant calling, inference of compound heterozygosity, and the identification of inheritance anomalies. Uniparental disomy (UPD) describes the inheritance of a homologous chromosome pair from only one parent. This aberration is important to detect in genetic disease studies because it can result in imprinting disorders and recessive diseases. We have developed a software tool to detect uniparental disomy from child-mother-father genotype data that uses a binomial test to identify chromosomes with a significant burden of uniparentally inherited genotypes. This tool is the first to read VCF-formatted genotypes, to perform integrated copy number filtering, and to use a statistical test that is inherently robust across platforms of varying genotyping density and noise characteristics. Simulations demonstrated superior accuracy compared with previously developed approaches. We applied the method to 1057 trios from the Deciphering Developmental Disorders project, a trio-based rare disease study, and detected six validated events, a significant enrichment compared with the population prevalence of UPD (1 in 3500), suggesting that most of these events are pathogenic. One of these events represents a known imprinting disorder, and exome analyses have identified rare homozygous candidate variants, mainly in the isodisomic regions of UPD chromosomes, which, among other variants, provide targets for further genetic and functional evaluation.
Funded by: Wellcome Trust: 076113, WT098051
Genome research 2014;24;4;673-87
Determinants of invasiveness beneath the capsule of the pneumococcus.
Department of Global Health, Emory University.
The Journal of infectious diseases 2014;209;3;321-2
Population distribution and ancestry of the cancer protective MDM2 SNP285 (rs117039649).
Section of Oncology, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway. Department of Oncology, Haukeland University Hospital, 5021 Bergen, Norway.
The MDM2 promoter SNP285C is located on the SNP309G allele. While SNP309G enhances Sp1 transcription factor binding and MDM2 transcription, SNP285C antagonizes Sp1 binding and reduces the risk of breast, ovarian and endometrial cancer. Assessing SNP285 and SNP309 genotypes across 25 different ethnic populations (>10,000 individuals), we found the frequency of SNP285C to be 6-8% across European populations except for Finns (1.2%) and Saami (0.3%). The frequency decreased towards the Middle East and Eastern Russia, and SNP285C was absent among Han Chinese, Mongolians and African Americans. Interhaplotype variation analyses estimated SNP285C to have originated about 14,700 years ago (95% CI: 8,300-33,300). Both this estimate and the geographical distribution suggest that SNP285C arose after the separation between Caucasians and modern-day East Asians (17,000-40,000 years ago). We observed a strong inverse correlation (r = -0.805; p < 0.001) between the percentage of SNP309G alleles harboring SNP285C and the MAF for SNP309G itself across different populations, suggesting selection and environmental adaptation with respect to MDM2 expression in recent human evolution. In conclusion, we found SNP285C to be a pan-Caucasian variant. Ethnic variation in the distribution of SNP285C needs to be taken into account when assessing the impact of MDM2 SNPs on cancer risk.
Whole genome sequencing reveals potential spread of Clostridium difficile between humans and farm animals in the Netherlands, 2002 to 2011.
Section Experimental Bacteriology, Department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
Farm animals are a potential reservoir for human Clostridium difficile infection (CDI), particularly PCR ribotype 078, which is frequently found in animals and humans. Here, whole genome single-nucleotide polymorphism (SNP) analysis was used to study the evolutionary relatedness of C. difficile 078 isolated from humans and animals on Dutch pig farms. All sequenced genomes were surveyed for potential antimicrobial resistance determinants, which were linked to an antimicrobial resistance phenotype. We sequenced the whole genomes of 65 C. difficile 078 isolates collected between 2002 and 2011 from pigs (n = 19), asymptomatic farmers (n = 15) and hospitalised patients (n = 31) in the Netherlands. The collection included 12 pairs of human and pig isolates from 2011 collected at 12 different pig farms. A mutation rate of 1.1 SNPs per genome per year was determined for C. difficile 078. Importantly, we demonstrate that farmers and pigs were colonised with identical (no SNP differences) and nearly identical (fewer than two SNP differences) C. difficile clones. Identical tetracycline and streptomycin resistance determinants were present in human and animal C. difficile 078 isolates. Our observation that farmers and pigs share identical C. difficile strains suggests transmission between these populations, although we cannot exclude the possibility of transmission from a common environmental source.
Funded by: Medical Research Council: 93614, MR/L015080/1; Wellcome Trust: 079643, 086418, 098051
Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin 2014;19;45;20954
USP28 is recruited to sites of DNA damage by the tandem BRCT domains of 53BP1 but plays a minor role in double-strand break metabolism.
Institute for Research in Biomedicine (IRB Barcelona), Barcelona, Spain.
The DNA damage response (DDR) is critical for genome stability and the suppression of a wide variety of human diseases, including neurodevelopmental disorders, immunodeficiency, and cancer. In addition, the efficacy of many chemotherapeutic strategies is dictated by the status of the DDR. Ubiquitin-specific protease 28 (USP28) was reported to govern the stability of multiple factors that are critical for diverse aspects of the DDR. Here, we examined the effects of USP28 depletion on the DDR in cells and in vivo. We found that USP28 is recruited to double-strand breaks in a manner that requires the tandem BRCT domains of the DDR protein 53BP1. However, we observed only minor DDR defects in USP28-depleted cells, and mice lacking USP28 showed normal longevity, immunological development, and radiation responses. Our results thus indicate that USP28 is not a critical factor in double-strand break metabolism and is unlikely to be an attractive target for therapeutic intervention aimed at chemotherapy sensitization.
Funded by: Cancer Research UK: 11224, C6946/A14492; Wellcome Trust: 092096, WT092096
Molecular and cellular biology 2014;34;11;2062-74
Confinement and deformation of single cells and their nuclei inside size-adapted microtubes.
Institute for Integrative Nanosciences, IFW Dresden, Helmholtzstraße 20, Dresden, D-01069, Germany.
Funded by: Cancer Research UK: 11224; European Research Council: 311529; Wellcome Trust: 092096
Advanced healthcare materials 2014;3;11;1753-8
MARIMO cells harbor a CALR mutation but are not dependent on JAK2/STAT5 signaling.
Cambridge Institute for Medical Research, Wellcome Trust/MRC Stem Cell Institute and Department of Haematology, University of Cambridge, Cambridge, UK.
Funded by: Cancer Research UK: 12765, A12765
A comparison of peak callers used for DNase-Seq data.
The Babraham Institute, Babraham Research Campus, Cambridge, United Kingdom; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.
Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess the sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq oriented method, MACS, both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess the accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that is different from the default value.
Funded by: Wellcome Trust: 098051
PloS one 2014;9;5;e96303
Dysfunction of phospholipase Cγ in immune disorders and cancer.
Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Division of Molecular Structure, Medical Research Council (MRC) National Institute for Medical Research, London, UK.
The surge in genetic and genomic investigations over the past 5 years has resulted in many discoveries of causative variants relevant to disease pathophysiology. Although phospholipase C (PLC) enzymes have long been recognized as important components in intracellular signal transmission, only recently has this approach highlighted their role in disease development through gain-of-function mutations. In this review we describe the new findings that link the PLCγ family to immune disorders and cancer, and illustrate further efforts to elucidate the molecular mechanisms that underpin their dysfunction.
Funded by: Cancer Research UK; Wellcome Trust
Trends in biochemical sciences 2014;39;12;603-11
A linguistically informed autosomal STR survey of human populations residing in the greater Himalayan region.
MGC Department of Human and Clinical Genetics, Leiden University Medical Centre, Leiden, the Netherlands.
The greater Himalayan region demarcates two of the most prominent linguistic phyla in Asia: Tibeto-Burman and Indo-European. Previous genetic surveys, mainly using Y-chromosome and/or mitochondrial DNA polymorphisms, suggested substantially reduced gene flow between populations belonging to these two phyla. These studies, however, have mainly focussed on populations residing far to the north and/or south of this mountain range, and have not been able to study gene flow patterns within the greater Himalayan region itself. We now report a detailed, linguistically informed, genetic survey of Tibeto-Burman and Indo-European speakers from the Himalayan countries Nepal and Bhutan based on autosomal microsatellite markers, and compare these populations with surrounding regions. The genetic differentiation between populations within the Himalayas appears to be much higher than that between populations in the neighbouring countries. We also observe marked genetic differentiation between Tibeto-Burman-speaking populations on the one hand and Indo-European-speaking populations on the other, suggesting that language and geography have played an equally large role in defining the genetic composition of present-day populations within the Himalayas.
Funded by: Wellcome Trust: 087576, WT 087576, WT 098051
PloS one 2014;9;3;e91534
Crystal structures of three representatives of a new Pfam family PF14869 (DUF4488) suggest they function in sugar binding/uptake.
Joint Center for Structural Genomics, http://www.jcsg.org; Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, California, 94025.
Crystal structures of three members (BACOVA_00364 from Bacteroides ovatus, BACUNI_03039 from Bacteroides uniformis and BACEGG_00036 from Bacteroides eggerthii) of the Pfam domain of unknown function (DUF4488) were determined to 1.95, 1.66, and 1.81 Å resolutions, respectively. The protein structures adopt an eight-stranded, calycin-like, β-barrel fold and bind an endogenous unknown ligand at one end of the β-barrel. The amino acids interacting with the ligand are not conserved in any other protein of known structure with this particular fold. The size and chemical environment of the bound ligand suggest binding or transport of a small polar molecule(s) as a potential function for these proteins. These are the first structural representatives of a newly defined PF14869 (DUF4488) Pfam family.
Funded by: NIGMS NIH HHS: P41GM103393, U54 GM094586
Protein science : a publication of the Protein Society 2014;23;10;1380-91
Gene-Lifestyle Interactions in Complex Diseases: Design and Description of the GLACIER and VIKING Studies.
Department of Clinical Sciences, Genetic and Molecular Epidemiology Unit, Lund University, Skåne University Hospital Malmö, CRC, Building 91, Level 10, Jan Waldenströms gata 35, SE-205 02 Malmö, Sweden.
Most complex diseases have well-established genetic and non-genetic risk factors. In some instances, these risk factors are likely to interact, whereby their joint effects convey a level of risk that is either significantly more or less than the sum of these risks. Characterizing these gene-environment interactions may help elucidate the biology of complex diseases and guide strategies for their targeted prevention. In most cases, the detection of gene-environment interactions will require sample sizes in excess of those needed to detect the marginal effects of the genetic and environmental risk factors. Although many consortia comprising multiple diverse cohorts have been formed to detect gene-environment interactions, few robust examples of such interactions have been discovered. This may be because combining data across studies, usually through meta-analysis of summary data from the contributing cohorts, is often a statistically inefficient approach for the detection of gene-environment interactions. Ideally, single, very large and well-genotyped prospective cohorts, with validated measures of environmental risk factors and disease outcomes, should be used to study interactions. The presence of strong founder effects within those cohorts might further strengthen the capacity to detect novel genetic effects and gene-environment interactions. Access to accurate genealogical data would also aid in studying the diploid nature of the human genome, such as genomic imprinting (parent-of-origin effects). Here we describe two studies from northern Sweden (the GLACIER and VIKING studies) that fulfill these characteristics.
Funded by: NHGRI NIH HHS: U01 HG004399
Current nutrition reports 2014;3;4;400-411
High risk population isolate reveals low frequency variants predisposing to intracranial aneurysms.
Neurosurgery, NeuroCenter, Kuopio University Hospital, Kuopio, Finland; Neurosurgery, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland; Department of Neurobiology, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland.
Saccular intracranial aneurysms (sIAs), a complex trait with a sporadic and a familial form, develop in 3% of the population. Subarachnoid hemorrhage from sIA (sIA-SAH) is a devastating form of stroke. Certain rare genetic variants are enriched in the Finns, a population isolate with a small founder population and bottleneck events. As the incidence of sIA-SAH in Finland is increased more than twofold, such variants may associate with sIA in the Finnish population. We tested 9.4 million variants for association with case-control status and with the number of sIAs in 760 Finnish sIA patients (enriched for familial sIA) and 2,513 matched controls. The most promising loci (p<5E-6) were replicated in 858 Finnish sIA patients and 4,048 controls. The frequencies and effect sizes of the replicated variants were compared to a continental European population using 717 Dutch cases and 3,004 controls. We discovered four new high-risk loci with low-frequency lead variants. Three were associated with case-control status: 2q23.3 (MAF 2.1%, OR 1.89, p 1.42×10-9); 5q31.3 (MAF 2.7%, OR 1.66, p 3.17×10-8); 6q24.2 (MAF 2.6%, OR 1.87, p 1.87×10-11); and one with the number of sIAs: 7p22.1 (MAF 3.3%, RR 1.59, p 6.08×10-9). Two of the associations (5q31.3, 6q24.2) replicated in the Dutch sample. The 7p22.1 locus was strongly differentiated; the lead variant was more frequent in Finland (4.6%) than in the Netherlands (0.3%). Additionally, we replicated a previously inconclusive locus on 2q33.1 in all samples tested (OR 1.27, p 1.87×10-12). The five loci explain 2.1% of the sIA heritability in Finland, and may relate to, but not explain, the increased incidence of sIA-SAH in Finland. This study illustrates the utility of population isolates, familial enrichment, dense genotype imputation and alternate phenotyping in the search for variants associated with complex diseases.
PLoS genetics 2014;10;1;e1004134
Genomic diversity of Epstein-Barr virus genomes isolated from primary nasopharyngeal carcinoma biopsy samples.
Department of Paediatrics and Adolescent Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
Unlabelled: Undifferentiated nasopharyngeal carcinoma (NPC) has a 100% association with Epstein-Barr virus (EBV). However, only three EBV genomes isolated from NPC patients have been sequenced to date, and the role of EBV genomic variations in the pathogenesis of NPC is unclear. We sought to obtain the sequences of EBV genomes in multiple NPC biopsy specimens in the same geographic location in order to reveal their sequence diversity. Three published EBV (B95-8, C666-1, and HKNPC1) genomes were first resequenced using the sequencing workflow of target enrichment of EBV DNA by hybridization, followed by next-generation sequencing, de novo assembly, and joining of contigs by Sanger sequencing. The sequences of eight NPC biopsy specimen-derived EBV (NPC-EBV) genomes, designated HKNPC2 to HKNPC9, were then determined. They harbored 1,736 variations in total, including 1,601 substitutions, 64 insertions, and 71 deletions, compared to the reference EBV. Furthermore, genes encoding latent, early lytic, and tegument proteins and glycoproteins were found to contain nonsynonymous mutations of potential biological significance. Phylogenetic analysis showed that the HKNPC6 and -7 genomes, which were isolated from tumor biopsy specimens of advanced metastatic NPC cases, were distinct from the other six NPC-EBV genomes, suggesting the presence of at least two parental lineages of EBV among the NPC-EBV genomes. In conclusion, much greater sequence diversity among EBV isolates derived from NPC biopsy specimens is demonstrated on a whole-genome level through a complete sequencing workflow. Large-scale sequencing and comparison of EBV genomes isolated from NPC and normal subjects should be performed to assess whether EBV genomic variations contribute to NPC pathogenesis.
Importance: This study established a sequencing workflow from EBV DNA capture and sequencing to de novo assembly and contig joining. We reported eight newly sequenced EBV genomes isolated from primary NPC biopsy specimens and revealed the sequence diversity on a whole-genome level among these EBV isolates. At least two lineages of EBV strains are observed, and recombination among these lineages is inferred. Our study has demonstrated the value of, and provided a platform for, genome sequencing of EBV.
Journal of virology 2014;88;18;10662-72
Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains.
DOE-Joint Genome Institute, Walnut Creek, California, United States of America; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia.
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
PLoS biology 2014;12;8;e1001920
Design of clone-specific probes from genome sequences for rapid PCR-typing of outbreak pathogens.
Servicio de Microbiología, Hospital Universitario La Paz, IdiPAZ, Madrid, Spain.
The genome sequences of one OXA-48-producing Klebsiella pneumoniae isolate belonging to sequence type (ST) 405 and of three belonging to ST11 were used to design and test ST-specific PCR assays for typing OXA-48-producing K. pneumoniae. The approach proved to be useful for in-house development of rapid PCR typing assays for local outbreak surveillance.
Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2014;20;11;O891-3
Predicting the virulence of MRSA from its genome sequence.
Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom.
Microbial virulence is a complex and often multifactorial phenotype, intricately linked to a pathogen's evolutionary trajectory. Toxicity, the ability to destroy host cell membranes, and adhesion, the ability to adhere to human tissues, are the major virulence factors of many bacterial pathogens, including Staphylococcus aureus. Here, we assayed the toxicity and adhesiveness of 90 MRSA (methicillin-resistant S. aureus) isolates and found that while there was remarkably little variation in adhesion, toxicity varied by over an order of magnitude between isolates, suggesting different evolutionary selection pressures acting on these two traits. We performed a genome-wide association study (GWAS) and identified a large number of loci, as well as a putative network of epistatically interacting loci, that significantly associated with toxicity. Despite this apparent complexity in toxicity regulation, a predictive model based on a set of significant single nucleotide polymorphisms (SNPs) and insertion and deletion events (indels) showed a high degree of accuracy in predicting an isolate's toxicity solely from the genetic signature at these sites. Our results thus highlight the potential of using sequence data to determine clinically relevant parameters and have further implications for understanding the microbial virulence of this opportunistic pathogen.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G1000362, G1000803, G9219778; NIAID NIH HHS: P01 AI083211; Wellcome Trust: 101237
Genome research 2014;24;5;839-49
Emergence of a new epidemic Neisseria meningitidis serogroup A Clone in the African meningitis belt: high-resolution picture of genomic changes that mediate immune evasion.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
In the African "meningitis belt," outbreaks of meningococcal meningitis occur in cycles, representing a model for the role of host-pathogen interactions in epidemic processes. The periodicity of the epidemics is not well understood, nor is it currently possible to predict them. In our longitudinal colonization and disease surveys, we have observed waves of clonal replacement with the same serogroup, suggesting that immunity to noncapsular antigens plays a significant role in natural herd immunity. Here, through comparative genomic analysis of 100 meningococcal isolates, we provide a high-resolution view of the evolutionary changes that occurred during clonal replacement of a hypervirulent meningococcal clone (ST-7) by a descendant clone (ST-2859). We show that the majority of genetic changes are due to homologous recombination of laterally acquired DNA, with more than 20% of these events involving acquisition of DNA from other species. Signals of adaptation to evade herd immunity were indicated by genomic hot spots of recombination. Most striking is the high frequency of changes involving the pgl locus, which determines the glycosylation patterns of major protein antigens. High-frequency changes were also observed for genes involved in the regulation of pilus expression and the synthesis of Maf3 adhesins, highlighting the importance of these surface features in host-pathogen interaction and immune evasion.
Importance: While established meningococcal capsule polysaccharide vaccines are protective through the induction of anticapsular antibodies, findings of our longitudinal studies in the African meningitis belt have indicated that immunity to noncapsular antigens plays a significant role in natural herd immunity. Our results show that meningococci evade herd immunity through the rapid homologous replacement of just a few key genomic loci that affect noncapsular cell surface components. Identification of recombination hot spots thus represents a powerful approach to gain insight into targets of protective natural immune responses. Moreover, our results highlight the role of the dynamics of the protein glycosylation repertoire in immune evasion by Neisseria meningitidis. These results have major implications for the design of next-generation protein-based subunit vaccines.
Funded by: Wellcome Trust
Gene-lifestyle interaction and type 2 diabetes: the EPIC interact case-cohort study.
Medical Research Council Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom.
Background: Understanding of the genetic basis of type 2 diabetes (T2D) has progressed rapidly, but the interactions between common genetic variants and lifestyle risk factors have not been systematically investigated in studies with adequate statistical power. Therefore, we aimed to quantify the combined effects of genetic and lifestyle factors on risk of T2D in order to inform strategies for prevention.
Methods and findings: The InterAct study includes 12,403 incident T2D cases and a representative sub-cohort of 16,154 individuals from a cohort of 340,234 European participants with 3.99 million person-years of follow-up. We studied the combined effects of an additive genetic T2D risk score and modifiable and non-modifiable risk factors using Prentice-weighted Cox regression and random effects meta-analysis methods. The effect of the genetic score was significantly greater in younger individuals (p for interaction = 1.20×10-4). Relative genetic risk (per standard deviation [4.4 risk alleles]) was also larger in participants who were leaner, both in terms of body mass index (p for interaction = 1.50×10-3) and waist circumference (p for interaction = 7.49×10-9). Examination of absolute risks by strata showed the importance of obesity for T2D risk. The 10-y cumulative incidence of T2D rose from 0.25% to 0.89% across extreme quartiles of the genetic score in normal weight individuals, compared to 4.22% to 7.99% in obese individuals. We detected no significant interactions between the genetic score and sex, diabetes family history, physical activity, or dietary habits assessed by a Mediterranean diet score.
Conclusions: The relative effect of a T2D genetic risk score is greater in younger and leaner participants. However, this sub-group is at low absolute risk and would not be a logical target for preventive interventions. The high absolute risk associated with obesity at any level of genetic risk highlights the importance of universal rather than targeted approaches to lifestyle intervention.
Funded by: Cancer Research UK: 14136; Medical Research Council: G0401527, G0601261, G1000143, G1002084, MC_U106179471, MC_UP_A100_1003, MC_UU_12015/1, MC_UU_12015/5; Wellcome Trust: 083270/Z/07/Z, 090532, 098017
PLoS medicine 2014;11;5;e1001647
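The contrast between relative and absolute risk in the InterAct abstract can be checked with simple arithmetic on the four reported 10-year cumulative incidences. The Python sketch below is illustrative only: it uses just the quoted percentages for the extreme genetic-score quartiles and is not the study's Prentice-weighted Cox analysis; the function name `risk_contrast` is hypothetical.

```python
# Illustrative arithmetic only: compares the relative and absolute increase in
# 10-year cumulative T2D incidence between the lowest and highest quartiles of
# the genetic risk score, using the percentages quoted in the abstract.

def risk_contrast(low_quartile_pct: float, high_quartile_pct: float) -> tuple[float, float]:
    """Return (risk ratio, absolute risk difference in percentage points)."""
    ratio = high_quartile_pct / low_quartile_pct
    difference = high_quartile_pct - low_quartile_pct
    return ratio, difference

# 10-year cumulative incidence (%): (lowest quartile, highest quartile)
strata = {
    "normal weight": (0.25, 0.89),
    "obese": (4.22, 7.99),
}

for name, (low, high) in strata.items():
    ratio, diff = risk_contrast(low, high)
    print(f"{name}: risk ratio {ratio:.2f}, absolute difference {diff:.2f} points")
```

This prints a risk ratio of 3.56 with an absolute difference of 0.64 points for normal-weight individuals, and a ratio of 1.89 with a difference of 3.77 points for obese individuals. The genetic score has the larger relative effect in the leaner group, but obesity dominates absolute risk, which is the reasoning behind the authors' preference for universal rather than targeted lifestyle intervention.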
New insights into the maternal to zygotic transition.
MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.
The initial phases of embryonic development occur in the absence of de novo transcription and are instead controlled by maternally inherited mRNAs and proteins. During this initial period, cell cycles are synchronous and lack gap phases. Following this period of transcriptional silence, zygotic transcription begins, the maternal influence on development starts to decrease, and dramatic changes to the cell cycle take place. Here, we discuss recent work that is shedding light on the maternal to zygotic transition and the interrelated but distinct mechanisms regulating the onset of zygotic transcription and changes to the cell cycle during early embryonic development.
Funded by: Medical Research Council: A252-5RG50, MC_U117597140; Wellcome Trust
Development (Cambridge, England) 2014;141;20;3834-41
Buzz off, that's my bee!
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2014;12;10;659
Patterns of genome evolution that have accompanied host adaptation in Salmonella.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
Many bacterial pathogens are specialized, infecting one or few hosts, and this is often associated with more acute disease presentation. Specific genomes show markers of this specialization, which often reflect a balance between gene acquisition and functional gene loss. Within Salmonella enterica subspecies enterica, a single lineage exists that includes human and animal pathogens adapted to cause infection in different hosts, including S. enterica serovar Enteritidis (multiple hosts), S. Gallinarum (birds), and S. Dublin (cattle). This provides an excellent evolutionary context in which differences between these pathogen genomes can be related to host range. Genome sequences were obtained from ∼60 isolates selected to represent the known diversity of this lineage. Examination and comparison of the clades within the phylogeny of this lineage revealed signs of host restriction as well as evolutionary events that mark a path to host generalism. We have identified the nature and order of events for both evolutionary trajectories. The impact of functional gene loss was predicted based upon position within metabolic pathways and confirmed with phenotyping assays. The structure of S. Enteritidis is more complex than previously known, as a second clade of S. Enteritidis was revealed that is distinct from those commonly seen to cause disease in humans or animals, and that is more closely related to S. Gallinarum. Isolates from this second clade were tested in a chick model of infection and exhibited a reduced colonization phenotype, which we postulate represents an intermediate stage in pathogen-host adaptation.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D007542/1, BB/F007973/1; Wellcome Trust: 098051, 100890
Proceedings of the National Academy of Sciences of the United States of America 2014;112;3;863-8
Chemical inhibition of NAT10 corrects defects of laminopathic cells.
The Wellcome Trust/Cancer Research UK (CRUK) Gurdon Institute and Department of Biochemistry, University of Cambridge, CB2 1QN Cambridge, UK.
Down-regulation and mutations of the nuclear-architecture proteins lamin A and C cause misshapen nuclei and altered chromatin organization associated with cancer and laminopathies, including the premature-aging disease Hutchinson-Gilford progeria syndrome (HGPS). Here, we identified the small molecule "Remodelin", which improved nuclear architecture, chromatin organization, and fitness of both human lamin A/C-depleted cells and HGPS-derived patient cells and decreased markers of DNA damage in these cells. Using a combination of chemical, cellular, and genetic approaches, we identified the acetyltransferase protein NAT10 as the target of Remodelin that mediated nuclear shape rescue in laminopathic cells via microtubule reorganization. These findings provide insights into how NAT10 affects nuclear architecture and suggest alternative strategies for treating laminopathies and aging.
Funded by: Cancer Research UK: 11224, A11224, C6/A11224, C6946/A14492; Medical Research Council: MR/L019116/1; Wellcome Trust: 092096
Science (New York, N.Y.) 2014;344;6183;527-32
Complete humanization of the mouse immunoglobulin loci enables efficient therapeutic antibody discovery.
Kymab Ltd., Babraham Research Campus, Cambridge, UK.
If immunized with an antigen of interest, transgenic mice with large portions of unrearranged human immunoglobulin loci can produce fully human antigen-specific antibodies; several such antibodies are in clinical use. However, technical limitations inherent to conventional transgenic technology and sequence divergence between the human and mouse immunoglobulin constant regions limit the utility of these mice. Here, using repetitive cycles of genome engineering in embryonic stem cells, we have inserted the entire human immunoglobulin variable-gene repertoire (2.7 Mb) into the mouse genome, leaving the mouse constant regions intact. These transgenic mice are viable and fertile, with an immune system resembling that of wild-type mice. Antigen immunization results in production of high-affinity antibodies with long human-like complementarity-determining region 3 (CDR3H), broad epitope coverage and strong signatures of somatic hypermutation. These mice provide a robust system for the discovery of therapeutic human monoclonal antibodies; as a surrogate readout of the human antibody response, they may also aid vaccine design efforts.
Funded by: Wellcome Trust
Nature biotechnology 2014;32;4;356-63
Reprogramming the methylome: erasing memory and creating diversity.
Epigenetics Programme, The Babraham Institute, Cambridge, CB22 3AT, UK; Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
The inheritance of epigenetic marks, in particular DNA methylation, provides a molecular memory that ensures faithful commitment to transcriptional programs during mammalian development. Epigenetic reprogramming results in global hypomethylation of the genome together with a profound loss of memory, which underlies naive pluripotency. Such global reprogramming occurs in primordial germ cells, early embryos, and embryonic stem cells where reciprocal molecular links connect the methylation machinery to pluripotency. Priming for differentiation is initiated upon exit from pluripotency, and we propose that epigenetic mechanisms create diversity of transcriptional states, which help with symmetry breaking during cell fate decisions and lineage commitment.
Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 095645
Cell stem cell 2014;14;6;710-9
Molecular genetic evidence for overlap between general cognitive ability and risk for schizophrenia: a report from the Cognitive Genomics consorTium (COGENT).
Division of Psychiatry Research, Zucker Hillside Hospital, Glen Oaks, NY, USA; Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, Manhasset, NY, USA; Hofstra North Shore-LIJ School of Medicine, Departments of Psychiatry and Molecular Medicine, Hempstead, NY, USA.
It has long been recognized that generalized deficits in cognitive ability represent a core component of schizophrenia (SCZ), evident before full illness onset and independent of medication. The possibility of genetic overlap between risk for SCZ and cognitive phenotypes has been suggested by the presence of cognitive deficits in first-degree relatives of patients with SCZ; however, until recently, molecular genetic approaches to test this overlap have been lacking. Within the last few years, large-scale genome-wide association studies (GWAS) of SCZ have demonstrated that a substantial proportion of the heritability of the disorder is explained by a polygenic component consisting of many common single-nucleotide polymorphisms (SNPs) of extremely small effect. Similar results have been reported in GWAS of general cognitive ability. The primary aim of the present study is to provide the first molecular genetic test of the classic endophenotype hypothesis, which states that alleles associated with reduced cognitive ability should also serve to increase risk for SCZ. We tested the endophenotype hypothesis by applying polygenic SNP scores derived from a large-scale cognitive GWAS meta-analysis (~5000 individuals from nine nonclinical cohorts comprising the Cognitive Genomics consorTium (COGENT)) to four SCZ case-control cohorts. As predicted, cases had significantly lower cognitive polygenic scores than controls. In parallel, polygenic risk scores for SCZ were associated with lower general cognitive ability. In addition, using our large cognitive meta-analytic data set, we identified nominally significant cognitive associations for several SNPs that have previously been robustly associated with SCZ susceptibility. These results provide molecular confirmation of the genetic overlap between SCZ and general cognitive ability, and may provide additional insight into the pathophysiology of the disorder.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1, BB/F022441/1; Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: G0700704, MR/K026992/1; NIMH NIH HHS: K01 MH085812, K23 MH077807, K99 MH101255, P50 MH080173, R01 MH079800, R01 MH080912, R01 MH100141, RC2 MH089964
Molecular psychiatry 2014;19;2;168-74
JAK2V617F homozygosity drives a phenotypic switch in myeloproliferative neoplasms, but is insufficient to sustain disease.
Cambridge Institute for Medical Research and Wellcome Trust/Medical Research Council Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom; Department of Haematology, University of Cambridge, United Kingdom.
Genomic regions of acquired uniparental disomy (UPD) are common in malignancy and frequently harbor mutated oncogenes. Homozygosity for such gain-of-function mutations is thought to modulate tumor phenotype, but direct evidence has been elusive. Polycythemia vera (PV) and essential thrombocythemia (ET), 2 subtypes of myeloproliferative neoplasms, are associated with an identical acquired JAK2V617F mutation but the mechanisms responsible for distinct clinical phenotypes remain unclear. We provide direct genetic evidence and demonstrate that homozygosity for human JAK2V617F in knock-in mice results in a striking phenotypic switch from an ET-like to PV-like phenotype. The resultant erythrocytosis is driven by increased numbers of early erythroid progenitors and enhanced erythroblast proliferation, whereas reduced platelet numbers are associated with impaired platelet survival. JAK2V617F-homozygous mice developed a severe hematopoietic stem cell defect, suggesting that additional lesions are needed to sustain clonal expansion. Together, our results indicate that UPD for 9p plays a causal role in the PV phenotype in patients as a consequence of JAK2V617F homozygosity. The generation of a JAK2V617F allelic series of mice with a dose-dependent effect on hematopoiesis provides a powerful model for studying the consequences of mutant JAK2 homozygosity.
Funded by: British Heart Foundation: FS/09/039/27788, FS/14/40/30921; Canadian Institutes of Health Research; Cancer Research UK: 12765
Constitutional and somatic rearrangement of chromosome 21 in acute lymphoblastic leukaemia.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.
Changes in gene dosage are a major driver of cancer, known to be caused by a finite, but increasingly well annotated, repertoire of mutational mechanisms. This can potentially generate correlated copy-number alterations across hundreds of linked genes, as exemplified by the 2% of childhood acute lymphoblastic leukaemia (ALL) with recurrent amplification of megabase regions of chromosome 21 (iAMP21). We used genomic, cytogenetic and transcriptional analysis, coupled with novel bioinformatic approaches, to reconstruct the evolution of iAMP21 ALL. Here we show that individuals born with the rare constitutional Robertsonian translocation between chromosomes 15 and 21, rob(15;21)(q10;q10)c, have approximately 2,700-fold increased risk of developing iAMP21 ALL compared to the general population. In such cases, amplification is initiated by a chromothripsis event involving both sister chromatids of the Robertsonian chromosome, a novel mechanism for cancer predisposition. In sporadic iAMP21, breakage-fusion-bridge cycles are typically the initiating event, often followed by chromothripsis. In both sporadic and rob(15;21)c-associated iAMP21, the final stages frequently involve duplications of the entire abnormal chromosome. The end-product is a derivative of chromosome 21 or the rob(15;21)c chromosome with gene dosage optimized for leukaemic potential, showing constrained copy-number levels over multiple linked genes. Thus, dicentric chromosomes may be an important precipitant of chromothripsis, as we show rob(15;21)c to be constitutionally dicentric and breakage-fusion-bridge cycles generate dicentric chromosomes somatically. Furthermore, our data illustrate that several cancer-specific mutational processes, applied sequentially, can coordinate to fashion copy-number profiles over large genomic scales, incrementally refining the fitness benefits of aggregated gene dosage changes.
Funded by: NCI NIH HHS: U10 CA098543, U10 CA180886; Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA
Novel skin phenotypes revealed by a genome-wide mouse reverse genetic screen.
Centre for Stem Cells and Regenerative Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK; Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, UK; Wellcome Trust-Medical Research Council Stem Cell Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK.
Permanent stop-and-shop large-scale mouse mutant resources provide an excellent platform to decipher tissue phenogenomics. Here we analyse skin from 538 knockout mouse mutants generated by the Sanger Institute Mouse Genetics Project. We optimize immunolabelling of tail epidermal wholemounts to allow systematic annotation of hair follicle, sebaceous gland and interfollicular epidermal abnormalities using ontology terms from the Mammalian Phenotype Ontology. Of the 50 mutants with an epidermal phenotype, 9 map to human genetic conditions with skin abnormalities. Some mutant genes are expressed in the skin, whereas others are not, indicating systemic effects. One phenotype is affected by diet and several are incompletely penetrant. In-depth analysis of three mutants, Krt76, Myo5a (a model of human Griscelli syndrome) and Mysm1, provides validation of the screen. Our study is the first large-scale genome-wide tissue phenotype screen from the International Knockout Mouse Consortium and provides an open access resource for the scientific community.
Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 096540, 098051, 100669
Nature communications 2014;5;3540
Distribution and medical impact of loss-of-function variants in the Finnish founder population.
Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America; Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America; Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, United States of America.
Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of variants associated with disease risk, and by the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases, as they carry deleterious variants at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5-5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (p<5×10⁻⁸), including splice variants in LPA that lowered plasma lipoprotein(a) levels (P = 1.5×10⁻¹¹⁷). By accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR = 0.84, P = 3×10⁻⁴), demonstrating for the first time that very low lipoprotein(a) levels in humans are associated with this protection, a finding with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates the substantial advantages of founder populations such as the Finns for studying the role of rare variation in complex phenotypes, particularly when their unique population genetic history is combined with data from large population cohorts and centralized research access to National Health Registers.
Funded by: NHLBI NIH HHS: HL-102926, HL-103010, RC2 HL-102925, RFA-HL-12-007; NIDDK NIH HHS: DK062370, DK085584, P30 DK020572, R01 DK062370, R01DK075787, RC2-DK088389, U01 DK062370, U01-DK-085545; Wellcome Trust: 086596/Z/08/Z, 090367, 098381
PLoS genetics 2014;10;7;e1004494
Robust identification of noncoding RNA from transcriptomes requires phylogenetically-informed sampling.
Department of Biology, University of Copenhagen, Copenhagen, Denmark; School of Biological Sciences, University of Canterbury, Christchurch, New Zealand.
Noncoding RNAs are integral to a wide range of biological processes, including translation, gene regulation, host-pathogen interactions and environmental sensing. While genomics is now a mature field, our capacity to identify noncoding RNA elements in bacterial and archaeal genomes is hampered by the difficulty of de novo identification. The emergence of new technologies for characterizing transcriptome outputs, notably RNA-seq, is improving noncoding RNA identification and expression quantification. However, a major challenge is to robustly distinguish functional outputs from transcriptional noise. To establish whether annotation of existing transcriptome data has effectively captured all functional outputs, we analysed over 400 publicly available RNA-seq datasets spanning 37 different Archaea and Bacteria. Using comparative tools, we identify close to a thousand highly-expressed candidate noncoding RNAs. However, our analyses reveal that the capacity to identify noncoding RNA outputs is strongly dependent on phylogenetic sampling. Surprisingly, and in stark contrast to protein-coding genes, the phylogenetic window for effective use of comparative methods is perversely narrow: aggregating public datasets only produced one phylogenetic cluster where these tools could be used to robustly separate unannotated noncoding RNAs from a null hypothesis of transcriptional noise. Our results show that for the full potential of transcriptomics data to be realized, a change in experimental design is paramount: effective transcriptomics requires phylogeny-aware sampling.
PLoS computational biology 2014;10;10;e1003907
CD28 expression is required after T cell priming for helper T cell responses and protective immunity to infection.
Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Cambridge, United Kingdom.
The co-stimulatory molecule CD28 is essential for activation of helper T cells. Despite this critical role, it is not known whether CD28 has functions in maintaining T cell responses following activation. To determine the role of CD28 after T cell priming, we generated a strain of mice in which CD28 is removed from CD4(+) T cells after priming. We show that continued CD28 expression is important for effector CD4(+) T cells following infection; maintained CD28 is required for the expansion of T helper type 1 cells, and for the differentiation and maintenance of T follicular helper cells during viral infection. Persistent CD28 is also required for clearance of the bacterium Citrobacter rodentium from the gastrointestinal tract. Together, this study demonstrates that CD28 persistence is required for helper T cell polarization in response to infection, describing a novel function for CD28 that is distinct from its role in T cell priming.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/000C0407; Wellcome Trust: 083650/Z/07/Z, 098051
Genetic studies of Crohn's disease: past, present and future.
The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
The exact aetiology of Crohn's disease is unknown, though it is clear from early epidemiological studies that a combination of genetic and environmental risk factors contributes to an individual's disease susceptibility. Here, we review the history of gene-mapping studies of Crohn's disease, from the linkage-based studies that first implicated the NOD2 locus, through to modern-day genome-wide association studies that have discovered over 140 loci associated with Crohn's disease and yielded novel insights into the biological pathways underlying pathogenesis. We describe ongoing and future gene-mapping studies that utilise next-generation sequencing technology to pinpoint causal variants and identify rare genetic variation underlying Crohn's disease risk. We comment on the utility of genetic markers for predicting an individual's disease risk and discuss their potential for identifying novel drug targets and influencing disease management. Finally, we describe how these studies have shaped and continue to shape our understanding of the genetic architecture of Crohn's disease.
Funded by: Wellcome Trust: 098051
Best practice & research. Clinical gastroenterology 2014;28;3;373-86
African origin of the malaria parasite Plasmodium vivax.
Department of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
Plasmodium vivax is the leading cause of human malaria in Asia and Latin America but is absent from most of central Africa due to the near fixation of a mutation that inhibits the expression of its receptor, the Duffy antigen, on human erythrocytes. The emergence of this protective allele is not understood because P. vivax is believed to have originated in Asia. Here we show, using a non-invasive approach, that wild chimpanzees and gorillas throughout central Africa are endemically infected with parasites that are closely related to human P. vivax. Sequence analyses reveal that ape parasites lack host specificity and are much more diverse than human parasites, which form a monophyletic lineage within the ape parasite radiation. These findings indicate that human P. vivax is of African origin and likely selected for the Duffy-negative mutation. All extant human P. vivax parasites are derived from a single ancestor that escaped out of Africa.
Funded by: Medical Research Council: MR/L008661/1; NIAID NIH HHS: P30 AI045008, R01 AI058715, R01 AI091595, R01 AI58715, R37 AI050529, T32 AI007532; Wellcome Trust: 095831, 098051
Nature communications 2014;5;3346
Loss-of-function mutations in MICU1 cause a brain and muscle disorder linked to primary alterations in mitochondrial calcium signaling.
Mitochondrial Ca(2+) uptake has key roles in cell life and death. Physiological Ca(2+) signaling regulates aerobic metabolism, whereas pathological Ca(2+) overload triggers cell death. Mitochondrial Ca(2+) uptake is mediated by the Ca(2+) uniporter complex in the inner mitochondrial membrane, which comprises MCU, a Ca(2+)-selective ion channel, and its regulator, MICU1. Here we report mutations of MICU1 in individuals with a disease phenotype characterized by proximal myopathy, learning difficulties and a progressive extrapyramidal movement disorder. In fibroblasts from subjects with MICU1 mutations, agonist-induced mitochondrial Ca(2+) uptake at low cytosolic Ca(2+) concentrations was increased, and cytosolic Ca(2+) signals were reduced. Although resting mitochondrial membrane potential was unchanged in MICU1-deficient cells, the mitochondrial network was severely fragmented. Whereas the pathophysiology of muscular dystrophy and the core myopathies involves abnormal mitochondrial Ca(2+) handling, the phenotype associated with MICU1 deficiency is caused by a primary defect in mitochondrial Ca(2+) signaling, demonstrating the crucial role of mitochondrial Ca(2+) uptake in humans.
Funded by: British Heart Foundation; Medical Research Council: G0600717, MR/K000608/1, MR/K011154/1; NIA NIH HHS: 1P01AG025532-01A1; Parkinson's UK: G-0905; Telethon: GEP12066, GGP11082, GPP10005; Wellcome Trust: 090532, 100140, 100574, WT091310
Nature genetics 2014;46;2;188-93
Do you smell what I smell? Genetic variation in olfactory perception.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, U.K.
The sense of smell is mediated by the detection of chemical odours by ORs (olfactory receptors) in the nose. This initiates a neural percept of the odour in the brain, which may provoke an emotional or behavioural response. Analogous to colour-blindness in the visual system, some individuals report a very different percept of specific odours to others, in terms of intensity, valence or detection threshold. A significant proportion of variance in odour perception is heritable, and recent advances in genome sequencing and genotyping technologies have permitted studies into the genes that underpin these phenotypic differences. In the present article, I review the evidence that OR genes are extremely variable between individuals. I argue that this contributes to a unique receptor repertoire in our noses that provides us each with a personalized perception of our environment. I highlight specific examples where known OR variants influence odour detection and discuss the wider implications of this for both humans and other mammals that use chemical communication for social interaction.
Funded by: Wellcome Trust: WT098051
Biochemical Society transactions 2014;42;4;861-5
A DERL3-associated defect in the degradation of SLC2A1 mediates the Warburg effect.
Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet, Barcelona, 08908 Catalonia, Spain.
Cancer cells possess aberrant proteomes that can arise by the disruption of genes involved in physiological protein degradation. Here we demonstrate the presence of promoter CpG island hypermethylation-linked inactivation of DERL3 (Derlin-3), a key gene in the endoplasmic reticulum-associated protein degradation pathway, in human tumours. Restoration of DERL3 activity in vitro and in vivo highlights the tumour-suppressor features of the gene. Using the stable isotope labelling by amino acids in cell culture (SILAC) workflow for differential proteome analysis, we identify SLC2A1 (glucose transporter 1, GLUT1) as a downstream target of DERL3. Most importantly, SLC2A1 overexpression mediated by DERL3 epigenetic loss contributes to the Warburg effect in the studied cells and pinpoints a subset of human tumours with greater vulnerability to drugs targeting glycolysis.
Funded by: NCI NIH HHS: R01 CA168653
Nature communications 2014;5;3608
Genome-wide association analysis identifies six new loci associated with forced vital capacity.
Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands; Netherlands Health Care Inspectorate, The Hague, the Netherlands.
Forced vital capacity (FVC), a spirometric measure of pulmonary function, reflects lung volume and is used to diagnose and monitor lung diseases. We performed a genome-wide association study meta-analysis of FVC in 52,253 individuals from 26 studies and followed up the top associations in 32,917 additional individuals of European ancestry. We found six new regions associated at genome-wide significance (P < 5 × 10⁻⁸) with FVC in or near EFEMP1, BMP6, MIR129-2-HSD17B12, PRDM11, WWOX and KCNJ2. Two loci previously associated with spirometric measures (GSTCD and PTCH1) were related to FVC. Newly implicated regions were followed up in samples from African-American, Korean, Chinese and Hispanic individuals. We detected transcripts for all six newly implicated genes in human lung tissue. The new loci may inform mechanisms involved in lung development and the pathogenesis of restrictive lung disease.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Chief Scientist Office: CZB/4/505, CZB/4/710, CZD/16/6/4, ETM/55; Department of Health: SRF/01/010; Intramural NIH HHS: Z01 ES043012-10; Medical Research Council: G0100266, G0700704, G0701863, G0902313, G1000861, MC_PC_U127561128, MC_U106179471, MC_UU_12015/1, MR/K026992/1; NCATS NIH HHS: UL1 TR000124, UL1 TR001079; NHLBI NIH HHS: N01 HC095159, R01 HL077612; NIA NIH HHS: R01 AG023629, U01 AG023746, U01 AG023749; NIMH NIH HHS: R25 MH083620
Nature genetics 2014;46;7;669-77
A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells.
Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, United Kingdom.
Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels both for bulk gene expression and for more than 6,000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics (http://www.peptracker.com/epd/). DOI: http://dx.doi.org/10.7554/eLife.01630.001.
Funded by: NIGMS NIH HHS: T32 GM007040; Wellcome Trust: 073980, 097769/Z/11/Z, 097945
Barriers to the effective treatment of sepsis: antimicrobial agents, sepsis definitions, and host-directed therapies.
Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, British Columbia, Canada.
Sepsis is a complex clinical syndrome involving both infection and a deleterious host immune response. Antimicrobial agents are key elements of sepsis treatment, yet despite great strides in antimicrobial development over recent decades, sepsis continues to be associated with unacceptably high mortality (~30%). This is the result, on the one hand, of the rise of antimicrobial-resistant organisms and, on the other hand, of the dearth of effective host-directed immune therapies. A major obstacle to the development of good host-directed therapies is the lack of understanding of the host immune response. The problem is exacerbated by poor, nonspecific clinical definitions of disease. Poor definitions have had a profound impact on sepsis research, from epidemiologic studies to the failed clinical trials of host-directed therapies. Therefore, better definitions must be developed to enable advancement in the field.
Funded by: Canadian Institutes of Health Research
Annals of the New York Academy of Sciences 2014;1323;101-14
Low MITF/AXL ratio predicts early resistance to multiple targeted drugs in melanoma.
Division of Molecular Oncology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, The Netherlands.
Increased expression of the Microphthalmia-associated transcription factor (MITF) contributes to melanoma progression and resistance to BRAF pathway inhibition. Here we show that the lack of MITF is associated with more severe resistance to a range of inhibitors, while its presence is required for robust drug responses. In both primary and acquired resistance, MITF levels inversely correlate with the expression of several activated receptor tyrosine kinases, most frequently AXL. The MITF-low/AXL-high/drug-resistance phenotype is common among mutant BRAF and NRAS melanoma cell lines. The dichotomous behaviour of MITF in drug response is corroborated in vemurafenib-resistant biopsies, including MITF-high and -low clones in a relapsed patient. Furthermore, drug cocktails containing an AXL inhibitor enhance melanoma cell elimination by BRAF or ERK inhibition. Our results demonstrate that a low MITF/AXL ratio predicts early resistance to multiple targeted drugs, and warrant clinical validation of AXL inhibitors to combat resistance of BRAF and NRAS mutant MITF-low melanomas.
Funded by: NCATS NIH HHS: UL1 TR000124, UL1TR000124; NCI NIH HHS: P01 CA168585, R01 CA176111
Nature communications 2014;5;5712
Cloning of recombinant monoclonal antibodies from hybridomas in a single mammalian expression plasmid.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Antibodies are an integral part of biological and medical research. In addition, immunoglobulins are used in many diagnostic tests and are becoming increasingly important in the therapy of diseases. To express antibodies recombinantly, the immunoglobulin heavy and light chains are usually cloned into two different expression plasmids. Here, we describe a method for recombinant antibody expression from a single plasmid.
Methods in molecular biology (Clifton, N.J.) 2014;1131;229-40
Guidelines for investigating causality of sequence variants in human disease.
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA.
The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.
Funded by: NHGRI NIH HHS: R01 HG007022, U54 HG006997; NHLBI NIH HHS: R01 HL117626; NIDDK NIH HHS: P30 DK020595, P30 DK042086; NIMH NIH HHS: R01 MH101810
The rate of nonallelic homologous recombination in males is highly variable, correlated between monozygotic twins and independent of age.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Nonallelic homologous recombination (NAHR) between highly similar duplicated sequences generates chromosomal deletions, duplications and inversions, which can cause diverse genetic disorders. Little is known about interindividual variation in NAHR rates and the factors that influence it. We estimated the rate of deletion at the CMT1A-REP NAHR hotspot in sperm DNA from 34 male donors, including 16 monozygotic (MZ) co-twins (8 twin pairs) aged 24 to 67 years. The average NAHR rate was 3.5 × 10(-5) with a seven-fold variation across individuals. Despite good statistical power to detect even a subtle correlation, we observed no relationship between age of unrelated individuals and the rate of NAHR in their sperm, likely reflecting the meiotic-specific origin of these events. We then estimated the heritability of deletion rate by calculating the intraclass correlation (ICC) within MZ co-twins, revealing a significant correlation between MZ co-twins (ICC = 0.784, p = 0.0039), with MZ co-twins being significantly more correlated than unrelated pairs. We showed that this heritability cannot be explained by variation in PRDM9, a known regulator of NAHR, or by variation within the NAHR hotspot itself. We also did not detect any correlation between Body Mass Index (BMI), smoking status or alcohol intake and rate of NAHR. Our results suggest that other, as yet unidentified, genetic or environmental factors play a significant role in the regulation of NAHR and are responsible for the extensive variation in the population in the probability of fathering a child with a genomic disorder resulting from a pathogenic deletion.
Funded by: Wellcome Trust: 077014/Z/05/Z
PLoS genetics 2014;10;3;e1004195
Single cell genomics: advances and future perspectives.
Single Cell Genomics Centre, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Advances in whole-genome and whole-transcriptome amplification have permitted the sequencing of the minute amounts of DNA and RNA present in a single cell, offering a window into the extent and nature of genomic and transcriptomic heterogeneity which occurs in both normal development and disease. Single-cell approaches stand poised to revolutionise our capacity to understand the scale of genomic, epigenomic, and transcriptomic diversity that occurs during the lifetime of an individual organism. Here, we review the major technological and biological breakthroughs achieved, describe the remaining challenges to overcome, and provide a glimpse into the promise of recent and future developments.
Funded by: Wellcome Trust
PLoS genetics 2014;10;1;e1004126
Exome Sequencing in Fetuses with Structural Malformations.
Centre of Women's and Children's Health & School of Clinical and Experimental Medicine, College of Medicine and Dentistry, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
Prenatal diagnostic testing is a rapidly advancing field. In fetuses with structural anomalies, an accurate diagnosis of those anomalies and of any additional abnormalities is important to allow "triage" and assignment of prognosis, so that parents can make an informed decision about the pregnancy. This review outlines the current tests used in prenatal diagnosis, focusing particularly on "new technologies" such as exome sequencing. We demonstrate the utility of exome sequencing over that of conventional karyotyping and Chromosomal Microarray (CMA) alone by outlining a recent proof-of-concept study investigating 30 parent-fetus trios in which the fetus was known to have a structural anomaly. Exome sequencing may allow the identification of pathogenic genetic anomalies and consequently improved prognostic profiling, as well as the exclusion of anomalies and the distinction between de novo and inherited mutations, in order to estimate the recurrence risk in future pregnancies. The potential ethical dilemmas surrounding exome sequencing are also considered, and the future of prenatal genetic diagnosis is discussed.
Journal of clinical medicine 2014;3;3;747-62
Evidence for camel-to-human transmission of MERS coronavirus.
The New England journal of medicine 2014;371;14;1360
Targeting of Slc25a21 is associated with orofacial defects and otitis media due to disrupted expression of a neighbouring gene.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
Homozygosity for Slc25a21(tm1a(KOMP)Wtsi) results in mice exhibiting orofacial abnormalities, alterations in carpal and rugae structures, hearing impairment and inflammation in the middle ear. In humans, it has been hypothesised that the 2-oxoadipate mitochondrial carrier coded by SLC25A21 may be involved in the disease 2-oxoadipate acidaemia. Unexpectedly, no 2-oxoadipate acidaemia-like symptoms were observed in animals homozygous for Slc25a21(tm1a(KOMP)Wtsi) despite confirmation that this allele reduces Slc25a21 expression by 71.3%. To study the complete knockout, an allelic series was generated using the loxP and FRT sites typical of a Knockout Mouse Project allele. After removal of the critical exon and neomycin selection cassette, Slc25a21 knockout mice homozygous for the Slc25a21(tm1b(KOMP)Wtsi) and Slc25a21(tm1d(KOMP)Wtsi) alleles were phenotypically indistinguishable from wild-type. This led us to explore the genomic environment of Slc25a21 and to discover that expression of Pax9, located 3' of the target gene, was reduced in homozygous Slc25a21(tm1a(KOMP)Wtsi) mice. We hypothesize that the presence of the selection cassette is the cause of the observed downregulation of Pax9. The phenotypes we observed in homozygous Slc25a21(tm1a(KOMP)Wtsi) mice were broadly consistent with a hypomorphic Pax9 allele, with the exception of otitis media and hearing impairment, which may be a novel consequence of Pax9 downregulation. We explore the ramifications associated with this particular targeted mutation and emphasise the need to interpret phenotypes taking into consideration all potential underlying genetic mechanisms.
Funded by: Medical Research Council: G0300212, G0901338, MC_QA137918, MC_UP_1502/1; Wellcome Trust: 098051, 100669
PloS one 2014;9;3;e91807
Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis.
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
T helper 2 (Th2) cells regulate helminth infections, allergic disorders, tumor immunity, and pregnancy by secreting various cytokines. It is likely that there are undiscovered Th2 signaling molecules. Although steroids are known to be immunoregulators, de novo steroid production from immune cells has not been previously characterized. Here, we demonstrate production of the steroid pregnenolone by Th2 cells in vitro and in vivo in a helminth infection model. Single-cell RNA sequencing and quantitative PCR analysis suggest that pregnenolone synthesis in Th2 cells is related to immunosuppression. In support of this, we show that pregnenolone inhibits Th cell proliferation and B cell immunoglobulin class switching. We also show that steroidogenic Th2 cells inhibit Th cell proliferation in a Cyp11a1 enzyme-dependent manner. We propose pregnenolone as a "lymphosteroid," a steroid produced by lymphocytes. We speculate that this de novo steroid production may be an intrinsic phenomenon of Th2-mediated immune responses to actively restore immune homeostasis.
Funded by: Cancer Research UK: 12765; European Research Council: 260507; Medical Research Council: G0801473, G0900567, MC_PC_12009, MC_U105161047, MC_U105178805; National Centre for the Replacement, Refinement and Reduction of Animals in Research: G0900729/1; Wellcome Trust
Cell reports 2014;7;4;1130-42
A 2.5-kilobase deletion containing a cluster of nine microRNAs in the latency-associated-transcript locus of the pseudorabies virus affects the host response of porcine trigeminal ganglia during established latency.
INRA, AgroParisTech, UMR1313 Animal Genetics and Integrative Biology, Jouy-en-Josas, France; CEA, DSV, IRCM, SREIT, LREG, Jouy-en-Josas, France.
Unlabelled: The alphaherpesvirus pseudorabies virus (PrV) establishes latency primarily in neurons of trigeminal ganglia, a stage at which only transcription of the latency-associated transcript (LAT) locus is detected. Eleven microRNAs (miRNAs) cluster within the LAT, suggesting a role in establishment and/or maintenance of latency. We generated a mutant (M) PrV lacking nine miRNA genes, which displayed properties almost identical to those of the parental wild-type (WT) PrV during propagation in vitro. Fifteen pigs were experimentally infected with either WT or M virus or were mock infected. Similar levels of virus excretion and host antibody response were observed in all infected animals. At 62 days postinfection, trigeminal ganglia were excised and profiled by deep sequencing and quantitative RT-PCR. Latency was established in all infected animals without evidence of viral reactivation, demonstrating that miRNAs are not essential for this process. Lower levels of the large latency transcript (LLT) were found in ganglia infected by M PrV than in those infected by WT PrV. All PrV miRNAs were expressed, with highest expression observed for prv-miR-LLT1, prv-miR-LLT2 (in WT ganglia), and prv-miR-LLT10 (in both WT and M ganglia). No evidence of differentially expressed porcine miRNAs was found. Fifty-four porcine genes were differentially expressed between WT, M, and control ganglia. Both viruses triggered a strong host immune response, but in M ganglia gene upregulation was prevalent. Pathway analyses indicated that several biofunctions, including those related to cell-mediated immune response and the migration of dendritic cells, were impaired in M ganglia. These findings are consistent with a function of the LAT locus in the modulation of host response for maintaining a latent state.
Importance: This study provides a thorough reference on the establishment of latency by PrV in its natural host, the pig. Our results corroborate the evidence obtained from the study of several LAT mutants of other alphaherpesviruses encoding miRNAs from their LAT regions. Neither PrV miRNA expression nor high LLT expression levels are essential to achieve latency in trigeminal ganglia. Once latency is established by PrV, the only remarkable differences are found in the pattern of host response. This indicates that, as in herpes simplex virus, LAT functions as an immune evasion locus.
Journal of virology 2014;89;1;428-42
Glucose-6-phosphate dehydrogenase polymorphisms and susceptibility to mild malaria in Dogon and Fulani, Mali.
Malaria Research and Training Centre, Department of Epidemiology of Parasitic Diseases, Faculty of Medicine, Pharmacy and Odonto - Stomatology, USTTB, BP 1805 Bamako, Mali.
Background: Glucose-6-phosphate dehydrogenase (G6PD) deficiency is associated with protection from severe malaria, and potentially from uncomplicated malaria phenotypes. It has been documented that G6PD deficiency in sub-Saharan Africa is due to the 202A/376G G6PD A- allele, and association studies have used genotyping as a convenient technique for epidemiological studies. However, recent studies have shown discrepancies in G6PD202/376 associations with severe malaria. There is evidence to suggest that other G6PD deficiency alleles may be common in some regions of West Africa, and that allelic heterogeneity could explain these discrepancies.
Methods: A cross-sectional epidemiological study of malaria susceptibility was conducted during 2006 and 2007 in the Sahel meso-endemic malaria zone of Mali. The study included Dogon (n = 375) and Fulani (n = 337) sympatric ethnic groups, where the latter group is characterized by lower susceptibility to Plasmodium falciparum malaria. Fifty-three G6PD polymorphisms, including 202/376, were genotyped across the 712 samples. Evidence of association of these G6PD polymorphisms and mild malaria was assessed in both ethnic groups using genotypic and haplotypic statistical tests.
Results: It was confirmed that the Fulani are less susceptible to malaria, and the 202A mutation is rare in this group (<1% versus Dogon 7.9%). The Betica-Selma 968C/376G allele (~11% enzymatic activity) was more common in Fulani (6.1% vs Dogon 0.0%). There are differences in haplotype frequencies between Dogon and Fulani, and association analysis did not reveal strong evidence of protective G6PD genetic effects against uncomplicated malaria in either ethnic group or sex. However, there was some evidence of increased risk of mild malaria in Dogon with the 202A mutation, attaining borderline statistical significance in females. The rs915942 polymorphism was found to be associated with asymptomatic malaria in Dogon females, and the rs61042368 polymorphism was associated with clinical malaria in Fulani males.
Conclusions: The results highlight the need to consider markers in addition to G6PD202 in studies of deficiency. Further, large genetic epidemiological studies of multi-ethnic groups in West Africa across a spectrum of malaria severity phenotypes are required to establish who receives protection from G6PD deficiency.
Funded by: Medical Research Council: G0600230, G0600718, MR/K000551/1; Wellcome Trust: 077012/Z/05/Z, 090532/Z/09/Z, 090770/Z/09/Z, WT077383/Z/05/Z
Malaria journal 2014;13;270
Fc gamma receptor IIa-H131R polymorphism and malaria susceptibility in sympatric ethnic groups, Fulani and Dogon of Mali.
Malaria Research and Training Center/Department of Epidemiology of Parasitic Diseases/Faculty of Medicine, Pharmacy and Odonto - Stomatology, Bamako/USTTB, Mali; Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.
It has been previously shown that there are some interethnic differences in susceptibility to malaria between two sympatric ethnic groups of Mali, the Fulani and the Dogon. The lower susceptibility to Plasmodium falciparum malaria seen in the Fulani has not been fully explained by genetic polymorphisms previously known to be associated with malaria resistance, including haemoglobin S (HbS), haemoglobin C (HbC), alpha-thalassaemia and glucose-6-phosphate dehydrogenase (G6PD) deficiency. Given the reported differences in the distribution of FcγRIIa allotypes among ethnic groups and their reported association with malaria susceptibility, we analysed the rs1801274-R131H polymorphism in the FcγRIIa gene in a study of Dogon and Fulani in Mali (n = 939). We confirm that the Fulani have lower parasite densities, lower parasite prevalence, more spleen enlargement and higher levels of total IgG antibodies (anti-CSP, anti-AMA1, anti-MSP1 and anti-MSP2) and more total IgE (P < 0.05) compared with the Dogon ethnic group. Furthermore, the Fulani exhibit a higher frequency of blood group O (56.5%) than the Dogon (43.5%) (P < 0.001). With regard to the FcγRIIa polymorphism and allele frequency, the Fulani group has a higher frequency of the H allele (Fulani 0.474, Dogon 0.341, P < 0.0001), which was associated with greater total IgE production (P = 0.004). Our findings show that the FcγRIIa polymorphism might have an implication in the relative protection seen in the Fulani tribe, with confirmatory studies required in other malaria-endemic settings.
Funded by: Medical Research Council; Wellcome Trust
Scandinavian journal of immunology 2014;79;1;43-50
Characterization of Vibrio cholerae bacteriophages isolated from the environmental waters of the Lake Victoria region of Kenya.
School of Biological Sciences, University of Nairobi, Nairobi, Kenya.
Over the last decade, cholera outbreaks have become common in some parts of Kenya. The most recent cholera outbreak occurred in the Coastal and Lake Victoria regions between January 2009 and May 2010, when a total of 11,769 cases and 274 deaths were reported by the Ministry of Public Health and Sanitation. The objective of this study was to isolate Vibrio cholerae bacteriophages with potential for use as a biocontrol for cholera outbreaks from the environmental waters of the Lake Victoria region of Kenya. Water samples from wells, ponds, sewage effluent, boreholes, rivers, and lakes of the Lake Victoria region of Kenya were enriched for 48 h at 37 °C in broth containing an environmental strain of V. cholerae. Bacteriophages were isolated from 5 out of the 42 environmental water samples taken. Isolated phages produced tiny, round, and clear plaques, suggesting that these phages were lytic to V. cholerae. Transmission electron microscope examination revealed that all nine phages belonged to the family Myoviridae, with typical icosahedral heads, long contractile tails, and fibers. Heads had an average diameter of 88.3 nm, and tails had an average length of 84.9 nm and width of 16.1 nm. Vibriophages isolated from the Lake Victoria region of Kenya have been characterized, and the isolated phages may have potential as antibacterial agents to control pathogenic V. cholerae bacteria in water reservoirs.
Current microbiology 2014;68;1;64-70
Mutation in KERA identified by linkage analysis and targeted resequencing in a pedigree with premature atherosclerosis.
Department of Vascular Medicine, Academic Medical Centre, Amsterdam, the Netherlands; Department of Experimental Vascular Medicine, Academic Medical Centre, Amsterdam, the Netherlands.
Aims: Genetic factors explain a proportion of the inter-individual variation in the risk for atherosclerotic events, but the genetic basis of atherosclerosis and atherothrombosis in families with Mendelian forms of premature atherosclerosis is incompletely understood. We set out to unravel the molecular pathology in a large kindred with an autosomal dominant inherited form of premature atherosclerosis.
Methods and results: Parametric linkage analysis was performed in a pedigree comprising 4 generations, of which a total of 11 members suffered from premature vascular events. A parametric LOD-score of 3.31 was observed for a 4.4 Mb interval on chromosome 12. Upon sequencing, a non-synonymous variant in KERA (c.920C>G; p.Ser307Cys) was identified. The variant was absent from nearly 28,000 individuals, including 2,571 patients with premature atherosclerosis. KERA, a proteoglycan protein, was expressed in lipid-rich areas of human atherosclerotic lesions, but not in healthy arterial specimens. Moreover, KERA expression in plaques was significantly associated with plaque size in carotid-collar Apoe-/- mice (r2 = 0.69; p<0.0001).
Conclusion: A rare variant in KERA was identified in a large kindred with premature atherosclerosis. The identification of KERA in atherosclerotic plaque specimens in humans and mice lends support to its potential role in atherosclerosis.
Funded by: British Heart Foundation: RG/09/012/28096, RG/09/12/28096; Medical Research Council
PloS one 2014;9;5;e98289
Reappraisal of known malaria resistance loci in a large multicenter study.
Many human genetic associations with resistance to malaria have been reported, but few have been reliably replicated. We collected data on 11,890 cases of severe malaria due to Plasmodium falciparum and 17,441 controls from 12 locations in Africa, Asia and Oceania. We tested 55 SNPs in 27 loci previously reported to associate with severe malaria. There was evidence of association at P < 1 × 10(-4) with the HBB, ABO, ATP2B4, G6PD and CD40LG loci, but previously reported associations at 22 other loci did not replicate in the multicenter analysis. The large sample size made it possible to identify authentic genetic effects that are heterogeneous across populations or phenotypes, with a striking example being the main African form of G6PD deficiency, which reduced the risk of cerebral malaria but increased the risk of severe malarial anemia. The finding that G6PD deficiency has opposing effects on different fatal complications of P. falciparum infection indicates that the evolutionary origins of this common human genetic disorder are more complex than previously supposed.
Funded by: FIC NIH HHS: D43 TW001589; Medical Research Council: G0600230, G0600718, G19/9, G9901439; Wellcome Trust: 076934/Z/05/Z, 084538, 087285, 089276/Z/09/Z, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 091758, 091758/Z/10/Z, 096527, 097364/Z/11/Z, 098051/Z/05/Z, WT077383/Z/05/Z
Nature genetics 2014;46;11;1197-204
Driver somatic mutations identify distinct disease entities within myeloid neoplasms with myelodysplasia.
Department of Molecular Medicine, University of Pavia, Pavia, Italy; Department of Hematology Oncology, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico Policlinico San Matteo, Pavia, Italy.
Our knowledge of the genetic basis of myelodysplastic syndromes (MDS) and myelodysplastic/myeloproliferative neoplasms (MDS/MPN) has considerably improved. To define genotype/phenotype relationships of clinical relevance, we studied 308 patients with MDS, MDS/MPN, or acute myeloid leukemia evolving from MDS. Unsupervised statistical analysis, including the World Health Organization classification criteria and somatic mutations, showed that MDS associated with SF3B1 mutation (51 of 245 patients, 20.8%) is a distinct nosologic entity irrespective of current morphologic classification criteria. Conversely, MDS with ring sideroblasts with nonmutated SF3B1 segregated in different clusters with other MDS subtypes. Mutations of genes involved in DNA methylation, splicing factors other than SF3B1, and genes of the RAS pathway and cohesin complex were independently associated with multilineage dysplasia and identified a distinct subset (51 of 245 patients, 20.8%). No recurrent mutation pattern correlated with unilineage dysplasia without ring sideroblasts. Irrespective of driver somatic mutations, a threshold of 5% bone marrow blasts retained a significant discriminant value for identifying cases with clonal evolution. Comutation of TET2 and SRSF2 was highly predictive of a myeloid neoplasm characterized by myelodysplasia and monocytosis, including, but not limited to, chronic myelomonocytic leukemia. These results serve as a proof of concept that a molecular classification of myeloid neoplasms is feasible.
Funded by: Wellcome Trust: 088340
High throughput exome coverage of clinically relevant cardiac genes.
Division of Cardiology, Department of Pediatrics, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada.
Background: Given the growing use of whole-exome sequencing (WES) for clinical diagnostics of complex human disorders, we evaluated coverage of clinically relevant cardiac genes on WES and factors influencing uniformity and depth of coverage of exonic regions.
Methods: Two hundred and thirteen human DNA samples were exome sequenced via Illumina HiSeq using different versions of the Agilent SureSelect capture kit. Fifty cardiac genes were further analyzed, including 31 genes from the American College of Medical Genetics (ACMG) list for reporting of incidental findings and 19 genes associated with congenital heart disease for which clinical testing is available. Gene coordinates were obtained from two databases, CCDS and Known Gene, and compared. Read depth for each region was extracted from the exomes and used to assess capture variability between kits for individual genes, and for overall coverage. GC content, gene size, and inter-sample variability were also tested as potential contributors to variability in gene coverage.
Results: All versions of capture kits (designed based on the Consensus Coding Sequence) included only 55% of known genomic regions for the cardiac genes. Although newer versions of each Agilent kit showed improvement in capture of CCDS regions to 99%, only 64% of Known Gene regions were captured even with newer capture kits. There was considerable variability in coverage of the cardiac genes. Ten of the 50 genes, including 6 on the ACMG list, had less than the optimal coverage of 30X. Within each gene, only 32 of the 50 genes had the majority of their bases covered at an interquartile range ≥30X. Heterogeneity in gene coverage was modestly associated with gene size and significantly associated with GC content.
Conclusions: Despite improvement in overall coverage across the exome with newer capture kit versions and higher sequencing depths, only 50% of known genomic regions of clinical cardiac genes are targeted and individual gene coverage is non-uniform. This may contribute to a bias with greater attribution of disease causation to mutations in well-represented and well-covered genes. Improvements in WES technology are needed before widespread clinical application.
BMC medical genomics 2014;7;67
Absence of Appl2 sensitizes endotoxin shock through activation of PI3K/Akt pathway.
Key Laboratory of Regenerative Biology, Guangzhou Institute of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, 510530 China.
Background: The adapter proteins Appl1 (adaptor protein containing pleckstrin homology domain, phosphotyrosine domain, and leucine zipper motif 1) and Appl2 are highly homologous and involved in several signaling pathways. While previous studies have shown that Appl1 plays a pivotal role in adiponectin signaling and insulin secretion, the physiological functions of Appl2 are largely unknown.
Results: In the present study, the role of Appl2 in septic shock was investigated by using Appl2 knockout (KO) mice. When challenged with lipopolysaccharides (LPS), Appl2 KO mice exhibited more severe symptoms of endotoxin shock, accompanied by increased production of proinflammatory cytokines. In comparison with the wild-type control, deletion of Appl2 led to higher levels of TNF-α and IL-1β in primary macrophages. In addition, phosphorylation of Akt and its downstream effector NF-κB was significantly enhanced. By co-immunoprecipitation, we found that Appl2 and Appl1 interacted with each other and formed a complex with PI3K regulatory subunit p85α, which is an upstream regulator of Akt. Consistent with these results, macrophages lacking Appl1 exhibited reduced Akt activation and decreased production of TNF-α and IL-1β when challenged with LPS.
Conclusions: Results of the present study demonstrated that Appl2 is a critical negative regulator of the innate immune response, inhibiting the PI3K/Akt/NF-κB signaling pathway by forming a complex with Appl1 and PI3K.
Cell & bioscience 2014;4;1;60
The common marmoset genome provides insight into primate biology and evolution.
We report the whole-genome sequence of the common marmoset (Callithrix jacchus). The 2.26-Gb genome of a female marmoset was assembled using Sanger read data (6×) and a whole-genome shotgun strategy. A first analysis has permitted comparison with the genomes of apes and Old World monkeys and the identification of specific features that might contribute to the unique biology of this diminutive primate, including genetic changes that may influence body size, frequent twinning and chimerism. We observed positive selection in growth hormone/insulin-like growth factor genes (growth pathways), respiratory complex I genes (metabolic pathways), and genes encoding immunobiological factors and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibited evidence of rapid sequence evolution. This genome sequence for a New World monkey enables increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
Funded by: European Research Council: 260372; Howard Hughes Medical Institute; NHGRI NIH HHS: K99 HG005846, R01 HG002385, U41 HG002371, U54 HG003079, U54 HG003273; NIDDK NIH HHS: R01 DK077639; NIGMS NIH HHS: R01 GM059290; NIH HHS: P51 OD011133; Wellcome Trust: 095908
Nature genetics 2014;46;8;850-7
Parallel dynamics and evolution: Protein conformational fluctuations and assembly reflect evolutionary changes in sequence and structure.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Protein structure is dynamic: the intrinsic flexibility of polypeptides facilitates a range of conformational fluctuations, and individual protein chains can assemble into complexes. Proteins are also dynamic in evolution: significant variations in secondary, tertiary and quaternary structure can be observed among divergent members of a protein family. Recent work has highlighted intriguing similarities between these structural and evolutionary dynamics occurring at various levels. Here we review evidence showing how evolutionary changes in protein sequence and structure are often closely related to local protein flexibility and disorder, large-scale motions and quaternary structure assembly. We suggest that these correspondences can be largely explained by neutral evolution, while deviations between structural and evolutionary dynamics can provide valuable functional insights. Finally, we address future prospects for the field and practical applications that arise from a deeper understanding of the intimate relationship between protein structure, dynamics, function and evolution.
Funded by: Medical Research Council: MC_U105161047
BioEssays : news and reviews in molecular, cellular and developmental biology 2014;36;2;209-18
Mutations in PLK4, encoding a master regulator of centriole biogenesis, cause microcephaly, growth failure and retinopathy.
Medical Research Council (MRC) Human Genetics Unit, Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Edinburgh, UK.
Centrioles are essential for ciliogenesis. However, mutations in centriole biogenesis genes have been reported in primary microcephaly and Seckel syndrome, disorders without the hallmark clinical features of ciliopathies. Here we identify mutations in the genes encoding PLK4 kinase, a master regulator of centriole duplication, and its substrate TUBGCP6 in individuals with microcephalic primordial dwarfism and additional congenital anomalies, including retinopathy, thereby extending the human phenotypic spectrum associated with centriole dysfunction. Furthermore, we establish that different levels of impaired PLK4 activity result in growth and cilia phenotypes, providing a mechanism by which microcephaly disorders can occur with or without ciliopathic features.
Funded by: Medical Research Council
Nature genetics 2014;46;12;1283-92
Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression.
Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA, Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC-Health Protection Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
Motivation: A typical genome-wide association study searches for associations between single nucleotide polymorphisms (SNPs) and a univariate phenotype. However, there is growing interest in investigating associations between genomics data and multivariate phenotypes, for example, in gene expression or metabolomics studies. A common approach is to perform a univariate test for each genotype-phenotype pair and then apply a stringent significance cutoff to account for the large number of tests performed. However, this approach has limited ability to uncover dependencies involving multiple variables. Another current trend in genetics is the investigation of the impact of rare variants on the phenotype, where standard methods often fail for lack of power when the minor allele is present in only a limited number of individuals.
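The univariate strategy described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-pair p-values have already been computed (a hypothetical list of rows, one row per SNP and one value per trait) and applies a Bonferroni correction, which is one common choice of stringent cutoff.

```python
def bonferroni_hits(pvalues, alpha=0.05):
    """Flag SNP-trait pairs that pass a Bonferroni-corrected significance cutoff.

    pvalues: list of rows, one row per SNP and one p-value per trait (hypothetical input).
    Returns the per-test cutoff and the (snp_index, trait_index) pairs that pass it.
    """
    n_tests = sum(len(row) for row in pvalues)
    cutoff = alpha / n_tests  # family-wise error rate alpha spread over every test
    hits = [
        (i, j)
        for i, row in enumerate(pvalues)
        for j, p in enumerate(row)
        if p <= cutoff
    ]
    return cutoff, hits


if __name__ == "__main__":
    cutoff, hits = bonferroni_hits([[0.01, 1e-4], [0.5, 0.2]])
    print(cutoff, hits)  # 0.0125 [(0, 0), (0, 1)]
```

The cutoff shrinks with every added test: with the 74 lipoprotein traits in this study and, say, 300,000 SNPs (a hypothetical panel size), it falls to about 2.25×10(-9), which illustrates why rare-variant signals are hard to detect one pair at a time.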
Results: We propose a new statistical approach based on Bayesian reduced rank regression to assess the impact of multiple SNPs on a high-dimensional phenotype. Because of the method's ability to combine information over multiple SNPs and phenotypes, it is particularly suitable for detecting associations involving rare variants. We demonstrate the potential of our method and compare it with alternatives using the Northern Finland Birth Cohort with 4702 individuals, for whom genome-wide SNP data along with lipoprotein profiles comprising 74 traits are available. We discovered two genes (XRCC4 and MTHFD2L) without previously reported associations, which replicated in a combined analysis of two additional cohorts: 2390 individuals from the Cardiovascular Risk in Young Finns study and 3659 individuals from the FINRISK study.
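Reduced rank regression constrains the SNP-by-trait coefficient matrix to low rank, so a few latent components are shared across all traits; this is what allows information to be pooled over multiple SNPs and phenotypes. The sketch below is the classical least-squares version with identity weighting, not the Bayesian model proposed here (which additionally places priors on the low-rank factors). The function name and the simulated data are hypothetical, and the code requires NumPy.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Classical (non-Bayesian) reduced rank regression with identity weighting.

    X: (n, p) matrix of SNP genotypes; Y: (n, q) matrix of traits.
    Returns a (p, q) coefficient matrix of rank at most `rank`.
    """
    # Unconstrained least-squares fit of all traits at once.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ B_ols
    # Leading right singular vectors of the fitted values span the best rank-r trait subspace.
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T  # shape (q, rank)
    # Project the least-squares coefficients onto that subspace.
    return B_ols @ V_r @ V_r.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 500, 10, 74  # individuals, SNPs, traits (simulated, not cohort data)
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # minor-allele counts 0/1/2
    B_true = np.outer(rng.normal(size=p), rng.normal(size=q))  # rank-1 ground truth
    Y = X @ B_true + rng.normal(size=(n, q))
    B_hat = reduced_rank_regression(X, Y, rank=1)
    print(np.linalg.matrix_rank(B_hat))  # 1
```

Because every trait's coefficients are tied to the same low-dimensional subspace, a SNP with few carriers can borrow strength from its effects across all 74 traits rather than being tested against each one separately.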
Availability and implementation: R-code freely available for download at http://users.ics.aalto.fi/pemartti/gene_metabolome/.
Funded by: Medical Research Council: G0500539, G0600705, G1002319; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706-02
Bioinformatics (Oxford, England) 2014;30;14;2026-34
Bacillary dysentery from World War 1 and NCTC1, the first bacterial isolate in the National Collection.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Lancet (London, England) 2014;384;9955;1720
Antibiotics and Collateral Damage
Book review of: Missing Microbes: How the Overuse of Antibiotics Is Fueling Our Modern Plagues by Martin J. Blaser
The genome sequence of ectromelia virus Naval and Cornell isolates from outbreaks in North America.
Centro de Biología Molecular Severo Ochoa (Consejo Superior de Investigaciones Científicas-Universidad Autónoma de Madrid), Nicolas Cabrera 1, Campus de Cantoblanco, Madrid, Spain.
Ectromelia virus (ECTV) is the causative agent of mousepox, a disease of laboratory mouse colonies and an excellent model for human smallpox. We report the genome sequences of two isolates from outbreaks in laboratory mouse colonies in the USA in 1995 and 1999: ECTV-Naval and ECTV-Cornell, respectively. The genomes of ECTV-Naval and ECTV-Cornell were sequenced using 454-Roche technology. The ECTV-Naval genome was also sequenced with the Sanger and Illumina technologies in order to evaluate them for poxvirus genome sequencing. Genomic comparisons revealed that ECTV-Naval and ECTV-Cornell correspond to the same virus isolated from independent outbreaks. Both ECTV-Naval and ECTV-Cornell are extremely virulent in susceptible BALB/c mice, similar to ECTV-Moscow. This is consistent with the ECTV-Naval genome sharing 98.2% DNA sequence identity with that of ECTV-Moscow, and indicates that the genetic differences from ECTV-Moscow do not affect the virulence of ECTV-Naval in the mousepox model of footpad infection.
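A percent-identity figure such as the 98.2% above is computed from an alignment of the two genomes. The abstract does not state which aligner or gap convention was used, so the sketch below simply assumes two already-aligned sequences of equal length and excludes gap columns from the denominator; the function name is hypothetical, and other tools count gaps differently and can report different values for the same alignment.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are excluded from the denominator;
    this is one convention among several.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [
        (x, y)
        for x, y in zip(aln_a.upper(), aln_b.upper())
        if x != "-" and y != "-"
    ]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


if __name__ == "__main__":
    # 9 gap-free columns, 8 identical -> 88.89
    print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 2))
```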
Funded by: Wellcome Trust: 051087/Z97/Z
Molecular evolution of broadly neutralizing Llama antibodies to the CD4-binding site of HIV-1.
Wohl Virion Centre and Medical Research Council (MRC) Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, London, United Kingdom.
To date, no immunization of humans or animals has elicited broadly neutralizing sera able to prevent HIV-1 transmission; however, elicitation of broad and potent heavy chain-only antibodies (HCAbs) has previously been reported in llamas. In this study, the anti-HIV immune responses in immunized llamas were studied by deep sequencing analysis, using broadly neutralizing monoclonal HCAbs as guides. Distinct neutralizing antibody lineages were identified in each animal, including two defined by novel antibodies (variable regions called VHH) identified by robotic screening of over 6000 clones. The combined application of five VHH against viruses from clades A, B, C and CRF_AG resulted in neutralization as potent as any of the VHH individually and a predicted 100% coverage with a median IC50 of 0.17 µg/ml for the panel of 60 viruses tested. Molecular analysis of the VHH repertoires of two sets of immunized animals showed that each neutralizing lineage was only observed following immunization, demonstrating that they were elicited de novo. Our results show that immunization can induce potent and broadly neutralizing antibodies in llamas with features similar to human antibodies and provide a framework to analyze the effectiveness of immunization protocols.
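The coverage and median IC50 reported for the five-VHH combination can be illustrated with a small calculation. The abstract does not describe the prediction model, so this sketch assumes, as a simplification, that the combination is as potent as its best member against each virus; the function name, antibody names, IC50 values and the 50 µg/ml cutoff for "neutralized" are all hypothetical.

```python
import math
from statistics import median


def combination_breadth(ic50_by_antibody, cutoff=50.0):
    """Predicted coverage (%) and median IC50 of an antibody combination.

    ic50_by_antibody: dict mapping antibody name -> list of IC50 values in µg/ml,
        one per virus in a shared order; math.inf means 'not neutralized'.
    cutoff: IC50 below which a virus counts as neutralized (hypothetical default).
    Simplifying assumption: the combination is as potent as its best member on each virus.
    """
    per_virus = [min(values) for values in zip(*ic50_by_antibody.values())]
    neutralized = [ic50 for ic50 in per_virus if ic50 < cutoff]
    coverage = 100.0 * len(neutralized) / len(per_virus)
    # Median taken over neutralized viruses only.
    median_ic50 = median(neutralized) if neutralized else math.inf
    return coverage, median_ic50


if __name__ == "__main__":
    panel = {  # hypothetical values for two VHH against four viruses
        "VHH_a": [0.1, math.inf, 2.0, math.inf],
        "VHH_b": [1.0, 0.3, math.inf, math.inf],
    }
    print(combination_breadth(panel))  # (75.0, 0.3)
```

Under this best-member model, adding an antibody can only raise coverage or lower the median IC50, which is why combining complementary lineages can close gaps that each VHH leaves on its own.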
Funded by: Medical Research Council: G0900950, MC_EX_G0800785
PLoS pathogens 2014;10;12;e1004552
The cavefish genome reveals candidate genes for eye loss.
The Genome Institute, Washington University, Campus Box 8501, St Louis, Missouri 63108, USA.
Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as analogous human diseases, including retinal dysfunction.
Funded by: NCRR NIH HHS: R24 RR032658, R24 RR032658-01; NEI NIH HHS: R01 EY014619; NIDCR NIH HHS: DE022403, R03 DE022403; NIH HHS: R24 OD011198; Wellcome Trust: 095908, WT095908, WT098051
Nature communications 2014;5;5307
Identification of novel genetic loci associated with thyroid peroxidase antibodies and clinical thyroid disease.
Department of Internal Medicine, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands.
Autoimmune thyroid diseases (AITD) are common, affecting 2-5% of the general population. Individuals with positive thyroid peroxidase antibodies (TPOAbs) have an increased risk of autoimmune hypothyroidism (Hashimoto's thyroiditis), as well as autoimmune hyperthyroidism (Graves' disease). As the possible causative genes of TPOAbs and AITD remain largely unknown, we performed GWAS meta-analyses in 18,297 individuals for TPOAb-positivity (1769 TPOAb-positives and 16,528 TPOAb-negatives) and in 12,353 individuals for TPOAb serum levels, with replication in 8,990 individuals. Significant associations (P<5×10(-8)) were detected at TPO-rs11675434, ATXN2-rs653178, and BACH2-rs10944479 for TPOAb-positivity, and at TPO-rs11675434, MAGI3-rs1230666, and KALRN-rs2010099 for TPOAb levels. Individual and combined effects (genetic risk scores) of these variants on (subclinical) hypo- and hyperthyroidism, goiter and thyroid cancer were studied. Individuals with a high genetic risk score had, besides an increased risk of TPOAb-positivity (OR: 2.18, 95% CI 1.68-2.81, P = 8.1×10(-8)), a higher risk of increased thyroid-stimulating hormone levels (OR: 1.51, 95% CI 1.26-1.82, P = 2.9×10(-6)), as well as a decreased risk of goiter (OR: 0.77, 95% CI 0.66-0.89, P = 6.5×10(-4)). The MAGI3 and BACH2 variants were associated with an increased risk of hyperthyroidism, which was replicated in an independent cohort of patients with Graves' disease (OR: 1.37, 95% CI 1.22-1.54, P = 1.2×10(-7) and OR: 1.25, 95% CI 1.12-1.39, P = 6.2×10(-5)). The MAGI3 variant was also associated with an increased risk of hypothyroidism (OR: 1.57, 95% CI 1.18-2.10, P = 1.9×10(-3)). This first GWAS meta-analysis for TPOAbs identified five newly associated loci, three of which were also associated with clinical thyroid disease. With these markers we identified a large subgroup in the general population with a substantially increased risk of TPOAbs. 
The results provide insight into why individuals with thyroid autoimmunity do or do not eventually develop thyroid disease, and these markers may therefore predict which TPOAb-positives are particularly at risk of developing clinical thyroid dysfunction.
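A genetic risk score of the kind described above combines the risk alleles an individual carries at the associated variants. The abstract does not say whether the score was weighted, so this sketch supports both an unweighted allele count and a version weighted by per-allele log odds ratios. The function name, the genotypes and the odds ratios in the example are hypothetical; only the rs identifiers come from the study.

```python
import math


def genetic_risk_score(risk_allele_counts, log_odds_weights=None):
    """Genetic risk score for one individual.

    risk_allele_counts: dict variant ID -> number of risk alleles carried (0, 1 or 2).
    log_odds_weights: optional dict variant ID -> per-allele log odds ratio;
        if omitted, every risk allele counts 1 (unweighted score).
    """
    score = 0.0
    for variant, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: expected 0, 1 or 2 risk alleles, got {count}")
        weight = 1.0 if log_odds_weights is None else log_odds_weights[variant]
        score += weight * count
    return score


if __name__ == "__main__":
    # Hypothetical genotypes at the five TPOAb-associated variants.
    genotypes = {"rs11675434": 2, "rs653178": 1, "rs10944479": 0,
                 "rs1230666": 1, "rs2010099": 2}
    print(genetic_risk_score(genotypes))  # 6.0
    # Hypothetical per-allele odds ratio of 1.2 for every variant.
    weights = {rsid: math.log(1.2) for rsid in genotypes}
    print(genetic_risk_score(genotypes, weights))
```

Individuals can then be ranked by score, and the top of the distribution compared with everyone else, which is the sense in which a "high genetic risk score" subgroup is defined; the study's actual cutoff is not given in the abstract.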
Funded by: NCATS NIH HHS: UL1 TR000124, UL1TR000124; NCI NIH HHS: P01 CA124570, P30 CA016058, P30 CA16058; NCRR NIH HHS: UL1RR033176; NHLBI NIH HHS: HL080295, HL087652, HL105756; NIA NIH HHS: AG023629; NIDDK NIH HHS: DK063491, P30 DK063491; Wellcome Trust: 085541/Z/08/Z, WT089062
PLoS genetics 2014;10;2;e1004123
C. elegans whole-genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency.
Centre for Gene Regulation and Expression, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom.
Mutation is associated with developmental and hereditary disorders, aging, and cancer. While we understand some mutational processes operative in human disease, most remain mysterious. We used Caenorhabditis elegans whole-genome sequencing to model mutational signatures, analyzing 183 worm populations across 17 DNA repair-deficient backgrounds propagated for 20 generations or exposed to carcinogens. The baseline mutation rate in C. elegans was approximately one per genome per generation, not overtly altered across several DNA repair deficiencies over 20 generations. Telomere erosion led to complex chromosomal rearrangements initiated by breakage-fusion-bridge cycles and completed by simultaneously acquired, localized clusters of breakpoints. Aflatoxin B1 induced substitutions of guanines in a GpC context, as observed in aflatoxin-induced liver cancers. Mutational burden increased with impaired nucleotide excision repair. Cisplatin and mechlorethamine, DNA crosslinking agents, caused dose- and genotype-dependent signatures among indels, substitutions, and rearrangements. Strikingly, both agents induced clustered rearrangements resembling "chromoanasynthesis," a replication-based mutational signature seen in constitutional genomic disorders, suggesting that interstrand crosslinks may play a pathogenic role in such events. Cisplatin mutagenicity was most pronounced in xpf-1 mutants, suggesting that this gene critically protects cells against platinum chemotherapy. Thus, experimental model systems combined with genome sequencing can recapture and mechanistically explain mutational signatures associated with human disease.
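Classifying substitutions by their sequence context is the basic operation behind a statement such as "Aflatoxin B1 induced substitutions of guanines in a GpC context". The sketch below tallies substitutions at G:C base pairs by the 3' neighbour of the mutated guanine, folding C changes on the forward strand onto the G of the opposite strand. It is an illustrative helper with a hypothetical name and input format, not the authors' pipeline, and it ignores which base the G was changed to.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_g_contexts(reference, substitutions):
    """Tally substitutions at G:C sites by the 3' neighbour of the mutated G.

    reference: reference sequence (forward strand).
    substitutions: iterable of 0-based positions whose reference base was substituted.
    Returns a dict mapping the 3' neighbour (A/C/G/T) to a count; the "C" entry
    counts G changes in a GpC context.
    """
    reference = reference.upper()
    counts = {base: 0 for base in "ACGT"}
    for pos in substitutions:
        ref = reference[pos]
        if ref == "G" and pos + 1 < len(reference):
            neighbour = reference[pos + 1]  # 3' base on the forward strand
        elif ref == "C" and pos > 0:
            # A forward-strand C is a G on the reverse strand; its 3' neighbour
            # there is the complement of the forward-strand 5' base.
            neighbour = COMPLEMENT.get(reference[pos - 1], "N")
        else:
            continue  # not a G:C site, or at a sequence end
        if neighbour in counts:
            counts[neighbour] += 1
    return counts


if __name__ == "__main__":
    print(count_g_contexts("AGCTTGATCGCA", [1, 2, 5, 8, 9, 0]))
    # {'A': 2, 'C': 3, 'G': 0, 'T': 0}
```

Folding both strands onto the guanine means a G>T change and the complementary C>A change on the other strand are counted once under the same context, which is the usual way such signatures are summarised.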
Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 090944, 097945
Genome research 2014;24;10;1624-36
Respiratory tract samples, viral load, and genome fraction yield in patients with Middle East respiratory syndrome.
Global Centre for Mass Gatherings Medicine and Ministry of Health, Riyadh, Kingdom of Saudi Arabia and College of Medicine, Alfaisal University.
Background: Analysis of clinical samples from patients with new viral infections is critical to confirm the diagnosis, to quantify the viral load, and to obtain the sequence data needed to characterize viral kinetics, transmission, and evolution. We analyzed samples from 112 patients infected with the recently discovered Middle East respiratory syndrome coronavirus (MERS-CoV).
Methods: Respiratory tract samples from cases of MERS-CoV infection confirmed by polymerase chain reaction (PCR) were investigated to determine the MERS-CoV load and fraction of the MERS-CoV genome. These values were analyzed to determine associations with clinical sample type.
Results: Samples from 112 individuals in whom MERS-CoV was detected by PCR were analyzed, of which 13 were sputum samples, 64 were nasopharyngeal swab specimens, 30 were tracheal aspirates, and 3 were bronchoalveolar lavage specimens; 2 samples were of unknown origin. Tracheal aspirates yielded significantly higher MERS-CoV loads than nasopharyngeal swab specimens (P = .005) and sputum specimens (P = .0001). Tracheal aspirates had viral loads similar to those in bronchoalveolar lavage samples (P = .3079). Bronchoalveolar lavage samples and tracheal aspirates had significantly higher genome fractions than nasopharyngeal swab specimens (P = .0095 and P = .0002, respectively) and sputum samples (P = .0009 and P = .0001, respectively). The genome yield from tracheal aspirates and bronchoalveolar lavage samples was similar (P = .1174).
Conclusions: Lower respiratory tract samples yield significantly higher MERS-CoV loads and genome fractions than upper respiratory tract samples.
Funded by: Department of Health; Wellcome Trust
The Journal of infectious diseases 2014;210;10;1590-4
Human infection with MERS coronavirus after exposure to infected camels, Saudi Arabia, 2013.
We investigated a case of human infection with Middle East respiratory syndrome coronavirus (MERS-CoV) after exposure to infected camels. Analysis of the whole human-derived virus and 15% of the camel-derived virus sequence yielded nucleotide polymorphism signatures suggestive of cross-species transmission. Camels may act as a direct source of human MERS-CoV infection.
Emerging infectious diseases 2014;20;6;1012-5
Community case clusters of Middle East respiratory syndrome coronavirus in Hafr Al-Batin, Kingdom of Saudi Arabia: a descriptive genomic study.
Global Centre for Mass Gatherings Medicine (GCMGM), Ministry of Health, Riyadh, Kingdom of Saudi Arabia (KSA); College of Medicine, Alfaisal University, Riyadh, Kingdom of Saudi Arabia.
The Middle East respiratory syndro