Sanger Institute - Publications 2013
Number of papers published in 2013: 396
Bloomsbury report on mouse embryo phenotyping: recommendations from the IMPC workshop on embryonic lethal screening.
Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK.
Identifying genes that are important for embryo development is a crucial first step towards understanding their many functions in driving the ordered growth, differentiation and organogenesis of embryos. It can also shed light on the origins of developmental disease and congenital abnormalities. Current international efforts to examine gene function in the mouse provide a unique opportunity to pinpoint genes that are involved in embryogenesis, owing to the emergence of embryonic lethal knockout mutants. Through internationally coordinated efforts, the International Knockout Mouse Consortium (IKMC) has generated a public resource of mouse knockout strains and, in April 2012, the International Mouse Phenotyping Consortium (IMPC), supported by the EU InfraCoMP programme, convened a workshop to discuss developing a phenotyping pipeline for the investigation of embryonic lethal knockout lines. This workshop brought together over 100 scientists, from 13 countries, who are working in the academic and commercial research sectors, including experts and opinion leaders in the fields of embryology, animal imaging, data capture, quality control and annotation, high-throughput mouse production, phenotyping, and reporter gene analysis. This article summarises the outcome of the workshop, including (1) the vital scientific importance of phenotyping embryonic lethal mouse strains for basic and translational research; (2) a common framework to harmonise international efforts within this context; (3) the types of phenotyping that are likely to be most appropriate for systematic use, with a focus on 3D embryo imaging; (4) the importance of centralising data in a standardised form to facilitate data mining; and (5) the development of online tools to allow open access to and dissemination of the phenotyping data.
Funded by: British Heart Foundation: RG/10/17/28553; Cancer Research UK: 13031; Medical Research Council: G0801124, G0802163, MC_U117562103; NHGRI NIH HHS: U54 HG006348, U54 HG006364; NICHD NIH HHS: P30 HD024064; NIH HHS: U42 OD011174, U42 OD011175, U42 OD011185; Wellcome Trust: 090532, 100160
Disease models & mechanisms 2013;6;3;571-9
Bacteriotherapy for the treatment of intestinal dysbiosis caused by Clostridium difficile infection.
Bacterial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Faecal microbiota transplantation (FMT) has been used for more than five decades to treat a variety of intestinal diseases associated with pathological imbalances within the resident microbiota, termed dysbiosis. FMT has been particularly effective for treating patients with recurrent Clostridium difficile infection who are left with few clinical options other than continued antibiotic therapy. Our increasing knowledge of the structure and function of the human intestinal microbiota and C. difficile pathogenesis has led to the understanding that FMT promotes intestinal ecological restoration and highlights the microbiota as a viable therapeutic target. However, the use of undefined faecal samples creates a barrier for widespread clinical use because of safety and aesthetic issues. An emerging concept of bacteriotherapy, the therapeutic use of a defined mixture of harmless, health-associated bacteria, holds promise for the treatment of patients with severe C. difficile infection, and possibly represents a paradigm shift for the treatment of diseases linked to intestinal dysbiosis.
Funded by: Medical Research Council: 93614; Wellcome Trust: 098051
Current opinion in microbiology 2013;16;5;596-601
Dynamic image-based modelling of kidney branching morphogenesis
Lecture Notes in Computer Science 2013;8130 LNBI;106-19
Sequencing ancient calcified dental plaque shows changes in oral microbiota with dietary shifts of the Neolithic and Industrial revolutions.
Australian Centre for Ancient DNA, School of Earth and Environmental Sciences, The University of Adelaide, Adelaide, South Australia, Australia.
The importance of commensal microbes for human health is increasingly recognized, yet the impacts of evolutionary changes in human diet and culture on commensal microbiota remain almost unknown. Two of the greatest dietary shifts in human evolution involved the adoption of carbohydrate-rich Neolithic (farming) diets (beginning ∼10,000 years before the present) and the more recent advent of industrially processed flour and sugar (in ∼1850). Here, we show that calcified dental plaque (dental calculus) on ancient teeth preserves a detailed genetic record throughout this period. Data from 34 early European skeletons indicate that the transition from hunter-gatherer to farming shifted the oral microbial community to a disease-associated configuration. The composition of oral microbiota remained unexpectedly constant between Neolithic and medieval times, after which (the now ubiquitous) cariogenic bacteria became dominant, apparently during the Industrial Revolution. Modern oral microbiotic ecosystems are markedly less diverse than historic populations, which might be contributing to chronic oral (and other) disease in postindustrial lifestyles.
Funded by: Wellcome Trust: 076964, WT092799/Z/10/Z, WT098051
Nature genetics 2013;45;4;450-5, 455e1
Partial sleep restriction activates immune response-related gene expression pathways: experimental and epidemiological studies in humans.
Department of Physiology, Institute of Biomedicine, University of Helsinki, Helsinki, Finland.
Epidemiological studies have shown that short or insufficient sleep is associated with increased risk for metabolic diseases and mortality. To elucidate mechanisms behind this connection, we aimed to identify genes and pathways affected by experimentally induced, partial sleep restriction and to verify their connection to insufficient sleep at population level. The experimental design simulated sleep restriction during a working week: sleep of healthy men (N = 9) was restricted to 4 h/night for five nights. The control subjects (N = 4) spent 8 h/night in bed. Leukocyte RNA expression was analyzed at baseline, after sleep restriction, and after recovery using whole genome microarrays complemented with pathway and transcription factor analysis. Expression levels of the ten most up-regulated and ten most down-regulated transcripts were correlated with subjective assessment of insufficient sleep in a population cohort (N = 472). Experimental sleep restriction altered the expression of 117 genes. Eight of the 25 most up-regulated transcripts were related to immune function. Accordingly, fifteen of the 25 most up-regulated Gene Ontology pathways were also related to immune function, including those for B cell activation, interleukin 8 production, and NF-κB signaling (P<0.005). Of the ten most up-regulated genes, expression of STX16 correlated negatively with self-reported insufficient sleep in a population sample, while three other genes showed a tendency towards positive correlation. Of the ten most down-regulated genes, TBX21 and LGR6 correlated negatively and TGFBR3 positively with insufficient sleep. Partial sleep restriction affects the regulation of signaling pathways related to the immune system. Some of these changes appear to be long-lasting and may at least partly explain how prolonged sleep restriction can contribute to inflammation-associated pathological states, such as cardiometabolic diseases.
PloS one 2013;8;10;e77184
CCL3L1 copy number, HIV load, and immune reconstitution in sub-Saharan Africans.
Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK. Ejh33@le.ac.uk.
Background: The role of copy number variation of the CCL3L1 gene, encoding MIP1α, in contributing to the host variation in susceptibility and response to HIV infection is controversial. Here we analyse a sub-Saharan African cohort from Tanzania and Ethiopia, two countries with a high prevalence of HIV-1 and a high co-morbidity of HIV with tuberculosis.
Methods: We use a form of quantitative PCR called the paralogue ratio test to determine CCL3L1 gene copy number in 1134 individuals and validate our copy number typing using array comparative genomic hybridisation and fiber-FISH.
Results: We find no significant association of CCL3L1 gene copy number with HIV load in antiretroviral-naïve patients prior to initiation of combination highly active anti-retroviral therapy. However, we find a significant association of low CCL3L1 gene copy number with improved immune reconstitution following initiation of highly active anti-retroviral therapy (p = 0.012), replicating a previous study.
Conclusions: Our work supports a role for CCL3L1 copy number in immune reconstitution following antiretroviral therapy in HIV, and suggests that the MIP1α -CCR5 axis might be targeted to aid immune reconstitution.
Funded by: Medical Research Council: GO801123; Wellcome Trust: WT087663, WT098051
BMC infectious diseases 2013;13;536
AHT-ChIP-seq: a completely automated robotic protocol for high-throughput chromatin immunoprecipitation.
ChIP-seq is an established, manually performed method for identifying DNA-protein interactions genome-wide. Here, we describe a protocol for automated high-throughput (AHT) ChIP-seq. To demonstrate the quality of data obtained using AHT-ChIP-seq, we applied it to five proteins in mouse livers using a single 96-well plate, demonstrating an extremely high degree of qualitative and quantitative reproducibility among biological and technical replicates. We estimated the optimum and minimum recommended cell numbers required to perform AHT-ChIP-seq by running an additional plate using HepG2 and MCF7 cells. With this protocol, commercially available robotics can perform four hundred experiments in five days.
Funded by: Cancer Research UK: 15603, A10185; European Research Council: 202218; Wellcome Trust: 098051
Genome biology 2013;14;11;R124
Signatures of mutational processes in human cancer.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.
Funded by: NCI NIH HHS: T32 CA009216; Wellcome Trust: 088340, 093867, 098051
Deciphering signatures of mutational processes operative in human cancer.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
The genome of a cancer cell carries somatic mutations that are the cumulative consequences of the DNA damage and repair processes operative during the cellular lineage between the fertilized egg and the cancer cell. Remarkably, these mutational processes are poorly characterized. Global sequencing initiatives are yielding catalogs of somatic mutations from thousands of cancers, thus providing the unique opportunity to decipher the signatures of mutational processes operative in human cancer. However, until now there have been no theoretical models describing the signatures of mutational processes operative in cancer genomes and no systematic computational approaches are available to decipher these mutational signatures. Here, by modeling mutational processes as a blind source separation problem, we introduce a computational framework that effectively addresses these questions. Our approach provides a basis for characterizing mutational signatures from cancer-derived somatic mutational catalogs, paving the way to insights into the pathogenetic mechanism underlying all cancers.
Funded by: Wellcome Trust: 088340, 093867, 098051, WT088340MA
Cell reports 2013;3;1;246-59
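The blind source separation framing in the abstract above can be illustrated with a small sketch. It assumes non-negative matrix factorisation (NMF) with multiplicative updates as the separation technique; the abstract does not name a specific method, so this choice, the function names (`nmf`, `frobenius_error`) and the toy catalogue are all illustrative assumptions rather than the authors' implementation. A catalogue matrix V (mutation types × samples) is approximated as W·H, where the columns of W play the role of signatures and H holds each sample's exposures.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between v and the product w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=500, seed=0, eps=1e-12):
    """Factorise v (types x samples) into non-negative w (types x rank)
    and h (rank x samples) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update exposures: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        # Update signatures: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return w, h


# Toy catalogue (made-up numbers): 4 mutation types x 5 samples,
# generated from two hypothetical signatures and their exposures.
signatures = [[0.7, 0.1], [0.1, 0.1], [0.1, 0.1], [0.1, 0.7]]
exposures = [[100, 80, 10, 0, 50], [0, 20, 90, 100, 50]]
catalogue = matmul(signatures, exposures)

w0, h0 = nmf(catalogue, rank=2, iterations=0)   # random starting point
w, h = nmf(catalogue, rank=2, iterations=500)   # fitted factors
initial_error = frobenius_error(catalogue, w0, h0)
final_error = frobenius_error(catalogue, w, h)
```

Comparing `final_error` with `initial_error` shows how much of the toy catalogue the two recovered factors explain. In a real analysis the number of signatures (`rank`) is not known in advance and would have to be chosen by some model-selection procedure.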
Modeling the association of space, time, and host species with variation of the HA, NA, and NS genes of H5N1 highly pathogenic avian influenza viruses isolated from birds in Romania in 2005-2007.
Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, One Shields Avenue, University of California, Davis, CA 95616, USA.
Molecular characterization studies of a diverse collection of avian influenza viruses (AIVs) have demonstrated that AIVs' greatest genetic variability lies in the HA, NA, and NS genes. The objective here was to quantify the association between geographical locations, periods of time, and host species and pairwise nucleotide variation in the HA, NA, and NS genes of 70 isolates of H5N1 highly pathogenic avian influenza virus (HPAIV) collected from October 2005 to December 2007 from birds in Romania. A mixed-binomial Bayesian regression model was used to quantify the probability of nucleotide variation between isolates and its association with space, time, and host species. As expected for the three target genes, a higher probability of nucleotide differences (odds ratios [ORs] > 1) was found between viruses sampled from places at greater geographical distances from each other, viruses sampled over greater periods of time, and viruses derived from different species. The modeling approach in the present study may be useful in further understanding the molecular epidemiology of H5N1 HPAI virus in bird populations. The methodology presented here will be useful in predicting the most likely genetic distance for any of the three gene segments of viruses that have not yet been isolated or sequenced based on space, time, and host species during the course of an epidemic.
Funded by: Wellcome Trust: 079643, 093724
Avian diseases 2013;57;3;612-21
The anatomy of successful computational biology software.
National Center for Biotechnology Information, Bethesda, Maryland.
Funded by: Intramural NIH HHS: ZIA LM000072-18; NIGMS NIH HHS: R01 GM070743
Nature biotechnology 2013;31;10;894-7
The African coelacanth genome provides insights into tetrapod evolution.
Molecular Genetics Program, Benaroya Research Institute, Seattle, Washington 98101, USA.
The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Funded by: Medical Research Council: MC_U137761446; NCRR NIH HHS: R24 RR032670; NHGRI NIH HHS: R01 HG003474, U54 HG003067; NICHD NIH HHS: R37 HD032443; NIEHS NIH HHS: P42 ES007381, R01 ES006272; NIH HHS: R01 OD011116, R24 OD011199; Wellcome Trust: 095908
Phosphoinositide 3-Kinase δ Gene Mutation Predisposes to Respiratory Infection and Airway Damage.
Department of Medicine, University of Cambridge, Cambridge, UK.
Genetic mutations cause primary immunodeficiencies (PIDs), which predispose to infections. Here we describe Activated PI3K-δ Syndrome (APDS), a PID associated with a dominant gain-of-function mutation in which lysine replaced glutamic acid at residue 1021 (E1021K) in the p110δ protein, the catalytic subunit of phosphoinositide 3-kinase δ (PI3Kδ), encoded by the PIK3CD gene. We found E1021K in 17 patients from seven unrelated families, but not among 3346 healthy subjects. APDS was characterized by recurrent respiratory infections, progressive airway damage, lymphopenia, increased circulating transitional B cells, increased immunoglobulin M and reduced immunoglobulin G2 levels in serum and impaired vaccine responses. The E1021K mutation enhanced membrane association and kinase activity of p110δ. Patient-derived lymphocytes had increased levels of phosphatidylinositol 3,4,5-trisphosphate and phosphorylated AKT protein and were prone to activation-induced cell death. Selective p110δ inhibitors IC87114 and GS-1101 reduced the activity of the mutant enzyme in vitro, which suggested a therapeutic approach for patients with APDS.
Science (New York, N.Y.) 2013
The COMBREX Project: Design, Methodology, and Initial Results.
New England Biolabs, Ipswich, Massachusetts, United States of America.
Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
PLoS biology 2013;11;8;e1001638
Genome-wide meta-analysis identifies new susceptibility loci for migraine.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Migraine is the most common brain disorder, affecting approximately 14% of the adult population, but its molecular mechanisms are poorly understood. We report the results of a meta-analysis across 29 genome-wide association studies, including a total of 23,285 individuals with migraine (cases) and 95,425 population-matched controls. We identified 12 loci associated with migraine susceptibility (P<5×10(-8)). Five loci are new: near AJAP1 at 1p36, near TSPAN2 at 1p13, within FHL5 at 6q16, within C7orf10 at 7p14 and near MMP16 at 8q21. Three of these loci were identified in disease subgroup analyses. Brain tissue expression quantitative trait locus analysis suggests potential functional candidate genes at four loci: APOA1BP, TBC1D7, FUT9, STAT6 and ATP5B.
Funded by: Intramural NIH HHS: Z01 AG000949-02; Medical Research Council: G0802462, G9815508, MC_UU_12013/1; NIAAA NIH HHS: K05 AA017688; NIGMS NIH HHS: T32 GM007753; Wellcome Trust: 089062, 092731
Nature genetics 2013;45;8;912-7
Association of cytokine and Toll-like receptor gene polymorphisms with severe malaria in three regions of Cameroon.
Department of Biochemistry and Molecular Biology, University of Buea, Buea, Cameroon.
P. falciparum malaria is one of the most widespread and deadliest infectious diseases in children under five years in endemic areas. The disease has been a strong force for evolutionary selection in the human genome, and uncovering the critical human genetic factors that confer resistance to the disease would provide clues to the molecular basis of protective immunity that would be invaluable for vaccine development. We investigated the effect of single nucleotide polymorphisms (SNPs) on malaria pathology in a case-control study of 1862 individuals from two major ethnic groups in three regions with intense perennial P. falciparum transmission in Cameroon. Twenty-nine polymorphisms in cytokine and toll-like receptor (TLR) genes as well as the sickle cell trait (HbS) were assayed on the Sequenom iPLEX platform. Our results confirm the known protective effect of HbS against severe malaria and also reveal a protective effect of SNPs in interleukin-10 (IL10) against cerebral malaria and hyperpyrexia. Furthermore, IL17RE rs708567 GA and hHbS rs334 AT individuals were associated with protection from uncomplicated malaria and anaemia, respectively, in this study. Meanwhile, individuals with the hHbS rs334 TT, IL10 rs3024500 AA, and IL17RD rs6780995 GA genotypes were more susceptible to severe malarial anaemia, cerebral malaria, and hyperpyrexia, respectively. Taken together, our results suggest that polymorphisms in some immune response genes may have important implications for the susceptibility to severe malaria in Cameroonians. Moreover, using uncomplicated malaria may allow us to identify novel pathways in the early development of the disease.
Funded by: Medical Research Council: G0600230, G0600718; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 087285, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 77383/Z/05/Z
PloS one 2013;8;11;e81071
Genome-wide, whole mount in situ analysis of transcriptional regulators in zebrafish embryos.
Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Postfach 3640, 76021 Karlsruhe, Germany.
Transcription is the primary step in the retrieval of genetic information. A substantial proportion of the protein repertoire of each organism consists of transcriptional regulators (TRs). It is believed that the differential expression and combinatorial action of these TRs is essential for vertebrate development and body homeostasis. We mined the zebrafish genome exhaustively for genes encoding TRs and determined their expression in the zebrafish embryo by sequencing to saturation and in situ hybridisation. At the evolutionarily conserved phylotypic stage, 75% of the 3302 TR genes encoded in the genome are already expressed. The number of expressed TR genes increases only marginally in subsequent stages and is maintained during adulthood, suggesting important roles of the TR genes in body homeostasis. Fewer than half of the TR genes (45%, n=1711 genes) are expressed in a tissue-restricted manner in the embryo. Transcripts of 207 genes were detected in a single tissue in the 24h embryo, potentially acting as regulators of specific processes. Other TR genes were expressed in multiple tissues. However, with the exception of certain territories in the nervous system, we did not find significant synexpression, suggesting that most tissue-restricted TRs act in a freely combinatorial fashion. Our data indicate that elaboration of body pattern and function from the phylotypic stage onward relies mostly on redeployment of TRs and post-transcriptional processes.
Funded by: Medical Research Council: MC_UP_1102/1; Wellcome Trust: 079643
Developmental biology 2013;380;2;351-62
Hospital outbreak of Middle East respiratory syndrome coronavirus.
Global Center for Mass Gatherings Medicine, Ministry of Health, Riyadh, Saudi Arabia.
Background: In September 2012, the World Health Organization reported the first cases of pneumonia caused by the novel Middle East respiratory syndrome coronavirus (MERS-CoV). We describe a cluster of health care-acquired MERS-CoV infections.
Methods: Medical records were reviewed for clinical and demographic information and determination of potential contacts and exposures. Case patients and contacts were interviewed. The incubation period and serial interval (the time between the successive onset of symptoms in a chain of transmission) were estimated. Viral RNA was sequenced.
Results: Between April 1 and May 23, 2013, a total of 23 cases of MERS-CoV infection were reported in the eastern province of Saudi Arabia. Symptoms included fever in 20 patients (87%), cough in 20 (87%), shortness of breath in 11 (48%), and gastrointestinal symptoms in 8 (35%); 20 patients (87%) presented with abnormal chest radiographs. As of June 12, a total of 15 patients (65%) had died, 6 (26%) had recovered, and 2 (9%) remained hospitalized. The median incubation period was 5.2 days (95% confidence interval [CI], 1.9 to 14.7), and the serial interval was 7.6 days (95% CI, 2.5 to 23.1). A total of 21 of the 23 cases were acquired by person-to-person transmission in hemodialysis units, intensive care units, or in-patient units in three different health care facilities. Sequencing data from four isolates revealed a single monophyletic clade. Among 217 household contacts and more than 200 health care worker contacts whom we identified, MERS-CoV infection developed in 5 family members (3 with laboratory-confirmed cases) and in 2 health care workers (both with laboratory-confirmed cases).
Conclusions: Person-to-person transmission of MERS-CoV can occur in health care settings and may be associated with considerable morbidity. Surveillance and infection-control measures are critical to a global public health response.
Funded by: NIGMS NIH HHS: U01 GM070708, U54 GM088491; Wellcome Trust: 093724
The New England journal of medicine 2013;369;5;407-16
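The percentages reported in the Results above can be checked against the stated denominator of 23 cases. The short sketch below only re-derives figures already given in the abstract; the variable names are illustrative.

```python
# Counts and percentages as reported in the abstract (23 cases in total).
total_cases = 23
reported = {
    "fever": (20, 87),
    "cough": (20, 87),
    "shortness of breath": (11, 48),
    "gastrointestinal symptoms": (8, 35),
    "abnormal chest radiograph": (20, 87),
    "died": (15, 65),
    "recovered": (6, 26),
    "remained hospitalized": (2, 9),
}

# Recompute each percentage, rounded to the nearest whole number.
computed = {name: round(100 * count / total_cases)
            for name, (count, _) in reported.items()}

# Collect any item whose recomputed percentage differs from the reported one.
mismatches = {name for name, (_, pct) in reported.items()
              if computed[name] != pct}

# The three outcome groups should account for every case.
outcome_total = sum(reported[k][0]
                    for k in ("died", "recovered", "remained hospitalized"))
```

Here `mismatches` comes out empty and `outcome_total` equals `total_cases`, so the reported figures are internally consistent.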
Genome sequencing reveals loci under artificial selection that underlie disease phenotypes in the laboratory rat.
Physiological Genomic and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK.
Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models.
Funded by: British Heart Foundation: RE/08/002; Cancer Research UK: 13031; Medical Research Council: G0800024, MC_U120061454, MC_U120097112; NHLBI NIH HHS: HL094446, R01 HL020176, R01 HL089895, R01 HL094446; NIDDK NIH HHS: R21 DK089417; Wellcome Trust: 075491/Z/04
Effective preparation of Plasmodium vivax field isolates for high-throughput whole genome sequencing.
Global and Tropical Health Division, Menzies School of Health Research, Charles Darwin University, Darwin, Australia.
Whole genome sequencing (WGS) of Plasmodium vivax is problematic due to the reliance on clinical isolates which are generally low in parasitaemia and sample volume. Furthermore, clinical isolates contain a significant contaminating background of host DNA which confounds efforts to map short read sequence of the target P. vivax DNA. Here, we discuss a methodology to significantly improve the success of P. vivax WGS on natural (non-adapted) patient isolates. Using 37 patient isolates from Indonesia, Thailand, and travellers, we assessed the application of CF11-based white blood cell filtration alone and in combination with short term ex vivo schizont maturation. Although CF11 filtration reduced human DNA contamination in 8 Indonesian isolates tested, additional short-term culture increased the P. vivax DNA yield from a median of 0.15 to 6.2 ng µl(-1) packed red blood cells (pRBCs) (p = 0.001) and reduced the human DNA percentage from a median of 33.9% to 6.22% (p = 0.008). Furthermore, post-CF11 and culture samples from Thailand gave a median P. vivax DNA yield of 2.34 ng µl(-1) pRBCs, and 2.65% human DNA. In 22 P. vivax patient isolates prepared with the 2-step method, we demonstrate high depth (median 654X coverage) and breadth (≥89%) of coverage on the Illumina GAII and HiSeq platforms. In contrast to the A+T-rich P. falciparum genome, negligible bias was observed in coverage depth between coding and non-coding regions of the P. vivax genome. This uniform coverage will greatly facilitate the detection of SNPs and copy number variants across the genome, enabling unbiased exploration of the natural diversity in P. vivax populations.
Funded by: Medical Research Council: G19/9; Wellcome Trust: 089275, 090532, 091625
PloS one 2013;8;1;e53160
Genomic triumph meets clinical reality.
A report on the 'Genomic Disorders 2013: from 60 years of DNA to human genomes in the clinic' meeting, held at Homerton College, Cambridge, UK, April 10-12, 2013.
Funded by: Wellcome Trust: 09851
Genome biology 2013;14;5;307
FOXP2 targets show evidence of positive selection in European populations.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Forkhead box P2 (FOXP2) is a highly conserved transcription factor that has been implicated in human speech and language disorders and plays important roles in the plasticity of the developing brain. The pattern of nucleotide polymorphisms in FOXP2 in modern populations suggests that it has been the target of positive (Darwinian) selection during recent human evolution. In our study, we searched for evidence of selection that might have followed FOXP2 adaptations in modern humans. We examined whether or not putative FOXP2 targets identified by chromatin-immunoprecipitation genomic screening show evidence of positive selection. We developed an algorithm that, for any given gene list, systematically generates matched lists of control genes from the Ensembl database, collates summary statistics for three frequency-spectrum-based neutrality tests from the low-coverage resequencing data of the 1000 Genomes Project, and determines whether these statistics are significantly different between the given gene targets and the set of controls. Overall, there was strong evidence of selection of FOXP2 targets in Europeans, but not in the Han Chinese, Japanese, or Yoruba populations. Significant outliers included several genes linked to cellular movement, reproduction, development, and immune cell trafficking, and 13 of these constituted a significant network associated with cardiac arteriopathy. Strong signals of selection were observed for CNTNAP2 and RBFOX1, key neurally expressed genes that have been consistently identified as direct FOXP2 targets in multiple studies and that have themselves been associated with neurodevelopmental disorders involving language dysfunction.
Funded by: Wellcome Trust: 098051
American journal of human genetics 2013;92;5;696-706
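The matched-control comparison described in the abstract above can be sketched as an empirical resampling test. This is a minimal illustration under stated assumptions, not the published algorithm: the matching covariate (`bin_of`, for example a gene-length bin), the single per-gene statistic (`stat`, standing in for one neutrality test), the one-sided direction of the test and all gene names and values are hypothetical.

```python
import random
from statistics import mean


def matched_control_pvalue(targets, pool, stat, bin_of, n_lists=1000, seed=0):
    """Empirical one-sided p-value: how often does a matched control list
    have a mean statistic at least as low as the target list?"""
    rng = random.Random(seed)
    target_set = set(targets)
    # Group non-target genes by the matching covariate.
    by_bin = {}
    for gene in pool:
        if gene not in target_set:
            by_bin.setdefault(bin_of[gene], []).append(gene)
    observed = mean(stat[g] for g in targets)
    as_extreme = 0
    for _ in range(n_lists):
        # One control per target, drawn from the same bin
        # (sampling with replacement, so duplicates are possible).
        control = [rng.choice(by_bin[bin_of[g]]) for g in targets]
        if mean(stat[g] for g in control) <= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (as_extreme + 1) / (n_lists + 1)


# Synthetic data: 60 hypothetical genes in 3 covariate bins.
data_rng = random.Random(1)
genes = [f"g{i}" for i in range(60)]
bin_of = {g: i % 3 for i, g in enumerate(genes)}
targets = genes[:6]
# Targets get a strongly negative statistic; all other genes lie in [-1, 1].
stat = {g: (-2.0 if g in targets else data_rng.uniform(-1.0, 1.0))
        for g in genes}

p_value = matched_control_pvalue(targets, genes, stat, bin_of)
```

Because every control gene in this toy data set has a higher statistic than the targets, no control list is as extreme as the target list and `p_value` takes its minimum possible value, 1/(n_lists + 1). A fuller version would repeat the comparison for each neutrality test and each population.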
Cooperativity of imprinted genes inactivated by acquired chromosome 20q deletions.
Large regions of recurrent genomic loss are common in cancers; however, with a few well-characterized exceptions, how they contribute to tumor pathogenesis remains largely obscure. Here we identified primate-restricted imprinting of a gene cluster on chromosome 20 in the region commonly deleted in chronic myeloid malignancies. We showed that a single heterozygous 20q deletion consistently resulted in the complete loss of expression of the imprinted genes L3MBTL1 and SGK2, indicative of a pathogenetic role for loss of the active paternally inherited locus. Concomitant loss of both L3MBTL1 and SGK2 dysregulated erythropoiesis and megakaryopoiesis, 2 lineages commonly affected in chronic myeloid malignancies, with distinct consequences in each lineage. We demonstrated that L3MBTL1 and SGK2 collaborated in the transcriptional regulation of MYC by influencing different aspects of chromatin structure. L3MBTL1 is known to regulate nucleosomal compaction, and we here showed that SGK2 inactivated BRG1, a key ATP-dependent helicase within the SWI/SNF complex that regulates nucleosomal positioning. These results demonstrate a link between an imprinted gene cluster and malignancy, reveal a new pathogenetic mechanism associated with acquired regions of genomic loss, and underline the complex molecular and cellular consequences of "simple" cancer-associated chromosome deletions.
The Journal of clinical investigation 2013;123;5;2169-82
Y-chromosome and mtDNA genetics reveal significant contrasts in affinities of modern Middle Eastern populations with European and African populations.
The Lebanese American University, Chouran, Beirut, Lebanon.
The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. Post-LGM expansions into the region and subsequent population movements created a striking genetic mosaic with distinct sex-based genetic differentiation. While prior studies have examined the mtDNA and Y-chromosome contrast in focal populations in the Middle East, none have undertaken a broad-spectrum survey including North and sub-Saharan Africa, Europe, and Middle Eastern populations. In this study 5,174 mtDNA and 4,658 Y-chromosome samples were investigated using PCA, MDS, mean-linkage clustering, AMOVA, and Fisher exact tests of F(ST)'s, R(ST)'s, and haplogroup frequencies. Geographic differentiation in affinities of Middle Eastern populations with Africa and Europe showed distinct contrasts between mtDNA and Y-chromosome data. Specifically, Lebanon's mtDNA shows a very strong association to Europe, while Yemen shows very strong affinity with Egypt and North and East Africa. Previous Y-chromosome results showed a Levantine coastal-inland contrast marked by J1 and J2, and a very strong North African component was evident throughout the Middle East. Neither of these patterns was observed in the mtDNA. While J2 has penetrated into Europe, the pattern of Y-chromosome diversity in Lebanon does not show the widespread affinities with Europe indicated by the mtDNA data. Lastly, while each population shows evidence of connections with expansions that now define the Middle East, Africa, and Europe, many of the populations in the Middle East show distinctive mtDNA and Y-haplogroup characteristics that indicate long-standing settlement with relatively little impact from and movement into other populations.
PloS one 2013;8;1;e54616
Metagenomic study of the viruses of African straw-coloured fruit bats: detection of a chiropteran poxvirus and isolation of a novel adenovirus.
University of Cambridge, Department of Veterinary Medicine, Madingley Rd, Cambridge, Cambridgeshire, CB3 0ES, United Kingdom.
Viral emergence as a result of zoonotic transmission constitutes a continuous public health threat. Emerging viruses such as SARS coronavirus, hantaviruses and henipaviruses have wildlife reservoirs. Characterising the viruses of candidate reservoir species in geographical hot spots for viral emergence is a sensible approach to develop tools to predict, prevent, or contain emergence events. Here, we explore the viruses of Eidolon helvum, an Old World fruit bat species widely distributed in Africa that lives in close proximity to humans. We identified a great abundance and diversity of novel herpes and papillomaviruses, described the isolation of a novel adenovirus, and detected, for the first time, sequences of a chiropteran poxvirus closely related to Molluscum contagiosum. In sum, E. helvum displays a wide variety of mammalian viruses, some of them genetically similar to known human pathogens, highlighting the possibility of zoonotic transmission.
Funded by: Medical Research Council: G0801822; Wellcome Trust
Fitness benefits in fluoroquinolone-resistant Salmonella Typhi in the absence of antimicrobial pressure.
Oxford University Clinical Research Unit, Wellcome Trust Major Overseas Programme, Ho Chi Minh City, Vietnam.
Fluoroquinolones (FQ) are the recommended antimicrobial treatment for typhoid, a severe systemic infection caused by the bacterium Salmonella enterica serovar Typhi. FQ-resistance mutations in S. Typhi have become common, hindering treatment and control efforts. Using in vitro competition experiments, we assayed the fitness of eleven isogenic S. Typhi strains with resistance mutations in the FQ target genes, gyrA and parC. In the absence of antimicrobial pressure, 6 out of 11 mutants carried a selective advantage over the antimicrobial-sensitive parent strain, indicating that FQ resistance in S. Typhi is not typically associated with fitness costs. Double-mutants exhibited higher than expected fitness as a result of synergistic epistasis, signifying that epistasis may be a critical factor in the evolution and molecular epidemiology of S. Typhi. Our findings have important implications for the management of drug-resistant S. Typhi, suggesting that FQ-resistant strains would be naturally maintained even if fluoroquinolone use were reduced. DOI: http://dx.doi.org/10.7554/eLife.01229.001.
Funded by: Wellcome Trust: 089276/B/09/Z, 098051, 100087
Atypical mitogen-activated protein kinase phosphatase implicated in regulating transition from pre-S-Phase asexual intraerythrocytic development of Plasmodium falciparum.
Department of Global Health, College of Public Health, University of South Florida, Tampa, Florida, USA.
Intraerythrocytic development of the human malaria parasite Plasmodium falciparum appears as a continuous flow through growth and proliferation. To develop a greater understanding of the critical regulatory events, we utilized piggyBac insertional mutagenesis to randomly disrupt genes. Screening a collection of piggyBac mutants for slow growth, we isolated the attenuated parasite C9, which carried a single insertion disrupting the open reading frame (ORF) of PF3D7_1305500. This gene encodes a protein structurally similar to a mitogen-activated protein kinase (MAPK) phosphatase, except for two notable characteristics that alter the signature motif of the dual-specificity phosphatase domain, suggesting that it may be a low-activity phosphatase or pseudophosphatase. C9 parasites demonstrated a significantly lower growth rate with delayed entry into the S/M phase of the cell cycle, which follows the stage of maximum PF3D7_1305500 expression in intact parasites. Genetic complementation with the full-length PF3D7_1305500 rescued the wild-type phenotype of C9, validating the importance of the putative protein phosphatase PF3D7_1305500 as a regulator of pre-S-phase cell cycle progression in P. falciparum.
Funded by: NIAID NIH HHS: F31 AI083053, F31AI083053, R01 AI094973, R01AI033656, R01AI094973; Wellcome Trust
Eukaryotic cell 2013;12;9;1171-8
Imputation-based meta-analysis of severe malaria in three African populations.
Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom.
Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles.
Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 087285, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 091758, 091758/Z/10/Z, 092654, 096527, 097364/Z/11/Z, WT077383/Z/05/Z, WT098051
PLoS genetics 2013;9;5;e1003509
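The fixed-effects meta-analysis named in the abstract above is commonly done by inverse-variance weighting of per-study effect estimates. The sketch below shows that standard calculation together with Cochran's Q, a simple measure of the between-study heterogeneity the abstract is concerned with. It is not the authors' pipeline; the function name and the toy effect sizes are assumptions for illustration.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance fixed-effects meta-analysis of per-study effects.

    Returns the pooled effect, its standard error, and Cochran's Q
    (large Q relative to the number of studies suggests heterogeneity).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need matching, non-empty lists of effects and SEs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    cochran_q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return pooled, pooled_se, cochran_q


# Toy log-odds-ratio estimates from two hypothetical study sites.
pooled, pooled_se, q = fixed_effects_meta([0.2, 0.4], [0.1, 0.1])
```

Because a fixed-effects model assumes one shared true effect, a high Q is a signal that the relaxed or Bayesian approaches mentioned in the abstract may be more appropriate.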
A comparison of dense transposon insertion libraries in the Salmonella serovars Typhi and Typhimurium.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Salmonella Typhi and Typhimurium diverged only ∼50 000 years ago, yet have very different host ranges and pathogenicity. Despite the availability of multiple whole-genome sequences, the genetic differences that have driven these changes in phenotype are only beginning to be understood. In this study, we use transposon-directed insertion-site sequencing to probe differences in gene requirements for competitive growth in rich media between these two closely related serovars. We identify a conserved core of 281 genes that are required for growth in both serovars, 228 of which are essential in Escherichia coli. We are able to identify active prophage elements through the requirement for their repressors. We also find distinct differences in requirements for genes involved in cell surface structure biogenesis and iron utilization. Finally, we demonstrate that transposon-directed insertion-site sequencing is not only applicable to the protein-coding content of the cell but also has sufficient resolution to generate hypotheses regarding the functions of non-coding RNAs (ncRNAs) as well. We are able to assign probable functions to a number of cis-regulatory ncRNA elements, as well as to infer likely differences in trans-acting ncRNA regulatory networks.
Funded by: Medical Research Council: G0600805; Wellcome Trust: WT076964, WT079643, WT098051
Nucleic acids research 2013;41;8;4549-64
Handbook of Proteolytic Enzymes 2013;1;509-13
Cytosol Alanyl Aminopeptidase
Handbook of Proteolytic Enzymes 2013;1;431-4
Handbook of Proteolytic Enzymes 2013;3;581;2624-5
Handbook of Proteolytic Enzymes 2013;2;518;2309-14
Introduction: Unsequenced Serine Peptidases
Handbook of Proteolytic Enzymes 2013;3;824;3737
Handbook of Proteolytic Enzymes 2013;2;392;1710-11
Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
The adaptive immune response selectively expands B- and T-cell clones following antigen recognition by B- and T-cell receptors (BCR and TCR), respectively. Next-generation sequencing is a powerful tool for dissecting the BCR and TCR populations at high resolution, but robust computational analyses are required to interpret such sequencing. Here, we develop a novel computational approach for BCR repertoire analysis using established next-generation sequencing methods coupled with network construction and population analysis. BCR sequences organize into networks based on sequence diversity, with differences in network connectivity clearly distinguishing between diverse repertoires of healthy individuals and clonally expanded repertoires from individuals with chronic lymphocytic leukemia (CLL) and other clonal blood disorders. Network population measures defined by the Gini Index and cluster sizes quantify the BCR clonality status and are robust to sampling and sequencing depths. BCR network analysis therefore allows the direct and quantifiable comparison of BCR repertoires between samples and intra-individual population changes between temporal or spatially separated samples and over the course of therapy.
Funded by: Wellcome Trust: 079249, 095663, 100140
Genome research 2013;23;11;1874-84
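The Gini Index mentioned in the abstract above is a general inequality measure, and a minimal sketch of it applied to cluster sizes follows. The formula is the standard one for a sorted sample; exactly how the authors apply it to BCR networks is not stated in the abstract, so the function name and the toy cluster sizes are assumptions.

```python
def gini_index(sizes):
    """Gini index of a list of non-negative sizes (0 = perfectly even).

    Uses the sorted-sample formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    whose maximum for n values is (n - 1) / n.
    """
    values = sorted(sizes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero size")
    weighted = sum((rank + 1) * value for rank, value in enumerate(values))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Toy cluster sizes: an even repertoire versus one dominated by a single clone.
even_repertoire = gini_index([1, 1, 1, 1])
clonal_repertoire = gini_index([0, 0, 0, 10])
```

An even repertoire gives a value of 0, while a repertoire dominated by one expanded clone approaches the maximum, which is the contrast between healthy and clonally expanded samples that the abstract describes.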
ISCB computational biology Wikipedia competition.
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom.
PLoS computational biology 2013;9;9;e1003242
Deep resequencing of GWAS loci identifies rare variants in CARD9, IL23R and RNF186 that are associated with ulcerative colitis.
Montreal Heart Institute, Research Center, Montreal, Quebec, Canada.
Genome-wide association studies and follow-up meta-analyses in Crohn's disease (CD) and ulcerative colitis (UC) have recently identified 163 disease-associated loci that meet genome-wide significance for these two inflammatory bowel diseases (IBD). These discoveries have already had a tremendous impact on our understanding of the genetic architecture of these diseases and have directed functional studies that have revealed some of the biological functions that are important to IBD (e.g. autophagy). Nonetheless, these loci can only explain a small proportion of disease variance (~14% in CD and 7.5% in UC), suggesting that not only are additional loci to be found but that the known loci may contain high effect rare risk variants that have gone undetected by GWAS. To test this, we have used a targeted sequencing approach in 200 UC cases and 150 healthy controls (HC), all of French Canadian descent, to study 55 genes in regions associated with UC. We performed follow-up genotyping of 42 rare non-synonymous variants in independent case-control cohorts (totaling 14,435 UC cases and 20,204 HC). Our results confirmed significant association to rare non-synonymous coding variants in both IL23R and CARD9, previously identified from sequencing of CD loci, as well as identified a novel association in RNF186. With the exception of CARD9 (OR = 0.39), the rare non-synonymous variants identified were of moderate effect (OR = 1.49 for RNF186 and OR = 0.79 for IL23R). RNF186 encodes a protein with a RING domain having predicted E3 ubiquitin-protein ligase activity and two transmembrane domains. Importantly, the disease-coding variant is located in the ubiquitin ligase domain. Finally, our results suggest that rare variants in genes identified by genome-wide association in UC are unlikely to contribute significantly to the overall variance for the disease. Rather, these are expected to help focus functional studies of the corresponding disease loci.
Funded by: Canadian Institutes of Health Research: GPG-102170; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800759; NCATS NIH HHS: UL1 TR000005, UL1 TR000077; NCI NIH HHS: R01 CA141743; NIDDK NIH HHS: DK062413, DK062420, DK062422, DK062423, DK062429, DK062431, DK062432, DK064869, P01 DK046763, P30 DK043351, T32 DK007191, U01 DK062420, U01 DK062423, U01 DK062431
PLoS genetics 2013;9;9;e1003723
Distinct H3F3A and H3F3B driver mutations define chondroblastoma and giant cell tumor of bone.
[1] Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. [2] Department of Paediatrics, University of Cambridge, Cambridge, UK.
It is recognized that some mutated cancer genes contribute to the development of many cancer types, whereas others are cancer type specific. For genes that are mutated in multiple cancer classes, mutations are usually similar in the different affected cancer types. Here, however, we report exquisite tumor type specificity for different histone H3.3 driver alterations. In 73 of 77 cases of chondroblastoma (95%), we found p.Lys36Met alterations predominantly encoded in H3F3B, which is one of two genes for histone H3.3. In contrast, in 92% (49/53) of giant cell tumors of bone, we found histone H3.3 alterations exclusively in H3F3A, leading to p.Gly34Trp or, in one case, p.Gly34Leu alterations. The mutations were restricted to the stromal cell population and were not detected in osteoclasts or their precursors. In the context of previously reported H3F3A mutations encoding p.Lys27Met and p.Gly34Arg or p.Gly34Val alterations in childhood brain tumors, a remarkable picture of tumor type specificity for histone H3.3 driver alterations emerges, indicating that histone H3.3 residues, mutations and genes have distinct functions.
Funded by: Cancer Research UK; Wellcome Trust: 077012/Z/05/Z, 088340, 098051, WT088340MA
Nature genetics 2013;45;12;1479-82
Microbial genomes as cheat sheets.
Nature reviews. Microbiology 2013;11;5;302
Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture.
US Department of Health and Human Services, Division of Cancer Epidemiology and Genetics, National Cancer Institute, US National Institutes of Health, Bethesda, Maryland, USA.
Approaches exploiting trait distribution extremes may be used to identify loci associated with common traits, but it is unknown whether these loci are generalizable to the broader population. In a genome-wide search for loci associated with the upper versus the lower 5th percentiles of body mass index, height and waist-to-hip ratio, as well as clinical classes of obesity, including up to 263,407 individuals of European ancestry, we identified 4 new loci (IGFBP4, H6PD, RSRC1 and PPP2R2A) influencing height detected in the distribution tails and 7 new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3 and ZZZ3) for clinical classes of obesity. Further, we find a large overlap in genetic structure and the distribution of variants between traits based on extremes and the general population and little etiological heterogeneity between obesity subgroups.
Funded by: Cancer Research UK: 14136; Chief Scientist Office: CZB/4/710; Medical Research Council: G0600237, G0601261, G1000143, G1002084, G9521010, MC_PC_U127561128, MC_U105260558, MC_U106179471, MC_U106179472, MC_U106188470, MC_U123092720, MR/K006584/1; NHGRI NIH HHS: U01 HG007416; NHLBI NIH HHS: R01 HL105756; NIAAA NIH HHS: K05 AA017688; NIDDK NIH HHS: P60 DK020541, R01 DK072193, R01 DK075787; NIGMS NIH HHS: T32 GM074905; Wellcome Trust: 090532, 097117
Nature genetics 2013;45;5;501-12
The evolutionary dynamics of influenza A virus adaptation to mammalian hosts.
Department of Zoology, University of Oxford, Oxford, UK.
Few questions on infectious disease are more important than understanding how and why avian influenza A viruses successfully emerge in mammalian populations, yet little is known about the rate and nature of the virus' genetic adaptation in new hosts. Here, we measure, for the first time, the genomic rate of adaptive evolution of swine influenza viruses (SwIV) that originated in birds. By using a curated dataset of more than 24 000 human and swine influenza gene sequences, including 41 newly characterized genomes, we reconstructed the adaptive dynamics of three major SwIV lineages (Eurasian, EA; classical swine, CS; triple reassortant, TR). We found that, following the transfer of the EA lineage from birds to swine in the late 1970s, EA virus genes have undergone substantially faster adaptive evolution than those of the CS lineage, which had circulated among swine for decades. Further, the adaptation rates of the EA lineage antigenic haemagglutinin and neuraminidase genes were unexpectedly high and similar to those observed in human influenza A. We show that the successful establishment of avian influenza viruses in swine is associated with raised adaptive evolution across the entire genome for many years after zoonosis, reflecting the contribution of multiple mutations to the coordinated optimization of viral fitness in a new environment. These dynamics are replicated independently in the polymerase genes of the TR lineage, which established in swine following separate transmission from non-swine hosts.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E009670/1, BB/H014306/1; Department of Health: 709; Medical Research Council: MC_G0902096; Wellcome Trust
Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2013;368;1614;20120382
Genome-wide association study of intraocular pressure identifies the GLCCI1/ICA1 region as a glaucoma susceptibility locus.
To discover quantitative trait loci for intraocular pressure, a major risk factor for glaucoma and the only modifiable one, we performed a genome-wide association study on a discovery cohort of 2175 individuals from Sydney, Australia. We found a novel association between intraocular pressure and a common variant at 7p21 near to GLCCI1 and ICA1. The findings in this region were confirmed through two UK replication cohorts totalling 4866 individuals (rs59072263, Pcombined = 1.10 × 10^-8). A copy of the G allele at this SNP is associated with an increase in mean IOP of 0.45 mmHg (95% CI = 0.30-0.61 mmHg). These results lend support to the implication of vesicle trafficking and glucocorticoid inducibility pathways in the determination of intraocular pressure and in the pathogenesis of primary open-angle glaucoma.
Human molecular genetics 2013;22;22;4653-60
Uniparental markers in Italy reveal a sex-biased genetic structure and different historical strata.
Laboratorio di Antropologia Molecolare, Dipartimento di Scienze Biologiche, Geologiche e Ambientali, Università di Bologna, Bologna, Italy.
Located in the center of the Mediterranean landscape and with an extensive coastal line, the territory of what is today Italy has played an important role in the history of human settlements and movements of Southern Europe and the Mediterranean Basin. Populated since Paleolithic times, the complexity of human movements during the Neolithic, the Metal Ages and the most recent history of the two last millennia (involving the overlapping of different cultural and demic strata) has shaped the pattern of the modern Italian genetic structure. With the aim of disentangling this pattern and understanding which processes more importantly shaped the distribution of diversity, we have analyzed the uniparentally-inherited markers in ∼900 individuals from an extensive sampling across the Italian peninsula, Sardinia and Sicily. Spatial PCAs and DAPCs revealed a sex-biased pattern indicating different demographic histories for males and females. Besides the genetic outlier position of Sardinians, a North West-South East Y-chromosome structure is found in continental Italy. Such structure is in agreement with recent archeological syntheses indicating two independent and parallel processes of Neolithisation. In addition, date estimates pinpoint the importance of the cultural and demographic events during the late Neolithic and Metal Ages. On the other hand, mitochondrial diversity is distributed more homogeneously in agreement with older population events that might be related to the presence of an Italian Refugium during the last glacial period in Europe.
PloS one 2013;8;5;e65441
Oxidative bisulfite sequencing of 5-methylcytosine and 5-hydroxymethylcytosine.
Department of Chemistry, University of Cambridge, Cambridge, UK.
To uncover the function of and interplay between the mammalian cytosine modifications 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), new techniques and advances in current technology are needed. To this end, we have developed oxidative bisulfite sequencing (oxBS-seq), which can quantitatively locate 5mC and 5hmC marks at single-base resolution in genomic DNA. In bisulfite sequencing (BS-seq), both 5mC and 5hmC are read as cytosines and thus cannot be discriminated; however, in oxBS-seq, specific oxidation of 5hmC to 5-formylcytosine (5fC) and conversion of the newly formed 5fC to uracil (under bisulfite conditions) means that 5hmC can be discriminated from 5mC. A positive readout of actual 5mC is gained from a single oxBS-seq run, and 5hmC levels are inferred by comparison with a BS-seq run. Here we describe an optimized second-generation protocol that can be completed in 2 d.
Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust
Nature protocols 2013;8;10;1841-51
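The subtraction logic described in the oxBS-seq abstract above (5mC read directly from oxBS-seq, 5hmC inferred by comparison with BS-seq) can be sketched for a single cytosine position as follows. The function name, the read-count inputs and the clamping of negative differences to zero are assumptions for illustration; the published protocol may handle sampling noise differently.

```python
def modification_levels(bs_c_reads, bs_total_reads, oxbs_c_reads, oxbs_total_reads):
    """Estimate 5mC and 5hmC fractions at one cytosine position.

    BS-seq reads both 5mC and 5hmC as C, so its C fraction is 5mC + 5hmC.
    oxBS-seq reads only 5mC as C, so its C fraction is 5mC alone.
    5hmC is therefore the BS fraction minus the oxBS fraction.
    """
    if bs_total_reads <= 0 or oxbs_total_reads <= 0:
        raise ValueError("read depth must be positive in both libraries")
    bs_fraction = bs_c_reads / bs_total_reads
    oxbs_fraction = oxbs_c_reads / oxbs_total_reads
    # Assumption: small negative differences are sampling noise, so clamp to 0.
    hmc_fraction = max(bs_fraction - oxbs_fraction, 0.0)
    return {"5mC": oxbs_fraction, "5hmC": hmc_fraction}


# Toy counts: 80/100 reads show C in BS-seq, 60/100 in oxBS-seq.
levels = modification_levels(80, 100, 60, 100)
```

In this toy case the position is estimated to be about 60% 5mC and about 20% 5hmC.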
Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya.
[1] KEMRI-Wellcome Trust Research Programme, Kilifi, Kenya. [2] Institute of Microbiology, Magdeburg University School of Medicine, Germany.
Early identification of causal genetic variants underlying antimalarial drug resistance could provide robust epidemiological tools for timely public health interventions. Using a novel natural genetics strategy for mapping novel candidate genes we analyzed >75,000 high quality single nucleotide polymorphisms selected from high-resolution whole-genome sequencing data in 27 isolates of Plasmodium falciparum. We identified genetic variants associated with susceptibility to dihydroartemisinin that implicate one region on chromosome 13, a candidate gene on chromosome 1 (PFA0220w, a UBP1 ortholog) and others (PFB0560w, PFB0630c, PFF0445w) with putative roles in protein homeostasis and stress response. There was a strong signal for positive selection on PFA0220w, but not the other candidate loci. Our results demonstrate the power of full-genome sequencing-based association studies for uncovering candidate genes that determine parasite sensitivity to artemisinins. Our study provides a unique reference for the interpretation of results from resistant infections.
Funded by: Wellcome Trust: 090532, 092654
Scientific reports 2013;3;3318
A variant in LDLR is associated with abdominal aortic aneurysm.
Background: Abdominal aortic aneurysm (AAA) is a common cardiovascular disease among older people and demonstrates significant heritability. In contrast to similar complex diseases, relatively few genetic associations with AAA have been confirmed. We reanalyzed our genome-wide study and carried through to replication suggestive discovery associations at a lower level of significance.
Methods and results: A genome-wide association study was conducted using 1830 cases from the United Kingdom, New Zealand, and Australia with infrarenal aorta diameter ≥30 mm or ruptured AAA and 5435 unscreened controls from the 1958 Birth Cohort and National Blood Service cohort from the Wellcome Trust Case Control Consortium. Eight suggestive associations with P<1×10^-4 were carried through to in silico replication in 1292 AAA cases and 30,503 controls. One single-nucleotide polymorphism associated with P<0.05 after Bonferroni correction in the in silico study underwent further replication (706 AAA cases and 1063 controls from the United Kingdom, 507 AAA cases and 199 controls from Denmark, and 885 AAA cases and 1000 controls from New Zealand). Low-density lipoprotein receptor (LDLR) rs6511720 A was significantly associated overall and in 3 of 5 individual replication studies. The full study showed an association that reached genome-wide significance (odds ratio, 0.76; 95% confidence interval, 0.70-0.83; P=2.08×10^-10).
Conclusions: LDLR rs6511720 is associated with AAA. This finding is consistent with established effects of this variant on coronary artery disease. Shared causal pathways with other cardiovascular diseases may present novel opportunities for preventative and therapeutic strategies for AAA.
Funded by: British Heart Foundation: FS/11/16/28696, PG/10/001/28098; Medical Research Council: G1001799; Wellcome Trust: 076113, 084695, 085475
Circulation. Cardiovascular genetics 2013;6;5;498-504
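As a rough consistency check on the summary statistics in the LDLR abstract above, the standard error and an approximate two-sided P value can be recovered from the odds ratio and its 95% confidence interval. The sketch assumes a normal approximation on the log-odds scale; the function name is hypothetical, and because the published interval is rounded to two decimals the result only approximates the reported P value.

```python
import math


def p_value_from_odds_ratio(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z score and two-sided P from an odds ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale (Wald-type interval).
    """
    if min(odds_ratio, ci_low, ci_high) <= 0 or ci_low >= ci_high:
        raise ValueError("odds ratio and CI bounds must be positive, low < high")
    log_se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z_score = math.log(odds_ratio) / log_se
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return log_se, z_score, p_value


# Values quoted in the abstract: OR 0.76, 95% CI 0.70-0.83.
log_se, z_score, p_value = p_value_from_odds_ratio(0.76, 0.70, 0.83)
```

The recovered P value falls in the 10^-10 range, the same order of magnitude as the reported figure and well below the conventional genome-wide threshold of 5×10^-8.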
A new method for high-resolution imaging of Ku foci to decipher mechanisms of DNA double-strand break repair.
The Wellcome Trust and Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, England, UK.
DNA double-strand breaks (DSBs) are the most toxic of all genomic insults, and pathways dealing with their signaling and repair are crucial to prevent cancer and for immune system development. Despite intense investigations, our knowledge of these pathways has been technically limited by our inability to detect the main repair factors at DSBs in cells. In this paper, we present an original method that involves a combination of ribonuclease- and detergent-based preextraction with high-resolution microscopy. This method allows direct visualization of previously hidden repair complexes, including the main DSB sensor Ku, at virtually any type of DSB, including those induced by anticancer agents. We demonstrate its broad range of applications by coupling it to laser microirradiation, super-resolution microscopy, and single-molecule counting to investigate the spatial organization and composition of repair factories. Furthermore, we use our method to monitor DNA repair and identify mechanisms of repair pathway choice, and we show its utility in defining cellular sensitivities and resistance mechanisms to anticancer agents.
Funded by: Cancer Research UK: 11224, A11224, C6/A11224, C6946/A14492; European Research Council: 268536; Wellcome Trust: 092096, WT092096
The Journal of cell biology 2013;202;3;579-95
Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans.
The Australian Centre for Ancient DNA, School of Earth and Environmental Sciences, University of Adelaide, Adelaide, South Australia 5005, Australia.
Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this 'real-time' genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.
Funded by: Wellcome Trust: 079643
Nature communications 2013;4;1764
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2013
Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: a retrospective cohort study.
Wellcome Trust Sanger Institute, Hinxton, UK.
Background: Increasing numbers of individuals with cystic fibrosis are becoming infected with the multidrug-resistant non-tuberculous mycobacterium (NTM) Mycobacterium abscessus, which causes progressive lung damage and is extremely challenging to treat. How this organism is acquired is not currently known, but there is growing concern that person-to-person transmission could occur. We aimed to define the mechanisms of acquisition of M abscessus in individuals with cystic fibrosis.
Method: Whole genome sequencing and antimicrobial susceptibility testing were done on 168 consecutive isolates of M abscessus from 31 patients attending an adult cystic fibrosis centre in the UK between 2007 and 2011. In parallel, we undertook detailed environmental testing for NTM and defined potential opportunities for transmission between patients both in and out of hospital using epidemiological data and social network analysis.
Findings: Phylogenetic analysis revealed two clustered outbreaks of near-identical isolates of the M abscessus subspecies massiliense (from 11 patients), differing by less than ten base pairs. This variation represents less diversity than that seen within isolates from a single individual, strongly indicating between-patient transmission. All patients within these clusters had numerous opportunities for within-hospital transmission from other individuals, while comprehensive environmental sampling, initiated during the outbreak, failed to detect any potential point source of NTM infection. The clusters of M abscessus subspecies massiliense showed evidence of transmission of mutations acquired during infection of an individual to other patients. Thus, isolates with constitutive resistance to amikacin and clarithromycin were isolated from several individuals never previously exposed to long-term macrolides or aminoglycosides, further indicating cross-infection.
Interpretation: Whole genome sequencing has revealed frequent transmission of multidrug-resistant NTM between patients with cystic fibrosis despite conventional cross-infection measures. Although the exact transmission route is yet to be established, our epidemiological analysis suggests that it could be indirect.
Funding: The Wellcome Trust, Papworth Hospital, NIHR Cambridge Biomedical Research Centre, UK Health Protection Agency, Medical Research Council, and the UKCRC Translational Infection Research Initiative.
Funded by: Medical Research Council: G1000803; Wellcome Trust: 084953, 098051, 100140
Lancet (London, England) 2013;381;9877;1551-60
Transmission of M abscessus in patients with cystic fibrosis - Authors' reply.
Whole-genome sequencing to establish relapse or re-infection with Mycobacterium tuberculosis: a retrospective observational study.
Wellcome Trust Sanger Institute, Hinxton, UK.
Background: Recurrence of tuberculosis after treatment makes management difficult and is a key factor for determining treatment efficacy. Two processes can cause recurrence: relapse of the primary infection or re-infection with an exogenous strain. Although re-infection can and does occur, its importance to tuberculosis epidemiology and its biological basis are still debated. We used whole-genome sequencing, which is more accurate than conventional typing used to date, to assess the frequency of recurrence and to gain insight into the biological basis of re-infection.
Methods: We assessed patients from the REMoxTB trial, a randomised controlled trial of tuberculosis treatment that enrolled previously untreated participants with Mycobacterium tuberculosis infection from Malaysia, South Africa, and Thailand. We did whole-genome sequencing and mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) typing of pairs of isolates taken by sputum sampling: one from before treatment and another from either the end of failed treatment at 17 weeks or later or from a recurrent infection. We compared the number and location of SNPs between isolates collected at baseline and recurrence.
Findings: We assessed 47 pairs of isolates. Whole-genome sequencing identified 33 cases with little genetic distance (0-6 SNPs) between strains, deemed relapses, and three cases for which the genetic distance ranged from 1306 to 1419 SNPs, deemed re-infections. Six cases of relapse and six cases of mixed infection were classified differently by whole-genome sequencing and MIRU-VNTR. We detected five single positive isolates (positive culture followed by at least two negative cultures) without clinical evidence of disease.
Interpretation: Whole-genome sequencing enables the differentiation of relapse and re-infection cases with greater resolution than do genotyping methods used at present, such as MIRU-VNTR, and provides insights into the biology of recurrence. The additional clarity provided by whole-genome sequencing might have a role in defining endpoints for clinical trials.
Funding: Wellcome Trust, European Union, Medical Research Council, Global Alliance for TB Drug Development, European and Developing Country Clinical Trials Partnership.
Funded by: Medical Research Council; Wellcome Trust: 098051
The Lancet. Respiratory medicine 2013;1;10;786-92
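The relapse versus re-infection call described in the abstract above rests on the SNP distance between the baseline and recurrence isolates. A minimal Python sketch of that idea follows. The cut-offs simply bracket the ranges reported in the abstract (0-6 SNPs and 1306-1419 SNPs); the function names, the "indeterminate" category, and the use of pre-aligned sequences are assumptions made for illustration and are not the authors' pipeline.

```python
# Hypothetical cut-offs that bracket the ranges reported in the abstract.
RELAPSE_MAX_SNPS = 6          # relapses showed 0-6 SNPs
REINFECTION_MIN_SNPS = 1306   # re-infections showed 1306-1419 SNPs


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two pre-aligned sequences of equal length.

    Positions where either sequence holds an ambiguous base (anything
    other than A, C, G or T) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def classify_recurrence(distance: int) -> str:
    """Label a baseline/recurrence pair from its SNP distance."""
    if distance < 0:
        raise ValueError("SNP distance cannot be negative")
    if distance <= RELAPSE_MAX_SNPS:
        return "relapse"
    if distance >= REINFECTION_MIN_SNPS:
        return "re-infection"
    # Distances between the two reported ranges are left undecided here.
    return "indeterminate"
```

In practice the distance would come from variant calls against a reference genome rather than from raw string comparison; the string version is only there to keep the sketch self-contained.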
Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data.
Background: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate knowledge of the rate of change in the genome over relevant timescales is required.
Methods: We attempted to estimate a molecular clock by sequencing 199 isolates from epidemiologically linked tuberculosis cases, collected in the Netherlands spanning almost 16 years.
Results: Multiple analyses support an average mutation rate of ~0.3 SNPs per genome per year. However, all analyses revealed a very high degree of variation around this mean, making the confirmation of links proposed by epidemiology, and inference of novel links, difficult. Despite this, in some cases, the phylogenetic context of other strains provided evidence supporting the confident exclusion of previously inferred epidemiological links.
Conclusions: This in-depth analysis of the molecular clock revealed that it is slow and variable over short time scales, which limits its usefulness in transmission studies. However, the superior resolution of whole genome sequencing can provide the phylogenetic context to allow the confident exclusion of possible transmission events previously inferred via traditional DNA fingerprinting techniques and epidemiological cluster investigation. Despite the slow generation of variation even at the whole genome level we conclude that the investigation of tuberculosis transmission will benefit greatly from routine whole genome sequencing.
Funded by: Medical Research Council; Wellcome Trust: 098051
BMC infectious diseases 2013;13;110
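To give a feel for what a clock of about 0.3 SNPs per genome per year implies, the Python sketch below computes the expected number of SNPs over a given span of evolution and the chance of seeing at most k SNPs. The Poisson model is an assumption introduced here for illustration only; the abstract itself stresses that variation around the mean rate is very high, so a simple Poisson clock is likely to understate the spread.

```python
import math

# Mean rate reported in the abstract (SNPs per genome per year).
MEAN_RATE = 0.3


def expected_snps(years: float, rate: float = MEAN_RATE) -> float:
    """Expected SNP count after `years` of evolution at a constant rate."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return rate * years


def prob_at_most(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean); the Poisson model is an assumption."""
    if k < 0:
        return 0.0
    return sum(
        math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1)
    )
```

For example, `expected_snps(10)` gives 3.0, and `prob_at_most(0, 3.0)` is the probability under this simplified model that two isolates separated by ten years of evolution are still identical.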
Discovery by the Epistasis Project of an epistatic interaction between the GSTM3 gene and the HHEX/IDE/KIF11 locus in the risk of Alzheimer's disease.
Human Genetics, School of Molecular Medical Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, UK.
Despite recent discoveries in the genetics of sporadic Alzheimer's disease, there remains substantial "hidden heritability." It is thought that some of this missing heritability may be because of gene-gene, i.e., epistatic, interactions. We examined potential epistasis between 110 candidate polymorphisms in 1757 cases of Alzheimer's disease and 6294 control subjects of the Epistasis Project, divided between a discovery and a replication dataset. We found an epistatic interaction, between rs7483 in GSTM3 and rs1111875 in the HHEX/IDE/KIF11 gene cluster, with a closely similar, significant result in both datasets. The synergy factor (SF) in the combined dataset was 1.79, 95% confidence interval [CI], 1.35-2.36; p = 0.00004. Consistent interaction was also found in 7 out of the 8 additional subsets that we examined post hoc: i.e., it was shown in both North Europe and North Spain, in both men and women, in both those with and without the ε4 allele of apolipoprotein E, and in people older than 75 years (SF, 2.27; 95% CI, 1.60-3.20; p < 0.00001), but not in those younger than 75 years (SF, 1.06; 95% CI, 0.59-1.91; p = 0.84). The association with Alzheimer's disease was purely epistatic with neither polymorphism showing an independent effect: odds ratio, 1.0; p ≥ 0.7. Indeed, each factor was associated with protection in the absence of the other factor, but with risk in its presence. In conclusion, this epistatic interaction showed a high degree of consistency when stratifying by sex, the ε4 allele of apolipoprotein E genotype, and geographic region.
Funded by: Department of Health; Medical Research Council: G0400546
Neurobiology of aging 2013;34;4;1309.e1-7
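The synergy factor quoted in the abstract above is commonly defined as the odds ratio for carrying both factors divided by the product of the odds ratios for each factor alone, all taken relative to carriers of neither. The Python sketch below follows that definition. The counts in the usage note are invented for illustration and are not data from the Epistasis Project, and the function names are hypothetical.

```python
def odds_ratio(cases: float, controls: float,
               ref_cases: float, ref_controls: float) -> float:
    """Odds ratio of an exposed group relative to a reference group."""
    return (cases / controls) / (ref_cases / ref_controls)


def synergy_factor(ref: tuple, a_only: tuple,
                   b_only: tuple, both: tuple) -> float:
    """Synergy factor SF = OR_both / (OR_a * OR_b).

    Each argument is a (cases, controls) pair for one genotype group:
    neither factor, factor A only, factor B only, and both factors.
    SF > 1 suggests a positive interaction on the multiplicative scale.
    """
    or_a = odds_ratio(*a_only, *ref)
    or_b = odds_ratio(*b_only, *ref)
    or_both = odds_ratio(*both, *ref)
    return or_both / (or_a * or_b)
```

With made-up counts `ref=(100, 100)`, `a_only=(50, 100)`, `b_only=(50, 100)` and `both=(100, 100)`, each factor alone looks protective (odds ratio 0.5) while the combination is neutral, giving a synergy factor of 4.0. This mirrors the qualitative pattern in the abstract, where each factor was protective in the absence of the other.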
A full-length recombinant Plasmodium falciparum PfRH5 protein induces inhibitory antibodies that are effective across common PfRH5 genetic variants.
Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
The lack of an effective licensed vaccine remains one of the most significant gaps in the portfolio of tools being developed to eliminate Plasmodium falciparum malaria. Vaccines targeting erythrocyte invasion - an essential step for both parasite development and malaria pathogenesis - have faced the particular challenge of genetic diversity. Immunity-driven balancing selection pressure on parasite invasion proteins often results in the presence of multiple, antigenically distinct, variants within a population, leading to variant-specific immune responses. Such variation makes it difficult to design a vaccine that covers the full range of diversity, and could potentially facilitate the evolution of vaccine-resistant parasite strains. In this study, we investigate the effect of genetic diversity on invasion inhibition by antibodies to a high priority P. falciparum invasion candidate antigen, P. falciparum Reticulocyte Binding Protein Homologue 5 (PfRH5). Previous work has shown that virally delivered PfRH5 can induce antibodies that protect against a wide range of genetic variants. Here, we show that a full-length recombinant PfRH5 protein expressed in mammalian cells is biochemically active, as judged by saturable binding to its receptor, basigin, and is able to induce antibodies that strongly inhibit P. falciparum growth and invasion. Whole genome sequencing of 290 clinical P. falciparum isolates from across the world identifies only five non-synonymous PfRH5 SNPs that are present at frequencies of 10% or more in at least one geographical region. Antibodies raised against the 3D7 variant of PfRH5 were able to inhibit nine different P. falciparum strains, which between them included all of the five most common PfRH5 SNPs in this dataset, with no evidence for strain-specific immunity. We conclude that protein-based PfRH5 vaccines are an urgent priority for human efficacy trials.
Funded by: Medical Research Council: G19/9; Wellcome Trust: 090532, 098051
Large-scale association analysis identifies new risk loci for coronary artery disease.
Coronary artery disease (CAD) is the commonest cause of death. Here, we report an association analysis in 63,746 CAD cases and 130,681 controls identifying 15 loci reaching genome-wide significance, taking the number of susceptibility loci for CAD to 46, and a further 104 independent variants (r² < 0.2) strongly associated with CAD at a 5% false discovery rate (FDR). Together, these variants explain approximately 10.6% of CAD heritability. Of the 46 genome-wide significant lead SNPs, 12 show a significant association with a lipid trait, and 5 show a significant association with blood pressure, but none is significantly associated with diabetes. Network analysis with 233 candidate genes (loci at 10% FDR) generated 5 interaction networks comprising 85% of these putative genes involved in CAD. The four most significant pathways mapping to these networks are linked to lipid metabolism and inflammation, underscoring the causal role of these activities in the genetic etiology of CAD. Our study provides insights into the genetic basis of CAD and identifies key biological pathways.
Funded by: British Heart Foundation: PG/08/094/26019, RG/08/014/24067, RG/09/012/28096; Medical Research Council: G0601261, G0601966, G0700931, G0801566, MC_U137686857, MR/L003120/1; NHLBI NIH HHS: K24 HL107643, R00 HL094535, R01 HL111694; NIDDK NIH HHS: R01 DK062370
Nature genetics 2013;45;1;25-33
Pitpnm1 is expressed in hair cells during development but is not required for hearing.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, Cambs, CB10 1SA, United Kingdom.
Deafness is a genetically complex disorder with many contributing genes still unknown. Here we describe the expression of Pitpnm1 in the inner ear. It is expressed in the inner hair cells of the organ of Corti from late embryonic stages until adulthood, and transiently in the outer hair cells during early postnatal stages. Despite this specific expression, Pitpnm1 null mice showed no hearing defects, possibly due to redundancy with the paralogous genes Pitpnm2 and Pitpnm3.
Adaptive changes of the Insig1/SREBP1/SCD1 set point help adipose tissue to cope with increased storage demands of obesity.
University of Cambridge, Metabolic Research Laboratories, Institute of Metabolic Science Addenbrooke's Treatment Centre, Addenbrooke's Hospital, Cambridge, U.K.
The epidemic of obesity imposes unprecedented challenges on human adipose tissue (WAT) storage capacity that may benefit from adaptive mechanisms to maintain adipocyte functionality. Here, we demonstrate that changes in the regulatory feedback set point control of Insig1/SREBP1 represent an adaptive response that preserves WAT lipid homeostasis in obese and insulin-resistant states. In our experiments, we show that Insig1 mRNA expression decreases in WAT from mice with obesity-associated insulin resistance and from morbidly obese humans and in in vitro models of adipocyte insulin resistance. Insig1 downregulation is part of an adaptive response that promotes the maintenance of SREBP1 maturation and facilitates lipogenesis and availability of appropriate levels of fatty acid unsaturation, partially compensating for the antilipogenic effect associated with insulin resistance. We describe for the first time the existence of this adaptive mechanism in WAT, which involves Insig1/SREBP1 and preserves the degree of lipid unsaturation under conditions of obesity-induced insulin resistance. These adaptive mechanisms help to maintain lipid desaturation through preferential SCD1 regulation and facilitate fat storage in WAT, despite on-going metabolic stress.
Funded by: Biotechnology and Biological Sciences Research Council: BB/H002731/1, BB/H013539/1, JF16994; British Heart Foundation: PG/10/38/28359; Medical Research Council: G0802051, MC_G0802535, MC_UP_A090_1006
Adipogenesis: new insights into brown adipose tissue differentiation.
S Carobbio, Wellcome Trust Genome Campus, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.
Confirmation of the presence of functional brown adipose tissue (BAT) in humans has renewed the interest in investigating the potential therapeutic use of this tissue. The finding that its activity positively correlates with decreased BMI, fat content and augmented energy expenditure suggests that increasing BAT mass/activity or browning of WAT could be a strategy to prevent or treat obesity and its associated morbidities. The challenge now is to find a safe and efficient way to develop this idea. Whereas BAT has been widely studied in murine models both in vivo and in vitro, there is an urgent need for human cellular models to investigate BAT physiology and functionality from a molecular point of view. In our review, we focus on the latest insights surrounding BAT development and activation in rodents and humans. Then, we discuss how the availability of murine models has been essential to identify BAT progenitors and trace their lineage. Finally, we address how this information can be exploited to develop human cellular models for BAT differentiation/activation. In this context, human embryonic (hES) and induced pluripotent cells (hIPS)-based cellular models represent a resource of great potential value, as they can provide a virtually inexhaustible supply of starting material for functional genetic studies, -omics based analysis and validation of therapeutic approaches. Moreover, these cells can be easily genetically engineered, opening the possibility of generating patient-specific cellular models, allowing the investigation of the impact of different genetic backgrounds on BAT differentiation in both pathological and physiological states.
Journal of molecular endocrinology 2013
Mutations in GDP-Mannose Pyrophosphorylase B Cause Congenital and Limb-Girdle Muscular Dystrophies Associated with Hypoglycosylation of α-Dystroglycan.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
Congenital muscular dystrophies with hypoglycosylation of α-dystroglycan (α-DG) are a heterogeneous group of disorders often associated with brain and eye defects in addition to muscular dystrophy. Causative variants in 14 genes thought to be involved in the glycosylation of α-DG have been identified thus far. Allelic mutations in these genes might also cause milder limb-girdle muscular dystrophy phenotypes. Using a combination of exome and Sanger sequencing in eight unrelated individuals, we present evidence that mutations in guanosine diphosphate mannose (GDP-mannose) pyrophosphorylase B (GMPPB) can result in muscular dystrophy variants with hypoglycosylated α-DG. GMPPB catalyzes the formation of GDP-mannose from GTP and mannose-1-phosphate. GDP-mannose is required for O-mannosylation of proteins, including α-DG, and it is the substrate of cytosolic mannosyltransferases. We found reduced α-DG glycosylation in the muscle biopsies of affected individuals and in available fibroblasts. Overexpression of wild-type GMPPB in fibroblasts from an affected individual partially restored glycosylation of α-DG. Whereas wild-type GMPPB localized to the cytoplasm, five of the identified missense mutations caused formation of aggregates in the cytoplasm or near membrane protrusions. Additionally, knockdown of the GMPPB ortholog in zebrafish caused structural muscle defects with decreased motility, eye abnormalities, and reduced glycosylation of α-DG. Together, these data indicate that GMPPB mutations are responsible for congenital and limb-girdle muscular dystrophies with hypoglycosylation of α-DG.
American journal of human genetics 2013
Use of Vitek 2 Antimicrobial Susceptibility Profile To Identify mecC in Methicillin-Resistant Staphylococcus aureus.
Department of Medicine, University of Cambridge, Cambridge, United Kingdom.
The emergence of mecC methicillin-resistant Staphylococcus aureus (MRSA) poses a diagnostic challenge for clinical microbiology laboratories. Using the Vitek 2 system, we tested a panel of 896 Staphylococcus aureus isolates and found that an oxacillin-sensitive/cefoxitin-resistant profile had a sensitivity of 88.7% and a specificity of 99.5% for the identification of mecC MRSA isolates. The presence of the mecC gene, determined by bacterial whole-genome sequencing, was used as the gold standard. This profile could provide a zero-cost screening method for identification of mecC-positive MRSA strains.
Journal of clinical microbiology 2013;51;8;2732-4
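The sensitivity and specificity figures in the abstract above compare the Vitek 2 profile against whole-genome sequencing as the gold standard. A minimal Python sketch of how such figures are derived from a two-by-two table is given below. The abstract does not report the underlying counts, so the numbers in the usage note are purely hypothetical and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of gold-standard positives that the screen flags."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no gold-standard positives")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of gold-standard negatives that the screen clears."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no gold-standard negatives")
    return true_neg / total
```

With hypothetical counts of 9 true positives, 1 false negative, 95 true negatives and 5 false positives, `sensitivity(9, 1)` is 0.9 and `specificity(95, 5)` is 0.95.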
Phosphoproteomics data classify hematological cancer cell lines according to tumor type and sensitivity to kinase inhibitors.
Analytical Signalling Group, Centre for Cell Signalling, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1B 6BQ, UK.
BACKGROUND: Tumor classification based on their predicted responses to kinase inhibitors is a major goal for advancing targeted personalized therapies. Here, we used a phosphoproteomic approach to investigate biological heterogeneity across hematological cancer cell lines including acute myeloid leukemia, lymphoma, and multiple myeloma. RESULTS: Mass spectrometry was used to quantify 2,000 phosphorylation sites across three acute myeloid leukemia, three lymphoma, and three multiple myeloma cell lines in six biological replicates. The intensities of the phosphorylation sites grouped these cancer cell lines according to their tumor type. In addition, a phosphoproteomic analysis of seven acute myeloid leukemia cell lines revealed a battery of phosphorylation sites whose combined intensities correlated with the growth-inhibitory responses to three kinase inhibitors with remarkable correlation coefficients and fold changes (> 100 between the most resistant and sensitive cells). Modeling based on regression analysis indicated that a subset of phosphorylation sites could be used to predict response to the tested drugs. Quantitative analysis of phosphorylation motifs indicated that resistant and sensitive cells differed in their patterns of kinase activities, but, interestingly, phosphorylations correlating with responses were not on members of the pathway being targeted; instead, these mainly were on parallel kinase pathways. CONCLUSION: This study reveals that the information on kinase activation encoded in phosphoproteomics data correlates remarkably well with the phenotypic responses of cancer cells to compounds that target kinase signaling and could be useful for the identification of novel markers of resistance or sensitivity to drugs that target the signaling network.
Genome biology 2013;14;4;R37
Persistence of HIV-1 Transmitted Drug Resistance Mutations.
Medical Research Council Clinical Trials Unit, London, United Kingdom.
There are few data on the persistence of individual human immunodeficiency virus type 1 (HIV-1) transmitted drug resistance (TDR) mutations in the absence of selective drug pressure. We studied 313 patients in whom TDR mutations were detected at their first resistance test and who had a subsequent test performed while ART-naive. The rate at which mutations became undetectable was estimated using exponential regression accounting for interval censoring. Most thymidine analogue mutations (TAMs) and T215 revertants (but not T215F/Y) were found to be highly stable, with NNRTI and PI mutations being relatively less persistent. Our estimates are important for informing HIV transmission models.
The Journal of infectious diseases 2013;208;9;1459-63
Comprehensive assignment of roles for Salmonella Typhimurium genes in intestinal colonization of food-producing animals.
Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.
Chickens, pigs, and cattle are key reservoirs of Salmonella enterica, a foodborne pathogen of worldwide importance. Though a decade has elapsed since publication of the first Salmonella genome, thousands of genes remain of hypothetical or unknown function, and the basis of colonization of reservoir hosts is ill-defined. Moreover, previous surveys of the role of Salmonella genes in vivo have focused on systemic virulence in murine typhoid models, and the genetic basis of intestinal persistence and thus zoonotic transmission has received little study. We therefore screened pools of random insertion mutants of S. enterica serovar Typhimurium in chickens, pigs, and cattle by transposon-directed insertion-site sequencing (TraDIS). The identity and relative fitness in each host of 7,702 mutants were simultaneously assigned by massively parallel sequencing of transposon-flanking regions. Phenotypes were assigned to 2,715 different genes, providing a phenotype-genotype map of unprecedented resolution. The data are self-consistent in that multiple independent mutations in a given gene or pathway were observed to exert a similar fitness cost. Phenotypes were further validated by screening defined null mutants in chickens. Our data indicate that a core set of genes is required for infection of all three host species, and smaller sets of genes may mediate persistence in specific hosts. By assigning roles to thousands of Salmonella genes in key reservoir hosts, our data facilitate systems approaches to understand pathogenesis and the rational design of novel cross-protective vaccines and inhibitors. Moreover, by simultaneously assigning the genotype and phenotype of over 90% of mutants screened in complex pools, our data establish TraDIS as a powerful tool to apply rich functional annotation to microbial genomes with minimal animal use.
PLoS genetics 2013;9;4;e1003456
Mcph1-deficient mice reveal a role for MCPH1 in otitis media.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
Otitis media is a common reason for hearing loss, especially in children. Otitis media is a multifactorial disease and environmental factors, anatomic dysmorphology and genetic predisposition can all contribute to its pathogenesis. However, the reasons for the variable susceptibility to otitis media are elusive. MCPH1 mutations cause primary microcephaly in humans. So far, no hearing impairment has been reported either in the MCPH1 patients or mouse models with Mcph1 deficiency. In this study, Mcph1-deficient (Mcph1(tm1a/tm1a)) mice were produced using embryonic stem cells with a targeted mutation by the Sanger Institute's Mouse Genetics Project. Auditory brainstem response measurements revealed that Mcph1(tm1a/tm1a) mice had mild to moderate hearing impairment with around 70% penetrance. We found otitis media with effusion in the hearing-impaired Mcph1(tm1a/tm1a) mice by anatomic and histological examinations. Expression of Mcph1 in the epithelial cells of middle ear cavities supported its involvement in the development of otitis media. Other defects of Mcph1(tm1a/tm1a) mice included small skull sizes, increased micronuclei in red blood cells, increased B cells and ocular abnormalities. These findings not only recapitulated the defects found in other Mcph1-deficient mice or MCPH1 patients, but also revealed an unexpected phenotype, otitis media with hearing impairment, which suggests Mcph1 is a new gene underlying genetic predisposition to otitis media.
Funded by: Cancer Research UK: 12401, 13031, C20510/A13031; Medical Research Council: G0300212, MC_QA137918; NEI NIH HHS: P30 EY019007, R01 EY018213; Wellcome Trust: 098051, 100669
PloS one 2013;8;3;e58156
Proteomic comparison of historic and recently emerged hypervirulent Clostridium difficile strains.
Department of Population Medicine and Diagnostic Sciences, Cornell University, Ithaca, New York 14853, United States.
Clostridium difficile in recent years has undergone rapid evolution and has emerged as a serious human pathogen. Proteomic approaches can improve the understanding of the diversity of this important pathogen, especially in comparing the adaptive ability of different C. difficile strains. In this study, TMT labeling and nanoLC-MS/MS driven proteomics were used to investigate the responses of four C. difficile strains to nutrient shift and osmotic shock. We detected 126 and 67 differentially expressed proteins in at least one strain under nutrient shift and osmotic shock, respectively. During nutrient shift, several components of the phosphotransferase system (PTS) were found to be differentially expressed, which indicated that the carbon catabolite repression (CCR) was relieved to allow the expression of enzymes and transporters responsible for the utilization of alternate carbon sources. Some classical osmotic shock associated proteins, such as GroEL, RecA, CspG, and CspF, and other stress proteins such as PurG and SerA were detected during osmotic shock. Furthermore, the recently emerged strains were found to contain a more robust gene network in response to both stress conditions. This work represents the first comparative proteomic analysis of historic and recently emerged hypervirulent C. difficile strains, complementing the previously published proteomics studies utilizing only one reference strain.
Funded by: NCRR NIH HHS: S10 RR025449
Journal of proteome research 2013;12;3;1151-61
Your gut microbiota are what you eat.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2013;12;1;8
Elucidating emergence and transmission of multidrug-resistant tuberculosis in treatment experienced patients by whole genome sequencing.
Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom ; Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, United Kingdom.
Background: Understanding the emergence and spread of multidrug-resistant tuberculosis (MDR-TB) is crucial for its control. MDR-TB in previously treated patients is generally attributed to the selection of drug resistant mutants during inadequate therapy rather than transmission of a resistant strain. Traditional genotyping methods are not sufficient to distinguish strains in populations with a high burden of tuberculosis and it has previously been difficult to assess the degree of transmission in these settings. We have used whole genome analysis to investigate M. tuberculosis strains isolated from treatment experienced patients with MDR-TB in Uganda over a period of four years. We used high throughput genome sequencing technology to investigate small polymorphisms and large deletions in 51 Mycobacterium tuberculosis samples from 41 treatment-experienced TB patients attending a TB referral and treatment clinic in Kampala. This was a convenience sample representing 69% of MDR-TB cases identified over the four year period. Low polymorphism was observed in longitudinal samples from individual patients (2-15 SNPs). Clusters of samples with fewer than 50 SNPs variation were examined. Three clusters comprising a total of 8 patients were found with almost identical genetic profiles, including mutations predictive for resistance to rifampicin and isoniazid, suggesting transmission of MDR-TB. Two patients with previous drug susceptible disease were found to have acquired MDR strains, one of which shared its genotype with an isolate from another patient in the cohort. Conclusions: Whole genome sequence analysis identified MDR-TB strains that were shared by more than one patient. The transmission of multidrug-resistant disease in this cohort of retreatment patients emphasises the importance of early detection and need for infection control. Consideration should be given to rapid testing for drug resistance in patients undergoing treatment to monitor the emergence of resistance and permit early intervention to avoid onward transmission.
PloS one 2013;8;12;e83012
Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling.
Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.
BACKGROUND: The Amoebozoa constitute one of the primary divisions of eukaryotes, encompassing taxa of both biomedical and evolutionary importance, yet its genomic diversity remains largely unsampled. Here we present an analysis of a whole genome assembly of Acanthamoeba castellanii (Ac), the first representative from a solitary free-living amoebozoan. RESULTS: Ac encodes 15,455 compact intron-rich genes, a significant number of which are predicted to have arisen through inter-kingdom lateral gene transfer (LGT). A majority of the LGT candidates have undergone a substantial degree of intronization and Ac appears to have incorporated them into established transcriptional programs. Ac manifests a complex signaling and cell communication repertoire, including a complete tyrosine kinase signaling toolkit and a comparable diversity of predicted extracellular receptors to that found in the facultatively multicellular dictyostelids. An important environmental host of a diverse range of bacteria and viruses, Ac utilizes a diverse repertoire of predicted pattern recognition receptors, many with predicted orthologous functions in the innate immune systems of higher organisms. CONCLUSIONS: Our analysis highlights the important role of LGT in the biology of Ac and in the diversification of microbial eukaryotes. The early evolution of a key signaling facility implicated in the evolution of metazoan multicellularity strongly argues for its emergence early in the Unikont lineage. Overall, the availability of an Ac genome should aid in deciphering the biology of the Amoebozoa and facilitate functional genomic studies in this important model organism and environmental host.
Genome biology 2013;14;2;R11
Identification of seven loci affecting mean telomere length and their association with disease.
Department of Cardiovascular Sciences, University of Leicester, Leicester, UK.
Interindividual variation in mean leukocyte telomere length (LTL) is associated with cancer and several age-associated diseases. We report here a genome-wide meta-analysis of 37,684 individuals with replication of selected variants in an additional 10,739 individuals. We identified seven loci, including five new loci, associated with mean LTL (P < 5 × 10⁻⁸). Five of the loci contain candidate genes (TERC, TERT, NAF1, OBFC1 and RTEL1) that are known to be involved in telomere biology. Lead SNPs at two loci (TERC and TERT) associate with several cancers and other diseases, including idiopathic pulmonary fibrosis. Moreover, a genetic risk score analysis combining lead variants at all 7 loci in 22,233 coronary artery disease cases and 64,762 controls showed an association of the alleles associated with shorter LTL with increased risk of coronary artery disease (21% (95% confidence interval, 5-35%) per standard deviation in LTL, P = 0.014). Our findings support a causal role of telomere-length variation in some age-related diseases.
Funded by: British Heart Foundation: RG/08/014/24067; Medical Research Council: G0902313; NIDA NIH HHS: R56 DA012854
Nature genetics 2013;45;4;422-7, 427e1-2
Real-time genomic epidemiological evaluation of human campylobacter isolates by use of whole-genome multilocus sequence typing.
Department of Zoology, University of Oxford, Oxford, United Kingdom.
Sequence-based typing is essential for understanding the epidemiology of Campylobacter infections, a major worldwide cause of bacterial gastroenteritis. We demonstrate the practical and rapid exploitation of whole-genome sequencing to provide routine definitive characterization of Campylobacter jejuni and Campylobacter coli for clinical and public health purposes. Short-read data from 384 Campylobacter clinical isolates collected over 4 months in Oxford, United Kingdom, were assembled de novo. Contigs were deposited at the pubMLST.org/campylobacter website and automatically annotated for 1,667 loci. Typing and phylogenetic information was extracted and comparative analyses were performed for various subsets of loci, up to the level of the whole genome, using the Genome Comparator and Neighbor-net algorithms. The assembled sequences (for 379 isolates) were diverse and resembled collections from previous studies of human campylobacteriosis. Small subsets of very closely related isolates originated mainly from repeated sampling from the same patients and, in one case, likely laboratory contamination. Much of the within-patient variation occurred in phase-variable genes. Clinically and epidemiologically informative data can be extracted from whole-genome sequence data in real time with straightforward, publicly available tools. These analyses are highly scalable, are transparent, do not require closely related genome reference sequences, and provide improved resolution (i) among Campylobacter clonal complexes and (ii) between very closely related isolates. Additionally, these analyses rapidly differentiated unrelated isolates, allowing the detection of single-strain clusters. The approach is widely applicable to analyses of human bacterial pathogens in real time in clinical laboratories, with little specialist training required.
Journal of clinical microbiology 2013;51;8;2526-34
Two Pfam protein families characterized by a crystal structure of protein lpg2210 from Legionella pneumophila.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. firstname.lastname@example.org.
Background: Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology.
Results: We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria.
Conclusions: Based on our analyses we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity with peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.
Funded by: Howard Hughes Medical Institute; Intramural NIH HHS; Medical Research Council: MC_U105192716; NIGMS NIH HHS: P41GM103393, R01GM101457, U54 GM094586; Wellcome Trust: WT077044/Z/05/Z
BMC bioinformatics 2013;14;265
Toward knowledge support for analysis and interpretation of complex traits.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. email@example.com.
The systematic description of complex traits, from the organism to the cellular level, is important for hypothesis generation about underlying disease mechanisms. We discuss how intelligent algorithms might provide support, leading to faster throughput.
Genome biology 2013;14;9;214
Learning to Recognize Phenotype Candidates in the Auto-Immune Literature Using SVM Re-Ranking.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom ; National Institute of Informatics, Tokyo, Japan.
The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for bio-medical text mining. Any progress will support knowledge discovery and linkage to other resources. However, because of their wide variation, a number of challenges still remain in terms of their identification and semantic normalisation before they can be fully exploited for research purposes. This paper presents novel techniques for identifying potential complex phenotype mentions by exploiting a hybrid model based on machine learning, rules and dictionary matching. A systematic study is made of how to combine sequence labels from these modules as well as the merits of various ontological resources. We evaluated our approach on a subset of Medline abstracts cited by the Online Mendelian Inheritance in Man database related to auto-immune diseases. Using partial matching the best micro-averaged F-score for phenotypes and five other entity classes was 79.9%. A best performance of 75.3% was achieved for phenotype candidates using all semantic resources. We observed the advantage of using SVM-based learn-to-rank for sequence label combination over maximum entropy and a priority list approach. The results indicate that the identification of simple entity types such as chemicals and genes is robustly supported by single semantic resources, whereas phenotypes require combinations. Altogether we conclude that our approach coped well with the compositional structure of phenotypes in the auto-immune domain.
PloS one 2013;8;10;e72965
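The micro-averaged F-score quoted in the abstract above pools true positives, false positives and false negatives across all entity classes before computing precision and recall. A minimal sketch of that calculation follows; the function name and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def micro_f_score(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 over entity classes.

    counts maps a class name to (true positives, false positives,
    false negatives). Counts are summed over classes first, so
    frequent classes weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not from the paper).
example_counts = {"phenotype": (80, 20, 30), "gene": (90, 10, 10)}
print(round(micro_f_score(example_counts), 3))
```

A macro-averaged score would instead compute one F-score per class and average them, which gives rare classes the same weight as common ones.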
Genomic and proteomic dissection of the ubiquitous plant pathogen, Armillaria mellea: toward a new infection model system.
Department of Biology, National University of Ireland Maynooth, Maynooth, Co Kildare, Ireland.
Armillaria mellea is a major plant pathogen. Yet, no large-scale "-omics" data are available to enable new studies, and limited experimental models are available to investigate basidiomycete pathogenicity. Here we reveal that the A. mellea genome comprises 58.35 Mb, contains 14473 gene models, of average length 1575 bp (4.72 introns/gene). Tandem mass spectrometry identified 921 mycelial (n = 629 unique) and secreted (n = 183 unique) proteins. Almost 100 mycelial proteins were either species-specific or previously unidentified at the protein level. A number of proteins (n = 111) was detected in both mycelia and culture supernatant extracts. Signal sequence occurrence was 4-fold greater for secreted (50.2%) compared to mycelial (12%) proteins. Analyses revealed a rich reservoir of carbohydrate degrading enzymes, laccases, and lignin peroxidases in the A. mellea proteome, reminiscent of both basidiomycete and ascomycete glycodegradative arsenals. We discovered that A. mellea exhibits a specific killing effect against Candida albicans during coculture. Proteomic investigation of this interaction revealed the unique expression of defensive and potentially offensive A. mellea proteins (n = 30). Overall, our data reveal new insights into the origin of basidiomycete virulence and we present a new model system for further studies aimed at deciphering fungal pathogenic mechanisms.
Journal of proteome research 2013;12;6;2552-70
Small effective population size and genetic homogeneity in the Val Borbera isolate.
Institute of Genetics and Biophysics 'A. Buzzati-Traverso', National Research Council (CNR), Naples, Italy. firstname.lastname@example.org
Population isolates are a valuable resource for medical genetics because of their reduced genetic, phenotypic and environmental heterogeneity. Further, extended linkage disequilibrium (LD) allows accurate haplotyping and imputation. In this study, we use nuclear and mitochondrial DNA data to determine to what extent the geographically isolated population of the Val Borbera valley also presents features of genetic isolation. We performed a comparative analysis of population structure and estimated effective population size exploiting LD data. We also evaluated haplotype sharing through the analysis of segments of autozygosity. Our findings reveal that the valley has features characteristic of a genetic isolate, including reduced genetic heterogeneity and reduced effective population size. We show that this population has been subject to prolonged genetic drift and thus we expect many variants that are rare in the general population to reach significant frequency values in the valley, making this population suitable for the identification of rare variants underlying complex traits.
European journal of human genetics : EJHG 2013;21;1;89-94
Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans.
[1] Genomics and Health Unit, Centre for Public Health Research (CSISP-FISABIO), Valencia, Spain. [2] CIBER (Centros de Investigación Biomédica en Red) in Epidemiology and Public Health, Barcelona, Spain.
Tuberculosis caused 20% of all human deaths in the Western world between the seventeenth and nineteenth centuries and remains a cause of high mortality in developing countries. In analogy to other crowd diseases, the origin of human tuberculosis has been associated with the Neolithic Demographic Transition, but recent studies point to a much earlier origin. We analyzed the whole genomes of 259 M. tuberculosis complex (MTBC) strains and used this data set to characterize global diversity and to reconstruct the evolutionary history of this pathogen. Coalescent analyses indicate that MTBC emerged about 70,000 years ago, accompanied migrations of anatomically modern humans out of Africa and expanded as a consequence of increases in human population density during the Neolithic period. This long coevolutionary history is consistent with MTBC displaying characteristics indicative of adaptation to both low and high host densities.
Funded by: Medical Research Council: MC_U117581288, MC_U117588500, U.1175.02.002.00015.01, U117581288; NIAID NIH HHS: AI090928 AND, R01 AI090928; PHS HHS: HHSN266200700022C; Wellcome Trust: 089276, 098051
Nature genetics 2013;45;10;1176-82
Epigenetic regulation of COL15A1 in smooth muscle cell replicative aging and atherosclerosis.
Department of Medicine and Division of Cardiovascular Medicine and.
Smooth muscle cell (SMC) proliferation is a hallmark of vascular injury and disease. Global hypomethylation occurs during SMC proliferation in culture and in vivo during neointimal formation. Regardless of the programmed or stochastic nature of hypomethylation, identifying these changes is important in understanding vascular disease, as maintenance of a cell's epigenetic profile is essential for maintaining cellular phenotype. Global hypomethylation of proliferating aortic SMCs and concomitant decrease of DNMT1 expression were identified in culture during passage. An epigenome screen identified regions of the genome that were hypomethylated during proliferation and a region containing Collagen, type XV, alpha 1 (COL15A1) was selected by 'genomic convergence' for characterization. COL15A1 transcript and protein levels increased with passage-dependent decreases in DNA methylation and the transcript was sensitive to treatment with 5-Aza-2'-deoxycytidine, suggesting DNA methylation-mediated gene expression. Phenotypically, knockdown of COL15A1 increased SMC migration and decreased proliferation and Col15a1 expression was induced in an atherosclerotic lesion and localized to the atherosclerotic cap. A sequence variant in COL15A1 that is significantly associated with atherosclerosis (rs4142986, P = 0.017, OR = 1.434) was methylated and methylation of the risk allele correlated with decreased gene expression and increased atherosclerosis in human aorta. In summary, hypomethylation of COL15A1 occurs during SMC proliferation and the consequent increased gene expression may impact SMC phenotype and atherosclerosis formation. Hypomethylated genes, such as COL15A1, provide evidence for concomitant epigenetic regulation and genetic susceptibility, and define a class of causal targets that sit at the intersection of genetic and epigenetic predisposition in the etiology of complex disease.
Funded by: NHLBI NIH HHS: HL073389, HL73042, K99/R00HL089412; NIA NIH HHS: AG028716, P30 AG028716
Human molecular genetics 2013;22;25;5107-20
Detailed molecular characterisation of acute myeloid leukaemia with a normal karyotype using targeted DNA capture.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Advances in sequencing technologies are giving unprecedented insights into the spectrum of somatic mutations underlying acute myeloid leukaemia with a normal karyotype (AML-NK). It is clear that the prognosis of individual patients is strongly influenced by the combination of mutations in their leukaemia and that many leukaemias are composed of multiple subclones, with differential susceptibilities to treatment. Here, we describe a method, employing targeted capture coupled with next-generation sequencing and tailored bioinformatic analysis, for the simultaneous study of 24 genes recurrently mutated in AML-NK. Mutational analysis was performed using open source software and an in-house script (Mutation Identification and Analysis Software), which identified dominant clone mutations with 100% specificity. In each of seven cases of AML-NK studied, we identified and verified mutations in 2-4 genes in the main leukaemic clone. Additionally, high sequencing depth enabled us to identify putative subclonal mutations and detect leukaemia-specific mutations in DNA from remission marrow. Finally, we used normalised read depths to detect copy number changes and identified and subsequently verified a tandem duplication of exons 2-9 of MLL and at least one deletion involving PTEN. This methodology reliably detects sequence and copy number mutations, and can thus greatly facilitate the classification, clinical research, diagnosis and management of AML-NK.
Funded by: Wellcome Trust: 079249, 095663, 100140
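The abstract above mentions using normalised read depths to detect copy number changes. One common way to do this, sketched here under stated assumptions, is to scale each region's depth by the sample total and compare it with a control sample as a log2 ratio. The function name, region labels and depths are hypothetical; this is not the authors' in-house script.

```python
import math
from typing import Dict


def log2_depth_ratios(sample: Dict[str, float],
                      control: Dict[str, float]) -> Dict[str, float]:
    """Log2 ratio of normalised read depth, sample versus control.

    Each region's depth is divided by the total depth of its own
    sample, which removes differences in overall sequencing yield.
    Values near 0 suggest no change, positive values suggest a gain
    and negative values a loss.
    """
    sample_total = sum(sample.values())
    control_total = sum(control.values())
    return {
        region: math.log2((sample[region] / sample_total)
                          / (control[region] / control_total))
        for region in sample
    }


# Hypothetical depths for four captured regions (illustration only).
control_depth = {"r1": 100.0, "r2": 100.0, "r3": 100.0, "r4": 100.0}
sample_depth = {"r1": 100.0, "r2": 100.0, "r3": 100.0, "r4": 200.0}
for region, ratio in log2_depth_ratios(sample_depth, control_depth).items():
    print(region, round(ratio, 2))
```

Note that dividing by the sample total shifts unchanged regions slightly below zero when another region is gained, so a real pipeline would normally use a more robust normaliser, such as the median depth, and calibrated thresholds.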
Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease.
Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK, cooperDN@cardiff.ac.uk.
Some individuals with a particular disease-causing mutation or genotype fail to express most if not all features of the disease in question, a phenomenon that is known as 'reduced (or incomplete) penetrance'. Reduced penetrance is not uncommon; indeed, there are many known examples of 'disease-causing mutations' that fail to cause disease in at least a proportion of the individuals who carry them. Reduced penetrance may therefore explain not only why genetic diseases are occasionally transmitted through unaffected parents, but also why healthy individuals can harbour quite large numbers of potentially disadvantageous variants in their genomes without suffering any obvious ill effects. Reduced penetrance can be a function of the specific mutation(s) involved or of allele dosage. It may also result from differential allelic expression, copy number variation or the modulating influence of additional genetic variants in cis or in trans. The penetrance of some pathogenic genotypes is known to be age- and/or sex-dependent. Variable penetrance may also reflect the action of unlinked modifier genes, epigenetic changes or environmental factors. At least in some cases, complete penetrance appears to require the presence of one or more genetic variants at other loci. In this review, we summarize the evidence for reduced penetrance being a widespread phenomenon in human genetics and explore some of the molecular mechanisms that may help to explain this enigmatic characteristic of human inherited disease.
Funded by: Wellcome Trust: 098051
Human genetics 2013;132;10;1077-130
Novel Mycobacterium tuberculosis complex isolate from a wild chimpanzee.
Swiss Tropical and Public Health Institute, Basel, Switzerland.
Tuberculosis (TB) is caused by gram-positive bacteria known as the Mycobacterium tuberculosis complex (MTBC). MTBC include several human-associated lineages and several variants adapted to domestic and, more rarely, wild animal species. We report an M. tuberculosis strain isolated from a wild chimpanzee in Côte d'Ivoire that was shown by comparative genomic and phylogenomic analyses to belong to a new lineage of MTBC, closer to the human-associated lineage 6 (also known as M. africanum West Africa 2) than to the other classical animal-associated MTBC strains. These results show that the general view of the genetic diversity of MTBC is limited and support the possibility that other MTBC variants exist, particularly in wild mammals in Africa. Exploring this diversity is crucial to the understanding of the biology and evolutionary history of this widespread infectious disease.
Funded by: NIAID NIH HHS: AI090928; PHS HHS: HHSN266200700022C; Wellcome Trust
Emerging infectious diseases 2013;19;6;969-76
Full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus.
Wellcome Trust Sanger Institute, Hinxton, UK.
A novel betacoronavirus associated with lethal respiratory and renal complications was recently identified in patients from several countries in the Middle East. We report the deep genome sequencing of the virus directly from a patient's sputum sample. Our high-throughput sequencing yielded a substantial depth of genome sequence assembly and showed the minority viral variants in the specimen. Detailed phylogenetic analysis of the virus genome (England/Qatar/2012) revealed its close relationship to European bat coronaviruses circulating among the bat species of the Vespertilionidae family. Molecular clock analysis showed that the 2 human infections of this betacoronavirus in June 2012 (EMC/2012) and September 2012 (England/Qatar/2012) share a common virus ancestor most likely considerably before early 2012, suggesting the human diversity is the result of multiple zoonotic events.
Funded by: Medical Research Council: MR/K006584/1; Wellcome Trust: 093724, 095831
Emerging infectious diseases 2013;19;5;736-42B
Genome-wide association and longitudinal analyses reveal genetic loci linking pubertal height growth, pubertal timing and childhood adiposity.
Institute for Molecular Medicine, and Department of Public Health, University of Helsinki, Helsinki, Finland.
The pubertal height growth spurt is a distinctive feature of childhood growth reflecting both the central onset of puberty and local growth factors. Although little is known about the underlying genetics, growth variability during puberty correlates with adult risks for hormone-dependent cancer and adverse cardiometabolic health. The only gene so far associated with pubertal height growth, LIN28B, pleiotropically influences childhood growth, puberty and cancer progression, pointing to shared underlying mechanisms. To discover genetic loci influencing pubertal height and growth and to place them in context of overall growth and maturation, we performed genome-wide association meta-analyses in 18 737 European samples utilizing longitudinally collected height measurements. We found significant associations (P < 1.67 × 10⁻⁸) at 10 loci, including LIN28B. Five loci associated with pubertal timing, all impacting multiple aspects of growth. In particular, a novel variant correlated with expression of MAPK3, and associated both with increased prepubertal growth and earlier menarche. Another variant near ADCY3-POMC associated with increased body mass index, reduced pubertal growth and earlier puberty. Whereas epidemiological correlations suggest that early puberty marks a pathway from rapid prepubertal growth to reduced final height and adult obesity, our study shows that individual loci associating with pubertal growth have variable longitudinal growth patterns that may differ from epidemiological observations. Overall, this study uncovers part of the complex genetic architecture linking pubertal height growth, the timing of puberty and childhood obesity and provides new information to pinpoint processes linking these traits.
Funded by: British Heart Foundation; Canadian Institutes of Health Research: MOP-82893; Medical Research Council: 74882, G0000934, G0500539, G0600705, G0601653, G9815508, MC_U127592696, MC_UP_A620_1014, MC_UP_A620_1017, MC_UU_12011/1, MC_UU_12013/3, RD1634; NHGRI NIH HHS: U01 HG004423, U01 HG006830; NHLBI NIH HHS: 5R01HL087679-02; NIA NIH HHS: R01 AG041517; NIAAA NIH HHS: AA-08315, AA-09203, AA-12502, K05 AA017688; NICHD NIH HHS: R01 HD056465; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; Wellcome Trust: 068545/Z/02, 076467, 084762MA, 090532, 092731, WT083431MA
Human molecular genetics 2013;22;13;2735-47
Large scale variation in DNA copy number in chicken breeds.
Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, Wageningen 6700 AH, The Netherlands. email@example.com.
Background: Detecting genetic variation is a critical step in elucidating the molecular mechanisms underlying phenotypic diversity. Until recently, such detection has mostly focused on single nucleotide polymorphisms (SNPs) because of the ease in screening complete genomes. Another type of variant, copy number variation (CNV), is emerging as a significant contributor to phenotypic variation in many species. Here we describe a genome-wide CNV study using array comparative genomic hybridization (aCGH) in a wide variety of chicken breeds. Results: We identified 3,154 CNVs, grouped into 1,556 CNV regions (CNVRs). Thirty percent of the CNVs were detected in at least 2 individuals. The average size of the CNVs detected was 46.3 kb with the largest CNV, located on GGAZ, being 4.3 Mb. Approximately 75% of the CNVs are copy number losses relative to the Red Jungle Fowl reference genome. The genome coverage of CNVRs in this study is 60 Mb, which represents almost 5.4% of the chicken genome. In particular, large gene families such as the keratin gene family and the MHC show extensive CNV. Conclusions: A relatively large group of the CNVs are line-specific, several of which were previously shown to be related to the causative mutation for a number of phenotypic variants. The chance that inter-specific CNVs fall into CNVRs detected in chicken is related to the evolutionary distance between the species. Our results provide a valuable resource for the study of genetic and phenotypic variation in this phenotypically diverse species.
BMC genomics 2013;14;398
Identification of Null Alleles and Deletions from SNP Genotypes for an Intercross Between Domestic and Wild Chickens.
Wellcome Trust Sanger Institute.
We analyzed genotypes from ~10K SNPs in two families of an F2 intercross between Red Junglefowl and White Leghorn chickens. Possible null alleles were found by patterns of incompatible and missing genotypes. We estimated that 2.6% of SNPs had null alleles compared to 2.3% with genotyping errors and that 40% of SNPs where a parent and offspring were genotyped as different homozygotes had null alleles. Putative deletions were identified by null alleles at adjacent markers. We found two candidate deletions that were supported by fluorescence intensity data from a 60K SNP chip. One of the candidate deletions was from the Red Junglefowl and one was present in both the Red Junglefowl and White Leghorn. Both candidate deletions spanned protein-coding regions and were close to a previously detected QTL affecting body weight in this population. This study demonstrates that the ~50K SNP genotyping arrays now available for several agricultural species can be used to identify null alleles and deletions in data from large families. We suggest that our approach could be a useful complement to linkage analysis in experimental crosses.
G3 (Bethesda, Md.) 2013
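The abstract above notes that null alleles can show up as incompatible genotypes, for example when a parent and its offspring are called as different homozygotes. A parent called AA that is really A/null can transmit the null allele to an offspring that is then called GG. A minimal sketch of that check follows; the genotype encoding, function names and example pairs are hypothetical and do not reproduce the authors' method.

```python
from typing import Iterable, Optional, Tuple


def is_opposing_homozygote(parent: Optional[str],
                           offspring: Optional[str]) -> bool:
    """True if parent and offspring are called as different homozygotes.

    Genotypes are two-letter strings such as "AA" or "AG"; None marks a
    missing call. Opposing homozygotes break Mendelian inheritance
    unless an undetected (null) allele or a genotyping error is present.
    """
    if parent is None or offspring is None:
        return False
    parent_hom = parent[0] == parent[1]
    offspring_hom = offspring[0] == offspring[1]
    return parent_hom and offspring_hom and parent[0] != offspring[0]


def count_incompatible(
        pairs: Iterable[Tuple[Optional[str], Optional[str]]]) -> int:
    """Count parent-offspring pairs with opposing homozygous calls."""
    return sum(is_opposing_homozygote(p, o) for p, o in pairs)


# Hypothetical parent-offspring calls at one SNP (illustration only).
pairs = [("AA", "GG"), ("AA", "AG"), ("GG", None), ("GG", "AA")]
print(count_incompatible(pairs))
```

This check alone cannot separate null alleles from genotyping errors, which is why the abstract also draws on missing genotypes and adjacent markers.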
A library of functional recombinant cell-surface and secreted P. falciparum merozoite proteins.
Cell Surface Signalling laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, UK;
Malaria, an infectious disease caused by parasites of the Plasmodium genus, is one of the world's major public health concerns causing up to a million deaths annually, mostly because of P. falciparum infections. All of the clinical symptoms are associated with the blood stage of the disease, an obligate part of the parasite life cycle, when a form of the parasite called the merozoite recognizes and invades host erythrocytes. During erythrocyte invasion, merozoites are directly exposed to the host humoral immune system making the blood stage of the parasite a conceptually attractive therapeutic target. Progress in the functional and molecular characterization of P. falciparum merozoite proteins, however, has been hampered by the technical challenges associated with expressing these proteins in a biochemically active recombinant form. This challenge is particularly acute for extracellular proteins, which are the likely targets of host antibody responses, because they contain structurally critical post-translational modifications that are not added by some recombinant expression systems. Here, we report the development of a method that uses a mammalian expression system to compile a protein resource containing the entire ectodomains of 42 P. falciparum merozoite secreted and cell surface proteins, many of which have not previously been characterized. Importantly, we are able to recapitulate known biochemical activities by showing that recombinant MSP1-MSP7 and P12-P41 directly interact, and that both recombinant EBA175 and EBA140 can bind human erythrocytes in a sialic acid-dependent manner. Finally, we use sera from malaria-exposed immune adults to profile the relative immunoreactivity of the proteins and show that the majority of the antigens contain conformational (heat-labile) epitopes. We envisage that this resource of recombinant proteins will make a valuable contribution toward a molecular understanding of the blood stage of P. falciparum infections and facilitate the comparative screening of antigens as blood-stage vaccine candidates.
Funded by: Medical Research Council: MR/J002283/1; Wellcome Trust: 092654, 098051
Molecular & cellular proteomics : MCP 2013;12;12;3976-86
Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.
The University of Queensland, Queensland Brain Institute, Brisbane, Queensland, Australia.
Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.
Nature genetics 2013;45;9;984-94
Population genomics of post-vaccine changes in pneumococcal epidemiology.
Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA.
Whole-genome sequencing of 616 asymptomatically carried Streptococcus pneumoniae isolates was used to study the impact of the 7-valent pneumococcal conjugate vaccine. Comparison of closely related isolates showed the role of transformation in facilitating capsule switching to non-vaccine serotypes and the emergence of drug resistance. However, such recombination was found to occur at significantly different rates across the species, and the evolution of the population was primarily driven by changes in the frequency of distinct genotypes extant before the introduction of the vaccine. These alterations resulted in little overall effect on accessory genome composition at the population level, contrasting with the decrease in pneumococcal disease rates after the vaccine's introduction.
Funded by: NIAID NIH HHS: R01 AI066304, R01AI066304; Wellcome Trust: 098051
Nature genetics 2013;45;6;656-63
Bacterial genomes in epidemiology--present and future.
Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA.
Sequence data are well established in the reconstruction of the phylogenetic and demographic scenarios that have given rise to outbreaks of viral pathogens. The application of similar methods to bacteria has been hindered in the main by the lack of high-resolution nucleotide sequence data from quality samples. Developing and already available genomic methods have greatly increased the amount of data that can be used to characterize an isolate and its relationship to others. However, differences in sequencing platforms and data analysis mean that these enhanced data come with a cost in terms of portability: results from one laboratory may not be directly comparable with those from another. Moreover, genomic data for many bacteria bear the mark of a history including extensive recombination, which has the potential to greatly confound phylogenetic and coalescent analyses. Here, we discuss the exacting requirements of genomic epidemiology, and means by which the distorting signal of recombination can be minimized to permit the leverage of growing datasets of genomic data from bacterial pathogens.
Funded by: NIAID NIH HHS: T32 AI007061; NIGMS NIH HHS: GM088558-01; Wellcome Trust: 098051
Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2013;368;1614;20120202
Dominant role of nucleotide substitution in the diversification of serotype 3 pneumococci over decades and during a single infection.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom ; Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America.
Streptococcus pneumoniae of serotype 3 possess a mucoid capsule and cause disease associated with high mortality rates relative to other pneumococci. Phylogenetic analysis of a complete reference genome and 81 draft sequences from clonal complex 180, the predominant serotype 3 clone in much of the world, found most sampled isolates belonged to a clade affected by few diversifying recombinations. However, other isolates indicate significant genetic variation has accumulated over the clonal complex's entire history. Two closely related genomes, one from the blood and another from the cerebrospinal fluid, were obtained from a patient with meningitis. The pair differed in their behaviour in a mouse model of disease and in their susceptibility to antimicrobials, with at least some of these changes attributable to a mutation that up-regulated the patAB efflux pump. This indicates clinically important phenotypic variation can accumulate rapidly through small alterations to the genotype.
Funded by: Wellcome Trust: 086547, 098051
PLoS genetics 2013;9;10;e1003868
High-resolution mapping of complex traits with a four-parent advanced intercross yeast population.
Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, NG7 2UH, United Kingdom.
A large fraction of human complex trait heritability is due to a high number of variants with small marginal effects and their interactions with genotype and environment. Such alleles are more easily studied in model organisms, where environment, genetic makeup, and allele frequencies can be controlled. Here, we examine the effect of natural genetic variation on heritable traits in a very large pool of baker's yeast from a multiparent 12th generation intercross. We selected four representative founder strains to produce the Saccharomyces Genome Resequencing Project (SGRP)-4X mapping population and sequenced 192 segregants to generate an accurate genetic map. Using these individuals, we mapped 25 loci linked to growth traits under heat stress, arsenite, and paraquat, the majority of which were best explained by a diverging phenotype caused by a single allele in one condition. By sequencing pooled DNA from millions of segregants grown under heat stress, we further identified 34 and 39 regions selected in haploid and diploid pools, respectively, with most of the selection against a single allele. While the most parsimonious model for the majority of loci mapped using either approach was the effect of an allele private to one founder, we could validate examples of pleiotropic effects and complex allelic series at a locus. SGRP-4X is a deeply characterized resource that provides a framework for powerful and high-resolution genetic analysis of yeast phenotypes and serves as a test bed for testing avenues to attack human complex traits.
Funded by: Wellcome Trust: 098051, WT077192/Z/05/Z
SMIM1 underlies the Vel blood group and influences red blood cell traits.
Department of Haematology, University of Cambridge, Cambridge, UK.
The blood group Vel was discovered 60 years ago, but the underlying gene is unknown. Individuals negative for the Vel antigen are rare and are required for the safe transfusion of patients with antibodies to Vel. To identify the responsible gene, we sequenced the exomes of five individuals negative for the Vel antigen and found that four were homozygous and one was heterozygous for a low-frequency 17-nucleotide frameshift deletion in the gene encoding the 78-amino-acid transmembrane protein SMIM1. A follow-up study showing that 59 of 64 Vel-negative individuals were homozygous for the same deletion and expression of the Vel antigen on SMIM1-transfected cells confirm SMIM1 as the gene underlying the Vel blood group. An expression quantitative trait locus (eQTL), the common SNP rs1175550, contributes to variable expression of the Vel antigen (P = 0.003) and influences the mean hemoglobin concentration of red blood cells (RBCs; P = 8.6 × 10^-15). In vivo, zebrafish with smim1 knockdown showed a mild reduction in the number of RBCs, identifying SMIM1 as a new regulator of RBC formation. Our findings are of immediate relevance, as the homozygous presence of the deletion allows the unequivocal identification of Vel-negative blood donors.
Funded by: British Heart Foundation: RG/09/012/28096, RG/09/12/28096; Cancer Research UK: A14953, C45041/A14953; Wellcome Trust: 082597, 082597/Z/07/Z, 084183, 084183/Z/07/Z
Nature genetics 2013;45;5;542-5
Horizontally acquired glycosyltransferase operons drive salmonellae lipopolysaccharide diversity.
Centre for Immunology and Infection, Hull York Medical School and the Department of Biology, University of York, York, UK.
The immunodominant lipopolysaccharide is a key antigenic factor for Gram-negative pathogens such as salmonellae where it plays key roles in host adaptation, virulence, immune evasion, and persistence. Variation in the lipopolysaccharide is also the major differentiating factor that is used to classify Salmonella into over 2600 serovars as part of the Kaufmann-White scheme. While lipopolysaccharide diversity is generally associated with sequence variation in the lipopolysaccharide biosynthesis operon, extraneous genetic factors such as those encoded by the glucosyltransferase (gtr) operons provide further structural heterogeneity by adding additional sugars onto the O-antigen component of the lipopolysaccharide. Here we identify and examine the O-antigen modifying glucosyltransferase genes from the genomes of Salmonella enterica and Salmonella bongori serovars. We show that Salmonella generally carries between 1 and 4 gtr operons that we have classified into 10 families on the basis of gtrC sequence with apparent O-antigen modification detected for five of these families. The gtr operons localize to bacteriophage-associated genomic regions and exhibit a dynamic evolutionary history driven by recombination and gene shuffling events leading to new gene combinations. Furthermore, evidence of Dam- and OxyR-dependent phase variation of gtr gene expression was identified within eight gtr families. Thus, as O-antigen modification generates significant intra- and inter-strain phenotypic diversity, gtr-mediated modification is fundamental in assessing Salmonella strain variability. This will inform appropriate vaccine and diagnostic approaches, in addition to contributing to our understanding of host-pathogen interactions.
Funded by: Wellcome Trust: 076964, 080086MA
PLoS genetics 2013;9;6;e1003568
Structural and functional annotation of the porcine immunome.
USDA-ARS, Beltsville Human Nutrition Research Center, Diet, Genomics, Immunology Laboratory, Beltsville, MD 20705, USA.
Background: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.
Results: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome.
Conclusions: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig's adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E010520/1, BB/E010520/2, BB/G004013/1, BB/I025328/1, EC FP6; NCRR NIH HHS: P20-RR017686; NIAID NIH HHS: T32 AI083196, T32 AI83196; Wellcome Trust: 098051
BMC genomics 2013;14;332
Prelamin A causes progeria through cell-extrinsic mechanisms and prevents cancer invasion.
Instituto de Medicina Oncológica y Molecular de Asturias IMOMA, 33193 Oviedo, Spain.
Defining the relationship between ageing and cancer is a crucial but challenging task. Mice deficient in Zmpste24, a metalloproteinase mutated in human progeria and involved in nuclear prelamin A maturation, recapitulate multiple features of ageing. However, their short lifespan and serious cell-intrinsic and cell-extrinsic alterations restrict the application and interpretation of carcinogenesis protocols. Here we present Zmpste24 mosaic mice that lack these limitations. Zmpste24 mosaic mice develop normally and keep similar proportions of Zmpste24-deficient (prelamin A-accumulating) and Zmpste24-proficient (mature lamin A-containing) cells throughout life, revealing that cell-extrinsic mechanisms are preeminent for progeria development. Moreover, prelamin A accumulation does not impair tumour initiation and growth, but it decreases the incidence of infiltrating oral carcinomas. Accordingly, silencing of ZMPSTE24 reduces human cancer cell invasiveness. Our results support the potential of cell-based and systemic therapies for progeria and highlight ZMPSTE24 as a new anticancer target.
Funded by: Wellcome Trust: 079643
Nature communications 2013;4;2268
Mutational genomics for cancer pathway discovery
Lecture Notes in Computer Science 2013;7986;35-46
Identification of heart rate-associated loci and their effects on cardiac conduction and rhythm disorders.
Medical Research Council MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate-increasing and heart rate-decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
Funded by: British Heart Foundation: PG/12/38/29615; Chief Scientist Office: CZB/4/710; Medical Research Council: G0600705, G0801056, G1000143, G1002084, G9815508, MC_PC_U127561128, MC_U106179471, MC_U106179472, MC_U106179473, MC_U106188470, MC_U123092720, MC_U127592696, MC_UP_A100_1003, MC_UU_12013/1, MC_UU_12013/3; NCATS NIH HHS: UL1 TR000124; NHLBI NIH HHS: K24 HL105780, R00 HL094535, R01 HL090620, R01 HL092217, R01 HL105756, R01 HL111314, U19 HL065962; NIDA NIH HHS: R21 DA026982; NIDDK NIH HHS: P30 DK063491; NIGMS NIH HHS: T32 GM007753; Wellcome Trust: 092731
Nature genetics 2013;45;6;621-31
Activity of a heptad of transcription factors is associated with stem cell programs and clinical outcome in acute myeloid leukemia.
Lowy Cancer Research Centre and the Prince of Wales Clinical School, University of New South Wales, Sydney, Australia.
Aberrant transcriptional programs in combination with abnormal proliferative signaling drive leukemic transformation. These programs operate in normal hematopoiesis where they are involved in hematopoietic stem cell (HSC) proliferation and maintenance. Ets Related Gene (ERG) is a component of normal and leukemic stem cell signatures and high ERG expression is a risk factor for poor prognosis in acute myeloid leukemia (AML). However, mechanisms that underlie ERG expression in AML and how its expression relates to leukemic stemness are unknown. We report that ERG expression in AML is associated with activity of the ERG promoters and +85 stem cell enhancer and a heptad of transcription factors that combinatorially regulate genes in HSCs. Gene expression signatures derived from ERG promoter-stem cell enhancer and heptad activity are associated with clinical outcome when ERG expression alone fails. We also show that the heptad signature is associated with AMLs that lack somatic mutations in NPM1 and confers an adverse prognosis when associated with FLT3 mutations. Taken together, these results suggest that transcriptional regulators cooperate to establish or maintain primitive stem cell-like signatures in leukemic cells and that the underlying pattern of somatic mutations contributes to the development of these signatures and modulates their influence on clinical outcome.
Funded by: Wellcome Trust: 079249, 095663, 100140
Association of HIV and ART with cardiometabolic traits in sub-Saharan Africa: a systematic review and meta-analysis.
Department of Public Health and Primary Care, Institute of Public Health, University of Cambridge, Cambridge, UK, Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, MRC/UVRI Uganda Research Unit on AIDS, Entebbe, Uganda, Division of Diabetic Medicine and Endocrinology, Department of Medicine, University of Cape Town, Cape Town, South Africa; Chronic Diseases Initiative in Africa, Department of Chemical Pathology, National Health Laboratory Service, University of the Witwatersrand Medical School, Johannesburg, South Africa, Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi, Department of Physiology, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania, Department of Medicine, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania, Royal Victoria Teaching Hospital, School of Medicine, University of The Gambia, Banjul, The Gambia, Department of Medicine, Obafemi Awolowo University, Ile Ife, Nigeria, Women's Equity in Access to Care &Treatment, Kigali, Rwanda, HIV-1 Immunopathogenesis Laboratory, Wistar Institute, Philadelphia, PA, Tuberculosis Research Unit, Department of Medicine, Case Western Reserve University, Cleveland, OH, Department of Medical and Surgical Sciences, University of Padua, Padua, Italy, Division of Diabetic Medicine and Endocrinology, Department of Medicine, University of Cape Town, Cape Town, South Africa, Infectious Diseases Unit, Department of Medicine, Grey's Hospital, Pietermaritzburg, South Africa, Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark, HART (Hypertension in Africa Research Team), North-West University, Potchefstroom, South Africa, Department of Nutrition, Exercise and Sports, Faculty of Science, University of Copenhagen, Copenhagen, Denmark, Africa Unit for Transdisciplinary Health Research (AUTHeR), North-West University, Potchefstroom, South Africa, Department of Medicine, Jos University Teachin
Background: Sub-Saharan Africa (SSA) has the highest burden of HIV in the world and a rising prevalence of cardiometabolic disease; however, the interrelationship between HIV, antiretroviral therapy (ART) and cardiometabolic traits is not well described in SSA populations.
Methods: We conducted a systematic review and meta-analysis through MEDLINE and EMBASE (up to January 2012), as well as direct author contact. Eligible studies provided summary or individual-level data on one or more of the following traits in HIV+ and HIV-, or ART+ and ART- subgroups in SSA: body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TGs) and fasting blood glucose (FBG) or glycated hemoglobin (HbA1c). Information was synthesized under a random-effects model and the primary outcomes were the standardized mean differences (SMD) of the specified traits between subgroups of participants.
Results: Data were obtained from 49 published and 3 unpublished studies which reported on 29 755 individuals. HIV infection was associated with higher TGs [SMD, 0.26; 95% confidence interval (CI), 0.08 to 0.44] and lower HDL (SMD, -0.59; 95% CI, -0.86 to -0.31), BMI (SMD, -0.32; 95% CI, -0.45 to -0.18), SBP (SMD, -0.40; 95% CI, -0.55 to -0.25) and DBP (SMD, -0.34; 95% CI, -0.51 to -0.17). Among HIV+ individuals, ART use was associated with higher LDL (SMD, 0.43; 95% CI, 0.14 to 0.72) and HDL (SMD, 0.39; 95% CI, 0.11 to 0.66), and lower HbA1c (SMD, -0.34; 95% CI, -0.62 to -0.06). Fully adjusted estimates from analyses of individual participant data were consistent with meta-analysis of summary estimates for most traits.
Conclusions: Broadly consistent with results from populations of European descent, these results suggest differences in cardiometabolic traits between HIV-infected and uninfected individuals in SSA, which might be modified by ART use. In a region with the highest burden of HIV, it will be important to clarify these findings to reliably assess the need for monitoring and managing cardiometabolic risk in HIV-infected populations in SSA.
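The Methods above state that standardized mean differences were synthesized under a random-effects model, but the abstract does not name the estimator. The following is a minimal sketch that assumes the commonly used DerSimonian-Laird method; the function name and the input values are hypothetical and are not taken from the study.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effects (e.g. SMDs) under a random-effects model.

    Assumes the DerSimonian-Laird moment estimator for the between-study
    variance tau^2. Returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical SMDs and their variances from three studies.
pooled, se, tau2 = dersimonian_laird([0.20, 0.35, 0.10], [0.010, 0.020, 0.015])
print(f"pooled SMD {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * se:.2f} to {pooled + 1.96 * se:.2f})")
```

When the studies agree closely, the estimated between-study variance is truncated at zero and the result coincides with a fixed-effect inverse-variance pooling.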
Funded by: Medical Research Council: G0901213, MR/K013491/1; Wellcome Trust: 098504, 101113
International journal of epidemiology 2013;42;6;1754-71
Back to the future!
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2013;11;9;600
Histone deacetylase 1 and 2 are essential for normal T-cell development and genomic stability in mice.
Department of Biochemistry, University of Leicester, Leicester, UK.
Histone deacetylase 1 and 2 (HDAC1/2) regulate chromatin structure as the catalytic core of the Sin3A, NuRD and CoREST co-repressor complexes. To better understand the key pathways regulated by HDAC1/2 in the adaptive immune system and inform their exploitation as drug targets, we have generated mice with a T-cell specific deletion. Loss of either HDAC1 or HDAC2 alone has little effect, while dual inactivation results in a 5-fold reduction in thymocyte cellularity, accompanied by developmental arrest at the double-negative to double-positive transition. Transcriptome analysis revealed 892 misregulated genes in Hdac1/2 knock-out thymocytes, including down-regulation of LAT, Themis and Itk, key components of the T-cell receptor (TCR) signaling pathway. Down-regulation of these genes suggests a model in which HDAC1/2 deficiency results in defective propagation of TCR signaling, thus blocking development. Furthermore, mice with reduced HDAC1/2 activity (Hdac1 deleted and a single Hdac2 allele) develop a lethal pathology by 3-months of age, caused by neoplastic transformation of immature T cells in the thymus. Tumor cells become aneuploid, express increased levels of c-Myc and show elevated levels of the DNA damage marker, γH2AX. These data demonstrate a crucial role for HDAC1/2 in T-cell development and the maintenance of genomic stability.
Funded by: Medical Research Council: G0600135, MR/J009202/1; Wellcome Trust: 079643, 095663
Filling out the structural map of the NTF2-like superfamily.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
Background: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found.
Results: Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of representatives of each of these domains, BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C] and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ], have been solved by X-ray crystallography. All three domains are similar in structure and all belong to the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis enables us to present a hypothesis concerning their role.
Conclusions: Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. This may regulate the activities of domains with which they are combined in the same polypeptide or via operonic linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).
Funded by: Howard Hughes Medical Institute; Intramural NIH HHS; Medical Research Council: MC_U105192716; NIGMS NIH HHS: P41 GM103393, R01 GM101457, U54 GM094586; Wellcome Trust: WT077044/Z/05/Z
BMC bioinformatics 2013;14;327
The GenoChip: a new tool for genetic anthropology.
Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, USA.
The Genographic Project is an international effort aimed at charting human migratory history. The project is nonprofit and nonmedical, and, through its Legacy Fund, supports locally led efforts to preserve indigenous and traditional cultures. Although the first phase of the project was focused on uniparentally inherited markers on the Y-chromosome and mitochondrial DNA (mtDNA), the current phase focuses on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genome-wide single-nucleotide polymorphism (SNP) genotyping, they were designed for medical genetic studies and contain medically related markers that are inappropriate for global population genetic studies. GenoChip, the Genographic Project's new genotyping array, was designed to resolve these issues and enable higher resolution research into outstanding questions in genetic anthropology. The GenoChip includes ancestry informative markers obtained for over 450 human populations, an ancient human (Saqqaq), and two archaic hominins (Neanderthal and Denisovan) and was designed to identify all known Y-chromosome and mtDNA haplogroups. The chip was carefully vetted to avoid inclusion of medically relevant markers. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays. Although all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The chip performances are illustrated in a principal component analysis for 14 worldwide populations. In summary, the GenoChip is a dedicated genotyping platform for genetic anthropology. With an unprecedented number of approximately 12,000 Y-chromosomal and approximately 3,300 mtDNA SNPs and over 130,000 autosomal and X-chromosomal SNPs without any known health, medical, or phenotypic relevance, the GenoChip is a useful tool for genetic anthropology and population genetics.
Funded by: NIMH NIH HHS: T32 MH014592; Wellcome Trust: 098051
Genome biology and evolution 2013;5;5;1021-31
The 5q31 region in two African populations as a facet of natural selection by infectious diseases.
Unit of Disease and Diversity, Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan.
Cases of extreme natural selection could lead either to rapid fixation or extinction of alleles, depending on the population structure and size. Such selection may also manifest as an excess of heterozygosity, with the locus concerned displaying such drastic features of allele change. We suspect that the 5q31 region of chromosome 5 mirrors a situation of such extreme natural selection, particularly because the region encompasses genes of type 2 cytokines known to be associated with a number of infectious and non-infectious diseases. We typed two sets of single nucleotide polymorphisms (SNPs) in two populations: an initial limited set of only 4 SNPs within the genes of IL-4, IL-13, IL-5 and IL-9 in 108 unrelated individuals, and a replicating set of 14 SNPs in 924 individuals from the same populations, without regard to relatedness. The results suggest that the 5q31 area is under intense selective pressure, as indicated by marked heterozygosity independent of linkage disequilibrium (LD); differences in heterozygosity, allele, and haplotype frequencies between generations; and departure from Hardy-Weinberg expectations (DHWE). The study area is endemic for several infectious diseases including malaria and visceral leishmaniasis (VL). Malaria caused by Plasmodium falciparum, however, occurs mostly with mild clinical symptoms in all ages, which makes it unlikely to account for these indices. The strong selection signals seem to emanate from recent outbreaks of VL, which affected both populations to varying extents.
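The abstract above reports departure from Hardy-Weinberg expectations without describing the test used. As an illustration only, the sketch below assumes a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts at a single biallelic SNP; the function name and the counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for departure from Hardy-Weinberg expectations.

    Takes observed genotype counts (AA, AB, BB) at one biallelic SNP and
    returns (chi-square statistic, p-value) with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts showing an excess of heterozygotes.
chi2, p_value = hwe_chi_square(n_aa=20, n_ab=70, n_bb=18)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4g}")
```

An excess of heterozygotes relative to the expected 2pq proportion inflates the statistic, which is the kind of signal the abstract describes as marked heterozygosity.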
Evaluation of the genetic overlap between osteoarthritis with body mass index and height using genome-wide association scan data.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Objectives: Obesity as measured by body mass index (BMI) is one of the major risk factors for osteoarthritis. In addition, genetic overlap has been reported between osteoarthritis and normal adult height variation. We investigated whether this relationship is due to a shared genetic aetiology on a genome-wide scale.
Methods: We compared genetic association summary statistics (effect size, p value) for BMI and height from the GIANT consortium genome-wide association study (GWAS) with genetic association summary statistics from the arcOGEN consortium osteoarthritis GWAS. Significance was evaluated by permutation. Replication of osteoarthritis association of the highlighted signals was investigated in an independent dataset. Phenotypic information of height and BMI was accounted for in a separate analysis using osteoarthritis-free controls.
Results: We found significant overlap between osteoarthritis and height (p=3.3×10^-5 for signals with p≤0.05) when the GIANT and arcOGEN GWAS were compared. For signals with p≤0.001 we found 17 shared signals between osteoarthritis and height and four between osteoarthritis and BMI. However, only one of the height or BMI signals that had shown evidence of association with osteoarthritis in the arcOGEN GWAS was also associated with osteoarthritis in the independent dataset: rs12149832, within the FTO gene (combined p=2.3×10^-5). As expected, this signal was attenuated when we adjusted for BMI.
Conclusions: We found a significant excess of shared signals between both osteoarthritis and height and osteoarthritis and BMI, suggestive of a common genetic aetiology. However, only one signal showed association with osteoarthritis when followed up in a new dataset.
Funded by: Arthritis Research UK: 18030; Medical Research Council: G0100594, G0901461, MC_U122886349; Wellcome Trust: 090532, 098051, WT079557MA
Annals of the rheumatic diseases 2013;72;6;935-41
Systematic evaluation of spliced alignment programs for RNA-seq data.
European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.
Funded by: NHGRI NIH HHS: R01 HG006272, U41 HG007234, U54 HG004555, U54 HG004557, U54HG004555, U54HG004557; Wellcome Trust: 098051, WT09805
Nature methods 2013;10;12;1185-91
The DOT1L rs12982744 polymorphism is associated with osteoarthritis of the hip with genome-wide statistical significance in males.
Funded by: Arthritis Research UK: 18030, 19542; Wellcome Trust: 098051
Annals of the rheumatic diseases 2013;72;7;1264-5
Defining the range of pathogens susceptible to Ifitm3 restriction using a knockout mouse model.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
The interferon-inducible transmembrane (IFITM) family of proteins has been shown to restrict a broad range of viruses in vitro and in vivo by halting progress through the late endosomal pathway. Further, single nucleotide polymorphisms (SNPs) in its sequence have been linked with risk of developing severe influenza virus infections in humans. The number of viruses known to be restricted by this host protein, all of which enter cells via the endosomal pathway, has continued to grow since its antiviral role was first demonstrated. We therefore sought to test the limits of antimicrobial restriction by Ifitm3 using a knockout mouse model. We showed that Ifitm3 does not impact on the restriction or pathogenesis of bacterial (Salmonella typhimurium, Citrobacter rodentium, Mycobacterium tuberculosis) or protozoan (Plasmodium berghei) pathogens, despite in vitro evidence. However, Ifitm3 is capable of restricting respiratory syncytial virus (RSV) in vivo either through directly restricting RSV cell infection, or by exerting a previously uncharacterised function controlling disease pathogenesis. This represents the first demonstration of a virus that enters directly through the plasma membrane, without the need for the endosomal pathway, being restricted by the IFITM family, and therefore further defines the role of these antiviral proteins.
Funded by: Medical Research Council: G0501670, MC_U117581288, U117581288
PloS one 2013;8;11;e80723
The Role of Adiposity in Cardiometabolic Traits: A Mendelian Randomization Analysis.
Molecular Epidemiology and Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden ; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Background: The association between adiposity and cardiometabolic traits is well known from epidemiological studies. Whilst the causal relationship is clear for some of these traits, for others it is not. We aimed to determine whether adiposity is causally related to various cardiometabolic traits using the Mendelian randomization approach. We used the adiposity-associated variant rs9939609 at the FTO locus as an instrumental variable (IV) for body mass index (BMI) in a Mendelian randomization design. Thirty-six population-based studies of individuals of European descent contributed to the analyses. Age- and sex-adjusted regression models were fitted to test for association between (i) rs9939609 and BMI (n = 198,502), (ii) rs9939609 and 24 traits, and (iii) BMI and 24 traits. The causal effect of BMI on the outcome measures was quantified by IV estimators. The estimators were compared to the BMI-trait associations derived from the same individuals. In the IV analysis, we demonstrated novel evidence for a causal relationship between adiposity and incident heart failure (hazard ratio, 1.19 per BMI-unit increase; 95% CI, 1.03-1.39) and replicated earlier reports of a causal association with type 2 diabetes, metabolic syndrome, dyslipidemia, and hypertension (odds ratio for IV estimator, 1.1-1.4; all p<0.05). For quantitative traits, our results provide novel evidence for a causal effect of adiposity on the liver enzymes alanine aminotransferase and gamma-glutamyl transferase and confirm previous reports of a causal effect of adiposity on systolic and diastolic blood pressure, fasting insulin, 2-h post-load glucose from the oral glucose tolerance test, C-reactive protein, triglycerides, and high-density lipoprotein cholesterol levels (all p<0.05). The estimated causal effects were in agreement with traditional observational measures in all instances except for type 2 diabetes, where the causal estimate was larger than the observational estimate (p = 0.001). 
Conclusions: We provide novel evidence for a causal relationship between adiposity and heart failure as well as between adiposity and increased liver enzymes.
PLoS medicine 2013;10;6;e1001474
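As an illustration of the instrumental-variable idea in the entry above, the sketch below computes a ratio (Wald) estimate: the variant-outcome association divided by the variant-exposure association. This is a generic textbook estimator, not necessarily the one used in the study; the function name `wald_ratio` and all input numbers are hypothetical, and the standard error uses a first-order delta-method approximation that assumes the two associations are uncorrelated.

```python
import math


def wald_ratio(beta_zx, se_zx, beta_zy, se_zy):
    """Ratio (Wald) instrumental-variable estimate.

    beta_zx, se_zx: instrument-exposure association and its standard error
                    (e.g. variant on BMI).
    beta_zy, se_zy: instrument-outcome association and its standard error
                    (e.g. variant on a quantitative trait).
    Returns (estimate, standard_error). The standard error is a first-order
    delta-method approximation that ignores covariance between the two
    associations (an assumption of this sketch).
    """
    if beta_zx == 0:
        raise ValueError("instrument-exposure association must be non-zero")
    estimate = beta_zy / beta_zx
    variance = se_zy ** 2 / beta_zx ** 2 + (beta_zy ** 2) * se_zx ** 2 / beta_zx ** 4
    return estimate, math.sqrt(variance)


# Hypothetical inputs, chosen only to exercise the function.
estimate, se = wald_ratio(beta_zx=0.4, se_zx=0.02, beta_zy=0.04, se_zy=0.01)
lower, upper = estimate - 1.96 * se, estimate + 1.96 * se
print(f"IV estimate {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

For binary or time-to-event outcomes the same ratio would be taken on the log odds or log hazard scale and exponentiated afterwards.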
A method for selectively enriching microbial DNA from contaminating vertebrate host DNA.
New England Biolabs Inc., Ipswich, Massachusetts, United States of America.
DNA samples derived from vertebrate skin, bodily cavities and body fluids contain both host and microbial DNA; the latter often present as a minor component. Consequently, DNA sequencing of a microbiome sample frequently yields reads originating from the microbe(s) of interest, but with a vast excess of host genome-derived reads. In this study, we used a methyl-CpG binding domain (MBD) to separate methylated host DNA from microbial DNA based on differences in CpG methylation density. MBD fused to the Fc region of a human antibody (MBD-Fc) binds strongly to protein A paramagnetic beads, forming an effective one-step enrichment complex that was used to remove human or fish host DNA from bacterial and protistan DNA for subsequent sequencing and analysis. We report enrichment of DNA samples from human saliva, human blood, a mock malaria-infected blood sample and a black molly fish. When reads were mapped to reference genomes, sequence reads aligning to host genomes decreased 50-fold, while bacterial and Plasmodium DNA sequence reads increased 8-11.5-fold. The Shannon-Wiener diversity index was calculated for 149 bacterial species in saliva before and after enrichment. Unenriched saliva had an index of 4.72, while the enriched sample had an index of 4.80. The similarity of these indices demonstrates that bacterial species diversity and relative phylotype abundance remain conserved in enriched samples. Enrichment using the MBD-Fc method holds promise for targeted microbiome sequence analysis across a broad range of sample types.
Funded by: Wellcome Trust: 079355/Z/06/Z, 098051
PloS one 2013;8;10;e76096
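The Shannon-Wiener diversity index mentioned in the entry above is H = -sum(p_i * log p_i), where p_i is the relative abundance of species i. The sketch below is a minimal illustration; the function name `shannon_index`, the default natural-log base, and the toy read counts are assumptions, since the abstract does not state which logarithm base was used.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon-Wiener diversity index from per-species counts.

    counts: iterable of non-negative counts (e.g. reads assigned per species).
    base:   logarithm base; natural log is assumed here by default.
    Species with zero counts contribute nothing to the sum.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one species must have a positive count")
    return -sum((c / total) * math.log(c / total, base) for c in counts)


# Toy example: two hypothetical samples with the same species but
# different evenness.
before = [500, 300, 150, 50]
after = [400, 300, 200, 100]
print(round(shannon_index(before), 3), round(shannon_index(after), 3))
```

Under the natural-log assumption, a perfectly even community of 149 species would score ln(149), roughly 5.0, which is the upper bound for that number of species.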
FGF signaling inhibition in ESCs drives rapid genome-wide demethylation to the epigenetic ground state of pluripotency.
Epigenetics Programme, The Babraham Institute, Cambridge, CB22 3AT, UK.
Genome-wide erasure of DNA methylation takes place in primordial germ cells (PGCs) and early embryos and is linked with pluripotency. Inhibition of Erk1/2 and Gsk3β signaling in mouse embryonic stem cells (ESCs) by small-molecule inhibitors (called 2i) has recently been shown to induce hypomethylation. We show by whole-genome bisulphite sequencing that 2i induces rapid and genome-wide demethylation on a scale and pattern similar to that in migratory PGCs and early embryos. Major satellites, intracisternal A particles (IAPs), and imprinted genes remain relatively resistant to erasure. Demethylation involves oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), impaired maintenance of 5mC and 5hmC, and repression of the de novo methyltransferases (Dnmt3a and Dnmt3b) and Dnmt3L. We identify a Prdm14- and Nanog-binding cis-acting regulatory region in Dnmt3b that is highly responsive to signaling. These insights provide a framework for understanding how signaling pathways regulate reprogramming to an epigenetic ground state of pluripotency.
Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK: 14867; Medical Research Council: G0801156, G0801727; Wellcome Trust: 095645; Worldwide Cancer Research: 12-1172
Cell stem cell 2013;13;3;351-9
Global analysis of the sporulation pathway of Clostridium difficile.
Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, USA.
The Gram-positive, spore-forming pathogen Clostridium difficile is the leading definable cause of healthcare-associated diarrhea worldwide. C. difficile infections are difficult to treat because of their frequent recurrence, which can cause life-threatening complications such as pseudomembranous colitis. The spores of C. difficile are responsible for these high rates of recurrence, since they are the major transmissive form of the organism and resistant to antibiotics and many disinfectants. Despite the importance of spores to the pathogenesis of C. difficile, little is known about their composition or formation. Based on studies in Bacillus subtilis and other Clostridium spp., the sigma factors σ(F), σ(E), σ(G), and σ(K) are predicted to control the transcription of genes required for sporulation, although their specific functions vary depending on the organism. In order to determine the roles of σ(F), σ(E), σ(G), and σ(K) in regulating C. difficile sporulation, we generated loss-of-function mutations in genes encoding these sporulation sigma factors and performed RNA-Sequencing to identify specific sigma factor-dependent genes. This analysis identified 224 genes whose expression was collectively activated by sporulation sigma factors: 183 were σ(F)-dependent, 169 were σ(E)-dependent, 34 were σ(G)-dependent, and 31 were σ(K)-dependent. In contrast with B. subtilis, C. difficile σ(E) was dispensable for σ(G) activation, σ(G) was dispensable for σ(K) activation, and σ(F) was required for post-translationally activating σ(G). Collectively, these results provide the first genome-wide transcriptional analysis of genes induced by specific sporulation sigma factors in the Clostridia and highlight that diverse mechanisms regulate sporulation sigma factor activity in the Firmicutes.
Funded by: NCRR NIH HHS: P20RR021905; NIGMS NIH HHS: P20 GM103496, P30 GM103498, R00 GM092934, R00GM092934
PLoS genetics 2013;9;8;e1003660
EMu: probabilistic inference of mutational processes and their localization in the cancer genome.
The spectrum of mutations discovered in cancer genomes can be explained by the activity of a few elementary mutational processes. We present a novel probabilistic method, EMu, to infer the mutational signatures of these processes from a collection of sequenced tumors. EMu naturally incorporates the tumor-specific opportunity for different mutation types according to sequence composition. Applying EMu to breast cancer data, we derive detailed maps of the activity of each process, both genome-wide and within specific local regions of the genome. Our work provides new opportunities to study the mutational processes underlying cancer development. EMu is available at http://www.sanger.ac.uk/resources/software/emu/.
Funded by: Wellcome Trust: 088340, 098051
Genome biology 2013;14;4;R39
Ensembl 2013.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
Funded by: Biotechnology and Biological Sciences Research Council: BB/I025506/1; NHGRI NIH HHS: U01HG004695, U41HG006104, U54HG004563; Wellcome Trust: 095908, WT062023, WT079643
Nucleic acids research 2013;41;Database issue;D48-55
Global analysis of apicomplexan protein S-acyl transferases reveals an enzyme essential for invasion.
Department of Microbiology and Molecular Medicine, CMU, University of Geneva, Rue Michel-Servet 1, CH-1211, Geneva 4, Switzerland.
The advent of techniques to study palmitoylation on a whole proteome scale has revealed that it is an important reversible modification that plays a role in regulating multiple biological processes. Palmitoylation can control the affinity of a protein for lipid membranes, which allows it to impact protein trafficking, stability, folding, signalling and interactions. The publication of the palmitome of the schizont stage of Plasmodium falciparum implicated a role for palmitoylation in host cell invasion, protein export and organelle biogenesis. However, nothing is known so far about the repertoire of protein S-acyl transferases (PATs) that catalyse this modification in Apicomplexa. We undertook a comprehensive analysis of the repertoire of Asp-His-His-Cys cysteine-rich domain (DHHC-CRD) PAT family in Toxoplasma gondii and Plasmodium berghei by assessing their localization and essentiality. Unlike functional redundancies reported in other eukaryotes, some apicomplexan-specific DHHCs are essential for parasite growth, and several are targeted to organelles unique to this phylum. Of particular interest is DHHC7, which localizes to rhoptry organelles in all parasites tested, including the major human pathogen P. falciparum. TgDHHC7 interferes with the localization of the rhoptry palmitoylated protein TgARO and affects the apical positioning of the rhoptry organelles. This PAT has a major impact on T. gondii host cell invasion, but not on the parasite's ability to egress.
Funded by: Howard Hughes Medical Institute; Medical Research Council: G0501670; Wellcome Trust: WT098051
Traffic (Copenhagen, Denmark) 2013;14;8;895-911
Clonal Expansion Analysis of Transposon Insertions by High-Throughput Sequencing Identifies Candidate Cancer Genes in a PiggyBac Mutagenesis Screen.
Department of Neuroscience, Department of Developmental and Regenerative Biology, Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America.
Somatic transposon mutagenesis in mice is an efficient strategy to investigate the genetic mechanisms of tumorigenesis. The identification of tumor driving transposon insertions traditionally requires the generation of large tumor cohorts to obtain information about common insertion sites. Tumor driving insertions are also characterized by their clonal expansion in tumor tissue, a phenomenon that is facilitated by the slow and evolving transformation process of transposon mutagenesis. We describe here an improved approach for the detection of tumor driving insertions that assesses the clonal expansion of insertions by quantifying the relative proportion of sequence reads obtained in individual tumors. To this end, we have developed a protocol for insertion site sequencing that utilizes acoustic shearing of tumor DNA and Illumina sequencing. We analyzed various solid tumors generated by PiggyBac mutagenesis and for each tumor >10(6) reads corresponding to >10(4) insertion sites were obtained. In each tumor, 9 to 25 insertions stood out by their enriched sequence read frequencies when compared to frequencies obtained from tail DNA controls. These enriched insertions are potential clonally expanded tumor driving insertions, and thus identify candidate cancer genes. The candidate cancer genes of our study comprised many established cancer genes, but also novel candidate genes such as Mastermind-like1 (Mamld1) and Diacylglycerolkinase delta (Dgkd). We show that clonal expansion analysis by high-throughput sequencing is a robust approach for the identification of candidate cancer genes in insertional mutagenesis screens on the level of individual tumors.
PloS one 2013;8;8;e72338
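The clonal expansion idea in the entry above (insertions that account for an unusually large share of a tumor's sequence reads, relative to what is seen in tail DNA controls) can be sketched as follows. This is an illustrative simplification, not the published pipeline: the function names, the rule of comparing against the largest read fraction in the control, the `margin` parameter, and all counts are hypothetical.

```python
def read_fractions(read_counts):
    """Convert per-insertion read counts into fractions of all reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {site: n / total for site, n in read_counts.items()}


def enriched_insertions(tumor_counts, control_counts, margin=2.0):
    """Flag tumor insertions whose read fraction exceeds the control background.

    Assumed rule (hypothetical): the background is the largest read fraction
    of any single insertion in the control sample, and an insertion is flagged
    when its tumor read fraction is more than `margin` times that background.
    """
    background = max(read_fractions(control_counts).values())
    tumor = read_fractions(tumor_counts)
    return {site: frac for site, frac in tumor.items() if frac > margin * background}


# Hypothetical read counts per insertion site.
tumor = {"chr1:100": 500, "chr2:200": 300, "chr3:300": 100, "chr4:400": 100}
tail_control = {f"site{i}": 10 for i in range(10)}  # evenly spread, no clone
print(sorted(enriched_insertions(tumor, tail_control)))
```

A real analysis would also need to account for amplification bias and sequencing depth, which this sketch ignores.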
Global properties and functional complexity of human gene regulatory variation.
Wellcome Trust Sanger Institute, Cambridge, United Kingdom.
Identification and functional interpretation of gene regulatory variants is a major focus of modern genomics. The application of genetic mapping to molecular and cellular traits has enabled the detection of regulatory variation on genome-wide scales and revealed an enormous diversity of regulatory architecture in humans and other species. In this review I summarise the insights gained and questions raised by a decade of genetic mapping of gene expression variation. I discuss recent extensions of this approach using alternative molecular phenotypes that have revealed some of the biological mechanisms that drive gene expression variation between individuals. Finally, I highlight outstanding problems and future directions for development.
Funded by: Wellcome Trust: 098051
PLoS genetics 2013;9;5;e1003501
An elephantine viral problem.
This month's Genome Watch highlights how deep sequencing was used to generate the first full genomes of herpesviruses associated with a fatal disease in elephants.
Nature reviews. Microbiology 2013;11;8;512
Restriction of V3 region sequence divergence in the HIV-1 envelope gene during antiretroviral treatment in a cohort of recent seroconverters.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Background: Dynamic changes in Human Immunodeficiency Virus 1 (HIV-1) sequence diversity and divergence are associated with immune control during primary infection and progression to AIDS. Consensus sequencing or single genome amplification sequencing of the HIV-1 envelope (env) gene, in particular the variable (V) regions, is used as a marker for HIV-1 genome diversity, but population diversity is only minimally, or semi-quantitatively sampled using these methods.
Results: Here we use second generation deep sequencing to determine inter- and intra-patient sequence heterogeneity and to quantify minor variants in a cohort of individuals either receiving or not receiving antiretroviral treatment following seroconversion; the SPARTAC trial. We show, through a cross-sectional study of sequence diversity of the env V3 in 30 antiretroviral-naive patients during primary infection, that considerable population structure diversity exists, with some individuals exhibiting highly constrained plasma virus diversity. Diversity was independent of clinical markers (viral load, time from seroconversion, CD4 cell count) of infection. Serial sampling over 60 weeks of non-treated individuals that define three initially different diversity profiles showed that complex patterns of continuing HIV-1 sequence diversification and divergence could be readily detected. Evidence for minor sequence turnover, emergence of new variants and re-emergence of archived variants could be inferred from this analysis. Analysis of viral divergence over the same time period in patients who received short (12 weeks, ART12) or long course antiretroviral therapy (48 weeks, ART48) and a non-treated control group revealed that ART48 successfully suppressed viral divergence while ART12 did not have a significant effect.
Conclusions: Deep sequencing is a sensitive and reliable method for investigating the diversity of the env V3 as an important component of HIV-1 genome diversity. Detailed insights into the complex early intra-patient dynamics of env V3 diversity and divergence were explored in antiretroviral-naive recent seroconverters. Long course antiretroviral therapy, initiated soon after seroconversion and administered for 48 weeks, restricts HIV-1 divergence significantly. The effect of ART12 and ART48 on clinical markers of HIV infection and progression is currently being investigated in the SPARTAC trial.
Funded by: NIAID NIH HHS: R01 AI046995; Wellcome Trust
Reprogramming to pluripotency using designer TALE transcription factors targeting enhancers.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
The modular DNA recognition code of the transcription-activator-like effectors (TALEs) from the plant pathogenic bacterial genus Xanthomonas provides a powerful genetic tool to create designer transcription factors (dTFs) targeting specific DNA sequences for manipulating gene expression. Previous studies have suggested critical roles of enhancers in gene regulation and reprogramming. Here, we report that a dTF activator targeting the distal enhancer of the Pou5f1 (Oct4) locus induces epigenetic changes, reactivates its expression, and substitutes for exogenous OCT4 in reprogramming mouse embryonic fibroblast cells (MEFs) to induced pluripotent stem cells (iPSCs). Similarly, a dTF activator targeting a Nanog enhancer activates Nanog expression and reprograms epiblast stem cells (EpiSCs) to iPSCs. Conversely, dTF repressors targeting the same genetic elements inhibit expression of these loci, and effectively block reprogramming. This study indicates that dTFs targeting specific enhancers can be used to study other biological processes such as transdifferentiation or directed differentiation of stem cells.
Stem cell reports 2013;1;2;183-97
Sleeping Beauty mutagenesis in a mouse medulloblastoma model defines networks that discriminate between human molecular subgroups.
Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4072, Australia.
The Sleeping Beauty (SB) transposon mutagenesis screen is a powerful tool to facilitate the discovery of cancer genes that drive tumorigenesis in mouse models. In this study, we sought to identify genes that functionally cooperate with sonic hedgehog signaling to initiate medulloblastoma (MB), a tumor of the cerebellum. By combining SB mutagenesis with Patched1 heterozygous mice (Ptch1(lacZ/+)), we observed an increased frequency of MB and decreased tumor-free survival compared with Ptch1(lacZ/+) controls. From an analysis of 85 tumors, we identified 77 common insertion sites that map to 56 genes potentially driving increased tumorigenesis. The common insertion site genes identified in the mutagenesis screen were mapped to human orthologs, which were used to select probes and corresponding expression data from an independent set of previously described human MB samples, and surprisingly were capable of accurately clustering known molecular subgroups of MB, thereby defining common regulatory networks underlying all forms of MB irrespective of subgroup. We performed a network analysis to discover the likely mechanisms of action of subnetworks and used an in vivo model to confirm a role for a highly ranked candidate gene, Nfia, in promoting MB formation. Our analysis implicates candidate cancer genes in the deregulation of apoptosis and translational elongation, and reveals a strong signature of transcriptional regulation that will have broad impact on expression programs in MB. These networks provide functional insights into the complex biology of human MB and identify potential avenues for intervention common to all clinical subgroups.
Funded by: Cancer Research UK: 13031; Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2013;110;46;E4325-34
Gene expression changes with age in skin, adipose tissue, blood and brain.
Background: Previous studies have demonstrated that gene expression levels change with age. These changes are hypothesized to influence the aging rate of an individual. We analyzed gene expression changes with age in abdominal skin, subcutaneous adipose tissue and lymphoblastoid cell lines in 856 female twins in the age range of 39-85 years. Additionally, we investigated genotypic variants involved in genotype-by-age interactions to understand how the genomic regulation of gene expression alters with age.
Results: Using a linear mixed model, differential expression with age was identified in 1,672 genes in skin and 188 genes in adipose tissue. Only two genes expressed in lymphoblastoid cell lines showed significant changes with age. Genes significantly regulated by age were compared with expression profiles in 10 brain regions from 100 postmortem brains aged 16 to 83 years. We identified only one age-related gene common to the three tissues. There were 12 genes that showed differential expression with age in both skin and brain tissue and three common to adipose and brain tissues.
Conclusions: Skin showed the most age-related gene expression changes of all the tissues investigated, with many of the genes being previously implicated in fatty acid metabolism, mitochondrial activity, cancer and splicing. A significant proportion of age-related changes in gene expression appear to be tissue-specific with only a few genes sharing an age effect in expression across tissues. More research is needed to improve our understanding of the genetic influences on aging and the relationship with age-related diseases.
Funded by: Department of Health; Medical Research Council: G0802462, MR/J006742/1, MR/K01417X/1; NIMH NIH HHS: R01 MH090941; Wellcome Trust: 081917, 090532, 095515, WT098051
Genome biology 2013;14;7;R75
Discovery and refinement of loci associated with lipid levels.
Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, Michigan, USA ; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA ; Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA ; Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides and total cholesterol are heritable, modifiable risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,577 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10(-8), including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index. Our results demonstrate the value of using genetic data from individuals of diverse ancestry and provide insights into the biological mechanisms regulating blood lipids to guide future genetic, biological and therapeutic research.
Funded by: British Heart Foundation: PG/08/094/26019, RG/08/008/25291, RG/08/014/24067; Chief Scientist Office: CZB/4/672, CZB/4/710; Medical Research Council: G0801566, G0901213, G1000143, MC_U106179471, MC_U106179472, MC_U106188470, MC_U123092720, MC_U950080926, MR/K013351/1, MR/L003120/1; NCATS NIH HHS: UL1 TR000124; NHLBI NIH HHS: R00 HL094535, R01 HL105756, R01 HL109946, U01 HL069757; NIDDK NIH HHS: P30 DK063491, P30 DK072488, P60 DK020541, R01 DK072193; Wellcome Trust: 090532
Nature genetics 2013;45;11;1274-83
Clonal analyses reveal associations of JAK2V617F homozygosity with hematologic features, age and gender in polycythemia vera and essential thrombocythemia.
Subclones homozygous for JAK2V617F are more common and larger in patients with polycythemia vera compared to essential thrombocythemia, but their role in determining phenotype remains unclear. We genotyped 4564 erythroid colonies from 59 patients with polycythemia vera or essential thrombocythemia to investigate whether the proportion of JAK2V617F -homozygous precursors, compared to heterozygous precursors, is associated with clinical or demographic features. In polycythemia vera, a higher proportion of homozygous-mutant precursors was associated with more extreme blood counts at diagnosis, consistent with a causal role for homozygosity in polycythemia vera pathogenesis. Larger numbers of homozygous-mutant colonies were associated with older age, and with male gender in polycythemia vera but female gender in essential thrombocythemia. These results suggest that age promotes development or expansion of homozygous-mutant clones and that gender modulates the phenotypic consequences of JAK2V617F homozygosity, thus providing a potential explanation for the long-standing observations of a preponderance of men with polycythemia vera but of women with essential thrombocythemia.
Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene.
Background: RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene.
Results: Here we show that, in a given condition, most protein coding genes have one major transcript expressed at significantly higher level than others, that in human tissues the major transcripts contribute almost 85 percent to the total mRNA from protein coding loci, and that often the same major transcript is expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted to be translated utilizing proteomic data. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein coding genes for which the major transcript does not code a protein.
Conclusions: Overall, our findings suggest that the transcriptome from protein coding loci is dominated by one transcript per gene and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of the detected changes in variation studies.
Funded by: NHGRI NIH HHS: 5U54HG004555, U41 HG007234, U54 HG004555; Wellcome Trust: 098051
Genome biology 2013;14;7;R70
Computational approaches to identify functional genetic variants in cancer genomes.
Research Unit on Biomedical Informatics, University Pompeu Fabra, Barcelona, Spain.
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor but only a minority of these drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
Funded by: NCI NIH HHS: R01 CA180778; NHGRI NIH HHS: U01 HG006517, U54 HG003079; Wellcome Trust: 095908
Nature methods 2013;10;8;723-9
Examination of the relationship between variation at 17q21 and childhood wheeze phenotypes.
School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom.
Background: Genome-wide association studies have identified associations of genetic variants at 17q21 near ORMDL3 with childhood asthma.
Objectives: We sought to determine whether associations in this region are specific to particular asthma phenotypes and specific to ORMDL3.
Methods: We examined associations between 244 independent single nucleotide polymorphisms (SNPs) plus 13 previously identified asthma-related SNPs in the region between 34 and 36 Mb on chromosome 17 and early wheezing phenotypes, doctor-diagnosed asthma and atopy at 7½ years, and bronchial hyperresponsiveness and lung function at 8½ years in 7045 children from the Avon Longitudinal Study of Parents and Children birth cohort study. With this, cis expression quantitative trait loci signals for the same SNPs were assessed in 875 samples across genes in the same region.
Results: The strongest evidence for phenotypic association was seen for persistent wheezing (rs8076131 near ORMDL3: relative risk ratio [RRR], 1.60 [95% CI, 1.40-1.84], P = 1.4 × 10(-11); rs2305480 near GSDML: RRR, 1.60 [95% CI, 1.39-1.83], P = 1.5 × 10(-11); and rs9303277 near IKZF3: RRR, 1.57 [95% CI, 1.37-1.79], P = 4.4 × 10(-11)). Similar but less precisely estimated effects were seen for intermediate-onset wheeze, but there was little evidence of associations with other wheezing phenotypes. There was some evidence of associations with bronchial hyperresponsiveness. SNPs across the whole region show strong evidence of association with differential levels of expression at GSDML, IKZF3, and MED24, as well as ORMDL3.
Conclusions: Associations of SNPs in the 17q21 locus are specific to asthma and specific wheezing phenotypes and are not explained by associations with intermediate phenotypes, such as atopy or lung function.
Funded by: Medical Research Council: G0401540, G9815508; Wellcome Trust: 092731, WT083431MA
The Journal of allergy and clinical immunology 2013;131;3;685-94
Replication of bipolar disorder susceptibility alleles and identification of two novel genome-wide significant associations in a new bipolar disorder case-control sample.
Department of Psychological Medicine, MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Heath Park, Cardiff, UK.
We have conducted a genotyping study using a custom Illumina Infinium HD genotyping array, the ImmunoChip, in a new UK sample of 1218 bipolar disorder (BD) cases and 2913 controls that have not been used in any studies previously reported independently or in meta-analyses. The ImmunoChip was designed before the publication of the Psychiatric Genome-Wide Association Study Consortium Bipolar Disorder Working Group (PGC-BD) meta-analysis data. As such 3106 single-nucleotide polymorphisms (SNPs) with a P-value <1 × 10(-3) from the BD meta-analysis by Ferreira et al. were genotyped. We report support for two of the three most strongly associated chromosomal regions in the Ferreira study, CACNA1C (rs1006737, P=4.09 × 10(-4)) and 15q14 (rs2172835, P=0.043) but not ANK3 (rs10994336, P=0.912). We have combined our ImmunoChip data (569 quasi-independent SNPs from the 3016 SNPs genotyped) with the recently published PGC-BD meta-analysis data, using either the PGC-BD combined discovery and replication data where available or just the discovery data where the SNP was not typed in a replication sample in PGC-BD. Our data provide support for two regions, at ODZ4 and CACNA1C, with prior evidence for genome-wide significant (GWS) association in PGC-BD meta-analysis. In addition, the combined analysis shows two novel GWS associations. First, rs7296288 (P=8.97 × 10(-9), odds ratio (OR)=0.9), an intergenic polymorphism on chromosome 12 located between RHEBL1 and DHH. Second, rs3818253 (P=3.88 × 10(-8), OR=1.16), an intronic SNP on chromosome 20q11.2 in the gene TRPC4AP, which lies in a high linkage disequilibrium region along with the genes GSS and MYH7B.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/0, 076113/C/04/Z
Molecular psychiatry 2013;18;12;1302-7
Reduced burden of very large and rare CNVs in bipolar affective disorder.
MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff.
Objectives: Large, rare chromosomal copy number variants (CNVs) have been shown to increase the risk for schizophrenia and other neuropsychiatric disorders including autism, attention-deficit hyperactivity disorder, learning difficulties, and epilepsy. Their role in bipolar disorder (BD) is less clear. There are no reports of an increase in large, rare CNVs in BD in general, but some have reported an increase in early-onset cases. We previously found that the rate of such CNVs in individuals with BD was not increased, even in early-onset cases. Our aim here was to examine the rate of large rare CNVs in BD in comparison with a new large independent reference sample from the same country.
Methods: We studied the CNVs in a case-control sample consisting of 1,650 BD cases (reported previously) and 10,259 reference individuals without a known psychiatric disorder who took part in the original Wellcome Trust Case Control Consortium (WTCCC) study. The 10,259 reference individuals were affected with six non-psychiatric disorders (coronary artery disease, types 1 and 2 diabetes, hypertension, Crohn's disease, and rheumatoid arthritis). Affymetrix 500K array genotyping data were used to call the CNVs.
Results: The rate of CNVs > 100 kb was not statistically different between cases and controls. The rate of very large (defined as > 1 Mb) and rare (< 1%) CNVs was significantly lower in patients with BD compared with the reference group. CNV loci associated with schizophrenia were not enriched in BD and, in fact, cases of BD had the lowest number of such CNVs compared with any of the WTCCC cohorts; this finding held even for the early-onset BD cases.
Conclusions: Schizophrenia and BD differ with respect to CNV burden and association with specific CNVs. Our findings support the hypothesis that BD is etiologically distinct from schizophrenia with respect to large, rare CNVs and the accompanying associated neurodevelopmental abnormalities.
Funded by: Wellcome Trust: 076113, 085475
Bipolar disorders 2013;15;8;893-8
Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements.
Wellcome Trust Sanger Institute, CB10 1SA Hinxton, UK; Department of Twin Research and Genetic Epidemiology, King's College London, SE1 7EH London, UK.
Epigenetic modifications such as DNA methylation play a key role in gene regulation and disease susceptibility. However, little is known about the genome-wide frequency, localization, and function of methylation variation and how it is regulated by genetic and environmental factors. We utilized the Multiple Tissue Human Expression Resource (MuTHER) and generated Illumina 450K adipose methylome data from 648 twins. We found that individual CpGs had low variance and that variability was suppressed in promoters. We noted that DNA methylation variation was highly heritable (h(2)median = 0.34) and that shared environmental effects correlated with metabolic phenotype-associated CpGs. Analysis of methylation quantitative-trait loci (metQTL) revealed that 28% of CpGs were associated with nearby SNPs, and when overlapping them with adipose expression quantitative-trait loci (eQTL) from the same individuals, we found that 6% of the loci played a role in regulating both gene expression and DNA methylation. These associations were bidirectional, but there were pronounced negative associations for promoter CpGs. Integration of metQTL with adipose reference epigenomes and disease associations revealed significant enrichment of metQTL overlapping metabolic-trait or disease loci in enhancers (the strongest effects were for high-density lipoprotein cholesterol and body mass index [BMI]). We followed up with the BMI SNP rs713586, a cg01884057 metQTL that overlaps an enhancer upstream of ADCY3, and used bisulphite sequencing to refine this region. Our results showed widespread population invariability yet sequence dependence on adipose DNA methylation but that incorporating maps of regulatory elements aids in linking CpG variation to gene regulation and disease risk in a tissue-dependent manner.
Funded by: Canadian Institutes of Health Research: EP1-120608; Medical Research Council: G9824984, MC_UU_12012/1; Wellcome Trust: 081917/Z/07/Z, 083270/Z/07/Z, 090532, 095515, 098051, 100140
American journal of human genetics 2013;93;5;876-90
Genome-wide diversity in the Levant reveals recent structuring by culture.
Institut de Biologia Evolutiva (CSIC-UPF), Departament de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, Barcelona, Spain.
The Levant is a region in the Near East with an impressive record of continuous human existence and major cultural developments since the Paleolithic period. Genetic and archeological studies present solid evidence placing the Middle East and the Arabian Peninsula as the first stepping-stone outside Africa. There is, however, little understanding of demographic changes in the Middle East, particularly the Levant, after the first Out-of-Africa expansion and how the Levantine peoples relate genetically to each other and to their neighbors. In this study we analyze more than 500,000 genome-wide SNPs in 1,341 new samples from the Levant and compare them to samples from 48 populations worldwide. Our results show recent genetic stratifications in the Levant are driven by the religious affiliations of the populations within the region. Cultural changes within the last two millennia appear to have facilitated/maintained admixture between culturally similar populations from the Levant, Arabian Peninsula, and Africa. The same cultural changes seem to have resulted in genetic isolation of other groups by limiting admixture with culturally different neighboring populations. Consequently, Levant populations today fall into two main groups: one sharing more genetic characteristics with modern-day Europeans and Central Asians, and the other with closer genetic affinities to other Middle Easterners and Africans. Finally, we identify a putative Levantine ancestral component that diverged from other Middle Easterners ∼23,700-15,500 years ago during the last glacial period, and diverged from Europeans ∼15,900-9,100 years ago between the last glacial warming and the start of the Neolithic.
Funded by: PEPFAR: 098051; Wellcome Trust
PLoS genetics 2013;9;2;e1003316
The Role of Salt Bridges, Charge Density, and Subunit Flexibility in Determining Disassembly Routes of Protein Complexes.
Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford OX1 3QZ, UK.
Mass spectrometry can be used to characterize multiprotein complexes, defining their subunit stoichiometry and composition following solution disruption and collision-induced dissociation (CID). While CID of protein complexes in the gas phase typically results in the dissociation of unfolded subunits, a second atypical route is possible wherein compact subunits or subcomplexes are ejected without unfolding. Because tertiary structure and subunit interactions may be retained, this is the preferred route for structural investigations. How can we influence which pathway is adopted? By studying properties of a series of homomeric and heteromeric protein complexes and varying their overall charge in solution, we found that low subunit flexibility, higher charge densities, fewer salt bridges, and smaller interfaces are likely to be involved in promoting dissociation routes without unfolding. Manipulating the charge on a protein complex therefore enables us to direct dissociation through structurally informative pathways that mimic those followed in solution.
Structure (London, England : 1993) 2013
Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents.
Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom.
The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified on the basis of an excess of nucleotide substitutions between species compared to the neutral expectation or from effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of selective forces acting in proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ~100 Kb, on average), implying that there is a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than substitutions in CNEs. We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024; Wellcome Trust
PLoS genetics 2013;9;12;e1003995
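As an illustration of the kind of calculation the abstract above refers to, the following is a minimal sketch of the classical McDonald-Kreitman estimate of the proportion of adaptive substitutions, alpha = 1 - (Ds x Pn) / (Dn x Ps). This is the basic estimator only; the abstract says the authors applied an extension of the test, which is not reproduced here. The function name `mk_alpha` and the counts in the example are hypothetical and are not taken from the paper.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classical McDonald-Kreitman estimate of the adaptive fraction.

    dn, ds: counts of selected-class and neutral-class substitutions
            between species (divergence).
    pn, ps: counts of selected-class and neutral-class polymorphisms
            within the species (diversity).
    Returns alpha = 1 - (ds * pn) / (dn * ps).
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero to estimate alpha")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
example_alpha = mk_alpha(dn=40, ds=100, pn=20, ps=100)
print(f"alpha = {example_alpha:.2f}")
```

A positive value suggests an excess of divergence in the selected class relative to the neutral expectation; the simple estimator is known to be sensitive to slightly deleterious polymorphisms, which is one motivation for extended versions of the test.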
Fine mapping of type 1 diabetes regions Idd9.1 and Idd9.2 reveals genetic complexity.
Department of Immunology and Microbial Sciences, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.
Nonobese diabetic (NOD) mice congenic for C57BL/10 (B10)-derived genes in the Idd9 region of chromosome 4 are highly protected from type 1 diabetes (T1D). Idd9 has been divided into three protective subregions (Idd9.1, 9.2, and 9.3), each of which partially prevents disease. In this study we have fine-mapped the Idd9.1 and Idd9.2 regions, revealing further genetic complexity with at least two additional subregions contributing to protection from T1D. Using the NOD sequence from bacterial artificial chromosome clones of the Idd9.1 and Idd9.2 regions as well as whole-genome sequence data recently made available, sequence polymorphisms within the regions highlight a high degree of polymorphism between the NOD and B10 strains in the Idd9 regions. Among numerous candidate genes are several with immunological importance. The Idd9.1 region has been separated into Idd9.1 and Idd9.4, with Lck remaining a candidate gene within Idd9.1. One of the Idd9.2 regions contains the candidate genes Masp2 (encoding mannan-binding lectin serine peptidase 2) and Mtor (encoding mammalian target of rapamycin). From mRNA expression analyses, we have also identified several other differentially expressed candidate genes within the Idd9.1 and Idd9.2 regions. These findings highlight that multiple, relatively small genetic effects combine and interact to produce significant changes in immune tolerance and diabetes onset.
Funded by: NIAID NIH HHS: AI 070351, AI 15416, U19AI050864-07; Wellcome Trust: 091157, 100140
Mammalian genome : official journal of the International Mammalian Genome Society 2013;24;9-10;358-75
Mutations in B4GALNT1 (GM2 synthase) underlie a new disorder of ganglioside biosynthesis.
Institute of Biomedical and Clinical Science, University of Exeter Medical School, St. Luke's Campus, Heavitree Road, EX1 2LU, Exeter, Devon, UK.
Glycosphingolipids are ubiquitous constituents of eukaryotic plasma membranes, and their sialylated derivatives, gangliosides, are the major class of glycoconjugates expressed by neurons. Deficiencies in their catabolic pathways give rise to a large and well-studied group of inherited disorders, the lysosomal storage diseases. Although many glycosphingolipid catabolic defects have been defined, only one proven inherited disease arising from a defect in ganglioside biosynthesis is known. This disease, caused by defects in the first step of ganglioside biosynthesis (GM3 synthase), results in a severe epileptic disorder found at high frequency amongst the Old Order Amish. Here we investigated an unusual neurodegenerative phenotype, most commonly classified as a complex form of hereditary spastic paraplegia, present in families from Kuwait, Italy and the Old Order Amish. Our genetic studies identified mutations in B4GALNT1 (GM2 synthase), encoding the enzyme that catalyzes the second step in complex ganglioside biosynthesis, as the cause of this neurodegenerative phenotype. Biochemical profiling of glycosphingolipid biosynthesis confirmed a lack of GM2 in affected subjects in association with a predictable increase in levels of its precursor, GM3, a finding that will greatly facilitate diagnosis of this condition. With the description of two neurological human diseases involving defects in two sequentially acting enzymes in ganglioside biosynthesis, there is the real possibility that a previously unidentified family of ganglioside deficiency diseases exists. The study of patients and animal models of these disorders will pave the way for a greater understanding of the role gangliosides play in neuronal structure and function and provide insights into the development of effective therapies.
Brain : a journal of neurology 2013;136;Pt 12;3618-24
Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study.
Wellcome Trust Sanger Institute, Cambridge, UK.
Background: The emergence of meticillin-resistant Staphylococcus aureus (MRSA) that can persist in the community and replace existing hospital-adapted lineages of MRSA means that it is necessary to understand transmission dynamics in terms of hospitals and the community as one entity. We assessed the use of whole-genome sequencing to enhance detection of MRSA transmission between these settings.
Methods: We studied a putative MRSA outbreak on a special care baby unit (SCBU) at a National Health Service Foundation Trust in Cambridge, UK. We used whole-genome sequencing to validate and expand findings from an infection-control team who assessed the outbreak through conventional analysis of epidemiological data and antibiogram profiles. We sequenced isolates from all colonised patients in the SCBU, and sequenced MRSA isolates from patients in the hospital or community with the same antibiotic susceptibility profile as the outbreak strain.
Findings: The hospital infection-control team identified 12 infants colonised with MRSA in a 6 month period in 2011, who were suspected of being linked, but a persistent outbreak could not be confirmed with conventional methods. With whole-genome sequencing, we identified 26 related cases of MRSA carriage, and showed transmission occurred within the SCBU, between mothers on a postnatal ward, and in the community. The outbreak MRSA type was a new sequence type (ST) 2371, which is closely related to ST22, but contains genes encoding Panton-Valentine leucocidin. Whole-genome sequencing data were used to propose and confirm that MRSA carriage by a staff member had allowed the outbreak to persist during periods without known infection on the SCBU and after a deep clean.
Interpretation: Whole-genome sequencing holds great promise for rapid, accurate, and comprehensive identification of bacterial transmission pathways in hospital and community settings, with concomitant reductions in infections, morbidity, and costs.
Funding: UK Clinical Research Collaboration Translational Infection Research Initiative, Wellcome Trust, Health Protection Agency, and the National Institute for Health Research Cambridge Biomedical Research Centre.
Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office; Department of Health; Medical Research Council: G1000803; Wellcome Trust: 098051
The Lancet. Infectious diseases 2013;13;2;130-6
Read and assembly metrics inconsequential for clinical utility of whole-genome sequencing in mapping outbreaks.
Funded by: Medical Research Council: G1000803; Wellcome Trust: 098051
Nature biotechnology 2013;31;7;592-4
VS-5584, a novel and highly selective PI3K/mTOR kinase inhibitor for the treatment of cancer.
S*BIO Pte Ltd., Singapore 117528, Singapore.
Dysregulation of the PI3K/mTOR pathway, either through amplifications, deletions, or as a direct result of mutations, has been closely linked to the development and progression of a wide range of cancers. Moreover, this pathway activation is a poor prognostic marker for many tumor types and confers resistance to various cancer therapies. Here, we describe VS-5584, a novel, low-molecular weight compound with equivalent potent activity against mTOR (IC(50) = 37 nmol/L) and all class I phosphoinositide 3-kinase (PI3K) isoforms (IC(50): PI3Kα = 16 nmol/L; PI3Kβ = 68 nmol/L; PI3Kγ = 25 nmol/L; PI3Kδ = 42 nmol/L), without relevant activity on 400 lipid and protein kinases. VS-5584 shows robust modulation of cellular PI3K/mTOR pathways, inhibiting phosphorylation of substrates downstream of PI3K and mTORC1/2. A large human cancer cell line panel screen (436 lines) revealed broad antiproliferative sensitivity and that cells harboring mutations in PIK3CA are generally more sensitive toward VS-5584 treatment. VS-5584 exhibits favorable pharmacokinetic properties after oral dosing in mice and is well tolerated. VS-5584 induces long-lasting and dose-dependent inhibition of PI3K/mTOR signaling in tumor tissue, leading to tumor growth inhibition in various rapalog-sensitive and -resistant human xenograft models. Furthermore, VS-5584 is synergistic with an EGF receptor inhibitor in a gastric tumor model. The unique selectivity profile and favorable pharmacologic and pharmaceutical properties of VS-5584 and its efficacy in a wide range of human tumor models support further investigations of VS-5584 in clinical trials.
Funded by: Wellcome Trust: 093868
Molecular cancer therapeutics 2013;12;2;151-61
Identification of the zebrafish maternal and paternal transcriptomes.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Transcription is an essential component of basic cellular and developmental processes. However, early embryonic development occurs in the absence of transcription and instead relies upon maternal mRNAs and proteins deposited in the egg during oocyte maturation. Although the early zebrafish embryo is competent to transcribe exogenous DNA, factors present in the embryo maintain genomic DNA in a state that is incompatible with transcription. The cell cycles of the early embryo titrate out these factors, leading to zygotic transcription initiation, presumably in response to a change in genomic DNA chromatin structure to a state that supports transcription. To understand the molecular mechanisms controlling this maternal to zygotic transition, it is important to distinguish between the maternal and zygotic transcriptomes during this period. Here we use exome sequencing and RNA-seq to achieve such discrimination and in doing so have identified the first zygotic genes to be expressed in the embryo. Our work revealed different profiles of maternal mRNA post-transcriptional regulation prior to zygotic transcription initiation. Finally, we demonstrate that maternal mRNAs are required for different modes of zygotic transcription initiation, which is not simply dependent on the titration of factors that maintain genomic DNA in a transcriptionally incompetent state.
Development (Cambridge, England) 2013;140;13;2703-10
A blood pressure genetic risk score is a significant predictor of incident cardiovascular events in 32 669 individuals.
Center for Human Genetic Research, Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge St, CPZN 5.242, Boston, MA 02114.
Recent genome-wide association studies have identified genetic variants associated with blood pressure (BP). We investigated whether genetic risk scores (GRSs) constructed of these variants would predict incident cardiovascular disease (CVD) events. We genotyped 32 common single nucleotide polymorphisms in several Finnish cohorts, with up to 32 669 individuals after exclusion of prevalent CVD cases. The median follow-up was 9.8 years, during which 2295 incident CVD events occurred. We created GRSs separately for systolic BP and diastolic BP by multiplying the risk allele count of each single nucleotide polymorphism by the effect size estimated in published genome-wide association studies. We performed Cox regression analyses with and without adjustment for clinical factors, including BP at baseline in each cohort. The results were combined by inverse variance-weighted fixed-effects meta-analysis. The GRSs were strongly associated with systolic BP and diastolic BP, and baseline hypertension (all P<10(-62)). Hazard ratios comparing the highest quintiles of systolic BP and diastolic BP GRSs with the lowest quintiles after adjustment for age, age squared, and sex were 1.25 (1.07-1.46; P=0.006) and 1.23 (1.05-1.43; P=0.01), respectively, for incident coronary heart disease; 1.24 (1.01-1.53; P=0.04) and 1.35 (1.09-1.66; P=0.005), respectively, for incident stroke; and 1.23 (1.08-1.40; P=2×10(-6)) and 1.26 (1.11-1.44; P=5×10(-4)), respectively, for composite CVD. In conclusion, BP findings from genome-wide association studies are strongly replicated. GRSs comprising bona fide BP-single nucleotide polymorphisms predicted CVD risk, consistent with a lifelong effect on BP of these variants collectively.
Funded by: NHLBI NIH HHS: R01 HL098283
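The abstract above describes two calculations that a short sketch can make concrete: a weighted genetic risk score (risk allele count of each SNP multiplied by its published effect size, summed over SNPs) and an inverse variance-weighted fixed-effects combination of per-cohort estimates. The Python below is a minimal sketch under those stated definitions only; the function names and all numbers in the example are hypothetical and are not the study's data or code.

```python
from math import sqrt


def genetic_risk_score(allele_counts, effect_sizes):
    """Weighted GRS for one individual.

    allele_counts: risk-allele count per SNP (0, 1 or 2).
    effect_sizes:  published per-allele effect size per SNP, same order.
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is required per SNP")
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


def fixed_effects_pool(estimates, standard_errors):
    """Inverse variance-weighted fixed-effects pooled estimate and its SE."""
    if len(estimates) != len(standard_errors):
        raise ValueError("one standard error is required per estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    return pooled, sqrt(1.0 / total_weight)


# Hypothetical example: three SNPs for one individual,
# then log hazard ratios from two cohorts.
score = genetic_risk_score([2, 1, 0], [0.5, 1.0, 0.25])
pooled_estimate, pooled_se = fixed_effects_pool([0.2, 0.4], [0.1, 0.1])
print(score, round(pooled_estimate, 3), round(pooled_se, 4))
```

For hazard ratios, the pooling would normally be done on the log scale and the result exponentiated afterwards; that step is left out here to keep the sketch short.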
Mcl-1 and FBW7 control a dominant survival pathway underlying HDAC and Bcl-2 inhibitor synergy in squamous cell carcinoma.
Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, Massachusetts 02114, USA.
Effective targeted therapeutics for squamous cell carcinoma (SCC) are lacking. Here, we uncover Mcl-1 as a dominant and tissue-specific survival factor in SCC, providing a roadmap for a new therapeutic approach. Treatment with the histone deacetylase (HDAC) inhibitor vorinostat regulates Bcl-2 family member expression to disable the Mcl-1 axis and thereby induce apoptosis in SCC cells. Although Mcl-1 dominance renders SCC cells resistant to the BH3-mimetic ABT-737, vorinostat primes them for sensitivity to ABT-737 by shuttling Bim from Mcl-1 to Bcl-2/Bcl-xl, resulting in dramatic synergy for this combination and sustained tumor regression in vivo. Moreover, somatic FBW7 mutation in SCC is associated with stabilized Mcl-1 and high Bim levels, resulting in a poor response to standard chemotherapy but a robust response to HDAC inhibitors and enhanced synergy with the combination vorinostat/ABT-737. Collectively, our findings provide a biochemical rationale and predictive markers for the application of this therapeutic combination in SCC.
Funded by: NCI NIH HHS: BC093523; NIDCR NIH HHS: K08 DE020139, NIH KO8 DE-020139, R01 DE015945; Wellcome Trust: 086357, 093868
Cancer discovery 2013;3;3;324-37
Emergence and global spread of epidemic healthcare-associated Clostridium difficile.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Epidemic C. difficile (027/BI/NAP1) has rapidly emerged in the past decade as the leading cause of antibiotic-associated diarrhea worldwide. However, the key events in evolutionary history leading to its emergence and the subsequent patterns of global spread remain unknown. Here, we define the global population structure of C. difficile 027/BI/NAP1 using whole-genome sequencing and phylogenetic analysis. We show that two distinct epidemic lineages, FQR1 and FQR2, not one as previously thought, emerged in North America within a relatively short period after acquiring the same fluoroquinolone resistance-conferring mutation and a highly related conjugative transposon. The two epidemic lineages showed distinct patterns of global spread, and the FQR2 lineage spread more widely, leading to healthcare-associated outbreaks in the UK, continental Europe and Australia. Our analysis identifies key genetic changes linked to the rapid transcontinental dissemination of epidemic C. difficile 027/BI/NAP1 and highlights the routes by which it spreads through the global healthcare system.
Funded by: Medical Research Council: 93614, G0901743, G1000214, MR/K000551/1; Wellcome Trust: 086418, 093869, 098051
Nature genetics 2013;45;1;109-13
A genome-wide association study of depressive symptoms.
Research Centre O3, Department of Psychiatry, Erasmus MC, Rotterdam, The Netherlands; Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands.
Background: Depression is a heritable trait that exists on a continuum of varying severity and duration. Yet, the search for genetic variants associated with depression has had few successes. We exploit the entire continuum of depression to find common variants for depressive symptoms.
Methods: In this genome-wide association study, we combined the results of 17 population-based studies assessing depressive symptoms with the Center for Epidemiological Studies Depression Scale. Replication of the independent top hits (p<1×10(-5)) was performed in five studies assessing depressive symptoms with other instruments. In addition, we performed a combined meta-analysis of all 22 discovery and replication studies.
Results: The discovery sample comprised 34,549 individuals (mean age of 66.5) and no loci reached genome-wide significance (lowest p = 1.05×10(-7)). Seven independent single nucleotide polymorphisms were considered for replication. In the replication set (n = 16,709), we found suggestive association of one single nucleotide polymorphism with depressive symptoms (rs161645, 5q21, p = 9.19×10(-3)). This 5q21 region reached genome-wide significance (p = 4.78×10(-8)) in the overall meta-analysis combining discovery and replication studies (n = 51,258).
Conclusions: The results suggest that only a large sample comprising more than 50,000 subjects may be sufficiently powered to detect genes for depressive symptoms.
Funded by: NCI NIH HHS: 5UO1CA098233, CA49449, CA50385, CA65725, CA67262, CA87969; NCRR NIH HHS: UL1-RR-024156, UL1RR025005, UL1RR033176; NHGRI NIH HHS: U01-HG004402; NHLBI NIH HHS: HL075366, HL080295, HL087652, HL105756, N01 HC-15103, N01 HC-55222, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-65226, N01-HC-75150, N01-HC-85079, N01-HC-85086, N01-HC-85239, N01-HC-95159, N01-HC-95169, N02-HL-6-4278, R01 HL101161, R01-HL087641, R01-HL093029, R01-HL70825; NIA NIH HHS: 1R01AG032098-01A1, AG-023629, AG-027058, AG-15928, AG-20098, AG-916413, K08 AG034290, K08AG34290, N01-AG-1-2109, N01-AG-12100, N01AG62101, N01AG62103, N01AG62106, N01-AG-821336, P30AG10161, R01 AG015819, R01-AG29451, R01AG15819, R01AG17917, R01AG30146, ZIA AG000183-22, ZIA AG000183-23, ZIA AG000196-03, ZIA AG000196-04, ZIA AG000197-03, ZIA AG000197-04; NIDDK NIH HHS: DK063491; NIMH NIH HHS: R01 MH086498; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164 13; PHS HHS: HHSN268200625226C, HHSN268200782096C; Wellcome Trust: WT098051
Biological psychiatry 2013;73;7;667-78
Aberrant 3' oligoadenylation of spliceosomal U6 small nuclear RNA in poikiloderma with neutropenia.
MRC Laboratory of Molecular Biology, Cambridge, UK.
The recessive disorder poikiloderma with neutropenia (PN) is caused by mutations in the C16orf57 gene that encodes the highly conserved USB1 protein. Here, we present the 1.1 Å resolution crystal structure of human USB1, defining it as a member of the LigT-like superfamily of 2H phosphoesterases. We show that human USB1 is a distributive 3'-5' exoribonuclease that posttranscriptionally removes uridine and adenosine nucleosides from the 3' end of spliceosomal U6 small nuclear RNA (snRNA), directly catalyzing terminal 2', 3' cyclic phosphate formation. USB1 measures the appropriate length of the U6 oligo(U) tail by reading the position of a key adenine nucleotide (A102) and pausing 5 uridine residues downstream. We show that the 3' ends of U6 snRNA in PN patient lymphoblasts are elongated and unexpectedly carry nontemplated 3' oligo(A) tails that are characteristic of nuclear RNA surveillance targets. Thus, our study reveals a novel quality control pathway in which posttranscriptional 3'-end processing by USB1 protects U6 snRNA from targeting and destruction by the nuclear exosome. Our data implicate aberrant oligoadenylation of U6 snRNA in the pathogenesis of the leukemia predisposition disorder PN.
Funded by: Medical Research Council: G0800784, U105161083; Wellcome Trust: 079249
Dense genotyping of immune-related disease regions identifies 14 new susceptibility loci for juvenile idiopathic arthritis.
Arthritis Research UK Epidemiology Unit, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK; National Institute for Health Research Manchester Musculoskeletal Biomedical Research Unit, Central Manchester University Hospitals National Health Service Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK.
We used the Immunochip array to analyze 2,816 individuals with juvenile idiopathic arthritis (JIA), comprising the most common subtypes (oligoarticular and rheumatoid factor-negative polyarticular JIA), and 13,056 controls. We confirmed association of 3 known JIA risk loci (the human leukocyte antigen (HLA) region, PTPN22 and PTPN2) and identified 14 loci reaching genome-wide significance (P < 5 × 10(-8)) for the first time. Eleven additional new regions showed suggestive evidence of association with JIA (P < 1 × 10(-6)). Dense mapping of loci along with bioinformatics analysis refined the associations to one gene in each of eight regions, highlighting crucial pathways, including the interleukin (IL)-2 pathway, in JIA disease pathogenesis. The entire Immunochip content, the HLA region and the top 27 loci (P < 1 × 10(-6)) explain an estimated 18, 13 and 6% of the risk of JIA, respectively. In summary, this is the largest collection of JIA cases investigated so far and provides new insight into the genetic basis of this childhood autoimmune disease.
Funded by: NIDDK NIH HHS: U01 DK062418
Nature genetics 2013;45;6;664-9
JAK2V617F leads to intrinsic changes in platelet formation and reactivity in a knock-in mouse model of essential thrombocythemia.
National Health System Blood and Transplant, Cambridge, United Kingdom.
The principal morbidity and mortality in patients with essential thrombocythemia (ET) and polycythemia rubra vera (PV) stems from thrombotic events. Most patients with ET/PV harbor a JAK2V617F mutation, but its role in the thrombotic diathesis remains obscure. Platelet function studies in patients are difficult to interpret because of interindividual heterogeneity, reflecting variations in the proportion of platelets derived from the malignant clone, differences in the presence of additional mutations, and the effects of medical treatments. To circumvent these issues, we have studied a JAK2V617F knock-in mouse model of ET in which all megakaryocytes and platelets express JAK2V617F at a physiological level, equivalent to that present in human ET patients. We show that, in addition to increased differentiation, JAK2V617F-positive megakaryocytes display greater migratory ability and proplatelet formation. We demonstrate in a range of assays that platelet reactivity to agonists is enhanced, with a concomitant increase in platelet aggregation in vitro and a reduced duration of bleeding in vivo. These data suggest that JAK2V617F leads to intrinsic changes in both megakaryocyte and platelet biology beyond an increase in cell number. In support of this hypothesis, we identify multiple differentially expressed genes in JAK2V617F megakaryocytes that may underlie the observed biological differences.
Funded by: British Heart Foundation: FS/09/039, FS/09/039/27788; Wellcome Trust: 079249, 100140
A genomic portrait of the emergence, evolution, and global spread of a methicillin-resistant Staphylococcus aureus pandemic.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
The widespread use of antibiotics in association with high-density clinical care has driven the emergence of drug-resistant bacteria that are adapted to thrive in hospitalized patients. Of particular concern are globally disseminated methicillin-resistant Staphylococcus aureus (MRSA) clones that cause outbreaks and epidemics associated with health care. The most rapidly spreading and tenacious health-care-associated clone in Europe currently is EMRSA-15, which was first detected in the UK in the early 1990s and subsequently spread throughout Europe and beyond. Using phylogenomic methods to analyze the genome sequences for 193 S. aureus isolates, we were able to show that the current pandemic population of EMRSA-15 descends from a health-care-associated MRSA epidemic that spread throughout England in the 1980s, which had itself previously emerged from a primarily community-associated methicillin-sensitive population. The emergence of fluoroquinolone resistance in this EMRSA-15 subclone in the English Midlands during the mid-1980s appears to have played a key role in triggering pandemic spread, and occurred shortly after the first clinical trials of this drug. Genome-based coalescence analysis estimated that the population of this subclone over the last 20 yr has grown four times faster than its progenitor. Using comparative genomic analysis we identified the molecular genetic basis of 99.8% of the antimicrobial resistance phenotypes of the isolates, highlighting the potential of pathogen genome sequencing as a diagnostic tool. We document the genetic changes associated with adaptation to the hospital environment and with increasing drug resistance over time, and how MRSA evolution likely has been influenced by country-specific drug use regimens.
Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office: CZB/4/717; Medical Research Council: G0800777, MR/K001744/1; PHS HHS: 2 RO1I457838-12; Wellcome Trust: 089472, 098051
Genome research 2013;23;4;653-64
Tracking the establishment of local endemic populations of an emergent enteric pathogen.
Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, VIC 3010, Australia.
Shigella sonnei is a human-adapted pathogen that is emerging globally as the dominant agent of bacterial dysentery. To investigate local establishment, we sequenced the genomes of 263 Vietnamese S. sonnei isolated over 15 y. Our data show that S. sonnei was introduced into Vietnam in the 1980s and has undergone localized clonal expansion, punctuated by genomic fixation events through periodic selective sweeps. We uncover geographical spread, spatially restricted frontier populations, and convergent evolution through local gene pool sampling. This work provides a unique, high-resolution insight into the microevolution of a pioneering human pathogen during its establishment in a new host population.
Funded by: Wellcome Trust: 093724, 098051, 100087
Proceedings of the National Academy of Sciences of the United States of America 2013;110;43;17522-7
The duck genome and transcriptome provide insight into an avian influenza virus reservoir species.
State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China.
The duck (Anas platyrhynchos) is one of the principal natural hosts of influenza A viruses. We present the duck genome sequence and perform deep transcriptome analyses to investigate immune-related genes. Our data indicate that the duck possesses a contractive immune gene repertoire, as in chicken and zebra finch, and this repertoire has been shaped through lineage-specific duplications. We identify genes that are responsive to influenza A viruses using the lung transcriptomes of control ducks and ones that were infected with either a highly pathogenic (A/duck/Hubei/49/05) or a weakly pathogenic (A/goose/Hubei/65/05) H5N1 virus. Further, we show how the duck's defense mechanisms against influenza infection have been optimized through the diversification of its β-defensin and butyrophilin-like repertoires. These analyses, in combination with the genomic and transcriptomic data, provide a resource for characterizing the interaction between host and influenza viruses.
Funded by: Wellcome Trust: 095908
Nature genetics 2013;45;7;776-83
Olfaction and olfactory-mediated behaviour in psychiatric disease models.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Rats and mice are the most widely used species for modelling psychiatric disease. Assessment of these rodent models typically involves the analysis of aberrant behaviour with behavioural interactions often being manipulated to generate the model. Rodents rely heavily on their excellent sense of smell and almost all their social interactions have a strong olfactory component. Therefore, experimental paradigms that exploit these olfactory-mediated behaviours are among the most robust available and are highly prevalent in psychiatric disease research. These include tests of aggression and maternal instinct, foraging, olfactory memory and habituation and the establishment of social hierarchies. An appreciation of the way that rodents regulate these behaviours in an ethological context can assist experimenters to generate better data from their models and to avoid common pitfalls. We describe some of the more commonly used behavioural paradigms from a rodent olfactory perspective and discuss their application in existing models of psychiatric disease. We introduce the four olfactory subsystems that integrate to mediate the behavioural responses and the types of sensory cue that promote them and discuss their control and practical implementation to improve experimental outcomes. In addition, because smell is critical for normal behaviour in rodents and yet olfactory dysfunction is often associated with neuropsychiatric disease, we introduce some tests for olfactory function that can be applied to rodent models of psychiatric disorders as part of behavioural analysis.
Cell and tissue research 2013;354;1;69-80
Negligible impact of rare autoimmune-locus coding-region variants on missing heritability.
Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK.
Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.
REAPR: a universal tool for genome assembly evaluation.
Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.
Funded by: Wellcome Trust: 082130/Z/07/Z, 098051
Genome biology 2013;14;5;R47
LUD, a new protein domain associated with lactate utilization.
Joint Center for Structural Genomics, La Jolla, CA 92037, USA.
Background: A novel highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence, structure, and recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family.
Results: JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: LutC protein, encoded by ORF DR_1909, of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome.
Conclusions: We propose a model for the substrate and cofactor binding and regulation in LUD domain. The significance of LUD-containing proteins in the human gut microbiome, and the implication of lactate metabolism in the radiation-resistance of Deinococcus radiodurans are discussed.
Funded by: NIGMS NIH HHS: P41 GM103393, U54 GM094586; Wellcome Trust: WT077044/Z/05/Z
BMC bioinformatics 2013;14;341
Astroglial IFITM3 mediates neuronal impairments following neonatal immune challenge in mice.
Department of Neuropsychopharmacology and Hospital Pharmacy, Nagoya University Graduate School of Medicine, Nagoya, Japan.
Interferon-induced transmembrane protein 3 (IFITM3) plays a crucial role in the antiviral responses of Type I interferons (IFNs). The role of IFITM3 in the central nervous system (CNS) is, however, largely unknown, despite the fact that its expression is increased in the brains of patients with neurologic and neuropsychiatric diseases. Here, we show the role of IFITM3 in long-lasting neuronal impairments in mice following polyriboinosinic-polyribocytidylic acid (polyI:C, a synthetic double-stranded RNA)-induced immune challenge during the early stages of development. We found that the induction of IFITM3 expression in the brain of mice treated with polyI:C was observed only in astrocytes. Cultured astrocytes were activated by polyI:C treatment, leading to an increase in the mRNA levels of inflammatory cytokines as well as Ifitm3. When cultured neurons were treated with the conditioned medium of polyI:C-treated astrocytes (polyI:C-ACM), neurite development was impaired. These polyI:C-ACM-induced neurodevelopmental abnormalities were alleviated by ifitm3(-/-) astrocyte-conditioned medium. Furthermore, decreases of MAP2 expression, spine density, and dendrite complexity in the frontal cortex as well as memory impairment were evident in polyI:C-treated wild-type mice, but such neuronal impairments were not observed in ifitm3(-/-) mice. We also found that IFITM3 proteins were localized to the early endosomes of astrocytes following polyI:C treatment and reduced endocytic activity. These findings suggest that the induction of IFITM3 expression in astrocytes by the activation of the innate immune system during the early stages of development has non-cell autonomous effects that affect subsequent neurodevelopment, leading to neuropathological impairments and brain dysfunction, by impairing endocytosis in astrocytes.
Funded by: Cancer Research UK: 13031; Wellcome Trust: 092096
The role of high-throughput technologies in clinical cancer genomics.
Department of Hematology/Oncology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, CB2 0QQ, UK.
Cancer is a genetic disease driven by both heritable and somatic alterations in DNA, which underpin not only oncogenesis but also progression and eventual metastasis. The major impetus for elucidating the nature and function of somatic mutations in cancer genomes is the potential for the development of effective targeted anticancer therapies. Over the last decade, high-throughput technologies have allowed us unprecedented access to a host of cancer genomes, leading to an influx of new information about their pathobiology. The challenge now is to integrate such emerging information into clinical practice to achieve tangible benefits for cancer patients. This review examines the roles array-based comparative genomic hybridization and next-generation sequencing are playing in furthering our understanding of both hematological and solid-organ tumors. Furthermore, the authors discuss the current challenges in translating the role of these technologies from bench to bedside.
Funded by: Wellcome Trust: 095663
Expert review of molecular diagnostics 2013;13;2;167-81
Inferring genome-wide recombination landscapes from advanced intercross lines: application to yeast crosses.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Accurate estimates of recombination rates are of great importance for understanding evolution. In an experimental genetic cross, recombination breaks apart and rejoins genetic material, such that the genomes of the resulting isolates are comprised of distinct blocks of differing parental origin. We here describe a method exploiting this fact to infer genome-wide recombination profiles from sequenced isolates from an advanced intercross line (AIL). We verified the accuracy of the method against simulated data. Next, we sequenced 192 isolates from a twelve-generation cross between West African and North American yeast Saccharomyces cerevisiae strains and inferred the underlying recombination landscape at a fine genomic resolution (mean segregating site distance 0.22 kb). Comparison was made with landscapes inferred for a similar cross between four yeast strains, and with a previous single-generation, intra-strain cross (Mancera et al., Nature 2008). Moderate congruence was identified between landscapes (correlation 0.58-0.77 at 5 kb resolution), albeit with variance between mean genome-wide recombination rates. The multiple generations of mating undergone in the AILs gave more precise inference of recombination rates than could be achieved from a single-generation cross, in particular in identifying recombination cold-spots. The recombination landscapes we describe have particular utility; both AILs are part of a resource to study complex yeast traits (see e.g. Parts et al., Genome Res 2011). Our results will enable future applications of this resource to take better account of local linkage structure heterogeneities. Our method has general applicability to other crossing experiments, including a variety of experimental designs.
PloS one 2013;8;5;e62266
Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis.
John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, Florida, USA.
Using the ImmunoChip custom genotyping array, we analyzed 14,498 subjects with multiple sclerosis and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (P < 1.0 × 10(-4)). In a replication phase, we combined these data with previous genome-wide association study (GWAS) data from an independent 14,802 subjects with multiple sclerosis and 26,703 healthy controls. In these 80,094 individuals of European ancestry, we identified 48 new susceptibility variants (P < 5.0 × 10(-8)), 3 of which we found after conditioning on previously identified variants. Thus, there are now 110 established multiple sclerosis risk variants at 103 discrete loci outside of the major histocompatibility complex. With high-resolution Bayesian fine mapping, we identified five regions where one variant accounted for more than 50% of the posterior probability of association. This study enhances the catalog of multiple sclerosis risk variants and illustrates the value of fine mapping in the resolution of GWAS signals.
Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: G0000934, G0700061; Multiple Sclerosis Society: 862, 894, 898, 955; NCI NIH HHS: R01 CA104021; NCRR NIH HHS: UL1 RR024975; NIAID NIH HHS: R01 AI076544; NIEHS NIH HHS: R01 ES017080; NIGMS NIH HHS: RC2 GM093080; NINDS NIH HHS: R01 NS026799, R01 NS032830, R01 NS049477, R01 NS049510, R01 NS067305, RC2 NS070340; Wellcome Trust: 068545, 085475DONNELLY, 085475PELTONEN, 090532, 095552, 098051
Nature genetics 2013;45;11;1353-60
Network based elucidation of drug response: from modulators to targets.
Telethon Institute of Genetics and Medicine, Naples, Italy.
Network-based drug discovery aims at harnessing the power of networks to investigate the mechanism of action of existing drugs, or new molecules, in order to identify innovative therapeutic treatments. In this review, we describe some of the most recent advances in the field of network pharmacology, starting with approaches relying on computational models of transcriptional networks, then moving to protein and signaling network models and concluding with "drug networks". These networks are derived from different sources of experimental data, or literature-based analysis, and provide a complementary view of drug mode of action. Molecular and drug networks are powerful integrated computational and experimental approaches that will likely speed up and improve the drug discovery process, once fully integrated into the academic and industrial drug discovery pipeline.
BMC systems biology 2013;7;139
iAnn: an event sharing platform for the life sciences.
EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
Summary: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available.
Bioinformatics (Oxford, England) 2013;29;15;1919-21
The CD225 domain of IFITM3 is required for both IFITM protein association and inhibition of influenza A virus and dengue virus replication.
Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, Massachusetts, USA.
The interferon-induced transmembrane protein 3 (IFITM3) gene is an interferon-stimulated gene that inhibits the replication of multiple pathogenic viruses in vitro and in vivo. IFITM3 is a member of a large protein superfamily, whose members share a functionally undefined area of high amino acid conservation, the CD225 domain. We performed mutational analyses of IFITM3 and identified multiple residues within the CD225 domain, consisting of the first intramembrane domain (intramembrane domain 1 [IM1]) and a conserved intracellular loop (CIL), that are required for restriction of both influenza A virus (IAV) and dengue virus (DENV) infection in vitro. Two phenylalanines within IM1 (F75 and F78) also mediate a physical association between IFITM proteins, and the loss of this interaction decreases IFITM3-mediated restriction. By extension, similar IM1-mediated associations may contribute to the functions of additional members of the CD225 domain family. IFITM3's distal N-terminal domain is also needed for full antiviral activity, including a tyrosine (Y20), whose alteration results in mislocalization of a portion of IFITM3 to the cell periphery and surface. Comparative analyses demonstrate that similar molecular determinants are needed for IFITM3's restriction of both IAV and DENV. However, a portion of the CIL including Y99 and R87 is preferentially needed for inhibition of the orthomyxovirus. Several IFITM3 proteins engineered with rare single-nucleotide polymorphisms demonstrated reduced expression or mislocalization, and these events were associated with enhanced viral replication in vitro, suggesting that possessing such alleles may impact an individual's risk for viral infection. On the basis of this and other data, we propose a model for IFITM3-mediated restriction.
Funded by: Howard Hughes Medical Institute; NIAID NIH HHS: 1R01AI091786, R01 AI091786; Wellcome Trust
Journal of virology 2013;87;14;7837-52
Presynaptic maturation in auditory hair cells requires a critical period of sensory-independent spiking activity.
Department of Biomedical Science, University of Sheffield, Sheffield S10 2TN, United Kingdom.
The development of neural circuits relies on spontaneous electrical activity that occurs during immature stages of development. In the developing mammalian auditory system, spontaneous calcium action potentials are generated by inner hair cells (IHCs), which form the primary sensory synapse. It remains unknown whether this electrical activity is required for the functional maturation of the auditory system. We found that sensory-independent electrical activity controls synaptic maturation in IHCs. We used a mouse model in which the potassium channel SK2 is normally overexpressed, but can be modulated in vivo using doxycycline. SK2 overexpression affected the frequency and duration of spontaneous action potentials, which prevented the development of the Ca(2+)-sensitivity of vesicle fusion at IHC ribbon synapses, without affecting their morphology or general cell development. By manipulating the in vivo expression of SK2 channels, we identified the "critical period" during which spiking activity influences IHC synaptic maturation. Here we provide direct evidence that IHC development depends upon a specific temporal pattern of calcium spikes before sound-driven neuronal activity.
Proceedings of the National Academy of Sciences of the United States of America 2013;110;21;8720-5
Open science and community norms: Data retention and publication moratoria policies in genomics projects
Medical Law International 2013;12;2;92-120
A sequence variant associated with sortilin-1 (SORT1) on 1p13.3 is independently associated with abdominal aortic aneurysm.
Abdominal aortic aneurysm (AAA) is a common human disease with a high estimated heritability (0.7); however, only a small number of associated genetic loci have been reported to date. In contrast, over 100 loci have now been reproducibly associated with either blood lipid profile and/or coronary artery disease (CAD) (both risk factors for AAA) in large-scale meta-analyses. This study employed a staged design to investigate whether the loci for these two phenotypes are also associated with AAA. Validated CAD and dyslipidaemia loci underwent screening using the Otago AAA genome-wide association data set. Putative associations underwent staged secondary validation in 10 additional cohorts. A novel association between the SORT1 (1p13.3) locus and AAA was identified. The rs599839 G allele, which has been previously associated with both dyslipidaemia and CAD, reached genome-wide significance in 11 combined independent cohorts (meta-analysis with 7048 AAA cases and 75 976 controls: G allele OR 0.81, 95% CI 0.76-0.85, P = 7.2 × 10(-14)). Modelling for confounding interactions of concurrent dyslipidaemia, heart disease and other risk factors suggested that this marker is an independent predictor of AAA susceptibility. In conclusion, a genetic marker associated with cardiovascular risk factors, and in particular concurrent vascular disease, appeared to independently contribute to susceptibility for AAA. Given the potential genetic overlap between risk factor and disease phenotypes, the use of well-characterized case-control cohorts allowing for modelling of cardiovascular disease risk confounders will be an important component in the future discovery of genetic markers for conditions such as AAA.
Funded by: NHLBI NIH HHS: R01 HL064310
Human molecular genetics 2013;22;14;2941-7
Using genetic prediction from known complex disease loci to guide the design of next-generation sequencing experiments.
Medical Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
A central focus of complex disease genetics after genome-wide association studies (GWAS) is to identify low frequency and rare risk variants, which may account for an important fraction of disease heritability unexplained by GWAS. A profusion of studies using next-generation sequencing are seeking such risk alleles. We describe how already-known complex trait loci (largely from GWAS) can be used to guide the design of these new studies by selecting cases, controls, or families who are most likely to harbor undiscovered risk alleles. We show that genetic risk prediction can select unrelated cases from large cohorts who are enriched for unknown risk factors, or multiply-affected families that are more likely to harbor high-penetrance risk alleles. We derive the frequency of an undiscovered risk allele in selected cases and controls, and show how this relates to the variance explained by the risk score, the disease prevalence and the population frequency of the risk allele. We also describe a new method for informing the design of sequencing studies using genetic risk prediction in large partially-genotyped families using an extension of the Inside-Outside algorithm for inference on trees. We explore several study design scenarios using both simulated and real data, and show that in many cases genetic risk prediction can provide significant increases in power to detect low-frequency and rare risk alleles. The same approach can also be used to aid discovery of non-genetic risk factors, suggesting possible future utility of genetic risk prediction in conventional epidemiology. Software implementing the methods in this paper is available in the R package Mangrove.
PloS one 2013;8;10;e76328
Near in place linear time minimum redundancy coding
Data Compression Conference Proceedings 2013;411-20
Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research.
Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13352, Germany.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis.
Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office: G1000803; Department of Health; Medical Research Council: G1000803; Wellcome Trust: WT098051
The New England journal of medicine 2013;369;3;290-2
Consequences of whiB7 (Rv3197A) Mutations in Beijing Genotype Isolates of the Mycobacterium tuberculosis Complex.
Public Health England, Cambridge, United Kingdom.
Antimicrobial agents and chemotherapy 2013;57;7;3461
Genome-wide association analyses identify 18 new loci associated with serum urate concentrations.
Renal Division, Freiburg University Hospital, Freiburg, Germany.
Elevated serum urate concentrations can cause gout, a prevalent and painful inflammatory arthritis. By combining data from >140,000 individuals of European ancestry within the Global Urate Genetics Consortium (GUGC), we identified and replicated 28 genome-wide significant loci in association with serum urate concentrations (18 new regions in or near TRIM46, INHBB, SFMBT1, TMEM171, VEGFA, BAZ1B, PRKAG2, STC1, HNF4G, A1CF, ATXN2, UBE2Q2, IGF1R, NFAT5, MAF, HLF, ACVR1B-ACVRL1 and B3GNT4). Associations for many of the loci were of similar magnitude in individuals of non-European ancestry. We further characterized these loci for associations with gout, transcript expression and the fractional excretion of urate. Network analyses implicate the inhibins-activins signaling pathways and glucose metabolism in systemic urate control. New candidate genes for serum urate concentration highlight the importance of metabolic control of urate production and excretion, which may have implications for the treatment and prevention of gout.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Cancer Research UK: 12076, 14136; Chief Scientist Office: CZB/4/710; Intramural NIH HHS: Z01 AG000954-06; Medical Research Council: G0600237, G0700704, G1000143, G1002084, G9521010, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U127527198, MC_U127592696, MR/K006584/1, MR/K026992/1; NCATS NIH HHS: UL1 TR000124; NCI NIH HHS: P01 CA087969, R01 CA047988; NCRR NIH HHS: K12 RR023250, M01 RR016500, UL1 RR025005; NHGRI NIH HHS: U01 HG004402, U01 HG004424, U01 HG004446, U01 HG004729; NHLBI NIH HHS: HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, HHSN268201200036C, N01 HC025195, N01 HC045134, N01 HC095170, N01 HC095171, N01 HC095172, N01HC05187, N01HC45204, N01HC45205, N01HC48047, N01HC48048, N01HC48049, N01HC48050, N01HC55222, N01HC75150, N01HC85079, N01HC85086, N01HC95095, N02HL64278, R01 HL043851, R01 HL059367, R01 HL080295, R01 HL084099, R01 HL086694, R01 HL087641, R01 HL087652, R01 HL088119, R01 HL105756, T32 HL007024, U01 HL069757, U01 HL072515, U01 HL084756; NIA NIH HHS: N01AG12100, N01AG12109, R01 AG015928, R01 AG018728, R01 AG020098, R01 AG023629, R01 AG027058; NIAAA NIH HHS: K05 AA017688, P50 AA011998, R01 AA007535, R01 AA013320, R01 AA013321, R01 AA013326, R01 AA014041; NIAMS NIH HHS: P60 AR047785, R01 AR056291, R21 AR056042; NIDA NIH HHS: R01 DA012854; NIDDK NIH HHS: P30 DK063491, P30 DK072488, P30 DK079637, P60 DK079637; NIGMS NIH HHS: U01 GM074518; NIMH NIH HHS: R01 MH066206; Wellcome Trust: 090532
Nature genetics 2013;45;2;145-54
KAT5 tyrosine phosphorylation couples chromatin sensing to ATM signalling.
The Gurdon Institute and Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK.
The detection of DNA lesions within chromatin represents a critical step in cellular responses to DNA damage. However, the regulatory mechanisms that couple chromatin sensing to DNA-damage signalling in mammalian cells are not well understood. Here we show that tyrosine phosphorylation of the protein acetyltransferase KAT5 (also known as TIP60) increases after DNA damage in a manner that promotes KAT5 binding to the histone mark H3K9me3. This triggers KAT5-mediated acetylation of the ATM kinase, promoting DNA-damage-checkpoint activation and cell survival. We also establish that chromatin alterations can themselves enhance KAT5 tyrosine phosphorylation and ATM-dependent signalling, and identify the proto-oncogene c-Abl as a mediator of this modification. These findings define KAT5 tyrosine phosphorylation as a key event in the sensing of genomic and chromatin perturbations, and highlight a key role for c-Abl in such processes.
Funded by: Cancer Research UK: 11224, A11224, C6/A11224; European Research Council: 268536; Wellcome Trust: 092096, WT092096
Unusual features in organisation of capsular polysaccharide-related genes of C. jejuni strain X.
School of Life Sciences, Kingston University, Faculty of Science, Engineering and Computing, Penrhyn Road, Kingston upon Thames, KT1 2EE, UK.
PCR probing of the genome of Campylobacter jejuni strain X using conserved capsular polysaccharide (CPS)-related genes allowed elucidation of a complete sequence of the respective gene cluster (cps). This is the largest known Campylobacter cps cluster (38 kb excluding flanking kps regions), which includes a number of genes not detected in other Campylobacter strains. Sequence analysis suggests genetic rearrangements both within and outside the cps gene cluster, a mechanism which may be responsible for mosaic organisation of sugar transferase-related genes leading to structural variability of the capsular polysaccharide (CPS).
Funded by: Biotechnology and Biological Sciences Research Council: EGA16167
Human melioidosis, Malawi, 2011.
A case of human melioidosis caused by a novel sequence type of Burkholderia pseudomallei occurred in a child in Malawi, southern Africa. A literature review showed that human cases reported from the continent have been increasing.
Emerging infectious diseases 2013;19;6;981-4
Activation of the B cell antigen receptor triggers reactivation of latent Kaposi's sarcoma-associated herpesvirus in B cells.
Institute of Virology, Hanover Medical School, Hanover, Germany.
Kaposi's sarcoma-associated herpesvirus (KSHV) is an oncogenic herpesvirus and the cause of Kaposi's sarcoma, primary effusion lymphoma (PEL) and multicentric Castleman's disease. Latently infected B cells are the main reservoir of this virus in vivo, but the nature of the stimuli that lead to its reactivation in B cells is only partially understood. We established stable BJAB cell lines harboring latent KSHV by cell-free infection with recombinant virus carrying a puromycin resistance marker. Our latently infected B cell lines, termed BrK.219, can be reactivated by triggering the B cell receptor (BCR) with antibodies to surface IgM, a stimulus imitating antigen recognition. Using this B cell model system we studied the mechanisms that mediate the reactivation of KSHV in B cells following the stimulation of the BCR and could identify phosphatidylinositol 3-kinase (PI3K) and X-box binding protein 1 (XBP-1) as proteins that play an important role in the BCR-mediated reactivation of latent KSHV.
Journal of virology 2013;87;14;8004-16
RetroSeq: transposable element discovery from next-generation sequencing data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
A significant proportion of eukaryote genomes consist of transposable element (TE)-derived sequence. These elements are known to have the capacity to modulate gene function and genome evolution. We have developed RetroSeq for detecting non-reference TE insertions from Illumina paired-end whole-genome sequencing data. We evaluate RetroSeq on a human trio from the 1000 Genomes Project, showing that it produces highly accurate TE calls.
Availability: RetroSeq is open-source and available from https://github.com/tk2/RetroSeq.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024; Wellcome Trust
Bioinformatics (Oxford, England) 2013;29;3;389-90
Different patterns of Epstein-Barr virus latency in endemic Burkitt lymphoma (BL) lead to distinct variants within the BL-associated gene expression signature.
School of Cancer Sciences, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom.
Epstein-Barr virus (EBV) is present in all cases of endemic Burkitt lymphoma (BL) but in few European/North American sporadic BLs. Gene expression arrays of sporadic tumors have defined a consensus BL profile within which tumors are classifiable as "molecular BL" (mBL). Where endemic BLs fall relative to this profile remains unclear, since they not only carry EBV but also display one of two different forms of virus latency. Here, we use early-passage BL cell lines from different tumors, and BL subclones from a single tumor, to compare EBV-negative cells with EBV-positive cells displaying either classical latency I EBV infection (where EBNA1 is the only EBV antigen expressed from the wild-type EBV genome) or Wp-restricted latency (where an EBNA2 gene-deleted virus genome broadens antigen expression to include the EBNA3A, -3B, and -3C proteins and BHRF1). Expression arrays show that both types of endemic BL fall within the mBL classification. However, while EBV-negative and latency I BLs show overlapping profiles, Wp-restricted BLs form a distinct subgroup, characterized by a detectable downregulation of the germinal center (GC)-associated marker Bcl6 and upregulation of genes marking early plasmacytoid differentiation, notably IRF4 and BLIMP1. Importantly, these same changes can be induced in EBV-negative or latency I BL cells by infection with an EBNA2-knockout virus. Thus, we infer that the distinct gene profile of Wp-restricted BLs does not reflect differences in the identity of the tumor progenitor cell per se but differences imposed on a common progenitor by broadened EBV gene expression.
Funded by: Cancer Research UK: C910/A8829
Journal of virology 2013;87;5;2882-94
A systematic genome-wide analysis of zebrafish protein-coding gene function.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms, typically mice, have been essential for understanding the activities of many orthologues of these disease-associated genes. Although gene-targeting approaches and phenotype analysis have led to a detailed understanding of nearly 6,000 protein-coding genes, this number falls considerably short of the more than 22,000 mouse protein-coding genes. Similarly, in zebrafish genetics, one-by-one gene studies using positional cloning, insertional mutagenesis, antisense morpholino oligonucleotides, targeted re-sequencing, and zinc finger and TAL endonucleases have made substantial contributions to our understanding of the biological activity of vertebrate genes, but again the number of genes studied falls well short of the more than 26,000 zebrafish protein-coding genes. Importantly, for both mice and zebrafish, none of these strategies are particularly suited to the rapid generation of knockouts in thousands of genes and the assessment of their biological activity. Here we describe an active project that aims to identify and phenotype the disruptive mutations in every zebrafish protein-coding gene, using a well-annotated zebrafish reference genome sequence, high-throughput sequencing and efficient chemical mutagenesis. So far we have identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes. We have developed a multi-allelic phenotyping scheme to efficiently assess the effects of each allele during embryogenesis and have analysed the phenotypic consequences of over 1,000 alleles. All mutant alleles and data are available to the community and our phenotyping scheme is adaptable to phenotypic analysis beyond embryogenesis.
Funded by: Medical Research Council: G0777791; NHGRI NIH HHS: 5R01HG00481; Wellcome Trust: 098051
Integrative annotation of variants from 1092 humans: application to cancer genomics.
Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
Funded by: NCATS NIH HHS: UL1 TR000457; NCI NIH HHS: CA167824, R01 CA166661, R01CA152057, U01 CA111275; NCRR NIH HHS: G12 RR003050; NHGRI NIH HHS: HG005718, HG007000, R01 HG002898, R01HG4719, U01 HG005718, U01HG6513, U41 HG007000, U54 HG003079; NIGMS NIH HHS: GM104424; NIMHD NIH HHS: G12 MD007579, P20 MD006899; Wellcome Trust: 085532, 090532, 095908, 098051, WT085532, WT095908
Science (New York, N.Y.) 2013;342;6154;1235587
Genome and transcriptome adaptation accompanying emergence of the definitive type 2 host-restricted Salmonella enterica serovar Typhimurium pathovar.
The Wellcome Trust Sanger Institute, the Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Salmonella enterica serovar Typhimurium definitive type 2 (DT2) is host restricted to Columba livia (rock or feral pigeon) but is also closely related to S. Typhimurium isolates that circulate in livestock and cause a zoonosis characterized by gastroenteritis in humans. DT2 isolates formed a distinct phylogenetic cluster within S. Typhimurium based on whole-genome-sequence polymorphisms. Comparative genome analysis of DT2 94-213 and S. Typhimurium SL1344, DT104, and D23580 identified few differences in gene content with the exception of variations within prophages. However, DT2 94-213 harbored 22 pseudogenes that were intact in other closely related S. Typhimurium strains. We report a novel in silico approach to identify single amino acid substitutions in proteins that have a high probability of a functional impact. One polymorphism identified using this method, a single-residue deletion in the Tar protein, abrogated chemotaxis to aspartate in vitro. DT2 94-213 also exhibited an altered transcriptional profile in response to culture at 42°C compared to that of SL1344. Such differentially regulated genes included a number involved in flagellum biosynthesis and motility. IMPORTANCE Whereas Salmonella enterica serovar Typhimurium can infect a wide range of animal species, some variants within this serovar exhibit a more limited host range and altered disease potential. Phylogenetic analysis based on whole-genome sequences can identify lineages associated with specific virulence traits, including host adaptation. This study represents one of the first to link pathogen-specific genetic signatures, including coding capacity, genome degradation, and transcriptional responses to host adaptation within a Salmonella serovar. We performed comparative genome analysis of reference and pigeon-adapted definitive type 2 (DT2) S. Typhimurium isolates alongside phenotypic and transcriptome analyses, to identify genetic signatures linked to host adaptation within the DT2 lineage.
Funded by: Wellcome Trust
Analysis of tumor heterogeneity and cancer gene networks using deep sequencing of MMTV-induced mouse mammary tumors.
Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands ; Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
Cancer develops through a multistep process in which normal cells progress to malignant tumors via the evolution of their genomes as a result of the acquisition of mutations in cancer driver genes. The number, identity and mode of action of cancer driver genes, and how they contribute to tumor evolution, are largely unknown. This study deployed the Mouse Mammary Tumor Virus (MMTV) as an insertional mutagen to find both the driver genes and the networks in which they function. Using deep insertion site sequencing we identified around 31000 retroviral integration sites in 604 MMTV-induced mammary tumors from mice with mammary gland-specific deletion of Trp53, Pten heterozygous knockout mice, or wildtype strains. We identified 18 known common integration sites (CISs) and 12 previously unknown CISs marking new candidate cancer genes. Members of the Wnt, Fgf, Fgfr, Rspo and Pdgfr gene families were commonly mutated in a mutually exclusive fashion. The sequence data we generated also yielded information on the clonality of insertions in individual tumors, allowing us to develop a data-driven model of MMTV-induced tumor development. Insertional mutations near Wnt and Fgf genes mark the earliest "initiating" events in MMTV-induced tumorigenesis, whereas Fgfr genes are targeted later during tumor progression. Our data show that insertional mutagenesis can be used to discover the mutational networks, the timing of mutations, and the genes that initiate and drive tumor evolution.
Funded by: Cancer Research UK: 13031
PloS one 2013;8;5;e62113
Current application and future perspectives of molecular typing methods to study Clostridium difficile infections.
Section Experimental Microbiology, Department of Medical Microbiology, Center of Infectious Diseases, Leiden University Medical Center, Leiden, Netherlands.
Molecular typing is an essential tool to monitor Clostridium difficile infections and outbreaks within healthcare facilities. Molecular typing also plays a key role in defining the regional and global changes in circulating C. difficile types. The patterns of C. difficile types circulating within Europe (and globally) remain poorly understood, although international efforts are under way to understand the spatial and temporal patterns of C. difficile types. A complete picture is essential to properly investigate type-specific risk factors for C. difficile infections (CDI) and track long-range transmission. Currently, conventional agarose gel-based polymerase chain reaction (PCR) ribotyping is the most common typing method used in Europe to type C. difficile. Although this method has proved to be useful to study epidemiology at local, national and European levels, efforts are being made to replace it with capillary electrophoresis PCR ribotyping to increase pattern recognition, reproducibility and interpretation. However, this method lacks sufficient discriminatory power to study outbreaks and therefore multilocus variable-number tandem repeat analysis (MLVA) has been developed to study transmission between humans, animals and food. Sequence-based methods are increasingly being used for C. difficile fingerprinting/typing because of their ability to discriminate between highly related strains, the ease of data interpretation and transferability of data. The first studies using whole-genome single nucleotide polymorphism typing of healthcare-associated C. difficile within a clinically relevant timeframe are very promising and, although limited to select facilities because of complex data interpretation and high costs, these approaches will likely become commonly used over the coming years.
Euro surveillance : bulletin Européen sur les maladies transmissibles = European communicable disease bulletin 2013;18;4;20381
Tracking chromosome evolution in southern African gerbils using flow-sorted chromosome paints.
Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Stellenbosch, South Africa.
Desmodillus and Gerbilliscus (formerly Tatera) comprise a monophyletic group of gerbils (subfamily Gerbillinae) which last shared an ancestor approximately 8 million years ago; diploid chromosome number variation among the species ranges from 2n = 36 to 2n = 50. In an attempt to shed more light on chromosome evolution and speciation in these rodents, we compared the karyotypes of 7 species, representing 3 genera, based on homology data revealed by chromosome painting with probes derived from flow-sorted chromosomes of the hairy-footed gerbil, Gerbillurus paeba (2n = 36). The fluorescent in situ hybridization data revealed remarkable genome conservation: these species share a high proportion of conserved chromosomes, and differences are due to 10 Robertsonian (Rb) rearrangements (3 autapomorphies, 3 synapomorphies and 4 hemiplasies/homoplasies). Our data suggest that chromosome evolution in Desmodillus occurred at a rate of ~1.25 rearrangements per million years (Myr), and that the rate among Gerbilliscus over a time period spanning 8 Myr is also ~1.25 rearrangements/Myr. The recently diverged Gerbillurus (G. tytonis and G. paeba) share an identical karyotype, while Gerbilliscus kempi, G. afra and G. leucogaster differ by 6 Rb rearrangements (a rate of ~1 rearrangement/Myr). Thus, our data suggest a very slow rate of chromosomal evolution in southern African gerbils.
Funded by: Wellcome Trust: WT098051
Cytogenetic and genome research 2013;139;4;267-75
Chromatin Accessibility Data Sets Show Bias Due to Sequence Specificity of the DNase I Enzyme.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Background: DNase I is an enzyme which cuts duplex DNA at a rate that depends strongly upon its chromatin environment. In combination with high-throughput sequencing (HTS) technology, it can be used to infer genome-wide landscapes of open chromatin regions. Using this technology, systematic identification of hundreds of thousands of DNase I hypersensitive sites (DHS) per cell type has been possible, and this in turn has helped to precisely delineate genomic regulatory compartments. However, to date there has been relatively little investigation into possible biases affecting this data. Results: We report a significant degree of sequence preference spanning sites cut by DNase I in a number of published data sets. The two major protocols in current use each show a different pattern, but for a given protocol the pattern of sequence specificity seems to be quite consistent. The patterns are substantially different from biases seen in other types of HTS data sets, and in some cases the most constrained position lies outside the sequenced fragment, implying that this constraint must relate to the digestion process rather than events occurring during library preparation or sequencing. Conclusions: DNase I is a sequence-specific enzyme, with a specificity that may depend on experimental conditions. This sequence specificity is not taken into account by existing pipelines for identifying open chromatin regions. Care must be taken when interpreting DNase I results, especially when looking at the precise locations of the reads. Future studies may be able to improve the sensitivity and precision of chromatin state measurement by compensating for sequence bias.
PloS one 2013;8;7;e69853
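The sequence-preference analysis described in the DNase I abstract above can be illustrated with a small sketch. This is not the authors' pipeline; it is a minimal example, assuming the genome is available as a dictionary of sequence strings and that cut sites have already been extracted from read alignments. The function name `base_frequencies`, its parameters, and the input layout are all hypothetical. Strand handling (reverse-complementing windows for reverse-strand reads) is omitted for brevity.

```python
from collections import Counter


def base_frequencies(genome, cut_sites, flank=3):
    """Tabulate per-position base frequencies in a window around cut sites.

    genome    : dict mapping chromosome name to sequence string (assumed input)
    cut_sites : iterable of (chrom, pos), where pos is the 0-based index of
                the first base downstream of the cut (assumed convention)
    flank     : number of bases taken on each side of the cut

    Returns a list of 2 * flank dicts, one per window position, each mapping
    A/C/G/T to its observed frequency at that position.
    """
    width = 2 * flank
    counts = [Counter() for _ in range(width)]
    for chrom, pos in cut_sites:
        seq = genome[chrom]
        start, end = pos - flank, pos + flank
        if start < 0 or end > len(seq):
            continue  # skip windows that run off the end of the sequence
        window = seq[start:end].upper()
        for i, base in enumerate(window):
            if base in "ACGT":  # ignore N and other ambiguity codes
                counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return freqs
```

Comparing these frequencies with the genome-wide base composition would show which window positions are constrained. A strongly skewed position that lies outside the sequenced fragment is the kind of signal the abstract attributes to digestion rather than to library preparation or sequencing.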
Piliation of Invasive Streptococcus pneumoniae Isolates in the Era before Pneumococcal Conjugate Vaccine Introduction in Malawi.
The Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi.
The pneumococcal pilus has been shown to be an important determinant of adhesion and virulence in mouse models of colonization, pneumonia, and bacteremia. A pilus is capable of inducing protective immunity, supporting its inclusion in next-generation pneumococcal protein vaccine formulations. Whether this vaccine target is common among pneumococci in sub-Saharan Africa is uncertain. To define the prevalence and genetic diversity of type I and II pili among invasive pneumococci in Malawi prior to the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13) into routine childhood immunization, we examined 188 Streptococcus pneumoniae isolates collected between 2002 and 2008 (17% serotype 1). In this region of high disease burden, we found a low frequency of invasive piliated pneumococci (14%) and pilus gene sequence diversity similar to that seen previously in multiple global pneumococcal lineages. All common serotypes with pilus were covered by PCV13 and so we predict that pilus prevalence will be reduced in the Malawian pneumococcal population after PCV13 introduction.
Clinical and vaccine immunology : CVI 2013;20;11;1729-35
The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery.
Background: The small ruminant parasite Haemonchus contortus is the most widely used parasitic nematode in drug discovery, vaccine development and anthelmintic resistance research. Its remarkable propensity to develop resistance threatens the viability of the sheep industry in many regions of the world and provides a cautionary example of the effect of mass drug administration to control parasitic nematodes. Its phylogenetic position makes it particularly well placed for comparison with the free-living nematode Caenorhabditis elegans and the most economically important parasites of livestock and humans.
Results: Here we report the detailed analysis of a draft genome assembly and extensive transcriptomic dataset for H. contortus. This represents the first genome to be published for a strongylid nematode and the most extensive transcriptomic dataset for any parasitic nematode reported to date. We show a general pattern of conservation of genome structure and gene content between H. contortus and C. elegans, but also a dramatic expansion of important parasite gene families. We identify genes involved in parasite-specific pathways such as blood feeding, neurological function, and drug metabolism. In particular, we describe complete gene repertoires for known drug target families, providing the most comprehensive understanding yet of the action of several important anthelmintics. Also, we identify a set of genes enriched in the parasitic stages of the lifecycle and the parasite gut that provide a rich source of vaccine and drug target candidates.
Conclusions: The H. contortus genome and transcriptome provide an essential platform for postgenomic research in this and other important strongylid parasites.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E018130/1; Canadian Institutes of Health Research: 230927; Wellcome Trust: 067811, 098051
Genome biology 2013;14;8;R88
Etoposide Induces Nuclear Re-Localisation of AID.
Epigenetics Programme, The Babraham Institute, Cambridge, United Kingdom.
During B cell activation, the DNA lesions that initiate somatic hypermutation and class switch recombination are introduced by activation-induced cytidine deaminase (AID). AID is a highly mutagenic protein that is maintained in the cytoplasm at steady state; however, AID is shuttled across the nuclear membrane, and the protein transiently present in the nucleus appears sufficient for targeted alteration of immunoglobulin loci. AID has been implicated in epigenetic reprogramming in primordial germ cells and cell fusions and in induced pluripotent stem cells (iPS cells); however, AID expression in non-B cells is very low. We hypothesised that epigenetic reprogramming would require a pathway that instigates prolonged nuclear residence of AID. Here we show that AID is completely re-localised to the nucleus during drug withdrawal following etoposide treatment, in the period in which double strand breaks (DSBs) are repaired. Re-localisation occurs 2-6 hours after etoposide treatment, and AID remains in the nucleus for 10 or more hours, during which time cells remain live and motile. Re-localisation is cell-cycle dependent and is only observed in G2. Analysis of DSB dynamics shows that AID is re-localised in response to etoposide treatment; however, re-localisation occurs substantially after DSB formation and the levels of re-localisation do not correlate with γH2AX levels. We conclude that DSB formation initiates a slow-acting pathway which allows stable long-term nuclear localisation of AID, and that such a pathway may enable AID-induced DNA demethylation during epigenetic reprogramming.
PloS one 2013;8;12;e82110
Cerebral organoids model human brain development and microcephaly.
Institute of Molecular Biotechnology of the Austrian Academy of Science (IMBA), Vienna 1030, Austria.
The complexity of the human brain has made it difficult to study many brain disorders in model organisms, highlighting the need for an in vitro model of human brain development. Here we have developed a human pluripotent stem cell-derived three-dimensional organoid culture system, termed cerebral organoids, that develop various discrete, although interdependent, brain regions. These include a cerebral cortex containing progenitor populations that organize and produce mature cortical neuron subtypes. Furthermore, cerebral organoids are shown to recapitulate features of human cortical development, namely characteristic progenitor zone organization with abundant outer radial glial stem cells. Finally, we use RNA interference and patient-specific induced pluripotent stem cells to model microcephaly, a disorder that has been difficult to recapitulate in mice. We demonstrate premature neuronal differentiation in patient organoids, a defect that could help to explain the disease phenotype. Together, these data show that three-dimensional organoids can recapitulate development and disease even in this most complex human tissue.
Intestinal colonization resistance.
Bacterial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, UK.
Dense, complex microbial communities, collectively termed the microbiota, occupy a diverse array of niches along the length of the mammalian intestinal tract. During health and in the absence of antibiotic exposure the microbiota can effectively inhibit colonization and overgrowth by invading microbes such as pathogens. This phenomenon is called 'colonization resistance' and is associated with a stable and diverse microbiota in tandem with a controlled lack of inflammation, and involves specific interactions between the mucosal immune system and the microbiota. Here we overview the microbial ecology of the healthy mammalian intestinal tract and highlight the microbe-microbe and microbe-host interactions that promote colonization resistance. Emerging themes highlight immunological (T helper type 17/regulatory T-cell balance), microbiota (diverse and abundant) and metabolic (short-chain fatty acid) signatures of intestinal health and colonization resistance. Intestinal pathogens use specific virulence factors or exploit antibiotic use to subvert colonization resistance for their own benefit by triggering inflammation to disrupt the harmony of the intestinal ecosystem. A holistic view that incorporates immunological and microbiological facets of the intestinal ecosystem should facilitate the development of immunomodulatory and microbe-modulatory therapies that promote intestinal homeostasis and colonization resistance.
Funded by: Medical Research Council: 93614; Wellcome Trust: 076964, 098051
Murine models to study Clostridium difficile infection and transmission.
Bacterial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Clostridium difficile is the leading cause of antibiotic-associated diarrhea in healthcare facilities worldwide. C. difficile infections are difficult to treat because of the high rate of disease recurrence after antibiotic therapy, leaving few treatment options for patients. C. difficile is also difficult to contain within a healthcare setting due to a highly transmissible, resistant spore form that challenges standard infection control measures. The recent development of murine infection models to study the interactions between C. difficile, the host and the microbiota is providing novel insight into the mechanisms of pathogenesis and transmission that should guide the development of therapies and intervention measures.
Funded by: Medical Research Council: G0901743; NIAID NIH HHS: AI090871, U19 AI090871; Wellcome Trust: 076964, 098051
Richness of human gut microbiome correlates with metabolic markers.
INRA, Institut National de la Recherche Agronomique, US1367 Metagenopolis, 78350 Jouy en Josas, France.
We are facing a global metabolic health crisis provoked by an obesity epidemic. Here we report the human gut microbial composition in a population sample of 123 non-obese and 169 obese Danish individuals. We find two groups of individuals that differ by the number of gut microbial genes and thus gut bacterial richness. They contain known and previously unknown bacterial species at different proportions; individuals with a low bacterial richness (23% of the population) are characterized by more marked overall adiposity, insulin resistance and dyslipidaemia and a more pronounced inflammatory phenotype when compared with high bacterial richness individuals. The obese individuals among the lower bacterial richness group also gain more weight over time. Only a few bacterial species are sufficient to distinguish between individuals with high and low bacterial richness, and even between lean and obese participants. Our classifications based on variation in the gut microbiome identify subsets of individuals in the general white adult population who may be at increased risk of progressing to adiposity-associated co-morbidities.
Human SNP links differential outcomes in inflammatory and infectious disease to a FOXO3-regulated pathway.
Cambridge Institute for Medical Research, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0XY, UK; Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.
The clinical course and eventual outcome, or prognosis, of complex diseases varies enormously between affected individuals. This variability critically determines the impact a disease has on a patient's life but is very poorly understood. Here, we exploit existing genome-wide association study data to gain insight into the role of genetics in prognosis. We identify a noncoding polymorphism in FOXO3A (rs12212067: T > G) at which the minor (G) allele, despite not being associated with disease susceptibility, is associated with a milder course of Crohn's disease and rheumatoid arthritis and with increased risk of severe malaria. Minor allele carriage is shown to limit inflammatory responses in monocytes via a FOXO3-driven pathway, which through TGFβ1 reduces production of proinflammatory cytokines, including TNFα, and increases production of anti-inflammatory cytokines, including IL-10. Thus, we uncover a shared genetic contribution to prognosis in distinct diseases that operates via a FOXO3-driven pathway modulating inflammatory responses.
Funded by: Arthritis Research UK: 20385; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0400929, G0600329, G0800675, G0800759; Wellcome Trust: 079895, 083650/Z/07/Z, 087007/Z/08/Z, 089276, 091758, 091758/Z/10/Z, 092654, 098051, 100140, 102974
Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.
Weatherall Institute of Molecular Medicine, Headington, Oxford, OX3 9DS, UK; National Institute of Allergy and Infectious Disease, NIH, Rockville, MD, 20892, USA.
Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 kb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites.
Funded by: Wellcome Trust: 082130, 082130/Z/07/Z, 098051
Molecular microbiology 2013;90;3;519-37
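Note on the entry above: a contact map "at 25 Kb resolution" amounts to counting sequenced read pairs per pair of fixed-width genomic bins. The following is a minimal sketch of that binning step only, not the authors' pipeline; the function name `bin_contacts` and the `(chrom1, pos1, chrom2, pos2)` input layout are assumptions made for illustration.

```python
from collections import Counter

BIN_SIZE = 25_000  # 25 kb, the resolution quoted in the abstract


def bin_contacts(pairs, bin_size=BIN_SIZE):
    """Count read pairs per unordered pair of genomic bins.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples, one per
    mapped read pair (hypothetical input layout).
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        # Sort the two bins so (a, b) and (b, a) share one matrix cell.
        counts[tuple(sorted((a, b)))] += 1
    return counts
```

Real analyses normalise these raw counts before interpretation; that step is deliberately left out here.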
The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability.
School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR. Tel.: +852-39431302; firstname.lastname@example.org.
Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be reached by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in quantity of sequence data. The current resources for NGS informatics are extremely fragmented. Finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of these satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing from community spirit and with the use of the Wikipedia framework, we have initiated a collaborative NGS resource: The NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
Funded by: Biotechnology and Biological Sciences Research Council
Briefings in bioinformatics 2013;14;5;548-55
Non dominant-negative KCNJ2 gene mutations leading to Andersen-Tawil syndrome with an isolated cardiac phenotype.
Institut für Physiologie und Pathophysiologie, Vegetative Physiologie, Philipps-University Marburg, Deutschhausstraße 1-2, 35037 Marburg, Germany.
Andersen-Tawil syndrome (ATS) is characterized by dysmorphic features, periodic paralyses and abnormal ventricular repolarization. After genotyping a large set of patients with congenital long-QT syndrome, we identified two novel, heterozygous KCNJ2 mutations (p.N318S, p.W322C) located in the C-terminus of the Kir2.1 subunit. These mutations have a different localization than classical ATS mutations which are mostly located at a potential interaction face with the slide helix or at the interface between the C-termini. Mutation carriers were without the key features of ATS, causing an isolated cardiac phenotype. While the N318S mutants regularly reached the plasma membrane, W322C mutants primarily resided in late endosomes. Co-expression of N318S or W322C with wild-type Kir2.1 reduced current amplitudes only by 20-25 %. This mild loss-of-function for the heteromeric channels resulted from defective channel trafficking (W322C) or gating (N318S). Strikingly, and in contrast to the majority of ATS mutations, neither mutant caused a dominant-negative suppression of wild-type Kir2.1, Kir2.2 and Kir2.3 currents. Thus, a mild reduction of native Kir2.x currents by non dominant-negative mutants may cause ATS with an isolated cardiac phenotype.
Funded by: Wellcome Trust: 098051
Basic research in cardiology 2013;108;3;353
Amphotericin B increases influenza A virus infection by preventing IFITM3-mediated restriction.
Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, MA 01655, USA.
The IFITMs inhibit influenza A virus (IAV) replication in vitro and in vivo. Here, we establish that the antimycotic heptaene, amphotericin B (AmphoB), prevents IFITM3-mediated restriction of IAV, thereby increasing viral replication. Consistent with its neutralization of IFITM3, a clinical preparation of AmphoB, AmBisome, reduces the majority of interferon's protective effect against IAV in vitro. Mechanistic studies reveal that IFITM1 decreases host-membrane fluidity, suggesting both a possible mechanism for IFITM-mediated restriction and its negation by AmphoB. Notably, we reveal that mice treated with AmBisome succumbed to a normally mild IAV infection, similar to animals deficient in Ifitm3. Therefore, patients receiving antifungal therapy with clinical preparations of AmphoB may be functionally immunocompromised and thus more vulnerable to influenza, as well as other IFITM3-restricted viral infections.
Funded by: NIAID NIH HHS: 1R01AI091786, R01 AI091786; Wellcome Trust
Cell reports 2013;5;4;895-908
A combination of improved differential and global RNA-seq reveals pervasive transcription initiation and events in all stages of the life-cycle of functional RNAs in Propionibacterium acnes, a major contributor to wide-spread human disease.
Astbury Centre for Structural Molecular Biology, School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK.
Background: Sequencing of the genome of Propionibacterium acnes produced a catalogue of genes many of which enable this organism to colonise skin and survive exposure to the elements. Despite this platform, there was little understanding of the gene regulation that gives rise to an organism that has a major impact on human health and wellbeing and causes infections beyond the skin. To address this situation, we have undertaken a genome-wide study of gene regulation using a combination of improved differential and global RNA-sequencing and an analytical approach that takes into account the inherent noise within the data.
Results: We have produced nucleotide-resolution transcriptome maps that identify and differentiate sites of transcription initiation from sites of stable RNA processing and mRNA cleavage. Moreover, analysis of these maps provides strong evidence for 'pervasive' transcription and shows that contrary to initial indications it is not biased towards the production of antisense RNAs. In addition, the maps reveal an extensive array of riboswitches, leaderless mRNAs and small non-protein-coding RNAs alongside vegetative promoters and post-transcriptional events, which includes unusual tRNA processing. The identification of such features will inform models of complex gene regulation, as illustrated here for ribonucleotide reductases and a potential quorum-sensing, two-component system.
Conclusions: The approach described here, which is transferable to any bacterial species, has produced a step increase in whole-cell knowledge of gene regulation in P. acnes. Continued expansion of our maps to include transcription associated with different growth conditions and genetic backgrounds will provide a new platform from which to computationally model the gene expression that determines the physiology of P. acnes and its role in human disease.
Funded by: Biotechnology and Biological Sciences Research Council
BMC genomics 2013;14;620
Survey of culture, goldengate assay, universal biosensor assay, and 16S rRNA Gene sequencing as alternative methods of bacterial pathogen detection.
University of Maryland, School of Medicine, Baltimore, Maryland, USA.
Cultivation-based assays combined with PCR or enzyme-linked immunosorbent assay (ELISA)-based methods for finding virulence factors are standard methods for detecting bacterial pathogens in stools; however, with emerging molecular technologies, new methods have become available. The aim of this study was to compare four distinct detection technologies for the identification of pathogens in stools from children under 5 years of age in The Gambia, Mali, Kenya, and Bangladesh. The children were identified, using currently accepted clinical protocols, as either controls or cases with moderate to severe diarrhea. A total of 3,610 stool samples were tested by established clinical culture techniques: 3,179 DNA samples by the Universal Biosensor assay (Ibis Biosciences, Inc.), 1,466 DNA samples by the GoldenGate assay (Illumina), and 1,006 DNA samples by sequencing of 16S rRNA genes. Each method detected different proportions of samples testing positive for each of seven enteric pathogens, enteroaggregative Escherichia coli (EAEC), enterotoxigenic E. coli (ETEC), enteropathogenic E. coli (EPEC), Shigella spp., Campylobacter jejuni, Salmonella enterica, and Aeromonas spp. The comparisons among detection methods included the frequency of positive stool samples and kappa values for making pairwise comparisons. Overall, the standard culture methods detected Shigella spp., EPEC, ETEC, and EAEC in smaller proportions of the samples than either of the methods based on detection of the virulence genes from DNA in whole stools. The GoldenGate method revealed the greatest agreement with the other methods. The agreement among methods was higher in cases than in controls. The new molecular technologies have a high potential for highly sensitive identification of bacterial diarrheal pathogens.
Journal of clinical microbiology 2013;51;10;3263-9
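Note on the entry above: the abstract reports kappa values for pairwise comparisons between detection methods. As a rough illustration, Cohen's kappa for two methods that each give a positive/negative call on the same stool samples can be computed as below. This is a sketch of the standard statistic, not the study's code; the function name and the boolean input format are assumptions.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary calls made on the same samples."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of samples where both methods agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each method's positive rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both methods gave the same constant call
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which is why kappa is preferred over raw percent agreement when most samples are negative.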
Dense genotyping of immune-related disease regions identifies nine new risk loci for primary sclerosing cholangitis.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Primary sclerosing cholangitis (PSC) is a severe liver disease of unknown etiology leading to fibrotic destruction of the bile ducts and ultimately to the need for liver transplantation. We compared 3,789 PSC cases of European ancestry to 25,079 population controls across 130,422 SNPs genotyped using the Immunochip. We identified 12 genome-wide significant associations outside the human leukocyte antigen (HLA) complex, 9 of which were new, increasing the number of known PSC risk loci to 16. Despite comorbidity with inflammatory bowel disease (IBD) in 72% of the cases, 6 of the 12 loci showed significantly stronger association with PSC than with IBD, suggesting overlapping yet distinct genetic architectures for these two diseases. We incorporated association statistics from 7 diseases clinically occurring with PSC in the analysis and found suggestive evidence for 33 additional pleiotropic PSC risk loci. Together with network analyses, these findings add to the genetic risk map of PSC and expand on the relationship between PSC and other immune-mediated diseases.
Funded by: Medical Research Council: G0601816; NCATS NIH HHS: UL1 TR000005; NCI NIH HHS: R01 CA141743; Wellcome Trust: 091745, 098051
Nature genetics 2013;45;6;670-5
Fine mapping of the pond snail left-right asymmetry (chirality) locus using RAD-Seq and fibre-FISH.
School of Biology, University of Nottingham, University Park, Nottingham, UK.
The left-right asymmetry of snails, including the direction of shell coiling, is determined by the delayed effect of a maternal gene on the chiral twist that takes place during early embryonic cell divisions. Yet, despite being a well-established classical problem, the identity of the gene and the means by which left-right asymmetry is established in snails remain unknown. We here demonstrate the power of new genomic approaches for identification of the chirality gene, "D". First, heterozygous (Dd) pond snails Lymnaea stagnalis were self-fertilised or backcrossed, and the genotype of more than six thousand offspring inferred, either dextral (DD/Dd) or sinistral (dd). Then, twenty of the offspring were used for Restriction-site-Associated DNA Sequencing (RAD-Seq) to identify anonymous molecular markers that are linked to the chirality locus. A local genetic map was constructed by genotyping three flanking markers in over three thousand snails. The three markers lie either side of the chirality locus, with one very tightly linked (<0.1 cM). Finally, bacterial artificial chromosomes (BACs) were isolated that contained the three loci. Fluorescent in situ hybridization (FISH) of pachytene cells showed that the three BACs tightly cluster on the same bivalent chromosome. Fibre-FISH identified a region of greater than ∼0.4 Mb between two BAC clone markers that must contain D. This work therefore establishes the resources for molecular identification of the chirality gene and the variation that underpins sinistral and dextral coiling. More generally, the results also show that combining genomic technologies, such as RAD-Seq and high resolution FISH, is a robust approach for mapping key loci in non-model systems.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F018940/1, BB/F021135/1, F021135, G00661X; Medical Research Council: G0900740, MR/K001744/1; Wellcome Trust: WT098051
PloS one 2013;8;8;e71067
Detecting and characterizing genomic signatures of positive selection in global populations.
NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore 117456, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore 117597, Singapore.
Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains unanswered except for the fact that there exist a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome whether a selection signal shared across different populations is the consequence of a single mutation process followed subsequently by gene flow between populations or of convergent evolution due to the occurrence of multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event.
Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 098051, WT00383/Z/05/Z
American journal of human genetics 2013;92;6;866-81
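Note on the entry above: the abstract names iHS and XP-EHH as existing methods that the new approach complements. Both build on extended haplotype homozygosity (EHH), the probability that two randomly chosen chromosomes carrying a given core allele are identical across the interval from the core to a chosen marker. The sketch below computes that basic statistic only; it is not the long-range-haplotype method described in the paper, and the string encoding of haplotypes and the function name `ehh` are assumptions.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, marker, core_allele="1"):
    """EHH for carriers of core_allele, from the core site out to marker.

    haplotypes: list of equal-length strings, one character per site
    (hypothetical encoding). core and marker are site indices.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least two carriers of the core allele")
    lo, hi = sorted((core, marker))
    # Group carriers by their sequence over the core-to-marker interval.
    classes = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the interval.
    same = sum(comb(n, 2) for n in classes.values())
    return same / comb(len(carriers), 2)
```

EHH starts at 1 at the core site and decays with distance; an allele under recent positive selection tends to show slower decay than expected for its frequency.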
Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates.
Department of Biochemistry, University of Oxford, Oxford, United Kingdom; Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.
Two-thirds of gene promoters in mammals are associated with regions of non-methylated DNA, called CpG islands (CGIs), which counteract the repressive effects of DNA methylation on chromatin. In cold-blooded vertebrates, computational CGI predictions often reside away from gene promoters, suggesting a major divergence in gene promoter architecture across vertebrates. By experimentally identifying non-methylated DNA in the genomes of seven diverse vertebrates, we instead reveal that non-methylated islands (NMIs) of DNA are a central feature of vertebrate gene promoters. Furthermore, NMIs are present at orthologous genes across vast evolutionary distances, revealing a surprising level of conservation in this epigenetic feature. By profiling NMIs in different tissues and developmental stages we uncover a unifying set of features that are central to the function of NMIs in vertebrates. Together these findings demonstrate an ancient logic for NMI usage at gene promoters and reveal an unprecedented level of epigenetic conservation across vertebrate evolution. DOI:http://dx.doi.org/10.7554/eLife.00348.001.
Genome-wide association study on detailed profiles of smoking behavior and nicotine dependence in a twin sample.
Department of Public Health, Hjelt Institute, University of Helsinki, Helsinki, Finland.
Smoking is a major risk factor for several somatic diseases and is also emerging as a causal factor for neuropsychiatric disorders. Genome-wide association (GWA) and candidate gene studies for smoking behavior and nicotine dependence (ND) have disclosed too few predisposing variants to account for the high estimated heritability. Previous large-scale GWA studies have had very limited phenotypic definitions of relevance to smoking-related behavior, which has likely impeded the discovery of genetic effects. We performed GWA analyses on 1114 adult twins ascertained for ever smoking from the population-based Finnish Twin Cohort study. The availability of 17 smoking-related phenotypes allowed us to comprehensively portray the dimensions of smoking behavior, clustered into the domains of smoking initiation, amount smoked and ND. Our results highlight a locus on 16p12.3, with several single-nucleotide polymorphisms (SNPs) in the vicinity of CLEC19A showing association (P<1 × 10(-6)) with smoking quantity. Interestingly, CLEC19A is located close to a previously reported attention-deficit hyperactivity disorder (ADHD) linkage locus and an evident link between ADHD and smoking has been established. Intriguing preliminary association (P<1 × 10(-5)) was detected between DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, 4th edition) ND diagnosis and several SNPs in ERBB4, coding for a Neuregulin receptor, on 2q33. The association between ERBB4 and DSM-IV ND diagnosis was replicated in an independent Australian sample. Recently, a significant increase in ErbB4 and Neuregulin 3 (Nrg3) expression was revealed following chronic nicotine exposure and withdrawal in mice and an association between NRG3 SNPs and smoking cessation success was detected in a clinical trial. 
ERBB4 has previously been associated with schizophrenia; further, it is located within an established schizophrenia linkage locus and within a linkage locus for a smoker phenotype identified in this sample. In conclusion, we disclose novel tentative evidence for the involvement of ERBB4 in ND, suggesting the involvement of the Neuregulin/ErbB signalling pathway in addictions and providing a plausible link between the high co-morbidity of schizophrenia and ND. Molecular Psychiatry advance online publication, 11 June 2013; doi:10.1038/mp.2013.72.
Funded by: NIAAA NIH HHS: K05 AA017688, P60 AA011998, R01 AA013320; NIDA NIH HHS: R01 DA012854
Molecular psychiatry 2013
Generation of a Tn5 transposon library in Haemophilus parasuis and analysis by transposon-directed insertion-site sequencing (TraDIS).
Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK.
Haemophilus parasuis is an important respiratory tract pathogen of swine and the etiological agent of Glässer's disease. The molecular pathogenesis of H. parasuis is not well studied, mainly due to the lack of efficient tools for genetic manipulation of this bacterium. In this study we describe a Tn5-based random mutagenesis method for use in H. parasuis. A novel chloramphenicol-resistant Tn5 transposome was electroporated into the virulent H. parasuis serovar 5 strain 29755. High transposition efficiency of Tn5, up to 10(4) transformants/μg of transposon DNA, was obtained by modification of the Tn5 DNA in the H. parasuis strain HS071 and establishment of optimal electrotransformation conditions, and a library of approximately 10,500 mutants was constructed. Analysis of the library using transposon-directed insertion-site sequencing (TraDIS) revealed that the insertion of Tn5 was evenly distributed throughout the genome. 10,001 individual mutants were identified, with 1561 genes being disrupted (69.4% of the genome). This newly-developed, efficient mutagenesis approach will be a powerful tool for genetic manipulation of H. parasuis in order to study its physiology and pathogenesis.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G003203/1, BB/G018553/1, BB/G019177/1, BB/G019274/1, BB/G020744/1
Veterinary microbiology 2013;166;3-4;558-66
The Distribution and 'In Vivo' Phase Variation Status of Haemoglobin Receptors in Invasive Meningococcal Serogroup B Disease: Genotypic and Phenotypic Analysis.
Public Health England, Manchester, United Kingdom.
Two haemoglobin-binding proteins, HmbR and HpuAB, contribute to iron acquisition by Neisseria meningitidis. These receptors are subject to high frequency, reversible switches in gene expression - phase variation (PV) - due to mutations in homopolymeric (poly-G) repeats present in the open reading frame. The distribution and PV state of these receptors was assessed for a representative collection of isolates from invasive meningococcal disease patients of England, Wales and Northern Ireland. Most of the major clonal complexes had only the HmbR receptor whilst the recently expanding ST-275-centred cluster of the ST-269 clonal complex had both receptors. At least one of the receptors was in an 'ON' configuration in 76.3% of the isolates, a finding that was largely consistent with phenotypic analyses. As PV status may change during isolation and culture of meningococci, a PCR-based protocol was utilised to confirm the expression status of the receptors within contemporaneously acquired clinical specimens (blood/cerebrospinal fluid) from the respective patients. The expression state was confirmed for all isolate/specimen pairs with <15 tract repeats indicating that the PV status of these receptors is stable during isolation. This study therefore establishes a protocol for determining in vivo PV status to aid in determining the contributions of phase variable genes to invasive meningococcal disease. Furthermore, the results of the study support a putative but non-essential role of the meningococcal haemoglobin receptors as virulence factors whilst further highlighting their vaccine candidacy.
PloS one 2013;8;9;e76932
RocA Truncation Underpins Hyper-Encapsulation, Carriage Longevity and Transmissibility of Serotype M18 Group A Streptococci.
Faculty of Medicine, Imperial College London, Hammersmith Hospital, London, United Kingdom.
Group A streptococcal isolates of serotype M18 are historically associated with epidemic waves of pharyngitis and the non-suppurative immune sequela rheumatic fever. The serotype is defined by a unique, highly encapsulated phenotype, yet the molecular basis for this unusual colony morphology is unknown. Here we identify a truncation in the regulatory protein RocA, unique to and conserved within our serotype M18 GAS collection, and demonstrate that it underlies the characteristic M18 capsule phenotype. Reciprocal allelic exchange mutagenesis of rocA between M18 GAS and M89 GAS demonstrated that truncation of RocA was both necessary and sufficient for hyper-encapsulation via up-regulation of both precursors required for hyaluronic acid synthesis. Although RocA was shown to positively enhance covR transcription, quantitative proteomics revealed RocA to be a metabolic regulator with activity beyond the CovR/S regulon. M18 GAS demonstrated a uniquely protuberant chain formation following culture on agar that was dependent on excess capsule and the RocA mutation. Correction of the M18 rocA mutation reduced GAS survival in human blood, and in vivo naso-pharyngeal carriage longevity in a murine model, with an associated drop in bacterial airborne transmission during infection. In summary, a naturally occurring truncation in a regulator explains the encapsulation phenotype, carriage longevity and transmissibility of M18 GAS, highlighting the close interrelation of metabolism, capsule and virulence.
PLoS pathogens 2013;9;12;e1003842
Next-generation sequencing of disseminated tumor cells.
Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.
Disseminated tumor cells (DTCs) detected in the bone marrow have been shown to be an independent prognostic factor for women with breast cancer. However, the mechanisms behind tumor cell dissemination are still unclear, and more detailed knowledge is needed to fully understand why some cells remain dormant and others metastasize. Sequencing of single cells has opened up the possibility of dissecting the genetic content of subclones of a primary tumor, as well as DTCs. Previous studies of genetic changes in DTCs have employed single-cell array comparative genomic hybridization, which provides information about larger aberrations. Next-generation sequencing now provides the possibility of discovering new, smaller, and copy-neutral genetic changes. In this study, we performed whole-genome amplification and subsequently next-generation sequencing to analyze DTCs from two breast cancer patients. We compared copy-number profiles of the DTCs and the corresponding primary tumor generated from sequencing and SNP-comparative genomic hybridization (CGH) data, respectively. While one tumor revealed mostly whole-arm gains and losses, the other had more complex alterations, as well as subclonal amplification and deletions. Whole-arm gains or losses in the primary tumor were in general also observed in the corresponding DTC. Both primary tumors showed amplification of chromosome 1q and deletion of parts of chromosome 16q, which was recaptured in the corresponding DTCs. Interestingly, clear differences were also observed, indicating that the DTC underwent further evolution at the copy-number level. This study provides a proof-of-principle for sequencing of DTCs and correlation with primary copy-number profiles. The analyses allow insight into tumor cell dissemination and show ongoing copy-number evolution in DTCs compared to the primary tumors.
Frontiers in oncology 2013;3;320
Human candidate polymorphisms in sympatric ethnic groups differing in malaria susceptibility in Mali.
Malaria Research and Training Center / Department of Epidemiology of Parasitic Diseases / Faculty of Medicine, Pharmacy and Odonto-Stomatology, BP 1805, Bamako, USTTB, Mali; Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.
Malaria still remains a major public health problem in Mali, although disease susceptibility varies between ethnic groups, particularly between the Fulani and Dogon. These two sympatric groups share similar socio-cultural factors and malaria transmission rates, but Fulani individuals tend to show significantly higher spleen enlargement scores, lower parasite prevalence, and seem less affected by the disease than their Dogon neighbours. We have used genetic polymorphisms from malaria-associated genes to investigate associations with various malaria metrics between the Fulani and Dogon groups. Two cross sectional surveys (transmission season 2006, dry season 2007) were performed. Healthy volunteers from both ethnic groups (n=939) were recruited in a rural setting. In each survey, clinical (spleen enlargement, axillary temperature, weight) and parasitological data (malaria parasite densities and species) were collected, as well as blood samples. One hundred and sixty six SNPs were genotyped and 5 immunoassays (AMA1, CSP, MSP1, MSP2, total IgE) were performed on the DNA and serum samples respectively. The data confirm the reduced malaria susceptibility in the Fulani, with a higher level of the protective O-blood group, and increased circulating antibody levels to several malaria antigens (p<10(-15)). We identified SNP allele frequency differences between the 2 ethnic groups in CD36, IL4, RTN3 and ADCY9. Moreover, polymorphisms in FCER1A, RAD50, TNF, SLC22A4, and IL13 genes were correlated with antibody production (p-value<0.003). Further work is required to understand the mechanisms underpinning these genetic factors.
PloS one 2013;8;10;e75675
The agr locus regulates virulence and colonization genes in Clostridium difficile 027.
Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, University of London, London, United Kingdom.
The transcriptional regulator AgrA, a member of the LytTR family of proteins, plays a key role in controlling gene expression in some Gram-positive pathogens, including Staphylococcus aureus and Enterococcus faecalis. AgrA is encoded by the agrACDB global regulatory locus, and orthologues are found within the genome of most Clostridium difficile isolates, including the epidemic lineage 027/BI/NAP1. Comparative RNA sequencing of the wild type and otherwise isogenic agrA null mutant derivatives of C. difficile R20291 revealed a network of approximately 75 differentially regulated transcripts at late exponential growth phase, including many genes associated with flagellar assembly and function, such as the major structural subunit, FliC. Other differentially regulated genes include several involved in bis-(3'-5')-cyclic dimeric GMP (c-di-GMP) synthesis and toxin A expression. C. difficile 027 R20291 agrA mutant derivatives were poorly flagellated and exhibited reduced levels of colonization and relapses in the murine infection model. Thus, the agr locus likely plays a contributory role in the fitness and virulence potential of C. difficile strains in the 027/BI/NAP1 lineage.
Funded by: Medical Research Council: 93614, G1000214; Wellcome Trust: 086418, 086418/Z, 098051
Journal of bacteriology 2013;195;16;3672-81
Playing fast and loose with mutation.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2013;11;12;822
Distinguishable epidemics of multidrug-resistant Salmonella Typhimurium DT104 in different hosts.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
The global epidemic of multidrug-resistant Salmonella Typhimurium DT104 provides an important example, both in terms of the agent and its resistance, of a widely disseminated zoonotic pathogen. Here, with an unprecedented national collection of isolates collected contemporaneously from humans and animals and including a sample of internationally derived isolates, we have used whole-genome sequencing to dissect the phylogenetic associations of the bacterium and its antimicrobial resistance genes through the course of an epidemic. Contrary to current tenets supporting a single homogeneous epidemic, we demonstrate that the bacterium and its resistance genes were largely maintained within animal and human populations separately and that there was limited transmission, in either direction. We also show considerable variation in the resistance profiles, in contrast to the largely stable bacterial core genome, which emphasizes the critical importance of integrated genotypic data sets in understanding the ecology of bacterial zoonoses and antimicrobial resistance.
Funded by: European Research Council: 260864; NHGRI NIH HHS: HG006139, R01 HG006139; NIAID NIH HHS: AI107034, R01 AI107034; NIGMS NIH HHS: R01 GM086887; Wellcome Trust: 098051
Science (New York, N.Y.) 2013;341;6153;1514-7
Mosaic copy number variation in human neurons.
Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
We used single-cell genomic approaches to map DNA copy number variation (CNV) in neurons obtained from human induced pluripotent stem cell (hiPSC) lines and postmortem human brains. We identified aneuploid neurons, as well as numerous subchromosomal CNVs in euploid neurons. Neurotypic hiPSC-derived neurons had larger CNVs than fibroblasts, and several large deletions were found in hiPSC-derived neurons but not in matched neural progenitor cells. Single-cell sequencing of endogenous human frontal cortex neurons revealed that 13 to 41% of neurons have at least one megabase-scale de novo CNV, that deletions are twice as common as duplications, and that a subset of neurons have highly aberrant genomes marked by multiple alterations. Our results show that mosaic CNV is abundant in human neurons.
Funded by: NCCDPHP CDC HHS: DP20D006493-01; NICHD NIH HHS: N01-HD-9-011; NIH HHS: DP2 OD006493; NIMH NIH HHS: R01 MH095741; PHS HHS: HHSN2752009000011C
Science (New York, N.Y.) 2013;342;6158;632-7
Retrospective analysis of whole genome sequencing compared to prospective typing data in further informing the epidemiological investigation of an outbreak of Shigella sonnei in the UK.
North East and North Central London Health Protection Unit, Health Protection Agency, London, UK.
The aim of this study was to retrospectively assess the value of whole genome sequencing (WGS) compared to conventional typing methods in the investigation and control of an outbreak of Shigella sonnei in the Orthodox Jewish (OJ) community in the UK. The genome sequence analysis showed that the strains implicated in the outbreak formed three phylogenetically distinct clusters. One cluster represented cases associated with recent exposure to a single strain, whereas the other two clusters represented related but distinct strains of S. sonnei circulating in the OJ community across the UK. The WGS data challenged the conclusions drawn during the initial outbreak investigation and allowed cases of dysentery to be implicated or ruled out of the outbreak that were previously misclassified. This study showed that the resolution achieved using WGS would have clearly defined the outbreak, thus facilitating the promotion of infection control measures within local schools and the dissemination of a stronger public health message to the community.
Funded by: Wellcome Trust: 098051
Epidemiology and infection 2013;141;12;2568-75
Association study of common genetic variants and HIV-1 acquisition in 6,300 infected cases and 7,200 controls.
School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
Multiple genome-wide association studies (GWAS) have been performed in HIV-1 infected individuals, identifying common genetic influences on viral control and disease course. Similarly, common genetic correlates of acquisition of HIV-1 after exposure have been interrogated using GWAS, although in generally small samples. Under the auspices of the International Collaboration for the Genomics of HIV, we have combined the genome-wide single nucleotide polymorphism (SNP) data collected by 25 cohorts, studies, or institutions on HIV-1 infected individuals and compared them to carefully matched population-level data sets (a list of all collaborators appears in Note S1 in Text S1). After imputation using the 1,000 Genomes Project reference panel, we tested approximately 8 million common DNA variants (SNPs and indels) for association with HIV-1 acquisition in 6,334 infected patients and 7,247 population samples of European ancestry. Initial association testing identified the SNP rs4418214, the C allele of which is known to tag the HLA-B*57:01 and B*27:05 alleles, as genome-wide significant (p = 3.6 × 10⁻¹¹). However, restricting analysis to individuals with a known date of seroconversion suggested that this association was due to the frailty bias in studies of lethal diseases. Further analyses including testing recessive genetic models, testing for bulk effects of non-genome-wide significant variants, stratifying by sexual or parenteral transmission risk and testing previously reported associations showed no evidence for genetic influence on HIV-1 acquisition (with the exception of CCR5Δ32 homozygosity). Thus, these data suggest that genetic influences on HIV acquisition are either rare or have smaller effects than can be detected by this sample size.
Funded by: Howard Hughes Medical Institute; Intramural NIH HHS; NHLBI NIH HHS: R01 HL087676; NIAID NIH HHS: R37 AI047734, T32 AI007140, UM1 AI069496; NIDA NIH HHS: R01 DA012568; PHS HHS: HHSN26120080001E
PLoS pathogens 2013;9;7;e1003515
The evolutionary path to extraintestinal pathogenic, drug-resistant Escherichia coli is marked by drastic reduction in detectable recombination within the core genome.
Pathogen Research Group, Nottingham Trent University, United Kingdom.
Escherichia coli is a highly diverse group of organisms, ranging from commensals of the intestinal tract through to intestinal pathogens and extraintestinal pathogens. Here, we present data on the population diversity of E. coli, using Bayesian analysis to identify 13 distinct clusters within the population from multilocus sequence typing data, which map onto a whole-genome-derived phylogeny based on 62 genome sequences. Bayesian analysis of recombination within the core genome identified a reduction in detectable core genome recombination as one moves from the commensals, through the intestinal pathogens, down to the multidrug-resistant extraintestinal pathogenic clone E. coli ST131. Our data show that the emergence of a multidrug-resistant, extraintestinal pathogenic lineage of E. coli is marked by substantial reduction in detectable core genome recombination, resulting in a lineage which is phylogenetically distinct and sexually isolated in terms of core genome recombination.
Genome biology and evolution 2013;5;4;699-710
Selecting antagonistic antibodies that control differentiation through inducible expression in embryonic stem cells.
Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, United Kingdom.
Antibodies that modulate receptor function have great untapped potential in the control of stem cell differentiation. In contrast to many natural ligands, antibodies are stable, exquisitely specific, and are unaffected by the regulatory mechanisms that act on natural ligands. Here we describe an innovative system for identifying such antibodies by introducing and expressing antibody gene populations in ES cells. Following induced antibody expression and secretion, changes in differentiation outcomes of individual antibody-expressing ES clones are monitored using lineage-specific gene expression to identify clones that encode and express signal-modifying antibodies. This in-cell expression and reporting system was exemplified by generating blocking antibodies to FGF4 and its receptor FGFR1β, identified through delayed onset of ES cell differentiation. Functionality of the selected antibodies was confirmed by addition of exogenous antibodies to three different ES reporter cell lines, where retained expression of pluripotency markers Oct4, Nanog, and Rex1 was observed. This work demonstrates the potential for discovery and utility of functional antibodies in stem cell differentiation. This work is also unique in constituting an example of ES cells carrying an inducible antibody that causes a functional protein "knock-down" and allows temporal control of stable signaling components at the protein level.
Proceedings of the National Academy of Sciences of the United States of America 2013;110;44;17802-7
Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties.
European Bioinformatics Institute, Wellcome Trust Genome Campus-Cambridge, Cambridge, United Kingdom.
Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to a personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC₅₀ values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC₅₀ values in an 8-fold cross-validation and an independent blind test with coefficient of determination R² of 0.72 and 0.64 respectively. Furthermore, models were able to predict with comparable accuracy (R² of 0.61) IC₅₀ values of cell lines from a tissue not used in the training stage. Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC₅₀ values rather than experimentally measuring them. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities and, ultimately, to support personalized medicine by linking the genomic traits of patients to drug sensitivity.
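The evaluation described in this abstract (k-fold cross-validation scored by the coefficient of determination R²) can be illustrated with a minimal sketch. The function names `r_squared` and `k_fold_indices` are hypothetical and chosen for illustration only; the sketch does not reproduce the authors' models, features, or data, and it assumes simple contiguous folds rather than whatever splitting scheme the study used.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=8):
    """Yield (train, test) index lists for k contiguous folds.

    Assumption: contiguous, unshuffled folds; real studies usually shuffle
    or stratify before splitting.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

In use, a model would be fitted on each `train` index set, its predictions for the matching `test` set collected, and `r_squared` computed over the pooled held-out predictions.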
Funded by: Cancer Research UK; Medical Research Council: G0902106; Wellcome Trust
PloS one 2013;8;4;e61318
Biomarkers for type 2 diabetes and impaired fasting glucose using a nontargeted metabolomics approach.
Department of Twin Research and Genetic Epidemiology, King's College London, London, U.K.
Using a nontargeted metabolomics approach of 447 fasting plasma metabolites, we searched for novel molecular markers that arise before and after hyperglycemia in a large population-based cohort of 2,204 females (115 type 2 diabetic [T2D] case subjects, 192 individuals with impaired fasting glucose [IFG], and 1,897 control subjects) from TwinsUK. Forty-two metabolites from three major fuel sources (carbohydrates, lipids, and proteins) were found to significantly correlate with T2D after adjusting for multiple testing; of these, 22 were previously reported as associated with T2D or insulin resistance. Fourteen metabolites were found to be associated with IFG. Among the metabolites identified, the branched-chain keto-acid metabolite 3-methyl-2-oxovalerate was the strongest predictive biomarker for IFG after glucose (odds ratio [OR] 1.65 [95% CI 1.39-1.95], P = 8.46 × 10⁻⁹) and was moderately heritable (h² = 0.20). The association was replicated in an independent population (n = 720, OR 1.68 [1.34-2.11], P = 6.52 × 10⁻⁶) and validated in 189 twins with urine metabolomics taken at the same time as plasma (OR 1.87 [1.27-2.75], P = 1 × 10⁻³). Results confirm an important role for catabolism of branched-chain amino acids in T2D and IFG. In conclusion, this T2D-IFG biomarker study has surveyed the broadest panel of nontargeted metabolites to date, revealing both novel and known associated metabolites and providing potential novel targets for clinical prediction and a deeper understanding of causal mechanisms.
Funded by: Medical Research Council: MC_UU_12013/1; Wellcome Trust: 092447/Z/10/Z, WT091310, WT098051
Metabolomic markers reveal novel pathways of ageing and early development in human populations.
Department of Twin Research & Genetic Epidemiology, King's College London, London, UK, Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany, Institute of Genetic Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany, Institute of Epidemiology I, Helmholtz Zentrum München, Neuherberg, Germany, Pfizer Research Laboratories, Groton, CT, USA, Worldwide R&D, Pfizer Inc., Cambridge, MA, USA, School of Medicine and Pharmacology, University of Western Australia, Crawley, WA, Australia, Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Nedlands, WA, Australia, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Metabolon Inc., 617 Davis Drive, Durham, NC 27713, USA; Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Qatar Foundation, Doha, State of Qatar and Academic Rheumatology, University of Nottingham, Nottingham City Hospital, Nottingham, UK.
Background: Human ageing is a complex, multifactorial process and early developmental factors affect health outcomes in old age.
Methods: Metabolomic profiling on fasting blood was carried out in 6055 individuals from the UK. Stepwise regression was performed to identify a panel of independent metabolites which could be used as a surrogate for age. We also investigated the association with birthweight overall and within identical discordant twins and with genome-wide methylation levels.
Results: We identified a panel of 22 metabolites which combined are strongly correlated with age (R² = 59%) and with age-related clinical traits independently of age. One particular metabolite, C-glycosyl tryptophan (C-glyTrp), correlated strongly with age (beta = 0.03, SE = 0.001, P = 7.0 × 10⁻¹⁵⁷) and lung function (FEV1 beta = -0.04, SE = 0.008, P = 1.8 × 10⁻⁸ adjusted for age and confounders) and was replicated in an independent population (n = 887). C-glyTrp was also associated with bone mineral density (beta = -0.01, SE = 0.002, P = 1.9 × 10⁻⁶) and birthweight (beta = -0.06, SE = 0.01, P = 2.5 × 10⁻⁹). The difference in C-glyTrp levels explained 9.4% of the variance in the difference in birthweight between monozygotic twins. An epigenome-wide association study in 172 individuals identified three CpG sites associated with levels of C-glyTrp (P < 2 × 10⁻⁶). We replicated one CpG site in the promoter of the WDR85 gene in an independent sample of 350 individuals (beta = -0.20, SE = 0.04, P = 2.9 × 10⁻⁸). WDR85 is a regulator of translation elongation factor 2, essential for protein synthesis in eukaryotes.
Conclusions: Our data illustrate how metabolomic profiling linked with epigenetic studies can identify some key molecular mechanisms potentially determined in early development that produce long-term physiological changes influencing human health and ageing.
International journal of epidemiology 2013
Quantifying single nucleotide variant detection sensitivity in exome sequencing.
MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK.
Background: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study it is often essential to both detect polymorphisms and to understand where and with what probability real polymorphisms may have been missed.
Results: Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single nucleotide resolution SNV sensitivity estimates which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give "power estimates" for an entire study. We estimate a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict 5-15% of heterozygous and 1-4% of homozygous SNVs in the targeted regions will be missed.
Conclusions: Non-reference alleles in the heterozygote state have a high chance of being missed when commonly applied read coverage thresholds are used despite the widely held assumption that there is good polymorphism detection at these coverage levels. Such alleles are likely to be of functional importance in population based studies of rare diseases, somatic mutations in cancer and explaining the "missing heritability" of quantitative traits.
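The "naive expectation of sampling from a binomial" mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' calibrated empirical model: the function name `naive_het_sensitivity`, the requirement of at least two non-reference reads, and the 0.5 allele fraction are all hypothetical choices made for the example.

```python
from math import comb


def naive_het_sensitivity(depth, min_alt_reads=2, alt_fraction=0.5):
    """P(at least min_alt_reads non-reference reads) under Binomial(depth, alt_fraction).

    Assumptions (not from the paper): a heterozygous SNV is called once
    min_alt_reads reads carry the non-reference allele, and each read samples
    that allele independently with probability alt_fraction.
    """
    # Probability of observing fewer than min_alt_reads non-reference reads.
    p_fewer = sum(
        comb(depth, k) * alt_fraction ** k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer
```

Under these assumed settings the naive model gives roughly 0.998 at a depth of 13, which is far more optimistic than the 95% figure the abstract reports for 13X. That gap is consistent with the abstract's point that measured sensitivity is substantially worse than the binomial expectation.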
BMC bioinformatics 2013;14;195
Empirical research on the ethics of genomic research.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Funded by: Wellcome Trust
American journal of medical genetics. Part A 2013;161A;8;2099-101
Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia.
Medical Research Council MRC Centre for Genomics and Global Health, University of Oxford, Oxford, UK.
We describe an analysis of genome variation in 825 P. falciparum samples from Asia and Africa that identifies an unusual pattern of parasite population structure at the epicenter of artemisinin resistance in western Cambodia. Within this relatively small geographic area, we have discovered several distinct but apparently sympatric parasite subpopulations with extremely high levels of genetic differentiation. Of particular interest are three subpopulations, all associated with clinical resistance to artemisinin, which have skewed allele frequency spectra and high levels of haplotype homozygosity, indicative of founder effects and recent population expansion. We provide a catalog of SNPs that show high levels of differentiation in the artemisinin-resistant subpopulations, including codon variants in transporter proteins and DNA mismatch repair proteins. These data provide a population-level genetic framework for investigating the biological origins of artemisinin resistance and for defining molecular markers to assist in its elimination.
Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9, MC_U190081987; NIAID NIH HHS: R01 AI101713; Wellcome Trust: 082370, 089275, 089276, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 093956, 098051, G0600718
Nature genetics 2013;45;6;648-55
Nuclear Wave1 is required for reprogramming transcription in oocytes and for normal development.
Wellcome Trust/Cancer Research UK Gurdon Institute, The Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK.
Eggs and oocytes have a remarkable ability to induce transcription of sperm after normal fertilization and in somatic nuclei after somatic cell nuclear transfer. This ability of eggs and oocytes is essential for normal development. Nuclear actin and actin-binding proteins have been shown to contribute to transcription, although their mode of action is elusive. Here, we find that Xenopus Wave1, previously characterized as a protein involved in actin cytoskeleton organization, is present in the oocyte nucleus and is required for efficient transcriptional reprogramming. Moreover, Wave1 knockdown in embryos results in abnormal development and defective hox gene activation. Nuclear Wave1 binds by its WHD domain to active transcription components, and this binding contributes to the action of RNA polymerase II. We identify Wave1 as a maternal reprogramming factor that also has a necessary role in gene activation in development.
Funded by: Medical Research Council: G1001690/1; Wellcome Trust: 088333, 089613, 092096, 101050, 101050/Z/13/Z, WT077187, WT089613
Science (New York, N.Y.) 2013;341;6149;1002-5
Deciphering the Mechanisms of Developmental Disorders (DMDD): a new programme for phenotyping embryonic lethal mice.
International efforts to test gene function in the mouse by the systematic knockout of each gene are creating many lines in which embryonic development is compromised. These homozygous lethal mutants represent a potential treasure trove for the biomedical community. Developmental biologists could exploit them in their studies of tissue differentiation and organogenesis; for clinical researchers they offer a powerful resource for investigating the origins of developmental diseases that affect newborns. Here, we outline a new programme of research in the UK aiming to kick-start research with embryonic lethal mouse lines. The 'Deciphering the Mechanisms of Developmental Disorders' (DMDD) programme has the ambitious goal of identifying all embryonic lethal knockout lines made in the UK over the next 5 years, and will use a combination of comprehensive imaging and transcriptomics to identify abnormalities in embryo structure and development. All data will be made freely available, enabling individual researchers to identify lines relevant to their research. The DMDD programme will coordinate its work with similar international efforts through the umbrella of the International Mouse Phenotyping Consortium [see accompanying Special Article (Adams et al., 2013)] and, together, these programmes will provide a novel database for embryonic development, linking gene identity with molecular profiles and morphology phenotypes.
Funded by: British Heart Foundation: RG/10/17/28553; Cancer Research UK: 12401, 13031; Medical Research Council: G0801124, G0901525, MC_U117562103; Wellcome Trust: 059312, 090532, 100160
Disease models & mechanisms 2013;6;3;562-6
MiR-210 is induced by Oct-2, regulates B cells, and inhibits autoantibody production.
Department of Medicine, Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge CB2 0XY, United Kingdom.
MicroRNAs (MiRs) are small, noncoding RNAs that regulate gene expression posttranscriptionally. In this study, we show that MiR-210 is induced by Oct-2, a key transcriptional mediator of B cell activation. Germline deletion of MiR-210 results in the development of autoantibodies from 5 mo of age. Overexpression of MiR-210 in vivo resulted in cell autonomous expansion of the B1 lineage and impaired fitness of B2 cells. Mice overexpressing MiR-210 exhibited impaired class-switched Ab responses, a finding confirmed in wild-type B cells transfected with a MiR-210 mimic. In vitro studies demonstrated defects in cellular proliferation and cell cycle entry, which were consistent with the transcriptomic analysis demonstrating downregulation of genes involved in cellular proliferation and B cell activation. These findings indicate that Oct-2 induction of MiR-210 provides a novel inhibitory mechanism for the control of B cells and autoantibody production.
Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK: 11832; Medical Research Council; Wellcome Trust: 06753AIA, 083650, 100140, WT098051
Journal of immunology (Baltimore, Md. : 1950) 2013;191;6;3037-48
The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes.
Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211, Switzerland.
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
Genome research 2013;23;5;749-61
Implementing a successful data-management framework: the UK10K managed access model.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
This paper outlines the history behind open access principles and describes the development of a managed access data-sharing process for the UK10K Project, currently Britain's largest genomic sequencing consortium (2010 to 2013). Funded by the Wellcome Trust, the purpose of UK10K was two-fold: to investigate how low-frequency and rare genetic variants contribute to human disease, and to provide an enduring data resource for future research into human genetics. In this paper, we discuss the challenge of reconciling data-sharing principles with the practicalities of delivering a sequencing project of UK10K's scope and magnitude. We describe the development of a sustainable, easy-to-use managed access system that allowed rapid access to UK10K data, while protecting the interests of participants and data generators alike. Specifically, we focus in depth on the three key issues that emerge in the data pipeline: study recruitment, data release and data access.
Funded by: Wellcome Trust: 092731
Genome medicine 2013;5;11;100
Functional transcriptomics in the post-ENCODE era.
Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.
The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.
Funded by: NHGRI NIH HHS: 5U54HG004555, U41 HG007234, U54 HG004555; Wellcome Trust: WT098051
Genome research 2013;23;12;1961-73
Independent specialization of the human and mouse X chromosomes for the male germ line.
Whitehead Institute, Cambridge, Massachusetts, USA.
We compared the human and mouse X chromosomes to systematically test Ohno's law, which states that the gene content of X chromosomes is conserved across placental mammals. First, we improved the accuracy of the human X-chromosome reference sequence through single-haplotype sequencing of ampliconic regions. The new sequence closed gaps in the reference sequence, corrected previously misassembled regions and identified new palindromic amplicons. Our subsequent analysis led us to conclude that the evolution of human and mouse X chromosomes was bimodal. In accord with Ohno's law, 94-95% of X-linked single-copy genes are shared by humans and mice; most are expressed in both sexes. Notably, most X-ampliconic genes are exceptions to Ohno's law: only 31% of human and 22% of mouse X-ampliconic genes had orthologs in the other species. X-ampliconic genes are expressed predominantly in testicular germ cells, and many were independently acquired since divergence from the common ancestor of humans and mice, specializing portions of their X chromosomes for sperm production.
Nature genetics 2013
A powerful molecular synergy between mutant Nucleophosmin and Flt3-ITD drives acute myeloid leukemia in mice.
Funded by: Wellcome Trust: 095663
Evolution of equine influenza virus in vaccinated horses.
Medical Research Council-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.
Influenza A viruses are characterized by their ability to evade host immunity, even in vaccinated individuals. To determine how prior immunity shapes viral diversity in vivo, we studied the intra- and interhost evolution of equine influenza virus in vaccinated horses. Although the level and structure of genetic diversity were similar to those in naïve horses, intrahost bottlenecks may be more stringent in vaccinated animals, and mutations shared among horses often fall close to putative antigenic sites.
Funded by: Medical Research Council: G0801822; NIGMS NIH HHS: R01 GM080533, R01 GM080533-06; Wellcome Trust
Journal of virology 2013;87;8;4768-71
Sociodemographic distribution of non-communicable disease risk factors in rural Uganda: a cross-sectional study.
Department of Public Health & Primary Care, University of Cambridge, Cambridge, UK, Wellcome Trust Sanger Institute, Hinxton, UK, Medical Research Council/Uganda Virus Research Institute (MRC/UVRI), Uganda Research Unit on AIDS, Entebbe, Uganda, London School of Hygiene and Tropical Medicine, London, UK and School of International Development, University of East Anglia, Norwich, UK.
Background: Non-communicable diseases (NCDs) are rapidly becoming leading causes of morbidity and mortality in low- and middle-income countries, including those in sub-Saharan Africa. In contrast to high-income countries, the sociodemographic distribution, including socioeconomic inequalities, of NCDs and their risk factors is unclear in sub-Saharan Africa, particularly among rural populations.
Methods: We undertook a cross-sectional population-based survey of 7809 residents aged 13 years or older in the General Population Cohort in south-western rural Uganda. Information on behavioural, physiological and biochemical risk factors was obtained using standardized methods as recommended by the WHO STEPwise Approach to Surveillance. Socioeconomic status (SES) was determined by principal component analysis including household features, ownership, and occupation and education of the head of household.
Results: SES was found to be associated with NCD risk factors in this rural population. Smoking, alcohol consumption (men only) and low high-density lipoprotein (HDL) cholesterol were more common among those of lower SES. For example, the prevalence of smoking decreased 4-fold from the lowest to the highest SES groups, from 22.0% to 5.7% for men and 2.2% to 0.4% for women, respectively. In contrast, overweight, raised blood pressure, raised HbA1c (women only) and raised cholesterol were more common among those of higher SES. For example, the prevalence of overweight increased 5-fold from 2.1% to 10.1% for men, and 2-fold from 12.0% to 23.4% for women, from the lowest to highest SES groups respectively. However, neither low physical activity nor fruit, vegetable or staples consumption was associated with SES. Furthermore, associations between NCD risk factors and SES were modified by age and sex.
Conclusions: Within this rural population, NCD risk factors are common and vary both inversely and positively across the SES gradient. A better understanding of the determinants of the sociodemographic distribution of NCDs and their risk factors in rural sub-Saharan African populations will help identify populations at most risk of developing NCDs and help plan interventions to reduce their burden.
Funded by: Medical Research Council: G0801566, G0901213, G0901213-92157, MC_U950080926; Wellcome Trust: 079643
International journal of epidemiology 2013;42;6;1740-53
Cardiometabolic risk in a rural Ugandan population.
Funded by: Medical Research Council: G0801566, G0901213, MC_U950080926
Diabetes care 2013;36;9;e143
Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2.
The authors' full names, degrees, and affiliations are listed in the Appendix.
Background: Somatic mutations in the Janus kinase 2 gene (JAK2) occur in many myeloproliferative neoplasms, but the molecular pathogenesis of myeloproliferative neoplasms with nonmutated JAK2 is obscure, and the diagnosis of these neoplasms remains a challenge.
Methods: We performed exome sequencing of samples obtained from 151 patients with myeloproliferative neoplasms. The mutation status of the gene encoding calreticulin (CALR) was assessed in an additional 1345 hematologic cancers, 1517 other cancers, and 550 controls. We established phylogenetic trees using hematopoietic colonies. We assessed calreticulin subcellular localization using immunofluorescence and flow cytometry.
Results: Exome sequencing identified 1498 mutations in 151 patients, with medians of 6.5, 6.5, and 13.0 mutations per patient in samples of polycythemia vera, essential thrombocythemia, and myelofibrosis, respectively. Somatic CALR mutations were found in 70 to 84% of samples of myeloproliferative neoplasms with nonmutated JAK2, in 8% of myelodysplasia samples, in occasional samples of other myeloid cancers, and in none of the other cancers. A total of 148 CALR mutations were identified with 19 distinct variants. Mutations were located in exon 9 and generated a +1 base-pair frameshift, which would result in a mutant protein with a novel C-terminal. Mutant calreticulin was observed in the endoplasmic reticulum without increased cell-surface or Golgi accumulation. Patients with myeloproliferative neoplasms carrying CALR mutations presented with higher platelet counts and lower hemoglobin levels than patients with mutated JAK2. Mutation of CALR was detected in hematopoietic stem and progenitor cells. Clonal analyses showed CALR mutations in the earliest phylogenetic node, a finding consistent with its role as an initiating mutation in some patients.
Conclusions: Somatic mutations in the endoplasmic reticulum chaperone CALR were found in a majority of patients with myeloproliferative neoplasms with nonmutated JAK2. (Funded by the Kay Kendall Leukaemia Fund and others.)
Funded by: Canadian Institutes of Health Research; Cancer Research UK: 12765, 8961; Medical Research Council: G0300497; Wellcome Trust: 079249, 084812, 093867, 100140
The New England journal of medicine 2013;369;25;2391-405
Special focus: bioinformatics.
Eddy Lab; HHMI Janelia Farm Research Campus; Ashburn, VA USA.
RNA biology 2013;10;7;1160
The relative timing of mutations in a breast cancer genome.
Hutchison/MRC Research Centre and Department of Pathology, University of Cambridge, Cambridge, United Kingdom.
Many tumors have highly rearranged genomes, but a major unknown is the relative importance and timing of genome rearrangements compared to sequence-level mutation. Chromosome instability might arise early, be a late event contributing little to cancer development, or happen as a single catastrophic event. Another unknown is which of the point mutations and rearrangements are selected. To address these questions we show, using the breast cancer cell line HCC1187 as a model, that we can reconstruct the likely history of a breast cancer genome. We assembled probably the most complete map to date of a cancer genome, by combining molecular cytogenetic analysis with sequence data. In particular, we assigned most sequence-level mutations to individual chromosomes by sequencing of flow sorted chromosomes. The parent of origin of each chromosome was assigned from SNP arrays. We were then able to classify most of the mutations as earlier or later according to whether they occurred before or after a landmark event in the evolution of the genome, endoreduplication (duplication of its entire genome). Genome rearrangements and sequence-level mutations were fairly evenly divided earlier and later, suggesting that genetic instability was relatively constant throughout the life of this tumor, and chromosome instability was not a late event. Mutations that caused chromosome instability would be in the earlier set. Strikingly, the great majority of inactivating mutations and in-frame gene fusions happened earlier. The non-random timing of some of the mutations may be evidence that they were selected.
PloS one 2013;8;6;e64991
Comparative genomics in Chlamydomonas and Plasmodium identifies an ancient nuclear envelope protein family essential for sexual reproduction in protists, fungi, plants, and vertebrates.
Department of Cell Biology, University of Texas Southwestern Medical School, Dallas, Texas 75390, USA.
Fertilization is a crucial yet poorly characterized event in eukaryotes. Our previous discovery that the broadly conserved protein HAP2 (GCS1) functioned in gamete membrane fusion in the unicellular green alga Chlamydomonas and the malaria pathogen Plasmodium led us to exploit the rare biological phenomenon of isogamy in Chlamydomonas in a comparative transcriptomics strategy to uncover additional conserved sexual reproduction genes. All previously identified Chlamydomonas fertilization-essential genes fell into related clusters based on their expression patterns. Out of several conserved genes in a minus gamete cluster, we focused on Cre06.g280600, an ortholog of the fertilization-related Arabidopsis GEX1. Gene disruption, cell biological, and immunolocalization studies show that CrGEX1 functions in nuclear fusion in Chlamydomonas. Moreover, CrGEX1 and its Plasmodium ortholog, PBANKA_113980, are essential for production of viable meiotic progeny in both organisms and thus for mosquito transmission of malaria. Remarkably, we discovered that the genes are members of a large, previously unrecognized family whose first-characterized member, KAR5, is essential for nuclear fusion during yeast sexual reproduction. Our comparative transcriptomics approach provides a new resource for studying sexual development and demonstrates that exploiting the data can lead to the discovery of novel biology that is conserved across distant taxa.
Funded by: Medical Research Council: G0501670; NCRR NIH HHS: C06 RR 30414; NIGMS NIH HHS: GM25661, GM56778, R01 GM056778; Wellcome Trust: 098051
Genes & development 2013;27;10;1198-215
Chlamydia trachomatis clinical isolates identified as tetracycline resistant do not exhibit resistance in vitro: whole-genome sequencing reveals a mutation in porB but no evidence for tetracycline resistance genes.
Faculty of Medicine, CES Academic Unit, University of Southampton, Southampton General Hospital, Tremona Road, Southampton, UK.
Chlamydia trachomatis is the most common bacterial sexually transmitted infection worldwide and the leading cause of preventable blindness in developing countries. Tetracycline is commonly the drug of choice for treating C. trachomatis infections, but cases of antibiotic resistance in clinical isolates have previously been reported. Here, we used antibiotic resistance assays and whole-genome sequencing to interrogate the hypothesis that two clinical isolates (IU824 and IU888) have acquired mechanisms of antibiotic resistance. Immunofluorescence staining was used to identify C. trachomatis inclusions in cell cultures grown in the presence of tetracycline; however, only antibiotic-free control cultures yielded the strong fluorescence associated with the presence of chlamydial inclusions. Infectivity was lost upon passage of harvested cultures grown in the presence of tetracycline into antibiotic-free medium, so we conclude that these isolates were phenotypically sensitive to tetracycline. Comparisons of the genome and plasmid sequences for the two isolates with tetracycline-sensitive strains did not identify regions of low sequence identity that could accommodate horizontally acquired resistance genes, and the tetracycline binding region of the 16S rRNA gene was identical to that of the sensitive control strains. The porB gene of strain IU824, however, was found to contain a premature stop codon not previously identified, which is noteworthy but unlikely to be related to tetracycline resistance. In conclusion, we found no evidence of tetracycline resistance in the two strains investigated, and it seems most likely that the small, aberrant inclusions previously identified resulted from the high chlamydial load used in the original antibiotic resistance assays.
Microbiology (Reading, England) 2013;159;Pt 4;748-56
Meta-analysis of genome-wide association studies identifies six new loci for serum calcium concentrations.
National Heart, Lung, and Blood Institute's Framingham Heart Study and Center for Population Studies, Framingham, Massachusetts, United States of America ; Renal Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America.
Calcium is vital to the normal functioning of multiple organ systems and its serum concentration is tightly regulated. Apart from CASR, the genes associated with serum calcium are largely unknown. We conducted a genome-wide association meta-analysis of 39,400 individuals from 17 population-based cohorts and investigated the 14 most strongly associated loci in ≤21,679 additional individuals. Seven loci (six new regions) in association with serum calcium were identified and replicated. Rs1570669 near CYP24A1 (P = 9.1E-12), rs10491003 upstream of GATA3 (P = 4.8E-09) and rs7481584 in CARS (P = 1.2E-10) implicate regions involved in Mendelian calcemic disorders. Rs1550532 in DGKD (P = 8.2E-11), also associated with bone density, and rs7336933 near DGKH/KIAA0564 (P = 9.1E-10) are near genes that encode distinct isoforms of diacylglycerol kinase. Rs780094 is in GCKR. We characterized the expression of these genes in gut, kidney, and bone, and demonstrate modulation of gene expression in bone in response to dietary calcium in mice. Our results shed new light on the genetics of calcium homeostasis.
PLoS genetics 2013;9;9;e1003796
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2013;11;11;744
Autologous antibody capture to enrich immunogenic viruses for viral discovery.
Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands.
Discovery of new viruses has been boosted by novel deep sequencing technologies. Currently, many viruses can be identified by sequencing without knowledge of the pathogenicity of the virus. However, attributing the presence of a virus in patient material to a disease in the patient can be a challenge. One approach to meet this challenge is identification of viral sequences based on enrichment by autologous patient antibody capture. This method facilitates identification of viruses that have provoked an immune response within the patient and may increase the sensitivity of the current virus discovery techniques. To demonstrate the utility of this method, virus discovery deep sequencing (VIDISCA-454) was performed on clinical samples from 19 patients: 13 with a known respiratory viral infection and 6 with a known gastrointestinal viral infection. Patient sera were collected from one to several months after the acute infection phase. Input and antibody capture material was sequenced and enrichment was assessed. In 18 of the 19 patients, viral reads from immunogenic viruses were enriched by antibody capture (ranging from 1.5x to 343x in respiratory material, and from 1.4x to 53x in stool). Enriched reads were also determined in an identity-independent manner using a novel algorithm, Xcompare. In 16 of the 19 patients, 21% to 100% of the enriched reads were derived from infecting viruses. In conclusion, the technique provides a novel approach to specifically identify immunogenic viral sequences among the bulk of sequences that are usually encountered during virus discovery metagenomics.
Funded by: Wellcome Trust: 093724
PloS one 2013;8;11;e78454
Efficient depletion of host DNA contamination in malaria clinical sequencing.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
The cost of whole-genome sequencing (WGS) is decreasing rapidly as next-generation sequencing technology continues to advance, and the prospect of making WGS available for public health applications is becoming a reality. So far, a number of studies have demonstrated the use of WGS as an epidemiological tool for typing and controlling outbreaks of microbial pathogens. Success of these applications is hugely dependent on efficient generation of clean genetic material that is free from host DNA contamination for rapid preparation of sequencing libraries. The presence of large amounts of host DNA severely affects the efficiency of characterizing pathogens using WGS and is therefore a serious impediment to clinical and epidemiological sequencing for health care and public health applications. We have developed a simple enzymatic treatment method that takes advantage of the methylation of human DNA to selectively deplete host contamination from clinical samples prior to sequencing. Using malaria clinical samples with over 80% human host DNA contamination, we show that the enzymatic treatment enriches Plasmodium falciparum DNA up to ∼9-fold and generates high-quality, nonbiased sequence reads covering >98% of 86,158 catalogued typeable single-nucleotide polymorphism loci.
Funded by: Medical Research Council: G19/9; Wellcome Trust: 079355/Z/06/Z, 090532
Journal of clinical microbiology 2013;51;3;745-51
Diversity among human non-typhoidal salmonellae isolates from Zimbabwe.
Microbiologica, Dipartimento di Scienze Biomediche, Universita di Sassari, Sassari, Sardegna, Italy.
Background: Non-typhoidal Salmonella infections are an important public health problem in sub-Saharan Africa, especially among children and HIV-seropositive patients in whom they may cause invasive disease.
Methods: In order to better understand the epidemiology of Salmonella infections in southern Africa we typed, using serotyping, phage typing and multilocus sequence typing, 167 non-typhoidal Salmonella strains isolated from human clinical specimens during 1995-2000.
Results: The most common serovars were Salmonella Typhimurium DT56/ST313, Salmonella Enteritidis PT4 and Salmonella Isangi ST216. Isolates of Salmonella Isangi showed a multidrug-resistant phenotype that was resistant to extended-spectrum cephalosporins. Twelve new sequence types and six new serotypes of Salmonella were identified.
Conclusions: Given the diversity detected in the study it seems likely that many new variants of S. enterica are extant in Zimbabwe and by implication across sub-Saharan Africa. We have demonstrated the presence in Zimbabwe of a multidrug-resistant strain of the serovar Salmonella Isangi and demonstrated the diversity of Salmonella circulating in one sub-Saharan African country. Further studies on the characteristics of Salmonella Isangi isolates from Zimbabwe, including plasmid typing and genotyping, are essential if effective control of the spread of this potential pathogen in sub-Saharan Africa is to be achieved.
Funded by: Medical Research Council: G0600805; Wellcome Trust
Transactions of the Royal Society of Tropical Medicine and Hygiene 2013;107;8;487-92
Ectopic Expression of Activated Notch or SOX2 Reveals Similar and Unique Roles in the Development of the Sensory Cell Progenitors in the Mammalian Inner Ear.
Department of Ophthalmology and Department of Biomedical Genetics, University of Rochester, Rochester, New York 14642, Department of Pediatric Surgery, Erasmus MC-Sophia Children's Hospital, 3000 CA Rotterdam, the Netherlands, and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
Hearing impairment or vestibular dysfunction in humans often results from a permanent loss of critical cell types in the sensory regions of the inner ear, including hair cells, supporting cells, or cochleovestibular neurons. These important cell types arise from a common sensory or neurosensory progenitor, although little is known about how these progenitors are specified. Studies have shown that Notch signaling and the transcription factor Sox2 are required for the development of these lineages. Previously we and others demonstrated that ectopic activation of Notch can direct nonsensory cells to adopt a sensory fate, indicating a role for Notch in early specification events. Here, we explore the relationship between Notch and SOX2 by ectopically activating these factors in nonsensory regions of the mouse cochlea, and demonstrate that, similar to Notch, SOX2 can specify sensory progenitors, consistent with a role downstream of Notch signaling. However, we also show that Notch has a unique role in promoting the proliferation of the sensory progenitors. We further demonstrate that Notch can only induce ectopic sensory regions within a certain time window of development, and that the ectopic hair cells display specialized stereocilia bundles similar to endogenous hair cells. These results demonstrate that Notch and SOX2 can both drive the sensory program in nonsensory cells, indicating these factors may be useful in cell replacement strategies in the inner ear.
The Journal of neuroscience : the official journal of the Society for Neuroscience 2013;33;41;16146-16157
Tailoring the models of transcription.
The Wellcome Trust Sanger Institute, Genome Campus Hinxton, Cambridge CB10 1SA, UK.
Molecular biology is a rapidly evolving field that has led to the development of increasingly sophisticated technologies to improve our capacity to study cellular processes in much finer detail. Transcription is the first step in protein expression and the major point of regulation of the components that determine the characteristics, fate and functions of cells. The study of transcriptional regulation has been greatly facilitated by the development of reporter genes and transcription factor expression vectors, which have become versatile tools for manipulating promoters, as well as transcription factors in order to examine their function. The understanding of promoter complexity and transcription factor structure offers an insight into the mechanisms of transcriptional control and their impact on cell behaviour. This review focuses on some of the many applications of molecular cut-and-paste tools for the manipulation of promoters and transcription factors leading to the understanding of crucial aspects of transcriptional regulation.
International journal of molecular sciences 2013;14;4;7583-97
Advances in osteoarthritis genetics.
Department of Human Genetics, Wellcome Trust Sanger Institute, Cambridgeshire, UK.
Osteoarthritis (OA), the most common form of arthritis, is a highly debilitating disease of the joints and can lead to severe pain and disability. There is no cure for OA. Current treatments often fail to alleviate its symptoms, leading to an increased demand for joint replacement surgery. Previous epidemiological and genetic research has established that OA is a multifactorial disease with both environmental and genetic components. Over the past 6 years, a candidate gene study and several genome-wide association scans (GWAS) in populations of Asian and European descent have collectively established 15 loci associated with knee or hip OA that have been replicated with genome-wide significance, shedding some light on the aetiogenesis of the disease. All OA-associated variants to date are common in frequency and appear to confer moderate to small effect sizes. Some of the associated variants are found within or near genes with clear roles in OA pathogenesis, whereas others point to unsuspected, less characterised pathways. These studies have also provided further evidence in support of the existence of ethnic, sex, and joint-specific effects in OA and have highlighted the importance of expanded and more homogeneous phenotype definitions in genetic studies of OA.
Funded by: Arthritis Research UK: 19542; Wellcome Trust: 098051
Journal of medical genetics 2013;50;11;715-24
In search of low-frequency and rare variants affecting complex traits.
The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common-frequency and rare variants. Targeted genotyping arrays and next-generation sequencing technologies at the whole-genome (WGS) and whole-exome (WES) scales are increasingly employed to access sequence variation across the full minor allele frequency (MAF) spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate and in several cases alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve, with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.
Funded by: Arthritis Research UK: 19542; Wellcome Trust: 098051
Human molecular genetics 2013;22;R1;R16-21
Genome-wide association study for osteoarthritis.
The Lancet 2013;381;9864;373
Clinical and biological implications of driver mutations in myelodysplastic syndromes.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
Myelodysplastic syndromes (MDS) are a heterogeneous group of chronic hematological malignancies characterized by dysplasia, ineffective hematopoiesis and a variable risk of progression to acute myeloid leukemia. Sequencing of MDS genomes has identified mutations in genes implicated in RNA splicing, DNA modification, chromatin regulation, and cell signaling. We sequenced 111 genes across 738 patients with MDS or closely related neoplasms (including chronic myelomonocytic leukemia and MDS-myeloproliferative neoplasms) to explore the role of acquired mutations in MDS biology and clinical phenotype. Seventy-eight percent of patients had 1 or more oncogenic mutations. We identify complex patterns of pairwise association between genes, indicative of epistatic interactions involving components of the spliceosome machinery and epigenetic modifiers. Coupled with inferences on subclonal mutations, these data suggest a hypothesis of genetic "predestination," in which early driver mutations, typically affecting genes involved in RNA splicing, dictate future trajectories of disease evolution with distinct clinical phenotypes. Driver mutations had equivalent prognostic significance, whether clonal or subclonal, and leukemia-free survival deteriorated steadily as numbers of driver mutations increased. Thus, analysis of oncogenic mutations in large, well-characterized cohorts of patients illustrates the interconnections between the cancer genome and disease biology, with considerable potential for clinical application.
Funded by: Cancer Research UK; Medical Research Council: G1000729, MC_U137961146; Wellcome Trust: 077012/Z/05/Z, 088340, 100140, WT088340MA
Blood 2013;122;22;3616-27; quiz 3699
A member of the Plasmodium falciparum PHIST family binds to the erythrocyte cytoskeleton component band 4.1.
William C Gorgas Center for Geographic Medicine, Division of Infectious Diseases, Department of Medicine, University of Alabama at Birmingham, 845 19th St, South, Birmingham, AL, 35294-2170, USA.
Background: Plasmodium falciparum parasites export more than 400 proteins into the cytosol of their host erythrocytes. These exported proteins catalyse the formation of knobs on the erythrocyte plasma membrane and an overall increase in erythrocyte rigidity, presumably by modulating the endogenous erythrocyte cytoskeleton. In uninfected erythrocytes, Band 4.1 (4.1R) plays a key role in regulating erythrocyte shape by interacting with multiple proteins through the three lobes of its cloverleaf-shaped N-terminal domain. In P. falciparum-infected erythrocytes, the C-lobe of 4.1R interacts with the P. falciparum protein mature parasite-infected erythrocyte surface antigen (MESA), but it is not currently known whether other P. falciparum proteins bind to other lobes of the 4.1R N-terminal domain.
Methods: In order to identify novel 4.1R interacting proteins, a yeast two-hybrid screen was performed with a fragment of 4.1R containing both the N- and α-lobes. Positive interactions were confirmed and investigated using site-directed mutagenesis, and antibodies were raised against the interacting partner to characterise its expression and distribution in P. falciparum-infected erythrocytes.
Results: Yeast two-hybrid screening identified a positive interaction between the 4.1R N- and α-lobes and PF3D7_0402000. PF3D7_0402000 is a member of a large family of exported proteins that share a domain of unknown function, the PHIST domain. Domain mapping and site-directed mutagenesis established that it is the PHIST domain of PF3D7_0402000 that interacts with 4.1R. Native PF3D7_0402000 is localized at the parasitophorous vacuole membrane (PVM), and colocalizes with a subpopulation of 4.1R.
Discussion: The function of the majority of P. falciparum exported proteins, including most members of the PHIST family, is unknown, and in only a handful of cases has a direct interaction between P. falciparum-exported proteins and components of the erythrocyte cytoskeleton been established. The interaction between 4.1R and PF3D7_0402000, and localization of PF3D7_0402000 with a subpopulation of 4.1R at the PVM could indicate a role in modulating PVM structure. Further investigation into the mechanisms for 4.1R recruitment is needed.
Conclusion: PF3D7_0402000 was identified as a new binding partner for the major erythrocyte cytoskeletal protein, 4.1R. This interaction is consistent with a growing body of literature that suggests the PHIST family members function by interacting directly with erythrocyte proteins.
Malaria journal 2013;12;160
What has high-throughput sequencing ever done for us?
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
This month's Genome Watch looks back over the past 10 years and highlights how the incredible advances in sequencing technologies have transformed research into microbial genomes.
Nature reviews. Microbiology 2013;11;10;664-5
Incidence and Characterisation of Methicillin-Resistant Staphylococcus aureus (MRSA) from Nasal Colonisation in Participants Attending a Cattle Veterinary Conference in the UK.
Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, United Kingdom.
We sought to determine the prevalence of nasal colonisation with methicillin-resistant Staphylococcus aureus among cattle veterinarians in the UK. There was particular interest in examining the frequency of colonisation with MRSA harbouring mecC, as strains with this mecA homologue were originally identified in bovine milk and may represent a zoonotic risk to those in contact with dairy livestock. Three hundred and seven delegates at the British Cattle Veterinarian Association (BCVA) Congress 2011 in Southport, UK were screened for nasal colonisation with MRSA. Isolates were characterised by whole genome sequencing and antimicrobial susceptibility testing. Eight out of three hundred and seven delegates (2.6%) were positive for nasal colonisation with MRSA. All strains were positive for mecA and none possessed mecC. The time since a delegate's last visit to a farm was significantly shorter in the MRSA-positive group than in MRSA-negative counterparts. BCVA delegates have an increased risk of MRSA colonisation compared to the general population, but their frequency of colonisation is lower than that reported from other types of veterinarian conference and that seen in human healthcare workers. The results indicate that recent visitation to a farm is a risk factor for MRSA colonisation and that mecC-MRSA are rare among BCVA delegates (<1% based on sample size). Contact with livestock, including dairy cattle, may still be a risk factor for human colonisation with mecC-MRSA but occurs at a rate below the lower limit of detection available in this study.
PloS one 2013;8;7;e68463
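The prevalence figures quoted in the abstract above can be checked with a short calculation. The sketch below is illustrative only: the abstract does not say how the "<1% based on sample size" bound was derived, so the 95% one-sided zero-event bound and the "rule of three" approximation used here are assumptions, and the function names are hypothetical.

```python
def prevalence(positives: int, n: int) -> float:
    """Observed proportion of positive samples."""
    return positives / n


def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on a proportion when 0 of n samples are positive.

    Assumption: independent samples. Solves (1 - p)**n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n_delegates = 307  # delegates screened, from the abstract

# 8 of 307 delegates carried MRSA.
mrsa_prevalence = prevalence(8, n_delegates)

# No mecC-MRSA was found, so only an upper bound can be stated.
mecc_upper = zero_event_upper_bound(n_delegates)
rule_of_three = 3 / n_delegates  # common shortcut for the same bound

print(f"MRSA prevalence: {mrsa_prevalence:.1%}")
print(f"mecC-MRSA upper bound (exact): {mecc_upper:.2%}")
print(f"mecC-MRSA upper bound (rule of three): {rule_of_three:.2%}")
```

Under these assumptions both bounds fall just below 1%, which is consistent with the figure given in the abstract.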
The cell-cycle state of stem cells determines cell fate propensity.
Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine and Department of Surgery, University of Cambridge, Cambridge CB2 0SZ, UK.
Self-renewal and differentiation of stem cells are fundamentally associated with cell-cycle progression to enable tissue specification, organ homeostasis, and potentially tumorigenesis. However, technical challenges have impaired the study of the molecular interactions coordinating cell fate choice and cell-cycle progression. Here, we bypass these limitations by using the FUCCI reporter system in human pluripotent stem cells (hPSCs) and show that their capacity for differentiation varies during the progression of their cell cycle. These mechanisms are governed by the cell-cycle regulators cyclin D1-3 that control differentiation signals such as the TGF-β-Smad2/3 pathway. Conversely, cell-cycle manipulation using a small molecule directs differentiation of hPSCs and provides an approach to generate cell types of clinical interest. Our results demonstrate that cell fate decisions are tightly associated with the cell-cycle machinery and reveal insights into the mechanisms synchronizing differentiation and proliferation in developing tissues.
Funded by: Medical Research Council: G0701448; Wellcome Trust: 079249
Maps of open chromatin highlight cell type-restricted patterns of regulatory sequence variation at hematological trait loci.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
Nearly three-quarters of the 143 genetic signals associated with platelet and erythrocyte phenotypes identified by meta-analyses of genome-wide association (GWA) studies are located at non-protein-coding regions. Here, we assessed the role of candidate regulatory variants associated with cell type-restricted, closely related hematological quantitative traits in biologically relevant hematopoietic cell types. We used formaldehyde-assisted isolation of regulatory elements followed by next-generation sequencing (FAIRE-seq) to map regions of open chromatin in three primary human blood cells of the myeloid lineage. In the precursors of platelets and erythrocytes, as well as in monocytes, we found that open chromatin signatures reflect the corresponding hematopoietic lineages of the studied cell types and associate with the cell type-specific gene expression patterns. Dependent on their signal strength, open chromatin regions showed correlation with promoter and enhancer histone marks, distance to the transcription start site, and ontology classes of nearby genes. Cell type-restricted regions of open chromatin were enriched in sequence variants associated with hematological indices. The majority (63.6%) of such candidate functional variants at platelet quantitative trait loci (QTLs) coincided with binding sites of five transcription factors key in regulating megakaryopoiesis. We experimentally tested 13 candidate regulatory variants at 10 platelet QTLs and found that 10 (76.9%) affected protein binding, suggesting that this is a frequent mechanism by which regulatory variants influence quantitative trait levels. Our findings demonstrate that combining large-scale GWA data with open chromatin profiles of relevant cell types can be a powerful means of dissecting the genetic architecture of closely related quantitative traits.
Funded by: British Heart Foundation: RG/08/014/24067, RG/09/012/28096, RG/09/12/28096; Medical Research Council: MR/L003120/1; Wellcome Trust: 097117, 098051
Genome research 2013;23;7;1130-41
KSR2 mutations are associated with obesity, insulin resistance, and impaired cellular fuel oxidation.
Kinase suppressor of Ras 2 (KSR2) is an intracellular scaffolding protein involved in multiple signaling pathways. Targeted deletion of Ksr2 leads to obesity in mice, suggesting a role in energy homeostasis. We explored the role of KSR2 in humans by sequencing 2,101 individuals with severe early-onset obesity and 1,536 controls. We identified multiple rare variants in KSR2 that disrupt signaling through the Raf-MEK-ERK pathway and impair cellular fatty acid oxidation and glucose oxidation in transfected cells; effects that can be ameliorated by the commonly prescribed antidiabetic drug, metformin. Mutation carriers exhibit hyperphagia in childhood, low heart rate, reduced basal metabolic rate and severe insulin resistance. These data establish KSR2 as an important regulator of energy intake, energy expenditure, and substrate utilization in humans. Modulation of KSR2-mediated effects may represent a novel therapeutic strategy for obesity and type 2 diabetes.
Funded by: Medical Research Council: G0502115, G0900554, MC_U106179471, MC_UU_12015/1; Wellcome Trust: 077016/Z/05/Z, 096106/Z/11/Z, 098497, 098497/Z/12/Z, WT091310
Continent-wide panmixia of an African fruit bat facilitates transmission of potentially zoonotic viruses.
Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK; Institute of Zoology, Zoological Society of London, Regent's Park, London NW1 4RY, UK.
The straw-coloured fruit bat, Eidolon helvum, is Africa's most widely distributed and commonly hunted fruit bat, often living in close proximity to human populations. This species has been identified as a reservoir of potentially zoonotic viruses, but uncertainties remain regarding viral transmission dynamics and mechanisms of persistence. Here we combine genetic and serological analyses of populations across Africa, to determine the extent of epidemiological connectivity among E. helvum populations. Multiple markers reveal panmixia across the continental range, at a greater geographical scale than previously recorded for any other mammal, whereas populations on remote islands were genetically distinct. Multiple serological assays reveal antibodies to henipaviruses and Lagos bat virus in all locations, including small isolated island populations, indicating that factors other than population size and connectivity may be responsible for viral persistence. Our findings have potentially important public health implications, and highlight a need to avoid disturbances that may precipitate viral spillover.
Nature communications 2013;4;2770
Genetic variants associated with warfarin dose in African-American individuals: a genome-wide association study.
Section of Genetic Medicine, Department of Medicine, University of Chicago, IL, USA.
BACKGROUND: VKORC1 and CYP2C9 are important contributors to warfarin dose variability, but explain less variability for individuals of African descent than for those of European or Asian descent. We aimed to identify additional variants contributing to warfarin dose requirements in African Americans. METHODS: We did a genome-wide association study of discovery and replication cohorts. Samples from African-American adults (aged ≥18 years) who were taking a stable maintenance dose of warfarin were obtained at International Warfarin Pharmacogenetics Consortium (IWPC) sites and the University of Alabama at Birmingham (Birmingham, AL, USA). Patients enrolled at IWPC sites but who were not used for discovery made up the independent replication cohort. All participants were genotyped. We did a stepwise conditional analysis, conditioning first for VKORC1 -1639G→A, followed by the composite genotype of CYP2C9*2 and CYP2C9*3. We prespecified a genome-wide significance threshold of p<5×10(-8) in the discovery cohort and p<0·0038 in the replication cohort. FINDINGS: The discovery cohort contained 533 participants and the replication cohort 432 participants. After the prespecified conditioning in the discovery cohort, we identified an association between a novel single nucleotide polymorphism in the CYP2C cluster on chromosome 10 (rs12777823) and warfarin dose requirement that reached genome-wide significance (p=1·51×10(-8)). This association was confirmed in the replication cohort (p=5·04×10(-5)); analysis of the two cohorts together produced a p value of 4·5×10(-12). Individuals heterozygous for the rs12777823 A allele need a dose reduction of 6·92 mg/week and those homozygous 9·34 mg/week. Regression analysis showed that the inclusion of rs12777823 significantly improves warfarin dose variability explained by the IWPC dosing algorithm (21% relative improvement). 
INTERPRETATION: A novel CYP2C single nucleotide polymorphism exerts a clinically relevant effect on warfarin dose in African Americans, independent of CYP2C9*2 and CYP2C9*3. Incorporation of this variant into pharmacogenetic dosing algorithms could improve warfarin dose prediction in this population. FUNDING: National Institutes of Health, American Heart Association, Howard Hughes Medical Institute, Wisconsin Network for Health Research, and the Wellcome Trust.
Automatic event detection within thrombus formation based on integer programming
Lecture Notes in Computer Science 2013;7766;215-24
Recombination-mediated genetic engineering of Plasmodium berghei DNA.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
DNA of Plasmodium berghei is difficult to manipulate in Escherichia coli by conventional restriction and ligation methods due to its high content of adenine and thymine (AT) nucleotides. This limits our ability to clone large genes and to generate complex vectors for modifying the parasite genome. We here describe a protocol for using lambda Red recombinase to modify inserts of a P. berghei genomic DNA library constructed in a linear, low-copy, phage-derived vector. The method uses primer extensions of 50 bp, which provide sufficient homology for an antibiotic resistance marker to recombine efficiently with a P. berghei genomic DNA insert in E. coli. In a subsequent in vitro Gateway reaction the bacterial marker is replaced with a cassette for selection in P. berghei. The insert is then released and used for transfection. The basic techniques we describe here can be adapted to generate highly efficient vectors for gene deletion, tagging, targeted mutagenesis, or genetic complementation with larger genomic regions.
Funded by: Medical Research Council: G0501670
Methods in molecular biology (Clifton, N.J.) 2013;923;127-38
Identification of Salmonella enterica Serovar Typhi Genotypes by Use of Rapid Multiplex Ligation-Dependent Probe Amplification.
The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.
Salmonella enterica serovar Typhi, the causative agent of typhoid fever, is highly clonal and genetically conserved, making isolate subtyping difficult. We describe a standardized multiplex ligation-dependent probe amplification (MLPA) genotyping scheme targeting 11 key phylogenetic markers of the S. Typhi genome. The MLPA method demonstrated 90% concordance with single nucleotide polymorphism (SNP) typing, the gold standard for S. Typhi genotyping, and had the ability to identify isolates of the H58 haplotype, which is associated with resistance to multiple antimicrobials. Additionally, the assay permitted the detection of fluoroquinolone resistance-associated mutations in the DNA gyrase-encoding gene gyrA and the topoisomerase gene parC with a sensitivity of 100%. The MLPA methodology is simple and reliable, providing phylogenetically and phenotypically relevant genotyping information. This MLPA scheme offers a more-sensitive and interpretable alternative to the nonphylogenetic subgrouping methodologies that are currently used in reference and research laboratories in areas where typhoid is endemic.
Journal of clinical microbiology 2013;51;9;2950-8
Transcriptional regulation of Culex pipiens mosquitoes by Wolbachia influences cytoplasmic incompatibility.
Peter Medawar Building for Pathogen Research and Nuffield Department of Medicine (NDM), University of Oxford, Oxford, United Kingdom; Department of Zoology, University of Oxford, Oxford, United Kingdom.
Cytoplasmic incompatibility (CI) induced by the endosymbiont Wolbachia pipientis causes complex patterns of crossing sterility between populations of the Culex pipiens group of mosquitoes. The molecular basis of the phenotype is yet to be defined. In order to investigate what host changes may underlie CI at the molecular level, we examined the transcription of a homolog of the Drosophila melanogaster gene grauzone that encodes a zinc finger protein and acts as a regulator of female meiosis, in which mutations can cause sterility. Upregulation was observed in Wolbachia-infected C. pipiens group individuals relative to Wolbachia-cured lines and the level of upregulation differed between lines that were reproductively incompatible. Knockdown analysis of this gene using RNAi showed an effect on hatch rates in a Wolbachia infected Culex molestus line. Furthermore, in later stages of development an effect on developmental progression in CI embryos occurs in bidirectionally incompatible crosses. The genome of a wPip Wolbachia strain variant from Culex molestus was sequenced and compared with the genome of a wPip variant with which it was incompatible. Three genes in inserted or deleted regions were newly identified in the C. molestus wPip genome, one of which is a transcriptional regulator labelled wtrM. When this gene was transfected into adult Culex mosquitoes, upregulation of the grauzone homolog was observed. These data suggest that Wolbachia-mediated regulation of host gene expression is a component of the mechanism of cytoplasmic incompatibility.
Funded by: Wellcome Trust: 076964, 079059, 095121
PLoS pathogens 2013;9;10;e1003647
Genome Wide Association Analysis of a Founder Population Identified TAF3 as a Gene for MCHC in Humans.
Division of Genetics and Cell Biology, San Raffaele Research Institute and Vita Salute University, Milano, Italy.
Red blood cell-related traits are highly heritable but their genetics are poorly defined. Only 5-10% of the total observed variance is explained by the genetic loci found to date, suggesting that additional loci should be sought using approaches other than large meta-analyses. A GWAS (Genome Wide Association Study) for red blood cell traits in a founder population cohort from Northern Italy identified a new locus for mean corpuscular hemoglobin concentration (MCHC) in the TAF3 gene. The association was replicated in two cohorts (rs1887582, P = 4.25E-09). TAF3 encodes a transcription cofactor that participates in the core promoter recognition complex, and is involved in zebrafish and mouse erythropoiesis. We show here that TAF3 is required for transcription of the SPTA1 gene, encoding alpha spectrin, one of the proteins that link the plasma membrane to the actin cytoskeleton. Mutations in SPTA1 are responsible for hereditary spherocytosis, a monogenic disorder of MCHC, as well as for the normal MCHC level. Based on our results, we propose that TAF3 is required for normal erythropoiesis in humans and that it might have a role in controlling the ratio between hemoglobin (Hb) and cell volume and in the dynamics of RBC maturation in healthy individuals. Finally, TAF3 represents a potential candidate or a modifier gene for disorders of the red cell membrane.
PloS one 2013;8;7;e69206
NDUFA4 Mutations Underlie Dysfunction of a Cytochrome c Oxidase Subunit Linked to Human Neurological Disease.
MRC Centre for Neuromuscular Diseases, UCL Institute of Neurology and National Hospital for Neurology and Neurosurgery, Queen Square, London WC1N 3BG, UK.
The molecular basis of cytochrome c oxidase (COX, complex IV) deficiency remains genetically undetermined in many cases. Homozygosity mapping and whole-exome sequencing were performed in a consanguineous pedigree with isolated COX deficiency linked to a Leigh syndrome neurological phenotype. Unexpectedly, affected individuals harbored homozygous splice donor site mutations in NDUFA4, a gene previously assigned to encode a mitochondrial respiratory chain complex I (NADH:ubiquinone oxidoreductase) subunit. Western blot analysis of denaturing gels and immunocytochemistry revealed undetectable steady-state NDUFA4 protein levels, indicating that the mutation causes a loss-of-function effect in the homozygous state. Analysis of one- and two-dimensional blue-native polyacrylamide gels confirmed an interaction between NDUFA4 and the COX enzyme complex in control muscle, whereas the COX enzyme complex without NDUFA4 was detectable with no abnormal subassemblies in patient muscle. These observations support recent work in cell lines suggesting that NDUFA4 is an additional COX subunit and demonstrate that NDUFA4 mutations cause human disease. Our findings support reassignment of the NDUFA4 protein to complex IV and suggest that patients with unexplained COX deficiency should be screened for NDUFA4 mutations.
Cell reports 2013
COX10 mutations resulting in complex multisystem mitochondrial disease that remains stable into adulthood.
Medical Research Council Centre for Neuromuscular Diseases, University College London Institute of Neurology and National Hospital for Neurology and Neurosurgery, London, England.
Importance: Isolated cytochrome-c oxidase (COX) deficiency is one of the most frequent respiratory chain defects seen in human mitochondrial disease. Typically, patients present with severe neonatal multisystem disease and have an early fatal outcome. We describe an adult patient with isolated COX deficiency associated with a relatively mild clinical phenotype comprising myopathy; demyelinating neuropathy; premature ovarian failure; short stature; hearing loss; pigmentary maculopathy; and renal tubular dysfunction.
Observations: Whole-exome sequencing detected 1 known pathogenic and 1 novel COX10 mutation: c.1007A>T; p.Asp336Val, previously associated with fatal infantile COX deficiency, and c.1015C>T; p.Arg339Trp. Muscle COX holoenzyme and subassemblies were undetectable on immunoblots of blue-native gels, whereas denaturing gels and immunocytochemistry showed reduced core subunit MTCO1. Heme absorption spectra revealed low heme aa3 compatible with heme A:farnesyltransferase deficiency due to COX10 dysfunction. Both mutations demonstrated respiratory deficiency in yeast, confirming pathogenicity. A COX10 protein model was used to predict the structural consequences of the novel Arg339Trp and all previously reported substitutions.
Conclusions and relevance: These findings establish that COX10 mutations cause adult mitochondrial disease. Nuclear modifiers, epigenetic phenomena, and/or environmental factors may influence the disease phenotype caused by reduced COX activity and contribute to the variable clinical severity related to COX10 dysfunction.
Funded by: Department of Health; Medical Research Council: G0601943, G0800674, MR/K000608/1, U117581331; NINDS NIH HHS: 1U54NS065712-01; Wellcome Trust: WT091310
JAMA neurology 2013;70;12;1556-61
High-fat feeding rapidly induces obesity and lipid derangements in C57BL/6N mice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.
C57BL/6N (B6N) is becoming the standard background for genetic manipulation of the mouse genome. The B6N, whose genome is very closely related to the reference C57BL/6J genome, is versatile in a wide range of phenotyping and experimental settings and large repositories of B6N ES cells have been developed. Here, we present a series of studies showing the baseline characteristics of B6N fed a high-fat diet (HFD) for up to 12 weeks. We show that HFD-fed B6N mice exhibit increased weight gain and fat mass, as well as hypercholesterolemia, compared to control diet-fed mice. In addition, HFD-fed B6N mice display a rapid onset of lipid accumulation in the liver with both macro- and microvacuolation, which becomes more severe with increasing duration of HFD. Our results suggest that the B6N mouse strain is a versatile background for studying diet-induced metabolic syndrome and may also represent a model for early nonalcoholic fatty liver disease.
Funded by: Wellcome Trust: 77157/Z/05/Z
Mammalian genome : official journal of the International Mammalian Genome Society 2013;24;5-6;240-51
Genome-wide mutational signatures of aristolochic acid and its application as a screening tool.
NCCS-VARI Translational Research Laboratory, Division of Medical Sciences, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610, Singapore.
Aristolochic acid (AA), a natural product of Aristolochia plants found in herbal remedies and health supplements, is a group 1 carcinogen that can cause nephrotoxicity and upper urinary tract urothelial cell carcinoma (UTUC). Whole-genome and exome analysis of nine AA-associated UTUCs revealed a strikingly high somatic mutation rate (150 mutations/Mb), exceeding smoking-associated lung cancer (8 mutations/Mb) and ultraviolet radiation-associated melanoma (111 mutations/Mb). The AA-UTUC mutational signature was characterized by A:T to T:A transversions at the sequence motif A[C|T]AGG, located primarily on nontranscribed strands. AA-induced mutations were also significantly enriched at splice sites, suggesting a role for splice-site mutations in UTUC pathogenesis. RNA sequencing of AA-UTUC confirmed a general up-regulation of nonsense-mediated decay machinery components and aberrant splicing events associated with splice-site mutations. We observed a high frequency of somatic mutations in chromatin modifiers, particularly KDM6A, in AA-UTUC, demonstrated the sufficiency of AA to induce renal dysplasia in mice, and reproduced the AA mutational signature in experimentally treated human renal tubular cells. Finally, exploring other malignancies that were not known to be associated with AA, we screened 93 hepatocellular carcinoma genomes/exomes and identified AA-like mutational signatures in 11. Our study highlights an unusual genome-wide AA mutational signature and the potential use of mutation signatures as "molecular fingerprints" for interrogating high-throughput cancer genome data to infer previous carcinogen exposures.
Science translational medicine 2013;5;197;197ra101
Single-cell mutational profiling and clonal phylogeny in cancer.
The Institute of Cancer Research, London, SM2 5NG, United Kingdom.
The development of cancer is a dynamic evolutionary process in which intraclonal, genetic diversity provides a substrate for clonal selection and a source of therapeutic escape. The complexity and topography of intraclonal genetic architectures have major implications for biopsy-based prognosis and for targeted therapy. High-depth, next-generation sequencing (NGS) efficiently captures the mutational load of individual tumors or biopsies. But, being a snapshot portrait of total DNA, it disguises the fundamental features of subclonal variegation of genetic lesions and of clonal phylogeny. Single-cell genetic profiling provides a potential resolution to this problem, but methods developed to date all have limitations. We present a novel solution to this challenge using leukemic cells with known mutational spectra as a tractable model. DNA from flow-sorted single cells is screened using multiplex targeted Q-PCR within a microfluidic platform allowing unbiased single-cell selection, high-throughput, and comprehensive analysis for all main varieties of genetic abnormalities: chimeric gene fusions, copy number alterations, and single-nucleotide variants. We show, in this proof-of-principle study, that the method has a low error rate and can provide detailed subclonal genetic architectures and phylogenies.
Genome research 2013
Identification of Small Exonic CNV from Whole-Exome Sequence Data and Application to Autism Spectrum Disorder.
Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1-30 kb, making use of the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1-30 kb CNV, 1-30 kb deletions, and 1-10 kb deletions in autism spectrum disorder (ASD). CNV in the 1-30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1-30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes.
American journal of human genetics 2013;93;4;607-19
Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
Schistosome infection begins with the penetration of cercariae through healthy unbroken host skin. This process leads to the transformation of the free-living larvae into obligate parasites called schistosomula. This irreversible transformation, which occurs in as little as two hours, involves casting the cercaria tail and complete remodelling of the surface membrane. At this stage, parasites are vulnerable to host immune attack and oxidative stress. Consequently, the mechanisms by which the parasite recognises and swiftly adapts to the human host are still the subject of many studies, especially in the context of development of intervention strategies against schistosomiasis infection. Because obtaining enough material from in vivo infections is not always feasible for such studies, the transformation process is often mimicked in the laboratory by application of shear pressure to a cercarial sample resulting in mechanically transformed (MT) schistosomula. These parasites share remarkable morphological and biochemical similarity with their naturally transformed counterparts and have been considered a good proxy for parasites undergoing natural infection. Relying on this equivalency, MT schistosomula have been used almost exclusively in high-throughput studies of gene expression, identification of drug targets and identification of effective drugs against schistosomes. However, the transcriptional equivalency between skin-transformed (ST) and MT schistosomula has never been proven. To compare these two types of schistosomula preparations and to explore differences in gene expression triggered by the presence of a skin barrier, we performed RNA-seq transcriptome profiling of ST and MT schistosomula at 24 hours post transformation. We report that these two very distinct schistosomula preparations differ only in the expression of 38 genes (out of ∼11,000), providing convincing evidence to resolve the long-standing skin vs. mechanical controversy.
Funded by: Wellcome Trust: WT 083931/Z/07/Z, WT 098051
PLoS neglected tropical diseases 2013;7;3;e2091
Targeting MYCN in neuroblastoma by BET bromodomain inhibition.
Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.
Bromodomain inhibition comprises a promising therapeutic strategy in cancer, particularly for hematologic malignancies. To date, however, genomic biomarkers to direct clinical translation have been lacking. We conducted a cell-based screen of genetically defined cancer cell lines using a prototypical inhibitor of BET bromodomains. Integration of genetic features with chemosensitivity data revealed a robust correlation between MYCN amplification and sensitivity to bromodomain inhibition. We characterized the mechanistic and translational significance of this finding in neuroblastoma, a childhood cancer with frequent amplification of MYCN. Genome-wide expression analysis showed downregulation of the MYCN transcriptional program accompanied by suppression of MYCN transcription. Functionally, bromodomain-mediated inhibition of MYCN impaired growth and induced apoptosis in neuroblastoma. BRD4 knockdown phenocopied these effects, establishing BET bromodomains as transcriptional regulators of MYCN. BET inhibition conferred a significant survival advantage in 3 in vivo neuroblastoma models, providing a compelling rationale for developing BET bromodomain inhibitors in patients with neuroblastoma.
Funded by: NCI NIH HHS: P01 CA081403, P01CA081403, R01 CA102321, R01CA102321, T32 CA128583, T32CA151022; NINDS NIH HHS: K08 NS079485, K08NS079485; Wellcome Trust: 086357, 093868
Cancer discovery 2013;3;3;308-23
SpoIVA and SipL are Clostridium difficile spore morphogenetic proteins.
Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, USA.
Clostridium difficile is a major nosocomial pathogen whose infections are difficult to treat because of their frequent recurrence. The spores of C. difficile are responsible for these clinical features, as they resist common disinfectants and antibiotic treatment. Although spores are the major transmissive form of C. difficile, little is known about their composition or morphogenesis. Spore morphogenesis has been well characterized for Bacillus sp., but Bacillus sp. spore coat proteins are poorly conserved in Clostridium sp. Of the known spore morphogenetic proteins in Bacillus subtilis, SpoIVA is one of the most highly conserved in the Bacilli and the Clostridia. Using genetic analyses, we demonstrate that SpoIVA is required for proper spore morphogenesis in C. difficile. In particular, a spoIVA mutant exhibits defects in spore coat localization but not cortex formation. Our study also identifies SipL, a previously uncharacterized protein found in proteomic studies of C. difficile spores, as another critical spore morphogenetic protein, since a sipL mutant phenocopies a spoIVA mutant. Biochemical analyses and mutational analyses indicate that SpoIVA and SipL directly interact. This interaction depends on the Walker A ATP binding motif of SpoIVA and the LysM domain of SipL. Collectively, these results provide the first insights into spore morphogenesis in C. difficile.
Funded by: NCRR NIH HHS: P20RR021905; NIGMS NIH HHS: P20 GM103496, R00 GM092934, R00GM092934
Journal of bacteriology 2013;195;6;1214-25
A genetic progression model of Braf(V600E)-induced intestinal tumorigenesis reveals targets for therapeutic intervention.
Department of Medicine II, Klinikum Rechts der Isar, Technische Universität München, 81675, München, Germany.
We show that BRAF(V600E) initiates an alternative pathway to colorectal cancer (CRC), which progresses through a hyperplasia/adenoma/carcinoma sequence. This pathway underlies significant subsets of CRCs with distinctive pathomorphologic/genetic/epidemiologic/clinical characteristics. Genetic and functional analyses in mice revealed a series of stage-specific molecular alterations driving different phases of tumor evolution and uncovered mechanisms underlying this stage specificity. We further demonstrate dose-dependent effects of oncogenic signaling, with physiologic Braf(V600E) expression being sufficient for hyperplasia induction, but later stage intensified Mapk-signaling driving both tumor progression and activation of intrinsic tumor suppression. Such phenomena explain, for example, the inability of p53 to restrain tumor initiation as well as its importance in invasiveness control, and the late stage specificity of its somatic mutation. Finally, systematic drug screening revealed sensitivity of this CRC subtype to targeted therapeutics, including Mek or combinatorial PI3K/Braf inhibition.
Funded by: Wellcome Trust: 095663
Cancer cell 2013;24;1;15-29
Dnmt2-dependent methylomes lack defined DNA methylation patterns.
Division of Epigenetics, DKFZ-ZMBH Alliance, German Cancer Research Center, 69120 Heidelberg, Germany.
Several organisms have retained methyltransferase 2 (Dnmt2) as their only candidate DNA methyltransferase gene. However, information about Dnmt2-dependent methylation patterns has been limited to a few isolated loci and the results have been the subject of controversy. In addition, recent studies have shown that Dnmt2 functions as a tRNA methyltransferase, which raised the possibility that Dnmt2-only genomes might be unmethylated. We have now used whole-genome bisulfite sequencing to analyze the methylomes of Dnmt2-only organisms at single-base resolution. Our results show that the genomes of Schistosoma mansoni and Drosophila melanogaster lack detectable DNA methylation patterns. Residual unconverted cytosine residues shared many attributes with bisulfite deamination artifacts and were observed at comparable levels in Dnmt2-deficient flies. Furthermore, genetically modified Dnmt2-only mouse embryonic stem cells lost the DNA methylation patterns found in wild-type cells. Our results thus uncover fundamental differences among animal methylomes and suggest that DNA methylation is dispensable for a considerable number of eukaryotic organisms.
Proceedings of the National Academy of Sciences of the United States of America 2013;110;21;8627-31
Rare variants in single-minded 1 (SIM1) are associated with severe obesity.
University of Cambridge Metabolic Research Laboratories and NIHR Cambridge Biomedical Research Centre, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, United Kingdom.
Single-minded 1 (SIM1) is a basic helix-loop-helix transcription factor involved in the development and function of the paraventricular nucleus of the hypothalamus. Obesity has been reported in Sim1 haploinsufficient mice and in a patient with a balanced translocation disrupting SIM1. We sequenced the coding region of SIM1 in 2,100 patients with severe, early onset obesity and in 1,680 controls. Thirteen different heterozygous variants in SIM1 were identified in 28 unrelated severely obese patients. Nine of the 13 variants significantly reduced the ability of SIM1 to activate a SIM1-responsive reporter gene when studied in stably transfected cells coexpressing the heterodimeric partners of SIM1 (ARNT or ARNT2). SIM1 variants with reduced activity cosegregated with obesity in extended family studies with variable penetrance. We studied the phenotype of patients carrying variants that exhibited reduced activity in vitro. Variant carriers exhibited increased ad libitum food intake at a test meal, normal basal metabolic rate, and evidence of autonomic dysfunction. Eleven of the 13 probands had evidence of a neurobehavioral phenotype. The phenotypic similarities between patients with SIM1 deficiency and melanocortin 4 receptor (MC4R) deficiency suggest that some of the effects of SIM1 deficiency on energy homeostasis are mediated by altered melanocortin signaling.
Funded by: Medical Research Council: G0900554, G9824984, MC_U106179471, MC_U106179473, MC_U106188470; NHLBI NIH HHS: HL-102923, HL-102924, HL-102925, HL-102926, HL-103010; Wellcome Trust: 077016/Z/05/Z, 082390/Z/07/Z, 098497
The Journal of clinical investigation 2013;123;7;3042-50
DeNovoGear: de novo indel and point mutation discovery and phasing.
Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA.
We present DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis and fragment information to identify the parental origin of germ-line mutations. We used DeNovoGear on human whole-genome sequencing data to produce a set of predicted de novo insertion and/or deletion (indel) mutations with a 95% validation rate.
Nature methods 2013
Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Given the anthropometric differences between men and women and previous evidence of sex-difference in genetic effects, we conducted a genome-wide search for sexually dimorphic associations with height, weight, body mass index, waist circumference, hip circumference, and waist-to-hip-ratio (133,723 individuals) and took forward 348 SNPs into follow-up (additional 137,052 individuals) in a total of 94 studies. Seven loci displayed significant sex-difference (FDR<5%), including four previously established (near GRB14/COBLL1, LYPLAL1/SLC30A10, VEGFA, ADAMTS9) and three novel anthropometric trait loci (near MAP3K1, HSD17B4, PPARG), all of which were genome-wide significant in women (P<5×10(-8)), but not in men. Sex-differences were apparent only for waist phenotypes, not for height, weight, BMI, or hip circumference. Moreover, we found no evidence for genetic effects with opposite directions in men versus women. The PPARG locus is of specific interest due to its role in diabetes genetics and therapy. Our results demonstrate the value of sex-specific GWAS to unravel the sexually dimorphic genetic underpinning of complex traits.
Funded by: AHRQ HHS: R01HS006516; British Heart Foundation: PG/07/133/24260; CSR NIH HHS: RG2008/014, RG2008/08; Cancer Research UK: 14136, C490/A10124, C490/A8339; Chief Scientist Office: CZB/4/672, CZB/4/710; Medical Research Council: 85374, G0000649, G0000934, G0500539, G0600237, G0600705, G0601261, G1000143, G1002084, G9521010, G9521010D, MC_U106179471, MC_U106179472, MC_U106188470, MC_U123092720, MR/K006584/1; NCATS NIH HHS: UL1 TR000124; NCI NIH HHS: P01CA87969, R01CA047988, R01CA049449, R01CA050385, R01CA065725, R01CA067262, U01CA098233; NCRR NIH HHS: M01RR16500, U54RR020278, UL1RR025005, UL1RR033176; NHGRI NIH HHS: N01HG65403, R01HG002651, U01HG004402, Z01HG000024; NHLBI NIH HHS: 5R01HL087679-02, N01HC25195, N01HC35129, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85084, N01HC85085, N01HC85086, N01HC85239, N02HL64278, R01HL036310, R01HL043851, R01HL059367, R01HL071981, R01HL075366, R01HL086694, R01HL087641, R01HL087647, R01HL087652, R01HL087676, R01HL087679, R01HL087700, R01HL088215, R01HL105756, U01HL069757, U01HL072515, U01HL080295, U01HL084729, U01HL084756; NIA NIH HHS: N01AG12100, N01AG12109, R01AG004563, R01AG008724, R01AG008861, R01AG010175, R01AG013196, R01AG015928, R01AG020098, R01AG023629, R01AG027058, R01AG028555; NIAAA NIH HHS: R01AA007535, R01AA013320, R01AA013321, R01AA013326, R01AA014041; NICHD NIH HHS: R01HD042157; NIDA NIH HHS: R01DA12854; NIDDK NIH HHS: K23DK080145, P01 DK088761, P30 DK063491, P30DK063491, P30DK072488, R01 DK072193, R01 DK075787, R01 DK091718, R01DK062370, R01DK072193, R01DK073490, R01DK075681, R01DK075787, R01DK089256, U01DK062418; NIGMS NIH HHS: T32 GM007092; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706, R01MH059565, R01MH059566, R01MH059571, R01MH059586, R01MH059587, R01MH059588, R01MH060870, R01MH060879, R01MH061675, R01MH067257, R01MH079469, R01MH079470, R01MH081800, RL1MH083268, U24MH068457; NLM NIH HHS: R01LM010098; OAPP OPHS HHS: PG/02/128; PHS HHS: 268200625226C, 268201100005C, 268201100006C, 268201100007C, 268201100008C, 268201100009C, 268201100010C, 268201100011C, 268201100012C; Wellcome Trust: 068545/Z/02, 069224, 072960, 075491, 076113/B/04/Z, 076113/K/04/Z, 079557, 079895, 081682, 083270, 084183/Z/07/Z, 085301, 086596/Z/08/Z, 089061/Z/09/Z, 089062/Z/09/Z, 090532, 090532/Z/09/Z, 098051
PLoS genetics 2013;9;6;e1003500
Cancer gene discovery: exploiting insertional mutagenesis.
San Raffaele-Telethon Institute for Gene Therapy, via Olgettina 58, 20132, Milan, Italy.
Insertional mutagenesis has been used as a functional forward genetics screen for the identification of novel genes involved in the pathogenesis of human cancers. Different insertional mutagens have been successfully used to reveal new cancer genes. For example, retroviruses are integrating viruses with the capacity to induce the deregulation of genes in the neighborhood of the insertion site. Retroviruses have been used for more than 30 years to identify cancer genes in the hematopoietic system and mammary gland. Similarly, another tool that has revolutionized cancer gene discovery is the cut-and-paste transposons. These DNA elements have been engineered to contain strong promoters and stop cassettes that may function to perturb gene expression upon integration proximal to genes. In addition, complex mouse models characterized by tissue-restricted activity of transposons have been developed to identify oncogenes and tumor suppressor genes that control the development of a wide range of solid tumor types, extending beyond those tissues accessible using retrovirus-based approaches. Most recently, lentiviral vectors have appeared on the scene for use in cancer gene screens. Lentiviral vectors are replication-defective integrating vectors that have the advantage of being able to infect nondividing cells, in a wide range of cell types and tissues. In this review, we describe the various insertional mutagens focusing on their advantages/limitations, and we discuss the new and promising tools that will improve the insertional mutagenesis screens of the future.
Funded by: Cancer Research UK: 13031; Telethon: TGT11D01
Molecular cancer research : MCR 2013;11;10;1141-58
Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.
We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availability: Cake is open-source and is available from http://cakesomatic.sourceforge.net/
Funded by: Cancer Research UK: 13031; Wellcome Trust
Bioinformatics (Oxford, England) 2013;29;17;2208-10
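The Cake abstract above describes combining the output of four variant callers. One simple way to combine call sets is majority voting, sketched below. This is an illustration only: the abstract does not state Cake's actual integration rule, and the function name `consensus_calls`, the `min_callers` threshold, the caller names and the toy variants are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(call_sets: Dict[str, Set[Variant]], min_callers: int = 2) -> List[Variant]:
    """Return, in sorted order, variants reported by at least `min_callers` callers."""
    # Each caller contributes at most one vote per variant because its calls are a set.
    votes = Counter(variant for calls in call_sets.values() for variant in calls)
    return sorted(variant for variant, n in votes.items() if n >= min_callers)


# Toy, made-up call sets from four hypothetical callers (not real data).
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "caller_b": {("chr1", 100, "A", "T")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "caller_d": {("chr2", 200, "G", "C")},
}

for variant in consensus_calls(calls, min_callers=2):
    print(variant)
```

Raising `min_callers` trades sensitivity for specificity: a threshold of 1 keeps every call from any caller, while a threshold equal to the number of callers keeps only unanimous calls.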
Combined sequence-based and genetic mapping analysis of complex traits in outbred rats.
Wellcome Trust Centre for Human Genetics, Oxford, UK.
Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating new genes in models of anxiety, heart disease and multiple sclerosis. The relationship between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci, a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show that the extent and spatial pattern of variation in inbred rats differ substantially from those of inbred mice and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species.
Funded by: British Heart Foundation: BHFRG/07/005/23633, RG/07/005/23633; Cancer Research UK: 13031, A10976; Medical Research Council: G0900084, G9900061, MC_U120061454; NHGRI NIH HHS: U54 HG003273; NIAMS NIH HHS: R01 AR047822; Wellcome Trust: 083573, 083573/Z/07/Z, 089269, 089269/Z/09/Z, 090532, 090532/Z/09/Z
Nature genetics 2013;45;7;767-75
Evolution of the thermopsin peptidase family (A5).
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom ; European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom.
Thermopsin is a peptidase from Sulfolobus acidocaldarius that is active at low pH and high temperature. From reversible inhibition with pepstatin, thermopsin is thought to be an aspartic peptidase. It is a member of the only family of peptidases to be restricted entirely to the archaea, namely peptidase family A5. Evolution within this family has been mapped, using a taxonomic tree based on the known classification of archaea. Homologues are found only in archaeans that are both hyperthermophiles and acidophiles, and this implies lateral transfer of genes between archaea, because species with homologues are not necessarily closely related. Despite the remarkable stability and activity in extreme conditions, no tertiary structure has been solved for any member of the family, and the catalytic mechanism is unknown. Putative catalytic residues have been predicted here by examination of aligned sequences.
PloS one 2013;8;11;e78998
Handbook of Proteolytic Enzymes 2013;1;253;1122-5
Introduction: Metallopeptidases and Their Clans
Handbook of Proteolytic Enzymes 2013;1;77;325-70
Introduction: The Clans and Families of Cysteine Peptidases
Handbook of Proteolytic Enzymes 2013;2;404;1743-73
Introduction: Clan PB Containing N-terminal Nucleophile Peptidases
Handbook of Proteolytic Enzymes 2013;3;808;3648-53
Introduction: Aspartic and Glutamic Peptidases and Their Clans
Handbook of Proteolytic Enzymes 2013;1;1;3-19
Handbook of Proteolytic Enzymes 2013;1;246;1079-81
Bacteriophage T4 Prohead Endopeptidase
Handbook of Proteolytic Enzymes 2013;3;789;3560-2
Introduction: Peptidases of Unknown Catalytic Type
Handbook of Proteolytic Enzymes 2013;3;827;3747-9
Introduction: Serine Peptidases and Their Clans
Handbook of Proteolytic Enzymes 2013;3;559;2491-523
Handbook of Proteolytic Enzymes
Handbook of Proteolytic Enzymes 2013
Genes involved in host-parasite interactions can be revealed by their correlated expression.
Parasite genomics group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
Molecular interactions between a parasite and its host are key to the ability of the parasite to enter the host and persist. Our understanding of the genes and proteins involved in these interactions is limited. To better understand these processes it would be advantageous to have a range of methods to predict pairs of genes involved in such interactions. Correlated gene expression profiles can be used to identify molecular interactions within a species. Here we have extended the concept to different species, showing that genes with correlated expression are more likely to encode proteins that directly or indirectly participate in host-parasite interaction. We go on to examine our predictions of molecular interactions between the malaria parasite and both its mammalian host and insect vector. Our approach could be applied to study any interaction between species, for example, between a host and its parasites or pathogens, but also symbiotic and commensal pairings.
Funded by: Wellcome Trust: 098051
Nucleic acids research 2013;41;3;1508-18
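The idea in the abstract above, pairing host and parasite genes by correlated expression, can be sketched with a plain Pearson correlation over matched samples. This is a minimal illustration under assumptions: the abstract does not specify the correlation measure or any cut-off, and the gene names, expression values and the 0.9 threshold below are invented for the example.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def correlated_pairs(
    host: Dict[str, Sequence[float]],
    parasite: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """List (host gene, parasite gene, r) for every pair with |r| >= threshold."""
    pairs = []
    for host_gene, host_profile in host.items():
        for parasite_gene, parasite_profile in parasite.items():
            r = pearson(host_profile, parasite_profile)
            if abs(r) >= threshold:
                pairs.append((host_gene, parasite_gene, r))
    # Strongest correlations first.
    return sorted(pairs, key=lambda item: -abs(item[2]))


# Made-up expression values measured in the same four samples for both species.
host_expr = {"host_gene_1": [1, 2, 3, 4], "host_gene_2": [4, 1, 3, 2]}
parasite_expr = {"parasite_gene_1": [2, 4, 6, 8], "parasite_gene_2": [1, 3, 2, 4]}

for host_gene, parasite_gene, r in correlated_pairs(host_expr, parasite_expr):
    print(host_gene, parasite_gene, round(r, 3))
```

The samples must be matched, so that position i in a host profile and position i in a parasite profile come from the same infected sample or time point.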
Secretory meningiomas are defined by combined KLF4 K409Q and TRAF7 mutations.
Department of Neuropathology, Institute of Pathology, Ruprecht-Karls-University Heidelberg, Im Neuenheimer Feld 224, 69120, Heidelberg, Germany.
Meningiomas are among the most frequent intracranial tumors. The secretory variant of meningioma is characterized by glandular differentiation, formation of intracellular lumina and pseudopsammoma bodies, expression of a distinct pattern of cytokeratins and clinically by pronounced perifocal brain edema. Here we describe whole-exome sequencing analysis of DNA from 16 secretory meningiomas and corresponding constitutional tissues. All secretory meningiomas invariably harbored a mutation in both KLF4 and TRAF7. Validation in an independent cohort of 14 secretory meningiomas by Sanger sequencing or derived cleaved amplified polymorphic sequence (dCAPS) assay detected the same pattern, with KLF4 mutations observed in a total of 30/30 and TRAF7 mutations in 29/30 of these tumors. All KLF4 mutations were identical, affected codon 409 and resulted in a lysine to glutamine exchange (K409Q). KLF4 mutations were not found in 89 non-secretory meningiomas, 267 other intracranial tumors including gliomas, glioneuronal tumors, pituitary adenomas and metastases, 59 peripheral nerve sheath tumors and 52 pancreatic tumors. TRAF7 mutations were restricted to the WD40 domains. While KLF4 mutations were exclusively seen in secretory meningiomas, TRAF7 mutations were also observed in 7/89 (8 %) of non-secretory meningiomas. KLF4 and TRAF7 mutations were mutually exclusive with NF2 mutations. In conclusion, our findings suggest an essential contribution of combined KLF4 K409Q and TRAF7 mutations in the genesis of secretory meningioma and demonstrate a role for TRAF7 alterations in other non-NF2 meningiomas.
Acta neuropathologica 2013;125;3;351-8
Rapid bacterial whole-genome sequencing to enhance diagnostic and public health microbiology.
Wellcome Trust Sanger Institute, Hinxton, England.
Importance: The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These could be used to more effectively contain the spread of multidrug-resistant pathogens.
Objective: To compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and typing of other clinically important pathogens.
Design, setting, and participants: A laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information.
Main outcomes and measures: Specimens were taken and processed as part of routine clinical care, and cultured isolates stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data.
Results: We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of carbapenemases or other resistance mechanisms. Whole-genome sequencing was used to recapitulate reference laboratory typing of clinical isolates of Neisseria meningitidis and to provide extended phylogenetic analyses of these.
Conclusions and relevance: The speed, accuracy, and depth of information provided by WGS platforms to confirm or refute outbreaks in hospitals and the community, and to accurately define transmission of multidrug-resistant and other organisms, represents an important advance.
Funded by: Medical Research Council: G1000803; Wellcome Trust: 098051
JAMA internal medicine 2013;173;15;1397-404
Using Varying Negative Examples to Improve Computational Predictions of Transcription Factor Binding Sites
Communications in Computer and Information Science 2013;311;234-43
Two new cytotypes reinforce that Micronycteris hirsuta Peters, 1869 does not represent a monotypic taxon.
Laboratório de Citogenética, Instituto de Ciências Biológicas, Universidade Federal do Pará, Campus do Guamá, Av, Perimetral, sn, Guamá, Belém, Pará, 66075-900, Brazil.
Background: The genus Micronycteris is a diverse group of phyllostomid bats currently comprising 11 species, with diploid number (2n) ranging from 26 to 40 chromosomes. The karyotypic relationships within Micronycteris and between Micronycteris and other phyllostomids remain poorly understood. The karyotype of Micronycteris hirsuta is of particular interest: three different diploid numbers were reported for this species in South and Central America, with 2n = 26, 28 and 30 chromosomes. Although current evidence suggests some geographic differentiation among populations of M. hirsuta based on chromosomal, morphological, and nuclear and mitochondrial DNA markers, the recognition of new species or subspecies has been avoided due to the need for additional data, mainly chromosomal data.
Results: We describe two new cytotypes for Micronycteris hirsuta (MHI) (2n = 26 and 25, NF = 32), whose differences in diploid number are interpreted as the products of Robertsonian rearrangements. C-banding revealed a small amount of constitutive heterochromatin at the centromere and the NOR was located in the interstitial portion of the short arm of a second pair, confirmed by FISH. Telomeric probes hybridized to the centromeric regions and weakly to telomeric regions of most chromosomes. The G-banding analysis and chromosome painting with whole chromosome probes from Carollia brevicauda (CBR) and Phyllostomus hastatus (PHA) enabled the establishment of genome-wide homologies between MHI, CBR and PHA.
Conclusions: The karyotypes of Brazilian specimens of Micronycteris hirsuta described here are new to Micronycteris and reinforce that M. hirsuta does not represent a monotypic taxon. Our results corroborate the hypothesis of karyotypic megaevolution within Micronycteris, and strong evidence for this is that the entire chromosome complement of M. hirsuta was shown to be derivative with respect to species compared in this study.
Funded by: Wellcome Trust
BMC genetics 2013;14;119
GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment.
Department of Applied Economics, Erasmus School of Economics, Erasmus University Rotterdam, 3000 DR Rotterdam, The Netherlands.
A genome-wide association study of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent SNPs are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effect sizes are small (R(2) ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈ 2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics.
Science (New York, N.Y.) 2013
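The "linear polygenic score" mentioned in the abstract above is, in its general form, a weighted sum of allele counts across SNPs. A minimal sketch follows. The SNP identifiers, weights and genotype are placeholders invented for illustration and are not values from the study, and the function name `polygenic_score` is hypothetical.

```python
from typing import Dict


def polygenic_score(allele_counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele counts (0, 1 or 2 per SNP)."""
    missing = set(weights) - set(allele_counts)
    if missing:
        raise KeyError(f"no genotype for SNPs: {sorted(missing)}")
    for snp in weights:
        if allele_counts[snp] not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2")
    return sum(weights[snp] * allele_counts[snp] for snp in weights)


# Placeholder per-allele effect estimates (made up, not from the study).
weights = {"snp_1": 0.10, "snp_2": -0.05, "snp_3": 0.02}
# Placeholder genotype for one individual: number of effect alleles at each SNP.
genotype = {"snp_1": 2, "snp_2": 1, "snp_3": 0}

print(polygenic_score(genotype, weights))
```

In practice the weights would come from the discovery sample and the score would be evaluated in an independent sample, which is the kind of split between discovery and replication that the abstract describes.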
The first structure in a family of peptidase inhibitors reveals an unusual Ig-like fold.
Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK.
We report the crystal structure solution of the Intracellular Protease Inhibitor (IPI) protein from Bacillus subtilis, which has been reported to be an inhibitor of the intracellular subtilisin Isp1 from the same organism. The structure of IPI is a variant of the all-beta, immunoglobulin (Ig) fold. It is possible that IPI is important for protein-protein interactions, of which inhibition of Isp1 is one. The intracellular nature of ISP is questioned, because an alternative ATG codon in the ipi gene would produce a protein with an N-terminal extension containing a signal peptide. It is possible that alternative initiation exists, producing either an intracellular inhibitor or a secreted form that may be associated with the cell surface. Homologues of the IPI protein from other species are multi-domain proteins, containing signal peptides and domains also associated with the bacterial cell-surface. The cysteine peptidase inhibitors chagasin and amoebiasin also have Ig-like folds, but their topology differs significantly from that of IPI, and they share no recent common ancestor. A model of IPI docked to Isp1 shows similarities to other subtilisin:inhibitor complexes, particularly where the inhibitor interacts with the peptidase active site.
Genome-wide association analysis identifies 13 new risk loci for schizophrenia.
Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
Schizophrenia is an idiopathic mental disorder with a heritable component and a substantial public health impact. We conducted a multi-stage genome-wide association study (GWAS) for schizophrenia beginning with a Swedish national sample (5,001 cases and 6,243 controls) followed by meta-analysis with previous schizophrenia GWAS (8,832 cases and 12,067 controls) and finally by replication of SNPs in 168 genomic regions in independent samples (7,413 cases, 19,762 controls and 581 parent-offspring trios). We identified 22 loci associated at genome-wide significance; 13 of these are new, and 1 was previously implicated in bipolar disorder. Examination of candidate genes at these loci suggests the involvement of neuronal calcium signaling. We estimate that 8,300 independent, mostly common SNPs (95% credible interval of 6,300-10,200 SNPs) contribute to risk for schizophrenia and that these collectively account for at least 32% of the variance in liability. Common genetic variation has an important role in the etiology of schizophrenia, and larger studies will allow more detailed understanding of this disorder.
Funded by: Medical Research Council: G0600429, G0601635, G0701420, G0800509, G1000718, G1100583; NIMH NIH HHS: K01 MH094406, R01 MH077139, R01 MH083094, R01 MH095034, U01 MH094421; Wellcome Trust: 085475/B/08/Z, 085475/Z/08/Z, 090532, 095552
Nature genetics 2013;45;10;1150-9
GATA1-mutant clones are frequent and often unsuspected in babies with Down syndrome: identification of a population at risk of leukemia.
Centre for Haematology, Hammersmith Campus, Imperial College London, London, United Kingdom.
Transient abnormal myelopoiesis (TAM), a preleukemic disorder unique to neonates with Down syndrome (DS), may transform to childhood acute myeloid leukemia (ML-DS). Acquired GATA1 mutations are present in both TAM and ML-DS. Current definitions of TAM specify neither the percentage of blasts nor the role of GATA1 mutation analysis. To define TAM, we prospectively analyzed clinical findings, blood counts and smears, and GATA1 mutation status in 200 DS neonates. All DS neonates had multiple blood count and smear abnormalities. Surprisingly, 195 of 200 (97.5%) had circulating blasts. GATA1 mutations were detected by Sanger sequencing/denaturing high performance liquid chromatography (Ss/DHPLC) in 17 of 200 (8.5%), all with blasts >10%. Furthermore, low-abundance GATA1 mutant clones were detected by targeted next-generation resequencing (NGS) in 18 of 88 (20.4%; sensitivity ∼0.3%) DS neonates without Ss/DHPLC-detectable GATA1 mutations. No clinical or hematologic features distinguished these 18 neonates. We suggest the term "silent TAM" for neonates with DS with GATA1 mutations detectable only by NGS. To identify all babies at risk of ML-DS, we suggest GATA1 mutation and blood count and smear analyses should be performed in DS neonates. Ss/DHPLC can be used for initial screening, but where GATA1 mutations are undetectable by Ss/DHPLC, NGS-based methods can identify neonates with small GATA1 mutant clones.
Funded by: Medical Research Council; Wellcome Trust: 091182
μ-Opioid receptor gene (OPRM1) polymorphism A118G: lack of association in Finnish populations with alcohol dependence or alcohol consumption.
Ministry of Social Affairs and Health, Department of Occupational Safety and Health, PO Box 33, FI-00023 Government, Finland.
Aims: The molecular epidemiological studies on the association of the opioid receptor µ-1 (OPRM1) polymorphism A118G (Asn40Asp, rs1799971) and alcohol use disorders have given conflicting results. The aim of this study was to test the possible association of A118G polymorphism and alcohol use disorders and alcohol consumption in three large cohort-based study samples.
Methods: The association between the OPRM1 A118G (Asn40Asp, rs1799971) polymorphism and alcohol use disorders and alcohol consumption was analyzed using three different population-based samples: (a) a Finnish cohort study, Health 2000, with 503 participants having a DSM-IV diagnosis for alcohol dependence and/or alcohol abuse and 506 age- and sex-matched controls; (b) a Finnish cohort study, FINRISK (n = 2360) and (c) the Helsinki Birth Cohort Study (n = 1384). The latter two populations lacked diagnosis-based phenotypes, but included detailed information on alcohol consumption.
Results: We found no statistically significant differences in genotypic or allelic distribution between controls and subjects with alcohol dependence or abuse diagnoses. Likewise, no significant association was observed between the A118G genotype and alcohol consumption.
Conclusion: These results suggest that the A118G (Asn40Asp) polymorphism may not have a major effect on the development of alcohol use disorders, at least in the Finnish population.
Alcohol and alcoholism (Oxford, Oxfordshire) 2013;48;5;519-25
Identification of nine new susceptibility loci for testicular cancer, including variants near DAZL and PRDM14.
Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, UK.
Testicular germ cell tumor (TGCT) is the most common cancer in young men and is notable for its high familial risks. So far, six loci associated with TGCT have been reported. From genome-wide association study (GWAS) analysis of 307,291 SNPs in 986 TGCT cases and 4,946 controls, we selected for follow-up 694 SNPs, which we genotyped in a further 1,064 TGCT cases and 10,082 controls from the UK. We identified SNPs at nine new loci (1q22, 1q24.1, 3p24.3, 4q24, 5q31.1, 8q13.3, 16q12.1, 17q22 and 21q22.3) showing association with TGCT (P < 5 × 10(-8)), which together account for an additional 4-6% of the familial risk of TGCT. The loci include genes plausibly related to TGCT development. PRDM14, at 8q13.3, is essential for early germ cell specification, and DAZL, at 3p24.3, is required for the regulation of germ cell development. Furthermore, PITX1, at 5q31.1, regulates TERT expression and is the third TGCT-associated locus implicated in telomerase regulation.
Nature genetics 2013;45;6;686-9
Mosaic PPM1D mutations are associated with predisposition to breast and ovarian cancer.
Division of Genetics & Epidemiology, The Institute of Cancer Research, Sutton SM2 5NG, UK.
Improved sequencing technologies offer unprecedented opportunities for investigating the role of rare genetic variation in common disease. However, there are considerable challenges with respect to study design, data analysis and replication. Using pooled next-generation sequencing of 507 genes implicated in the repair of DNA in 1,150 samples, an analytical strategy focused on protein-truncating variants (PTVs) and a large-scale sequencing case-control replication experiment in 13,642 individuals, here we show that rare PTVs in the p53-inducible protein phosphatase PPM1D are associated with predisposition to breast cancer and ovarian cancer. PPM1D PTV mutations were present in 25 out of 7,781 cases versus 1 out of 5,861 controls (P = 1.12 × 10(-5)), including 18 mutations in 6,912 individuals with breast cancer (P = 2.42 × 10(-4)) and 12 mutations in 1,121 individuals with ovarian cancer (P = 3.10 × 10(-9)). Notably, all of the identified PPM1D PTVs were mosaic in lymphocyte DNA and clustered within a 370-base-pair region in the final exon of the gene, carboxy-terminal to the phosphatase catalytic domain. Functional studies demonstrate that the mutations result in enhanced suppression of p53 in response to ionizing radiation exposure, suggesting that the mutant alleles encode hyperactive PPM1D isoforms. Thus, although the mutations cause premature protein truncation, they do not result in the simple loss-of-function effect typically associated with this class of variant, but instead probably have a gain-of-function effect. Our results have implications for the detection and management of breast and ovarian cancer risk. More generally, these data provide new insights into the role of rare and of mosaic genetic variants in common conditions, and the use of sequencing in their identification.
Funded by: Cancer Research UK: 11174, C12292/A11174; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0600329, G0800759, G0900747 91070, G9521010, MR/K006584/1; Wellcome Trust: 068545/Z/02, 083948, 090532, 090532/Z/09/Z, 091157, 095552, 098051, 100140
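The case-control counts in the PPM1D abstract above (25 carriers in 7,781 cases versus 1 in 5,861 controls) can be examined with an exact test on a 2×2 table. The sketch below computes a one-sided Fisher's exact P value from the hypergeometric distribution using only the standard library. It is an assumption that this resembles the authors' analysis: the abstract does not name the test used, so the result is not expected to match the reported P value exactly.

```python
from math import comb


def fisher_one_sided(case_carriers: int, case_total: int,
                     control_carriers: int, control_total: int) -> float:
    """One-sided Fisher's exact test.

    Returns the probability, under the null of no association, of seeing at
    least `case_carriers` carriers among the cases, given the table margins.
    """
    total = case_total + control_total
    carriers = case_carriers + control_carriers
    # Number of ways to choose which individuals are carriers.
    denominator = comb(total, carriers)
    upper = min(carriers, case_total)
    # Sum hypergeometric terms for k = observed .. maximum possible carriers in cases.
    tail = sum(
        comb(case_total, k) * comb(control_total, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    # Python integer division to float is exact enough here despite the large integers.
    return tail / denominator


# Counts as stated in the abstract: 25 of 7,781 cases and 1 of 5,861 controls.
p_value = fisher_one_sided(25, 7781, 1, 5861)
print(f"one-sided P = {p_value:.3g}")
```

A two-sided version would also add the probabilities of tables at the opposite extreme that are no more likely than the observed one, which gives a somewhat larger P value.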
Evolution of GluN2A/B cytoplasmic domains diversified vertebrate synaptic plasticity and behavior.
Genes to Cognition Programme, Wellcome Trust Sanger Institute, Cambridge, UK.
Two genome duplications early in the vertebrate lineage expanded gene families, including GluN2 subunits of the NMDA receptor. Diversification between the four mammalian GluN2 proteins occurred primarily at their intracellular C-terminal domains (CTDs). To identify shared ancestral functions and diversified subunit-specific functions, we exchanged the exons encoding the GluN2A (also known as Grin2a) and GluN2B (also known as Grin2b) CTDs in two knock-in mice and analyzed the mice's biochemistry, synaptic physiology, and multiple learned and innate behaviors. The eight behaviors were genetically separated into four groups, including one group comprising three types of learning linked to conserved GluN2A/B regions. In contrast, the remaining five behaviors exhibited subunit-specific regulation. GluN2A/B CTD diversification conferred differential binding to cytoplasmic MAGUK proteins and differential forms of long-term potentiation. These data indicate that vertebrate behavior and synaptic signaling acquired increased complexity from the duplication and diversification of ancestral GluN2 genes.
Funded by: Medical Research Council: G0802238; NIMH NIH HHS: R01 MH060919; Wellcome Trust: 089703
Nature neuroscience 2013;16;1;25-32
In Vitro Models of Brain Cancer
Emerging Concepts in Neuro-Oncology 2013;75-86
Molecular characterization of mutant mouse strains generated from the EUCOMM/KOMP-CSD ES cell resource.
The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
The Sanger Mouse Genetics Project generates knockout mouse strains using the EUCOMM/KOMP-CSD embryonic stem (ES) cell collection and characterizes the consequences of the mutations using a high-throughput primary phenotyping screen. Upon achieving germline transmission, new strains are subject to a panel of quality control (QC) PCR- and qPCR-based assays to confirm the correct targeting, cassette structure, and the presence of the 3' LoxP site (required for the potential conditionality of the allele). We report that over 86% of the 731 strains studied showed the correct targeting and cassette structure, of which 97% retained the 3' LoxP site. We discuss the characteristics of the lines that failed QC and postulate that the majority of these failures may be due to mixed ES cell populations which were not detectable with the original screening techniques employed when creating the ES cell resource.
Funded by: Wellcome Trust: WT098051
Mammalian genome : official journal of the International Mammalian Genome Society 2013;24;7-8;286-94
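Illustrative note: the two QC percentages in the abstract above are nested (97% of the strains that passed the targeting and cassette checks, not 97% of all 731 strains). The short sketch below works through that arithmetic. Because the abstract gives rounded figures ("over 86%"), the derived counts are rough estimates and not numbers reported by the authors; the variable names are hypothetical.

```python
total_strains = 731
frac_correct_targeting = 0.86   # "over 86%" in the abstract, so a lower bound
frac_loxp_retained = 0.97       # applies only to correctly targeted strains

# Estimated number of strains at each nested QC stage.
n_correct = total_strains * frac_correct_targeting
n_loxp = n_correct * frac_loxp_retained

# Fraction of all strains passing both checks.
overall_pass = frac_correct_targeting * frac_loxp_retained

print(f"correct targeting and cassette: about {n_correct:.0f} strains")
print(f"also retaining 3' LoxP site:    about {n_loxp:.0f} strains")
print(f"overall pass fraction:          about {overall_pass:.1%}")
```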
Genomic analysis of a novel spontaneous albino C57BL/6N mouse strain.
The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.
We report an albino C57BL/6N mouse strain carrying a spontaneous mutation in the tyrosinase gene (C57BL/6N-Tyr(cWTSI)). Deep whole genome sequencing of founder mice revealed very little divergence from C57BL/6NJ and C57BL/6N (Taconic). This coisogenic strain will be of great utility for the International Mouse Phenotyping Consortium (IMPC), which uses the EUCOMM/KOMP targeted C57BL/6N ES cell resource, and other investigators wishing to work on a defined C57BL/6N background.
Funded by: Cancer Research UK: 12401, 13031; Medical Research Council: G0800024; Wellcome Trust: WT098051
Genesis (New York, N.Y. : 2000) 2013;51;7;523-8
Characterization and comparative analysis of the complete Haemonchus contortus β-tubulin gene family and implications for benzimidazole resistance in strongylid nematodes.
Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, 464 Bearsden Road, Glasgow, Scotland G61 1QH, UK.
Parasitic nematode β-tubulin genes are of particular interest because they are the targets of benzimidazole drugs. However, in spite of this, the full β-tubulin gene family has not been characterized for any parasitic nematode to date. Haemonchus contortus is the parasite species for which we understand benzimidazole resistance the best and its close phylogenetic relationship with Caenorhabditis elegans potentially allows inferences of gene function by comparative analysis. Consequently, we have characterized the full β-tubulin gene family in H. contortus. Further to the previously identified Hco-tbb-iso-1 and Hco-tbb-iso-2 genes, we have characterized two additional family members designated Hco-tbb-iso-3 and Hco-tbb-iso-4. We show that Hco-tbb-iso-1 is not a one-to-one orthologue with Cel-ben-1, the only β-tubulin gene in C. elegans that is a benzimidazole drug target. Instead, both Hco-tbb-iso-1 and Hco-tbb-iso-2 have a complex evolutionary relationship with three C. elegans β-tubulin genes: Cel-ben-1, Cel-tbb-1 and Cel-tbb-2. Furthermore, we show that both Hco-tbb-iso-1 and Hco-tbb-iso-2 are highly expressed in adult worms; in contrast, Hco-tbb-iso-3 and Hco-tbb-iso-4 are expressed only at very low levels and are orthologous to the Cel-mec-7 and Cel-tbb-4 genes, respectively, suggesting that they have specialized functional roles. Indeed, we have found that the expression pattern of Hco-tbb-iso-3 in H. contortus is identical to that of Cel-mec-7 in C. elegans, being expressed in just six "touch receptor" mechano-sensory neurons. These results suggest that further investigation is warranted into the potential involvement of strongylid isotype-2 β-tubulin genes in mechanisms of benzimidazole resistance.
Funded by: Canadian Institutes of Health Research: 230937; Wellcome Trust: WT098051
International journal for parasitology 2013;43;6;465-75
Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India.
Center for Human Genetic Research and Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts.
We performed a genome-wide association study (GWAS) and a multistage meta-analysis of type 2 diabetes (T2D) in Punjabi Sikhs from India. Our discovery GWAS in 1,616 individuals (842 case subjects) was followed by in silico replication of the top 513 independent single nucleotide polymorphisms (SNPs) (P < 10⁻³) in Punjabi Sikhs (n = 2,819; 801 case subjects). We further replicated 66 SNPs (P < 10⁻⁴) through genotyping in a Punjabi Sikh sample (n = 2,894; 1,711 case subjects). On combined meta-analysis in Sikh populations (n = 7,329; 3,354 case subjects), we identified a novel locus in association with T2D at 13q12 represented by a directly genotyped intronic SNP (rs9552911, P = 1.82 × 10⁻⁸) in the SGCG gene. Next, we undertook in silico replication (stage 2b) of the top 513 signals (P < 10⁻³) in 29,157 non-Sikh South Asians (10,971 case subjects) and de novo genotyping of up to 31 top signals (P < 10⁻⁴) in 10,817 South Asians (5,157 case subjects) (stage 3b). In combined South Asian meta-analysis, we observed six suggestive associations (P < 10⁻⁵ to < 10⁻⁷), including SNPs at HMG1L1/CTCFL, PLXNA4, SCAP, and chr5p11. Further evaluation of 31 top SNPs in 33,707 East Asians (16,746 case subjects) (stage 3c) and 47,117 Europeans (8,130 case subjects) (stage 3d), and joint meta-analysis of 128,127 individuals (44,358 case subjects) from 27 multiethnic studies, did not reveal any additional loci nor was there any evidence of replication for the new variant. Our findings provide new evidence on the presence of a population-specific signal in relation to T2D, which may provide additional insights into T2D pathogenesis.
A genome-wide survey of genetic variation in gorillas using reduced representation sequencing.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
All non-human great apes are endangered in the wild, and it is therefore important to gain an understanding of their demography and genetic diversity. Whole genome assembly projects have provided an invaluable foundation for understanding genetics in all four genera, but to date genetic studies of multiple individuals within great ape species have largely been confined to mitochondrial DNA and a small number of other loci. Here, we present a genome-wide survey of genetic variation in gorillas using a reduced representation sequencing approach, focusing on the two lowland subspecies. We identify 3,006,670 polymorphic sites in 14 individuals: 12 western lowland gorillas (Gorilla gorilla gorilla) and 2 eastern lowland gorillas (Gorilla beringei graueri). We find that the two species are genetically distinct, based on levels of heterozygosity and patterns of allele sharing. Focusing on the western lowland population, we observe evidence for population substructure, and a deficit of rare genetic variants suggesting a recent episode of population contraction. In western lowland gorillas, there is an elevation of variation towards telomeres and centromeres on the chromosomal scale. On a finer scale, we find substantial variation in genetic diversity, including a marked reduction close to the major histocompatibility locus, perhaps indicative of recent strong selection there. These findings suggest that despite their maintaining an overall level of genetic diversity equal to or greater than that of humans, population decline, perhaps associated with disease, has been a significant factor in recent and long-term pressures on wild gorilla populations.
Funded by: Wellcome Trust: 098051
PloS one 2013;8;6;e65066
Combined NGS approaches identify mutations in the intraflagellar transport gene IFT140 in skeletal ciliopathies with early progressive kidney disease.
Molecular Medicine Unit, University College London Institute of Child Health, London, UK.
Ciliopathies are genetically heterogeneous disorders characterized by variable expressivity and overlaps between different disease entities. This is exemplified by the short rib-polydactyly syndromes, Jeune, Sensenbrenner, and Mainzer-Saldino chondrodysplasia syndromes. These three syndromes are frequently caused by mutations in intraflagellar transport (IFT) genes affecting the primary cilia, which play a crucial role in skeletal and chondral development. Here, we identified mutations in IFT140, an IFT complex A gene, in five Jeune asphyxiating thoracic dystrophy (JATD) and two Mainzer-Saldino syndrome (MSS) families, by screening a cohort of 66 JATD/MSS patients using whole exome sequencing and targeted resequencing of a customized ciliopathy gene panel. We also found an enrichment of rare IFT140 alleles in JATD compared with nonciliopathy diseases, implying putative modifier effects for certain alleles. IFT140 patients presented with mild chest narrowing, but all had end-stage renal failure under 13 years of age and retinal dystrophy when examined for ocular dysfunction. This is consistent with the severe cystic phenotype of Ift140 conditional knockout mice, and the higher level of Ift140 expression in kidney and retina compared with the skeleton at E15.5 in the mouse. IFT140 is therefore a major cause of cono-renal syndromes (JATD and MSS). The present study strengthens the rationale for IFT140 screening in skeletal ciliopathy spectrum patients that have kidney disease and/or retinal dystrophy.
Funded by: NIGMS NIH HHS: GM060992, R01 GM060992; Wellcome Trust: 091310, 098051, UK10K, WT091310
Human mutation 2013;34;5;714-24
Mutations in the Gene Encoding IFT Dynein Complex Component WDR34 Cause Jeune Asphyxiating Thoracic Dystrophy.
Molecular Medicine Unit and Birth Defect Research Centre, Institute of Child Health, University College London (UCL), London WC1N 1EH, UK.
Bidirectional (anterograde and retrograde) motor-based intraflagellar transport (IFT) governs cargo transport and delivery processes that are essential for primary cilia growth and maintenance and for hedgehog signaling functions. The IFT dynein-2 motor complex that regulates ciliary retrograde protein transport contains a heavy chain dynein ATPase/motor subunit, DYNC2H1, along with other less well functionally defined subunits. Deficiency of IFT proteins, including DYNC2H1, underlies a spectrum of skeletal ciliopathies. Here, by using exome sequencing and a targeted next-generation sequencing panel, we identified a total of 11 mutations in WDR34 in 9 families with the clinical diagnosis of Jeune syndrome (asphyxiating thoracic dystrophy). WDR34 encodes a WD40 repeat-containing protein orthologous to Chlamydomonas FAP133, a dynein intermediate chain associated with the retrograde intraflagellar transport motor. Three-dimensional protein modeling suggests that the identified mutations all affect residues critical for WDR34 protein-protein interactions. We find that WDR34 concentrates around the centrioles and basal bodies in mammalian cells, also showing axonemal staining. WDR34 coimmunoprecipitates with the dynein-1 light chain DYNLL1 in vitro, and mining of proteomics data suggests that WDR34 could represent a previously unrecognized link between the cytoplasmic dynein-1 and IFT dynein-2 motors. Together, these data show that WDR34 is critical for ciliary functions essential to normal development and survival, most probably as a previously unrecognized component of the mammalian dynein-IFT machinery.
American journal of human genetics 2013;93;5;932-44
Co-binding by YY1 identifies the transcriptionally active, highly conserved set of CTCF-bound regions in primate genomes.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Background: The genomic binding of CTCF is highly conserved across mammals, but the mechanisms that underlie its stability are poorly understood. One transcription factor known to functionally interact with CTCF in the context of X-chromosome inactivation is the ubiquitously expressed YY1. Because combinatorial transcription factor binding can contribute to the evolutionary stabilization of regulatory regions, we tested whether YY1 and CTCF co-binding could in part account for conservation of CTCF binding.
Results: Combined analysis of CTCF and YY1 binding in lymphoblastoid cell lines from seven primates, as well as in mouse and human livers, reveals extensive genome-wide co-localization specifically at evolutionarily stable CTCF-bound regions. CTCF-YY1 co-bound regions resemble regions bound by YY1 alone, as they enrich for active histone marks, RNA polymerase II and transcription factor binding. Although these highly conserved, transcriptionally active CTCF-YY1 co-bound regions are often promoter-proximal, gene-distal regions show similar molecular features.
Conclusions: Our results reveal that these two ubiquitously expressed, multi-functional zinc-finger proteins collaborate in functionally active regions to stabilize one another's genome-wide binding across primate evolution.
Funded by: NIGMS NIH HHS: R01 GM077959; NIH HHS: P51 OD011132; Wellcome Trust: 095908
Genome biology 2013;14;12;R148
Mechanisms controlling the temporal degradation of Nek2A and Kif18A by the APC/C-Cdc20 complex.
The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.
The Anaphase Promoting Complex/Cyclosome (APC/C) in complex with its co-activator Cdc20 is responsible for targeting proteins for ubiquitin-mediated degradation during mitosis. The activity of APC/C-Cdc20 is inhibited during prometaphase by the Spindle Assembly Checkpoint (SAC), yet certain substrates escape this inhibition. Nek2A degradation during prometaphase depends on direct binding of Nek2A to the APC/C via a C-terminal MR dipeptide, but whether this motif alone is sufficient is not clear. Here, we identify Kif18A as a novel APC/C-Cdc20 substrate and show that Kif18A degradation depends on a C-terminal LR motif. However, in contrast to Nek2A, Kif18A is not degraded until anaphase, showing that additional mechanisms contribute to Nek2A degradation. We find that dimerization via the leucine zipper, in combination with the MR motif, is required for stable Nek2A binding to and ubiquitination by the APC/C. Nek2A and the mitotic checkpoint complex (MCC) have an overlap in APC/C subunit requirements for binding and we propose that Nek2A binds with high affinity to apo-APC/C and is degraded by the pool of Cdc20 that avoids inhibition by the SAC.
Funded by: Wellcome Trust: 079643/Z/06/Z, 092096
The EMBO journal 2013;32;2;303-14
Cohesin-based chromatin interactions enable regulated gene expression within preexisting architectural compartments.
Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom.
Chromosome conformation capture approaches have shown that interphase chromatin is partitioned into spatially segregated Mb-sized compartments and sub-Mb-sized topological domains. This compartmentalization is thought to facilitate the matching of genes and regulatory elements, but its precise function and mechanistic basis remain unknown. Cohesin controls chromosome topology to enable DNA repair and chromosome segregation in cycling cells. In addition, cohesin associates with active enhancers and promoters and with CTCF to form long-range interactions important for gene regulation. Although these findings suggest an important role for cohesin in genome organization, this role has not been assessed on a global scale. Unexpectedly, we find that architectural compartments are maintained in noncycling mouse thymocytes after genetic depletion of cohesin in vivo. Cohesin was, however, required for specific long-range interactions within compartments where cohesin-regulated genes reside. Cohesin depletion diminished interactions between cohesin-bound sites, whereas alternative interactions between chromatin features associated with transcriptional activation and repression became more prominent, with corresponding changes in gene expression. Our findings indicate that cohesin-mediated long-range interactions facilitate discrete gene expression states within preexisting chromosomal compartments.
Genome research 2013;23;12;2066-77
Generating whole bacterial genome sequences of low-abundance species from complex samples with IMS-MDA.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridgeshire, UK.
The study of bacterial populations using whole-genome sequencing is of considerable scientific and clinical interest. However, obtaining bacterial genomic information is not always trivial: the target bacteria may be difficult to culture or uncultured, and they may be found within samples containing complex mixtures of other contaminating microbes and/or host cells, from which it is very difficult to derive robust sequencing data. Here we describe our procedure to generate sufficient DNA for whole-genome sequencing from clinical samples and without the need for culture, as successfully used on the difficult-to-culture, obligate intracellular pathogen Chlamydia trachomatis. Our protocol combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification (WGA), which is followed by high-throughput sequencing. Compared with other techniques that might be used to generate such data, IMS-MDA is an inexpensive, low-technology and highly transferable process that provides amplified genomic DNA for sequencing from target bacteria in under 5 h, with little hands-on time.
Funded by: Wellcome Trust: 098051
Nature protocols 2013;8;12;2404-12
Whole-genome sequences of Chlamydia trachomatis directly from clinical samples without culture.
Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.
The use of whole-genome sequencing as a tool for the study of infectious bacteria is of growing clinical interest. Chlamydia trachomatis is responsible for sexually transmitted infections and the blinding disease trachoma, which affect hundreds of millions of people worldwide. Recombination is widespread within the genome of C. trachomatis, thus whole-genome sequencing is necessary to understand the evolution, diversity, and epidemiology of this pathogen. Culture of C. trachomatis has, until now, been a prerequisite to obtain DNA for whole-genome sequencing; however, as C. trachomatis is an obligate intracellular pathogen, this procedure is technically demanding and time consuming. Discarded clinical samples represent a large resource for sequencing the genomes of pathogens, yet clinical swabs frequently contain very low levels of C. trachomatis DNA and large amounts of contaminating microbial and human DNA. To determine whether it is possible to obtain whole-genome sequences from bacteria without the need for culture, we have devised an approach that combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification. Using IMS-MDA in conjunction with high-throughput multiplexed Illumina sequencing, we have produced the first whole bacterial genome sequences direct from clinical samples. We also show that this method can be used to generate genome data from nonviable archived samples. This method will prove a useful tool in answering questions relating to the biology of many difficult-to-culture or fastidious bacteria of clinical concern.
Funded by: Wellcome Trust: 089276, 098051, 100087
Genome research 2013;23;5;855-66
Efficient Knockin Mouse Generation by ssDNA Oligonucleotides and Zinc-Finger Nuclease Assisted Homologous Recombination in Zygotes.
MOE Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, National Resource Center for Mutant Mice, Nanjing, China.
The generation of specific mutant animal models is critical for functional analysis of human genes. The conventional gene targeting approach in embryonic stem cells (ESCs) by homologous recombination is, however, laborious, slow, expensive, and limited to species with functional ESCs. It is therefore a long-sought goal to develop an efficient and simple alternative gene targeting strategy. Here we demonstrate that, by combining an efficient ZFN pair and ssODN, a restriction site and a loxP site were successfully introduced into a specific genomic locus. A targeting efficiency of up to 22.22% was achieved by making the insertion site coincide with the ZFN cleavage site and by keeping the homology arms equal in length and isogenic to the endogenous target locus. Furthermore, we determine that ZFN and ssODN-assisted HR depends on the length of the ssODN homology arms. We further show that mutant alleles generated by ZFN and ssODN-assisted HR can be transmitted through the germline successfully. This study establishes an efficient gene targeting strategy by ZFN and ssODN-assisted HR in mouse zygotes, and provides a potential avenue for genome engineering in animal species without functional ES cell lines.
PloS one 2013;8;10;e77696
Progressive genome-wide introgression in agricultural Campylobacter coli.
Department of Zoology, The Tinbergen Building, University of Oxford, South Parks Road, Oxford, OX1 3PS, UK.
Hybridization between distantly related organisms can facilitate rapid adaptation to novel environments, but is potentially constrained by epistatic fitness interactions among cell components. The zoonotic pathogens Campylobacter coli and C. jejuni differ from each other by around 15% at the nucleotide level, corresponding to an average of nearly 40 amino acids per protein-coding gene. Using whole genome sequencing, we show that a single C. coli lineage, which has successfully colonized an agricultural niche, has been progressively accumulating C. jejuni DNA. Members of this lineage belong to two groups, the ST-828 and ST-1150 clonal complexes. The ST-1150 complex is less frequently isolated and has undergone a substantially greater amount of introgression leading to replacement of up to 23% of the C. coli core genome as well as import of novel DNA. By contrast, the more commonly isolated ST-828 complex bacteria have 10-11% introgressed DNA, and C. jejuni and nonagricultural C. coli lineages each have <2%. Thus, the C. coli that colonize agriculture, and consequently cause most human disease, have hybrid origin, but this cross-species exchange has so far not had a substantial impact on the gene pools of either C. jejuni or nonagricultural C. coli. These findings also indicate remarkable interchangeability of basic cellular machinery after a prolonged period of independent evolution.
Funded by: Biotechnology and Biological Sciences Research Council
Molecular ecology 2013;22;4;1051-64
Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter.
Department of Zoology, University of Oxford, Oxford OX1 3PS, United Kingdom.
Genome-wide association studies have the potential to identify causal genetic factors underlying important phenotypes but have rarely been performed in bacteria. We present an association mapping method that takes into account the clonal population structure of bacteria and is applicable to both core and accessory genome variation. Campylobacter is a common cause of human gastroenteritis as a consequence of its proliferation in multiple farm animal species and its transmission via contaminated meat and poultry. We applied our association mapping method to identify the factors responsible for adaptation to cattle and chickens among 192 Campylobacter isolates from these and other host sources. Phylogenetic analysis implied frequent host switching but also showed that some lineages were strongly associated with particular hosts. A seven-gene region with a host association signal was found. Genes in this region were almost universally present in cattle but were frequently absent in isolates from chickens and wild birds. Three of the seven genes encoded vitamin B5 biosynthesis. We found that isolates from cattle were better able to grow in vitamin B5-depleted media and propose that this difference may be an adaptation to host diet.
Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 087622
Proceedings of the National Academy of Sciences of the United States of America 2013;110;29;11923-7
A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains.
Background: The mouse inbred line C57BL/6J is widely used in mouse genetics and its genome has been incorporated into many genetic reference populations. More recently large initiatives such as the International Knockout Mouse Consortium (IKMC) are using the C57BL/6N mouse strain to generate null alleles for all mouse genes. Hence both strains are now widely used in mouse genetics studies. Here we perform a comprehensive genomic and phenotypic analysis of the two strains to identify differences that may influence their underlying genetic mechanisms.
Results: We undertake genome sequence comparisons of C57BL/6J and C57BL/6N to identify SNPs, indels and structural variants, with a focus on identifying all coding variants. We annotate 34 SNPs and 2 indels that distinguish C57BL/6J and C57BL/6N coding sequences, as well as 15 structural variants that overlap a gene. In parallel we assess the comparative phenotypes of the two inbred lines utilizing the EMPReSSslim phenotyping pipeline, a broad based assessment encompassing diverse biological systems. We perform additional secondary phenotyping assessments to explore other phenotype domains and to elaborate phenotype differences identified in the primary assessment. We uncover significant phenotypic differences between the two lines, replicated across multiple centers, in a number of physiological, biochemical and behavioral systems.
Conclusions: Comparison of C57BL/6J and C57BL/6N demonstrates a range of phenotypic differences that have the potential to impact upon penetrance and expressivity of mutational effects in these strains. Moreover, the sequence variants we identify provide a set of candidate genes for the phenotypic differences observed between the two strains.
Funded by: Cancer Research UK: 13031; Medical Research Council: G0300212, G0800024, G1002082, MC_PC_U127561112, MC_QA137918, MC_U142684171, MC_U142684172, MC_U142684175; Wellcome Trust: 098051, 100669
Genome biology 2013;14;7;R82
PhenoDigm: analyzing curated annotations to associate animal models with human diseases.
Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
The ultimate goal of studying model organisms is to translate what is learned into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Recent advances in genomic technologies allow for rapid generation of models with a range of targeted genotypes as well as their characterization by high-throughput phenotyping. As an abundance of phenotype data become available, only systematic analysis will allow valid conclusions to be drawn from these data and transferred to human diseases. Owing to the volume of data, automated methods are preferable, allowing for a reliable analysis of the data and providing evidence about possible gene-disease associations. Here, we propose Phenotype comparisons for DIsease Genes and Models (PhenoDigm), as an automated method to provide evidence about gene-disease associations by analysing phenotype information. PhenoDigm integrates data from a variety of model organisms and, at the same time, uses several intermediate scoring methods to identify only strongly data-supported gene candidates for human genetic diseases. We show results of an automated evaluation as well as selected manually assessed examples that support the validity of PhenoDigm. Furthermore, we provide guidance on how to browse the data with PhenoDigm's web interface and illustrate its usefulness in supporting research. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm
Funded by: NHGRI NIH HHS: R01HG004838-02; NIH HHS: R24 OD011883, R24OD011883
Database : the journal of biological databases and curation 2013;2013;bat025
Sherlock Genomes - viral investigator.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
This month's Genome Watch highlights how deep sequencing technologies have vastly reduced the time and prior knowledge needed to generate viral genomes.
Nature reviews. Microbiology 2013;11;3;150
Chicken interferon-inducible transmembrane protein 3 restricts influenza viruses and lyssaviruses in vitro.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
Interferon-inducible transmembrane protein 3 (IFITM3) is an effector protein of the innate immune system. It confers potent, cell-intrinsic resistance to infection by diverse enveloped viruses both in vitro and in vivo, including influenza viruses, West Nile virus, and dengue virus. IFITM3 prevents cytosolic entry of these viruses by blocking complete virus envelope fusion with cell endosome membranes. Although the IFITM locus, which includes IFITM1, -2, -3, and -5, is present in mammalian species, this locus has not been unambiguously identified or functionally characterized in avian species. Here, we show that the IFITM locus exists in chickens and is syntenic with the IFITM locus in mammals. The chicken IFITM3 protein restricts cell infection by influenza A viruses and lyssaviruses to a similar level as its human orthologue. Furthermore, we show that chicken IFITM3 is functional in chicken cells and that knockdown of constitutive expression in chicken fibroblasts results in enhanced infection by influenza A virus. Chicken IFITM2 and -3 are constitutively expressed in all tissues examined, whereas IFITM1 is only expressed in the bursa of Fabricius, gastrointestinal tract, cecal tonsil, and trachea. Despite being highly divergent at the amino acid level, IFITM3 proteins of birds and mammals can restrict replication of viruses that are able to infect different host species, suggesting IFITM proteins may provide a crucial barrier for zoonotic infections.
Funded by: Biotechnology and Biological Sciences Research Council: BB/J004448/1; Medical Research Council: G0600369, G1000413; PHS HHS: 098051
Journal of virology 2013;87;23;12957-66
Genetic variants from lipid-related pathways and risk for incident myocardial infarction.
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Background: Circulating lipid levels, as well as several familial lipid metabolism disorders, are strongly associated with initiation and progression of atherosclerosis and incidence of myocardial infarction (MI).
Objectives: We hypothesized that genetic variants associated with circulating lipid levels would also be associated with MI incidence, and have tested this in three independent samples.
Setting and subjects: Using age- and sex-adjusted additive genetic models, we analyzed 554 single nucleotide polymorphisms (SNPs) in 41 candidate gene regions proposed to be involved in lipid-related pathways potentially predisposing to incidence of MI in 2,602 participants of the Swedish Twin Register (STR; 57% women). All associations with nominal P<0.01 were further investigated in the Uppsala Longitudinal Study of Adult Men (ULSAM; N = 1,142).
Results: In the present study, we report associations of lipid-related SNPs with incident MI in two community-based longitudinal studies with in silico replication in a meta-analysis of genome-wide association studies. Overall, there were 9 SNPs in STR with nominal P-value <0.01 that were successfully genotyped in ULSAM. rs4149313 located in ABCA1 was associated with MI incidence in both longitudinal study samples with nominal significance (hazard ratio, 1.36 and 1.40; P-value, 0.004 and 0.015 in STR and ULSAM, respectively). In silico replication supported the association of rs4149313 with coronary artery disease in an independent meta-analysis including 173,975 individuals of European descent from the CARDIoGRAMplusC4D consortium (odds ratio, 1.03; P-value, 0.048).
Conclusions: rs4149313 is one of the few amino acid changing variants in ABCA1 known to associate with reduced cholesterol efflux. Our results are suggestive of a weak association between this variant and the development of atherosclerosis and MI.