Sanger Institute - Publications 2014

Number of papers published in 2014: 379

  • The Population Structure of Vibrio cholerae from the Chandigarh Region of Northern India.

    Abd El Ghany M, Chander J, Mutreja A, Rashid M, Hill-Cawthorne GA, Ali S, Naeem R, Thomson NR, Dougan G and Pain A

    Pathogen Genomics Laboratory, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.

    Background: Cholera infection continues to be a threat to global public health. The current cholera pandemic associated with Vibrio cholerae El Tor has now been ongoing for over half a century.

    Thirty-eight V. cholerae El Tor isolates associated with a cholera outbreak in 2009 from the Chandigarh region of India were characterised by a combination of microbiology, molecular typing and whole-genome sequencing. The genomic analysis indicated that two clones of V. cholerae circulated in the region and caused disease during this time. These clones fell into two distinct sub-clades that map independently onto wave 3 of the phylogenetic tree of seventh pandemic V. cholerae El Tor. Sequence analyses of the cholera toxin gene, the Vibrio seventh Pandemic Island II (VSPII) and SXT element correlated with this phylogenetic position of the two clades on the El Tor tree. The clade 2 isolates, characterized by a drug-resistant profile and the expression of a distinct cholera toxin, are closely related to the recent V. cholerae isolated elsewhere, including Haiti, but fell on a distinct branch of the tree, showing they were independent outbreaks. Multi-Locus Sequence Typing (MLST) distinguished two sequence types among the 38 isolates, but these did not correspond to the clades defined by whole-genome sequencing. Multi-Locus Variable-length tandem-nucleotide repeat Analysis (MLVA) identified 16 distinct clusters.

    The use of whole-genome sequencing enabled the identification of two clones of V. cholerae that circulated during the 2009 Chandigarh outbreak. These clones harboured a similar structure of ICEVchHai1 but differed mainly in the structure of CTX phage and VSPII. The limited capacity of MLST and MLVA to discriminate between the clones that circulated in the 2009 Chandigarh outbreak highlights the value of whole-genome sequencing as a route to the identification of further genetic markers to subtype V. cholerae isolates.

    PLoS neglected tropical diseases 2014;8;7;e2981

  • Towards a molecular systems model of coronary artery disease.

    Abraham G, Bhalala OG, de Bakker PI, Ripatti S and Inouye M

    Medical Systems Biology, Department of Pathology and Department of Microbiology & Immunology, The University of Melbourne, Parkville, Victoria, 3010, Australia.

    Coronary artery disease (CAD) is a complex disease driven by myriad interactions of genetics and environmental factors. Traditionally, studies have analyzed only 1 disease factor at a time, providing useful but limited understanding of the underlying etiology. Recent advances in cost-effective and high-throughput technologies, such as single nucleotide polymorphism (SNP) genotyping, exome/genome/RNA sequencing, gene expression microarrays, and metabolomics assays have enabled the collection of millions of data points in many thousands of individuals. In order to make sense of such 'omics' data, effective analytical methods are needed. We review and highlight some of the main results in this area, focusing on integrative approaches that consider multiple modalities simultaneously. Such analyses have the potential to uncover the genetic basis of CAD, produce genomic risk scores (GRS) for disease prediction, disentangle the complex interactions underlying disease, and predict response to treatment.

    Current cardiology reports 2014;16;6;488

  • Editorial overview: Cancer genomics: kill it. Kill it dead.

    Adams D and McDermott U

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK.

    Current opinion in genetics & development 2014

  • Histopathology reveals correlative and unique phenotypes in a high-throughput mouse phenotyping screen.

    Adissu HA, Estabel J, Sunter D, Tuck E, Hooks Y, Carragher DM, Clarke K, Karp NA, Project SM, Newbigging S, Jones N, Morikawa L, White JK and McKerlie C

    Centre for Modeling Human Disease, Toronto Centre for Phenogenomics, 25 Orde Street, Toronto, ON M5T 3H7, Canada.

    The Mouse Genetics Project (MGP) at the Wellcome Trust Sanger Institute aims to generate and phenotype over 800 genetically modified mouse lines over the next 5 years to gain a better understanding of mammalian gene function and provide an invaluable resource to the scientific community for follow-up studies. Phenotyping includes the generation of a standardized biobank of paraffin-embedded tissues for each mouse line, but histopathology is not routinely performed. In collaboration with the Pathology Core of the Centre for Modeling Human Disease (CMHD) we report the utility of histopathology in a high-throughput primary phenotyping screen. Histopathology was assessed in an unbiased selection of 50 mouse lines with (n=30) or without (n=20) clinical phenotypes detected by the standard MGP primary phenotyping screen. Our findings revealed that histopathology added correlating morphological data in 19 of 30 lines (63.3%) in which the primary screen detected a phenotype. In addition, seven of the 50 lines (14%) presented significant histopathology findings that were not associated with or predicted by the standard primary screen. Three of these seven lines had no clinical phenotype detected by the standard primary screen. Incidental and strain-associated background lesions were present in all mutant lines with good concordance to wild-type controls. These findings demonstrate the complementary and unique contribution of histopathology to high-throughput primary phenotyping of mutant mice.

    Funded by: NHGRI NIH HHS: U54 HG006364; NIH HHS: U42 OD011175

    Disease models & mechanisms 2014;7;5;515-24
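    The proportions quoted in the abstract above can be checked directly from the stated counts. The short sketch below only redoes that arithmetic; the helper name `share` is ours and is not taken from the paper.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)

# Counts quoted in the abstract above.
lines_with_phenotype = 30   # lines with a phenotype in the primary screen
lines_total = 50            # all lines assessed by histopathology

correlating = share(19, lines_with_phenotype)  # histopathology matched the primary screen
unpredicted = share(7, lines_total)            # findings the primary screen did not predict

print(f"correlating: {correlating}%  unpredicted: {unpredicted}%")
```

    Note that the two percentages use different denominators (30 lines with a primary-screen phenotype versus all 50 lines), so they are not directly comparable.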

  • H3Africa: a tipping point for a revolution in bioinformatics, genomics and health research in Africa.

    Adoga MP, Fatumo SA and Agwale SM

    Computational and Evolutionary Biology/Bioinformatics, Faculty of Life Sciences, University of Manchester, Manchester, UK ; Microbiology Unit, Department of Biological Sciences, Nasarawa State University, Keffi, Nigeria.

    Background: A multi-million dollar research initiative involving the National Institutes of Health (NIH), Wellcome Trust and African scientists has been launched. The initiative, referred to as H3Africa, is an acronym that stands for Human Heredity and Health in Africa. Here, we outline what this initiative is set to achieve and the latest commitments of the key players as at October 2013.

    Findings: The initiative has so far been awarded over $74 million in research grants. During the first set of awards announced in 2012, the NIH granted $5 million a year for a period of five years, while the Wellcome Trust doled out at least $12 million over the period to the research consortium. This was in addition to Wellcome Trust's provision of administrative support, scientific consultation and advanced training, all in collaboration with the African Society for Human Genetics. In addition, during the second set of awards announced in October 2013, the NIH awarded to the laudable initiative 10 new grants of up to $17 million over the next four years.

    Conclusions: H3Africa is poised to transform the face of research in genomics, bioinformatics and health in Africa. The capacity of African scientists will be enhanced through training and the better research facilities that will be acquired. Research collaborations between Africa and the West will grow and all stakeholders, including the funding partners, African scientists, scientists across the globe, physicians and patients will be the eventual winners.

    Source code for biology and medicine 2014;9;10

  • Human African trypanosomiasis research gets a boost: unraveling the tsetse genome.

    Aksoy S, Attardo G, Berriman M, Christoffels A, Lehane M, Masiga D and Toure Y

    Yale School of Public Health, Department of Epidemiology and Public Health, New Haven, Connecticut, United States of America.

    PLoS neglected tropical diseases 2014;8;4;e2624

  • Rare variants in NR2F2 cause congenital heart defects in humans.

    Al Turki S, Manickaraj AK, Mercer CL, Gerety SS, Hitz MP, Lindsay S, D'Alessandro LC, Swaminathan GJ, Bentham J, Arndt AK, Low J, Breckpot J, Gewillig M, Thienpont B, Abdul-Khaliq H, Harnack C, Hoff K, Kramer HH, Schubert S, Siebert R, Toka O, Cosgrove C, Watkins H, Lucassen AM, O'Kelly IM, Salmon AP, Bu'lock FA, Granados-Riveron J, Setchfield K, Thornborough C, Brook JD, Mulder B, Klaassen S, Bhattacharya S, Devriendt K, Fitzpatrick DF, UK10K Consortium, Wilson DI, Mital S and Hurles ME

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Department of Pathology, King Abdulaziz Medical City, P.O. Box 22490, Riyadh 11426, Saudi Arabia.

    Congenital heart defects (CHDs) are the most common birth defect worldwide and are a leading cause of neonatal mortality. Nonsyndromic atrioventricular septal defects (AVSDs) are an important subtype of CHDs for which the genetic architecture is poorly understood. We performed exome sequencing in 13 parent-offspring trios and 112 unrelated individuals with nonsyndromic AVSDs and identified five rare missense variants (two of which arose de novo) in the highly conserved gene NR2F2, a very significant enrichment (p = 7.7 × 10⁻⁷) compared to 5,194 control subjects. We identified three additional CHD-affected families with other variants in NR2F2 including a de novo balanced chromosomal translocation, a de novo substitution disrupting a splice donor site, and a 3 bp duplication that cosegregated in a multiplex family. NR2F2 encodes a pleiotropic developmental transcription factor, and decreased dosage of NR2F2 in mice has been shown to result in abnormal development of atrioventricular septa. Via luciferase assays, we showed that all six coding sequence variants observed in individuals significantly alter the activity of NR2F2 on target promoters.

    Funded by: Medical Research Council; Wellcome Trust: WT098051

    American journal of human genetics 2014;94;4;574-85

  • Mutational signatures: the patterns of somatic mutations hidden in cancer genomes.

    Alexandrov LB and Stratton MR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    All cancers originate from a single cell that starts to behave abnormally due to the acquired somatic mutations in its genome. Until recently, the knowledge of the mutational processes that cause these somatic mutations has been very limited. Recent advances in sequencing technologies and the development of novel mathematical approaches have allowed deciphering the patterns of somatic mutations caused by different mutational processes. Here, we summarize our current understanding of mutational patterns and mutational signatures in light of both the somatic cell paradigm of cancer research and the recent developments in the field of cancer genomics.

    Funded by: Wellcome Trust: 098051

    Current opinion in genetics & development 2014;24;52-60

  • Reading between the lines; understanding drug response in the post genomic era.

    Alifrangis CC and McDermott U

    Cambridge Institute of Medical Research, University of Cambridge, Cambridge, UK; Cancer Genome Project, Wellcome Trust Sanger Institute, Cambridge, UK; Dept of Medical Oncology, Charing Cross Hospital, London, UK.

    Following the fanfare of initial, often dramatic, success with small molecule inhibitors in the treatment of defined genomic subgroups, it can be argued that the extension of targeted therapeutics to the majority of patients with solid cancers has stalled. Despite encouraging FDA approval rates, the attrition rate of these compounds remains high in early stage clinical studies, with single agent studies repeatedly showing poor efficacy. In striking contrast, our understanding of the complexity of solid neoplasms has increased in huge increments, following the publication of large-scale genomic and transcriptomic datasets from large collaborations such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). However, there remains a clear disconnect between these rich datasets describing the genomic complexity of cancer, including both intra- and inter-tumour heterogeneity, and what a treating oncologist can consider to be a clinically "actionable" mutation profile. Our understanding of these data is in its infancy and we still find difficulties ascribing characteristics to tumours that consistently predict therapeutic response for the majority of small molecule inhibitors. This article will seek to explore the recent studies of the patterns and impact of mutations in drug resistance, and demonstrate how we may use these data to reshape our thinking about biological pathways, critical dependencies and their therapeutic interruption.

    Molecular oncology 2014

  • Plasmodium falciparum founder populations in Western Cambodia have reduced artemisinin sensitivity in vitro.

    Amaratunga C, Witkowski B, Dek D, Try V, Khim N, Miotto O, Ménard D and Fairhurst RM

    Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA.

    Reduced Plasmodium falciparum sensitivity to short-course artemisinin (ART) monotherapy manifests as a long parasite clearance half-life. We recently defined three parasite founder populations with long half-life in Pursat, Western Cambodia, where reduced ART sensitivity is prevalent. Using the ring-stage survival assay, we show that these founder populations have reduced ART sensitivity in vitro at the early-ring stage of parasite development, and that a genetically admixed population contains subsets of parasites with normal or reduced ART sensitivity.

    Antimicrobial agents and chemotherapy 2014

  • A molecular marker of artemisinin-resistant Plasmodium falciparum malaria.

    Ariey F, Witkowski B, Amaratunga C, Beghain J, Langlois AC, Khim N, Kim S, Duru V, Bouchier C, Ma L, Lim P, Leang R, Duong S, Sreng S, Suon S, Chuor CM, Bout DM, Ménard S, Rogers WO, Genton B, Fandeur T, Miotto O, Ringwald P, Le Bras J, Berry A, Barale JC, Fairhurst RM, Benoit-Vical F, Mercereau-Puijalon O and Ménard D

    [1] Institut Pasteur, Parasite Molecular Immunology Unit, 75724 Paris Cedex 15, France [2] Centre National de la Recherche Scientifique, Unité de Recherche Associée 2581, 75724 Paris Cedex 15, France [3] Institut Pasteur, Genetics and Genomics of Insect Vectors Unit, 75724 Paris Cedex 15, France (F.A.); Institut Pasteur, Functional Genetics of Infectious Diseases Unit, 75724 Paris Cedex 15, France (J.B.); Centre de Physiopathologie de Toulouse-Purpan, Institut National de la Santé et de la Recherche Médicale UMR1043, Centre National de la Recherche Scientifique UMR5282, Université Toulouse III, 31024 Toulouse Cedex 3, France Institut Pasteur, Unité de Biologie et Génétique du Paludisme, Team Malaria Targets and Drug Development, 75724 Paris Cedex 15, France (J.-C.B.).

    Plasmodium falciparum resistance to artemisinin derivatives in southeast Asia threatens malaria control and elimination activities worldwide. To monitor the spread of artemisinin resistance, a molecular marker is urgently needed. Here, using whole-genome sequencing of an artemisinin-resistant parasite line from Africa and clinical parasite isolates from Cambodia, we associate mutations in the PF3D7_1343700 kelch propeller domain ('K13-propeller') with artemisinin resistance in vitro and in vivo. Mutant K13-propeller alleles cluster in Cambodian provinces where resistance is prevalent, and the increasing frequency of a dominant mutant K13-propeller allele correlates with the recent spread of resistance in western Cambodia. Strong correlations between the presence of a mutant allele, in vitro parasite survival rates and in vivo parasite clearance rates indicate that K13-propeller mutations are important determinants of artemisinin resistance. K13-propeller polymorphism constitutes a useful molecular marker for large-scale surveillance efforts to contain artemisinin resistance in the Greater Mekong Subregion and prevent its global spread.

    Funded by: Medical Research Council: G0600718; Wellcome Trust: 090770/Z/09/Z, 098051

    Nature 2014;505;7481;50-5

  • Lipoprotein(a) Levels, Genotype, and Incident Aortic Valve Stenosis: A Prospective Mendelian Randomization Study and Replication in a Case-Control Cohort.

    Arsenault BJ, Boekholdt SM, Dubé MP, Rhéaume E, Wareham NJ, Khaw KT, Sandhu MS and Tardif JC

    From the Montreal Heart Institute Research Center, Montreal, Quebec, Canada (B.J.A., M.-P.D., É.R., J.-C.T.); Department of Medicine, Faculty of Medicine, University of Montreal, Montreal, Quebec, Canada (B.J.A., M.-P.D., É.R., J.-C.T.); Department of Cardiology, Academic Medical Center, Amsterdam, The Netherlands (S.M.B.); MRC Epidemiology Unit (N.J.W.) and Department of Public Health and Primary Care (K.-T.K., M.S.S.), University of Cambridge, Cambridge, United Kingdom; and Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, United Kingdom (M.S.S.).

    Background: Although a previous study has suggested that a genetic variant in the LPA region was associated with the presence of aortic valve stenosis (AVS), no prospective study has suggested a role for lipoprotein(a) levels in the pathophysiology of AVS. Our objective was to determine whether lipoprotein(a) levels and a common genetic variant that is strongly associated with lipoprotein(a) levels are associated with an increased risk of developing AVS.

    Serum lipoprotein(a) levels were measured in 17 553 participants of the European Prospective Investigation into Cancer (EPIC)-Norfolk study. Among these study participants, 118 developed AVS during a mean follow-up of 11.7 years. The rs10455872 genetic variant in LPA was genotyped in 14 735 study participants, who simultaneously had lipoprotein(a) level measurements, and in a replication study of 379 patients with echocardiography-confirmed AVS and 404 controls. In EPIC-Norfolk, compared with participants in the bottom lipoprotein(a) tertile, those in the top lipoprotein(a) tertile had a higher risk of AVS (hazard ratio, 1.57; 95% confidence interval, 1.02-2.42) after adjusting for age, sex, and smoking. Compared with rs10455872 AA homozygotes, carriers of 1 or 2 G alleles were at increased risk of AVS (hazard ratio, 1.78; 95% confidence interval, 1.11-2.87, versus hazard ratio, 4.83; 95% confidence interval, 1.77-13.20, respectively). In the replication study, the genetic variant rs10455872 also showed a positive association with AVS (odds ratio, 1.57; 95% confidence interval, 1.10-2.26).

    Conclusions: Patients with high lipoprotein(a) levels are at increased risk for AVS. The rs10455872 variant, which is associated with higher lipoprotein(a) levels, is also associated with increased risk of AVS, suggesting that this association may be causal.

    Circulation. Cardiovascular genetics 2014;7;3;304-10
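    The odds ratio and confidence interval reported for the replication study are the kind of quantities that can be derived from a 2x2 table of carrier status against case status. The sketch below is a minimal illustration, assuming an unadjusted odds ratio with a Woolf (log-based) interval; the study's own model may differ, and the cell counts used here are hypothetical, not data from the paper.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper

# Hypothetical counts for illustration only; not data from the study.
ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

    An interval that excludes 1, as the reported 1.10-2.26 does, is what supports describing the association as positive.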

  • Spread of artemisinin resistance in Plasmodium falciparum malaria.

    Ashley EA, Dhorda M, Fairhurst RM, Amaratunga C, Lim P, Suon S, Sreng S, Anderson JM, Mao S, Sam B, Sopha C, Chuor CM, Nguon C, Sovannaroth S, Pukrittayakamee S, Jittamala P, Chotivanich K, Chutasmit K, Suchatsoonthorn C, Runcharoen R, Hien TT, Thuy-Nhien NT, Thanh NV, Phu NH, Htut Y, Han KT, Aye KH, Mokuolu OA, Olaosebikan RR, Folaranmi OO, Mayxay M, Khanthavong M, Hongvanthong B, Newton PN, Onyamboko MA, Fanello CI, Tshefu AK, Mishra N, Valecha N, Phyo AP, Nosten F, Yi P, Tripura R, Borrmann S, Bashraheil M, Peshu J, Faiz MA, Ghose A, Hossain MA, Samad R, Rahman MR, Hasan MM, Islam A, Miotto O, Amato R, MacInnis B, Stalker J, Kwiatkowski DP, Bozdech Z, Jeeyapant A, Cheah PY, Sakulthaew T, Chalk J, Intharabut B, Silamut K, Lee SJ, Vihokhern B, Kunasol C, Imwong M, Tarning J, Taylor WJ, Yeung S, Woodrow CJ, Flegg JA, Das D, Smith J, Venkatesan M, Plowe CV, Stepniewska K, Guerin PJ, Dondorp AM, Day NP, White NJ and Tracking Resistance to Artemisinin Collaboration (TRAC)

    The authors' affiliations are listed in the Appendix.

    Background: Artemisinin resistance in Plasmodium falciparum has emerged in Southeast Asia and now poses a threat to the control and elimination of malaria. Mapping the geographic extent of resistance is essential for planning containment and elimination strategies.

    Methods: Between May 2011 and April 2013, we enrolled 1241 adults and children with acute, uncomplicated falciparum malaria in an open-label trial at 15 sites in 10 countries (7 in Asia and 3 in Africa). Patients received artesunate, administered orally at a daily dose of either 2 mg or 4 mg per kilogram of body weight, for 3 days, followed by a standard 3-day course of artemisinin-based combination therapy. Parasite counts in peripheral-blood samples were measured every 6 hours, and the parasite clearance half-lives were determined.

    Results: The median parasite clearance half-lives ranged from 1.9 hours in the Democratic Republic of Congo to 7.0 hours at the Thailand-Cambodia border. Slowly clearing infections (parasite clearance half-life >5 hours), strongly associated with single point mutations in the "propeller" region of the P. falciparum kelch protein gene on chromosome 13 (kelch13), were detected throughout mainland Southeast Asia from southern Vietnam to central Myanmar. The incidence of pretreatment and post-treatment gametocytemia was higher among patients with slow parasite clearance, suggesting greater potential for transmission. In western Cambodia, where artemisinin-based combination therapies are failing, the 6-day course of antimalarial therapy was associated with a cure rate of 97.7% (95% confidence interval, 90.9 to 99.4) at 42 days.

    Conclusions: Artemisinin resistance in P. falciparum, which is now prevalent across mainland Southeast Asia, is associated with mutations in kelch13. Prolonged courses of artemisinin-based combination therapies are currently efficacious in areas where standard 3-day treatments are failing. (Funded by the U.K. Department of International Development and others; ClinicalTrials.gov number, NCT01350856.)

    Funded by: Wellcome Trust

    The New England journal of medicine 2014;371;5;411-23
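    The abstract above reports parasite clearance half-lives derived from counts taken every 6 hours. As a rough illustration of how such a half-life can be obtained, the sketch below fits a straight line to the natural log of the counts and converts the slope to a half-life. It assumes a purely log-linear decline; it is not the trial's estimation procedure, which may handle lag phases and detection limits, and the counts are synthetic.

```python
import math

def clearance_half_life(hours, counts):
    """Estimate half-life (hours) from a least-squares fit of ln(count) on time."""
    if len(hours) != len(counts) or len(hours) < 2:
        raise ValueError("need at least two paired observations")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")
    logs = [math.log(c) for c in counts]
    n = len(hours)
    mean_t = sum(hours) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, logs))
    slope = sxy / sxx  # negative while parasites are clearing
    if slope >= 0:
        raise ValueError("counts are not declining")
    return math.log(2) / -slope

# Synthetic counts every 6 hours that halve every 3 hours (not trial data).
hours = [0, 6, 12, 18, 24]
counts = [100000 * 0.5 ** (t / 3) for t in hours]
half_life = clearance_half_life(hours, counts)
print(f"estimated half-life: {half_life:.2f} h")
```

    On this scale, the abstract's threshold for slowly clearing infections corresponds to an estimated half-life above 5 hours.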

  • Draft genome sequences of the type strains of Shigella flexneri held at Public Health England: comparison of classical phenotypic and novel molecular assays with whole genome sequence.

    Ashton PM, Baker KS, Gentle A, Wooldridge DJ, Thomson NR, Dallman TJ and Jenkins C

    Gastrointestinal Bacteria Reference Unit, Public Health England, 61 Colindale Ave, NW9 5HT London, England.

    Background: Public Health England (PHE) holds a collection of Shigella flexneri Type strains isolated between 1949 and 1972 representing 15 established serotypes and one provisional type, E1037. In this study, the genomes of all 16 PHE Type strains were sequenced using the Illumina HiSeq platform. The relationship between core genome phylogeny and serotype was examined.

    Results: The most common target gene for the detection of Shigella species in clinical PCR assays, ipaH, was detected in all genomes. The type-specific target genes were correctly identified in each genome sequence. In contrast to the S. flexneri serotype 5 strain described by Sun et al. (2012), the two PHE serotype 5 Type strains possessed an additional oac gene and were differentiated by the presence (serotype 5b) or absence (serotype 5a) of gtrX. The somatic antigen structure and phylogenetic relationship were broadly congruent for strains expressing serotype specific antigens III, IV and V, but not for those expressing I and II. The whole genome phylogenies of the 15 isolates sequenced showed that the serotype 6 Type Strain was phylogenetically distinct from the other S. flexneri serotypes sequenced. The provisional serotype E1037 fell within the serotype 4 clade, being most closely related to the Serotype 4a Type Strain.

    Conclusions: The S. flexneri genome sequences were used to evaluate phylogenetic relationships between Type strains and validate genotypic and phenotypic assays. The analysis confirmed that the PHE S. flexneri Type strains are phenotypically and genotypically distinct. Novel variants will continue to be added to this archive.

    Gut pathogens 2014;6;1;7

  • estMOI: estimating multiplicity of infection using parasite deep sequencing data.

    Assefa SA, Preston MD, Campino S, Ocholla H, Sutherland CJ and Clark TG

    London School of Hygiene and Tropical Medicine, WC1E 7HT, London, UK, Wellcome Trust Sanger Institute, CB10 1SA, Hinxton, UK and Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Box 30096 BT3, Blantyre, Malawi.

    Summary: Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallelizable sequencing technologies means that MOI can be detected genome wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission.

    Availability and implementation: estMOI is freely available.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2014
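    The abstract above describes estimating MOI from unique combinations of polymorphisms observed on sequence reads. The sketch below illustrates that general idea only: it counts distinct allele combinations across a chosen set of SNP positions, using reads that span all of them. It is not the estMOI implementation; the function name, the `min_support` threshold and the toy positions are all assumptions made for illustration.

```python
from collections import Counter

def estimate_moi(reads, positions, min_support=2):
    """Count distinct allele combinations seen across `positions`.

    reads: iterable of dicts mapping SNP position -> observed base.
    Only reads covering every position are used. Combinations seen fewer
    than `min_support` times are treated as likely sequencing error.
    """
    combos = Counter(
        tuple(read[p] for p in positions)
        for read in reads
        if all(p in read for p in positions)
    )
    supported = {combo for combo, n in combos.items() if n >= min_support}
    return len(supported)

# Toy reads over three hypothetical SNP positions.
reads = [
    {101: "A", 140: "C", 175: "G"},
    {101: "A", 140: "C", 175: "G"},
    {101: "T", 140: "C", 175: "A"},
    {101: "T", 140: "C", 175: "A"},
    {101: "T", 140: "G", 175: "A"},  # seen once: below min_support
    {101: "A", 140: "C"},            # does not span all positions
]
moi = estimate_moi(reads, positions=[101, 140, 175])
print(f"estimated MOI at this locus: {moi}")
```

    Requiring a minimum read support trades sensitivity for robustness: a low-frequency strain may be missed, but isolated sequencing errors are less likely to inflate the estimate.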

  • Epistasis between the haptoglobin common variant and α+thalassemia influences risk of severe malaria in Kenyan children.

    Atkinson SH, Uyoga SM, Nyatichi E, Macharia AW, Nyutu G, Ndila C, Kwiatkowski DP, Rockett KA and Williams TN

    Department of Paediatrics, Oxford University Hospitals National Health Service Trust, University of Oxford.

    Haptoglobin (Hp) scavenges free hemoglobin following malaria-induced hemolysis. Few studies have investigated the relationship between the common Hp variants and the risk of severe malaria, and their results are inconclusive. We conducted a case-control study of 996 children with severe Plasmodium falciparum malaria and 1220 community controls and genotyped for Hp, hemoglobin (Hb) S heterozygotes, and α(+)thalassemia. Hb S heterozygotes and α(+)thalassemia homozygotes were protected from severe malaria (odds ratio [OR], 0.12; 95% confidence interval [CI], 0.07-0.18 and OR, 0.69; 95% CI, 0.53-0.91, respectively). The risk of severe malaria also varied by Hp genotype: Hp2-1 was associated with the greatest protection against severe malaria and Hp2-2 with the greatest risk. Meta-analysis of the current and published studies suggests that Hp2-2 is associated with increased risk of severe malaria compared with Hp2-1. We found a significant interaction between Hp genotype and α(+)thalassemia in predicting risk of severe malaria: Hp2-1 in combination with heterozygous or homozygous α(+)thalassemia was associated with protection from severe malaria (OR, 0.73; 95% CI, 0.54-0.99 and OR, 0.48; 95% CI, 0.32-0.73, respectively), but α(+)thalassemia in combination with Hp2-2 was not protective. This epistatic interaction together with varying frequencies of α(+)thalassemia across Africa may explain the inconsistent relationship between Hp genotype and malaria reported in previous studies.

    Blood 2014;123;13;2008-16

  • Transcriptionally active chromatin recruits homologous recombination at DNA double-strand breaks.

    Aymard F, Bugler B, Schmidt CK, Guillou E, Caron P, Briois S, Iacovoni JS, Daburon V, Miller KM, Jackson SP and Legube G

    [1] Laboratoire de Biologie Cellulaire et Moléculaire du Contrôle de la Prolifération, Université de Toulouse, Université Paul Sabatier, Toulouse, France. [2] CNRS, Laboratoire de Biologie Cellulaire et Moléculaire du Contrôle de la Prolifération, Toulouse, France.

    Although both homologous recombination (HR) and nonhomologous end joining can repair DNA double-strand breaks (DSBs), the mechanisms by which one of these pathways is chosen over the other remain unclear. Here we show that transcriptionally active chromatin is preferentially repaired by HR. Using chromatin immunoprecipitation-sequencing (ChIP-seq) to analyze repair of multiple DSBs induced throughout the human genome, we identify an HR-prone subset of DSBs that recruit the HR protein RAD51, undergo resection and rely on RAD51 for efficient repair. These DSBs are located in actively transcribed genes and are targeted to HR repair via the transcription elongation-associated mark trimethylated histone H3 K36. Concordantly, depletion of SETD2, the main H3 K36 trimethyltransferase, severely impedes HR at such DSBs. Our study thereby demonstrates a primary role in DSB repair of the chromatin context in which a break occurs.

    Nature structural & molecular biology 2014

  • Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes.

    Ayub Q, Moutsianas L, Chen Y, Panoutsopoulou K, Colonna V, Pagani L, Prokopenko I, Ritchie GR, Tyler-Smith C, McCarthy MI, Zeggini E and Xue Y

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.

    We have investigated the evidence for positive selection in samples of African, European, and East Asian ancestry at 65 loci associated with susceptibility to type 2 diabetes (T2D) previously identified through genome-wide association studies. Selection early in human evolutionary history is predicted to lead to ancestral risk alleles shared between populations, whereas late selection would result in population-specific signals at derived risk alleles. By using a wide variety of tests based on the site frequency spectrum, haplotype structure, and population differentiation, we found no global signal of enrichment for positive selection when we considered all T2D risk loci collectively. However, in a locus-by-locus analysis, we found nominal evidence for positive selection at 14 of the loci. Selection favored the protective and risk alleles in similar proportions, rather than the risk alleles specifically as predicted by the thrifty gene hypothesis, and may not be related to influence on diabetes. Overall, we conclude that past positive selection has not been a powerful influence driving the prevalence of T2D risk alleles.

    Funded by: Wellcome Trust: 090532, 098051, 098381, WT090367MA

    American journal of human genetics 2014;94;2;176-85

  • Novel mutations in penicillin-binding protein genes in clinical Staphylococcus aureus isolates that are methicillin resistant on susceptibility testing, but lack the mec gene.

    Ba X, Harrison EM, Edwards GF, Holden MT, Larsen AR, Petersen A, Skov RL, Peacock SJ, Parkhill J, Paterson GK and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Objectives: Methicillin-resistant Staphylococcus aureus (MRSA) is an important global health problem. MRSA resistance to β-lactam antibiotics is mediated by the mecA or mecC genes, which encode an alternative penicillin-binding protein (PBP) 2a that has a low affinity to β-lactam antibiotics. Detection of mec genes or PBP2a is regarded as the gold standard for the diagnosis of MRSA. We identified four MRSA isolates that lacked mecA or mecC genes, but were still phenotypically resistant to penicillinase-resistant β-lactam antibiotics.

    Methods: The four human S. aureus isolates were investigated by whole genome sequencing and a range of phenotypic assays.

    Results: We identified a number of amino acid substitutions present in the endogenous PBPs 1, 2 and 3 that were found in the resistant isolates but were absent in closely related susceptible isolates and which may be the basis of resistance. Of particular interest are three identical amino acid substitutions in PBPs 1, 2 and 3, occurring independently in isolates from at least two separate multilocus sequence types. Two different non-conservative substitutions were also present in the same amino acid of PBP1 in two isolates from two different sequence types.

    Conclusions: This work suggests that phenotypically resistant MRSA could be misdiagnosed using molecular methods alone and provides evidence of alternative mechanisms for β-lactam resistance in MRSA that may need to be considered by diagnostic laboratories.

    Funded by: Medical Research Council: G1001787

    The Journal of antimicrobial chemotherapy 2014;69;3;594-7

  • TB or not TB? Genomic portraits provide answers.

    Baker KS and Ellington MJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014

  • Poxviruses in Bats … so What?

    Baker KS and Murcia PR

    Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Poxviruses are important pathogens of man and numerous domestic and wild animal species. Cross species (including zoonotic) poxvirus infections can have drastic consequences for the recipient host. Bats are a diverse order of mammals known to carry lethal viral zoonoses such as Rabies, Hendra, Nipah, and SARS. Consequent targeted research is revealing bats to be infected with a rich diversity of novel viruses. Poxviruses were recently identified in bats and the settings in which they were found were dramatically different. Here, we review the natural history of poxviruses in bats and highlight the relationship of the viruses to each other and their context in the Poxviridae family. In addition to considering the zoonotic potential of these viruses, we reflect on the broader implications of these findings. Specifically, the potential to explore and exploit this newfound relationship to study coevolution and cross species transmission together with fundamental aspects of poxvirus host tropism as well as bat virology and immunology.

    Viruses 2014;6;4;1564-77

  • Gene conversion violates the stepwise mutation model for microsatellites in Y-chromosomal palindromic repeats.

    Balaresque P, King TE, Parkin EJ, Heyer E, Carvalho-Silva D, Kraaijenbrink T, de Knijff P, Tyler-Smith C and Jobling MA

    UMR5288 CNRS/UPS-AMIS-Université Paul Sabatier, Toulouse, France; Department of Genetics, University of Leicester, Leicester, UK.

    The male-specific region of the human Y chromosome (MSY) contains eight large inverted repeats (palindromes), in which high-sequence similarity between repeat arms is maintained by gene conversion. These palindromes also harbor microsatellites, considered to evolve via a stepwise mutation model (SMM). Here, we ask whether gene conversion between palindrome microsatellites contributes to their mutational dynamics. First, we study the duplicated tetranucleotide microsatellite DYS385a,b lying in palindrome P4. We show, by comparing observed data with simulated data under a SMM within haplogroups, that observed heteroallelic combinations in which the modal repeat number difference between copies was large, can give rise to homoallelic combinations with zero-repeats difference, equivalent to many single-step mutations. These are unlikely to be generated under a strict SMM, suggesting the action of gene conversion. Second, we show that the intercopy repeat number difference for a large set of duplicated microsatellites in all palindromes in the MSY reference sequence is significantly reduced compared with that for nonpalindrome-duplicated microsatellites, suggesting that the former are characterized by unusual evolutionary dynamics. These observations indicate that gene conversion violates the SMM for microsatellites in palindromes, homogenizing copies within individual Y chromosomes, but increasing overall haplotype diversity among chromosomes within related groups.

    Funded by: Wellcome Trust: 087576

    Human mutation 2014;35;5;609-17
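
    The contrast described in the abstract above can be illustrated with a toy calculation. This is a minimal sketch and not the authors' simulation: the function names and the repeat counts are hypothetical, and it assumes a strict SMM in which every mutation changes one copy by exactly one repeat. It shows why a large difference between copies collapsing to zero is cheap under gene conversion (one event) but costly under the SMM (many single steps).

```python
def smm_steps_to_homoallelic(copy_a: int, copy_b: int) -> int:
    """Minimum number of single-step (+1/-1) mutations a strict SMM needs
    to bring two duplicated microsatellite copies to the same repeat number."""
    return abs(copy_a - copy_b)


def gene_conversion(copy_a: int, copy_b: int, donor: str = "a") -> tuple[int, int]:
    """One conversion event: the donor copy overwrites the other copy."""
    if donor not in ("a", "b"):
        raise ValueError("donor must be 'a' or 'b'")
    value = copy_a if donor == "a" else copy_b
    return (value, value)


# Hypothetical heteroallelic pair with 11 and 17 repeats (illustrative values only).
pair = (11, 17)
print(smm_steps_to_homoallelic(*pair))    # 6 single-step mutations
print(gene_conversion(*pair, donor="a"))  # (11, 11) after a single event
```

    Modelling conversion as a full overwrite of one copy is a simplification chosen for clarity.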

  • Toward male individualization with rapidly mutating Y-chromosomal short tandem repeats.

    Ballantyne KN, Ralf A, Aboukhalid R, Achakzai NM, Anjos MJ, Ayub Q, Balažic J, Ballantyne J, Ballard DJ, Berger B, Bobillo C, Bouabdellah M, Burri H, Capal T, Caratti S, Cárdenas J, Cartault F, Carvalho EF, Carvalho M, Cheng B, Coble MD, Comas D, Corach D, D'Amato ME, Davison S, de Knijff P, De Ungria MC, Decorte R, Dobosz T, Dupuy BM, Elmrghni S, Gliwiński M, Gomes SC, Grol L, Haas C, Hanson E, Henke J, Henke L, Herrera-Rodríguez F, Hill CR, Holmlund G, Honda K, Immel UD, Inokuchi S, Jobling MA, Kaddura M, Kim JS, Kim SH, Kim W, King TE, Klausriegler E, Kling D, Kovačević L, Kovatsi L, Krajewski P, Kravchenko S, Larmuseau MH, Lee EY, Lessig R, Livshits LA, Marjanović D, Minarik M, Mizuno N, Moreira H, Morling N, Mukherjee M, Munier P, Nagaraju J, Neuhuber F, Nie S, Nilasitsataporn P, Nishi T, Oh HH, Olofsson J, Onofri V, Palo JU, Pamjav H, Parson W, Petlach M, Phillips C, Ploski R, Prasad SP, Primorac D, Purnomo GA, Purps J, Rangel-Villalobos H, Rębała K, Rerkamnuaychoke B, Gonzalez DR, Robino C, Roewer L, Rosa A, Sajantila A, Sala A, Salvador JM, Sanz P, Schmitt C, Sharma AK, Silva DA, Shin KJ, Sijen T, Sirker M, Siváková D, Skaro V, Solano-Matamoros C, Souto L, Stenzl V, Sudoyo H, Syndercombe-Court D, Tagliabracci A, Taylor D, Tillmar A, Tsybovsky IS, Tyler-Smith C, van der Gaag KJ, Vanek D, Völgyi A, Ward D, Willemse P, Yap EP, Yong RY, Pajnič IZ and Kayser M

    Department of Forensic Molecular Biology, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands; Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department, Macleod, Victoria, Australia.

    Relevant for various areas of human genetics, Y-chromosomal short tandem repeats (Y-STRs) are commonly used for testing close paternal relationships among individuals and populations, and for male lineage identification. However, even the widely used 17-loci Yfiler set cannot resolve individuals and populations completely. Here, 52 centers generated quality-controlled data of 13 rapidly mutating (RM) Y-STRs in 14,644 related and unrelated males from 111 worldwide populations. Strikingly, >99% of the 12,272 unrelated males were completely individualized. Haplotype diversity was extremely high (global: 0.9999985, regional: 0.99836-0.9999988). Haplotype sharing between populations was almost absent except for six (0.05%) of the 12,156 haplotypes. Haplotype sharing within populations was generally rare (0.8% nonunique haplotypes), significantly lower in urban (0.9%) than rural (2.1%) and highest in endogamous groups (14.3%). Analysis of molecular variance revealed 99.98% of variation within populations, 0.018% among populations within groups, and 0.002% among groups. Of the 2,372 newly and 156 previously typed male relative pairs, 29% were differentiated including 27% of the 2,378 father-son pairs. Relative to Yfiler, haplotype diversity was increased in 86% of the populations tested and overall male relative differentiation was raised by 23.5%. Our study demonstrates the value of RM Y-STRs in identifying and separating unrelated and related males and provides a reference database.

    Human mutation 2014;35;8;1021-32
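
    The haplotype diversity values quoted in the abstract above are typically computed with Nei's unbiased estimator, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sampled males. The abstract does not state which estimator was used, so the following is a sketch under that assumption; the function name and the example haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotypes.

    Each haplotype must be hashable, e.g. a tuple of repeat counts per locus.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Hypothetical three-locus haplotypes; the last two males share a haplotype.
sample = [(12, 30, 17), (13, 29, 17), (12, 31, 18), (11, 30, 16), (11, 30, 16)]
print(round(haplotype_diversity(sample), 3))
```

    A value of 1 means every sampled male carries a unique haplotype, which is why values such as 0.9999985 indicate near-complete individualization.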

  • Mutations in KPTN cause macrocephaly, neurodevelopmental delay, and seizures.

    Baple EL, Maroofian R, Chioza BA, Izadi M, Cross HE, Al-Turki S, Barwick K, Skrzypiec A, Pawlak R, Wagner K, Coblentz R, Zainy T, Patton MA, Mansour S, Rich P, Qualmann B, Hurles ME, Kessels MM and Crosby AH

    Monogenic Molecular Genetics, University of Exeter Medical School, St. Luke's Campus, Magdalen Road, Exeter EX1 2LU, UK.

    The proper development of neuronal circuits during neuromorphogenesis and neuronal-network formation is critically dependent on a coordinated and intricate series of molecular and cellular cues and responses. Although the cortical actin cytoskeleton is known to play a key role in neuromorphogenesis, relatively little is known about the specific molecules important for this process. Using linkage analysis and whole-exome sequencing on samples from families from the Amish community of Ohio, we have demonstrated that mutations in KPTN, encoding kaptin, cause a syndrome typified by macrocephaly, neurodevelopmental delay, and seizures. Our immunofluorescence analyses in primary neuronal cell cultures showed that endogenous and GFP-tagged kaptin associates with dynamic actin cytoskeletal structures and that this association is lost upon introduction of the identified mutations. Taken together, our studies have identified kaptin alterations responsible for macrocephaly and neurodevelopmental delay and define kaptin as a molecule crucial for normal human neuromorphogenesis.

    Funded by: Medical Research Council: G1001931, G1002279; Wellcome Trust: WT098051

    American journal of human genetics 2014;94;1;87-94

  • Transposon mutagenesis identifies genes driving hepatocellular carcinoma in a chronic hepatitis B mouse model.

    Bard-Chapeau EA, Nguyen AT, Rust AG, Sayadi A, Lee P, Chua BQ, New LS, de Jong J, Ward JM, Chin CK, Chew V, Toh HC, Abastado JP, Benoukraf T, Soong R, Bard FA, Dupuy AJ, Johnson RL, Radda GK, Chan EC, Wessels LF, Adams DJ, Jenkins NA and Copeland NG

    Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Biopolis, Singapore.

    The most common risk factor for developing hepatocellular carcinoma (HCC) is chronic infection with hepatitis B virus (HBV). To better understand the evolutionary forces driving HCC, we performed a near-saturating transposon mutagenesis screen in a mouse HBV model of HCC. This screen identified 21 candidate early stage drivers and a very large number (2,860) of candidate later stage drivers that were enriched for genes that are mutated, deregulated or functioning in signaling pathways important for human HCC, with a striking 1,199 genes being linked to cellular metabolic processes. Our study provides a comprehensive overview of the genetic landscape of HCC.

    Funded by: Cancer Research UK: 13031; Wellcome Trust

    Nature genetics 2014;46;1;24-32

  • A genome wide association study of mathematical ability reveals an association at chromosome 3q29, a locus associated with autism and learning difficulties: a preliminary study.

    Baron-Cohen S, Murphy L, Chakrabarti B, Craig I, Mallya U, Lakatošová S, Rehnstrom K, Peltonen L, Wheelwright S, Allison C, Fisher SE and Warrier V

    Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridgeshire, United Kingdom; CLASS Clinic, Cambridgeshire and Peterborough NHS Foundation Trust (CPFT), Cambridgeshire, United Kingdom.

    Mathematical ability is heritable, but few studies have directly investigated its molecular genetic basis. Here we aimed to identify specific genetic contributions to variation in mathematical ability. We carried out a genome wide association scan using pooled DNA in two groups of U.K. samples, based on end of secondary/high school national academic exam achievement: high (n = 419) versus low (n = 183) mathematical ability while controlling for their verbal ability. Significant differences in allele frequencies between these groups were searched for in 906,600 SNPs using the Affymetrix GeneChip Human Mapping version 6.0 array. After meeting a threshold of p < 1.5 × 10⁻⁵, 12 SNPs from the pooled association analysis were individually genotyped in 542 of the participants and analyzed to validate the initial associations (lowest p-value 1.14 × 10⁻⁶). In this analysis, one of the SNPs (rs789859) showed significant association after Bonferroni correction, and four (rs10873824, rs4144887, rs12130910, rs2809115) were nominally significant (lowest p-value 3.278 × 10⁻⁴). Three of the SNPs of interest are located within, or near to, known genes (FAM43A, SFT2D1, C14orf64). The SNP that showed the strongest association, rs789859, is located in a region on chromosome 3q29 that has been previously linked to learning difficulties and autism. rs789859 lies 1.3 kbp downstream of LSG1, and 700 bp upstream of FAM43A, mapping within the potential promoter/regulatory region of the latter. To our knowledge, this is only the second study to investigate the association of genetic variants with mathematical ability, and it highlights a number of interesting markers for future study.

    PloS one 2014;9;5;e96374
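
    The Bonferroni correction mentioned in the abstract above divides the family-wise significance level by the number of tests. The sketch below assumes a family-wise level of 0.05 and treats the 12 individually genotyped SNPs as the test family; neither choice is stated in the abstract, and the SNP labels and p-values are hypothetical placeholders rather than the study's results.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 across 12 individually genotyped SNPs.
threshold = bonferroni_threshold(0.05, 12)

# Hypothetical p-values, for illustration only.
p_values = {"snp_a": 0.0003, "snp_b": 0.02, "snp_c": 0.2}

for snp, p in p_values.items():
    if p < threshold:
        label = "significant after correction"
    elif p < 0.05:
        label = "nominally significant"
    else:
        label = "not significant"
    print(f"{snp}: p = {p} -> {label}")
```

    Under these assumptions the per-test threshold is roughly 0.0042, so a SNP can be nominally significant (p < 0.05) without surviving the correction.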

  • Global population structure and evolution of Bordetella pertussis and their relationship with vaccination.

    Bart MJ, Harris SR, Advani A, Arakawa Y, Bottero D, Bouchez V, Cassiday PK, Chiang CS, Dalby T, Fry NK, Gaillard ME, van Gent M, Guiso N, Hallander HO, Harvill ET, He Q, van der Heide HG, Heuvelman K, Hozbor DF, Kamachi K, Karataev GI, Lan R, Lutyłska A, Maharjan RP, Mertsola J, Miyamura T, Octavia S, Preston A, Quail MA, Sintchenko V, Stefanelli P, Tondella ML, Tsang RS, Xu Y, Yao SM, Zhang S, Parkhill J and Mooi FR

    Bordetella pertussis causes pertussis, a respiratory disease that is most severe for infants. Vaccination was introduced in the 1950s, and in recent years, a resurgence of disease was observed worldwide, with significant mortality in infants. Possible causes for this include the switch from whole-cell vaccines (WCVs) to less effective acellular vaccines (ACVs), waning immunity, and pathogen adaptation. Pathogen adaptation is suggested by antigenic divergence between vaccine strains and circulating strains and by the emergence of strains with increased pertussis toxin production. We applied comparative genomics to a worldwide collection of 343 B. pertussis strains isolated between 1920 and 2010. The global phylogeny showed two deep branches; the largest of these contained 98% of all strains, and its expansion correlated temporally with the first descriptions of pertussis outbreaks in Europe in the 16th century. We found little evidence of recent geographical clustering of the strains within this lineage, suggesting rapid strain flow between countries. We observed that changes in genes encoding proteins implicated in protective immunity that are included in ACVs occurred after the introduction of WCVs but before the switch to ACVs. Furthermore, our analyses consistently suggested that virulence-associated genes and genes coding for surface-exposed proteins were involved in adaptation. However, many of the putative adaptive loci identified have a physiological role, and further studies of these loci may reveal less obvious ways in which B. pertussis and the host interact. This work provides insight into ways in which pathogens may adapt to vaccination and suggests ways to improve pertussis vaccines.

    Importance: Whooping cough is mainly caused by Bordetella pertussis, and current vaccines are targeted against this organism. Recently, there have been increasing outbreaks of whooping cough, even where vaccine coverage is high. Analysis of the genomes of 343 B. pertussis isolates from around the world over the last 100 years suggests that the organism has emerged within the last 500 years, consistent with historical records. We show that global transmission of new strains is very rapid and that the worldwide population of B. pertussis is evolving in response to vaccine introduction, potentially enabling vaccine escape.

    Funded by: Wellcome Trust: 098051

    mBio 2014;5;2;e01074

  • Considerations when investigating lncRNA function in vivo.

    Bassett AR, Akhtar A, Barlow DP, Bird AP, Brockdorff N, Duboule D, Ephrussi A, Ferguson-Smith AC, Gingeras TR, Haerty W, Higgs DR, Miska EA and Ponting CP

    Andrew R Bassett is in the MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom.

    Although a small number of the vast array of animal long non-coding RNAs (lncRNAs) have known effects on cellular processes examined in vitro, the extent of their contributions to normal cell processes throughout development, differentiation and disease for the most part remains less clear. Phenotypes arising from deletion of an entire genomic locus cannot be unequivocally attributed either to the loss of the lncRNA per se or to the associated loss of other overlapping DNA regulatory elements. The distinction between cis- or trans-effects is also often problematic. We discuss the advantages and challenges associated with the current techniques for studying the in vivo function of lncRNAs in the light of different models of lncRNA molecular mechanism, and reflect on the design of experiments to mutate lncRNA loci. These considerations should assist in the further investigation of these transcriptional products of the genome.

    eLife 2014;3;e03058

  • Efficacy of a Plasmodium vivax Malaria Vaccine Using ChAd63 and Modified Vaccinia Ankara Expressing Thrombospondin-Related Anonymous Protein as Assessed with Transgenic Plasmodium berghei Parasites.

    Bauza K, Malinauskas T, Pfander C, Anar B, Jones EY, Billker O, Hill AV and Reyes-Sandoval A

    The Jenner Institute, University of Oxford, Oxford, United Kingdom.

    Plasmodium vivax is the world's most widely distributed malaria parasite and a potential cause of morbidity and mortality for approximately 2.85 billion people living mainly in Southeast Asia and Latin America. Despite this dramatic burden, very few vaccines have been assessed in humans. The clinically relevant vectors modified vaccinia virus Ankara (MVA) and the chimpanzee adenovirus ChAd63 are promising delivery systems for malaria vaccines due to their safety profiles and proven ability to induce protective immune responses against Plasmodium falciparum thrombospondin-related anonymous protein (TRAP) in clinical trials. Here, we describe the development of new recombinant ChAd63 and MVA vectors expressing P. vivax TRAP (PvTRAP) and show their ability to induce high antibody titers and T cell responses in mice. In addition, we report a novel way of assessing the efficacy of new candidate vaccines against P. vivax using a fully infectious transgenic Plasmodium berghei parasite expressing P. vivax TRAP to allow studies of vaccine efficacy and protective mechanisms in rodents. Using this model, we found that both CD8(+) T cells and antibodies mediated protection against malaria using virus-vectored vaccines. Our data indicate that ChAd63 and MVA expressing PvTRAP are good preerythrocytic-stage vaccine candidates with potential for future clinical application.

    Infection and immunity 2014;82;3;1277-86

  • Genome sequencing of normal cells reveals developmental lineages and mutational processes.

    Behjati S, Huch M, van Boxtel R, Karthaus W, Wedge DC, Tamuri AU, Martincorena I, Petljak M, Alexandrov LB, Gundem G, Tarpey PS, Roerink S, Blokker J, Maddison M, Mudie L, Robinson B, Nik-Zainal S, Campbell P, Goldman N, van de Wetering M, Cuppen E, Clevers H and Stratton MR

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK [2] Department of Paediatrics, University of Cambridge, Hills Road, Cambridge CB2 2XY, UK.

    The somatic mutations present in the genome of a cell accumulate over the lifetime of a multicellular organism. These mutations can provide insights into the developmental lineage tree, the number of divisions that each cell has undergone and the mutational processes that have been operative. Here we describe whole genomes of clonal lines derived from multiple tissues of healthy mice. Using somatic base substitutions, we reconstructed the early cell divisions of each animal, demonstrating the contributions of embryonic cells to adult tissues. Differences were observed between tissues in the numbers and types of mutations accumulated by each cell, which likely reflect differences in the number of cell divisions they have undergone and varying contributions of different mutational processes. If somatic mutation rates in humans are similar to those in mice, the results indicate that precise insights into development and mutagenesis of normal human cells will be possible.

    Nature 2014

  • A Pathogenic Mosaic TP53 Mutation in Two Germ Layers Detected by Next Generation Sequencing.

    Behjati S, Maschietto M, Williams RD, Side L, Hubank M, West R, Pearson K, Sebire N, Tarpey P, Futreal A, Brooks T, Stratton MR and Anderson J

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom; Department of Paediatrics, University of Cambridge, Cambridge, United Kingdom.

    Background: Li-Fraumeni syndrome is caused by germline TP53 mutations and is clinically characterized by a predisposition to a range of cancers, most commonly sarcoma, brain tumours and leukemia. Pathogenic mosaic TP53 mutations have only rarely been described.

    We describe a 2-year-old child presenting with three separate cancers over a 6-month period: two soft tissue mesenchymal tumors and an aggressive metastatic neuroblastoma. As conventional testing of blood DNA by Sanger sequencing for mutations in TP53, ALK, and SDH was negative, whole exome sequencing of the blood DNA of the patient and both parents was performed to screen more widely for cancer predisposing mutations. In the patient's but not the parents' DNA we found a c.743 G>A, p.Arg248Gln (CCDS11118.1) TP53 mutation in 3-20% of sequencing reads, a level that would not generally be detectable by Sanger sequencing. Homozygosity for this mutation was detected in all tumor samples analyzed, and germline mosaicism was demonstrated by analysis of the child's newborn blood spot DNA. The occurrence of separate tumors derived from different germ layers suggests that this de novo mutation occurred early in embryogenesis, prior to gastrulation.

    Conclusion: The case demonstrates pathogenic mosaicism, detected by next generation deep sequencing, that arose in the early stages of embryogenesis.

    PloS one 2014;9;5;e96531

  • Recurrent PTPRB and PLCG1 mutations in angiosarcoma.

    Behjati S, Tarpey PS, Sheldon H, Martincorena I, Van Loo P, Gundem G, Wedge DC, Ramakrishna M, Cooke SL, Pillay N, Vollan HK, Papaemmanuil E, Koss H, Bunney TD, Hardy C, Joseph OR, Martin S, Mudie L, Butler A, Teague JW, Patil M, Steers G, Cao Y, Gumbs C, Ingram D, Lazar AJ, Little L, Mahadeshwar H, Protopopov A, Al Sannaa GA, Seth S, Song X, Tang J, Zhang J, Ravi V, Torres KE, Khatri B, Halai D, Roxanis I, Baumhoer D, Tirabosco R, Amary MF, Boshoff C, McDermott U, Katan M, Stratton MR, Futreal PA, Flanagan AM, Harris A and Campbell PJ

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. [2] Department of Paediatrics, University of Cambridge, Cambridge, UK. [3].

    Angiosarcoma is an aggressive malignancy that arises spontaneously or secondarily to ionizing radiation or chronic lymphoedema. Previous work has identified aberrant angiogenesis, including occasional somatic mutations in angiogenesis signaling genes, as a key driver of angiosarcoma. Here we employed whole-genome, whole-exome and targeted sequencing to study the somatic changes underpinning primary and secondary angiosarcoma. We identified recurrent mutations in two genes, PTPRB and PLCG1, which are intimately linked to angiogenesis. The endothelial phosphatase PTPRB, a negative regulator of vascular growth factor tyrosine kinases, harbored predominantly truncating mutations in 10 of 39 tumors (26%). PLCG1, a signal transducer of tyrosine kinases, encoded a recurrent, likely activating p.Arg707Gln missense variant in 3 of 34 cases (9%). Overall, 15 of 39 tumors (38%) harbored at least one driver mutation in angiogenesis signaling genes. Our findings inform and reinforce current therapeutic efforts to target angiogenesis signaling in angiosarcoma.

    Funded by: NCI NIH HHS: K08 CA160443; Wellcome Trust: 077012/Z/05/Z

    Nature genetics 2014;46;4;376-9

  • Keeping 53BP1 out of focus in mitosis.

    Belotserkovskaya R and Jackson SP

    The Wellcome Trust and Cancer Research UK Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QN, England, UK.

    A recent study published in Science reveals the mechanism and biological importance of DNA damage response abrogation in mitotic cells.

    Cell research 2014

  • Split reality for novel tick virus.

    Bennett HM

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014

  • A High-Definition View of Functional Genetic Variation from Natural Yeast Genomes.

    Bergström A, Simpson JT, Salinas F, Barré B, Parts L, Zia A, Nguyen Ba AN, Moses AM, Louis EJ, Mustonen V, Warringer J, Durbin R and Liti G

    Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France.

    The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.

    Molecular biology and evolution 2014

  • Izumo meets Juno: Preventing polyspermy in fertilization.

    Bianchi E and Wright GJ

    Cell Surface Signalling Laboratory; Wellcome Trust Sanger Institute; Cambridge, UK.

    Cell cycle (Georgetown, Tex.) 2014;13;13

  • Juno is the egg Izumo receptor and is essential for mammalian fertilization.

    Bianchi E, Doe B, Goulding D and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    Fertilization occurs when sperm and egg recognize each other and fuse to form a new, genetically distinct organism. The molecular basis of sperm-egg recognition is unknown, but is likely to require interactions between receptor proteins displayed on their surface. Izumo1 is an essential sperm cell-surface protein, but its receptor on the egg has not been described. Here we identify folate receptor 4 (Folr4) as the receptor for Izumo1 on the mouse egg, and propose to rename it Juno. We show that the Izumo1-Juno interaction is conserved within several mammalian species, including humans. Female mice lacking Juno are infertile and Juno-deficient eggs do not fuse with normal sperm. Rapid shedding of Juno from the oolemma after fertilization suggests a mechanism for the membrane block to polyspermy, ensuring eggs normally fuse with just a single sperm. Our discovery of an essential receptor pair at the nexus of conception provides opportunities for the rational development of new fertility treatments and contraceptives.

    Funded by: Wellcome Trust: 098051

    Nature 2014;508;7497;483-7

  • A loss of function screen of identified genome-wide association study loci reveals new genes controlling hematopoiesis.

    Bielczyk-Maczyńska E, Serbanovic-Canic J, Ferreira L, Soranzo N, Stemple DL, Ouwehand WH and Cvejic A

    Department of Haematology, University of Cambridge, Cambridge, United Kingdom; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; NHS Blood and Transplant, Cambridge, United Kingdom.

    The formation of mature cells by blood stem cells is very well understood at the cellular level and we know many of the key transcription factors that control fate decisions. However, many upstream signalling and downstream effector processes are only partially understood. Genome wide association studies (GWAS) have been particularly useful in providing new directions to dissect these pathways. A GWAS meta-analysis identified 68 genetic loci controlling platelet size and number. Only a quarter of those genes, however, are known regulators of hematopoiesis. To determine function of the remaining genes we performed a medium-throughput genetic screen in zebrafish using antisense morpholino oligonucleotides (MOs) to knock down protein expression, followed by histological analysis of selected genes using a wide panel of different hematopoietic markers. The information generated by the initial knockdown was used to profile phenotypes and to position candidate genes hierarchically in hematopoiesis. Further analysis of brd3a revealed its essential role in differentiation but not maintenance and survival of thrombocytes. Using the from-GWAS-to-function strategy we have not only identified a series of genes that represent novel regulators of thrombopoiesis and hematopoiesis, but this work also represents, to our knowledge, the first example of a functional genetic screening strategy that is a critical step toward obtaining biologically relevant functional data from GWA study for blood cell traits.

    PLoS genetics 2014;10;7;e1004450

  • Heterogeneity of genomic evolution and mutational profiles in multiple myeloma.

    Bolli N, Avet-Loiseau H, Wedge DC, Van Loo P, Alexandrov LB, Martincorena I, Dawson KJ, Iorio F, Nik-Zainal S, Bignell GR, Hinton JW, Li Y, Tubio JM, McLaren S, O'Meara S, Butler AP, Teague JW, Mudie L, Anderson E, Rashid N, Tai YT, Shammas MA, Sperling AS, Fulciniti M, Richardson PG, Parmigiani G, Magrangeas F, Minvielle S, Moreau P, Attal M, Facon T, Futreal PA, Anderson KC, Campbell PJ and Munshi NC

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK [2] Department of Haematology, University of Cambridge, CIMR, Cambridge CB2 0XY, UK.

    Multiple myeloma is an incurable plasma cell malignancy with a complex and incompletely understood molecular pathogenesis. Here we use whole-exome sequencing, copy-number profiling and cytogenetics to analyse 84 myeloma samples. Most cases have a complex subclonal structure and show clusters of subclonal variants, including subclonal driver mutations. Serial sampling reveals diverse patterns of clonal evolution, including linear evolution, differential clonal response and branching evolution. Diverse processes contribute to the mutational repertoire, including kataegis and somatic hypermutation, and their relative contribution changes over time. We find heterogeneity of mutational spectrum across samples, with few recurrent genes. We identify new candidate genes, including truncations of SP140, LTB, ROBO1 and clustered missense mutations in EGR1. The myeloma genome is heterogeneous across the cohort, and exhibits diversity in clonal admixture and in dynamics of evolution, which may impact prognostic stratification, therapeutic approaches and assessment of disease response to treatment.

    Nature communications 2014;5;2997

  • The Scramble Conversion Tool.

    Bonfield JK

    Wellcome Trust Sanger Institute

    Motivation: The reference CRAM file format implementation is in Java. We present "Scramble": a new C implementation of SAM, BAM and CRAM file I/O.

    Results: The C API for CRAM is 1.5-1.7x slower than BAM at decoding, but 1.8-2.6x faster at encoding. We see file size savings of 34-55%.

    Availability: Source code is available under the BSD software licence.

    Bioinformatics (Oxford, England) 2014

  • Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis.

    Bos KI, Harkins KM, Herbig A, Coscolla M, Weber N, Comas I, Forrest SA, Bryant JM, Harris SR, Schuenemann VJ, Campbell TJ, Majander K, Wilbur AK, Guichon RA, Wolfe Steadman DL, Cook DC, Niemann S, Behr MA, Zumarraga M, Bastida R, Huson D, Nieselt K, Young D, Parkhill J, Buikstra JE, Gagneux S, Stone AC and Krause J

    [1] Department of Archaeological Sciences, University of Tübingen, Ruemelinstraße 23, 72070 Tübingen, Germany [2].

    Modern strains of Mycobacterium tuberculosis from the Americas are closely related to those from Europe, supporting the assumption that human tuberculosis was introduced post-contact. This notion, however, is incompatible with archaeological evidence of pre-contact tuberculosis in the New World. Comparative genomics of modern isolates suggests that M. tuberculosis attained its worldwide distribution following human dispersals out of Africa during the Pleistocene epoch, although this has yet to be confirmed with ancient calibration points. Here we present three 1,000-year-old mycobacterial genomes from Peruvian human skeletons, revealing that a member of the M. tuberculosis complex caused human disease before contact. The ancient strains are distinct from known human-adapted forms and are most closely related to those adapted to seals and sea lions. Two independent dating approaches suggest a most recent common ancestor for the M. tuberculosis complex less than 6,000 years ago, which supports a Holocene dispersal of the disease. Our results implicate sea mammals as having played a role in transmitting the disease to humans across the ocean.

    Nature 2014

  • Multiplex PCR Assay for Unequivocal Differentiation of Actinobacillus pleuropneumoniae Serovars 1 to 3, 5 to 8, 10, and 12.

    Bossé JT, Li Y, Angen O, Weinert LA, Chaudhuri RR, Holden MT, Williamson SM, Maskell DJ, Tucker AW, Wren BW, Rycroft AN, Langford PR and BRaDP1T consortium

    Section of Paediatrics, Department of Medicine, Imperial College London, St. Mary's Campus, London, United Kingdom.

    An improved multiplex PCR, using redesigned primers targeting the serovar 3 capsule locus, which differentiates serovars 3, 6, and 8 Actinobacillus pleuropneumoniae isolates, is described. The new primers eliminate an aberrant serovar 3-indicative amplicon found in some serovar 6 clinical isolates. Furthermore, we have developed a new multiplex PCR for the detection of serovars 1 to 3, 5 to 8, 10, and 12 along with apxIV, thus extending the utility of this diagnostic PCR to cover a broader range of isolates.

    Journal of clinical microbiology 2014;52;7;2380-5

  • Human immunodeficiency virus Tat associates with a specific set of cellular RNAs.

    Bouwman RD, Palser A, Parry CM, Coulter E, Rasaiyaah J, Kellam P and Jenner RG

    MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, London WC1E 6BT, UK.

    Background: Human Immunodeficiency Virus 1 (HIV-1) exhibits a wide range of interactions with the host cell but whether viral proteins interact with cellular RNA is not clear. A candidate interacting factor is the trans-activator of transcription (Tat) protein. Tat is required for expression of virus genes but activates transcription through an unusual mechanism: binding to an RNA stem-loop, the transactivation response element (TAR), with the host elongation factor P-TEFb. HIV-1 Tat has also been shown to alter the expression of host genes during infection, contributing to viral pathogenesis, but whether Tat also interacts with cellular RNAs is unknown.

    Results: Using RNA immunoprecipitation coupled with microarray analysis, we have discovered that HIV-1 Tat is associated with a specific set of human mRNAs in T cells. mRNAs bound by Tat share a stem-loop structural element and encode proteins with common biological roles. In contrast, we do not find evidence that Tat associates with microRNAs or the RNA-induced silencing complex (RISC). The interaction of Tat with cellular RNA requires an intact RNA binding domain and Tat RNA binding is linked to an increase in RNA abundance in cell lines and during infection of primary CD4+ T cells by HIV.

    Conclusions: We conclude that Tat interacts with a specific set of human mRNAs in T cells, many of which show changes in abundance in response to Tat and HIV infection. This work uncovers a previously unrecognised interaction between HIV and its host that may contribute to viral alteration of the host cellular environment.

    Funded by: Medical Research Council: G0802068; Wellcome Trust

    Retrovirology 2014;11;53

  • Expression of a single-chain human leukocyte antigen-DRA/DRB3*01:01 molecule and differential binding of a monoclonal antibody in the presence of specifically bound human platelet antigen-1a peptide.

    Bouwmans EE, Smethurst PA, Garner SF, Ouwehand WH and Morley SL

    Department of Haematology, University of Cambridge, Cambridge, UK.

    Background: Studies show that 1 in 1200 neonates have a low platelet (PLT) count due to alloimmunization against human PLT antigen (HPA)-1a (β3-L33). This mainly occurs in HPA-1a-negative mothers who are positive for the human leukocyte antigen (HLA)-DRB3*01:01 allele, but only about one-third of cases will mount an effective alloimmune response. The development of specific treatment modalities requires that the mechanisms driving the maternal alloimmune response against the fetal PLTs be further explored. An antibody reagent that has a different binding affinity to HLA-DRA/DRB3*01:01 with and without the β3-L33 peptide would be a valuable reagent to study peptide presentation on maternal antigen-presenting cells.

    To identify such antibodies, HLA-DRA/DRB3*01:01 was recombinantly expressed in Drosophila S2 cells. To delineate the epitope of interesting antibodies, seven mutant HLA-DRA/DRB3*01:01 molecules were generated by site-directed mutagenesis introducing naturally occurring amino acid changes encoded by DRB3*02 and DRB3*03 alleles.

    Results: The murine monoclonal antibody (MoAb) DA2 showed robust binding by enzyme-linked immunosorbent assay to recombinant HLA-DRA/DRB3*01:01, but binding was reduced in the presence of β3-L33 peptide. The binding affinity of DA2 to the mutant HLA-DRA/DRB3*01:01 in which serine at Position 60 of the β1-chain was replaced by tyrosine was greatly enhanced. Interestingly, the binding of DA2 to the mutant was not reduced by the presence of β3-L33 peptide.

    Conclusion: The results of this study generate a molecular model of the interaction of the HLA-DRA/DRB3*01:01 molecule with MoAb DA2. This will inform functional studies with the recombinant Class II molecules.

    Transfusion 2014;54;6;1478-85

  • DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation.

    Bragin E, Chatzimichali EA, Wright CF, Hurles ME, Firth HV, Bevan AP and Swaminathan GJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and Cambridge University Department of Medical Genetics, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK.

    The DECIPHER database is an accessible online repository of genetic variation with associated phenotypes that facilitates the identification and interpretation of pathogenic genetic variation in patients with rare disorders. Contributing to DECIPHER is an international consortium of >200 academic clinical centres of genetic medicine and ≥1600 clinical geneticists and diagnostic laboratory scientists. Information integrated from a variety of bioinformatics resources, coupled with visualization tools, provides a comprehensive set of tools to identify other patients with similar genotype-phenotype characteristics and highlights potentially pathogenic genes. In a significant development, we have extended DECIPHER from a database of just copy-number variants to allow upload, annotation and analysis of sequence variants such as single nucleotide variants (SNVs) and InDels. Other notable developments in DECIPHER include a purpose-built, customizable and interactive genome browser to aid combined visualization and interpretation of sequence and copy-number variation against informative datasets of pathogenic and population variation. We have also introduced several new features to our deposition and analysis interface. This article provides an update to the DECIPHER database, an earlier instance of which has been described elsewhere [Swaminathan et al. (2012) DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum. Mol. Genet., 21, R37-R44].

    Funded by: Wellcome Trust: WT077008

    Nucleic acids research 2014;42;Database issue;D993-D1000

  • Phosphoinositide metabolism links cGMP-dependent protein kinase G to essential Ca²⁺ signals at key decision points in the life cycle of malaria parasites.

    Brochet M, Collins MO, Smith TK, Thompson E, Sebastian S, Volkmann K, Schwach F, Chappell L, Gomes AR, Berriman M, Rayner JC, Baker DA, Choudhary J and Billker O

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Many critical events in the Plasmodium life cycle rely on the controlled release of Ca²⁺ from intracellular stores to activate stage-specific Ca²⁺-dependent protein kinases. Using the motility of Plasmodium berghei ookinetes as a signalling paradigm, we show that the cyclic guanosine monophosphate (cGMP)-dependent protein kinase, PKG, maintains the elevated level of cytosolic Ca²⁺ required for gliding motility. We find that the same PKG-dependent pathway operates upstream of the Ca²⁺ signals that mediate activation of P. berghei gametocytes in the mosquito and egress of Plasmodium falciparum merozoites from infected human erythrocytes. Perturbations of PKG signalling in gliding ookinetes have a marked impact on the phosphoproteome, with a significant enrichment of in vivo regulated sites in multiple pathways including vesicular trafficking and phosphoinositide metabolism. A global analysis of cellular phospholipids demonstrates that in gliding ookinetes PKG controls phosphoinositide biosynthesis, possibly through the subcellular localisation or activity of lipid kinases. Similarly, phosphoinositide metabolism links PKG to egress of P. falciparum merozoites, where inhibition of PKG blocks hydrolysis of phosphatidylinositol (4,5)-bisphosphate. In the face of an increasing complexity of signalling through multiple Ca²⁺ effectors, PKG emerges as a unifying factor to control multiple cellular Ca²⁺ signals essential for malaria parasite development and transmission.

    Funded by: Medical Research Council: G0501670, G10000779; Wellcome Trust: 079643/Z/06/Z, WT093228, WT094752, WT098051

    PLoS biology 2014;12;3;e1001806

  • Genetic interactions affecting human gene expression identified by variance association mapping.

    Brown AA, Buil A, Viñuela A, Lappalainen T, Zheng HF, Richards JB, Small KS, Spector TD, Dermitzakis ET and Durbin R

    Informatics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Non-additive interaction between genetic variants, or epistasis, is a possible explanation for the gap between heritability of complex traits and the variation explained by identified genetic loci. Interactions give rise to genotype dependent variance, and therefore the identification of variance quantitative trait loci can be an intermediate step to discover both epistasis and gene by environment effects (GxE). Using RNA-sequence data from lymphoblastoid cell lines (LCLs) from the TwinsUK cohort, we identify a candidate set of 508 variance associated SNPs. Exploiting the twin design we show that GxE plays a role in ~70% of these associations. Further investigation of these loci reveals 57 epistatic interactions that replicated in a smaller dataset, explaining on average 4.3% of phenotypic variance. In 24 cases, more variance is explained by the interaction than their additive contributions. Using molecular phenotypes in this way may provide a route to uncovering genetic interactions underlying more complex traits.

    eLife 2014

  • Cis and trans effects of human genomic variants on gene expression.

    Bryois J, Buil A, Evans DM, Kemp JP, Montgomery SB, Conrad DF, Ho KM, Ring S, Hurles M, Deloukas P, Davey Smith G and Dermitzakis ET

    Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland; Institute of Genetics and Genomics in Geneva (iGE3), Geneva, Switzerland; Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

    Gene expression is a heritable cellular phenotype that defines the function of a cell and can lead to diseases in case of misregulation. In order to detect genetic variations affecting gene expression, we performed association analysis of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) with gene expression measured in 869 lymphoblastoid cell lines of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort in cis and in trans. We discovered that 3,534 genes (false discovery rate (FDR) = 5%) are affected by an expression quantitative trait locus (eQTL) in cis and 48 genes are affected in trans. We observed that CNVs are more likely to be eQTLs than SNPs. In addition, we found that variants associated with complex traits and diseases are enriched for trans-eQTLs and that trans-eQTLs are enriched for cis-eQTLs. As a variant affecting both a gene in cis and in trans suggests that the cis gene is functionally linked to the trans gene expression, we looked specifically for trans effects of cis-eQTLs. We discovered that 26 cis-eQTLs are associated with 92 genes in trans, with the cis-eQTLs of the transcription factors BATF3 and HMX2 affecting the most genes. We then explored whether variation in the level of expression of the cis genes was causally affecting the level of expression of the trans genes and discovered several causal relationships between variation in the level of expression of the cis gene and variation in the level of expression of the trans gene. This analysis shows that a large sample size allows the discovery of secondary effects of human variations on gene expression that can be used to construct short directed gene regulatory networks.

    PLoS genetics 2014;10;7;e1004461

  • A multi-country outbreak of Salmonella Newport gastroenteritis in Europe associated with watermelon from Brazil, confirmed by whole genome sequencing: October 2011 to January 2012.

    Byrne L, Fisher I, Peters T, Mather A, Thomson N, Rosner B, Bernard H, McKeown P, Cormican M, Cowden J, Aiyedun V, Lane C and On Behalf Of The International Outbreak Control Team C

    Gastrointestinal, Emerging and Zoonotic Infections, Centre for Infectious Disease Surveillance and Control, Public Health England, Colindale, London, United Kingdom.

    Euro surveillance : bulletin Européen sur les maladies transmissibles = European communicable disease bulletin 2014;19;31

  • Genome-wide analysis of cold adaptation in indigenous Siberian populations.

    Cardona A, Pagani L, Antao T, Lawson DJ, Eichstaedt CA, Yngvadottir B, Shwe MT, Wee J, Romero IG, Raj S, Metspalu M, Villems R, Willerslev E, Tyler-Smith C, Malyarchuk BA, Derenko MV and Kivisild T

    Department of Archaeology and Anthropology, University of Cambridge, Cambridge, United Kingdom.

    Following the dispersal out of Africa, where hominins evolved in warm environments for millions of years, our species has colonised different climate zones of the world, including high latitudes and cold environments. The extent to which human habitation in (sub-)Arctic regions has been enabled by cultural buffering, short-term acclimatization and genetic adaptations is not clearly understood. Present day indigenous populations of Siberia show a number of phenotypic features, such as increased basal metabolic rate, low serum lipid levels and increased blood pressure that have been attributed to adaptation to the extreme cold climate. In this study we introduce a dataset of 200 individuals from ten indigenous Siberian populations that were genotyped for 730,525 SNPs across the genome to identify genes and non-coding regions that have undergone unusually rapid allele frequency and long-range haplotype homozygosity change in the recent past. At least three distinct population clusters could be identified among the Siberians, each of which showed a number of unique signals of selection. A region on chromosome 11 (chr11:66-69 Mb) contained the largest amount of clustering of significant signals and also the strongest signals in all the different selection tests performed. We present a list of candidate cold adaption genes that showed significant signals of positive selection with our strongest signals associated with genes involved in energy regulation and metabolism (CPT1A, LRP5, THADA) and vascular smooth muscle contraction (PRKG1). By employing a new method that paints phased chromosome chunks by their ancestry we distinguish local Siberian-specific long-range haplotype signals from those introduced by admixture.

    Funded by: Wellcome Trust: 098051

    PloS one 2014;9;5;e98076

  • Single nucleotide polymorphisms with cis-regulatory effects on long non-coding transcripts in human primary monocytes.

    Carlsson Almlöf J, Lundmark P, Lundmark A, Ge B, Pastinen T, Cardiogenics Consortium, Goodall AH, Cambien F, Deloukas P, Ouwehand WH and Syvänen AC

    Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

    We applied genome-wide allele-specific expression analysis of monocytes from 188 samples. Monocytes were purified from white blood cells of healthy blood donors to detect cis-acting genetic variation that regulates the expression of long non-coding RNAs. We analysed 8929 regions harboring genes for potential long non-coding RNA that were retrieved from data from the ENCODE project. Of these regions, 60% were annotated as intergenic, which implies that they do not overlap with protein-coding genes. Focusing on the intergenic regions, and using stringent analysis of the allele-specific expression data, we detected robust cis-regulatory SNPs in 258 out of 489 informative intergenic regions included in the analysis. The cis-regulatory SNPs that were significantly associated with allele-specific expression of long non-coding RNAs were enriched to enhancer regions marked for active or bivalent, poised chromatin by histone modifications. Out of the lncRNA regions regulated by cis-acting regulatory SNPs, 20% (n = 52) were co-regulated with the closest protein coding gene. We compared the identified cis-regulatory SNPs with those in the catalog of SNPs identified by genome-wide association studies of human diseases and traits. This comparison identified 32 SNPs in loci from genome-wide association studies that displayed a strong association signal with allele-specific expression of non-coding RNAs in monocytes, with p-values ranging from 6.7×10⁻⁷ to 9.5×10⁻⁸⁹. The identified cis-regulatory SNPs are associated with diseases of the immune system, like multiple sclerosis and rheumatoid arthritis.

    PloS one 2014;9;7;e102612

  • Exome sequencing improves genetic diagnosis of structural fetal abnormalities revealed by ultrasound.

    Carss KJ, Hillman SC, Parthiban V, McMullan DJ, Maher ER, Kilby MD and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The genetic aetiology of non-aneuploid fetal structural abnormalities is typically investigated by karyotyping and array-based detection of microscopically detectable rearrangements, and submicroscopic copy number variants (CNVs), which collectively yield a pathogenic finding in up to 10% of cases. We propose that exome sequencing may substantially increase the identification of underlying aetiologies. We performed exome sequencing on a cohort of 30 non-aneuploid fetuses and neonates (along with their parents) with diverse structural abnormalities first identified by prenatal ultrasound. We identified candidate pathogenic variants with a range of inheritance models, and evaluated these in the context of detailed phenotypic information. We identified 35 de novo single nucleotide variants (SNVs), small indels, deletions or duplications, of which three (accounting for 10% of the cohort) are highly likely to be causative. These are de novo missense variants in FGFR3 and COL2A1, and a de novo 16.8 kb deletion that includes most of OFD1. In five further cases (17%) we identified de novo or inherited recessive or X-linked variants in plausible candidate genes, which require additional validation to determine pathogenicity. Our diagnostic yield of 10% is comparable to, and supplementary to, the diagnostic yield of existing microarray testing for large chromosomal rearrangements and targeted CNV detection. The de novo nature of these events could enable couples to be counselled as to their low recurrence risk. This study outlines the way for a substantial improvement in the diagnostic yield of prenatal genetic abnormalities through the application of next generation sequencing.

    Human molecular genetics 2014

  • Evolution and transmission of drug-resistant tuberculosis in a Russian population.

    Casali N, Nikolayevskyy V, Balabanova Y, Harris SR, Ignatyeva O, Kontsevaya I, Corander J, Bryant J, Parkhill J, Nejentsev S, Horstmann RD, Brown T and Drobniewski F

    Public Health England (PHE) National Mycobacterium Reference Laboratory, Clinical TB and HIV Group, Blizard Institute, Queen Mary University of London, London, UK.

    The molecular mechanisms determining the transmissibility and prevalence of drug-resistant tuberculosis in a population were investigated through whole-genome sequencing of 1,000 prospectively obtained patient isolates from Russia. Two-thirds belonged to the Beijing lineage, which was dominated by two homogeneous clades. Multidrug-resistant (MDR) genotypes were found in 48% of isolates overall and in 87% of the major clades. The most common rpoB mutation was associated with fitness-compensatory mutations in rpoA or rpoC, and a new intragenic compensatory substitution was identified. The proportion of MDR cases with extensively drug-resistant (XDR) tuberculosis was 16% overall, with 65% of MDR isolates harboring eis mutations, selected by kanamycin therapy, which may drive the expansion of strains with enhanced virulence. The combination of drug resistance and compensatory mutations displayed by the major clades confers clinical resistance without compromising fitness and transmissibility, showing that, in addition to weaknesses in the tuberculosis control program, biological factors drive the persistence and spread of MDR and XDR tuberculosis in Russia and beyond.

    Nature genetics 2014

  • Whole-genome sequencing reveals clonal expansion of multiresistant Staphylococcus haemolyticus in European hospitals.

    Cavanagh JP, Hjerde E, Holden MT, Kahlke T, Klingenberg C, Flægstad T, Parkhill J, Bentley SD and Sollid JU

    Department of Paediatrics, University Hospital of North Norway, Tromsø, Norway; Department of Clinical Medicine, UiT The Arctic University of Norway, Tromsø, Norway.

    Objectives: Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation.

    Methods: Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny.

    Results: SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A-G), of which four (A-D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication.

    Conclusions: The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen.

    The Journal of antimicrobial chemotherapy 2014

  • Found in translation.

    Chappell L

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014;12;4;238

  • A reduction in Ptprq associated with specific features of the deafness phenotype of the miR-96 mutant mouse diminuendo.

    Chen J, Johnson SL, Lewis MA, Hilton JM, Huma A, Marcotti W and Steel KP

    Wellcome Trust Sanger Institute, Cambridge, UK; Wolfson Centre for Age-Related Diseases, King's College London, Guy's Campus, London, SE1 1UL, UK.

    miR-96 is a microRNA, a non-coding RNA gene which regulates a wide array of downstream genes. The miR-96 mouse mutant diminuendo exhibits deafness and arrested hair cell functional and morphological differentiation. We have previously shown that several genes are markedly downregulated in the diminuendo organ of Corti; one of these is Ptprq, a gene known to be important for maturation and maintenance of hair cells. In order to study the contribution that downregulation of Ptprq makes to the diminuendo phenotype, we carried out microarrays, scanning electron microscopy and single hair cell electrophysiology to compare diminuendo mutants (heterozygous and homozygous) with mice homozygous for a functional null allele of Ptprq. In terms of both morphology and electrophysiology, the auditory phenotype of mice lacking Ptprq resembles that of diminuendo heterozygotes, while diminuendo homozygotes are more severely affected. A comparison of transcriptomes indicates there is a broad similarity between diminuendo homozygotes and Ptprq-null mice. The reduction in Ptprq observed in diminuendo mice appears to be a major contributor to the morphological, transcriptional and electrophysiological phenotype, but does not account for the complete diminuendo phenotype.

    The European journal of neuroscience 2014

  • Dense genomic sampling identifies highways of pneumococcal recombination.

    Chewapreecha C, Harris SR, Croucher NJ, Turner C, Marttinen P, Cheng L, Pessia A, Aanensen DM, Mather AE, Page AJ, Salter SJ, Harris D, Nosten F, Goldblatt D, Corander J, Parkhill J, Turner P and Bentley SD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Evasion of clinical interventions by Streptococcus pneumoniae occurs through selection of non-susceptible genomic variants. We report whole-genome sequencing of 3,085 pneumococcal carriage isolates from a 2.4-km² refugee camp. This sequencing provides unprecedented resolution of the process of recombination and its impact on population evolution. Genomic recombination hotspots show remarkable consistency between lineages, indicating common selective pressures acting at certain loci, particularly those associated with antibiotic resistance. Temporal changes in antibiotic consumption are reflected in changes in recombination trends, demonstrating rapid spread of resistance when selective pressure is high. The highest frequencies of receipt and donation of recombined DNA fragments were observed in non-encapsulated lineages, implying that this largely overlooked pneumococcal group, which is beyond the reach of current vaccines, may have a major role in genetic exchange and the adaptation of the species as a whole. These findings advance understanding of pneumococcal population dynamics and provide information for the design of future intervention strategies.

    Funded by: Wellcome Trust: 098051

    Nature genetics 2014;46;3;305-9

  • Comprehensive Identification of Single Nucleotide Polymorphisms Associated with Beta-lactam Resistance within Pneumococcal Mosaic Genes.

    Chewapreecha C, Marttinen P, Croucher NJ, Salter SJ, Harris SR, Mather AE, Hanage WP, Goldblatt D, Nosten FH, Turner C, Turner P, Bentley SD and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Traditional genetic association studies are very difficult in bacteria, as the generally limited recombination leads to large linked haplotype blocks, confounding the identification of causative variants. Beta-lactam antibiotic resistance in Streptococcus pneumoniae arises readily as the bacteria can quickly incorporate DNA fragments encompassing variants that make the transformed strains resistant. However, the causative mutations themselves are embedded within larger recombined blocks, and previous studies have only analysed a limited number of isolates, leading to the description of "mosaic genes" as being responsible for resistance. By comparing a large number of genomes of beta-lactam susceptible and non-susceptible strains, the high frequency of recombination should break up these haplotype blocks and allow the use of genetic association approaches to identify individual causative variants. Here, we performed a genome-wide association study to identify single nucleotide polymorphisms (SNPs) and indels that could confer beta-lactam non-susceptibility using 3,085 Thai and 616 USA pneumococcal isolates as independent datasets for the variant discovery. The large sample sizes allowed us to narrow the source of beta-lactam non-susceptibility from long recombinant fragments down to much smaller loci comprised of discrete or linked SNPs. While some loci appear to be universal resistance determinants, contributing equally to non-susceptibility for at least two classes of beta-lactam antibiotics, some play a larger role in resistance to particular antibiotics. All of the identified loci have a highly non-uniform distribution in the populations. They are enriched not only in vaccine-targeted, but also non-vaccine-targeted lineages, which may raise clinical concerns. Identification of single nucleotide polymorphisms underlying resistance will be essential for future use of genome sequencing to predict antibiotic sensitivity in clinical microbiology.

    PLoS genetics 2014;10;8;e1004547

  • International Glossina Genome Initiative 2004-2014: a driver for post-genomic era research on the African continent.

    Christoffels A, Masiga D, Berriman M, Lehane M, Touré Y and Aksoy S

    South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.

    PLoS neglected tropical diseases 2014;8;8;e3024

  • From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes.

    Clarke AC, Prost S, Stanton JA, White WT, Kaplan ME, Matisoo-Smith EA and Genographic Consortium

    Department of Anatomy, University of Otago, Dunedin, New Zealand.

    Background: Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users.

    Results: Here we present an 'A to Z' protocol for obtaining complete human mitochondrial (mtDNA) genomes - from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling).

    Conclusions: All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual 'modules' can be swapped out to suit available resources.

    BMC genomics 2014;15;68
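
    The final bioinformatics steps named in the abstract above (reference-based mapping, coverage output and SNP calling) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the published pipeline: reads are assumed to be already mapped without gaps, the majority base is taken as the consensus, and the function name `call_consensus` and the `min_depth` threshold are hypothetical.

```python
from collections import Counter


def call_consensus(reference, aligned_reads, min_depth=3):
    """Build a consensus sequence from gapless alignments.

    reference: reference sequence as a string.
    aligned_reads: list of (start, sequence) tuples, with 0-based start.
    Returns (consensus, coverage, variants), where variants holds
    (1-based position, reference base, consensus base) tuples.
    """
    piles = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                piles[pos][base] += 1

    consensus, coverage, variants = [], [], []
    for pos, pile in enumerate(piles):
        depth = sum(pile.values())
        coverage.append(depth)
        if depth < min_depth:
            # Too few reads to call this site; mask it.
            consensus.append("N")
            continue
        base = pile.most_common(1)[0][0]
        consensus.append(base)
        if base != reference[pos]:
            variants.append((pos + 1, reference[pos], base))
    return "".join(consensus), coverage, variants
```

    The per-site `coverage` list is what a coverage plot would be drawn from, and low-coverage sites are masked rather than guessed.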

  • Adaptive introgression between Anopheles sibling species eliminates a major genomic island but not reproductive isolation.

    Clarkson CS, Weetman D, Essandoh J, Yawson AE, Maslen G, Manske M, Field SG, Webster M, Antão T, MacInnis B, Kwiatkowski D and Donnelly MJ

    Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK.

    Adaptive introgression can provide novel genetic variation to fuel rapid evolutionary responses, though it may be counterbalanced by potential for detrimental disruption of the recipient genomic background. We examine the extent and impact of recent introgression of a strongly selected insecticide-resistance mutation (Vgsc-1014F) located within one of two exceptionally large genomic islands of divergence separating the Anopheles gambiae species pair. Here we show that transfer of the Vgsc mutation results in homogenization of the entire genomic island region (~1.5% of the genome) between species. Despite this massive disruption, introgression is clearly adaptive with a dramatic rise in frequency of Vgsc-1014F and no discernable impact on subsequent reproductive isolation between species. Our results show (1) how resilience of genomes to massive introgression can permit rapid adaptive response to anthropogenic selection and (2) that even extreme prominence of genomic islands of divergence can be an unreliable indicator of importance in speciation.

    Nature communications 2014;5;4248

  • PolyTB: A genomic variation map for Mycobacterium tuberculosis.

    Coll F, Preston M, Guerra-Assunção JA, Hill-Cawthorn G, Harris D, Perdigão J, Viveiros M, Portugal I, Drobniewski F, Gagneux S, Glynn JR, Pain A, Parkhill J, McNerney R, Martin N and Clark TG

    Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, WC1E 7HT London, UK.

    Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computationally intensive to process and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.

    Tuberculosis (Edinburgh, Scotland) 2014

  • Confident and sensitive phosphoproteomics using combinations of collision induced dissociation and electron transfer dissociation.

    Collins MO, Wright JC, Jones M, Rayner JC and Choudhary JS

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    We present a workflow using an ETD-optimised version of Mascot Percolator and a modified version of SLoMo (turbo-SLoMo) for analysis of phosphoproteomic data. We have benchmarked this against several database searching algorithms and phosphorylation site localisation tools and show that it offers highly sensitive and confident phosphopeptide identification and site assignment with PSM-level statistics, enabling rigorous comparison of data acquisition methods. We analysed the Plasmodium falciparum schizont phosphoproteome using, for the first time, a data-dependent neutral loss-triggered-ETD (DDNL) strategy and a conventional decision-tree method. At a posterior error probability threshold of 0.01, similar numbers of PSMs were identified using both methods with a 73% overlap in phosphopeptide identifications. The false discovery rate associated with spectral pairs where DDNL CID/ETD identified the same phosphopeptide was <1%. Without any score filtering, 72% of phosphorylation site assignments using turbo-SLoMo were identical, and 99.8% of these cases are associated with a false localisation rate of <5%. We show that DDNL acquisition is a useful approach for phosphoproteomics and results in increased confidence in phosphopeptide identification without compromising sensitivity or duty cycle. Furthermore, the combination of Mascot Percolator and turbo-SLoMo represents a robust workflow for phosphoproteomic data analysis using CID and ETD fragmentation.

    Protein phosphorylation is a ubiquitous post-translational modification that regulates protein function. Mass spectrometry-based approaches have revolutionised its analysis on a large scale, but phosphorylation sites are often identified by single phosphopeptides and therefore require more rigorous data analysis to ensure that sites are identified with high confidence for follow-up experiments to investigate their biological significance. The coverage and confidence of phosphoproteomic experiments can be enhanced by the use of multiple complementary fragmentation methods. Here we have benchmarked a data analysis pipeline for analysis of phosphoproteomic data generated using CID and ETD fragmentation and used it to demonstrate the utility of a data-dependent neutral loss triggered ETD fragmentation strategy for high confidence phosphopeptide identification and phosphorylation site localisation.

    Journal of proteomics 2014

  • Processed pseudogenes acquired somatically during cancer development.

    Cooke SL, Shlien A, Marshall J, Pipinikas CP, Martincorena I, Tubio JM, Li Y, Menzies A, Mudie L, Ramakrishna M, Yates L, Davies H, Bolli N, Bignell GR, Tarpey PS, Behjati S, Nik-Zainal S, Papaemmanuil E, Teixeira VH, Raine K, O'Meara S, Dodoran MS, Teague JW, Butler AP, Iacobuzio-Donahue C, Santarius T, Grundy RG, Malkin D, Greaves M, Munshi N, Flanagan AM, Bowtell D, Martin S, Larsimont D, Reis-Filho JS, Boussioutas A, Taylor JA, Hayes ND, Janes SM, Futreal PA, Stratton MR, McDermott U, Campbell PJ and ICGC Breast Cancer Group

    Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Cancer evolves by mutation, with somatic reactivation of retrotransposons being one such mutational process. Germline retrotransposition can cause processed pseudogenes, but whether this occurs somatically has not been evaluated. Here we screen sequencing data from 660 cancer samples for somatically acquired pseudogenes. We find 42 events in 17 samples, especially non-small cell lung cancer (5/27) and colorectal cancer (2/11). Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications (67%), consensus TTTTAA sites at insertion points, inverted rearrangements (21%), 5' truncation (74%) and polyA tails (88%). Transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes. In addition, a somatic pseudogene that integrated into the promoter and first exon of the tumour suppressor gene, MGA, abrogated expression from that allele. Thus, formation of processed pseudogenes represents a new class of mutation occurring during cancer development, with potentially diverse functional consequences depending on genomic context.

    Nature communications 2014;5;3644

  • Genomic identification of a novel co-trimoxazole resistance genotype and its prevalence amongst Streptococcus pneumoniae in Malawi.

    Cornick JE, Harris SR, Parry CM, Moore MJ, Jassi C, Kamng'ona A, Kulohoma B, Heyderman RS, Bentley SD and Everett DB

    Malawi-Liverpool-Wellcome Clinical Research Programme, University of Malawi, College of Medicine, Blantyre, Malawi.

    Objectives: This study aimed to define the molecular basis of co-trimoxazole resistance in Malawian pneumococci under the dual selective pressure of widespread co-trimoxazole and sulfadoxine/pyrimethamine use. Methods: We measured the trimethoprim and sulfamethoxazole MICs and analysed folA and folP nucleotide and translated amino acid sequences for 143 pneumococci isolated from carriage and invasive disease in Malawi (2002-08). Results: Pneumococci were highly resistant to both trimethoprim and sulfamethoxazole (96%, 137/143). Sulfamethoxazole-resistant isolates showed a 3 or 6 bp insertion in the sulphonamide-binding site of folP. The trimethoprim-resistant isolates fell into three genotypic groups based on dihydrofolate reductase (encoded by folA) mutations: Ile-100-Leu (10%), the Ile-100-Leu substitution together with a residue 92 substitution (56%) and those with a novel uncharacterized resistance genotype (34%). The nucleotide sequence divergence and dN/dS of folA and folP remained stable from 2004 onwards. Conclusions: S. pneumoniae exhibit almost universal co-trimoxazole resistance in vitro and in silico that we believe is driven by extensive co-trimoxazole and sulfadoxine/pyrimethamine use. More than one-third of pneumococci employ a novel mechanism of co-trimoxazole resistance. Resistance has now reached a point of stabilizing evolution. The use of co-trimoxazole to prevent pneumococcal infection in HIV/AIDS patients in sub-Saharan Africa should be re-evaluated.

    The Journal of antimicrobial chemotherapy 2014;69;2;368-74

  • BioJS: an open source standard for biological visualisation - its status in 2014.

    Corpas M, Jimenez R, Carbon SJ, García A, Garcia L, Goldberg T, Gomez J, Kalderimis A, Lewis SE, Mulvany I, Pawlik A, Rowland F, Salazar G, Schreiber F, Sillitoe I, Spooner WH, Thanki AS, Villaveces JM, Yachdav G and Hermjakob H

    The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK.

    BioJS is a community-based standard and repository of functional components to represent biological information on the web. The development of BioJS has been prompted by the growing need for bioinformatics visualisation tools to be easily shared, reused and discovered. Its modular architecture makes it easy for users to find a specific functionality without needing to know how it has been built, while components can be extended or created for implementing new functionality. The BioJS community of developers currently provides a range of functionality that is open access and freely available. A registry has been set up that categorises components and provides installation instructions and testing facilities. The source code for all components is available for ready use.

    F1000Research 2014;3;55

  • Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    Cotten M, Oude Munnink B, Canuti M, Deijs M, Watson SJ, Kellam P and van der Hoek L

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm, SLIM, for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

    PloS one 2014;9;4;e93269

  • Deep sequencing of norovirus genomes defines evolutionary patterns in an urban tropical setting.

    Cotten M, Petrova V, Phan MV, Rabaa MA, Watson SJ, Ong SH, Kellam P and Baker S

    The Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Norovirus is a highly transmissible infectious agent that causes epidemic gastroenteritis in susceptible children and adults. Norovirus infections can be severe and can be initiated from an exceptionally small number of viral particles. Detailed genome sequence data are useful for tracking norovirus transmission and evolution. To address this need we have developed a whole-genome deep sequencing method that generates entire genome sequences from small amounts of clinical specimens. This novel approach employs an algorithm for reverse transcription and PCR amplification primer design using all publicly available norovirus sequence data. Deep sequencing and de novo assembly were used to generate norovirus genomes from a large set of diarrheal patients attending 3 hospitals in Ho Chi Minh City, Vietnam, over a 2.5-year period. Positive selection analysis and direct examination of protein changes in the virus over time identified codons in the regions encoding proteins VP1, p48 (NS1-2) and p22 (NS4) under positive selection and expanded the known targets of norovirus evolutionary pressure.

    Importance: The high transmissibility and rapid evolution rate of norovirus, combined with short-lived host immune responses, are thought to be responsible for the virus causing a majority of pediatric viral diarrhea cases. The evolutionary patterns of this RNA virus have only been described in detail for a portion of the virus genome and never from a detailed urban tropical setting. We have developed robust deep sequencing methods for generating complete genome sequences directly from small amounts of patient fecal material. We use this method to provide a detailed sequence description of the noroviruses circulating in three Ho Chi Minh City hospitals over a 2.5-year period. The study identified patterns of virus change in known sites of host immune response and identified three additional regions of the virus genome under selection that were not previously recognized. In addition, the methods described here provide a robust full-genome sequencing platform for community-based virus surveillance.

    Journal of virology 2014

  • Spread, circulation, and evolution of the Middle East respiratory syndrome coronavirus.

    Cotten M, Watson SJ, Zumla AI, Makhdoom HQ, Palser AL, Ong SH, Al Rabeeah AA, Alhakeem RF, Assiri A, Al-Tawfiq JA, Albarrak A, Barry M, Shibl A, Alrabiah FA, Hajjar S, Balkhy HH, Flemban H, Rambaut A, Kellam P and Memish ZA

    The Middle East respiratory syndrome coronavirus (MERS-CoV) was first documented in the Kingdom of Saudi Arabia (KSA) in 2012 and, to date, has been identified in 180 cases with 43% mortality. In this study, we have determined the MERS-CoV evolutionary rate, documented genetic variants of the virus and their distribution throughout the Arabian peninsula, and identified the genome positions under positive selection, important features for monitoring adaptation of MERS-CoV to human transmission and for identifying the source of infections. Respiratory samples from confirmed KSA MERS cases from May to September 2013 were subjected to whole-genome deep sequencing, and 32 complete or partial sequences (20 were ≥ 99% complete, 7 were 50 to 94% complete, and 5 were 27 to 50% complete) were obtained, bringing the total available MERS-CoV genomic sequences to 65. An evolutionary rate of 1.12 × 10⁻³ substitutions per site per year (95% credible interval [95% CI], 8.76 × 10⁻⁴; 1.37 × 10⁻³) was estimated, bringing the time to most recent common ancestor to March 2012 (95% CI, December 2011; June 2012). Only one MERS-CoV codon, spike 1020, located in a domain required for cell entry, is under strong positive selection. Four KSA MERS-CoV phylogenetic clades were found, with 3 clades apparently no longer contributing to current cases. The size of the population infected with MERS-CoV showed a gradual increase to June 2013, followed by a decline, possibly due to increased surveillance and infection control measures combined with a basic reproduction number (R0) for the virus that is less than 1.

    Importance: MERS-CoV adaptation toward higher rates of sustained human-to-human transmission appears not to have occurred yet. While MERS-CoV transmission currently appears weak, careful monitoring of changes in MERS-CoV genomes and of the MERS epidemic should be maintained. The observation of phylogenetically related MERS-CoV in geographically diverse locations must be taken into account in efforts to identify the animal source and transmission of the virus.

    Funded by: Wellcome Trust

    mBio 2014;5;1

  • The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode.

    Cotton JA, Lilley CJ, Jones LM, Kikuchi T, Reid AJ, Thorpe P, Tsai IJ, Beasley H, Blok V, Cock PJ, Eves-van den Akker S, Holroyd N, Hunt M, Mantelin S, Naghra H, Pain A, Palomares-Rius JE, Zarowiecki M, Berriman M, Jones JT and Urwin PE

    Background: Globodera pallida is a devastating pathogen of potato crops, making it one of the most economically important plant parasitic nematodes. It is also an important model for the biology of cyst nematodes. Cyst nematodes and root-knot nematodes are the two most important plant parasitic nematode groups and together represent a global threat to food security.

    Results: We present the complete genome sequence of G. pallida, together with transcriptomic data from most of the nematode life cycle, particularly focusing on the life cycle stages involved in root invasion and establishment of the biotrophic feeding site. Despite the relatively close phylogenetic relationship with root-knot nematodes, we describe a very different gene family content between the two groups and in particular extensive differences in the repertoire of effectors, including an enormous expansion of the SPRY domain protein family in G. pallida, which includes the SPRYSEC family of effectors. This highlights the distinct biology of cyst nematodes compared to the root-knot nematodes that were, until now, the only sedentary plant parasitic nematodes for which genome information was available. We also present in-depth descriptions of the repertoires of other genes likely to be important in understanding the unique biology of cyst nematodes and of potential drug targets and other targets for their control.

    Conclusions: The data and analyses we present will be central in exploiting post-genomic approaches in the development of much-needed novel strategies for the control of G. pallida and related pathogens.

    Genome biology 2014;15;3;R43

  • Genome-wide association study of sexual maturation in males and females highlights a role for body mass and menarche loci in male puberty.

    Cousminer DL, Stergiakouli E, Berry DJ, Ang W, Groen-Blokhuis MM, Körner A, Siitonen N, Ntalla I, Marinelli M, Perry JR, Kettunen J, Jansen R, Surakka I, Timpson NJ, Ring S, Mcmahon G, Power C, Wang C, Kähönen M, Viikari J, Lehtimäki T, Middeldorp CM, Hulshoff Pol HE, Neef M, Weise S, Pahkala K, Niinikoski H, Zeggini E, Panoutsopoulou K, Bustamante M, Penninx BW, ReproGen Consortium, Murabito J, Torrent M, Dedoussis GV, Kiess W, Boomsma DI, Pennell CE, Raitakari OT, Hyppönen E, Davey Smith G, Ripatti S, McCarthy MI, Widén E and Early Growth Genetics (EGG) Consortium

    Institute for Molecular Medicine Finland (FIMM).

    Little is known about genes regulating male puberty. Further, while many identified pubertal timing variants associate with age at menarche, a late manifestation of puberty, and body mass, little is known about these variants' relationship to pubertal initiation or tempo. To address these questions, we performed genome-wide association meta-analysis in over 11 000 European samples with data on early pubertal traits, male genital and female breast development, measured by the Tanner scale. We report the first genome-wide significant locus for male sexual development upstream of myocardin-like 2 (MKL2) (P = 8.9 × 10⁻⁹), a menarche locus tagging a developmental pathway linking earlier puberty with reduced pubertal growth (P = 4.6 × 10⁻⁵) and short adult stature (P = 7.5 × 10⁻⁶) in both males and females. Furthermore, our results indicate that a proportion of menarche loci are important for pubertal initiation in both sexes. Consistent with epidemiological correlations between increased prepubertal body mass and earlier pubertal timing in girls, body mass index (BMI)-increasing alleles correlated with earlier breast development. In boys, some BMI-increasing alleles associated with earlier, and others with delayed, sexual development; these genetic results mimic the controversy in epidemiological studies, some of which show opposing correlations between prepubertal BMI and male puberty. Our results contribute to our understanding of the pubertal initiation program in both sexes and indicate that although mechanisms regulating pubertal onset in males and females may largely be shared, the relationship between body mass and pubertal timing in boys may be complex and requires further genetic studies.

    Funded by: Wellcome Trust: 090532, 092731

    Human molecular genetics 2014;23;16;4452-64

  • Quantitation of Malaria Parasite-Erythrocyte Cell-Cell Interactions Using Optical Tweezers.

    Crick AJ, Theron M, Tiffert T, Lew VL, Cicuta P and Rayner JC

    Cavendish Laboratory, University of Cambridge, Cambridge, United Kingdom.

    Erythrocyte invasion by Plasmodium falciparum merozoites is an essential step for parasite survival and hence the pathogenesis of malaria. Invasion has been studied intensively, but our cellular understanding has been limited by the fact that it occurs very rapidly: invasion is generally complete within 1 min, and shortly thereafter the merozoites, at least in in vitro culture, lose their invasive capacity. The rapid nature of the process, and hence the narrow time window in which measurements can be taken, have limited the tools available to quantitate invasion. Here we employ optical tweezers to study individual invasion events for what we believe is the first time, showing that newly released P. falciparum merozoites, delivered via optical tweezers to a target erythrocyte, retain their ability to invade. Even spent merozoites, which had lost the ability to invade, retain the ability to adhere to erythrocytes, and furthermore can still induce transient local membrane deformations in the erythrocyte membrane. We use this technology to measure the strength of the adhesive force between merozoites and erythrocytes, and to probe the cellular mode of action of known invasion inhibitory treatments. These data add to our understanding of the erythrocyte-merozoite interactions that occur during invasion, and demonstrate the power of optical tweezers technologies in unraveling the blood-stage biology of malaria.

    Biophysical journal 2014;107;4;846-853

  • Evidence for soft selective sweeps in the evolution of pneumococcal multidrug-resistance and vaccine escape.

    Croucher NJ, Chewapreecha C, Hanage WP, Harris SR, McGee L, van der Linden M, Song JH, Ko KS, de Lencastre H, Turner C, Yang F, Sá-Leão R, Beall B, Klugman KP, Parkhill J, Turner P and Bentley SD

    Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA. Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The multidrug-resistant Streptococcus pneumoniae Taiwan(19F)-14, or PMEN14, clone was first observed with a 19F serotype, which is targeted by the heptavalent polysaccharide conjugate vaccine (PCV7). However, 'vaccine escape' PMEN14 isolates with a 19A serotype became an increasingly important cause of disease post-PCV7. Whole genome sequencing was used to characterise the recent evolution of 173 pneumococci of, or related to, PMEN14. This suggested PMEN14 is a single lineage that originated in the late 1980s in parallel with the acquisition of multiple resistances by close relatives. One of the four detected serotype switches to 19A generated representatives of the sequence type (ST) 320 isolates that have been highly successful post-PCV7. A second produced an ST236 19A genotype with reduced resistance to β-lactams owing to alteration of pbp1a and pbp2x sequences through the same recombination that caused the change in serotype. A third, which generated a mosaic capsule biosynthesis locus, resulted in serotype 19A ST271 isolates. The rapid diversification through homologous recombination seen in the global collection was similarly observed in the absence of vaccination in isolates from the Maela refugee camp in Thailand. This sample also allowed variation to be observed within carriage through longitudinal sampling. This suggests some pneumococcal genotypes generate a pool of standing variation that is sufficiently extensive to result in 'soft' selective sweeps: the emergence of multiple related mutants in parallel upon a change in selection pressure, such as vaccine introduction. The subsequent competition between these mutants makes this phenomenon difficult to detect without deep sampling of individual lineages.

    Genome biology and evolution 2014

  • Time between sputum sample collection and storage significantly influences bacterial sequence composition from Cystic Fibrosis respiratory infections.

    Cuthbertson L, Rogers GB, Walker AW, Oliver A, Hafiz T, Hoffman LR, Carroll MP, Parkhill J, Bruce KD and van der Gast CJ

    NERC Centre for Ecology & Hydrology, Wallingford, OX10 8BB, UK. Institute of Pharmaceutical Science, Molecular Microbiology Research Laboratory, King's College London, London, SE1 9NH, UK.

    Spontaneously expectorated sputum is traditionally used as the sampling method for the investigation of lower airway infections. Whilst guidelines exist for the handling of these samples for culture-based diagnostic microbiology, there is no comparable consensus on their handling prior to culture-independent analysis. The increasing incorporation of culture-independent approaches in diagnostic microbiology means it is of critical importance to assess potential biases. The aim of this study was to assess the impact of delayed freezing on culture-independent microbiological analyses, and to identify acceptable parameters for sample handling. Sputum samples from eight adult cystic fibrosis (CF) patients were collected and aliquoted into sterile Bijou bottles. Aliquots were stored at room temperature for increasing intervals, up to 72 hours, before freezing at -80°C. Samples were treated with propidium monoazide, to distinguish live from dead cells, prior to DNA extraction, and 16S rRNA gene pyrosequencing was used to characterise the bacterial composition. Substantial variation was observed in samples with high diversity bacterial communities over time, whereas low diversity communities dominated by recognised CF pathogens varied little regardless of time to freezing. Partitioning into common and rare species demonstrated that the rare species drove changes in similarity. The percentage abundance of anaerobes over the study significantly decreased after 12 hours at room temperature (P=0.008). Failure to stabilise samples at -80°C within 12 hours of collection results in significant changes in the detected community composition.

    Journal of clinical microbiology 2014

  • From genome-wide association study hits to new insights into experimental hematology.

    Cvejic A

    Department of Haematology, University of Cambridge, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Despite significant improvements in our knowledge of the mechanisms of normal and pathological hematopoiesis, our current understanding is most likely an oversimplification of the complexity of regulatory networks at play. Thus, considerable efforts have been made to catalogue the total sum of germline alterations in individual genomes affecting human hematopoiesis. These efforts ultimately led to the discovery of a large number of new genes not previously implicated in blood formation. Although identification of novel genes is important in revealing the profiles of genetic variations associated with normal hematopoiesis, further functional studies are necessary to improve our understanding of the mechanism(s) involved in these processes. In this review, we summarize the knowledge gained from genome-wide association studies to elucidate the relationship between genetics and blood cell traits. We discuss the most important recent advances, with an emphasis on functional follow-up studies that have been particularly useful in providing an insight into novel regulatory processes that influence blood cell formation and function. We also discuss potential future directions and challenges in the field.

    Experimental hematology 2014;42;8;630-636

  • Streptococcus agalactiae clones infecting humans were selected and fixed through the extensive use of tetracycline.

    Da Cunha V, Davies MR, Douarre PE, Rosinski-Chupin I, Margarit I, Spinali S, Perkins T, Lechat P, Dmytruk N, Sauvage E, Ma L, Romi B, Tichit M, Lopez-Sanchez MJ, Descorps-Declere S, Souche E, Buchrieser C, Trieu-Cuot P, Moszer I, Clermont D, Maione D, Bouchier C, McMillan DJ, Parkhill J, Telford JL, Dougan G, Walker MJ, DEVANI Consortium, Holden MT, Poyart C, Glaser P and DEVANI Consortium

    [1] Institut Pasteur, Unité de Biologie des Bacteries Pathogènes à Gram-positif, Paris 75015, France [2] CNRS UMR3525, Paris 75015, France [3] Institut Pasteur, Bioinformatics platform, Paris 75015, France [4].

    Streptococcus agalactiae (Group B Streptococcus, GBS) is a commensal of the digestive and genitourinary tracts of humans that emerged as the leading cause of bacterial neonatal infections in Europe and North America during the 1960s. Due to the lack of epidemiological and genomic data, the reasons for this emergence are unknown. Here we show by comparative genome analysis and phylogenetic reconstruction of 229 isolates that the rise of human GBS infections corresponds to the selection and worldwide dissemination of only a few clones. The parallel expansion of the clones is preceded by the insertion of integrative and conjugative elements conferring tetracycline resistance (TcR). Thus, we propose that the use of tetracycline from 1948 onwards led, in humans, to the complete replacement of a diverse GBS population by only a few TcR clones particularly well adapted to their host, causing the observed emergence of GBS diseases in neonates.

    Nature communications 2014;5;4544

  • Structure and computational analysis of a novel protein with metallopeptidase-like and circularly permuted winged-helix-turn-helix domains reveals a possible role in modified polysaccharide biosynthesis.

    Das D, Murzin AG, Rawlings ND, Finn RD, Coggill P, Bateman A, Godzik A and Aravind L

    Joint Center for Structural Genomics, La Jolla, CA, USA.

    Background: CA_C2195 from Clostridium acetobutylicum is a protein of unknown function. Sequence analysis predicted that part of the protein contained a metallopeptidase-related domain. There are over 200 homologs of similar size in large sequence databases such as UniProt, with pairwise sequence identities in the range of ~40-60%. CA_C2195 was chosen for crystal structure determination for structure-based function annotation of novel protein sequence space.

    Results: The structure confirmed that CA_C2195 contained an N-terminal metallopeptidase-like domain. The structure revealed two extra domains: an α+β domain inserted in the metallopeptidase-like domain and a C-terminal circularly permuted winged-helix-turn-helix domain.

    Conclusions: Based on our sequence and structural analyses using the crystal structure of CA_C2195 we provide a view into the possible functions of the protein. From contextual information from gene-neighborhood analysis, we propose that rather than being a peptidase, CA_C2195 and its homologs might play a role in biosynthesis of a modified cell-surface carbohydrate in conjunction with several sugar-modification enzymes. These results provide the groundwork for the experimental verification of the function.

    Funded by: Medical Research Council: MC_U105192716; NIGMS NIH HHS: R01GM101457, U54 GM094586; Wellcome Trust: WT077044/Z/05/Z

    BMC bioinformatics 2014;15;75

  • The correlation between reading and mathematics ability at age twelve has a substantial genetic component.

    Davis OS, Band G, Pirinen M, Haworth CM, Meaburn EL, Kovas Y, Harlaar N, Docherty SJ, Hanscombe KB, Trzaskowski M, Curtis CJ, Strange A, Freeman C, Bellenguez C, Su Z, Pearson R, Vukcevic D, Langford C, Deloukas P, Hunt S, Gray E, Dronov S, Potter SC, Tashakkori-Ghanbaria A, Edkins S, Bumpstead SJ, Blackwell JM, Bramon E, Brown MA, Casas JP, Corvin A, Duncanson A, Jankowski JA, Markus HS, Mathew CG, Palmer CN, Rautanen A, Sawcer SJ, Trembath RC, Viswanathan AC, Wood NW, Barroso I, Peltonen L, Dale PS, Petrill SA, Schalkwyk LS, Craig IW, Lewis CM, Price TS, Wellcome Trust Case Control Consortium, Donnelly P, Plomin R and Spencer CC

    [1] Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London WC1E 6BT, UK [2] King's College London, Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, London SE5 8AF, UK [3].

    Dissecting how genetic and environmental influences impact on learning is helpful for maximizing numeracy and literacy. Here we show, using twin and genome-wide analysis, that there is a substantial genetic component to children's ability in reading and mathematics, and estimate that around one half of the observed correlation in these traits is due to shared genetic effects (so-called Generalist Genes). Thus, our results highlight the potential role of the learning environment in contributing to differences in a child's cognitive abilities at age twelve.

    Nature communications 2014;5;4204

  • Recurrent mutations, including NPM1c, activate a BRD4-dependent core transcriptional program in acute myeloid leukemia.

    Dawson MA, Gudgin EJ, Horton SJ, Giotopoulos G, Meduri E, Robson S, Cannizzaro E, Osaki H, Wiese M, Putwain S, Fong CY, Grove C, Craig J, Dittmann A, Lugo D, Jeffrey P, Drewes G, Lee K, Bullinger L, Prinjha RK, Kouzarides T, Vassiliou GS and Huntly BJ

    [1] Department of Haematology, Cambridge Institute for Medical Research and Addenbrookes Hospital, University of Cambridge, Cambridge, UK [2] Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK [3] Gurdon Institute and Department of Pathology, University of Cambridge, Cambridge, UK.

    Recent evidence suggests that inhibition of bromodomain and extra-terminal (BET) epigenetic readers may have clinical utility against acute myeloid leukemia (AML). Here we validate this hypothesis, demonstrating the efficacy of the BET inhibitor I-BET151 across a variety of AML subtypes driven by disparate mutations. We demonstrate that a common 'core' transcriptional program, which is HOX gene independent, is downregulated in AML and underlies sensitivity to I-BET treatment. This program is enriched for genes that contain 'super-enhancers', recently described regulatory elements postulated to control key oncogenic driver genes. Moreover, our program can independently classify AML patients into distinct cytogenetic and molecular subgroups, suggesting that it contains biomarkers of sensitivity and response. We focus on AML with mutations of the Nucleophosmin gene (NPM1) and show evidence to suggest that wild-type NPM1 has an inhibitory influence on BRD4 that is relieved upon NPM1c mutation and cytoplasmic dislocation. This leads to the upregulation of the core transcriptional program facilitating leukemia development. This program is abrogated by I-BET therapy and by nuclear restoration of NPM1. Finally, we demonstrate the efficacy of I-BET151 in a unique murine model and in primary patient samples of NPM1c AML. Taken together, our data support the use of BET inhibitors in clinical trials in AML.

    Leukemia 2014;28;2;311-20

  • Genome sequencing of disease and carriage isolates of nontypeable Haemophilus influenzae identifies discrete population structure.

    De Chiara M, Hood D, Muzzi A, Pickard DJ, Perkins T, Pizza M, Dougan G, Rappuoli R, Moxon ER, Soriani M and Donati C

    Novartis Vaccines, 53100 Siena, Italy.

    One of the main hurdles for the development of an effective and broadly protective vaccine against nonencapsulated isolates of Haemophilus influenzae (NTHi) lies in the genetic diversity of the species, which makes the identification of cross-protective candidate antigens extremely difficult. To assess whether a population structure of NTHi could be defined, we performed genome sequencing of a collection of diverse clinical isolates representative of both carriage and disease and of the diversity of the natural population. Analysis of the distribution of polymorphic sites in the core genome and of the composition of the accessory genome defined distinct evolutionary clades and supported a predominantly clonal evolution of NTHi, with the majority of genetic information transmitted vertically within lineages. A correlation was found between the population structure and the presence of selected surface-associated proteins and lipooligosaccharide structure, known to contribute to virulence. This high-resolution, genome-based population structure of NTHi provides the foundation for a better understanding of NTHi adaptation to the host, as well as its commensal and virulence behavior, which could facilitate intervention strategies against disease caused by this important human pathogen.

    Proceedings of the National Academy of Sciences of the United States of America 2014;111;14;5439-44

  • Chromatin landscapes of retroviral and transposon integration profiles.

    de Jong J, Akhtar W, Badhai J, Rust AG, Rad R, Hilkens J, Berns A, van Lohuizen M, Wessels LF and de Ridder J

    Computational Cancer Biology Group, Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands; Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands.

    The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have biases that may strongly affect research outcomes. To address this issue, we generated very large datasets consisting of [Formula: see text] to [Formula: see text] unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons, and the Mouse Mammary Tumor Virus (MMTV). We analyzed [Formula: see text] (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. More distinct preferences were observed for the two transposons, with PB showing remarkable resemblance to bias profiles of the Murine Leukemia Virus. Furthermore, we present a model where target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems, and defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are mainly observed at smaller scales, and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7%-33% putative false positive CISs, which are likely not the result of the oncogenic selection process. Moreover, results indicate that PB, compared to SB, is more suited to tag oncogenes.

    PLoS genetics 2014;10;4;e1004250

  • Genome-wide association meta-analysis of human longevity identifies a novel locus conferring survival beyond 90 years of age.

    Deelen J, Beekman M, Uh HW, Broer L, Ayers KL, Tan Q, Kamatani Y, Bennet AM, Tamm R, Trompet S, Guðbjartsson DF, Flachsbart F, Rose G, Viktorin A, Fischer K, Nygaard M, Cordell HJ, Crocco P, van den Akker EB, Böhringer S, Helmer Q, Nelson CP, Saunders GI, Alver M, Andersen-Ranberg K, Breen ME, van der Breggen R, Caliebe A, Capri M, Cevenini E, Collerton JC, Dato S, Davies K, Ford I, Gampe J, Garagnani P, de Geus EJ, Harrow J, van Heemst D, Heijmans BT, Heinsen FA, Hottenga JJ, Hofman A, Jeune B, Jonsson PV, Lathrop M, Lechner D, Martin-Ruiz C, Mcnerlan SE, Mihailov E, Montesanto A, Mooijaart SP, Murphy A, Nohr EA, Paternoster L, Postmus I, Rivadeneira F, Ross OA, Salvioli S, Sattar N, Schreiber S, Stefánsson H, Stott DJ, Tiemeier H, Uitterlinden AG, Westendorp RG, Willemsen G, Samani NJ, Galan P, Sørensen TI, Boomsma DI, Jukema JW, Rea IM, Passarino G, de Craen AJ, Christensen K, Nebel A, Stefánsson K, Metspalu A, Magnusson P, Blanché H, Christiansen L, Kirkwood TB, van Duijn CM, Franceschi C, Houwing-Duistermaat JJ and Slagboom PE

    Department of Molecular Epidemiology, Netherlands Consortium for Healthy Ageing.

    The genetic contribution to the variation in human lifespan is ∼25%. Despite the large number of identified disease-susceptibility loci, it is not known which loci influence population mortality. We performed a genome-wide association meta-analysis of 7729 long-lived individuals of European descent (≥85 years) and 16 121 younger controls (<65 years) followed by replication in an additional set of 13 060 long-lived individuals and 61 156 controls. In addition, we performed a subset analysis in cases aged ≥90 years. We observed genome-wide significant association with longevity, as reflected by survival to ages beyond 90 years, at a novel locus, rs2149954, on chromosome 5q33.3 (OR = 1.10, P = 1.74 × 10⁻⁸). We also confirmed association of rs4420638 on chromosome 19q13.32 (OR = 0.72, P = 3.40 × 10⁻³⁶), representing the TOMM40/APOE/APOC1 locus. In a prospective meta-analysis (n = 34 103), the minor allele of rs2149954 (T) on chromosome 5q33.3 associates with increased survival (HR = 0.95, P = 0.003). This allele has previously been reported to associate with low blood pressure in middle age. Interestingly, the minor allele (T) associates with decreased cardiovascular mortality risk, independent of blood pressure. We report on the first GWAS-identified longevity locus on chromosome 5q33.3 influencing survival in the general European population. The minor allele of this locus associates with low blood pressure in middle age, although the contribution of this allele to survival may be less dependent on blood pressure. Hence, the pleiotropic mechanisms by which this intragenic variation contributes to lifespan regulation have to be elucidated.

    Funded by: NHGRI NIH HHS: T32 HG002536

    Human molecular genetics 2014;23;16;4420-32

  • Minimal morphological criteria for defining bone marrow dysplasia: a basis for clinical implementation of WHO classification of myelodysplastic syndromes.

    Della Porta MG, Travaglino E, Boveri E, Ponzoni M, Malcovati L, Papaemmanuil E, Rigolin GM, Pascutto C, Croci G, Gianelli U, Milani R, Ambaglio I, Elena C, Ubezio M, Da Via' MC, Bono E, Pietra D, Quaglia F, Bastia R, Ferretti V, Cuneo A, Morra E, Campbell PJ, Orazi A, Invernizzi R and Cazzola M

    [1] Department of Hematology Oncology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy [2] Department of Internal Medicine, University of Pavia, Pavia, Italy.

    The World Health Organization classification of myelodysplastic syndromes (MDS) is based on morphological evaluation of marrow dysplasia. We performed a systematic review of cytological and histological data from 1150 patients with peripheral blood cytopenia. We analyzed the frequency and discriminant power of single morphological abnormalities. A score to define minimal morphological criteria associated with the presence of marrow dysplasia was developed. This score showed high sensitivity/specificity (>90%) and acceptable reproducibility, and was independently validated. The severity of granulocytic and megakaryocytic dysplasia significantly affected survival. A close association was found between ring sideroblasts and SF3B1 mutations, and between severe granulocytic dysplasia and mutation of ASXL1, RUNX1, TP53 and SRSF2 genes. In myeloid neoplasms with fibrosis, multilineage dysplasia, hypolobulated/multinucleated megakaryocytes and increased CD34+ progenitors in the absence of JAK2, MPL and CALR gene mutations were significantly associated with a myelodysplastic phenotype. In myeloid disorders with marrow hypoplasia, granulocytic and/or megakaryocytic dysplasia, increased CD34+ progenitors and chromosomal abnormalities are consistent with a diagnosis of MDS. The proposed morphological score may be useful to evaluate the presence of dysplasia in cases without a clearly objective myelodysplastic phenotype. The integration of cytological and histological parameters improves the identification of MDS cases among myeloid disorders with fibrosis and hypocellularity. Leukemia advance online publication, 17 June 2014; doi:10.1038/leu.2014.161.

    Leukemia 2014

  • Phylogenetic studies of transmission dynamics in generalized HIV epidemics: An essential tool where the burden is greatest?

    Dennis AM, Herbeck JT, Brown AL, Kellam P, de Oliveira T, Pillay D, Fraser C and Cohen MS

    (a) Division of Infectious Diseases, University of North Carolina at Chapel Hill, Chapel Hill, NC; (b) Department of Microbiology, University of Washington, Seattle, WA; (c) Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK; (d) Wellcome Trust Sanger Institute, Cambridge, UK; (e) Division of Infection and Immunity, University College London, London, UK; (f) Wellcome Trust-Africa Centre for Health and Population Studies, University of KwaZulu-Natal, ZA; (g) Department of Infectious Disease Epidemiology, Imperial College London, London, UK.

    Efficient and effective HIV prevention measures for generalized epidemics in sub-Saharan Africa have not yet been validated at the population level. Design and impact evaluation of such measures require a fine-scale understanding of local HIV transmission dynamics. The novel tools of HIV phylogenetics and molecular epidemiology may elucidate these transmission dynamics. Such methods have been incorporated into studies of concentrated HIV epidemics to identify proximate and determinant traits associated with ongoing transmission. However, applying similar phylogenetic analyses to generalized epidemics, including the design and evaluation of prevention trials, presents additional challenges. Here we review the scope of these methods and present examples of their use in concentrated epidemics in the context of prevention. Next, we describe the current uses for phylogenetics in generalized epidemics, and discuss their promise for elucidating transmission patterns and informing prevention trials. Finally, we review logistic and technical challenges inherent to large-scale molecular epidemiological studies of generalized epidemics, and suggest potential solutions.

    Journal of acquired immune deficiency syndromes (1999) 2014

  • Mitochondrial Genome Sequencing in Mesolithic North East Europe Unearths a New Sub-Clade within the Broadly Distributed Human Haplogroup C1.

    Der Sarkissian C, Brotherton P, Balanovsky O, Templeton JE, Llamas B, Soubrier J, Moiseyev V, Khartanovich V, Cooper A, Haak W and Genographic Consortium

    Australian Centre for Ancient DNA, School of Earth and Environmental Sciences, University of Adelaide, Adelaide, South Australia, Australia.

    The human mitochondrial haplogroup C1 has a broad global distribution but is extremely rare in Europe today. Recent ancient DNA evidence has demonstrated its presence in European Mesolithic individuals. Three individuals from the 7,500 year old Mesolithic site of Yuzhnyy Oleni Ostrov, Western Russia, could be assigned to haplogroup C1 based on mitochondrial hypervariable region I sequences. However, hypervariable region I data alone could not provide enough resolution to establish the phylogenetic relationship of these Mesolithic haplotypes with haplogroup C1 mitochondrial DNA sequences found today in populations of Europe, Asia and the Americas. In order to obtain high-resolution data and shed light on the origin of this European Mesolithic C1 haplotype, we target-enriched and sequenced the complete mitochondrial genome of one Yuzhnyy Oleni Ostrov C1 individual. The updated phylogeny of C1 haplogroups indicated that the Yuzhnyy Oleni Ostrov haplotype represents a new distinct clade, provisionally coined "C1f". We show that all three C1 carriers of Yuzhnyy Oleni Ostrov belong to this clade. No haplotype closely related to the C1f sequence could be found in the large current database of ancient and present-day mitochondrial genomes. Hence, we have discovered past human mitochondrial diversity that has not been observed in modern-day populations so far. The lack of positive matches in modern populations may be explained by under-sampling of rare modern C1 carriers or by demographic processes, population extinction or replacement, that may have impacted on populations of Northeast Europe since prehistoric times.

    PloS one 2014;9;2;e87612

  • Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility.

    DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium

    To further understanding of the genetic basis of type 2 diabetes (T2D) susceptibility, we aggregated published meta-analyses of genome-wide association studies (GWAS), including 26,488 cases and 83,964 controls of European, east Asian, south Asian and Mexican and Mexican American ancestry. We observed a significant excess in the directional consistency of T2D risk alleles across ancestry groups, even at SNPs demonstrating only weak evidence of association. By following up the strongest signals of association from the trans-ethnic meta-analysis in an additional 21,491 cases and 55,647 controls of European ancestry, we identified seven new T2D susceptibility loci. Furthermore, we observed considerable improvements in the fine-mapping resolution of common variant association signals at several T2D susceptibility loci. These observations highlight the benefits of trans-ethnic GWAS for the discovery and characterization of complex trait loci and emphasize an exciting opportunity to extend insight into the genetic architecture and pathogenesis of human diseases across populations of diverse ancestry.

    Nature genetics 2014

  • DNA methylation and body-mass index: a genome-wide analysis.

    Dick KJ, Nelson CP, Tsaprouni L, Sandling JK, Aïssi D, Wahl S, Meduri E, Morange PE, Gagnon F, Grallert H, Waldenberger M, Peters A, Erdmann J, Hengstenberg C, Cambien F, Goodall AH, Ouwehand WH, Schunkert H, Thompson JR, Spector TD, Gieger C, Trégouët DA, Deloukas P and Samani NJ

    Department of Cardiovascular Sciences, University of Leicester, Leicester, UK; National Institute for Health Research Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester, UK.

    Background: Obesity is a major health problem that is determined by interactions between lifestyle and environmental and genetic factors. Although associations between several genetic variants and body-mass index (BMI) have been identified, little is known about epigenetic changes related to BMI. We undertook a genome-wide analysis of methylation at CpG sites in relation to BMI.

    Methods: 479 individuals of European origin recruited by the Cardiogenics Consortium formed our discovery cohort. We typed their whole-blood DNA with the Infinium HumanMethylation450 array. After quality control, methylation levels were tested for association with BMI. Methylation sites showing an association with BMI at a false discovery rate q value of 0·05 or less were taken forward for replication in a cohort of 339 unrelated white patients of northern European origin from the MARTHA cohort. Sites that remained significant in this primary replication cohort were tested in a second replication cohort of 1789 white patients of European origin from the KORA cohort. We examined whether methylation levels at identified sites also showed an association with BMI in DNA from adipose tissue (n=635) and skin (n=395) obtained from white female individuals participating in the MuTHER study. Finally, we examined the association of methylation at BMI-associated sites with genetic variants and with gene expression.

    Findings: 20 individuals from the discovery cohort were excluded from analyses after quality-control checks, leaving 459 participants. After adjustment for covariates, we identified an association (q value ≤0·05) between methylation at five probes across three different genes and BMI. The associations with three of these probes (cg22891070, cg27146050, and cg16672562, all of which are in intron 1 of HIF3A) were confirmed in both the primary and second replication cohorts. For every 0·1 increase in methylation β value at cg22891070, BMI was 3·6% (95% CI 2·4-4·9) higher in the discovery cohort, 2·7% (1·2-4·2) higher in the primary replication cohort, and 0·8% (0·2-1·4) higher in the second replication cohort. For the MuTHER cohort, methylation at cg22891070 was associated with BMI in adipose tissue (p=1·72 × 10⁻⁵) but not in skin (p=0·882). We observed a significant inverse correlation (p=0·005) between methylation at cg22891070 and expression of one HIF3A gene-expression probe in adipose tissue. Two single nucleotide polymorphisms (rs8102595 and rs3826795) had independent associations with methylation at cg22891070 in all cohorts. However, these single nucleotide polymorphisms were not significantly associated with BMI.

    Interpretation: Increased BMI in adults of European origin is associated with increased methylation at the HIF3A locus in blood cells and in adipose tissue. Our findings suggest that perturbation of hypoxia inducible transcription factor pathways could have an important role in the response to increased weight in people.

    Funding: The European Commission, National Institute for Health Research, British Heart Foundation, and Wellcome Trust.

    Lancet 2014

  • Open-source electronic data capture system offered increased accuracy and cost-effectiveness compared with paper methods in Africa.

    Dillon DG, Pirie F, Pomilla C, Sandhu MS, Motala AA, Young EH and African Partnership for Chronic Disease Research (APCDR)

    International Health Research Group, Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge, CB1 8RN, United Kingdom; Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, United Kingdom.

    Objectives: Existing electronic data capture options are often financially unfeasible in resource-poor settings or difficult to support technically in the field. To help facilitate large-scale multicenter studies in sub-Saharan Africa, the African Partnership for Chronic Disease Research (APCDR) has developed an open-source electronic questionnaire (EQ).

    To assess its relative validity, we compared the EQ against traditional pen-and-paper methods using 200 randomized interviews conducted in an ongoing type 2 diabetes case-control study in South Africa.

    Results: During its 3-month validation, the EQ had a lower frequency of errors (EQ, 0.17 errors per 100 questions; paper, 0.73 errors per 100 questions; P-value ≤0.001), and a lower monetary cost per correctly entered question, compared with the pen-and-paper method. We found no marked difference in the average duration of the interview between methods (EQ, 5.4 minutes; paper, 5.6 minutes).

    Conclusion: This validation study suggests that the EQ may offer increased accuracy, similar interview duration, and increased cost-effectiveness compared with paper-based data collection methods. The APCDR EQ software is freely available (

    Journal of clinical epidemiology 2014

  • Impact of type 2 diabetes susceptibility variants on quantitative glycemic traits reveals mechanistic heterogeneity.

    Dimas AS, Lagou V, Barker A, Knowles JW, Mägi R, Hivert MF, Benazzo A, Rybin D, Jackson AU, Stringham HM, Song C, Fischer-Rosinsky A, Boesgaard TW, Grarup N, Abbasi FA, Assimes TL, Hao K, Yang X, Lecoeur C, Barroso I, Bonnycastle LL, Böttcher Y, Bumpstead S, Chines PS, Erdos MR, Graessler J, Kovacs P, Morken MA, Narisu N, Payne F, Stancakova A, Swift AJ, Tönjes A, Bornstein SR, Cauchi S, Froguel P, Meyre D, Schwarz PE, Häring HU, Smith U, Boehnke M, Bergman RN, Collins FS, Mohlke KL, Tuomilehto J, Quertemous T, Lind L, Hansen T, Pedersen O, Walker M, Pfeiffer AF, Spranger J, Stumvoll M, Meigs JB, Wareham NJ, Kuusisto J, Laakso M, Langenberg C, Dupuis J, Watanabe RM, Florez JC, Ingelsson E, McCarthy MI, Prokopenko I and MAGIC Investigators

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, U.K.; Alexander Fleming, Biomedical Sciences Research Center, Vari, Athens, Greece.

    Patients with established type 2 diabetes display both β-cell dysfunction and insulin resistance. To define fundamental processes leading to the diabetic state, we examined the relationship between type 2 diabetes risk variants at 37 established susceptibility loci, and indices of proinsulin processing, insulin secretion, and insulin sensitivity. We included data from up to 58,614 nondiabetic subjects with basal measures and 17,327 with dynamic measures. We used additive genetic models with adjustment for sex, age, and BMI, followed by fixed-effects, inverse-variance meta-analyses. Cluster analyses grouped risk loci into five major categories based on their relationship to these continuous glycemic phenotypes. The first cluster (PPARG, KLF14, IRS1, GCKR) was characterized by primary effects on insulin sensitivity. The second cluster (MTNR1B, GCK) featured risk alleles associated with reduced insulin secretion and fasting hyperglycemia. ARAP1 constituted a third cluster characterized by defects in insulin processing. A fourth cluster (TCF7L2, SLC30A8, HHEX/IDE, CDKAL1, CDKN2A/2B) was defined by loci influencing insulin processing and secretion without a detectable change in fasting glucose levels. The final group contained 20 risk loci with no clear-cut associations to continuous glycemic traits. By assembling extensive data on continuous glycemic traits, we have exposed the diverse mechanisms whereby type 2 diabetes risk variants impact disease predisposition.

    Funded by: NCRR NIH HHS: 2 M01 RR000070, RR01066; NHGRI NIH HHS: 1 Z01 HG000024; NHLBI NIH HHS: N01-HC-25195, N02-HL-6-4278; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK-062370, DK-069922, DK-072193, K24 DK-080140, K24 DK080140, R01 DK-078616, R01 DK062370, R01 DK072193, R01 DK078616, R01 DK093757, R56 DK062370, U01 DK062370; Wellcome Trust: 090532, 098381

    Diabetes 2014;63;6;2158-71

  • Estimating telomere length from whole genome sequence data.

    Ding Z, Mangino M, Aviv A, UK10K Consortium, Spector T and Durbin R

    Genome Informatics, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Telomeres play a key role in replicative ageing and undergo age-dependent attrition in vivo. Here, we report a novel method, TelSeq, to measure average telomere length from whole genome or exome shotgun sequence data. In 260 leukocyte samples, we show that TelSeq results correlate with Southern blot measurements of the mean length of terminal restriction fragments (mTRFs) and display age-dependent attrition comparable to that shown by mTRFs.

    Nucleic acids research 2014

  • Epidermal Wnt/β-catenin signaling regulates adipocyte differentiation via secretion of adipogenic factors.

    Donati G, Proserpio V, Lichtenberger BM, Natsuga K, Sinclair R, Fujiwara H and Watt FM

    Centre for Stem Cells and Regenerative Medicine, King's College London, London SE1 9RT, United Kingdom.

    It has long been recognized that the hair follicle growth cycle and oscillation in the thickness of the underlying adipocyte layer are synchronized. Although factors secreted by adipocytes are known to regulate the hair growth cycle, it is unclear whether the epidermis can regulate adipogenesis. We show that inhibition of epidermal Wnt/β-catenin signaling reduced adipocyte differentiation in developing and adult mouse dermis. Conversely, ectopic activation of epidermal Wnt signaling promoted adipocyte differentiation and hair growth. When the Wnt pathway was activated in the embryonic epidermis, there was a dramatic and premature increase in adipocytes in the absence of hair follicle formation, demonstrating that Wnt activation, rather than mature hair follicles, is required for adipocyte generation. Epidermal and dermal gene expression profiling identified keratinocyte-derived adipogenic factors that are induced by β-catenin activation. Wnt/β-catenin signaling-dependent secreted factors from keratinocytes promoted adipocyte differentiation in vitro, and we identified ligands for the bone morphogenetic protein and insulin pathways as proadipogenic factors. Our results indicate epidermal Wnt/β-catenin as a critical initiator of a signaling cascade that induces adipogenesis and highlight the role of epidermal Wnt signaling in synchronizing adipocyte differentiation with the hair growth cycle.

    Proceedings of the National Academy of Sciences of the United States of America 2014;111;15;E1501-9

  • Novel determinants of antibiotic resistance: identification of mutated loci in highly methicillin-resistant subpopulations of methicillin-resistant Staphylococcus aureus.

    Dordel J, Kim C, Chung M, Pardos de la Gándara M, Holden MT, Parkhill J, de Lencastre H, Bentley SD and Tomasz A

    We identified mutated genes in highly resistant subpopulations of methicillin-resistant Staphylococcus aureus (MRSA) that are most likely responsible for the historic failure of the β-lactam family of antibiotics as therapeutic agents against these important pathogens. Such subpopulations are produced during growth of most clinical MRSA strains, including the four historically early MRSA isolates studied here. Chromosomal DNA was prepared from the highly resistant cells along with DNA from the majority of cells (poorly resistant cells) followed by full genome sequencing. In the highly resistant cells, mutations were identified in 3 intergenic sequences and 27 genes representing a wide range of functional categories. A common feature of these mutations appears to be their capacity to induce high-level β-lactam resistance and increased amounts of the resistance protein PBP2A in the bacteria. The observations fit a recently described model in which the ultimate controlling factor of the phenotypic expression of β-lactam resistance in MRSA is a RelA-mediated stringent response. IMPORTANCE It has been well established that the level of antibiotic resistance (i.e., minimum concentration of a β-lactam antibiotic needed to inhibit growth) of a methicillin-resistant Staphylococcus aureus (MRSA) strain depends on the transcription and translation of the resistance protein PBP2A. Here we describe mutated loci in an additional novel set of genetic determinants that appear to be essential for the unusually high resistance levels typical of subpopulations of staphylococci that are produced with unique low frequency in most MRSA clinical isolates. We propose that mutations in these determinants can trigger induction of the stringent stress response which was recently shown to cause increased transcription/translation of the resistance protein PBP2A in parallel with the increased level of resistance.

    Funded by: NCATS NIH HHS: UL1 TR000043-07S1; NIAID NIH HHS: 2 RO1 AI457838-14; Wellcome Trust: 098051

    mBio 2014;5;2;e01000

  • Neutralization of Plasmodium falciparum Merozoites by Antibodies against PfRH5.

    Douglas AD, Williams AR, Knuepfer E, Illingworth JJ, Furze JM, Crosnier C, Choudhary P, Bustamante LY, Zakutansky SE, Awuah DK, Alanine DG, Theron M, Worth A, Shimkets R, Rayner JC, Holder AA, Wright GJ and Draper SJ

    Jenner Institute, University of Oxford, Oxford OX3 7DQ, United Kingdom.

    There is intense interest in induction and characterization of strain-transcending neutralizing Ab against antigenically variable human pathogens. We have recently identified the human malaria parasite Plasmodium falciparum reticulocyte-binding protein homolog 5 (PfRH5) as a target of broadly neutralizing Abs, but there is little information regarding the functional mechanism(s) of Ab-mediated neutralization. In this study, we report that vaccine-induced polyclonal anti-PfRH5 Abs inhibit the tight attachment of merozoites to erythrocytes and are capable of blocking the interaction of PfRH5 with its receptor basigin. Furthermore, by developing anti-PfRH5 mAbs, we provide evidence of the following: 1) the ability to block the PfRH5-basigin interaction in vitro is predictive of functional activity, but absence of blockade does not predict absence of functional activity; 2) neutralizing mAbs bind spatially related epitopes on the folded protein, involving at least two defined regions of the PfRH5 primary sequence; 3) a brief exposure window of PfRH5 is likely to necessitate rapid binding of Ab to neutralize parasites; and 4) intact bivalent IgG contributes to but is not necessary for parasite neutralization. These data provide important insight into the mechanisms of broadly neutralizing anti-malaria Abs and further encourage anti-PfRH5-based malaria prevention efforts.

    Journal of immunology (Baltimore, Md. : 1950) 2014;192;1;245-58

  • A strategy to identify dominant point mutant modifiers of a quantitative trait.

    Dove WF, Shedlovsky A, Clipson L, Amos-Landgraf JM, Halberg RB, Krentz KJ, Boehm FJ, Newton MA, Adams DJ and Keane TM

    McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin-Madison, Madison, Wisconsin 53706; Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin 53706.

    A central goal in the analysis of complex traits is to identify genes that modify a phenotype. Modifiers of a cancer phenotype may act either intrinsically or extrinsically on the salient cell lineage. Germline point mutagenesis by ethylnitrosourea can provide alleles for a gene of interest that include loss-, gain-, or alteration-of-function. Unlike strain polymorphisms, point mutations with heterozygous quantitative phenotypes are detectable in both essential and nonessential genes and are unlinked from other variants that might confound their identification and analysis. This report analyzes strategies seeking quantitative mutational modifiers of Apc(Min) in the mouse. To identify a quantitative modifier of a phenotype of interest, a cluster of test progeny is needed. The cluster size can be increased as necessary for statistical significance if the founder is a male whose sperm is cryopreserved. A second critical element in this identification is a mapping panel free of polymorphic modifiers of the phenotype, to enable low-resolution mapping followed by targeted resequencing to identify the causative mutation. Here, we describe the development of a panel of six "isogenic mapping partner lines" for C57BL/6J, carrying single-nucleotide markers introduced by mutagenesis. One such derivative, B6.SNVg, shown to be phenotypically neutral in combination with Apc(Min), is an appropriate mapping partner to locate induced mutant modifiers of the Apc(Min) phenotype. The evolved strategy can complement four current major initiatives in the genetic analysis of complex systems: the Genome-wide Association Study; the Collaborative Cross; the Knockout Mouse Project; and The Cancer Genome Atlas.

    G3 (Bethesda, Md.) 2014;4;6;1113-21

  • Efficient haplotype matching and storage using the Positional Burrows-Wheeler Transform (PBWT).

    Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.

    Motivation: Over the last few years, methods based on suffix arrays using the Burrows-Wheeler Transform have been widely used for DNA sequence read matching and assembly. These provide very fast search algorithms, linear in the search pattern size, on a highly compressible representation of the data set being searched. Meanwhile, algorithmic development for genotype data has concentrated on statistical methods for phasing and imputation, based on probabilistic matching to hidden Markov model representations of the reference data, which while powerful are much less computationally efficient. Here I develop a theory of haplotype matching using suffix array ideas, which should scale to much larger data sets than those currently handled by genotype algorithms. Results: Given M sequences with N bi-allelic variable sites, I give an O(NM) algorithm to derive a representation of the data based on positional prefix arrays, which I term the Positional Burrows-Wheeler Transform (PBWT). On large data sets, run-length encoding of this representation yields output more than a hundred times smaller than gzip of the raw data. Using this representation I show how to find all maximal haplotype matches within the set in O(NM) time rather than the O(NM^2) expected from naive pairwise comparison, and provide a fast algorithm, empirically independent of M given sufficient memory for indexes, to find maximal matches between a new sequence and the set. The discussion includes some proposals about how these approaches could be used for imputation and phasing.

    Bioinformatics (Oxford, England) 2014
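
    The positional prefix arrays described in the PBWT abstract above can be illustrated with a minimal Python sketch. It assumes haplotypes are supplied as equal-length lists of 0/1 alleles; the function name `pbwt_columns` and the toy input are hypothetical, and this is not the author's implementation. At each site the current ordering is stably partitioned into 0-carriers followed by 1-carriers, which keeps the sequences sorted by their reversed prefixes in O(NM) total time.

```python
from typing import List, Tuple


def pbwt_columns(haps: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Build positional prefix arrays and the permuted allele columns.

    haps[i][k] is the 0/1 allele of haplotype i at site k.
    orders[k] lists haplotype indices sorted by their reversed prefix
    (sites k-1, k-2, ..., 0); columns[k] is site k read in that order.
    """
    m = len(haps)
    n = len(haps[0]) if m else 0
    order = list(range(m))          # before any site is seen, keep input order
    orders: List[List[int]] = []
    columns: List[List[int]] = []
    for k in range(n):
        orders.append(order)
        columns.append([haps[i][k] for i in order])
        # Stable partition: 0-carriers first, then 1-carriers.
        zeros = [i for i in order if haps[i][k] == 0]
        ones = [i for i in order if haps[i][k] == 1]
        order = zeros + ones
    return orders, columns


# Toy input (hypothetical): four haplotypes, three bi-allelic sites.
toy = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
orders, columns = pbwt_columns(toy)
```

    Haplotypes that share a long match ending at a site sit next to each other in that site's ordering, so the permuted columns tend to contain long runs of identical alleles. That adjacency is what makes run-length encoding effective on real data.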

  • The peculiar epidemiology of dracunculiasis in Chad.

    Eberhard ML, Ruiz-Tiben E, Hopkins DR, Farrell C, Toe F, Weiss A, Withers PC, Jenks MH, Thiele EA, Cotton JA, Hance Z, Holroyd N, Cama VA, Tahir MA and Mounda T

    Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, Georgia; The Carter Center, Atlanta, Georgia; The Carter Center, N'Djamena, Chad; LifeSource Biomedical, Centreville, Virginia; The Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Ministry of Public Health, N'Djamena, Chad.

    Dracunculiasis was rediscovered in Chad in 2010 after an apparent absence of 10 years. In April 2012 active village-based surveillance was initiated to determine where, when, and how transmission of the disease was occurring, and to implement interventions to interrupt it. The current epidemiologic pattern of the disease in Chad is unlike that seen previously in Chad or other endemic countries, i.e., no clustering of cases by village or association with a common water source, the average number of worms per person was small, and a large number of dogs were found to be infected. Molecular sequencing suggests these infections were all caused by Dracunculus medinensis. It appears that the infection in dogs is serving as the major driving force sustaining transmission in Chad, that an aberrant life cycle involving a paratenic host common to people and dogs is occurring, and that the cases in humans are sporadic and incidental.

    Funded by: Wellcome Trust: 098051

    The American journal of tropical medicine and hygiene 2014;90;1;61-70

  • CYP6 P450 Enzymes and ACE-1 Duplication Produce Extreme and Multiple Insecticide Resistance in the Malaria Mosquito Anopheles gambiae.

    Edi CV, Djogbénou L, Jenkins AM, Regna K, Muskavitch MA, Poupardin R, Jones CM, Essandoh J, Kétoh GK, Paine MJ, Koudou BG, Donnelly MJ, Ranson H and Weetman D

    Vector Biology Department, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, United Kingdom; Centre Suisse de Recherches Scientifiques en Côte d'Ivoire, Abidjan, Cote d'Ivoire.

    Malaria control relies heavily on pyrethroid insecticides, to which susceptibility is declining in Anopheles mosquitoes. To combat pyrethroid resistance, application of alternative insecticides is advocated for indoor residual spraying (IRS), and carbamates are increasingly important. Emergence of a very strong carbamate resistance phenotype in Anopheles gambiae from Tiassalé, Côte d'Ivoire, West Africa, is therefore a potentially major operational challenge, particularly because these malaria vectors now exhibit resistance to multiple insecticide classes. We investigated the genetic basis of resistance to the most commonly-applied carbamate, bendiocarb, in An. gambiae from Tiassalé. Geographically-replicated whole genome microarray experiments identified elevated P450 enzyme expression as associated with bendiocarb resistance, most notably genes from the CYP6 subfamily. P450s were further implicated in resistance phenotypes by induction of significantly elevated mortality to bendiocarb by the synergist piperonyl butoxide (PBO), which also enhanced the action of pyrethroids and an organophosphate. CYP6P3 and especially CYP6M2 produced bendiocarb resistance via transgenic expression in Drosophila in addition to pyrethroid resistance for both genes, and DDT resistance for CYP6M2 expression. CYP6M2 can thus cause resistance to three distinct classes of insecticide although the biochemical mechanism for carbamates is unclear because, in contrast to CYP6P3, recombinant CYP6M2 did not metabolise bendiocarb in vitro. Strongly bendiocarb resistant mosquitoes also displayed elevated expression of the acetylcholinesterase ACE-1 gene, arising at least in part from gene duplication, which confers a survival advantage to carriers of additional copies of resistant ACE-1 G119S alleles. Our results are alarming for vector-based malaria control. Extreme carbamate resistance in Tiassalé An. gambiae results from coupling of over-expressed target site allelic variants with heightened CYP6 P450 expression, which also provides resistance across contrasting insecticides. Mosquito populations displaying such a diverse basis of extreme and cross-resistance are likely to be unresponsive to standard insecticide resistance management practices.

    PLoS genetics 2014;10;3;e1004236

  • Geographic population structure analysis of worldwide human populations infers their biogeographical origins.

    Elhaik E, Tatarinova T, Chebotarev D, Piras IS, Maria Calò C, De Montis A, Atzori M, Marini M, Tofanelli S, Francalacci P, Pagani L, Tyler-Smith C, Xue Y, Cucca F, Schurr TG, Gaieski JB, Melendez C, Vilar MG, Owings AC, Gómez R, Fujita R, Santos FR, Comas D, Balanovsky O, Balanovska E, Zalloua P, Soodyall H, Pitchappan R, Ganeshprasad A, Hammer M, Matisoo-Smith L, Wells RS and Genographic Consortium

    [1] Department of Animal and Plant Sciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK. [2] Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, Maryland 21205, USA.

    The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS's accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have ramifications for genetic ancestry testing.

    Nature communications 2014;5;3513

  • Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes.

    Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, Vazquez J, Valencia A and Tress ML

    Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Melchor Fernández Almagro, 3, 28029, Madrid, Spain.

    Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein coding potential is the detection of cellular protein expression through peptide mass spectrometry experiments. Here we mapped peptides detected in 7 large-scale proteomics studies to almost 60% of the protein coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for more than 96% of genes that evolved before bilateria. At the opposite end of the scale we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2,001 potentially non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes, and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein coding gene catalogue should be revised as part of the ongoing human genome annotation effort.

    Human molecular genetics 2014

  • Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.

    Fairfax BP, Humburg P, Makino S, Naranbhai V, Wong D, Lau E, Jostins L, Plant K, Andrews R, McGee C and Knight JC

    Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

    To systematically investigate the impact of immune stimulation upon regulatory variant activity, we exposed primary monocytes from 432 healthy Europeans to interferon-γ (IFN-γ) or differing durations of lipopolysaccharide and mapped expression quantitative trait loci (eQTLs). More than half of cis-eQTLs identified, involving hundreds of genes and associated pathways, are detected specifically in stimulated monocytes. Induced innate immune activity reveals multiple master regulatory trans-eQTLs including the major histocompatibility complex (MHC), coding variants altering enzyme and receptor function, an IFN-β cytokine network showing temporal specificity, and an interferon regulatory factor 2 (IRF2) transcription factor-modulated network. Induced eQTL are significantly enriched for genome-wide association study loci, identifying context-specific associations to putative causal genes including CARD9, ATM, and IRF8. Thus, applying pathophysiologically relevant immune stimuli assists resolution of functional genetic variants.

    Funded by: Medical Research Council: 98082; Wellcome Trust: 074318, 088891, 090532/Z/09/Z

    Science (New York, N.Y.) 2014;343;6175;1246949

  • Low copy number of the salivary amylase gene predisposes to obesity.

    Falchi M, El-Sayed Moustafa JS, Takousis P, Pesce F, Bonnefond A, Andersson-Assarsson JC, Sudmant PH, Dorajoo R, Al-Shafai MN, Bottolo L, Ozdemir E, So HC, Davies RW, Patrice A, Dent R, Mangino M, Hysi PG, Dechaume A, Huyvaert M, Skinner J, Pigeyre M, Caiazzo R, Raverdy V, Vaillant E, Field S, Balkau B, Marre M, Visvikis-Siest S, Weill J, Poulain-Godefroy O, Jacobson P, Sjostrom L, Hammond CJ, Deloukas P, Sham PC, McPherson R, Lee J, Tai ES, Sladek R, Carlsson LM, Walley A, Eichler EE, Pattou F, Spector TD and Froguel P

    Department of Genomics of Common Disease, Imperial College London, London, UK.

    Common multi-allelic copy number variants (CNVs) appear enriched for phenotypic associations compared to their biallelic counterparts. Here we investigated the influence of gene dosage effects on adiposity through a CNV association study of gene expression levels in adipose tissue. We identified significant association of a multi-allelic CNV encompassing the salivary amylase gene (AMY1) with body mass index (BMI) and obesity, and we replicated this finding in 6,200 subjects. Increased AMY1 copy number was positively associated with both amylase gene expression (P = 2.31 × 10^-14) and serum enzyme levels (P < 2.20 × 10^-16), whereas reduced AMY1 copy number was associated with increased BMI (change in BMI per estimated copy = -0.15 (0.02) kg/m^2; P = 6.93 × 10^-10) and obesity risk (odds ratio (OR) per estimated copy = 1.19, 95% confidence interval (CI) = 1.13-1.26; P = 1.46 × 10^-10). The OR value of 1.19 per copy of AMY1 translates into about an eightfold difference in risk of obesity between subjects in the top (copy number > 9) and bottom (copy number < 4) 10% of the copy number distribution. Our study provides a first genetic link between carbohydrate metabolism and BMI and demonstrates the power of integrated genomic approaches beyond genome-wide association studies.

    Nature genetics 2014
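
    The relation between the per-copy odds ratio and the roughly eightfold figure quoted in the AMY1 abstract above can be checked with a short calculation. The sketch below assumes the per-copy effect compounds multiplicatively (a log-linear model), which is the usual reading of a per-unit odds ratio but is an assumption here rather than the authors' stated calculation; the function names are hypothetical.

```python
import math

OR_PER_COPY = 1.19  # odds ratio per estimated AMY1 copy, as quoted in the abstract


def odds_ratio_for_copy_difference(copies: float) -> float:
    """Odds ratio implied by a difference of `copies` copies (log-linear assumption)."""
    return OR_PER_COPY ** copies


def copies_for_fold_change(fold: float) -> float:
    """Copy-number difference needed to reach a given fold change in odds."""
    return math.log(fold) / math.log(OR_PER_COPY)


copies_needed = copies_for_fold_change(8.0)          # just under 12 copies
or_at_12_copies = odds_ratio_for_copy_difference(12)  # slightly above 8
```

    Under this assumption an eightfold difference corresponds to a gap of about 12 copies between the two groups being compared.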

  • Current status and new features of the Consensus Coding Sequence database.

    Farrell CM, O'Leary NA, Harte RA, Loveland JE, Wilming LG, Wallin C, Diekhans M, Barrell D, Searle SM, Aken B, Hiatt SM, Frankish A, Suner MM, Rajput B, Steward CA, Brown GR, Bennett R, Murphy M, Wu W, Kay MP, Hart J, Rajan J, Weber J, Snow C, Riddick LD, Hunt T, Webb D, Thomas M, Tamez P, Rangwala SH, McGarvey KM, Pujar S, Shkeda A, Mudge JM, Gonzalez JM, Gilbert JG, Trevanion SJ, Baertsch R, Harrow JL, Hubbard T, Ostell JM, Haussler D and Pruitt KD

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA, Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

    The Consensus Coding Sequence (CCDS) project is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

    Nucleic acids research 2014;42;1;D865-72

  • Workshops: a great way to enhance and supplement a degree.

    Fatumo S, Shome S and Macintyre G

    H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria; International Health Research Group, Dept of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    As part of the International Society for Computational Biology Student Council (ISCB-SC), Regional Student Groups (RSGs) have helped organise workshops in the emerging fields of bioinformatics and computational biology. Workshops are a great way for students to gain hands-on experience and rapidly acquire knowledge in advanced research topics where curriculum-based education is yet to be developed. RSG workshops have improved dissemination of knowledge of the latest bioinformatics techniques and resources among student communities and young scientists, especially in developing nations. This article highlights some of the benefits and challenges encountered while running RSG workshops. Examples cover a variety of subjects, including introductory bioinformatics and advanced bioinformatics, as well as soft skills such as networking, career development, and socializing. The collective experience condensed in this article is a useful starting point for students wishing to organise their own tailor-made workshops.

    PLoS computational biology 2014;10;2;e1003497

  • Computational biology and bioinformatics in Nigeria.

    Fatumo SA, Adoga MP, Ojo OO, Oluwagbemi O, Adeoye T, Ewejobi I, Adebiyi M, Adebiyi E, Bewaji C and Nashiru O

    H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria; Human Genetics Department, Wellcome Trust Sanger Institute, Cambridge, United Kingdom; International Health Research Group, Department of Public Health & Primary Care, University of Cambridge, Cambridge, United Kingdom.

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

    PLoS computational biology 2014;10;4;e1003516

  • Pfam: the protein families database.

    Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J and Punta M

    HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA 20147 USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, OX1 3QX, UK, Institute of Biotechnology and Department of Biological and Environmental Sciences, University of Helsinki, PO Box 56 (Viikinkaari 5), 00014 Helsinki, Finland and Stockholm Bioinformatics Center, Swedish eScience Research Center, Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, PO Box 1031, SE-17121 Solna, Sweden.

    Pfam, available via servers in the UK and the USA, is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

    Nucleic acids research 2014;42;1;D222-30

  • High-Definition Reconstruction of Clonal Composition in Cancer.

    Fischer A, Vázquez-García I, Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The extensive genetic heterogeneity of cancers can greatly affect therapy success due to the existence of subclonal mutations conferring resistance. However, the characterization of subclones in mixed-cell populations is computationally challenging due to the short length of sequence reads that are generated by current sequencing technologies. Here, we report cloneHD, a probabilistic algorithm for the performance of subclone reconstruction from data generated by high-throughput DNA sequencing: read depth, B-allele counts at germline heterozygous loci, and somatic mutation counts. The algorithm can exploit the added information present in correlated longitudinal or multiregion samples and takes into account correlations along genomes caused by events such as copy-number changes. We apply cloneHD to two case studies: a breast cancer sample and time-resolved samples of chronic lymphocytic leukemia, where we demonstrate that monitoring the response of a patient to therapy regimens is feasible. Our work provides new opportunities for tracking cancer development.

    Cell reports 2014

  • Biomarker profiling by nuclear magnetic resonance spectroscopy for the prediction of all-cause mortality: an observational study of 17,345 persons.

    Fischer K, Kettunen J, Würtz P, Haller T, Havulinna AS, Kangas AJ, Soininen P, Esko T, Tammesoo ML, Mägi R, Smit S, Palotie A, Ripatti S, Salomaa V, Ala-Korpela M, Perola M and Metspalu A

    The Estonian Genome Center, University of Tartu, Tartu, Estonia.

    Background: Early identification of ambulatory persons at high short-term risk of death could benefit targeted prevention. To identify biomarkers for all-cause mortality and enhance risk prediction, we conducted high-throughput profiling of blood specimens in two large population-based cohorts.

    106 candidate biomarkers were quantified by nuclear magnetic resonance spectroscopy of non-fasting plasma samples from a random subset of the Estonian Biobank (n = 9,842; age range 18-103 y; 508 deaths during a median of 5.4 y of follow-up). Biomarkers for all-cause mortality were examined using stepwise proportional hazards models. Significant biomarkers were validated and incremental predictive utility assessed in a population-based cohort from Finland (n = 7,503; 176 deaths during 5 y of follow-up). Four circulating biomarkers predicted the risk of all-cause mortality among participants from the Estonian Biobank after adjusting for conventional risk factors: alpha-1-acid glycoprotein (hazard ratio [HR] 1.67 per 1-standard deviation increment, 95% CI 1.53-1.82, p = 5×10^-31), albumin (HR 0.70, 95% CI 0.65-0.76, p = 2×10^-18), very-low-density lipoprotein particle size (HR 0.69, 95% CI 0.62-0.77, p = 3×10^-12), and citrate (HR 1.33, 95% CI 1.21-1.45, p = 5×10^-10). All four biomarkers were predictive of cardiovascular mortality, as well as death from cancer and other nonvascular diseases. One in five participants in the Estonian Biobank cohort with a biomarker summary score within the highest percentile died during the first year of follow-up, indicating prominent systemic reflections of frailty. The biomarker associations all replicated in the Finnish validation cohort. Including the four biomarkers in a risk prediction score improved risk assessment for 5-y mortality (increase in C-statistics 0.031, p = 0.01; continuous reclassification improvement 26.3%, p = 0.001).

    Conclusions: Biomarker associations with cardiovascular, nonvascular, and cancer mortality suggest novel systemic connectivities across seemingly disparate morbidities. The biomarker profiling improved prediction of the short-term risk of death from all causes above established risk factors. Further investigations are needed to clarify the biological mechanisms and the utility of these biomarkers for guiding screening and prevention.

    PLoS medicine 2014;11;2;e1001606

  • Ensembl 2014.

    Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Girón CG, Gordon L, Hourlier T, Hunt S, Johnson N, Juettemann T, Kähäri AK, Keenan S, Kulesha E, Martin FJ, Maurel T, McLaren WM, Murphy DN, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ruffier M, Sheppard D, Taylor K, Thormann A, Trevanion SJ, Vullo A, Wilder SP, Wilson M, Zadissa A, Aken BL, Birney E, Cunningham F, Harrow J, Herrero J, Hubbard TJ, Kinsella R, Muffato M, Parker A, Spudich G, Yates A, Zerbino DR and Searle SM

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Ensembl creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.

    Nucleic acids research 2014;42;1;D749-55

  • Genetics in PSC: What Do the "Risk Genes" Teach Us?

    Folseraas T, Liaskou E, Anderson CA and Karlsen TH

    Norwegian PSC Research Center, Department of Transplantation Medicine, Division of Cancer Medicine, Surgery and Transplantation, Oslo University Hospital Rikshospitalet, 4950 Nydalen, 0424, Oslo, Norway.

    A role of genetics in primary sclerosing cholangitis (PSC) development is now firmly established. A total of 16 risk genes have been reported at highly robust ("genome-wide") significance levels, and ongoing efforts suggest that the list will ultimately be considerably longer. Importantly, this genetic risk pool so far accounts for less than 10% of an estimated overall PSC susceptibility. The relative importance of genetic versus environmental factors (including gene-gene and gene-environment interactions) in remaining aspects of PSC pathogenesis is unknown, and study designs other than genome-wide association studies are needed to explore these aspects. For some of the loci, e.g. HLA and FUT2, distinct interacting environmental factors may exist, and working from the genetic associations may prove one valid path for determining the specific nature of environmental triggers. So far the biological implications for PSC risk genes are typically merely hypothesized based on previously published literature, and there is therefore a strong need for dedicated translational studies to determine their roles within the specific disease context of PSC. Most risk loci seem to be involved in a subset of biological pathways for which genetic associations exist in a multitude of immune-mediated diseases, accounting for both inflammatory bowel disease as well as prototypical autoimmunity. In the present article, we will survey the current knowledge on PSC genetics with a particular emphasis on the pathophysiological insight potentially gained from genetic risk loci involved in this profound immunogenetic pleiotropy.

    Clinical reviews in allergy & immunology 2014

  • Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction.

    Foth BJ, Tsai IJ, Reid AJ, Bancroft AJ, Nichol S, Tracey A, Holroyd N, Cotton JA, Stanley EJ, Zarowiecki M, Liu JZ, Huckvale T, Cooper PJ, Grencis RK and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Whipworms are common soil-transmitted helminths that cause debilitating chronic infections in man. These nematodes are only distantly related to Caenorhabditis elegans and have evolved to occupy an unusual niche, tunneling through epithelial cells of the large intestine. We report here the whole-genome sequences of the human-infective Trichuris trichiura and the mouse laboratory model Trichuris muris. On the basis of whole-transcriptome analyses, we identify many genes that are expressed in a sex- or life stage-specific manner and characterize the transcriptional landscape of a morphological region with unique biological adaptations, namely, bacillary band and stichosome, found only in whipworms and related parasites. Using RNA sequencing data from whipworm-infected mice, we describe the regulated T helper 1 (TH1)-like immune response of the chronically infected cecum in unprecedented detail. In silico screening identified numerous new potential drug targets against trichuriasis. Together, these genomes and associated functional data elucidate key aspects of the molecular host-parasite interactions that define chronic whipworm infection.

    Funded by: Wellcome Trust: 088862/Z/09/Z, 098051, WT083620MA, WT100290MA

    Nature genetics 2014;46;7;693-700

  • GENCODE Pseudogenes.

    Frankish A and Harrow J

    Human and Vertebrate Analysis and Annotation Group, Wellcome Trust Sanger Institute, Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1HH, UK.

    Historically pseudogenes were believed to represent nonfunctional genomic fossils; however, there is emerging evidence that many of them could be biologically active. This possibility has ignited interest in pseudogene loci and made the need for their high-quality annotation more pressing as an accurate knowledge of all pseudogenes in the human reference genome sequence facilitates confident functional analysis. GENCODE have undertaken the first genome-wide pseudogene assignment for protein-coding genes combining both large-scale manual annotation and computational pseudogene prediction pipelines. Multiple computational predictions provide an unbiased set of hints for manual annotators to investigate, both during first-pass annotation and as part of QC to identify any potential missing pseudogene loci. Where a pseudogene is identified, the extent of its homology to the parent locus is fully investigated by a manual annotator; a pseudogene model is built and assigned to one of eight pseudogene biotypes depending on the mechanism of creation and on the presence of locus-specific transcriptional or proteomic data. The high-quality, information-rich set of pseudogenes created has been integrated with ENCODE functional genomics data, specifically expression level, transcription factor and RNA polymerase II binding, and chromatin marks. In this way we have been able to identify some pseudogenes that possess conventional characteristics of functionality as well as others with interesting patterns of partial activity, which might suggest that putatively inactive loci could be gaining a novel function, for example as long noncoding RNAs. The activity data associated with every pseudogene is stored in the psiDR resource.

    Methods in molecular biology (Clifton, N.J.) 2014;1167;129-55

  • De novo mutations in schizophrenia implicate synaptic networks.

    Fromer M, Pocklington AJ, Kavanagh DH, Williams HJ, Dwyer S, Gormley P, Georgieva L, Rees E, Palta P, Ruderfer DM, Carrera N, Humphreys I, Johnson JS, Roussos P, Barker DD, Banks E, Milanova V, Grant SG, Hannon E, Rose SA, Chambert K, Mahajan M, Scolnick EM, Moran JL, Kirov G, Palotie A, McCarroll SA, Holmans P, Sklar P, Owen MJ, Purcell SM and O'Donovan MC

    [1] Division of Psychiatric Genomics in the Department of Psychiatry, and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA [2] Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.

    Inherited alleles account for most of the genetic risk for schizophrenia. However, new (de novo) mutations, in the form of large chromosomal copy number changes, occur in a small fraction of cases and disproportionally disrupt genes encoding postsynaptic proteins. Here we show that small de novo mutations, affecting one or a few nucleotides, are overrepresented among glutamatergic postsynaptic proteins comprising activity-regulated cytoskeleton-associated protein (ARC) and N-methyl-d-aspartate receptor (NMDAR) complexes. Mutations are additionally enriched in proteins that interact with these complexes to modulate synaptic strength, namely proteins regulating actin filament dynamics and those whose messenger RNAs are targets of fragile X mental retardation protein (FMRP). Genes affected by mutations in schizophrenia overlap those mutated in autism and intellectual disability, as do mutation-enriched synaptic pathways. Aligning our findings with a parallel case-control study, we demonstrate reproducible insights into aetiological mechanisms for schizophrenia and reveal pathophysiology shared with other neurodevelopmental disorders.

    Nature 2014

  • Complete Genome Sequence of the WHO International Standard for HIV-1 RNA Determined by Deep Sequencing.

    Gall A, Morris C, Kellam P and Berry N

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The World Health Organization (WHO) International Standard for HIV-1 RNA nucleic acid assays was characterized by complete genome deep sequencing analysis. The entire coding sequence and flanking long terminal repeats (LTRs), including minority species, were assigned subtype B. This information will aid the design, development, and evaluation of HIV-1 RNA amplification assays.

    Genome announcements 2014;2;1

  • 27th International Mammalian Genome Conference meeting report.

    Gamache B, Leist SR, Bard AD, Logan DW and Carpanini SM

    National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, US.

    Mammalian genome : official journal of the International Mammalian Genome Society 2014;25;5-6;195-201

  • The evolving role of cancer cell line-based screens to define the impact of cancer genomes on drug response.

    Garnett MJ and McDermott U

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Over the last decade we have witnessed the convergence of two powerful experimental designs toward a common goal of defining the molecular subtypes that underpin the likelihood of a cancer patient responding to treatment in the clinic. The first of these 'experiments' has been the systematic sequencing of large numbers of cancer genomes through the International Cancer Genome Consortium and The Cancer Genome Atlas. This endeavour is beginning to yield a complete catalogue of the cancer genes that are critical for tumourigenesis and amongst which we will find tomorrow's biomarkers and drug targets. The second 'experiment' has been the use of large-scale biological models such as cancer cell lines to correlate mutations in cancer genes with drug sensitivity, such that one could begin to develop rational clinical trials to test these hypotheses. It is at this intersection of cancer genome sequencing and biological models that there exists the opportunity to completely transform how we stratify cancer patients in the clinic for treatment.

    Current opinion in genetics & development 2014;24C;114-119

  • Subclonal variant calling with multiple samples and prior knowledge.

    Gerstung M, Papaemmanuil E and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK; Department of Haematology, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK; Department of Haematology, University of Cambridge, Cambridge CB22XY, UK.

    Motivation: Targeted resequencing of cancer genes in large cohorts of patients is important to understand the biological and clinical consequences of mutations. Cancers are often clonally heterogeneous and the detection of subclonal mutations is important from a diagnostic point of view, but presents strong statistical challenges. Results: Here we present a novel statistical approach for calling mutations from large cohorts of deeply resequenced cancer genes. These data allow for precisely estimating local error profiles and enable detecting mutations with high sensitivity and specificity. Our probabilistic method incorporates knowledge about the distribution of variants in terms of a prior probability. We show that our algorithm has a high accuracy of calling cancer mutations and demonstrate that the detected clonal and subclonal variants have important prognostic consequences. Availability: Code is available as part of the Bioconductor package deepSNV.

    Bioinformatics (Oxford, England) 2014
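
    The idea of combining a prior probability with read-count evidence, as described in the abstract above, can be illustrated with a minimal sketch. This is not the deepSNV model: it assumes a plain binomial likelihood, a fixed background error rate and a single hypothetical variant allele fraction, and the function name and every parameter value are invented for illustration only.

```python
import math

def variant_posterior(alt_reads, depth, error_rate, variant_fraction, prior):
    """Posterior probability that a site carries a true variant.

    Illustrative assumptions (not taken from the paper):
      - under "no variant", alternate reads arise at `error_rate`
      - under "variant", alternate reads arise at `variant_fraction`
      - `prior` is the prior probability of a variant at this site
    """
    def log_lik(p):
        # Binomial log-likelihood without the binomial coefficient,
        # which is identical under both hypotheses and cancels out.
        return alt_reads * math.log(p) + (depth - alt_reads) * math.log(1.0 - p)

    # Log posterior odds = log prior odds + log likelihood ratio.
    log_odds = (math.log(prior) - math.log(1.0 - prior)
                + log_lik(variant_fraction) - log_lik(error_rate))

    # Numerically stable logistic transform of the log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Hypothetical site: 50 alternate reads out of 1000, 0.1% error rate,
# a 5% subclonal fraction and a 1% prior probability of a variant.
p = variant_posterior(50, 1000, error_rate=0.001, variant_fraction=0.05, prior=0.01)
print(f"posterior = {p:.4f}")
```

    Working in log space keeps the calculation stable at the high read depths typical of targeted resequencing, where raw likelihoods underflow.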

  • Maturation of Induced Pluripotent Stem Cell Derived Hepatocytes by 3D-Culture.

    Gieseck III RL, Hannan NR, Bort R, Hanley NA, Drake RA, Cameron GW, Wynn TA and Vallier L

    Wellcome Trust-Medical Research Council Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, United Kingdom; Immunopathogenesis Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America.

    Induced pluripotent stem cell derived hepatocytes (IPSC-Heps) have the potential to reduce the demand for a dwindling number of primary cells used in applications ranging from therapeutic cell infusions to in vitro toxicology studies. However, current differentiation protocols and culture methods produce cells with reduced functionality and fetal-like properties compared to adult hepatocytes. We report a culture method for the maturation of IPSC-Heps using 3-Dimensional (3D) collagen matrices compatible with high throughput screening. This culture method significantly increases functional maturation of IPSC-Heps towards an adult phenotype when compared to conventional 2D systems. Additionally, this approach spontaneously results in the presence of polarized structures necessary for drug metabolism and improves functional longevity to over 75 days. Overall, this research reveals a method to shift the phenotype of existing IPSC-Heps towards primary adult hepatocytes allowing such cells to be a more relevant replacement for the current primary standard.

    PloS one 2014;9;1;e86372

  • Expression and replication studies to identify new candidate genes involved in normal hearing function.

    Girotto G, Vuckovic D, Buniello A, Lorente-Cánovas B, Lewis M, Gasparini P and Steel KP

    Department of Medical Sciences, University of Trieste, Trieste, Italy.

    Considerable progress has been made in identifying deafness genes, but still little is known about the genetic basis of normal variation in hearing function. We recently carried out a Genome Wide Association Study (GWAS) of quantitative hearing traits in southern European populations and found several SNPs with suggestive but none with significant association. In the current study, we followed up these SNPs to investigate which of them might show a genuine association with auditory function using alternative approaches. Firstly, we generated a shortlist of 19 genes from the published GWAS results. Secondly, we carried out immunocytochemistry to examine expression of these 19 genes in the mouse inner ear. Twelve of them showed distinctive cochlear expression patterns. Four showed expression restricted to sensory hair cells (Csmd1, Arsg, Slc16a6 and Gabrg3), one only in marginal cells of the stria vascularis (Dclk1), while the others (Ptprd, Grm8, GlyBP, Evi5, Rimbp2, Ank2, Cdh13) were expressed in multiple cochlear cell types. In the third step, we tested these 12 genes for replication of association in an independent set of samples from the Caucasus and Central Asia. Nine of them showed nominally significant association (p<0.05). In particular, 4 were replicated at the same SNP and with the same effect direction while the remaining 5 showed a significant association in a gene-based test. Finally, to look for genotype-phenotype relationship, the audiometric profiles of the three genotypes of the most strongly associated gene variants were analyzed. Seven out of the 9 replicated genes (CDH13, GRM8, ANK2, SLC16A6, ARSG, RIMBP2 and DCLK1) showed an audiometric pattern with differences between different genotypes, further supporting their role in hearing function. These data demonstrate the usefulness of this multistep approach in providing new insights into the molecular basis of hearing and may suggest new targets for treatment and prevention of hearing impairment.

    PloS one 2014;9;1;e85352

  • Fast randomization of large genomic datasets while preserving alteration counts.

    Gobbi A, Iorio F, Dawson KJ, Wedge DC, Tamborero D, Alexandrov LB, Lopez-Bigas N, Garnett MJ, Jurman G and Saez-Rodriguez J

    Fondazione Bruno Kessler, I-38100 Povo (Trento), Italy, European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge CB10 1SD, UK, Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK and Universitat Pompeu Fabra, Barcelona 08003, Spain.

    Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive.

    Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. Availability and implementation: BiRewire is available on Bioconductor. Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2014;30;17;i617-i623
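
    A switching-step of the kind described in the abstract above can be sketched in a few lines. The following is a minimal illustration, not the BiRewire implementation: it assumes the dataset is a binary gene-by-patient matrix, it samples two rows and two columns at random rather than two edges, and the function name, the example matrix and the number of steps are all hypothetical.

```python
import random

def switching_randomize(matrix, n_steps, seed=None):
    """Randomize a binary matrix while preserving row and column sums.

    Each step picks two rows and two columns. If the four selected cells
    form a "checkerboard" (1 0 / 0 1), the pattern is flipped (0 1 / 1 0).
    Every row and every column then still contains the same number of 1s.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]          # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_steps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # The mirrored checkerboard is reached when the same rows or
        # columns are drawn in the opposite order.
        if m[r1][c1] == 1 and m[r2][c2] == 1 and m[r1][c2] == 0 and m[r2][c1] == 0:
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
    return m

# Hypothetical toy dataset: rows are genes, columns are patients.
observed = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
simulated = switching_randomize(observed, n_steps=200, seed=1)
print([sum(row) for row in simulated])        # row sums match the input
print([sum(col) for col in zip(*simulated)])  # column sums match the input
```

    The number of steps used here is arbitrary; choosing it so that successive simulated datasets are effectively independent is the question the paper's lower bound addresses.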

  • Genomic epidemiology of Neisseria gonorrhoeae with reduced susceptibility to cefixime in the USA: a retrospective observational study.

    Grad YH, Kirkcaldy RD, Trees D, Dordel J, Harris SR, Goldstein E, Weinstock H, Parkhill J, Hanage WP, Bentley S and Lipsitch M

    Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; Division of Infectious Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

    Background: The emergence of Neisseria gonorrhoeae with decreased susceptibility to extended spectrum cephalosporins raises the prospect of untreatable gonorrhoea. In the absence of new treatments, efforts to slow the increasing incidence of resistant gonococcus require insight into the factors that contribute to its emergence and spread. We assessed the relatedness between isolates in the USA and reconstructed likely spread of lineages through different sexual networks. Methods: We sequenced the genomes of 236 isolates of N gonorrhoeae collected by the Centers for Disease Control and Prevention's Gonococcal Isolate Surveillance Project (GISP) from sentinel public sexually transmitted disease clinics in the USA, including 118 (97%) of the isolates from 2009-10 in GISP with reduced susceptibility to cefixime (cef(RS)) and 118 cefixime-susceptible isolates from GISP matched as closely as possible by location, collection date, and sexual orientation. We assessed the association between antimicrobial resistance genotype and phenotype and correlated phylogenetic clustering with location and sexual orientation. Findings: Mosaic penA XXXIV had a high positive predictive value for cef(RS). We found that two of the 118 cef(RS) isolates lacked a mosaic penA allele, and rechecking showed that these two were susceptible to cefixime. Of the 116 remaining cef(RS) isolates, 114 (98%) fell into two distinct lineages that have independently acquired mosaic penA allele XXXIV. A major lineage of cef(RS) strains spread eastward, predominantly through a sexual network of men who have sex with men. Eight of nine inferred transitions between sexual networks were introductions from men who have sex with men into the heterosexual population. 
    Interpretation: Genomic methods might aid efforts to slow the spread of antibiotic-resistant N gonorrhoeae through augmentation of gonococcal outbreak surveillance and identification of populations that could benefit from increased screening for asymptomatic infections. Funding: American Sexually Transmitted Disease Association, Wellcome Trust, National Institute of General Medical Sciences, and National Institute of Allergy and Infectious Diseases, National Institutes of Health.

    Funded by: NIGMS NIH HHS: U54 GM088558

    The Lancet infectious diseases 2014

  • Acute myeloid leukaemia: a paradigm for the clonal evolution of cancer?

    Grove CS and Vassiliou GS

    Haematological Cancer Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Acute myeloid leukaemia (AML) is an uncontrolled clonal proliferation of abnormal myeloid progenitor cells in the bone marrow and blood. Advances in cancer genomics have revealed the spectrum of somatic mutations that give rise to human AML and drawn our attention to its molecular evolution and clonal architecture. It is now evident that most AML genomes harbour small numbers of mutations, which are acquired in a stepwise manner. This characteristic, combined with our ability to identify mutations in individual leukaemic cells and our detailed understanding of normal human and murine haematopoiesis, makes AML an excellent model for understanding the principles of cancer evolution. Furthermore, a better understanding of how AML evolves can help us devise strategies to improve the therapy and prognosis of AML patients. Here, we draw from recent advances in genomics, clinical studies and experimental models to describe the current knowledge of the clonal evolution of AML and its implications for the biology and treatment of leukaemias and other cancers.

    Disease models & mechanisms 2014;7;8;941-951

  • De Novo Loss-of-Function Mutations in SETD5, Encoding a Methyltransferase in a 3p25 Microdeletion Syndrome Critical Region, Cause Intellectual Disability.

    Grozeva D, Carss K, Spasic-Boskovic O, Parker MJ, Archer H, Firth HV, Park SM, Canham N, Holder SE, Wilson M, Hackett A, Field M, Floyd JA, UK10K Consortium, Hurles M and Raymond FL

    Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK.

    To identify further Mendelian causes of intellectual disability (ID), we screened a cohort of 996 individuals with ID for variants in 565 known or candidate genes by using a targeted next-generation sequencing approach. Seven loss-of-function (LoF) mutations, four nonsense (c.1195A>T [p.Lys399*], c.1333C>T [p.Arg445*], c.1866C>G [p.Tyr622*], and c.3001C>T [p.Arg1001*]) and three frameshift (c.2177_2178del [p.Thr726Asnfs*39], c.3771dup [p.Ser1258Glufs*65], and c.3856del [p.Ser1286Leufs*84]), were identified in SETD5, a gene predicted to encode a methyltransferase. All mutations were compatible with de novo dominant inheritance. The affected individuals had moderate to severe ID with additional variable features of brachycephaly; a prominent high forehead with synophrys or striking full and broad eyebrows; a long, thin, and tubular nose; long, narrow upslanting palpebral fissures; and large, fleshy low-set ears. Skeletal anomalies, including significant leg-length discrepancy, were a frequent finding in two individuals. Congenital heart defects, inguinal hernia, or hypospadias were also reported. Behavioral problems, including obsessive-compulsive disorder, hand flapping with ritualized behavior, and autism, were prominent features. SETD5 lies within the critical interval for 3p25 microdeletion syndrome. The individuals with SETD5 mutations showed phenotypic similarity to those previously reported with a deletion in 3p25, and thus loss of SETD5 might be sufficient to account for many of the clinical features observed in this condition. Our findings add to the growing evidence that mutations in genes encoding methyltransferases regulating histone modification are important causes of ID. This analysis provides sufficient evidence that rare de novo LoF mutations in SETD5 are a relatively frequent (0.7%) cause of ID.

    American journal of human genetics 2014

  • A systematic review of definitions of extreme phenotypes of HIV control and progression.

    Gurdasani D, Iles L, Dillon DG, Young EH, Olson AD, Naranbhai V, Fidler S, Gkrania-Klotsas E, Post FA, Kellam P, Porter K and Sandhu MS

    (a) Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton; (b) Strangeways Research Laboratory, Department of Public Health and Primary Care, University of Cambridge, Wort's Causeway, Cambridge; (c) Medical Research Council, Clinical Trials Unit, Aviation House, London, UK; (d) Centre for the AIDS Programme of Research in South Africa (CAPRISA), Doris Duke Medical Research Institute, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa; (e) Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford; (f) Imperial College Healthcare NHS Trust, London; (g) Cambridge University Hospitals NHS Foundation Trust, Department of Infectious Diseases, Addenbrooke's Hospital, Cambridge; (h) King's College London, Weston Education Centre; (i) Division of Infection and Immunity, University College London, London, UK.

    The study of individuals at opposite ends of the HIV clinical spectrum can provide invaluable insights into HIV biology. Heterogeneity in criteria used to define these individuals can introduce inconsistencies in results from research and make it difficult to identify biological mechanisms underlying these phenotypes. In this systematic review, we formally quantified the heterogeneity in definitions used for terms referring to extreme phenotypes in the literature, and identified common definitions and components used to describe these phenotypes. We assessed 714 definitions of HIV extreme phenotypes in 501 eligible studies published between 1 January 2000 and 15 March 2012, and identified substantial variation among these. This heterogeneity in definitions may represent important differences in biological endophenotypes and clinical progression profiles of individuals selected by these, suggesting the need for harmonized definitions. In this context, we were able to identify common components in existing definitions that may provide a framework for developing consensus definitions for these phenotypes in HIV infection.

    Funded by: Medical Research Council: G0901213; Wellcome Trust

    AIDS (London, England) 2014;28;2;149-62

  • A GC1 Acinetobacter baumannii isolate carrying AbaR3 and the aminoglycoside resistance transposon TnaphA6 in a conjugative plasmid.

    Hamidian M, Holt KE, Pickard D, Dougan G and Hall RM

    School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.

    Objectives: To locate the acquired antibiotic resistance genes, including the amikacin resistance transposon TnaphA6, in the genome of an Australian isolate belonging to Acinetobacter baumannii global clone 1 (GC1).

    Methods: A multiply antibiotic-resistant GC1 isolate harbouring TnaphA6 was sequenced using Illumina HiSeq, and reads were used to generate a de novo assembly and determine multilocus sequence types (STs). PCR was used to assemble the AbaR chromosomal resistance island and a large plasmid carrying TnaphA6. Plasmid DNA sequences were compared with ones available in GenBank. Conjugation experiments were conducted.

    Results: The A. baumannii GC1 isolate G7 was shown to include the AbaR3 antibiotic resistance island. It also contains an 8.7 kb cryptic plasmid, pAb-G7-1, and a 70 100 bp plasmid, pAb-G7-2, carrying TnaphA6. pAb-G7-2 belongs to the Aci6 Acinetobacter plasmid family. It encodes transfer functions and was shown to conjugate. Plasmids related to pAb-G7-2 were detected in further amikacin-resistant GC1 isolates using PCR. From the genome sequence, isolate G7 was ST1 (Institut Pasteur scheme) and ST231 (Oxford scheme). Using Oxford scheme PCR-based methods, the isolate was ST109 and this difference was traced to a single base difference resulting from the inclusion of the original primers in the gpi segment analysed.

    Conclusions: The multiply antibiotic-resistant GC1 isolate G7 carries most of its resistance genes in AbaR3 located in the chromosome. However, TnaphA6 is on a conjugative plasmid, pAb-G7-2. Primers developed to locate TnaphA6 in pAb-G7-2 will simplify the detection of plasmids related to pAb-G7-2 in A. baumannii isolates.

    The Journal of antimicrobial chemotherapy 2014;69;4;955-8

  • A conjugative plasmid carrying the carbapenem resistance gene blaOXA-23 in AbaR4 in an extensively resistant GC1 Acinetobacter baumannii isolate.

    Hamidian M, Kenyon JJ, Holt KE, Pickard D and Hall RM

    School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.

    Objectives: To locate the acquired blaOXA-23 carbapenem resistance gene in an Australian A. baumannii global clone 1 (GC1) isolate.

    Methods: The genome of the extensively antibiotic-resistant GC1 isolate A85 harbouring blaOXA-23 in Tn2006 was sequenced using Illumina HiSeq, and the reads were used to generate a de novo assembly. PCR was used to assemble relevant contigs. Sequences were compared with ones in GenBank. Conjugation experiments were conducted.

    Results: The sporadic GC1 isolate A85, recovered in 2003, was extensively resistant, exhibiting resistance to imipenem, meropenem and ticarcillin/clavulanate, to cephalosporins and fluoroquinolones and to the older antibiotics gentamicin, kanamycin and neomycin, sulfamethoxazole, trimethoprim and tetracycline. Genes for resistance to older antibiotics are in the chromosome, in an AbaR3 resistance island. A second copy of the ampC gene in Tn6168 confers cephalosporin resistance and the gyrA and parC genes have mutations leading to fluoroquinolone resistance. An 86 335 bp repAci6 plasmid, pA85-3, carrying blaOXA-23 in Tn2006 in AbaR4, was shown to transfer imipenem, meropenem and ticarcillin/clavulanate resistance into a susceptible recipient. A85 also contains two small cryptic plasmids of 2.7 and 8.7 kb. A85 is sequence type ST126 (Oxford scheme) and carries a novel KL15 capsule locus and the OCL3 outer core locus.

    Conclusions: A85 represents a new GC1 lineage identified by the novel capsule locus but retains AbaR3 carrying genes for resistance to older antibiotics. Resistance to imipenem, meropenem and ticarcillin/clavulanate has been introduced into A85 by pA85-3, a repAci6 conjugative plasmid carrying Tn2006 in AbaR4.

    The Journal of antimicrobial chemotherapy 2014

  • Identification of a marker for two lineages within the GC1 clone of Acinetobacter baumannii.

    Hamidian M, Wynn M, Holt KE, Pickard D, Dougan G and Hall RM

    School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.

    The Journal of antimicrobial chemotherapy 2014;69;2;557-8

  • Efficient in vivo deletion of a large imprinted lncRNA by CRISPR/Cas9.

    Han J, Zhang J, Chen L, Shen B, Zhou J, Hu B, Du Y, Tate PH, Huang X and Zhang W

    MOE Key Laboratory of Model Animal for Disease Study; Model Animal Research Center of Nanjing University; Nanjing, Jiangsu Province, PR China.

    Recent genome-wide studies have revealed that the majority of the mouse genome is transcribed as non-coding RNAs (ncRNAs) and growing evidence supports the importance of ncRNAs in regulating gene expression and epigenetic processes. However, the low efficiency of conventional gene targeting strategies has hindered the functional study of ncRNAs in vivo, particularly in generating large fragment deletions of long non-coding RNAs (lncRNAs) with multiple expression variants. The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system has recently been applied as an efficient tool for engineering site-specific mutations of protein-coding genes in the genome. In this study, we explored the potential of using the CRISPR/Cas9 system to generate large genomic deletions of lncRNAs in mice. We developed an efficient one-step strategy to target the maternally expressed lncRNA, Rian, on chromosome 12 in mice. We showed that paired sgRNAs can precisely generate large deletions of up to 23 kb and that the deletion efficiency can be further improved, up to 33%, by combining multiple sgRNAs. The deletion successfully abolished the expression of Rian from the maternally inherited allele, validating the biological relevance of the mutations in studying an imprinted locus. Mutation of Rian has differential effects on expression of nearby genes in different somatic tissues. Taken together, we have established a robust one-step method to engineer large deletions to knock out lncRNA genes with the CRISPR/Cas9 system. Our work will facilitate future functional studies of other lncRNAs in vivo.

    RNA biology 2014;11;7

  • Collateral damage.

    Hancock RE

    Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, British Columbia, Canada, and the Wellcome Trust Sanger Institute, Hinxton, UK.

    Nature biotechnology 2014;32;1;66-8

  • Haptoglobin (HP) and Haptoglobin-related protein (HPR) copy number variation, natural selection, and trypanosomiasis.

    Hardwick RJ, Ménard A, Sironi M, Milet J, Garcia A, Sese C, Yang F, Fu B, Courtin D and Hollox EJ

    Department of Genetics, University of Leicester, Leicester, UK.

    Haptoglobin, coded by the HP gene, is a plasma protein that acts as a scavenger for free heme, and haptoglobin-related protein (coded by the HPR gene) forms part of the trypanolytic factor TLF-1, together with apolipoprotein L1 (ApoL1). We analyse the polymorphic small intragenic duplication of the HP gene, with alleles Hp1 and Hp2, in 52 populations, and find no evidence for natural selection either from extended haplotype analysis or from correlation with pathogen richness matrices. Using fiber-FISH, the paralog ratio test, and array-CGH data, we also confirm that the HPR gene is copy number variable, with duplication of the whole HPR gene at polymorphic frequencies in west and central Africa, up to an allele frequency of 15%. The geographical distribution of the HPR duplication allele overlaps the region where the pathogen causing chronic human African trypanosomiasis, Trypanosoma brucei gambiense, is endemic. The HPR duplication has occurred on one SNP haplotype, but there is no strong evidence of extended homozygosity, a characteristic of recent natural selection. The HPR duplication shows a slight, non-significant undertransmission to human African trypanosomiasis-affected children of unaffected parents in the Democratic Republic of Congo. However, taken together with alleles of APOL1, there is an overall significant undertransmission of putative protective alleles to human African trypanosomiasis-affected children.

    Human genetics 2014;133;1;69-83

  • Restriction and recruitment - gene duplication and the origin and evolution of snake venom toxins.

    Hargreaves AD, Swain MT, Hegarty MJ, Logan DW and Mulley JF

    School of Biological Sciences, Bangor University, Brambell Building, Deiniol Road, Bangor, Gwynedd, LL57 2UW, United Kingdom.

    Snake venom has been hypothesised to have originated and diversified via a process that involves duplication of genes encoding body proteins with subsequent recruitment of the copy to the venom gland, where natural selection acts to develop or increase toxicity. However, gene duplication is known to be a rare event in vertebrate genomes and the recruitment of duplicated genes to a novel expression domain (neofunctionalisation) is an even rarer process that requires the evolution of novel combinations of transcription factor binding sites in upstream regulatory regions. Therefore, whilst this hypothesis concerning the evolution of snake venom is very unlikely and should be regarded with caution, it is nonetheless often assumed to be established fact, hindering research into the true origins of snake venom toxins. To critically evaluate this hypothesis we have generated transcriptomic data for body tissues and salivary and venom glands from five species of venomous and non-venomous reptiles. Our comparative transcriptomic analysis of these data reveals that snake venom does not evolve via the hypothesised process of duplication and recruitment of genes encoding body proteins. Indeed, our results show that many proposed venom toxins are in fact expressed in a wide variety of body tissues, including the salivary gland of non-venomous reptiles, and that these genes have therefore been restricted to the venom gland following duplication, not recruited. Thus snake venom evolves via the duplication and subfunctionalisation of genes encoding existing salivary proteins. These results highlight the danger of the elegant and intuitive "just-so story" in evolutionary biology.

    Genome biology and evolution 2014

  • Abundant and Diverse Clustered Regularly Interspaced Short Palindromic Repeat Spacers in Clostridium difficile Strains and Prophages Target Multiple Phage Types within This Pathogen.

    Hargreaves KR, Flores CO, Lawley TD and Clokie MR

    Department of Infection, Inflammation and Immunity, University of Leicester, Leicester, United Kingdom.

    Clostridium difficile is an important human-pathogenic bacterium causing antibiotic-associated nosocomial infections worldwide. Mobile genetic elements and bacteriophages have helped shape C. difficile genome evolution. In many bacteria, phage infection may be controlled by a form of bacterial immunity called the clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) system. This uses acquired short nucleotide sequences (spacers) to target homologous sequences (protospacers) in phage genomes. C. difficile carries multiple CRISPR arrays, and in this paper we examine the relationships between the host- and phage-carried elements of the system. We detected multiple matches between spacers and regions in 31 C. difficile phage and prophage genomes. A subset of the spacers was located in prophage-carried CRISPR arrays. The CRISPR spacer profiles generated suggest that related phages would have similar host ranges. Furthermore, we show that C. difficile strains of the same ribotype could either have similar or divergent CRISPR contents. Both synonymous and nonsynonymous mutations in the protospacer sequences were identified, as well as differences in the protospacer adjacent motif (PAM), which could explain how phages escape this system. This paper illustrates how the distribution and diversity of CRISPR spacers in C. difficile, and its prophages, could modulate phage predation for this pathogen and impact upon its evolution and pathogenicity.

    Importance: Clostridium difficile is a significant bacterial human pathogen which undergoes continual genome evolution, resulting in the emergence of new virulent strains. Phages are major facilitators of genome evolution in other bacterial species, and we use sequence analysis-based approaches in order to examine whether the CRISPR/Cas system could control these interactions across divergent C. difficile strains. The presence of spacer sequences in prophages that are homologous to phage genomes raises an extra level of complexity in this predator-prey microbial system. Our results demonstrate that the impact of phage infection in this system is widespread and that the CRISPR/Cas system is likely to be an important aspect of the evolutionary dynamics in C. difficile.

    mBio 2014;5;5

  • WormBase 2014: new views of curated biology.

    Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L and Sternberg PW

    Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada, Genome Sequencing Center, Washington University, School of Medicine, St Louis, MO 63108, USA, Division of Biology and Biological Engineering 156-29, California Institute of Technology, Pasadena, CA 91125, USA, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Department of Genetics Campus, Washington University School of Medicine, St. Louis, MO 63110, USA, Genetics Unit, Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, CA 91125, USA.

    WormBase is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G070119; NHGRI NIH HHS: P41 HG002223, U41-HG002223

    Nucleic acids research 2014;42;Database issue;D789-93

  • Modification of British Committee for Standards in Haematology diagnostic criteria for essential thrombocythaemia.

    Harrison CN, Butt N, Campbell P, Conneally E, Drummond M, Green AR, Murrin R, Radia DH, Mead A, Reilly JT, Cross NC and McMullin MF

    Department of Haematology, Guy's and St Thomas' Hospitals NHS Foundation Trust, London, UK.

    British journal of haematology 2014

  • A novel hybrid SCCmec-mecC region in Staphylococcus sciuri.

    Harrison EM, Paterson GK, Holden MT, Ba X, Rolo J, Morgan FJ, Pichon B, Kearns A, Zadoks RN, Peacock SJ, Parkhill J and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Objectives: Methicillin resistance in Staphylococcus spp. results from the expression of an alternative penicillin-binding protein 2a (encoded by mecA) with a low affinity for β-lactam antibiotics. Recently, a novel variant of mecA known as mecC (formerly mecALGA251) was identified in Staphylococcus aureus isolates from both humans and animals. In this study, we identified two Staphylococcus sciuri subsp. carnaticus isolates from bovine infections that harbour three different mecA homologues: mecA, mecA1 and mecC.

    Methods: We subjected the two isolates to whole-genome sequencing to further understand the genetic context of the mec-containing region. We also used PCR and RT-PCR to investigate the excision and expression of the SCCmec element and mec genes, respectively.

    Results: Whole-genome sequencing revealed a novel hybrid SCCmec region at the orfX locus consisting of a class E mec complex (mecI-mecR1-mecC1-blaZ) located immediately downstream of a staphylococcal cassette chromosome mec (SCCmec) type VII element. A second SCCmec attL site (attL2), which was imperfect, was present downstream of the mecC region. PCR analysis of stationary-phase cultures showed that both the SCCmec type VII element and a hybrid SCCmec-mecC element were capable of excision from the genome and forming a circular intermediate. Transcriptional analysis showed that mecC and mecA, but not mecA1, were both expressed in liquid culture supplemented with oxacillin.

    Conclusions: Overall, this study further highlights that a range of staphylococcal species harbour the mecC gene and furthers the view that coagulase-negative staphylococci associated with animals may act as reservoirs of antibiotic resistance genes for more pathogenic staphylococcal species.

    The Journal of antimicrobial chemotherapy 2014;69;4;911-8

  • A Shared Population of Epidemic Methicillin-Resistant Staphylococcus aureus 15 Circulates in Humans and Companion Animals.

    Harrison EM, Weinert LA, Holden MT, Welch JJ, Wilson K, Morgan FJ, Harris SR, Loeffler A, Boag AK, Peacock SJ, Paterson GK, Waller AS, Parkhill J and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    Methicillin-resistant Staphylococcus aureus (MRSA) is a global human health problem causing infections in both hospitals and the community. Companion animals, such as cats, dogs, and horses, are also frequently colonized by MRSA and can become infected. We sequenced the genomes of 46 multilocus sequence type (ST) 22 MRSA isolates from cats and dogs in the United Kingdom and compared these to an extensive population framework of human isolates from the same lineage. Phylogenomic analyses showed that all companion animal isolates were interspersed throughout the epidemic MRSA-15 (EMRSA-15) pandemic clade and clustered with human isolates from the United Kingdom, with human isolates basal to those from companion animals, suggesting a human source for isolates infecting companion animals. A number of isolates from the same veterinary hospital clustered together, suggesting that as in human hospitals, EMRSA-15 isolates are readily transmitted in the veterinary hospital setting. Genome-wide association analysis did not identify any host-specific single nucleotide polymorphisms (SNPs) or virulence factors. However, isolates from companion animals were significantly less likely to harbor a plasmid encoding erythromycin resistance. When this plasmid was present in animal-associated isolates, it was more likely to contain mutations mediating resistance to clindamycin. This finding is consistent with the low levels of erythromycin and high levels of clindamycin used in veterinary medicine in the United Kingdom. This study furthers the "one health" view of infectious diseases that the pathogen pools of human and animal populations are intrinsically linked and provides evidence that antibiotic usage in animal medicine is shaping the population of a major human pathogen.

    Importance: Methicillin-resistant Staphylococcus aureus (MRSA) is a major problem in human medicine. Companion animals, such as cats, dogs, and horses, can also become colonized and infected by MRSA. Here, we demonstrate that a shared population of an important and globally disseminated lineage of MRSA can infect both humans and companion animals without undergoing host adaptation. This suggests that companion animals might act as a reservoir for human infections. We also show that the isolates from companion animals have differences in the presence of certain antibiotic resistance genes. This study furthers the "one health" view of infectious diseases by demonstrating that the pool of MRSA isolates in the human and animal populations is shared and highlights how different antibiotic usage patterns between human and veterinary medicine can shape the population of bacterial pathogens.

    mBio 2014;5;3

  • The Vertebrate Genome Annotation browser 10 years on.

    Harrow JL, Steward CA, Frankish A, Gilbert JG, Gonzalez JM, Loveland JE, Mudge J, Sheppard D, Thomas M, Trevanion S and Wilming LG

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, UK.

    The Vertebrate Genome Annotation (VEGA) database, initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K009524/1; NHGRI NIH HHS: 5U54HG004555, U41 HG007234, U54 HG004555; Wellcome Trust: WT098051

    Nucleic acids research 2014;42;Database issue;D771-9

  • Bayesian latent variable collapsing model for detecting rare variant interaction effect in twin study.

    He L, Sillanpää MJ, Ripatti S and Pitkäniemi J

    Department of Public Health, Hjelt Institute, University of Helsinki, Finland.

    By analyzing more next-generation sequencing data, researchers have affirmed that rare genetic variants are widespread among populations and likely play an important role in complex phenotypes. Recently, a handful of statistical models have been developed to analyze rare variant (RV) association in different study designs. However, due to the scarce occurrence of minor alleles in data, appropriate statistical methods for detecting RV interaction effects are still difficult to develop. We propose a hierarchical Bayesian latent variable collapsing method (BLVCM), which circumvents the obstacles by parameterizing the signals of RVs with latent variables in a Bayesian framework and is parameterized for twin data. The BLVCM can tackle nonassociated variants, allow both protective and deleterious effects, capture SNP-SNP synergistic effect, provide estimates for the gene level and individual SNP contributions, and can be applied to both independent and various twin designs. We assessed the statistical properties of the BLVCM using simulated data, and found that it achieved better performance in terms of power for interaction effect detection compared to the Granvil and the SKAT. As proof of practical application, the BLVCM was then applied to a twin study analysis of more than 20,000 gene regions to identify significant RVs associated with low-density lipoprotein cholesterol level. The results show that some of the findings are consistent with previous studies, and we identified some novel gene regions with significant SNP-SNP synergistic effects.

    Genetic epidemiology 2014;38;4;310-24

  • Mechanisms underlying mutational signatures in human cancers.

    Helleday T, Eshtad S and Nik-Zainal S

    Science for Life Laboratory, Division of Translational Medicine and Chemical Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, S-171 21 Stockholm, Sweden.

    The collective somatic mutations observed in a cancer are the outcome of multiple mutagenic processes that have been operative over the lifetime of a patient. Each process leaves a characteristic imprint - a mutational signature - on the cancer genome, which is defined by the type of DNA damage and DNA repair processes that result in base substitutions, insertions and deletions or structural variations. With the advent of whole-genome sequencing, researchers are identifying an increasing array of these signatures. Mutational signatures can be used as a physiological readout of the biological history of a cancer and also have potential use for discerning ongoing mutational processes from historical ones, thus possibly revealing new targets for anticancer therapies.

    Nature reviews. Genetics 2014

  • Optoactivation of locus ceruleus neurons evokes bidirectional changes in thermal nociception in rats.

    Hickey L, Li Y, Fyson SJ, Watson TC, Perrins R, Hewinson J, Teschemacher AG, Furue H, Lumb BM and Pickering AE

    School of Physiology and Pharmacology, University of Bristol, Bristol BS8 1TD, United Kingdom, Department of Anesthesia, University Hospitals Bristol, Bristol BS2 8HW, United Kingdom, Department of Information Physiology, National Institute for Physiological Sciences, Myodaiji, Okazaki 444-8787, Japan, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom, and Sorbonne Universités, Université Pierre et Marie Curie Paris 6, Unité Mixte de Recherche-Scientifique 8246, Neuroscience Paris Seine, Navigation Memory and Aging team, F-75005 Paris, France.

    Pontospinal noradrenergic neurons are thought to form part of a descending endogenous analgesic system that exerts inhibitory influences on spinal nociception. Using optogenetic targeting, we tested the hypothesis that excitation of the locus ceruleus (LC) is antinociceptive. We transduced rat LC neurons by direct injection of a lentiviral vector expressing channelrhodopsin2 under the control of the PRS promoter. Subsequent optoactivation of the LC evoked repeatable, robust, antinociceptive (+4.7°C ± 1.0, p < 0.0001) or pronociceptive (-4.4°C ± 0.7, p < 0.0001) changes in hindpaw thermal withdrawal thresholds. Post hoc anatomical characterization of the distribution of transduced somata referenced against the position of the optical fiber and subsequent further functional analysis showed that antinociceptive actions were evoked from a distinct, ventral subpopulation of LC neurons. Therefore, the LC is capable of exerting potent, discrete, bidirectional influences on thermal nociception that are produced by specific subpopulations of noradrenergic neurons. This reflects an underlying functional heterogeneity of the influence of the LC on the processing of nociceptive information.

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2014;34;12;4148-60

  • Trypsin- and Chymotrypsin-Like Serine Proteases in Schistosoma mansoni - 'The Undiscovered Country'.

    Horn M, Fajtová P, Rojo Arreola L, Ulrychová L, Bartošová-Sojková P, Franta Z, Protasio AV, Opavský D, Vondrášek J, McKerrow JH, Mareš M, Caffrey CR and Dvořák J

    Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Prague, Czech Republic.

    Background: Blood flukes (Schistosoma spp.) are parasites that can survive for years or decades in the vasculature of permissive mammalian hosts, including humans. Proteolytic enzymes (proteases) are crucial for successful parasitism, including aspects of invasion, maturation and reproduction. Most attention has focused on the 'cercarial elastase' serine proteases that facilitate skin invasion by infective schistosome larvae, and the cysteine and aspartic proteases that worms use to digest the blood meal. Apart from the cercarial elastases, information regarding other S. mansoni serine proteases (SmSPs) is limited. To address this, we investigated SmSPs using genomic, transcriptomic, phylogenetic and functional proteomic approaches.

    Genes encoding five distinct SmSPs, termed SmSP1 to SmSP5, some of which comprise disparate protein domains, were retrieved from the S. mansoni genome database and annotated. Reverse transcription quantitative PCR (RT-qPCR) in various schistosome developmental stages indicated complex expression patterns for SmSPs, including their constituent protein domains. SmSP2 stood apart as being massively expressed in schistosomula and adult stages. Phylogenetic analysis segregated SmSPs into diverse clusters of family S1 proteases. SmSP1 to SmSP4 are trypsin-like proteases, whereas SmSP5 is chymotrypsin-like. In agreement, trypsin-like activities were shown to predominate in eggs, schistosomula and adults using peptidyl fluorogenic substrates. SmSP5 is particularly novel in the phylogenetics of family S1 schistosome proteases, as it is part of a cluster of sequences that fill a gap between the highly divergent cercarial elastases and other family S1 proteases.

    Our series of post-genomics analyses clarifies the complexity of schistosome family S1 serine proteases and highlights their interrelationships, including the cercarial elastases and, not least, the identification of a 'missing-link' protease cluster, represented by SmSP5. A framework is now in place to guide the characterization of individual proteases, their stage-specific expression and their contributions to parasitism, in particular, their possible modulation of host physiology.

    PLoS neglected tropical diseases 2014;8;3;e2766

  • Genome-Wide Association Study for Circulating Tissue Plasminogen Activator Levels and Functional Follow-Up Implicates Endothelial STXBP5 and STX2.

    Huang J, Huffman JE, Yamkauchi M, Trompet S, Asselbergs FW, Sabater-Lleal M, Trégouët DA, Chen WM, Smith NL, Kleber ME, Shin SY, Becker DM, Tang W, Dehghan A, Johnson AD, Truong V, Folkersen L, Yang Q, Oudot-Mellkah T, Buckley BM, Moore JH, Williams FM, Campbell H, Silbernagel G, Vitart V, Rudan I, Tofler GH, Navis GJ, Destefano A, Wright AF, Chen MH, de Craen AJ, Worrall BB, Rudnicka AR, Rumley A, Bookman EB, Psaty BM, Chen F, Keene KL, Franco OH, Böhm BO, Uitterlinden AG, Carter AM, Jukema JW, Sattar N, Bis JC, Ikram MA, the Cohorts for Heart and Aging Research in Genome Epidemiology (CHARGE) Consortium Neurology Working Group, Sale MM, McKnight B, Fornage M, Ford I, Taylor K, Slagboom PE, McArdle WL, Hsu FC, Franco-Cereceda A, Goodall AH, Yanek LR, Furie KL, Cushman M, Hofman A, Witteman JC, Folsom AR, Basu S, Matijevic N, van Gilst WH, Wilson JF, Westendorp RG, Kathiresan S, Reilly MP, the CARDIoGRAM Consortium, Tracy RP, Polasek O, Winkelmann BR, Grant PJ, Hillege HL, Cambien F, Stott DJ, Lowe GD, Spector TD, Meigs JB, Marz W, Eriksson P, Becker LC, Morange PE, Soranzo N, Williams SM, Hayward C, van der Harst P, Hamsten A, Lowenstein CJ, Strachan DP, O'Donnell CJ and the CHARGE Consortium Hemostatic Factor Working Group

    From National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA (J.H., A.D.J., C.J.O.); Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD (J.H., A.D.J., C.J.O.); MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, Scotland, United Kingdom (J.E.H., V.V., A.F.W., C.H.); The Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine and Dentistry, Rochester, NY (M.Y., C.J.L.); Departments of Cardiology (S.T., J.W.J.), Gerontology and Geriatrics (S.T., A.J.M.d.C., R.G.J.W.), and Molecular Epidemiology (P.E.S.), Leiden University Medical Center, the Netherlands; Department of Cardiology, Division of Heart and Lungs, University Medical Center Utrecht, Utrecht, the Netherlands (F.W.A.); Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, the Netherlands (F.W.A.); Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, United Kingdom (F.W.A.); Cardiovascular Genetics and Genomics Group, Atherosclerosis Research Unit, Department of Medicine (M.S.-L., L.F., P.E., A.H.), Karolinska Institutet, Karolinska University Hospital, Solna, Stockholm, Sweden; INSERM UMRS 937, Pierre et Marie Curie University, Paris, France (D.-A.T., V.T., T.O.M., F.C.); ICAN Institute for Cardiometabolism and Nutrition, Paris, France (D.-A.T., V.T., F.C.); Departments of Public Health Sciences (W.M.C., B.B.W., F.C.) and Biochemistry and Molecular Genetics (M.M.S.), Center for Public Health Genomics, University of Virginia, Charlottesville, VA; Departments of Epidemiology (N.L.S., B.M.P., B.M.), Medicine (B.M.P., J.C.B.), and Health Services (B.M.P.), University of Washington, Seattle, WA; Group Health Research Institute, Group Health Cooperative, Seattle, WA (N.L.S., B.M.P.); Seattle Epidemiologic Research and Information Center, VA Office of Research and Development, Seattle, WA (N.L.S.); Departments of Internal Medicine II-Cardiology (M.E.K.) and Internal Medicine I (B.O.B.), University of Ulm Medical Centre, Ulm, Germany; Mannheim Institute of Public Health, Medical Faculty of Mannheim, University of Heidelberg, Mannheim, Germany (M.E.K., W.M.); Wellcome Trust Sanger Institute, Hinxton, United Kingdom (S.-Y.S., N.S.); MRC Centre for CAiTE, School of Social and Community Medicine (S.-Y.S.), and ALSPAC Laboratory, Department of Social Medicine (W.L.M.), University of Bristol, Bristol, United Kingdom; Division of Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD (D.M.B., L.R.Y., L.C.B.); Divisions of Epidemiology and Community Health (W.T., A.R.F.) and Biostatistics (S.B.), University of Minnesota, Minneapolis, MN; Departments of Epidemiology (A.D., O.H.F., M.A.I., A.H., J.C.M.W.), Internal Medicine (A.G.U., P.J.G.), Radiology (M.A.I.), and Neurology (M.A.I.), Erasmus Medical Center, Rotterdam, the Netherlands; Netherlands Consortium of Healthy Aging sponsored by Netherlands Genomics Initiative, Leiden, the Netherlands (A.D., O.H.F., A.G.U., P.E.S., A.H., J.C.M.W., R.G.J.W.); Department of Biostatistics, Boston University, Boston, MA (Q.Y., A.D., M.-H.C.); Department of Pharmacology and Therapeutics, University College Cork, Ireland (B.M.B.); Departments of Genetics (J.H.M., S.M.W.) and Community and Family Medicine (J.H.M.), Geisel School of Medicine at Dartmouth, Lebanon, NH; Department of Twin Research and Genetic Epidemiology, King's College London, United Kingdom (F.M.K.W., T.D.S., N.S.); Centre for Population Health Sciences, University of Edinburgh, Scotland, United Kingdom (H.C., I.R., J.F.W.); Department of Angiology, Swiss Cardiovascular Center, Bern, Switzerland (G.S.); Royal North Shore Hospital, University of Sydney, Australia (G.H.T.); Departments of Internal Medicine (G.J.N.) and Cardiology (W.H.v.G., H.L.H., P.v.d.H.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; Division of Population Health Sciences and Education, St George's University of London, London, United Kingdom (A.R.R., D.P.S.); Institute of Cardiovascular and Medical Sciences (A.R., D.J.S., G.D.L.), and Robertson Center for Biostatistics (I.F.), University of Glasgow, Glasgow, United Kingdom; Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD (E.B.B.); Department of Biology and Center for Health Disparities Research, East Carolina University, Greenville, NC (K.L.K.); LKC School of Medicine, Nanyang Technological University, Singapore (B.O.B.); Division of Cardiovascular and Diabetes Research, Leeds University, Leeds, United Kingdom (A.M.C.); Durrer Center for Cardiogenetic Research, Amsterdam, the Netherlands (J.W.J.); Interuniversity Cardiology Institute of the Netherlands, Utrecht, the Netherlands (J.W.J.); BHF Glasgow Cardiovascular Research Centre, Faculty of Medicine, Glasgow, United Kingdom (N.S.); Brown Foundation Institute of Molecular Medicine and Human Genetics Center, Division of Epidemiology, School of Public Health (M.F.), and Hemostasis Laboratory (N.M.), University of Texas Health Science Center at Houston, Houston, TX; Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA (K.T.); Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, NC (F.-C.H.); Cardiothoracic Surgery Unit, Department of Molecular Medicine and Surgery (A.-F.C.), Karolinska Institutet, Stockholm, Sweden; Department of Cardiovascular Sciences, University of Leicester, Leicester, United Kingdom (A.H.G.); The Warren Alpert Medical School of Brown University, Providence, RI (K.L.F.); Departments of Medicine (M.C.) and Pathology (M.C., R.P.T.), University of Vermont, Burlington, VT; Cardiology Division (S.K., C.J.O.), Cardiovascular Research Center (S.K.), Center for Human Genetic Research (S.K.), and General Medicine Division (J.B.M.), Massachusetts General Hospital, Boston, MA; Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA (S.K.); The Cardiovascular Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA (M.R.P.); Department of Public Health, Faculty of Medicine, University of Split, Split, Croatia (O.P.); Cardiology Team Sachsenhausen, Frankfurt am Main, Germany (B.R.W.); Department of Medicine, Harvard Medical School, Boston, MA (J.B.M.); Synlab Academy, Mannheim, Germany (W.M.); Clinical Institute of Medical and Chemical Laboratory Diagnostics, Medical University of Graz, Graz, Austria (W.M.); and INSERM UMRS 1062, Aix-Marseille Université, Marseille, France (P.-E.M.).

    Objective: Tissue plasminogen activator (tPA), a serine protease, catalyzes the conversion of plasminogen to plasmin, the major enzyme responsible for endogenous fibrinolysis. In some populations, elevated plasma levels of tPA have been associated with myocardial infarction and other cardiovascular diseases. We conducted a meta-analysis of genome-wide association studies to identify novel correlates of circulating levels of tPA.

    Fourteen cohort studies with tPA measures (N=26 929) contributed to the meta-analysis. Three loci were significantly associated with circulating tPA levels (P<5.0×10⁻⁸). The first locus is on 6q24.3, with the lead single nucleotide polymorphism (SNP; rs9399599; P=2.9×10⁻¹⁴) within STXBP5. The second locus is on 8p11.21. The lead SNP (rs3136739; P=1.3×10⁻⁹) is intronic to POLB and <200 kb away from the tPA-encoding gene PLAT. We identified a nonsynonymous SNP (rs2020921) in modest linkage disequilibrium with rs3136739 (r²=0.50) within exon 5 of PLAT (P=2.0×10⁻⁸). The third locus is on 12q24.33, with the lead SNP (rs7301826; P=1.0×10⁻⁹) within intron 7 of STX2. We further found evidence for the association of lead SNPs in STXBP5 and STX2 with expression levels of the respective transcripts. In in vitro cell studies, silencing STXBP5 decreased the release of tPA from vascular endothelial cells, whereas silencing STX2 increased the tPA release. Through an in silico lookup, we found no associations of the 3 lead SNPs with coronary artery disease or stroke.

    Conclusions: We identified 3 loci associated with circulating tPA levels, the PLAT region, STXBP5, and STX2. Our functional studies implicate a novel role for STXBP5 and STX2 in regulating tPA release.

    Arteriosclerosis, thrombosis, and vascular biology 2014

  • A comprehensive evaluation of assembly scaffolding tools.

    Hunt M, Newbold C, Berriman M and Otto TD

    Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.

    Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behavior of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data.

    Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.

    Genome biology 2014;15;3;R42
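
    The point in the abstract above, that erroneous scaffold joins can inflate assembly statistics, can be illustrated with a small sketch. The abstract does not name a specific statistic; N50 is assumed here as a common example, and the function name `n50` and the sequence lengths are hypothetical values chosen for illustration, not data from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding on total / 2.
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one sequence length")


# Hypothetical assembly: five sequences, 300 bases in total.
before_join = [100, 80, 60, 40, 20]

# Same bases, but the two longest sequences were joined into one scaffold.
# If that join is wrong, the statistic still rises.
after_join = [180, 60, 40, 20]

print(n50(before_join), n50(after_join))
```

    The total number of bases is unchanged between the two lists, so the higher value after the join reflects only the join itself, whether or not it is correct.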

  • Insertional Mutagenesis and Deep Profiling Reveals Gene Hierarchies and a Myc/p53-Dependent Bottleneck in Lymphomagenesis.

    Huser CA, Gilroy KL, de Ridder J, Kilbey A, Borland G, Mackay N, Jenkins A, Bell M, Herzyk P, van der Weyden L, Adams DJ, Rust AG, Cameron E and Neil JC

    Centre for Virus Research, Institute of Infection, Immunity and Inflammation, College of Medicine, Veterinary Medicine and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

    Retroviral insertional mutagenesis (RIM) is a powerful tool for cancer genomics that was combined in this study with deep sequencing (RIM/DS) to facilitate a comprehensive analysis of lymphoma progression. Transgenic mice expressing two potent collaborating oncogenes in the germ line (CD2-MYC, -Runx2) develop rapid onset tumours that can be accelerated and rendered polyclonal by neonatal Moloney murine leukaemia virus (MoMLV) infection. RIM/DS analysis of 28 polyclonal lymphomas identified 771 common insertion sites (CISs) defining a 'progression network' that encompassed a remarkably large fraction of known MoMLV target genes, with further strong indications of oncogenic selection above the background of MoMLV integration preference. Progression driven by RIM was characterised as a Darwinian process of clonal competition engaging proliferation control networks downstream of cytokine and T-cell receptor signalling. Enhancer mode activation accounted for the most efficiently selected CIS target genes, including Ccr7 as the most prominent of a set of chemokine receptors driving paracrine growth stimulation and lymphoma dissemination. Another large target gene subset including candidate tumour suppressors was disrupted by intragenic insertions. A second RIM/DS screen comparing lymphomas of wild-type and parental transgenics showed that CD2-MYC tumours are virtually dependent on activation of Runx family genes in strong preference to other potent Myc collaborating genes (Gfi1, Notch1). Ikzf1 was identified as a novel collaborating gene for Runx2 and illustrated the interface between integration preference and oncogenic selection. Lymphoma target genes for MoMLV can be classified into (a) a small set of master regulators that confer self-renewal, overcoming p53 and other failsafe pathways, and (b) a large group of progression genes that control autonomous proliferation in transformed cells. These findings provide insights into retroviral biology, human cancer genetics and the safety of vector-mediated gene therapy.

    PLoS genetics 2014;10;2;e1004167

  • Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment.

    Iantorno S, Gori K, Goldman N, Gil M and Dessimoz C

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

    Methods in molecular biology (Clifton, N.J.) 2014;1079;59-73

  • The genomic basis of vomeronasal-mediated behaviour.

    Ibarra-Soria X, Levitin MO and Logan DW

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The vomeronasal organ (VNO) is a chemosensory subsystem found in the nose of most mammals. It is principally tasked with detecting pheromones and other chemical signals that initiate innate behavioural responses. The VNO expresses subfamilies of vomeronasal receptors (VRs) in a cell-specific manner: each sensory neuron expresses just one or two receptors and silences all the other receptor genes. VR genes vary greatly in number within mammalian genomes, from no functional genes in some primates to many hundreds in rodents. They bind semiochemicals, some of which are also encoded in gene families that are coexpanded in species with correspondingly large VR repertoires. Protein and peptide cues that activate the VNO tend to be expressed in exocrine tissues in sexually dimorphic, and sometimes individually variable, patterns. Few chemical ligand-VR-behaviour relationships have been fully elucidated to date, largely due to technical difficulties in working with large, homologous gene families with high sequence identity. However, analysis of mouse lines with mutations in genes involved in ligand-VR signal transduction has revealed that the VNO mediates a range of social behaviours, including male-male and maternal aggression, sexual attraction, lordosis, and selective pregnancy termination, as well as interspecific responses such as avoidance and defensive behaviours. The unusual logic of VR expression now offers an opportunity to map the specific neural circuits that drive these behaviours.

    Mammalian genome : official journal of the International Mammalian Genome Society 2014;25;1-2;75-86

  • Removal of Reprogramming Transgenes Improves the Tissue Reconstitution Potential of Keratinocytes Generated From Human Induced Pluripotent Stem Cells.

    Igawa K, Kokubu C, Yusa K, Horie K, Yoshimura Y, Yamauchi K, Suemori H, Yokozeki H, Toyoda M, Kiyokawa N, Okita H, Miyagawa Y, Akutsu H, Umezawa A, Katayama I and Takeda J

    Department of Dermatology and Department of Social and Environmental Medicine, Graduate School of Medicine, Osaka University, Osaka, Japan; Department of Dermatology, Graduate School of Medicine, Tokyo Medical and Dental University, Tokyo, Japan; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Department of Embryonic Stem Cell Research, Institute for Frontier Medical Sciences, Kyoto University, Kyoto, Japan; Department of Reproductive Biology, Center for Regenerative Medicine, National Center for Child Health and Development, Tokyo, Japan

    Human induced pluripotent stem cell (hiPSC) lines have a great potential for therapeutics because customized cells and organs can be induced from such cells. Assessment of the residual reprogramming factors after the generation of hiPSC lines is required, but an ideal system has been lacking. Here, we generated hiPSC lines from normal human dermal fibroblasts with piggyBac transposon bearing reprogramming transgenes followed by removal of the transposon by the transposase. Under this condition, we compared the phenotypes of transgene-residual and -free hiPSCs of the same genetic background. The transgene-residual hiPSCs, in which the transcription levels of the reprogramming transgenes were eventually suppressed, were quite similar to the transgene-free hiPSCs in a pluripotent state. However, after differentiation into keratinocytes, clear differences were observed. Morphological, functional, and molecular analyses including single-cell gene expression profiling revealed that keratinocytes from transgene-free hiPSC lines were more similar to normal human keratinocytes than those from transgene-residual hiPSC lines, which may be partly explained by reactivation of residual transgenes upon induction of keratinocyte differentiation. These results suggest that transgene-free hiPSC lines should be chosen for therapeutic purposes.

    Stem cells translational medicine 2014

  • Genome evolution and plasticity of Serratia marcescens, an important multidrug resistant nosocomial pathogen.

    Iguchi A, Nagaya Y, Pradel E, Ooka T, Ogura Y, Katsura K, Kurokawa K, Oshima K, Hattori M, Parkhill J, Sebaihia M, Coulthurst S, Gotoh N, Thomson NR, Ewbank JJ and Hayashi T

    Interdisciplinary Research Organization, University of Miyazaki, Miyazaki, Japan; Department of Microbiology and Infection Control Science, Kyoto Pharmaceutical University, Kyoto, Japan; Centre d'Immunologie de Marseille-Luminy, UM2 Aix-Marseille Université, Marseille, France; INSERM, U1104, Marseille, France; CNRS, UMR7280, Marseille, France; Department of Infectious Diseases, Faculty of Medicine, University of Miyazaki, Miyazaki, Japan; Department of Genomics and Bioenvironmental Science, Frontier Science Research Center, University of Miyazaki, Miyazaki, Japan; Earth-Life Science Institute, Tokyo Institute of Technology, Kanagawa, Japan; Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan; Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK; Division of Molecular Microbiology, College of Life Sciences, University of Dundee, Dundee, UK. Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK.

    Serratia marcescens is an important nosocomial pathogen that can cause an array of infections, most notably of the urinary tract and bloodstream. Naturally, it is found in many environmental niches, and is capable of infecting plants and animals. The emergence and spread of multidrug-resistant strains producing extended-spectrum or metallo beta-lactamases now pose a threat to public health worldwide. Here we report the complete genome sequences of two carefully selected S. marcescens strains, a multidrug-resistant clinical isolate (strain SM39) and an insect isolate (strain Db11). Our comparative analyses reveal the core genome of S. marcescens and define the potential metabolic capacity, virulence, and multi-drug resistance of this species. We show a remarkable intra-species genetic diversity, both at the sequence level and with regard to genome flexibility, which may reflect the diversity of niches inhabited by members of this species. A broader analysis with other Serratia species identifies a set of ca. 3,000 genes that characterize the genus. Within this apparent genetic diversity, we identified many genes implicated in the high virulence potential and antibiotic resistance of SM39, including the metallo beta-lactamase and multiple other drug resistance determinants carried on plasmid pSMC1. We further show that pSMC1 is most closely related to plasmids circulating in Pseudomonas species. Our data will provide a valuable basis for future studies on S. marcescens and new insights into the genetic mechanisms that underlie the emergence of pathogens highly resistant to multiple antimicrobial agents.

    Genome biology and evolution 2014

  • Identifying selection in the within-host evolution of influenza using viral sequence data.

    Illingworth CJ, Fischer A and Mustonen V

    Department of Genetics, University of Cambridge, Cambridge, United Kingdom.

    The within-host evolution of influenza is a vital component of its epidemiology. A question of particular interest is the role that selection plays in shaping the viral population over the course of a single infection. We here describe a method to measure selection acting upon the influenza virus within an individual host, based upon time-resolved genome sequence data from an infection. Analysing sequence data from a transmission study conducted in pigs, describing part of the haemagglutinin gene (HA1) of an influenza virus, we find signatures of non-neutrality in six of a total of sixteen infections. We find evidence for both positive and negative selection acting upon specific alleles, while in three cases, the data suggest the presence of time-dependent selection. In one infection we observe what is potentially a specific immune response against the virus; a non-synonymous mutation in an epitope region of the virus is found to be under initially positive, then strongly negative selection. Crucially, given the lack of homologous recombination in influenza, our method accounts for linkage disequilibrium between nucleotides at different positions in the haemagglutinin gene, allowing for the analysis of populations in which multiple mutations are present at any given time. Our approach offers a new insight into the dynamics of influenza infection, providing a detailed characterisation of the forces that underlie viral evolution.

    PLoS computational biology 2014;10;7;e1003755

  • Genome sequence of the tsetse fly (Glossina morsitans): vector of African trypanosomiasis.

    International Glossina Genome Initiative

    Tsetse flies are the sole vectors of human African trypanosomiasis throughout sub-Saharan Africa. Both sexes of adult tsetse feed exclusively on blood and contribute to disease transmission. Notable differences between tsetse and other disease vectors include obligate microbial symbioses, viviparous reproduction, and lactation. Here, we describe the sequence and annotation of the 366-megabase Glossina morsitans morsitans genome. Analysis of the genome and the 12,308 predicted protein-encoding genes led to multiple discoveries, including chromosomal integrations of bacterial (Wolbachia) genome sequences, a family of lactation-specific proteins, reduced complement of host pathogen recognition proteins, and reduced olfaction/chemosensory associated genes. These genome data provide a foundation for research into trypanosomiasis prevention and yield important insights with broad implications for multiple aspects of tsetse biology.

    Funded by: NHGRI NIH HHS: U54 HG003079; Wellcome Trust: 085775/Z/08/Z, 098051

    Science (New York, N.Y.) 2014;344;6182;380-6

  • Jannovar: a java library for exome annotation.

    Jäger M, Wang K, Bauer S, Smedley D, Krawitz P and Robinson PN

    Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany.

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from (with tutorial) and

    Human mutation 2014;35;5;548-55
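
    The abstract above states that Jannovar uses an interval tree to find all transcripts affected by a variant. The following is a minimal sketch of that general technique (a centered interval tree answering a point query), written in Python for brevity. It is not Jannovar's code or API: the class names `Transcript` and `IntervalTree`, the coordinates, and the transcript names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transcript:
    name: str   # hypothetical identifier
    start: int  # first covered position, inclusive
    end: int    # last covered position, inclusive


class IntervalTree:
    """Centered interval tree over transcripts (illustrative only)."""

    def __init__(self, transcripts: List[Transcript]) -> None:
        self.center: Optional[int] = None
        self.left: Optional["IntervalTree"] = None
        self.right: Optional["IntervalTree"] = None
        self.by_start: List[Transcript] = []
        self.by_end: List[Transcript] = []
        if not transcripts:
            return
        # The median endpoint is always inside at least one interval,
        # so both subtrees are strictly smaller and recursion terminates.
        points = sorted(p for t in transcripts for p in (t.start, t.end))
        self.center = points[len(points) // 2]
        here = [t for t in transcripts if t.start <= self.center <= t.end]
        left = [t for t in transcripts if t.end < self.center]
        right = [t for t in transcripts if t.start > self.center]
        self.by_start = sorted(here, key=lambda t: t.start)
        self.by_end = sorted(here, key=lambda t: t.end, reverse=True)
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def query(self, pos: int) -> List[Transcript]:
        """Return every transcript whose interval contains pos."""
        if self.center is None:
            return []
        if pos == self.center:
            return list(self.by_start)
        hits: List[Transcript] = []
        if pos < self.center:
            # Intervals stored here end at or after center, so they
            # contain pos exactly when they start at or before pos.
            for t in self.by_start:
                if t.start > pos:
                    break
                hits.append(t)
            child = self.left
        else:
            # Symmetric case: check end coordinates, largest first.
            for t in self.by_end:
                if t.end < pos:
                    break
                hits.append(t)
            child = self.right
        if child is not None:
            hits.extend(child.query(pos))
        return hits


tree = IntervalTree([
    Transcript("tx_A", 100, 500),
    Transcript("tx_B", 300, 900),
    Transcript("tx_C", 1000, 1200),
    Transcript("tx_D", 450, 460),
])
print(sorted(t.name for t in tree.query(455)))
```

    Each query visits one node per tree level plus the matching transcripts, which is why this structure suits annotating many variants against a fixed transcript set. A real annotator would also need to handle chromosomes, strands and variants that span a range rather than a single position.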

  • The evolutionary dynamics of variant antigen genes in Babesia reveal a history of genomic innovation underlying host-parasite interaction.

    Jackson AP, Otto TD, Darby A, Ramaprasad A, Xia D, Echaide IE, Farber M, Gahlot S, Gamble J, Gupta D, Gupta Y, Jackson L, Malandrin L, Malas TB, Moussa E, Nair M, Reid AJ, Sanders M, Sharma J, Tracey A, Quail MA, Weir W, Wastling JM, Hall N, Willadsen P, Lingelbach K, Shiels B, Tait A, Berriman M, Allred DR and Pain A

    Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, Liverpool Science Park Ic2, 146 Brownlow Hill, Liverpool L3 5RF, UK

    Babesia spp. are tick-borne, intraerythrocytic hemoparasites that use antigenic variation to resist host immunity, through sequential modification of the parasite-derived variant erythrocyte surface antigen (VESA) expressed on the infected red blood cell surface. We identified the genomic processes driving antigenic diversity in genes encoding VESA (ves1) through comparative analysis within and between three Babesia species (B. bigemina, B. divergens and B. bovis). Ves1 structure diverges rapidly after speciation, notably through the evolution of shortened forms (ves2) from 5' ends of canonical ves1 genes. Phylogenetic analyses show that ves1 genes are transposed between loci routinely, whereas ves2 genes are not. Similarly, analysis of sequence mosaicism shows that recombination drives variation in ves1 sequences, but less so for ves2, indicating the adoption of different mechanisms for variation of the two families. Proteomic analysis of the B. bigemina PR isolate shows that two dominant VESA1 proteins are expressed in the population, whereas numerous VESA2 proteins are co-expressed, consistent with differential transcriptional regulation of each family. Hence, VESA2 proteins are abundant and previously unrecognized elements of Babesia biology, with evolutionary dynamics consistently different to those of VESA1, suggesting that their functions are distinct.

    Nucleic acids research 2014;42;11;7113-31

  • Human stem cells for craniomaxillofacial reconstruction.

    Jalali M, Kirkpatrick WN, Cameron MG, Pauklin S and Vallier L

    Anne McLaren Laboratory for Regenerative Medicine, Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom.

    Human stem cell research represents an exceptional opportunity for regenerative medicine and the surgical reconstruction of the craniomaxillofacial complex. The correct architecture and function of the vastly diverse tissues of this important anatomical region are critical for life supportive processes, the delivery of senses, social interaction, and aesthetics. Craniomaxillofacial tissue loss is commonly associated with inflammatory responses of the surrounding tissue, significant scarring, disfigurement, and psychological sequelae as an inevitable consequence. The in vitro production of fully functional cells for skin, muscle, cartilage, bone, and neurovascular tissue formation from human stem cells, may one day provide novel materials for the reconstructive surgeon operating on patients with both hard and soft tissue deficit due to cancer, congenital disease, or trauma. However, the clinical translation of human stem cell technology, including the application of human pluripotent stem cells (hPSCs) in novel regenerative therapies, faces several hurdles that must be solved to permit safe and effective use in patients. The basic biology of hPSCs remains to be fully elucidated and concerns of tumorigenicity need to be addressed, prior to the development of cell transplantation treatments. Furthermore, functional comparison of in vitro generated tissue to their in vivo counterparts will be necessary for confirmation of maturity and suitability for application in reconstructive surgery. Here, we provide an overview of human stem cells in disease modeling, drug screening, and therapeutics, while also discussing the application of regenerative medicine for craniomaxillofacial tissue deficit and surgical reconstruction.

    Funded by: Medical Research Council: G0701448

    Stem cells and development 2014;23;13;1437-51

  • Histone deacetylase (HDAC) 1 and 2 are essential for accurate cell division and the pluripotency of embryonic stem cells.

    Jamaladdin S, Kelly RD, O'Regan L, Dovey OM, Hodson GE, Millard CJ, Portolano N, Fry AM, Schwabe JW and Cowley SM

    Department of Biochemistry, University of Leicester, Leicester LE1 9HN, United Kingdom.

    Histone deacetylases 1 and 2 (HDAC1/2) form the core catalytic components of corepressor complexes that modulate gene expression. In most cell types, deletion of both Hdac1 and Hdac2 is required to generate a discernible phenotype, suggesting their activity is largely redundant. We have therefore generated an ES cell line in which Hdac1 and Hdac2 can be inactivated simultaneously. Loss of HDAC1/2 resulted in a 60% reduction in total HDAC activity and a loss of cell viability. Cell death is dependent upon cell cycle progression, because differentiated, nonproliferating cells retain their viability. Furthermore, we observe increased mitotic defects, chromatin bridges, and micronuclei, suggesting HDAC1/2 are necessary for accurate chromosome segregation. Consistent with a critical role in the regulation of gene expression, microarray analysis of Hdac1/2-deleted cells reveals 1,708 differentially expressed genes. Significantly for the maintenance of stem cell self-renewal, we detected a reduction in the expression of the pluripotent transcription factors, Oct4, Nanog, Esrrb, and Rex1. HDAC1/2 activity is regulated through binding of an inositol tetraphosphate molecule (IP4) sandwiched between the HDAC and its cognate corepressor. This raises the important question of whether IP4 regulates the activity of the complex in cells. By rescuing the viability of double-knockout cells, we demonstrate for the first time (to our knowledge) that mutations that abolish IP4 binding reduce the activity of HDAC1/2 in vivo. Our data indicate that HDAC1/2 have essential and pleiotropic roles in cellular proliferation and regulate stem cell self-renewal by maintaining expression of key pluripotent transcription factors.

    Proceedings of the National Academy of Sciences of the United States of America 2014

  • A novel RCE1 isoform is required for H-Ras plasma membrane localization and is regulated by USP17.

    Jaworski J, Govender U, McFarlane C, de la Vega M, Greene MK, Rawlings ND, Johnston JA, Scott CJ and Burrows JF

    School of Pharmacy, Queen's University Belfast, McClay Research Building, 97 Lisburn Road, Belfast BT9 7BL, U.K.

    Processing of the 'CaaX' motif found on the C-termini of many proteins, including the proto-oncogene Ras, requires the ER (endoplasmic reticulum)-resident protease RCE1 (Ras-converting enzyme 1) and is necessary for the proper localization and function of many of these 'CaaX' proteins. In the present paper, we report that several mammalian species have a novel isoform (isoform 2) of RCE1 resulting from an alternate splice site and producing an N-terminally truncated protein. We demonstrate that both RCE1 isoform 1 and the newly identified isoform 2 are required to reinstate proper H-Ras processing and thus plasma membrane localization in RCE1-null cells. In addition, we show that the deubiquitinating enzyme USP17 (ubiquitin-specific protease 17), previously shown to modulate RCE1 activity, can regulate the abundance and localization of isoform 2. Furthermore, we show that isoform 2 is ubiquitinated on Lys43 and deubiquitinated by USP17. Collectively, the findings of the present study indicate that RCE1 isoform 2 is required for proper 'CaaX' processing and that USP17 can regulate this via its modulation of RCE1 isoform 2 ubiquitination.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F013647/1

    The Biochemical journal 2014;457;2;289-300

  • The sheep genome illuminates biology of the rumen and lipid metabolism.

    Jiang Y, Xie M, Chen W, Talbot R, Maddox JF, Faraut T, Wu C, Muzny DM, Li Y, Zhang W, Stanton JA, Brauning R, Barris WC, Hourlier T, Aken BL, Searle SM, Adelson DL, Bian C, Cam GR, Chen Y, Cheng S, DeSilva U, Dixen K, Dong Y, Fan G, Franklin IR, Fu S, Fuentes-Utrilla P, Guan R, Highland MA, Holder ME, Huang G, Ingham AB, Jhangiani SN, Kalra D, Kovar CL, Lee SL, Liu W, Liu X, Lu C, Lv T, Mathew T, McWilliam S, Menzies M, Pan S, Robelin D, Servin B, Townley D, Wang W, Wei B, White SN, Yang X, Ye C, Yue Y, Zeng P, Zhou Q, Hansen JB, Kristiansen K, Gibbs RA, Flicek P, Warkup CC, Jones HE, Oddy VH, Nicholas FW, McEwan JC, Kijas JW, Wang J, Worley KC, Archibald AL, Cockett N, Xu X, Wang W and Dalrymple BP

    State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China. Commonwealth Scientific and Industrial Research Organisation Animal Food and Health Sciences, St Lucia, QLD 4067, Australia. College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.

    Sheep (Ovis aries) are a major source of meat, milk, and fiber in the form of wool and represent a distinct class of animals that have a specialized digestive organ, the rumen, that carries out the initial digestion of plant material. We have developed and analyzed a high-quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants compared with nonruminant animals.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/1025360/1, BB/I025328/1, BB/I025360/1, BB/I025506/1; Wellcome Trust: WT095908, WT098051

    Science (New York, N.Y.) 2014;344;6188;1168-73

  • RNA-seq Analysis of Host and Viral Gene Expression Highlights Interaction between Varicella Zoster Virus and Keratinocyte Differentiation.

    Jones M, Dry IR, Frampton D, Singh M, Kanda RK, Yee MB, Kellam P, Hollinshead M, Kinchington PR, O'Toole EA and Breuer J

    Division of Infection and Immunity, University College London, London, United Kingdom.

    Varicella zoster virus (VZV) is the etiological agent of chickenpox and shingles, diseases characterized by epidermal skin blistering. Using a calcium-induced keratinocyte differentiation model, we investigated the interaction between epidermal differentiation and VZV infection. RNA-seq analysis showed that VZV infection has a profound effect on differentiating keratinocytes, altering the normal process of epidermal gene expression to generate a signature that resembles patterns of gene expression seen in both heritable and acquired skin-blistering disorders. Further investigation by real-time PCR, protein analysis and electron microscopy revealed that VZV specifically reduced expression of particular suprabasal cytokeratins and desmosomal proteins, leading to disruption of epidermal structure and function. These changes were accompanied by an upregulation of kallikreins and serine proteases. Taken together, these results indicate that VZV infection promotes blistering and desquamation of the epidermis, both of which are necessary for viral spread and pathogenesis. At the same time, analysis of the viral transcriptome provided evidence that VZV gene expression was significantly increased following calcium treatment of keratinocytes. Using reporter viruses and immunohistochemistry, we confirmed that VZV gene and protein expression in skin is linked with cellular differentiation. These studies highlight the intimate host-pathogen interaction following VZV infection of skin and provide insight into the mechanisms by which VZV remodels the epidermal environment to promote its own replication and spread.

    PLoS pathogens 2014;10;1;e1003896

  • The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.

    Köhler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, Black GC, Brown DL, Brudno M, Campbell J, Fitzpatrick DR, Eppig JT, Jackson AP, Freson K, Girdea M, Helbig I, Hurst JA, Jähn J, Jackson LG, Kelly AM, Ledbetter DH, Mansour S, Martin CL, Moss C, Mumford A, Ouwehand WH, Park SM, Riggs ER, Scott RH, Sisodiya S, Vooren SV, Wapner RJ, Wilkie AO, Wright CF, Vulto-van Silfhout AT, Leeuw Nd, de Vries BB, Washingthon NL, Smith CL, Westerfield M, Schofield P, Ruef BJ, Gkoutos GV, Haendel M, Smedley D, Lewis SE and Robinson PN

    Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Lawrence Berkeley National Laboratory, Mail Stop 84R0171, Berkeley, CA 94720, USA, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Department of Medical Genetics, Cambridge University Addenbrooke's Hospital, Cambridge CB2 2QQ, UK, Université Paul Sabatier, Faculté de Chirurgie Dentaire, CHU Toulouse, France, Centre for Genomic Medicine, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre (MAHSC), Manchester, UK, Centre for Genomic Medicine, Institute of Human Development, Faculty of Medical and Human Sciences, University of Manchester, MAHSC, Manchester M13 9WL, UK, Institute of Genetic Medicine. Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK, Department of Computer Science, University of Toronto, Ontario, Canada, Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada, Department of Clinical Genetics, Leeds Teaching Hospitals NHS Trust, Leeds LS2 9NS, UK, MRC Human Genetics Unit, MRC Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK, The Jackson Laboratory, Bar Harbor, ME 04609, USA, Center for Molecular and Vascular Biology, University of Leuven, Belgium, Department of Neuropediatrics, University Medical Center Schleswig-Holstein, Kiel Campus, 24105 Kiel, Germany, NE Thames Genetics Service, Great Ormond Street Hospital, London WC1N 3JH, UK, Drexel University College of Medicine, Philadelphia, PA 19102, USA, Department of Haematology, University of Cambridge and NHS Blood and Transplant Cambridge, CB2 0PT Cambridge, UK, Autism and Developmental Medicine Institute, Geisinger Health System

    The Human Phenotype Ontology (HPO) project provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.

    Funded by: NIH HHS: R24 OD011883

    Nucleic acids research 2014;42;1;D966-74

  • Genetic diversity within Mycobacterium tuberculosis complex impacts on the accuracy of genotypic pyrazinamide drug-susceptibility assay.

    Köser CU, Comas I, Feuerriegel S, Niemann S, Gagneux S and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge, United Kingdom.

    Tuberculosis (Edinburgh, Scotland) 2014;94;4;451-3

  • Whole-genome sequencing to control antimicrobial resistance.

    Köser CU, Ellington MJ and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Following recent improvements in sequencing technologies, whole-genome sequencing (WGS) is positioned to become an essential tool in the control of antibiotic resistance, a major threat in modern healthcare. WGS has already found numerous applications in this area, ranging from the development of novel antibiotics and diagnostic tests through to antibiotic stewardship of currently available drugs via surveillance and the elucidation of the factors that allow the emergence and persistence of resistance. Numerous proof-of-principle studies have also highlighted the value of WGS as a tool for day-to-day infection control and, for some pathogens, as a primary diagnostic tool to detect antibiotic resistance. However, appropriate data analysis platforms will need to be developed before routine WGS can be introduced on a large scale.

    Trends in genetics : TIG 2014

  • Rapid single-colony whole-genome sequencing of bacterial pathogens.

    Köser CU, Fraser LJ, Ioannou A, Becq J, Ellington MJ, Holden MT, Reuter S, Török ME, Bentley SD, Parkhill J, Gormley NA, Smith GP and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Objectives: As a result of the introduction of rapid benchtop sequencers, the time required to subculture a bacterial pathogen to extract sufficient DNA for library preparation can now exceed the time to sequence said DNA. We have eliminated this rate-limiting step by developing a protocol to generate DNA libraries for whole-genome sequencing directly from single bacterial colonies grown on primary culture plates.

    Methods: We developed our protocol using single colonies of 17 bacterial pathogens responsible for severe human infection that were grown using standard diagnostic media and incubation conditions. We then applied this method to four clinical scenarios that currently require time-consuming reference laboratory tests: full identification and genotyping of salmonellae; identification of blaNDM-1, a highly transmissible carbapenemase resistance gene, in Klebsiella pneumoniae; detection of genes encoding staphylococcal toxins associated with specific disease syndromes; and monitoring of vaccine targets to detect vaccine escape in Neisseria meningitidis.

    Results: We validated our single-colony whole-genome sequencing protocol for all 40 combinations of pathogen and selective, non-selective or indicator media tested in this study. Moreover, we demonstrated the clinical value of this method compared with current reference laboratory tests.

    Conclusions: This advance will facilitate the implementation of whole-genome sequencing into diagnostic and public health microbiology.

    The Journal of antimicrobial chemotherapy 2014;69;5;1275-81

  • A transcriptional switch underlies commitment to sexual development in malaria parasites.

    Kafsack BF, Rovira-Graells N, Clark TG, Bancells C, Crowley VM, Campino SG, Williams AE, Drought LG, Kwiatkowski DP, Baker DA, Cortés A and Llinás M

    [1] Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA [2] Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA (B.F.C.K.); Department of Molecular Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, State College, Pennsylvania 16802, USA (V.M.C., M.L.).

    The life cycles of many parasites involve transitions between disparate host species, requiring these parasites to go through multiple developmental stages adapted to each of these specialized niches. Transmission of malaria parasites (Plasmodium spp.) from humans to the mosquito vector requires differentiation from asexual stages replicating within red blood cells into non-dividing male and female gametocytes. Although gametocytes were first described in 1880, our understanding of the molecular mechanisms involved in commitment to gametocyte formation is extremely limited, and disrupting this critical developmental transition remains a long-standing goal. Here we show that expression levels of the DNA-binding protein PfAP2-G correlate strongly with levels of gametocyte formation. Using independent forward and reverse genetics approaches, we demonstrate that PfAP2-G function is essential for parasite sexual differentiation. By combining genome-wide PfAP2-G cognate motif occurrence with global transcriptional changes resulting from PfAP2-G ablation, we identify early gametocyte genes as probable targets of PfAP2-G and show that their regulation by PfAP2-G is critical for their wild-type level expression. In the asexual blood-stage parasites pfap2-g appears to be among a set of epigenetically silenced loci prone to spontaneous activation. Stochastic activation presents a simple mechanism for a low baseline of gametocyte production. Overall, these findings identify PfAP2-G as a master regulator of sexual-stage development in malaria parasites and mark the first discovery of a transcriptional switch controlling a differentiation decision in protozoan parasites.

    Funded by: Biotechnology and Biological Sciences Research Council; Howard Hughes Medical Institute; Medical Research Council: G0600230, J005398; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50GM071508; Wellcome Trust: 090532, 090532/Z/09/Z, 094752, 098051

    Nature 2014;507;7491;248-52

  • Antibacterial resistance in sub-Saharan Africa: an underestimated emergency.

    Kariuki S and Dougan G

    Centre for Microbiology Research, Kenya Medical Research Institute, Nairobi, Kenya; Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Antibacterial resistance-associated infections are known to increase morbidity, mortality, and cost of treatment, and to potentially put others in the community at higher risk of infections. In high-income countries, where the burden of infectious diseases is relatively modest, resistance to first-line antibacterial agents is usually overcome by use of second- and third-line agents. However, in developing countries where the burden of infectious diseases is high, patients with antibacterial-resistant infections may be unable to obtain or afford effective second-line treatments. In sub-Saharan Africa (SSA), the situation is aggravated by poor hygiene, unreliable water supplies, civil conflicts, and increasing numbers of immunocompromised people, such as those with HIV, which facilitate both the evolution of resistant pathogens and their rapid spread in the community. Because of limited capacity for disease detection and surveillance, the burden of illnesses due to treatable bacterial infections, their specific etiologies, and the awareness of antibacterial resistance are less well established in most of SSA, and therefore the ability to mitigate their consequences is significantly limited.

    Annals of the New York Academy of Sciences 2014

  • Natural selection and infectious disease in human populations.

    Karlsson EK, Kwiatkowski DP and Sabeti PC

    [1] Center for Systems Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA. [2] Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA.

    The ancient biological 'arms race' between microbial pathogens and humans has shaped genetic variation in modern populations, and this has important implications for the growing field of medical genomics. As humans migrated throughout the world, populations encountered distinct pathogens, and natural selection increased the prevalence of alleles that are advantageous in the new ecosystems in both host and pathogens. This ancient history now influences human infectious disease susceptibility and microbiome homeostasis, and contributes to common diseases that show geographical disparities, such as autoimmune and metabolic disorders. Using new high-throughput technologies, analytical methods and expanding public data resources, the investigation of natural selection is leading to new insights into the function and dysfunction of human biology.

    Nature reviews. Genetics 2014

  • Kdm3a lysine demethylase is an Hsp90 client required for cytoskeletal rearrangements during spermatogenesis.

    Kasioulis I, Syred HM, Tate P, Finch A, Shaw J, Seawright A, Fuszard M, Botting CH, Shirran S, Adams IR, Jackson IJ, van Heyningen V and Yeyati PL

    MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK; Edinburgh Cancer Research UK Centre, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1HH, UK; Biomedical Sciences Research Complex Mass Spectrometry and Proteomics Facility, BMS Annexe, North Haugh, University of St Andrews, St Andrews, Fife, KY16 9ST, UK.

    The lysine demethylase Kdm3a (Jhdm2a, Jmjd1a) is required for male fertility, sex determination and metabolic homeostasis through its nuclear role in chromatin remodeling. Many histone-modifying enzymes have additional non-histone substrates, as well as non-enzymatic functions, contributing to the full spectrum of events underlying their biological roles. Here, we present two Kdm3a mouse models that exhibit cytoplasmic defects that may account in part for the globozoospermia phenotype reported previously. Electron microscopy revealed abnormal acrosome, manchette and the absence of implantation fossa at the caudal end of the nucleus in mice without Kdm3a demethylase activity, thus affecting cytoplasmic structures required to elongate the sperm head. We describe an enzymatically active new Kdm3a isoform and show that subcellular distribution, protein levels and lysine demethylation activity of Kdm3a depended on Hsp90. We show that Kdm3a localizes to cytoplasmic structures of maturing spermatids affected in Kdm3a mutant mice which in turn display altered fractionation of β-actin and γ-tubulin. Kdm3a is therefore a multi-functional Hsp90 client protein that participates directly in the regulation of cytoskeletal components.

    Molecular biology of the cell 2014

  • Managing clinically significant findings in research: the UK10K example.

    Kaye J, Hurles M, Griffin H, Grewal J, Bobrow M, Timpson N, Smee C, Bolton P, Durbin R, Dyke S, Fitzpatrick D, Kennedy K, Kent A, Muddyman D, Muntoni F, Raymond LF, Semple R and Spector T

    Nuffield Department of Population Health, HeLEX - Centre for Health, Law and Emerging Technologies, University of Oxford, Oxford, UK.

    Recent advances in sequencing technology allow data on the human genome to be generated more quickly and in greater detail than ever before. Such detail includes findings that may be of significance to the health of the research participant involved. Although research studies generally do not feed back information on clinically significant findings (CSFs) to participants, this stance is increasingly being questioned. There may be difficulties and risks in feeding clinically significant information back to research participants; however, the UK10K consortium sought to address these by creating a detailed management pathway. This was not intended to create any obligation upon the researchers to feed back any CSFs they discovered. Instead, it provides a mechanism to ensure that any such findings can be passed on to the participant where appropriate. This paper describes this mechanism and the specific criteria that must be fulfilled in order for a finding and participant to qualify for feedback. This mechanism could be used by future research consortia, and may also assist in the development of sound principles for dealing with CSFs. European Journal of Human Genetics advance online publication, 15 January 2014; doi:10.1038/ejhg.2013.290.

    European journal of human genetics : EJHG 2014

  • Identification of structural variation in mouse genomes.

    Keane TM, Wong K, Adams DJ, Flint J, Reymond A and Yalcin B

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.

    Frontiers in genetics 2014;5;192

  • The origins of malaria: there are more things in heaven and earth …

    Keeling PJ and Rayner JC

    Department of Botany, Canadian Institute for Advanced Research, Evolutionary Biology Program, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.

    Summary: Malaria remains one of the most significant global public health burdens, with nearly half of the world's population at risk of infection. Malaria is not, however, a monolithic disease: it can be caused by multiple different parasite species of the Plasmodium genus, each of which can induce different symptoms and pathology, and which pose quite different challenges for control. Furthermore, malaria is in no way restricted to humans. There are Plasmodium species that have adapted to infect most warm-blooded vertebrate species, and the genus as a whole is both highly successful and highly diverse. How, where and when human malaria parasites originated from within this diversity has long been a subject of fascination and sometimes also controversy. The past decade has seen the publication of a number of important discoveries about malaria parasite origins, all based on the application of molecular diagnostic tools to new sources of samples. This review summarizes some of those recent discoveries and discusses their implication for our current understanding of the origin and evolution of the Plasmodium genus. The nature of these discoveries and the manner in which they are made are then used to lay out a series of opportunities and challenges for the next wave of parasite hunters.

    Parasitology 2014;1-10

  • Expression of phosphofructokinase in skeletal muscle is influenced by genetic variation and associated with insulin sensitivity.

    Keildson S, Fadista J, Ladenvall C, Hedman AK, Elgzyri T, Small KS, Grundberg E, Nica AC, Glass D, Richards JB, Barrett A, Nisbet J, Zheng HF, Rönn T, Ström K, Eriksson KF, Prokopenko I, MAGIC Consortium, DIAGRAM Consortium, MuTHER Consortium, Spector TD, Dermitzakis ET, Deloukas P, McCarthy MI, Rung J, Groop L, Franks PW, Lindgren CM and Hansson O

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, U.K.

    Using an integrative approach in which genetic variation, gene expression, and clinical phenotypes are assessed in relevant tissues may help functionally characterize the contribution of genetics to disease susceptibility. We sought to identify genetic variation influencing skeletal muscle gene expression (expression quantitative trait loci [eQTLs]) as well as expression associated with measures of insulin sensitivity. We investigated associations of 3,799,401 genetic variants in expression of >7,000 genes from three cohorts (n = 104). We identified 287 genes with cis-acting eQTLs (false discovery rate [FDR] <5%; P < 1.96 × 10^-5) and 49 expression-insulin sensitivity phenotype associations (i.e., fasting insulin, homeostasis model assessment-insulin resistance, and BMI) (FDR <5%; P = 1.34 × 10^-4). One of these associations, fasting insulin/phosphofructokinase (PFKM), overlaps with an eQTL. Furthermore, the expression of PFKM, a rate-limiting enzyme in glycolysis, was nominally associated with glucose uptake in skeletal muscle (P = 0.026; n = 42) and overexpressed (Bonferroni-corrected P = 0.03) in skeletal muscle of patients with T2D (n = 102) compared with normoglycemic controls (n = 87). The PFKM eQTL (rs4547172; P = 7.69 × 10^-6) was nominally associated with glucose uptake, glucose oxidation rate, intramuscular triglyceride content, and metabolic flexibility (P = 0.016-0.048; n = 178). We explored eQTL results using published data from genome-wide association studies (DIAGRAM and MAGIC), and a proxy for the PFKM eQTL (rs11168327; r^2 = 0.75) was nominally associated with T2D (DIAGRAM P = 2.7 × 10^-3). Taken together, our analysis highlights PFKM as a potential regulator of skeletal muscle insulin sensitivity.

    Diabetes 2014;63;3;1154-65

  • The Impact of Different DNA Extraction Kits and Laboratories upon the Assessment of Human Gut Microbiota Composition by 16S rRNA Gene Sequencing.

    Kennedy NA, Walker AW, Berry SH, Duncan SH, Farquarson FM, Louis P, Thomson JM, Satsangi J, Flint HJ, Parkhill J, Lees CW and Hold GL

    Gastrointestinal Unit, Centre for Genomic and Experimental Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom.

    Introduction: Determining bacterial community structure in fecal samples through DNA sequencing is an important facet of intestinal health research. The impact of different commercially available DNA extraction kits upon bacterial community structures has received relatively little attention. The aim of this study was to analyze bacterial communities in volunteer and inflammatory bowel disease (IBD) patient fecal samples extracted using widely used DNA extraction kits in established gastrointestinal research laboratories.

    Methods: Fecal samples from two healthy volunteers (H3 and H4) and two relapsing IBD patients (I1 and I2) were investigated. DNA extraction was undertaken using MoBio Powersoil and MP Biomedicals FastDNA SPIN Kit for Soil DNA extraction kits. PCR amplification for pyrosequencing of bacterial 16S rRNA genes was performed in both laboratories on all samples. Hierarchical clustering of sequencing data was done using the Yue and Clayton similarity coefficient.

    Results: DNA extracted using the FastDNA kit and the MoBio kit gave median DNA concentrations of 475 (interquartile range 228-561) and 22 (IQR 9-36) ng/µL respectively (p<0.0001). Hierarchical clustering of sequence data by Yue and Clayton coefficient revealed four clusters. Samples from individuals H3 and I2 clustered by patient; however, samples from patient I1 extracted with the MoBio kit clustered with samples from patient H4 rather than the other I1 samples. Linear modelling on relative abundance of common bacterial families revealed significant differences between kits; samples extracted with MoBio Powersoil showed significantly increased Bacteroidaceae, Ruminococcaceae and Porphyromonadaceae, and lower Enterobacteriaceae, Lachnospiraceae, Clostridiaceae, and Erysipelotrichaceae (p<0.05).

    Conclusion: This study demonstrates significant differences in DNA yield and bacterial DNA composition when comparing DNA extracted from the same fecal sample with different extraction kits. This highlights the importance of ensuring that samples in a study are prepared with the same method, and the need for caution when cross-comparing studies that use different methods.
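    The Yue and Clayton similarity coefficient named in the Methods can be illustrated with a short sketch. It assumes the commonly cited form of the index, in which the sum of products of relative abundances is divided by that same sum plus the sum of squared differences. The function name and the example taxon counts are hypothetical and are not taken from the study.

```python
def yue_clayton(counts_a, counts_b):
    """Yue and Clayton similarity between two communities.

    counts_a and counts_b map a taxon name to a read count.
    Returns 1.0 for identical relative abundances, 0.0 for no shared taxa.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each community needs at least one read")

    sum_products = 0.0
    sum_sq_diff = 0.0
    for taxon in set(counts_a) | set(counts_b):
        # Convert counts to relative abundances; absent taxa count as zero.
        a = counts_a.get(taxon, 0) / total_a
        b = counts_b.get(taxon, 0) / total_b
        sum_products += a * b
        sum_sq_diff += (a - b) ** 2
    return sum_products / (sum_sq_diff + sum_products)


# Hypothetical family-level read counts for two extractions of one sample.
kit_one = {"Bacteroidaceae": 50, "Lachnospiraceae": 50}
kit_two = {"Bacteroidaceae": 100}
similarity = yue_clayton(kit_one, kit_two)
```

    A distance for hierarchical clustering is usually taken as one minus this similarity, so two extractions of the same sample that differ in family-level abundance will sit further apart in the resulting tree.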

    PloS one 2014;9;2;e88982

  • Insertions in the OCL1 locus of Acinetobacter baumannii lead to shortened lipooligosaccharides.

    Kenyon JJ, Holt KE, Pickard D, Dougan G and Hall RM

    School of Molecular Bioscience, The University of Sydney, New South Wales, Australia.

    Genomes of 82 Acinetobacter baumannii global clones 1 (GC1) and 2 (GC2) isolates were sequenced and different forms of the locus predicted to direct synthesis of the outer core (OC) of the lipooligosaccharide were identified. OCL1 was in all GC2 genomes, whereas GC1 isolates carried OCL1, OCL3 or a new locus, OCL5. Three mutants in which an insertion sequence (ISAba1 or ISAba23) interrupted OCL1 were identified. Isolates with OCL1 intact produced only lipooligosaccharide, while the mutants produced lipooligosaccharide of reduced molecular weight. Thus, the assignment of the OC locus as that responsible for the synthesis of the OC is correct.

    Research in microbiology 2014

  • Ensembl Genomes 2013: scaling up access to genome-wide data.

    Kersey PJ, Allen JE, Christensen M, Davis P, Falin LJ, Grabmueller C, Hughes DS, Humphrey J, Kerhornou A, Khobova J, Langridge N, McDowall MD, Maheswari U, Maslen G, Nuhn M, Ong CK, Paulini M, Pedro H, Toneva I, Tuli MA, Walts B, Williams G, Wilson D, Youens-Clark K, Monaco MK, Stein J, Wei X, Ware D, Bolser DM, Howe KL, Kulesha E, Lawson D and Staines DM

    The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Wellcome Trust Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK, Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA and USDA-ARS, Cornell University, Ithaca, NY, 14853, USA.

    Ensembl Genomes is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.

    Nucleic acids research 2014;42;1;D546-52

  • Cancer mouse models: Past, present and future.

    Khaled WT and Liu P

    Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, UK.

    The development and advances in gene targeting technology over the past three decades have facilitated the generation of cancer mouse models that recapitulate features of human malignancies. These models have been and still remain instrumental in revealing the complexities of human cancer biology. However, they will need to evolve in the post-genomic era of cancer research. In this review we will highlight some of the key developments over the past decades and will discuss the new possibilities of cancer mouse models in the light of emerging powerful gene manipulating tools.

    Seminars in cell & developmental biology 2014

  • A novel method for detecting uniparental disomy from trio genotypes identifies a significant excess in children with developmental disorders.

    King DA, Fitzgerald TW, Miller R, Canham N, Clayton-Smith J, Johnson D, Mansour S, Stewart F, Vasudevan P, Hurles ME and DDD Study

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, United Kingdom.

    Exome sequencing of parent-offspring trios is a popular strategy for identifying causative genetic variants in children with rare diseases. This method owes its strength to the leveraging of inheritance information, which facilitates de novo variant calling, inference of compound heterozygosity, and the identification of inheritance anomalies. Uniparental disomy describes the inheritance of a homologous chromosome pair from only one parent. This aberration is important to detect in genetic disease studies because it can result in imprinting disorders and recessive diseases. We have developed a software tool to detect uniparental disomy from child-mother-father genotype data that uses a binomial test to identify chromosomes with a significant burden of uniparentally inherited genotypes. This tool is the first to read VCF-formatted genotypes, to perform integrated copy number filtering, and to use a statistical test inherently robust for use in platforms of varying genotyping density and noise characteristics. Simulations demonstrated superior accuracy compared with previously developed approaches. We implemented the method on 1057 trios from the Deciphering Developmental Disorders project, a trio-based rare disease study, and detected six validated events, a significant enrichment compared with the population prevalence of UPD (1 in 3500), suggesting that most of these events are pathogenic. One of these events represents a known imprinting disorder, and exome analyses have identified rare homozygous candidate variants, mainly in the isodisomic regions of UPD chromosomes, which, among other variants, provide targets for further genetic and functional evaluation.
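    The idea of a per-chromosome binomial test on trio genotypes can be sketched as follows. This is not the authors' software: the function names, the restriction to sites where the parents are opposite homozygotes, and the 1% background rate of erroneous genotypes are assumptions chosen only to keep the illustration small.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def upd_pvalues(sites, error_rate=0.01):
    """Test one chromosome for an excess of uniparentally inherited genotypes.

    sites is a list of (child, mother, father) genotypes coded as the
    number of alternate alleles (0, 1 or 2). Only sites where the parents
    are opposite homozygotes are used (an assumed simplification): a child
    with biparental inheritance must be heterozygous there.
    error_rate is an assumed chance that such a site looks uniparental
    through genotyping error alone.
    """
    informative = 0
    counts = {"maternal": 0, "paternal": 0}
    for child, mother, father in sites:
        if {mother, father} != {0, 2}:
            continue  # site cannot distinguish parental origin
        informative += 1
        if child == mother:
            counts["maternal"] += 1  # no allele from the father observed
        elif child == father:
            counts["paternal"] += 1  # no allele from the mother observed
    return {
        origin: binomial_upper_tail(k, informative, error_rate)
        for origin, k in counts.items()
    }


# Hypothetical chromosome: 15 of 20 informative sites lack a paternal allele.
example_sites = [(0, 0, 2)] * 15 + [(1, 0, 2)] * 5 + [(1, 1, 1)] * 30
pvalues = upd_pvalues(example_sites)
```

    In this toy input the maternal p-value is vanishingly small while the paternal p-value is 1, which is the pattern a maternal uniparental disomy would produce. A real analysis would also need the copy number filtering mentioned in the abstract, since a deletion can mimic the same genotype pattern.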

    Funded by: Wellcome Trust: 076113, WT098051

    Genome research 2014;24;4;673-87

  • Determinants of invasiveness beneath the capsule of the pneumococcus.

    Klugman KP, Bentley SD and McGee L

    Department of Global Health, Emory University.

    The Journal of infectious diseases 2014;209;3;321-2

  • USP28 Is Recruited to Sites of DNA Damage by the Tandem BRCT Domains of 53BP1 but Plays a Minor Role in Double-Strand Break Metabolism.

    Knobel PA, Belotserkovskaya R, Galanty Y, Schmidt CK, Jackson SP and Stracker TH

    Institute for Research in Biomedicine (IRB Barcelona), Barcelona, Spain.

    The DNA damage response (DDR) is critical for genome stability and the suppression of a wide variety of human malignancies, including neurodevelopmental disorders, immunodeficiency, and cancer. In addition, the efficacy of many chemotherapeutic strategies is dictated by the status of the DDR. Ubiquitin-specific protease 28 (USP28) was reported to govern the stability of multiple factors that are critical for diverse aspects of the DDR. Here, we examined the effects of USP28 depletion on the DDR in cells and in vivo. We found that USP28 is recruited to double-strand breaks in a manner that requires the tandem BRCT domains of the DDR protein 53BP1. However, we observed only minor DDR defects in USP28-depleted cells, and mice lacking USP28 showed normal longevity, immunological development, and radiation responses. Our results thus indicate that USP28 is not a critical factor in double-strand break metabolism and is unlikely to be an attractive target for therapeutic intervention aimed at chemotherapy sensitization.

    Molecular and cellular biology 2014;34;11;2062-74

  • Confinement and Deformation of Single Cells and Their Nuclei Inside Size-Adapted Microtubes.

    Koch B, Sanchez S, Schmidt CK, Swiersy A, Jackson SP and Schmidt OG

    Institute for Integrative Nanosciences, IFW Dresden, Helmholtzstraße 20, Dresden, D-01069, Germany.

    Rolled-up transparent microtubes are shown to serve as cell culture scaffolds that exactly define the space available for single cell growth. Human U2OS osteosarcoma cells are confined within microtubes of different diameters and the effects of the cell deformation on the integrity of the DNA and cell survival are studied.

    Advanced healthcare materials 2014

  • A Comparison of Peak Callers Used for DNase-Seq Data.

    Koohy H, Down TA, Spivakov M and Hubbard T

    The Babraham Institute, Babraham Research Campus, Cambridge, United Kingdom; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq oriented method, MACS, both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that is different from the default value.

    PloS one 2014;9;5;e96303

  • The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data.

    Koscielny G, Yaikhom G, Iyer V, Meehan TF, Morgan H, Atienza-Herrero J, Blake A, Chen CK, Easty R, Di Fenza A, Fiegel T, Grifiths M, Horne A, Karp NA, Kurbatova N, Mason JC, Matthews P, Oakley DJ, Qazi A, Regnart J, Retha A, Santos LA, Sneddon DJ, Warren J, Westerberg H, Wilson RJ, Melvin DG, Smedley D, Brown SD, Flicek P, Skarnes WC, Mallon AM and Parkinson H

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Medical Research Council Harwell (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD, UK and Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The International Mouse Phenotyping Consortium (IMPC) web portal provides the biomedical community with a unified point of access to mutant mice and a rich collection of related emerging and existing mouse phenotype data. IMPC mouse clinics worldwide follow rigorous, highly structured and standardized protocols for the experimentation, collection and dissemination of data. Dedicated 'data wranglers' work with each phenotyping center to collate data and perform quality control of data. An automated statistical analysis pipeline has been developed to identify knockout strains with a significant change in the phenotype parameters. Annotation with biomedical ontologies allows biologists and clinicians to easily find mouse strains with phenotypic traits relevant to their research. Data integration with other resources will provide insights into mammalian gene function and human disease. As phenotype data become available for every gene in the mouse, the IMPC web portal will become an invaluable tool for researchers studying the genetic contributions of genes to human diseases.

    Nucleic acids research 2014;42;1;D802-9

  • Clinical and molecular characterization of a novel PLIN1 frameshift mutation identified in patients with familial partial lipodystrophy.

    Kozusko K, Tsang V, Bottomley W, Cho Y, Gandotra S, Mimmack M, Lim K, Isaac I, Patel S, Saudek V, O'Rahilly S, Srinivasan S, Greenfield J, Barroso I, Campbell L and Savage D

    University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, UK.

    Perilipin-1 is a lipid droplet coat protein predominantly expressed in adipocytes, where it inhibits basal and facilitates stimulated lipolysis. Loss-of-function mutations in PLIN1 were recently reported in patients with a novel subtype of familial partial lipodystrophy, designated as FPLD4. We now report the identification and characterization of a novel heterozygous frameshift mutation affecting the carboxy-terminus (439fs) of perilipin-1 in two unrelated families. The mutation co-segregated with a similar phenotype including partial lipodystrophy, severe insulin resistance and type 2 diabetes, extreme hypertriglyceridaemia and non-alcoholic fatty liver disease in both families. Poor metabolic control despite maximal medical therapy prompted two patients to undergo bariatric surgery, with remarkably beneficial consequences. Functional studies indicated that expression levels of the mutant protein were lower than wild type protein, and in stably transfected pre-adipocytes the mutant protein was associated with smaller lipid droplets. Interestingly, unlike the previously reported 398 and 404 frameshift mutants, this variant binds and stabilizes ABHD5 expression, but still fails to inhibit basal lipolysis as effectively as wild type perilipin-1. Collectively, these findings highlight the physiological need for exquisite regulation of neutral lipid storage within adipocyte lipid droplets, as well as the possible metabolic benefits of bariatric surgery in this serious disease.

    Diabetes 2014

  • A linguistically informed autosomal STR survey of human populations residing in the greater Himalayan region.

    Kraaijenbrink T, van der Gaag KJ, Zuniga SB, Xue Y, Carvalho-Silva DR, Tyler-Smith C, Jobling MA, Parkin EJ, Su B, Shi H, Xiao CJ, Tang WR, Kashyap VK, Trivedi R, Sitalaximi T, Banerjee J, Karma Tshering of Gaselô, Tuladhar NM, Opgenort JR, van Driem GL, Barbujani G and de Knijff P

    MGC Department of Human and Clinical Genetics, Leiden University Medical Centre, Leiden, the Netherlands.

    The greater Himalayan region demarcates two of the most prominent linguistic phyla in Asia: Tibeto-Burman and Indo-European. Previous genetic surveys, mainly using Y-chromosome polymorphisms and/or mitochondrial DNA polymorphisms suggested a substantially reduced geneflow between populations belonging to these two phyla. These studies, however, have mainly focussed on populations residing far to the north and/or south of this mountain range, and have not been able to study geneflow patterns within the greater Himalayan region itself. We now report a detailed, linguistically informed, genetic survey of Tibeto-Burman and Indo-European speakers from the Himalayan countries Nepal and Bhutan based on autosomal microsatellite markers and compare these populations with surrounding regions. The genetic differentiation between populations within the Himalayas seems to be much higher than between populations in the neighbouring countries. We also observe a remarkable genetic differentiation between the Tibeto-Burman speaking populations on the one hand and Indo-European speaking populations on the other, suggesting that language and geography have played an equally large role in defining the genetic composition of present-day populations within the Himalayas.

    Funded by: Wellcome Trust: 087576, WT 087576, WT 098051

    PloS one 2014;9;3;e91534

  • High risk population isolate reveals low frequency variants predisposing to intracranial aneurysms.

    Kurki MI, Gaál EI, Kettunen J, Lappalainen T, Menelaou A, Anttila V, van 't Hof FN, von Und Zu Fraunberg M, Helisalmi S, Hiltunen M, Lehto H, Laakso A, Kivisaari R, Koivisto T, Ronkainen A, Rinne J, Kiemeney LA, Vermeulen SH, Kaunisto MA, Eriksson JG, Aromaa A, Perola M, Lehtimäki T, Raitakari OT, Salomaa V, Gunel M, Dermitzakis ET, Ruigrok YM, Rinkel GJ, Niemelä M, Hernesniemi J, Ripatti S, de Bakker PI, Palotie A and Jääskeläinen JE

    Neurosurgery, NeuroCenter, Kuopio University Hospital, Kuopio, Finland ; Neurosurgery, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland ; Department of Neurobiology, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland.

    3% of the population develops saccular intracranial aneurysms (sIAs), a complex trait, with a sporadic and a familial form. Subarachnoid hemorrhage from sIA (sIA-SAH) is a devastating form of stroke. Certain rare genetic variants are enriched in the Finns, a population isolate with a small founder population and bottleneck events. As the sIA-SAH incidence in Finland is >2× increased, such variants may associate with sIA in the Finnish population. We tested 9.4 million variants for association in 760 Finnish sIA patients (enriched for familial sIA), and in 2,513 matched controls with case-control status and with the number of sIAs. The most promising loci (p<5E-6) were replicated in 858 Finnish sIA patients and 4,048 controls. The frequencies and effect sizes of the replicated variants were compared to a continental European population using 717 Dutch cases and 3,004 controls. We discovered four new high-risk loci with low frequency lead variants. Three were associated with the case-control status: 2q23.3 (MAF 2.1%, OR 1.89, p 1.42×10⁻⁹); 5q31.3 (MAF 2.7%, OR 1.66, p 3.17×10⁻⁸); 6q24.2 (MAF 2.6%, OR 1.87, p 1.87×10⁻¹¹) and one with the number of sIAs: 7p22.1 (MAF 3.3%, RR 1.59, p 6.08×10⁻⁹). Two of the associations (5q31.3, 6q24.2) replicated in the Dutch sample. The 7p22.1 locus was strongly differentiated; the lead variant was more frequent in Finland (4.6%) than in the Netherlands (0.3%). Additionally, we replicated a previously inconclusive locus on 2q33.1 in all samples tested (OR 1.27, p 1.87×10⁻¹²). The five loci explain 2.1% of the sIA heritability in Finland, and may relate to, but not explain, the increased incidence of sIA-SAH in Finland. This study illustrates the utility of population isolates, familial enrichment, dense genotype imputation and alternate phenotyping in search for variants associated with complex diseases.

    PLoS genetics 2014;10;1;e1004134

  • Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains.

    Kyrpides NC, Hugenholtz P, Eisen JA, Woyke T, Göker M, Parker CT, Amann R, Beck BJ, Chain PS, Chun J, Colwell RR, Danchin A, Dawyndt P, Dedeurwaerdere T, DeLong EF, Detter JC, De Vos P, Donohue TJ, Dong XZ, Ehrlich DS, Fraser C, Gibbs R, Gilbert J, Gilna P, Glöckner FO, Jansson JK, Keasling JD, Knight R, Labeda D, Lapidus A, Lee JS, Li WJ, Ma J, Markowitz V, Moore ER, Morrison M, Meyer F, Nelson KE, Ohkuma M, Ouzounis CA, Pace N, Parkhill J, Qin N, Rossello-Mora R, Sikorski J, Smith D, Sogin M, Stevens R, Stingl U, Suzuki K, Taylor D, Tiedje JM, Tindall B, Wagner M, Weinstock G, Weissenbach J, White O, Wang J, Zhang L, Zhou YG, Field D, Whitman WB, Garrity GM and Klenk HP

    DOE-Joint Genome Institute, Walnut Creek, California, United States of America; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia.

    Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.

    PLoS biology 2014;12;8;e1001920

  • Design of clone-specific probes from genome sequences for rapid PCR-typing of outbreak pathogens.

    López-Camacho E, Rentero Z, Ruiz-Carrascoso G, Wesselink JJ, Pérez-Vázquez M, Lusa-Bernal S, Gómez-Puertas P, Kingsley RA, Gómez-Sánchez P, Campos J, Oteo J and Mingorance J

    Servicio de Microbiología, Hospital Universitario La Paz, IdiPAZ, Madrid, Spain.

    The genome sequence of one OXA-48-producing Klebsiella pneumoniae belonging to sequence type (ST) 405, and three belonging to ST11, were used to design and test ST-specific PCR assays for typing OXA-48-producing K. pneumoniae. The approach proved to be useful for in-house development of rapid PCR typing assays for local outbreak surveillance.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2014

  • Predicting the virulence of MRSA from its genome sequence.

    Laabei M, Recker M, Rudkin JK, Aldeljawi M, Gulay Z, Sloan TJ, Williams P, Endres JL, Bayles KW, Fey PD, Yajjala VK, Widhelm T, Hawkins E, Lewis K, Parfett S, Scowen L, Peacock SJ, Holden M, Wilson D, Read TD, van den Elsen J, Priest NK, Feil EJ, Hurst LD, Josefsson E and Massey RC

    Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom.

    Microbial virulence is a complex and often multifactorial phenotype, intricately linked to a pathogen's evolutionary trajectory. Toxicity, the ability to destroy host cell membranes, and adhesion, the ability to adhere to human tissues, are the major virulence factors of many bacterial pathogens, including Staphylococcus aureus. Here, we assayed the toxicity and adhesiveness of 90 MRSA (methicillin resistant S. aureus) isolates and found that while there was remarkably little variation in adhesion, toxicity varied by over an order of magnitude between isolates, suggesting different evolutionary selection pressures acting on these two traits. We performed a genome-wide association study (GWAS) and identified a large number of loci, as well as a putative network of epistatically interacting loci, that significantly associated with toxicity. Despite this apparent complexity in toxicity regulation, a predictive model based on a set of significant single nucleotide polymorphisms (SNPs) and insertion and deletions events (indels) showed a high degree of accuracy in predicting an isolate's toxicity solely from the genetic signature at these sites. Our results thus highlight the potential of using sequence data to determine clinically relevant parameters and have further implications for understanding the microbial virulence of this opportunistic pathogen.

    Genome research 2014;24;5;839-49

  • Gene-lifestyle interaction and type 2 diabetes: the EPIC InterAct case-cohort study.

    Langenberg C, Sharp SJ, Franks PW, Scott RA, Deloukas P, Forouhi NG, Froguel P, Groop LC, Hansen T, Palla L, Pedersen O, Schulze MB, Tormo MJ, Wheeler E, Agnoli C, Arriola L, Barricarte A, Boeing H, Clarke GM, Clavel-Chapelon F, Duell EJ, Fagherazzi G, Kaaks R, Kerrison ND, Key TJ, Khaw KT, Kröger J, Lajous M, Morris AP, Navarro C, Nilsson PM, Overvad K, Palli D, Panico S, Quirós JR, Rolandsson O, Sacerdote C, Sánchez MJ, Slimani N, Spijkerman AM, Tumino R, van der A DL, van der Schouw YT, Barroso I, McCarthy MI, Riboli E and Wareham NJ

    Medical Research Council Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom.

    Background: Understanding of the genetic basis of type 2 diabetes (T2D) has progressed rapidly, but the interactions between common genetic variants and lifestyle risk factors have not been systematically investigated in studies with adequate statistical power. Therefore, we aimed to quantify the combined effects of genetic and lifestyle factors on risk of T2D in order to inform strategies for prevention.

    The InterAct study includes 12,403 incident T2D cases and a representative sub-cohort of 16,154 individuals from a cohort of 340,234 European participants with 3.99 million person-years of follow-up. We studied the combined effects of an additive genetic T2D risk score and modifiable and non-modifiable risk factors using Prentice-weighted Cox regression and random effects meta-analysis methods. The effect of the genetic score was significantly greater in younger individuals (p for interaction = 1.20×10⁻⁴). Relative genetic risk (per standard deviation [4.4 risk alleles]) was also larger in participants who were leaner, both in terms of body mass index (p for interaction = 1.50×10⁻³) and waist circumference (p for interaction = 7.49×10⁻⁹). Examination of absolute risks by strata showed the importance of obesity for T2D risk. The 10-y cumulative incidence of T2D rose from 0.25% to 0.89% across extreme quartiles of the genetic score in normal weight individuals, compared to 4.22% to 7.99% in obese individuals. We detected no significant interactions between the genetic score and sex, diabetes family history, physical activity, or dietary habits assessed by a Mediterranean diet score.

    Conclusions: The relative effect of a T2D genetic risk score is greater in younger and leaner participants. However, this sub-group is at low absolute risk and would not be a logical target for preventive interventions. The high absolute risk associated with obesity at any level of genetic risk highlights the importance of universal rather than targeted approaches to lifestyle intervention.

    Funded by: Cancer Research UK; Medical Research Council: G0601261; Wellcome Trust: 083270/Z/07/Z, 090532, 098017, WT090532, WT098017

    PLoS medicine 2014;11;5;e1001647

  • Chemical inhibition of NAT10 corrects defects of laminopathic cells.

    Larrieu D, Britton S, Demir M, Rodriguez R and Jackson SP

    The Wellcome Trust/Cancer Research UK (CRUK) Gurdon Institute and Department of Biochemistry, University of Cambridge, CB2 1QN Cambridge, UK.

    Down-regulation and mutations of the nuclear-architecture proteins lamin A and C cause misshapen nuclei and altered chromatin organization associated with cancer and laminopathies, including the premature-aging disease Hutchinson-Gilford progeria syndrome (HGPS). Here, we identified the small molecule "Remodelin" that improved nuclear architecture, chromatin organization, and fitness of both human lamin A/C-depleted cells and HGPS-derived patient cells and decreased markers of DNA damage in these cells. Using a combination of chemical, cellular, and genetic approaches, we identified the acetyl-transferase protein NAT10 as the target of Remodelin that mediated nuclear shape rescue in laminopathic cells via microtubule reorganization. These findings provide insights into how NAT10 affects nuclear architecture and suggest alternative strategies for treating laminopathies and aging.

    Funded by: Cancer Research UK: C6/A11224, C6946/A14492; Medical Research Council: MR/L019116/1; Wellcome Trust: WT092096

    Science (New York, N.Y.) 2014;344;6183;527-32

  • Complete humanization of the mouse immunoglobulin loci enables efficient therapeutic antibody discovery.

    Lee EC, Liang Q, Ali H, Bayliss L, Beasley A, Bloomfield-Gerdes T, Bonoli L, Brown R, Campbell J, Carpenter A, Chalk S, Davis A, England N, Fane-Dremucheva A, Franz B, Germaschewski V, Holmes H, Holmes S, Kirby I, Kosmac M, Legent A, Lui H, Manin A, O'Leary S, Paterson J, Sciarrillo R, Speak A, Spensberger D, Tuffery L, Waddell N, Wang W, Wells S, Wong V, Wood A, Owen MJ, Friedrich GA and Bradley A

    Kymab Ltd., Babraham Research Campus, Cambridge, UK.

    If immunized with an antigen of interest, transgenic mice with large portions of unrearranged human immunoglobulin loci can produce fully human antigen-specific antibodies; several such antibodies are in clinical use. However, technical limitations inherent to conventional transgenic technology and sequence divergence between the human and mouse immunoglobulin constant regions limit the utility of these mice. Here, using repetitive cycles of genome engineering in embryonic stem cells, we have inserted the entire human immunoglobulin variable-gene repertoire (2.7 Mb) into the mouse genome, leaving the mouse constant regions intact. These transgenic mice are viable and fertile, with an immune system resembling that of wild-type mice. Antigen immunization results in production of high-affinity antibodies with long human-like complementarity-determining region 3 (CDR3H), broad epitope coverage and strong signatures of somatic hypermutation. These mice provide a robust system for the discovery of therapeutic human monoclonal antibodies; as a surrogate readout of the human antibody response, they may also aid vaccine design efforts.

    Nature biotechnology 2014

  • Reprogramming the Methylome: Erasing Memory and Creating Diversity.

    Lee HJ, Hore TA and Reik W

    Epigenetics Programme, The Babraham Institute, Cambridge, CB22 3AT, UK; Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    The inheritance of epigenetic marks, in particular DNA methylation, provides a molecular memory that ensures faithful commitment to transcriptional programs during mammalian development. Epigenetic reprogramming results in global hypomethylation of the genome together with a profound loss of memory, which underlies naive pluripotency. Such global reprogramming occurs in primordial germ cells, early embryos, and embryonic stem cells where reciprocal molecular links connect the methylation machinery to pluripotency. Priming for differentiation is initiated upon exit from pluripotency, and we propose that epigenetic mechanisms create diversity of transcriptional states, which help with symmetry breaking during cell fate decisions and lineage commitment.

    Funded by: Wellcome Trust: 095645

    Cell stem cell 2014;14;6;710-719

  • Molecular genetic evidence for overlap between general cognitive ability and risk for schizophrenia: a report from the Cognitive Genomics consorTium (COGENT).

    Lencz T, Knowles E, Davies G, Guha S, Liewald DC, Starr JM, Djurovic S, Melle I, Sundet K, Christoforou A, Reinvang I, Mukherjee S, Derosse P, Lundervold A, Steen VM, John M, Espeseth T, Räikkönen K, Widen E, Palotie A, Eriksson JG, Giegling I, Konte B, Ikeda M, Roussos P, Giakoumaki S, Burdick KE, Payton A, Ollier W, Horan M, Donohoe G, Morris D, Corvin A, Gill M, Pendleton N, Iwata N, Darvasi A, Bitsios P, Rujescu D, Lahti J, Hellard SL, Keller MC, Andreassen OA, Deary IJ, Glahn DC and Malhotra AK

    [1] Division of Psychiatry Research, Zucker Hillside Hospital, Glen Oaks, NY, USA [2] Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, Manhasset, NY, USA [3] Hofstra North Shore-LIJ School of Medicine, Departments of Psychiatry and Molecular Medicine, Hempstead, NY, USA.

    It has long been recognized that generalized deficits in cognitive ability represent a core component of schizophrenia (SCZ), evident before full illness onset and independent of medication. The possibility of genetic overlap between risk for SCZ and cognitive phenotypes has been suggested by the presence of cognitive deficits in first-degree relatives of patients with SCZ; however, until recently, molecular genetic approaches to test this overlap have been lacking. Within the last few years, large-scale genome-wide association studies (GWAS) of SCZ have demonstrated that a substantial proportion of the heritability of the disorder is explained by a polygenic component consisting of many common single-nucleotide polymorphisms (SNPs) of extremely small effect. Similar results have been reported in GWAS of general cognitive ability. The primary aim of the present study is to provide the first molecular genetic test of the classic endophenotype hypothesis, which states that alleles associated with reduced cognitive ability should also serve to increase risk for SCZ. We tested the endophenotype hypothesis by applying polygenic SNP scores derived from a large-scale cognitive GWAS meta-analysis (~5000 individuals from nine nonclinical cohorts comprising the Cognitive Genomics consorTium (COGENT)) to four SCZ case-control cohorts. As predicted, cases had significantly lower cognitive polygenic scores compared to controls. In parallel, polygenic risk scores for SCZ were associated with lower general cognitive ability. In addition, using our large cognitive meta-analytic data set, we identified nominally significant cognitive associations for several SNPs that have previously been robustly associated with SCZ susceptibility. Results provide molecular confirmation of the genetic overlap between SCZ and general cognitive ability, and may provide additional insight into pathophysiology of the disorder.

    Molecular psychiatry 2014;19;2;168-74

  • JAK2V617F homozygosity drives a phenotypic switch in myeloproliferative neoplasms, but is insufficient to sustain disease.

    Li J, Kent DG, Godfrey AL, Manning H, Nangalia J, Aziz A, Chen E, Saeb-Parsy K, Fink J, Sneade R, Hamilton TL, Pask DC, Silber Y, Zhao X, Ghevaert C, Liu P and Green AR

    Cambridge Institute for Medical Research and Wellcome Trust/Medical Research Council Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom; Department of Haematology, University of Cambridge, United Kingdom.

    Genomic regions of acquired uniparental disomy (UPD) are common in malignancy and frequently harbor mutated oncogenes. Homozygosity for such gain-of-function mutations is thought to modulate tumor phenotype, but direct evidence has been elusive. Polycythemia vera (PV) and essential thrombocythemia (ET), 2 subtypes of myeloproliferative neoplasms, are associated with an identical acquired JAK2V617F mutation but the mechanisms responsible for distinct clinical phenotypes remain unclear. We provide direct genetic evidence and demonstrate that homozygosity for human JAK2V617F in knock-in mice results in a striking phenotypic switch from an ET-like to PV-like phenotype. The resultant erythrocytosis is driven by increased numbers of early erythroid progenitors and enhanced erythroblast proliferation, whereas reduced platelet numbers are associated with impaired platelet survival. JAK2V617F-homozygous mice developed a severe hematopoietic stem cell defect, suggesting that additional lesions are needed to sustain clonal expansion. Together, our results indicate that UPD for 9p plays a causal role in the PV phenotype in patients as a consequence of JAK2V617F homozygosity. The generation of a JAK2V617F allelic series of mice with a dose-dependent effect on hematopoiesis provides a powerful model for studying the consequences of mutant JAK2 homozygosity.

    Blood 2014;123;20;3139-51

  • Constitutional and somatic rearrangement of chromosome 21 in acute lymphoblastic leukaemia.

    Li Y, Schwab C, Ryan SL, Papaemmanuil E, Robinson HM, Jacobs P, Moorman AV, Dyer S, Borrow J, Griffiths M, Heerema NA, Carroll AJ, Talley P, Bown N, Telford N, Ross FM, Gaunt L, McNally RJ, Young BD, Sinclair P, Rand V, Teixeira MR, Joseph O, Robinson B, Maddison M, Dastugue N, Vandenberghe P, Haferlach C, Stephens PJ, Cheng J, Van Loo P, Stratton MR, Campbell PJ and Harrison CJ

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK [2].

    Changes in gene dosage are a major driver of cancer, known to be caused by a finite, but increasingly well annotated, repertoire of mutational mechanisms. This can potentially generate correlated copy-number alterations across hundreds of linked genes, as exemplified by the 2% of childhood acute lymphoblastic leukaemia (ALL) with recurrent amplification of megabase regions of chromosome 21 (iAMP21). We used genomic, cytogenetic and transcriptional analysis, coupled with novel bioinformatic approaches, to reconstruct the evolution of iAMP21 ALL. Here we show that individuals born with the rare constitutional Robertsonian translocation between chromosomes 15 and 21, rob(15;21)(q10;q10)c, have approximately 2,700-fold increased risk of developing iAMP21 ALL compared to the general population. In such cases, amplification is initiated by a chromothripsis event involving both sister chromatids of the Robertsonian chromosome, a novel mechanism for cancer predisposition. In sporadic iAMP21, breakage-fusion-bridge cycles are typically the initiating event, often followed by chromothripsis. In both sporadic and rob(15;21)c-associated iAMP21, the final stages frequently involve duplications of the entire abnormal chromosome. The end-product is a derivative of chromosome 21 or the rob(15;21)c chromosome with gene dosage optimized for leukaemic potential, showing constrained copy-number levels over multiple linked genes. Thus, dicentric chromosomes may be an important precipitant of chromothripsis, as we show rob(15;21)c to be constitutionally dicentric and breakage-fusion-bridge cycles generate dicentric chromosomes somatically. Furthermore, our data illustrate that several cancer-specific mutational processes, applied sequentially, can coordinate to fashion copy-number profiles over large genomic scales, incrementally refining the fitness benefits of aggregated gene dosage changes.

    Nature 2014

  • Novel skin phenotypes revealed by a genome-wide mouse reverse genetic screen.

    Liakath-Ali K, Vancollie VE, Heath E, Smedley DP, Estabel J, Sunter D, Ditommaso T, White JK, Ramirez-Solis R, Smyth I, Steel KP and Watt FM

    [1] Centre for Stem Cells and Regenerative Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK [2] Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, UK [3] Wellcome Trust-Medical Research Council Stem Cell Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK.

    Permanent stop-and-shop large-scale mouse mutant resources provide an excellent platform to decipher tissue phenogenomics. Here we analyse skin from 538 knockout mouse mutants generated by the Sanger Institute Mouse Genetics Project. We optimize immunolabelling of tail epidermal wholemounts to allow systematic annotation of hair follicle, sebaceous gland and interfollicular epidermal abnormalities using ontology terms from the Mammalian Phenotype Ontology. Of the 50 mutants with an epidermal phenotype, 9 map to human genetic conditions with skin abnormalities. Some mutant genes are expressed in the skin, whereas others are not, indicating systemic effects. One phenotype is affected by diet and several are incompletely penetrant. In-depth analysis of three mutants, Krt76, Myo5a (a model of human Griscelli syndrome) and Mysm1, provides validation of the screen. Our study is the first large-scale genome-wide tissue phenotype screen from the International Knockout Mouse Consortium and provides an open access resource for the scientific community.

    Funded by: Medical Research Council; Wellcome Trust: 096540, 098051

    Nature communications 2014;5;3540

  • Distribution and Medical Impact of Loss-of-Function Variants in the Finnish Founder Population.

    Lim ET, Würtz P, Havulinna AS, Palta P, Tukiainen T, Rehnström K, Esko T, Mägi R, Inouye M, Lappalainen T, Chan Y, Salem RM, Lek M, Flannick J, Sim X, Manning A, Ladenvall C, Bumpstead S, Hämäläinen E, Aalto K, Maksimow M, Salmi M, Blankenberg S, Ardissino D, Shah S, Horne B, McPherson R, Hovingh GK, Reilly MP, Watkins H, Goel A, Farrall M, Girelli D, Reiner AP, Stitziel NO, Kathiresan S, Gabriel S, Barrett JC, Lehtimäki T, Laakso M, Groop L, Kaprio J, Perola M, McCarthy MI, Boehnke M, Altshuler DM, Lindgren CM, Hirschhorn JN, Metspalu A, Freimer NB, Zeller T, Jalkanen S, Koskinen S, Raitakari O, Durbin R, MacArthur DG, Salomaa V, Ripatti S, Daly MJ, Palotie A and Sequencing Initiative Suomi (SISu) Project

    Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America; Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America; Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, United States of America.

    Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of associated variants on disease risk and the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases as they have deleterious variants that are present at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5-5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (P < 5 × 10(-8)) including splice variants in LPA that lowered plasma lipoprotein(a) levels (P = 1.5 × 10(-117)). Through accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR = 0.84, P = 3 × 10(-4)), demonstrating for the first time the correlation between very low levels of LPA in humans with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates substantial advantages for studying the role of rare variation in complex phenotypes in founder populations like the Finns and by combining a unique population genetic history with data from large population cohorts and centralized research access to National Health Registers.

    PLoS genetics 2014;10;7;e1004494

  • Genetic studies of Crohn's disease: past, present and future.

    Liu JZ and Anderson CA

    The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    The exact aetiology of Crohn's disease is unknown, though it is clear from early epidemiological studies that a combination of genetic and environmental risk factors contributes to an individual's disease susceptibility. Here, we review the history of gene-mapping studies of Crohn's disease, from the linkage-based studies that first implicated the NOD2 locus, through to modern-day genome-wide association studies that have discovered over 140 loci associated with Crohn's disease and yielded novel insights into the biological pathways underlying pathogenesis. We describe on-going and future gene-mapping studies that utilise next generation sequencing technology to pinpoint causal variants and identify rare genetic variation underlying Crohn's disease risk. We comment on the utility of genetic markers for predicting an individual's disease risk and discuss their potential for identifying novel drug targets and influencing disease management. Finally, we describe how these studies have shaped and continue to shape our understanding of the genetic architecture of Crohn's disease.

    Funded by: Wellcome Trust: 098051

    Best practice & research. Clinical gastroenterology 2014;28;3;373-86

  • African origin of the malaria parasite Plasmodium vivax.

    Liu W, Li Y, Shaw KS, Learn GH, Plenderleith LJ, Malenke JA, Sundararaman SA, Ramirez MA, Crystal PA, Smith AG, Bibollet-Ruche F, Ayouba A, Locatelli S, Esteban A, Mouacha F, Guichet E, Butel C, Ahuka-Mundeke S, Inogwabini BI, Ndjango JB, Speede S, Sanz CM, Morgan DB, Gonder MK, Kranzusch PJ, Walsh PD, Georgiev AV, Muller MN, Piel AK, Stewart FA, Wilson ML, Pusey AE, Cui L, Wang Z, Färnert A, Sutherland CJ, Nolder D, Hart JA, Hart TB, Bertolani P, Gillis A, LeBreton M, Tafon B, Kiyang J, Djoko CF, Schneider BS, Wolfe ND, Mpoudi-Ngole E, Delaporte E, Carter R, Culleton RL, Shaw GM, Rayner JC, Peeters M, Hahn BH and Sharp PM

    Department of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.

    Plasmodium vivax is the leading cause of human malaria in Asia and Latin America but is absent from most of central Africa due to the near fixation of a mutation that inhibits the expression of its receptor, the Duffy antigen, on human erythrocytes. The emergence of this protective allele is not understood because P. vivax is believed to have originated in Asia. Here we show, using a non-invasive approach, that wild chimpanzees and gorillas throughout central Africa are endemically infected with parasites that are closely related to human P. vivax. Sequence analyses reveal that ape parasites lack host specificity and are much more diverse than human parasites, which form a monophyletic lineage within the ape parasite radiation. These findings indicate that human P. vivax is of African origin and likely selected for the Duffy-negative mutation. All extant human P. vivax parasites are derived from a single ancestor that escaped out of Africa.

    Funded by: NIAID NIH HHS: P30 AI045008, R01 AI058715, R01 AI091595, R01 AI58715, R37 AI050529, T32 AI007532; Wellcome Trust: 098051

    Nature communications 2014;5;3346

  • Do you smell what I smell? Genetic variation in olfactory perception.

    Logan DW

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, U.K.

    The sense of smell is mediated by the detection of chemical odours by ORs (olfactory receptors) in the nose. This initiates a neural percept of the odour in the brain, which may provoke an emotional or behavioural response. Analogous to colour-blindness in the visual system, some individuals report a very different percept of specific odours to others, in terms of intensity, valence or detection threshold. A significant proportion of variance in odour perception is heritable, and recent advances in genome sequencing and genotyping technologies have permitted studies into the genes that underpin these phenotypic differences. In the present article, I review the evidence that OR genes are extremely variable between individuals. I argue that this contributes to a unique receptor repertoire in our noses that provides us each with a personalized perception of our environment. I highlight specific examples where known OR variants influence odour detection and discuss the wider implications of this for both humans and other mammals that use chemical communication for social interaction.

    Biochemical Society transactions 2014;42;4;861-5

  • A DERL3-associated defect in the degradation of SLC2A1 mediates the Warburg effect.

    Lopez-Serra P, Marcilla M, Villanueva A, Ramos-Fernandez A, Palau A, Leal L, Wahi JE, Setien-Baranda F, Szczesna K, Moutinho C, Martinez-Cardus A, Heyn H, Sandoval J, Puertas S, Vidal A, Sanjuan X, Martinez-Balibrea E, Viñals F, Perales JC, Bramsem JB, Ørntoft TF, Andersen CL, Tabernero J, McDermott U, Boxer MB, Heiden MG, Albar JP and Esteller M

    Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet, Barcelona, 08908 Catalonia, Spain.

    Cancer cells possess aberrant proteomes that can arise by the disruption of genes involved in physiological protein degradation. Here we demonstrate the presence of promoter CpG island hypermethylation-linked inactivation of DERL3 (Derlin-3), a key gene in the endoplasmic reticulum-associated protein degradation pathway, in human tumours. The restoration of in vitro and in vivo DERL3 activity highlights the tumour suppressor features of the gene. Using the stable isotopic labelling of amino acids in cell culture workflow for differential proteome analysis, we identify SLC2A1 (glucose transporter 1, GLUT1) as a downstream target of DERL3. Most importantly, SLC2A1 overexpression mediated by DERL3 epigenetic loss contributes to the Warburg effect in the studied cells and pinpoints a subset of human tumours with greater vulnerability to drugs targeting glycolysis.

    Nature communications 2014;5;3608

  • Genome-wide association analysis identifies six new loci associated with forced vital capacity.

    Loth DW, Artigas MS, Gharib SA, Wain LV, Franceschini N, Koch B, Pottinger TD, Smith AV, Duan Q, Oldmeadow C, Lee MK, Strachan DP, James AL, Huffman JE, Vitart V, Ramasamy A, Wareham NJ, Kaprio J, Wang XQ, Trochet H, Kähönen M, Flexeder C, Albrecht E, Lopez LM, de Jong K, Thyagarajan B, Alves AC, Enroth S, Omenaas E, Joshi PK, Fall T, Viñuela A, Launer LJ, Loehr LR, Fornage M, Li G, Wilk JB, Tang W, Manichaikul A, Lahousse L, Harris TB, North KE, Rudnicka AR, Hui J, Gu X, Lumley T, Wright AF, Hastie ND, Campbell S, Kumar R, Pin I, Scott RA, Pietiläinen KH, Surakka I, Liu Y, Holliday EG, Schulz H, Heinrich J, Davies G, Vonk JM, Wojczynski M, Pouta A, Johansson A, Wild SH, Ingelsson E, Rivadeneira F, Völzke H, Hysi PG, Eiriksdottir G, Morrison AC, Rotter JI, Gao W, Postma DS, White WB, Rich SS, Hofman A, Aspelund T, Couper D, Smith LJ, Psaty BM, Lohman K, Burchard EG, Uitterlinden AG, Garcia M, Joubert BR, McArdle WL, Musk AB, Hansel N, Heckbert SR, Zgaga L, van Meurs JB, Navarro P, Rudan I, Oh YM, Redline S, Jarvis DL, Zhao JH, Rantanen T, O'Connor GT, Ripatti S, Scott RJ, Karrasch S, Grallert H, Gaddis NC, Starr JM, Wijmenga C, Minster RL, Lederer DJ, Pekkanen J, Gyllensten U, Campbell H, Morris AP, Gläser S, Hammond CJ, Burkart KM, Beilby J, Kritchevsky SB, Gudnason V, Hancock DB, Williams OD, Polasek O, Zemunik T, Kolcic I, Petrini MF, Wjst M, Kim WJ, Porteous DJ, Scotland G, Smith BH, Viljanen A, Heliövaara M, Attia JR, Sayers I, Hampel R, Gieger C, Deary IJ, Boezen HM, Newman A, Jarvelin MR, Wilson JF, Lind L, Stricker BH, Teumer A, Spector TD, Melén E, Peters MJ, Lange LA, Barr RG, Bracke KR, Verhamme FM, Sung J, Hiemstra PS, Cassano PA, Sood A, Hayward C, Dupuis J, Hall IP, Brusselle GG, Tobin MD and London SJ

    [1] Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands. [2] Netherlands Health Care Inspectorate, The Hague, the Netherlands. [3].

    Forced vital capacity (FVC), a spirometric measure of pulmonary function, reflects lung volume and is used to diagnose and monitor lung diseases. We performed genome-wide association study meta-analysis of FVC in 52,253 individuals from 26 studies and followed up the top associations in 32,917 additional individuals of European ancestry. We found six new regions associated at genome-wide significance (P < 5 × 10(-8)) with FVC in or near EFEMP1, BMP6, MIR129-2-HSD17B12, PRDM11, WWOX and KCNJ2. Two loci previously associated with spirometric measures (GSTCD and PTCH1) were related to FVC. Newly implicated regions were followed up in samples from African-American, Korean, Chinese and Hispanic individuals. We detected transcripts for all six newly implicated genes in human lung tissue. The new loci may inform mechanisms involved in lung development and the pathogenesis of restrictive lung disease.

    Nature genetics 2014

  • A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells.

    Ly T, Ahmad Y, Shlien A, Soroka D, Mills A, Emanuele MJ, Stratton MR and Lamond AI

    Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, United Kingdom.

    Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels for both bulk gene expression and for over ∼6000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics.

    eLife 2014;3;e01630

  • Cloning of recombinant monoclonal antibodies from hybridomas in a single Mammalian expression plasmid.

    Müller-Sienerth N, Crosnier C, Wright GJ and Staudt N

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Antibodies are an integral part of biological and medical research. In addition, immunoglobulins are used in many diagnostic tests and are becoming increasingly important in the therapy of diseases. To express antibodies recombinantly, the immunoglobulin heavy and light chains are usually cloned into two different expression plasmids. Here, we describe a method for recombinant antibody expression from a single plasmid.

    Methods in molecular biology (Clifton, N.J.) 2014;1131;229-40

  • Guidelines for investigating causality of sequence variants in human disease.

    MacArthur DG, Manolio TA, Dimmock DP, Rehm HL, Shendure J, Abecasis GR, Adams DR, Altman RB, Antonarakis SE, Ashley EA, Barrett JC, Biesecker LG, Conrad DF, Cooper GM, Cox NJ, Daly MJ, Gerstein MB, Goldstein DB, Hirschhorn JN, Leal SM, Pennacchio LA, Stamatoyannopoulos JA, Sunyaev SR, Valle D, Voight BF, Winckler W and Gunter C

    [1] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA.

    The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.

    Nature 2014;508;7497;469-76

  • The rate of nonallelic homologous recombination in males is highly variable, correlated between monozygotic twins and independent of age.

    MacArthur JA, Spector TD, Lindsay SJ, Mangino M, Gill R, Small KS and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Nonallelic homologous recombination (NAHR) between highly similar duplicated sequences generates chromosomal deletions, duplications and inversions, which can cause diverse genetic disorders. Little is known about interindividual variation in NAHR rates and the factors that influence this. We estimated the rate of deletion at the CMT1A-REP NAHR hotspot in sperm DNA from 34 male donors, including 16 monozygotic (MZ) co-twins (8 twin pairs) aged 24 to 67 years old. The average NAHR rate was 3.5 × 10(-5) with a seven-fold variation across individuals. Despite good statistical power to detect even a subtle correlation, we observed no relationship between age of unrelated individuals and the rate of NAHR in their sperm, likely reflecting the meiotic-specific origin of these events. We then estimated the heritability of deletion rate by calculating the intraclass correlation (ICC) within MZ co-twins, revealing a significant correlation between MZ co-twins (ICC = 0.784, p = 0.0039), with MZ co-twins being significantly more correlated than unrelated pairs. We showed that this heritability cannot be explained by variation in PRDM9, a known regulator of NAHR, or variation within the NAHR hotspot itself. We also did not detect any correlation between Body Mass Index (BMI), smoking status or alcohol intake and rate of NAHR. Our results suggest that other, as yet unidentified, genetic or environmental factors play a significant role in the regulation of NAHR and are responsible for the extensive variation in the population for the probability of fathering a child with a genomic disorder resulting from a pathogenic deletion.

    Funded by: Wellcome Trust: 077014/Z/05/Z

    PLoS genetics 2014;10;3;e1004195

  • Single cell genomics: advances and future perspectives.

    Macaulay IC and Voet T

    Single Cell Genomics Centre, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Advances in whole-genome and whole-transcriptome amplification have permitted the sequencing of the minute amounts of DNA and RNA present in a single cell, offering a window into the extent and nature of genomic and transcriptomic heterogeneity which occurs in both normal development and disease. Single-cell approaches stand poised to revolutionise our capacity to understand the scale of genomic, epigenomic, and transcriptomic diversity that occurs during the lifetime of an individual organism. Here, we review the major technological and biological breakthroughs achieved, describe the remaining challenges to overcome, and provide a glimpse into the promise of recent and future developments.

    PLoS genetics 2014;10;1;e1004126

  • Targeting of Slc25a21 Is Associated with Orofacial Defects and Otitis Media Due to Disrupted Expression of a Neighbouring Gene.

    Maguire S, Estabel J, Ingham N, Pearson S, Ryder E, Carragher DM, Walker N, Sanger MGP Slc25a21 Project Team, Bussell J, Chan WI, Keane TM, Adams DJ, Scudamore CL, Lelliott CJ, Ramírez-Solis R, Karp NA, Steel KP, White JK and Gerdin AK

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Homozygosity for Slc25a21tm1a(KOMP)Wtsi results in mice exhibiting orofacial abnormalities, alterations in carpal and rugae structures, hearing impairment and inflammation in the middle ear. In humans it has been hypothesised that the 2-oxoadipate mitochondrial carrier coded by SLC25A21 may be involved in the disease 2-oxoadipate acidaemia. Unexpectedly, no 2-oxoadipate acidaemia-like symptoms were observed in animals homozygous for Slc25a21tm1a(KOMP)Wtsi despite confirmation that this allele reduces Slc25a21 expression by 71.3%. To study the complete knockout, an allelic series was generated using the loxP and FRT sites typical of a Knockout Mouse Project allele. After removal of the critical exon and neomycin selection cassette, Slc25a21 knockout mice homozygous for the Slc25a21tm1b(KOMP)Wtsi and Slc25a21tm1d(KOMP)Wtsi alleles were phenotypically indistinguishable from wild-type. This led us to explore the genomic environment of Slc25a21 and to discover that expression of Pax9, located 3' of the target gene, was reduced in homozygous Slc25a21tm1a(KOMP)Wtsi mice. We hypothesize that the presence of the selection cassette is the cause of the down regulation of Pax9 observed. The phenotypes we observed in homozygous Slc25a21tm1a(KOMP)Wtsi mice were broadly consistent with a hypomorphic Pax9 allele with the exception of otitis media and hearing impairment which may be a novel consequence of Pax9 down regulation. We explore the ramifications associated with this particular targeted mutation and emphasise the need to interpret phenotypes taking into consideration all potential underlying genetic mechanisms.

    PloS one 2014;9;3;e91807

  • Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis.

    Mahata B, Zhang X, Kolodziejczyk AA, Proserpio V, Haim-Vilmovsky L, Taylor AE, Hebenstreit D, Dingler FA, Moignard V, Göttgens B, Arlt W, McKenzie AN and Teichmann SA

    EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    T helper 2 (Th2) cells regulate helminth infections, allergic disorders, tumor immunity, and pregnancy by secreting various cytokines. It is likely that there are undiscovered Th2 signaling molecules. Although steroids are known to be immunoregulators, de novo steroid production from immune cells has not been previously characterized. Here, we demonstrate production of the steroid pregnenolone by Th2 cells in vitro and in vivo in a helminth infection model. Single-cell RNA sequencing and quantitative PCR analysis suggest that pregnenolone synthesis in Th2 cells is related to immunosuppression. In support of this, we show that pregnenolone inhibits Th cell proliferation and B cell immunoglobulin class switching. We also show that steroidogenic Th2 cells inhibit Th cell proliferation in a Cyp11a1 enzyme-dependent manner. We propose pregnenolone as a "lymphosteroid," a steroid produced by lymphocytes. We speculate that this de novo steroid production may be an intrinsic phenomenon of Th2-mediated immune responses to actively restore immune homeostasis.

    Funded by: Medical Research Council; Wellcome Trust

    Cell reports 2014;7;4;1130-42

  • Glucose-6-phosphate dehydrogenase polymorphisms and susceptibility to mild malaria in Dogon and Fulani, Mali.

    Maiga B, Dolo A, Campino S, Sepulveda N, Corran P, Rockett KA, Troye-Blomberg M, Doumbo OK and Clark TG

    Malaria Research and Training Centre, Department of Epidemiology of Parasitic Diseases, Faculty of Medicine, Pharmacy and Odonto - Stomatology, USTTB, BP 1805 Bamako, Mali.

    Background: Glucose-6-phosphate dehydrogenase (G6PD) deficiency is associated with protection from severe malaria, and potentially uncomplicated malaria phenotypes. It has been documented that G6PD deficiency in sub-Saharan Africa is due to the 202A/376G G6PD A-allele, and association studies have used genotyping as a convenient technique for epidemiological studies. However, recent studies have shown discrepancies in G6PD202/376 associations with severe malaria. There is evidence to suggest that other G6PD deficiency alleles may be common in some regions of West Africa, and that allelic heterogeneity could explain these discrepancies.

    Methods: A cross-sectional epidemiological study of malaria susceptibility was conducted during 2006 and 2007 in the Sahel meso-endemic malaria zone of Mali. The study included Dogon (n = 375) and Fulani (n = 337) sympatric ethnic groups, where the latter group is characterized by lower susceptibility to Plasmodium falciparum malaria. Fifty-three G6PD polymorphisms, including 202/376, were genotyped across the 712 samples. Evidence of association of these G6PD polymorphisms and mild malaria was assessed in both ethnic groups using genotypic and haplotypic statistical tests.

    Results: It was confirmed that the Fulani are less susceptible to malaria, and the 202A mutation is rare in this group (< 1% versus Dogon 7.9%). The Betica-Selma 968C/376G (~11% enzymatic activity) was more common in Fulani (6.1% vs Dogon 0.0%). There are differences in haplotype frequencies between Dogon and Fulani, and association analysis did not reveal strong evidence of protective G6PD genetic effects against uncomplicated malaria in both ethnic groups and gender. However, there was some evidence of increased risk of mild malaria in Dogon with the 202A mutation, attaining borderline statistical significance in females. The rs915942 polymorphism was found to be associated with asymptomatic malaria in Dogon females, and the rs61042368 polymorphism was associated with clinical malaria in Fulani males.

    Conclusions: The results highlight the need to consider markers in addition to G6PD202 in studies of deficiency. Further, large genetic epidemiological studies of multi-ethnic groups in West Africa across a spectrum of malaria severity phenotypes are required to establish who receives protection from G6PD deficiency.

    Malaria journal 2014;13;1;270

  • Fc gamma Receptor IIa-H131R Polymorphism and Malaria Susceptibility in Sympatric Ethnic Groups, Fulani and Dogon of Mali.

    Maiga B, Dolo A, Touré O, Dara V, Tapily A, Campino S, Sepulveda N, Corran P, Rockett K, Clark TG, Troye Blomberg M and Doumbo OK

    Malaria Research and Training Center/Department of Epidemiology of Parasitic Diseases/Faculty of Medicine, Pharmacy and Odonto - Stomatology, Bamako/USTTB, Mali; Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.

    It has been previously shown that there are some interethnic differences in susceptibility to malaria between two sympatric ethnic groups of Mali, the Fulani and the Dogon. The lower susceptibility to Plasmodium falciparum malaria seen in the Fulani has not been fully explained by genetic polymorphisms previously known to be associated with malaria resistance, including haemoglobin S (HbS), haemoglobin C (HbC), alpha-thalassaemia and glucose-6-phosphate dehydrogenase (G6PD) deficiency. Given the reported differences in the distribution of FcγRIIa allotypes among ethnic groups and their association with malaria susceptibility, we analysed the rs1801274-R131H polymorphism in the FcγRIIa gene in a study of Dogon and Fulani in Mali (n = 939). We confirm that the Fulani have lower parasite densities, lower parasite prevalence, more spleen enlargement and higher levels of total IgG antibodies (anti-CSP, anti-AMA1, anti-MSP1 and anti-MSP2) and more total IgE (P < 0.05) compared with the Dogon ethnic group. Furthermore, the Fulani exhibit higher frequencies of the blood group O (56.5%) compared with the Dogon (43.5%) (P < 0.001). With regard to the FcγRIIa polymorphism and allele frequency, the Fulani group have a higher frequency of the H allele (Fulani 0.474, Dogon 0.341, P < 0.0001), which was associated with greater total IgE production (P = 0.004). Our findings show that the FcγRIIa polymorphism might have an implication in the relative protection seen in the Fulani tribe, with confirmatory studies required in other malaria endemic settings.

    Scandinavian journal of immunology 2014;79;1;43-50

  • Characterization of Vibrio cholerae bacteriophages isolated from the environmental waters of the Lake Victoria region of Kenya.

    Maina AN, Mwaura FB, Oyugi J, Goulding D, Toribio AL and Kariuki S

    School of Biological Sciences, University of Nairobi, Nairobi, Kenya.

    Over the last decade, cholera outbreaks have become common in some parts of Kenya. The most recent cholera outbreak occurred in the Coastal and Lake Victoria regions between January 2009 and May 2010, where a total of 11,769 cases and 274 deaths were reported by the Ministry of Public Health and Sanitation. The objective of this study is to isolate Vibrio cholerae bacteriophages from the environmental waters of the Lake Victoria region of Kenya with potential for use as a biocontrol for cholera outbreaks. Water samples from wells, ponds, sewage effluent, boreholes, rivers, and lakes of the Lake Victoria region of Kenya were enriched for 48 h at 37 °C in broth containing an environmental strain of V. cholerae. Bacteriophages were isolated from 5 out of the 42 environmental water samples taken. Isolated phages produced tiny, round, and clear plaques suggesting that these phages were lytic to V. cholerae. Transmission electron microscope examination revealed that all the nine phages belonged to the family Myoviridae, with typical icosahedral heads, long contractile tails, and fibers. Heads had an average diameter of 88.3 nm, and tails an average length and width of 84.9 and 16.1 nm, respectively. Vibriophages isolated from the Lake Victoria region of Kenya have been characterized and the isolated phages may have a potential to be used as antibacterial agents to control pathogenic V. cholerae bacteria in water reservoirs.

    Current microbiology 2014;68;1;64-70

  • Mutation in KERA Identified by Linkage Analysis and Targeted Resequencing in a Pedigree with Premature Atherosclerosis.

    Maiwald S, Sivapalaratnam S, Motazacker MM, van Capelleveen JC, Bot I, de Jager SC, van Eck M, Jolley J, Kuiper J, Stephens J, Albers CA, Vosmeer CR, Kruize H, Geerke DP, van der Wal AC, van der Loos CM, Kastelein JJ, Trip MD, Ouwehand WH, Dallinga-Thie GM and Hovingh GK

    Department of Vascular Medicine, Academic Medical Centre, Amsterdam, the Netherlands; Department of Experimental Vascular Medicine, Academic Medical Centre, Amsterdam, the Netherlands.

    Aims: Genetic factors explain a proportion of the inter-individual variation in the risk for atherosclerotic events, but the genetic basis of atherosclerosis and atherothrombosis in families with Mendelian forms of premature atherosclerosis is incompletely understood. We set out to unravel the molecular pathology in a large kindred with an autosomal dominant inherited form of premature atherosclerosis.

    Parametric linkage analysis was performed in a pedigree comprising 4 generations, of which a total of 11 members suffered from premature vascular events. A parametric LOD-score of 3.31 was observed for a 4.4 Mb interval on chromosome 12. Upon sequencing, a non-synonymous variant in KERA (c.920C>G; p.Ser307Cys) was identified. The variant was absent from nearly 28,000 individuals, including 2,571 patients with premature atherosclerosis. KERA, a proteoglycan protein, was expressed in lipid-rich areas of human atherosclerotic lesions, but not in healthy arterial specimens. Moreover, KERA expression in plaques was significantly associated with plaque size in carotid-collar Apoe-/- mice (r2 = 0.69; p<0.0001).

    Conclusion: A rare variant in KERA was identified in a large kindred with premature atherosclerosis. The identification of KERA in atherosclerotic plaque specimens in humans and mice lends support to its potential role in atherosclerosis.

    PloS one 2014;9;5;e98289

  • Driver somatic mutations identify distinct disease entities within myeloid neoplasms with myelodysplasia.

    Malcovati L, Papaemmanuil E, Ambaglio I, Elena C, Gallì A, Della Porta MG, Travaglino E, Pietra D, Pascutto C, Ubezio M, Bono E, Da Vià MC, Brisci A, Bruno F, Cremonesi L, Ferrari M, Boveri E, Invernizzi R, Campbell PJ and Cazzola M

    Department of Molecular Medicine, University of Pavia, Pavia, Italy;

    Our knowledge of the genetic basis of myelodysplastic syndromes (MDS) and myelodysplastic/myeloproliferative neoplasms (MDS/MPN) has considerably improved. To define genotype/phenotype relationships of clinical relevance, we studied 308 patients with MDS, MDS/MPN or acute myeloid leukemia evolving from MDS. Unsupervised statistical analysis, including the World Health Organization (WHO) classification criteria and somatic mutations, showed that MDS associated with SF3B1-mutation (51/245 patients, 20.8%) is a distinct nosologic entity irrespective of current morphological classification criteria. Conversely, MDS with ring sideroblasts with nonmutated SF3B1 segregated in different clusters with other MDS subtypes. Mutations of genes involved in DNA methylation, splicing factors other than SF3B1, and genes of the RAS pathway and cohesin complex were independently associated with multilineage dysplasia and identify a distinct subset (51/245 patients, 20.8%). No recurrent mutation pattern correlated with unilineage dysplasia without ring sideroblasts. Irrespective of driver somatic mutations, a threshold of 5% bone marrow blasts retained a significant discriminant value for identifying cases with clonal evolution. Co-mutation of TET2 and SRSF2 was highly predictive of a myeloid neoplasm characterized by myelodysplasia and monocytosis, including but not limited to chronic myelomonocytic leukemia. These results serve as a proof of concept that a molecular classification of myeloid neoplasms is feasible.

    Blood 2014

  • The common marmoset genome provides insight into primate biology and evolution.

    Marmoset Genome Sequencing and Analysis Consortium

    We report the whole-genome sequence of the common marmoset (Callithrix jacchus). The 2.26-Gb genome of a female marmoset was assembled using Sanger read data (6×) and a whole-genome shotgun strategy. A first analysis has permitted comparison with the genomes of apes and Old World monkeys and the identification of specific features that might contribute to the unique biology of this diminutive primate, including genetic changes that may influence body size, frequent twinning and chimerism. We observed positive selection in growth hormone/insulin-like growth factor genes (growth pathways), respiratory complex I genes (metabolic pathways), and genes encoding immunobiological factors and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibited evidence of rapid sequence evolution. This genome sequence for a New World monkey enables increased power for comparative analyses among available primate genomes and facilitates biomedical research application.

    Funded by: NHGRI NIH HHS: K99 HG005846, R01 HG002385, U54 HG003079, U54 HG003273; NIGMS NIH HHS: R01 GM059290

    Nature genetics 2014;46;8;850-7

  • Parallel dynamics and evolution: Protein conformational fluctuations and assembly reflect evolutionary changes in sequence and structure.

    Marsh JA and Teichmann SA

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Protein structure is dynamic: the intrinsic flexibility of polypeptides facilitates a range of conformational fluctuations, and individual protein chains can assemble into complexes. Proteins are also dynamic in evolution: significant variations in secondary, tertiary and quaternary structure can be observed among divergent members of a protein family. Recent work has highlighted intriguing similarities between these structural and evolutionary dynamics occurring at various levels. Here we review evidence showing how evolutionary changes in protein sequence and structure are often closely related to local protein flexibility and disorder, large-scale motions and quaternary structure assembly. We suggest that these correspondences can be largely explained by neutral evolution, while deviations between structural and evolutionary dynamics can provide valuable functional insights. Finally, we address future prospects for the field and practical applications that arise from a deeper understanding of the intimate relationship between protein structure, dynamics, function and evolution.

    BioEssays : news and reviews in molecular, cellular and developmental biology 2014;36;2;209-18

  • Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression.

    Marttinen P, Pirinen M, Sarin AP, Gillberg J, Kettunen J, Surakka I, Kangas AJ, Soininen P, O'Reilly PF, Kaakinen M, Kähönen M, Lehtimäki T, Ala-Korpela M, Raitakari OT, Salomaa V, Järvelin MR, Ripatti S and Kaski S

    Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Finland, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, Finland, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, University of Oulu, Finland, Biocenter Oulu, University of Oulu, Finland, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, University of Turku and Turku University Hospital, Turku, Finland, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Finland, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland, Unit of Primary Care, Oulu University Hospital, Finland, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute, University of Helsinki, Helsinki, Finland, Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Finland.

    Motivation: A typical genome-wide association study searches for associations between single nucleotide polymorphisms (SNPs) and a univariate phenotype. However, there is growing interest in investigating associations between genomics data and multivariate phenotypes, for example in gene expression or metabolomics studies. A common approach is to perform a univariate test between each genotype-phenotype pair, and then to apply a stringent significance cutoff to account for the large number of tests performed. However, this approach has limited ability to uncover dependencies involving multiple variables. Another trend in current genetics is the investigation of the impact of rare variants on the phenotype, where the standard methods often fail due to lack of power when the minor allele is present in only a limited number of individuals.

    Results: We propose a new statistical approach based on Bayesian reduced rank regression to assess the impact of multiple SNPs on a high-dimensional phenotype. Due to the method's ability to combine information over multiple SNPs and phenotypes, it is particularly suitable for detecting associations involving rare variants. We demonstrate the potential of our method and compare it with alternatives using the Northern Finland Birth Cohort with 4,702 individuals, for whom genome-wide SNP data along with lipoprotein profiles comprising 74 traits are available. We discovered two genes (XRCC4 and MTHFD2L) without previously reported associations, which replicated in a combined analysis of two additional cohorts: 2,390 individuals from the Cardiovascular Risk in Young Finns study and 3,659 individuals from the FINRISK Study.

    R-code freely available for download at


    Bioinformatics (Oxford, England) 2014

  • The genome sequence of ectromelia virus Naval and Cornell isolates from outbreaks in North America.

    Mavian C, López-Bueno A, Bryant NA, Seeger K, Quail MA, Harris D, Barrell B and Alcami A

    Centro de Biología Molecular Severo Ochoa (Consejo Superior de Investigaciones Científicas-Universidad Autónoma de Madrid), Nicolas Cabrera 1, Campus de Cantoblanco, Madrid, Spain.

    Ectromelia virus (ECTV) is the causative agent of mousepox, a disease of laboratory mouse colonies and an excellent model for human smallpox. We report the genome sequence of two isolates from outbreaks in laboratory mouse colonies in the USA in 1995 and 1999: ECTV-Naval and ECTV-Cornell, respectively. The genomes of ECTV-Naval and ECTV-Cornell were sequenced by the 454-Roche technology. The ECTV-Naval genome was also sequenced by the Sanger and Illumina technologies in order to evaluate these technologies for poxvirus genome sequencing. Genomic comparisons revealed that ECTV-Naval and ECTV-Cornell correspond to the same virus isolated from independent outbreaks. Both ECTV-Naval and ECTV-Cornell are extremely virulent in susceptible BALB/c mice, similar to ECTV-Moscow. This is consistent with the ECTV-Naval genome sharing 98.2% DNA sequence identity with that of ECTV-Moscow, and indicates that the genetic differences with ECTV-Moscow do not affect the virulence of ECTV-Naval in the mousepox model of footpad infection.

    Virology 2014;462-463C;218-226

  • Identification of novel genetic loci associated with thyroid peroxidase antibodies and clinical thyroid disease.

    Medici M, Porcu E, Pistis G, Teumer A, Brown SJ, Jensen RA, Rawal R, Roef GL, Plantinga TS, Vermeulen SH, Lahti J, Simmonds MJ, Husemoen LL, Freathy RM, Shields BM, Pietzner D, Nagy R, Broer L, Chaker L, Korevaar TI, Plia MG, Sala C, Völker U, Richards JB, Sweep FC, Gieger C, Corre T, Kajantie E, Thuesen B, Taes YE, Visser WE, Hattersley AT, Kratzsch J, Hamilton A, Li W, Homuth G, Lobina M, Mariotti S, Soranzo N, Cocca M, Nauck M, Spielhagen C, Ross A, Arnold A, van de Bunt M, Liyanarachchi S, Heier M, Grabe HJ, Masciullo C, Galesloot TE, Lim EM, Reischl E, Leedman PJ, Lai S, Delitala A, Bremner AP, Philips DI, Beilby JP, Mulas A, Vocale M, Abecasis G, Forsen T, James A, Widen E, Hui J, Prokisch H, Rietzschel EE, Palotie A, Feddema P, Fletcher SJ, Schramm K, Rotter JI, Kluttig A, Radke D, Traglia M, Surdulescu GL, He H, Franklyn JA, Tiller D, Vaidya B, de Meyer T, Jørgensen T, Eriksson JG, O'Leary PC, Wichmann E, Hermus AR, Psaty BM, Ittermann T, Hofman A, Bosi E, Schlessinger D, Wallaschofski H, Pirastu N, Aulchenko YS, de la Chapelle A, Netea-Maier RT, Gough SC, Meyer Zu Schwabedissen H, Frayling TM, Kaufman JM, Linneberg A, Räikkönen K, Smit JW, Kiemeney LA, Rivadeneira F, Uitterlinden AG, Walsh JP, Meisinger C, den Heijer M, Visser TJ, Spector TD, Wilson SG, Völzke H, Cappola A, Toniolo D, Sanna S, Naitza S and Peeters RP

    Department of Internal Medicine, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands.

    Autoimmune thyroid diseases (AITD) are common, affecting 2-5% of the general population. Individuals with positive thyroid peroxidase antibodies (TPOAbs) have an increased risk of autoimmune hypothyroidism (Hashimoto's thyroiditis), as well as autoimmune hyperthyroidism (Graves' disease). As the possible causative genes of TPOAbs and AITD remain largely unknown, we performed GWAS meta-analyses in 18,297 individuals for TPOAb-positivity (1769 TPOAb-positives and 16,528 TPOAb-negatives) and in 12,353 individuals for TPOAb serum levels, with replication in 8,990 individuals. Significant associations (P<5×10(-8)) were detected at TPO-rs11675434, ATXN2-rs653178, and BACH2-rs10944479 for TPOAb-positivity, and at TPO-rs11675434, MAGI3-rs1230666, and KALRN-rs2010099 for TPOAb levels. Individual and combined effects (genetic risk scores) of these variants on (subclinical) hypo- and hyperthyroidism, goiter and thyroid cancer were studied. Individuals with a high genetic risk score had, besides an increased risk of TPOAb-positivity (OR: 2.18, 95% CI 1.68-2.81, P = 8.1×10(-8)), a higher risk of increased thyroid-stimulating hormone levels (OR: 1.51, 95% CI 1.26-1.82, P = 2.9×10(-6)), as well as a decreased risk of goiter (OR: 0.77, 95% CI 0.66-0.89, P = 6.5×10(-4)). The MAGI3 and BACH2 variants were associated with an increased risk of hyperthyroidism, which was replicated in an independent cohort of patients with Graves' disease (OR: 1.37, 95% CI 1.22-1.54, P = 1.2×10(-7) and OR: 1.25, 95% CI 1.12-1.39, P = 6.2×10(-5)). The MAGI3 variant was also associated with an increased risk of hypothyroidism (OR: 1.57, 95% CI 1.18-2.10, P = 1.9×10(-3)). This first GWAS meta-analysis for TPOAbs identified five newly associated loci, three of which were also associated with clinical thyroid disease. With these markers we identified a large subgroup in the general population with a substantially increased risk of TPOAbs. The results provide insight into why individuals with thyroid autoimmunity do or do not eventually develop thyroid disease, and these markers may therefore predict which TPOAb-positives are particularly at risk of developing clinical thyroid dysfunction.

    PLoS genetics 2014;10;2;e1004123

  • The sex-specific associations of the aromatase gene with Alzheimer's disease and its interaction with IL10 in the Epistasis Project.

    Medway C, Combarros O, Cortina-Borja M, Butler HT, Ibrahim-Verbaas CA, de Bruijn RF, Koudstaal PJ, van Duijn CM, Ikram MA, Mateo I, Sánchez-Juan P, Lehmann MG, Heun R, Kölsch H, Deloukas P, Hammond N, Coto E, Alvarez V, Kehoe PG, Barber R, Wilcock GK, Brown K, Belbin O, Warden DR, Smith AD, Morgan K and Lehmann DJ

    Human Genetics Research, Queens Medical Centre, School of Molecular Medical Sciences, University of Nottingham, Nottingham, UK.

    Epistasis between interleukin-10 (IL10) and aromatase gene polymorphisms has previously been reported to modify the risk of Alzheimer's disease (AD). However, although the main effects of aromatase variants suggest a sex-specific effect in AD, there has been insufficient power to detect sex-specific epistasis between these genes to date. Here we used the cohort of 1757 AD patients and 6294 controls in the Epistasis Project. We replicated the previously reported main effects of aromatase polymorphisms in AD risk in women, for example, adjusted odds ratio of disease for rs1065778 GG=1.22 (95% confidence interval: 1.01-1.48, P=0.03). We also confirmed a reported epistatic interaction between IL10 rs1800896 and aromatase (CYP19A1) rs1062033, again only in women: adjusted synergy factor=1.94 (1.16-3.25, 0.01). Aromatase, a rate-limiting enzyme in the synthesis of estrogens, is expressed in AD-relevant brain regions, and is downregulated during the disease. IL-10 is an anti-inflammatory cytokine. Given that estrogens have neuroprotective and anti-inflammatory activities and regulate microglial cytokine production, epistasis is biologically plausible. Diminishing serum estrogen in postmenopausal women, coupled with suboptimal brain estrogen synthesis, may contribute to the inflammatory state that is a pathological hallmark of AD.

    European journal of human genetics : EJHG 2014;22;2;216-20

  • C. elegans whole genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency.

    Meier B, Cooke SL, Weiss J, Bailly AP, Alexandrov LB, Marshall J, Raine K, Maddison M, Anderson E, Stratton MR, Gartner A and Campbell PJ

    University of Dundee;

    Mutation is associated with developmental and hereditary disorders, aging and cancer. While we understand some mutational processes operative in human disease, most remain mysterious. We used C. elegans whole genome sequencing to model mutational signatures, analyzing 183 worm populations across 17 DNA repair-deficient backgrounds, propagated for 20 generations or exposed to carcinogens. The baseline mutation rate in C. elegans was ~1/genome/generation, not overtly altered across several DNA repair deficiencies over 20 generations. Telomere erosion led to complex chromosomal rearrangements initiated by breakage-fusion-bridge cycles and completed by simultaneously acquired, localized clusters of breakpoints. Aflatoxin-B1 induced substitutions of guanines in GpC context, as observed in aflatoxin-induced liver cancers. Mutational burden increased with impaired nucleotide excision repair. Cisplatin and mechlorethamine, DNA crosslinking agents, caused dose- and genotype-dependent signatures among indels, substitutions and rearrangements. Strikingly, both agents induced clustered rearrangements resembling 'chromoanasynthesis,' a replication-based mutational signature seen in constitutional genomic disorders, suggesting interstrand crosslinks may play a pathogenic role in such events. Cisplatin mutagenicity was most pronounced in xpf-1 mutants, suggesting this gene critically protects cells against platinum chemotherapy. Thus, experimental model systems combined with genome sequencing can recapture and mechanistically explain mutational signatures associated with human disease.

    Genome research 2014

  • Respiratory Tract Samples, Viral Load and Genome Fraction Yield in patients with Middle East Respiratory Syndrome.

    Memish ZA, Al-Tawfiq JA, Makhdoom HQ, Assiri A, Alhakeem RF, Albarrak A, Alsubaie S, Al-Rabeeah AA, Hajomar WH, Hussain R, Kheyami AM, Almutairi A, Azhar EI, Drosten C, Watson SJ, Kellam P, Cotten M and Zumla A

    Global Centre for Mass Gatherings Medicine (GCMGM), Ministry of Health, Riyadh, Kingdom of Saudi Arabia (KSA).

    Background: Analysis of clinical samples of patients with new viral infections is critical to confirm the diagnosis, provide viral load and sequence data necessary for characterizing viral kinetics, transmission and evolution of the virus. We analysed samples from 112 patients infected with the recently discovered Middle East Respiratory Syndrome Coronavirus (MERS-CoV).

    Methods: Respiratory tract samples from MERS-CoV PCR-confirmed cases were analysed for yields of MERS-CoV viral load and fraction of MERS-CoV genome. These values were analysed to determine associations with clinical sample type.

    Results: Samples from 112 MERS-CoV PCR-positive individuals were analysed: 13 sputum samples, 64 nasopharyngeal swabs, 30 tracheal aspirates, 3 broncho-alveolar lavages and 2 of unknown origin. Tracheal aspirates yielded significantly higher MERS-CoV viral load when compared with nasopharyngeal swabs (p=0.005) and with sputum (p=0.0001). Tracheal aspirates had similar viral load compared to broncho-alveolar lavage (p=0.3079). Broncho-alveolar lavage samples and tracheal aspirates had significantly higher viral load values than nasopharyngeal swabs (p=0.0095 and p=0.0002) and sputum samples (p=0.0009 and p=0.0001). The genome yields from tracheal aspirates and broncho-alveolar lavage samples were similar (p=0.1174).

    Conclusions: Lower respiratory tract samples yield significantly higher MERS-CoV viral loads and genome fractions than upper respiratory tract samples.

    The Journal of infectious diseases 2014

  • Community Case Clusters of Middle East Respiratory Syndrome Coronavirus in Hafr Al-Batin, Kingdom of Saudi Arabia: A Descriptive Genomic study.

    Memish ZA, Cotten M, Watson SJ, Kellam P, Zumla A, Alhakeem RF, Assiri A, Rabeeah AA and Al-Tawfiq JA

    Global Centre for Mass Gatherings Medicine (GCMGM), Ministry of Health, Riyadh, Kingdom of Saudi Arabia; College of Medicine, Alfaisal University, Riyadh, Kingdom of Saudi Arabia.

    The Middle East respiratory syndrome coronavirus (MERS-CoV) was first described in September 2012 and has caused a total of 191 cases of MERS-CoV infection with 82 deaths. Camels have been implicated as the reservoir of MERS-CoV, but the exact source and mode of transmission for most patients remain unknown. During a 3-month period, June to August 2013, there were 12 positive MERS-CoV cases reported from the Hafr Al-Batin district in the north-east region of the Kingdom of Saudi Arabia. In addition to the different regional camel festivals in neighboring countries, Hafr Al-Batin has the biggest camel market in the entire Kingdom and hosts an annual camel festival. Thus, we conducted a detailed epidemiological, clinical and genomic study to ascertain common exposure and transmission patterns of all cases of MERS-CoV reported from Hafr Al-Batin. The genetic data indicated that at least two of the infected contacts could not have been directly infected from the index patient and that an alternate source should be considered. Camels appear to be the likely source but other animals have not been ruled out. More detailed case-control studies with detailed case histories, epidemiological information and genomic analysis are being conducted to delineate the missing pieces in the transmission dynamics of the MERS-CoV outbreak.

    International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases 2014

  • Online questionnaire development: using film to engage participants and then gather attitudes towards the sharing of genomic data.

    Middleton A, Bragin E, Morley KI, Parker M and DDD Study

    Wellcome Trust Sanger Institute, Cambridge, UK.

    How can a researcher engage a participant in a survey, when the subject matter may be perceived as 'challenging' or even be totally unfamiliar to the participant? The Genomethics study addressed this via the creation and delivery of a novel online questionnaire containing 10 integrated films. The films documented various ethical dilemmas raised by genomic technologies and the survey ascertained attitudes towards these. Participants were recruited into the research using social media, traditional media and email invitation. The film-survey strategy was successful: 11,336 initial hits on the survey website led to 6944 completed surveys (61% compliance rate). Participants ranged from those who knew nothing of the subject matter through to experts in the field of genomics, and 72% of participants answered every single question. This paper summarises the survey design process and validation methods applied. The recruitment strategy and results from the survey are presented elsewhere.

    Funded by: Department of Health; Wellcome Trust

    Social science research 2014;44;211-23

  • Finding people who will tell you their thoughts on genomics-recruitment strategies for social sciences research.

    Middleton A, Bragin E, Parker M and on behalf of the DDD Study

    Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK.

    This paper offers a description of how social media, traditional media and direct invitation were used as tools for the recruitment of 6,944 research participants for a social sciences study on genomics. The remit was to gather the views of various stakeholders towards sharing incidental findings from whole genome studies. This involved recruiting members of the public, genetic health professionals, genomic researchers and non-genetic health professionals. A novel survey was designed that contained ten integrated films; this was made available online and open for completion by anyone worldwide. The recruitment methods are described together with the convenience and snowballing sampling framework. The most successful strategy involved the utilisation of social media: Facebook, blogging, Twitter, LinkedIn and Google Ads led to the ascertainment of over 75% of the final sample. We conclude that the strategies used were successful in recruiting an eclectic mix of appropriate participants. Design of the survey and results from the study are presented separately.

    Journal of community genetics 2014

  • Position statement on opportunistic genomic screening from the Association of Genetic Nurses and Counsellors (UK and Ireland).

    Middleton A, Patch C, Wiggins J, Barnes K, Crawford G, Benjamin C and Bruce A

    Human Genetics, Wellcome Trust Sanger Institute, Cambridge, UK.

    The American College of Medical Genetics and Genomics released recommendations for reporting incidental findings (IFs) in clinical exome and genome sequencing. These suggest 'opportunistic genomic screening' should be available to both adults and children each time a sequence is done and would be undertaken without seeking preferences from the patient first. Should opportunistic genomic screening be implemented in the United Kingdom, the Association of Genetic Nurses and Counsellors (AGNC), which represents British and Irish genetic counsellors and nurses, feels strongly that the following must be considered (see article for complete list): (1) Following appropriate genetic counselling, patients should be allowed to consent to or opt out of opportunistic genomic screening. (2) If true IFs are discovered the AGNC are guided by the report from the Joint Committee on Medical Genetics about the sharing of genetic testing results. (3) Children should not be routinely tested for adult-onset conditions. (4) The formation of a list of variants should involve a representative from the AGNC as well as a patient support group. (5) The variants should be for serious or life-threatening conditions for which there are treatments or preventative strategies available. (6) There needs to be robust evidence that the benefits of opportunistic screening outweigh the potential harms. (7) The clinical validity and utility of variants should be known. (8) There must be a quality assurance framework that operates to international standards for laboratory testing. (9) Psychosocial research is urgently needed in this area to understand the impact on patients. European Journal of Human Genetics advance online publication, 8 January 2014; doi:10.1038/ejhg.2013.301.

    European journal of human genetics : EJHG 2014

  • The International Cancer Genome Consortium's evolving data-protection policies.

    Milius D, Dove ES, Chalmers D, Dyke SO, Kato K, Nicolás P, Ouellette BF, Ozenberger B, Rodriguez LL, Zeps N and Joly Y

    Centre of Genomics and Policy, McGill University, Montreal, Quebec, Canada.

    Nature biotechnology 2014;32;6;519-23

  • Metabolic and Target-Site Mechanisms Combine to Confer Strong DDT Resistance in Anopheles gambiae.

    Mitchell SN, Rigden DJ, Dowd AJ, Lu F, Wilding CS, Weetman D, Dadzie S, Jenkins AM, Regna K, Boko P, Djogbenou L, Muskavitch MA, Ranson H, Paine MJ, Mayans O and Donnelly MJ

    Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom.

    The development of resistance to insecticides has become a classic exemplar of evolution occurring within human time scales. In this study we demonstrate how resistance to DDT in the major African malaria vector Anopheles gambiae is a result of both target-site resistance mechanisms that have introgressed between incipient species (the M- and S-molecular forms) and allelic variants in a DDT-detoxifying enzyme. Sequencing of the detoxification enzyme, Gste2, from DDT resistant and susceptible strains of An. gambiae, revealed a non-synonymous polymorphism (I114T), proximal to the DDT binding domain, which segregated with strain phenotype. Recombinant protein expression and DDT metabolism analysis revealed that the proteins from the susceptible strain lost activity at higher DDT concentrations, characteristic of substrate inhibition. The effect of I114T on GSTE2 protein structure was explored through X-ray crystallography. The amino acid exchange in the DDT-resistant strain introduced a hydroxyl group near the hydrophobic DDT-binding region. The exchange does not result in structural alterations but is predicted to facilitate local dynamics and enzyme activity. Expression of both wild-type and 114T alleles in Drosophila conferred an increase in DDT tolerance. The 114T mutation was significantly associated with DDT resistance in wild caught M-form populations and acts in concert with target-site mutations in the voltage gated sodium channel (Vgsc-1575Y and Vgsc-1014F) to confer extreme levels of DDT resistance in wild caught An. gambiae.

    PloS one 2014;9;3;e92662

  • Genome-wide analysis of selection on the malaria parasite Plasmodium falciparum in West African populations of differing infection endemicity.

    Mobegi VA, Duffy CW, Amambua-Ngwa A, Loua KM, Laman E, Nwakanma DC, Macinnis B, Aspeling-Jones H, Murray L, Clark TG, Kwiatkowski DP and Conway DJ

    Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, London, UK; Medical Research Council Unit, Fajara, Banjul, The Gambia; National Institute of Public Health, Conakry, Republic of Guinea; The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Locally varying selection on pathogens may be due to differences in drug pressure, host immunity, transmission opportunities between hosts, or the intensity of between-genotype competition within hosts. Highly recombining populations of the human malaria parasite Plasmodium falciparum throughout West Africa are closely related, as gene flow is relatively unrestricted in this endemic region, but markedly varying ecology and transmission intensity should cause distinct local selective pressures. Genome-wide analysis of sequence variation was undertaken on a sample of 100 P. falciparum clinical isolates from a highly endemic region of the Republic of Guinea where transmission occurs for most of each year, and compared with data from 52 clinical isolates from a previously sampled population from The Gambia where there is relatively limited seasonal malaria transmission. Paired-end short read sequences were mapped against the 3D7 P. falciparum reference genome sequence, and data on 136144 SNPs were obtained. Within-population analyses identifying loci showing evidence of recent positive directional selection and balancing selection confirm that antimalarial drugs and host immunity have been major selective agents. Many of the signatures of recent directional selection reflected by standardised integrated haplotype scores (|iHS|) were population-specific, including differences at drug resistance loci due to historically different antimalarial use between the countries. In contrast, both populations showed a similar set of loci likely to be under balancing selection as indicated by very high Tajima's D values, including a significant over-representation of genes expressed at the merozoite stage that invades erythrocytes, and several previously validated targets of acquired immunity. Between-population FST analysis identified exceptional differentiation of allele frequencies at a small number of loci, most markedly for five SNPs covering a 15kb region within and flanking the gdv1 gene that regulates the early stages of gametocyte development, which is likely related to the extreme differences in mosquito vector abundance and seasonality which determine the transmission opportunities for the sexual stage of the parasite.

    Molecular biology and evolution 2014

  • Widespread epidemic cholera caused by a restricted subset of Vibrio cholerae clones.

    Moore S, Thomson N, Mutreja A and Piarroux R

    Aix-Marseille University, UMR MD3, Marseilles, France.

    Since 1817, seven cholera pandemics have plagued humankind. As the causative agent, Vibrio cholerae, is autochthonous in the aquatic ecosystem and some studies have revealed links between outbreaks and fluctuations in climatic and aquatic conditions, it has been widely assumed that cholera epidemics are triggered by environmental factors that promote the growth of local bacterial reservoirs. However, mounting epidemiological findings and genome sequence analysis of clinical isolates have indicated that epidemics are largely unassociated with most of the V. cholerae strains in aquatic ecosystems. Instead, only a specific subset of V. cholerae El Tor 'types' appears to be responsible for current epidemics. A recent report examining the evolution of a variety of V. cholerae strains indicates that the current pandemic is monophyletic and originated from a single ancestral clone that has spread globally in successive waves. In this review, we examine the clonal nature of the disease, with the example of the recent history of cholera in the Americas. Epidemiological data and genome sequence-based analysis of V. cholerae isolates demonstrate that the cholera epidemics of the 1990s in South America were triggered by the importation of a pathogenic V. cholerae strain that gradually spread throughout the region until local outbreaks ceased in 2001. Latin America remained almost unaffected by the disease until a new toxigenic V. cholerae clone was imported into Haiti in 2010. Overall, cholera appears to be largely caused by a subset of specific V. cholerae clones rather than by the vast diversity of V. cholerae strains in the environment.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2014

  • Heterogeneity in the frequency and characteristics of homologous recombination in pneumococcal evolution.

    Mostowy R, Croucher NJ, Hanage WP, Harris SR, Bentley S and Fraser C

    Department of Infectious Disease Epidemiology, Imperial College London, St Mary's Campus, London, United Kingdom.

    The bacterium Streptococcus pneumoniae (pneumococcus) is one of the most important human bacterial pathogens, and a leading cause of morbidity and mortality worldwide. The pneumococcus is also known for undergoing extensive homologous recombination via transformation with exogenous DNA. It has been shown that recombination has a major impact on the evolution of the pathogen, including acquisition of antibiotic resistance and serotype-switching. Nevertheless, the mechanism and the rates of recombination in an epidemiological context remain poorly understood. Here, we proposed several mathematical models to describe the rate and size of recombination in the evolutionary history of two very distinct pneumococcal lineages, PMEN1 and CC180. We found that, in both lineages, the process of homologous recombination was best described by a heterogeneous model of recombination with single, short, frequent replacements, which we call micro-recombinations, and rarer, multi-fragment, saltational replacements, which we call macro-recombinations. Macro-recombination was associated with major phenotypic changes, including serotype-switching events, and thus was a major driver of the diversification of the pathogen. We critically evaluate biological and epidemiological processes that could give rise to the micro-recombination and macro-recombination processes.

    PLoS genetics 2014;10;5;e1004300

  • A New Method To Determine In Vivo Interactomes Reveals Binding of the Legionella pneumophila Effector PieE to Multiple Rab GTPases.

    Mousnier A, Schroeder GN, Stoneham CA, So EC, Garnett JA, Yu L, Matthews SJ, Choudhary JS, Hartland EL and Frankel G

    MRC Centre for Molecular Bacteriology and Infection, Department of Life Sciences, Imperial College London, London, United Kingdom.

    Legionella pneumophila, the causative agent of Legionnaires' disease, uses the Dot/Icm type IV secretion system (T4SS) to translocate more than 300 effectors into host cells, where they subvert host cell signaling. The function and host cell targets of most effectors remain unknown. PieE is a 69-kDa Dot/Icm effector containing three coiled-coil (CC) regions and two transmembrane (TM) helices followed by a fourth CC region. Here, we report that PieE dimerized by an interaction between CC3 and CC4. We found that ectopically expressed PieE localized to the endoplasmic reticulum (ER) and induced the formation of organized smooth ER, while following infection PieE localized to the Legionella-containing vacuole (LCV). To identify the physiological targets of PieE during infection, we established a new purification method for which we created an A549 cell line stably expressing the Escherichia coli biotin ligase BirA and infected the cells with L. pneumophila expressing PieE fused to a BirA-specific biotinylation site and a hexahistidine tag. Following tandem Ni(2+) nitrilotriacetic acid (NTA) and streptavidin affinity chromatography, the effector-target complexes were analyzed by mass spectrometry. This revealed interactions of PieE with multiple host cell proteins, including the Rab GTPases 1a, 1b, 2a, 5c, 6a, 7, and 10. Binding of the Rab GTPases, which was validated by yeast two-hybrid binding assays, was mediated by the PieE CC1 and CC2. In summary, using a novel, highly specific strategy to purify effector complexes from infected cells, which is widely applicable to other pathogens, we identified PieE as a multidomain LCV protein with promiscuous Rab GTPase-binding capacity.

    Importance: The respiratory pathogen Legionella pneumophila uses the Dot/Icm type IV secretion system to translocate more than 300 effector proteins into host cells. The function of most effectors in infection remains unknown. One of the bottlenecks for their characterization is the identification of target proteins. Frequently used in vitro approaches are not applicable to all effectors and suffer from high rates of false positives or missed interactions, as they are not performed in the context of an infection. Here, we determine key functional domains of the effector PieE and describe a new method to identify host cell targets under physiological infection conditions. Our approach, which is applicable to other pathogens, uncovered the interaction of PieE with several proteins involved in membrane trafficking, in particular Rab GTPases, revealing new details of the Legionella infection strategy and demonstrating the potential of this method to greatly advance our understanding of the molecular basis of infection.

    mBio 2014;5;4

  • Reciprocal duplication of the Williams-Beuren syndrome deletion on chromosome 7q11.23 is associated with schizophrenia.

    Mulle JG, Pulver AE, McGrath JA, Wolyniec PS, Dodd AF, Cutler DJ, Sebat J, Malhotra D, Nestadt G, Conrad DF, Hurles M, Barnes CP, Ikeda M, Iwata N, Levinson DF, Gejman PV, Sanders AR, Duan J, Mitchell AA, Peter I, Sklar P, O'Dushlaine CT, Grozeva D, O'Donovan MC, Owen MJ, Hultman CM, Kähler AK, Sullivan PF, Molecular Genetics of Schizophrenia Consortium, Kirov G and Warren ST

    Department of Epidemiology, Rollins School of Public Health, Emory University; Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia.

    Background: Several copy number variants (CNVs) have been implicated as susceptibility factors for schizophrenia (SZ). Some of these same CNVs also increase risk for autism spectrum disorders, suggesting an etiologic overlap between these conditions. Recently, de novo duplications of a region on chromosome 7q11.23 were associated with autism spectrum disorders. The reciprocal deletion of this region causes Williams-Beuren syndrome.

    Methods: We assayed an Ashkenazi Jewish cohort of 554 SZ cases and 1014 controls for genome-wide CNV. An excess of large rare and de novo CNVs was observed, including a 1.4 Mb duplication on chromosome 7q11.23 identified in two unrelated patients. To test whether this 7q11.23 duplication is also associated with SZ, we obtained data for 14,387 SZ cases and 28,139 controls from seven additional studies with high-resolution genome-wide CNV detection. We performed a meta-analysis, correcting for study population of origin, to assess whether the duplication is associated with SZ.

    Results: We found duplications at 7q11.23 in 11 of 14,387 SZ cases with only 1 in 28,139 control subjects (unadjusted odds ratio 21.52, 95% confidence interval: 3.13-922.6, p value 5.5 × 10(-5); adjusted odds ratio 10.8, 95% confidence interval: 1.46-79.62, p value .007). Of three SZ duplication carriers with detailed retrospective data, all showed social anxiety and language delay premorbid to SZ onset, consistent with both human studies and animal models of the 7q11.23 duplication.

    Conclusions: We have identified a new CNV associated with SZ. Reciprocal duplication of the Williams-Beuren syndrome deletion at chromosome 7q11.23 confers an approximately tenfold increase in risk for SZ.

    Funded by: Medical Research Council: G0800509; NIMH NIH HHS: R01 MH080129, U01 MH094411

    Biological psychiatry 2014;75;5;371-7

  • Transmissible [corrected] dog cancer genome reveals the origin and history of an ancient cell lineage.

    Murchison EP, Wedge DC, Alexandrov LB, Fu B, Martincorena I, Ning Z, Tubio JM, Werner EI, Allen J, De Nardi AB, Donelan EM, Marino G, Fassati A, Campbell PJ, Yang F, Burt A, Weiss RA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Canine transmissible venereal tumor (CTVT) is the oldest known somatic cell lineage. It is a transmissible cancer that propagates naturally in dogs. We sequenced the genomes of two CTVT tumors and found that CTVT has acquired 1.9 million somatic substitution mutations and bears evidence of exposure to ultraviolet light. CTVT is remarkably stable and lacks subclonal heterogeneity despite thousands of rearrangements, copy-number changes, and retrotransposon insertions. More than 10,000 genes carry nonsynonymous variants, and 646 genes have been lost. CTVT first arose in a dog with low genomic heterozygosity that may have lived about 11,000 years ago. The cancer spawned by this individual dispersed across continents about 500 years ago. Our results provide a genetic identikit of an ancient dog and demonstrate the robustness of mammalian somatic cells to survive for millennia despite a massive mutation burden.

    Funded by: Wellcome Trust: 098051

    Science (New York, N.Y.) 2014;343;6169;437-40

  • The use of anthropometric measures for cardiometabolic risk identification in a rural African population.

    Murphy GA, Asiki G, Nsubuga RN, Young EH, Seeley J, Sandhu MS and Kamali A

    Corresponding author: Georgina A.V. Murphy

    Diabetes care 2014;37;4;e64-5

  • Extreme Growth Failure is a Common Presentation of Ligase IV Deficiency.

    Murray JE, Bicknell LS, Yigit G, Duker AL, van Kogelenberg M, Haghayegh S, Wieczorek D, Kayserili H, Albert MH, Wise CA, Brandon J, Kleefstra T, Warris A, van der Flier M, Bamforth JS, Doonanco K, Adès L, Ma A, Field M, Johnson D, Shackley F, Firth H, Woods CG, Nürnberg P, Gatti RA, Hurles M, Bober MB, Wollnik B and Jackson AP

    MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.

    Ligase IV syndrome is a rare differential diagnosis for Nijmegen breakage syndrome owing to a shared predisposition to lympho-reticular malignancies, significant microcephaly, and radiation hypersensitivity. Only 16 cases with mutations in LIG4 have been described to date with phenotypes varying from malignancy in developmentally normal individuals, to severe combined immunodeficiency and early mortality. Here, we report the identification of biallelic truncating LIG4 mutations in 11 patients with microcephalic primordial dwarfism presenting with restricted prenatal growth and extreme postnatal global growth failure (average OFC -10.1 s.d., height -5.1 s.d.). Subsequently, most patients developed thrombocytopenia and leucopenia later in childhood and many were found to have previously unrecognized immunodeficiency following molecular diagnosis. None have yet developed malignancy, though all patients tested had cellular radiosensitivity. A genotype-phenotype correlation was also noted with position of truncating mutations corresponding to disease severity. This work extends the phenotypic spectrum associated with LIG4 mutations, establishing that extreme growth retardation with microcephaly is a common presentation of biallelic truncating mutations. Such growth failure is therefore sufficient to consider a diagnosis of LIG4 deficiency and early recognition of such cases is important as bone marrow failure, immunodeficiency, and sometimes malignancy are long term sequelae of this disorder.

    Human mutation 2014;35;1;76-85

  • Differential methylation of the TRPA1 promoter in pain sensitivity.

    MuTHER Consortium

    Chronic pain is a global public health problem, but the underlying molecular mechanisms are not fully understood. Here we examine genome-wide DNA methylation, first in 50 identical twins discordant for heat pain sensitivity and then in 50 further unrelated individuals. Whole-blood DNA methylation was characterized at 5.2 million loci by MeDIP sequencing and assessed longitudinally to identify differentially methylated regions associated with high or low pain sensitivity (pain DMRs). Nine meta-analysis pain DMRs show robust evidence for association (false discovery rate 5%) with the strongest signal in the pain gene TRPA1 (P=1.2 × 10(-13)). Several pain DMRs show longitudinal stability consistent with susceptibility effects, have similar methylation levels in the brain and altered expression in the skin. Our approach identifies epigenetic changes in both novel and established candidate genes that provide molecular insights into pain and may generalize to other complex traits.

    Nature communications 2014;5;2978

  • Structural genomic variation as risk factor for idiopathic recurrent miscarriage.

    Nagirnaja L, Palta P, Kasak L, Rull K, Christiansen OB, Nielsen HS, Steffensen R, Esko T, Remm M and Laan M

    Human Molecular Genetics Research Group, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.

    Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe the genome-wide profile of CNVs and identify common rearrangements modulating risk to RM. Genome-wide screening of Estonian RM patients and fertile controls identified excessive cumulative burden of CNVs (5.4 and 6.1 Mb per genome) in two RM cases possibly increasing their individual disease risk. Functional profiling of all rearranged genes within the RM study group revealed significant enrichment of loci related to innate immunity and immunoregulatory pathways essential for immune tolerance at the fetomaternal interface. As a major finding, we report a multicopy duplication (61.6 kb) at 5p13.3 conferring increased maternal risk to RM in Estonia and Denmark (meta-analysis, n = 309/205, odds ratio = 4.82, P = 0.012). Comparison to an Estonian population-based cohort (total, n = 1000) confirmed the risk for Estonian female cases (P = 7.9 × 10(-4)). Datasets of four cohorts from the Database of Genomic Variants (total, n = 5,846 subjects) exhibited similar low duplication prevalence worldwide (0.7%-1.2%) compared to RM cases of this study (6.6%-7.5%). The CNV disrupts PDZD2 and GOLPH3 genes predominantly expressed in placenta and it may represent a novel risk factor for pregnancy complications.

    Human mutation 2014;35;8;972-82

  • De novo mutations in HCN1 cause early infantile epileptic encephalopathy.

    Nava C, Dalle C, Rastetter A, Striano P, de Kovel CG, Nabbout R, Cancès C, Ville D, Brilstra EH, Gobbi G, Raffo E, Bouteiller D, Marie Y, Trouillard O, Robbiano A, Keren B, Agher D, Roze E, Lesage S, Nicolas A, Brice A, Baulac M, Vogt C, El Hajj N, Schneider E, Suls A, Weckhuysen S, Gormley P, Lehesjoki AE, De Jonghe P, Helbig I, Baulac S, Zara F, Koeleman BP, EuroEPINOMICS RES Consortium, Haaf T, Leguern E and Depienne C

    [1] INSERM UMR 975, Institut du Cerveau et de la Moelle Epinière, Hôpital Pitié-Salpêtrière, Paris, France. [2] CNRS 7225, Hôpital Pitié-Salpêtrière, Paris, France. [3] Université Pierre et Marie Curie-Paris 6 (UPMC), UMRS 975, Paris, France. [4] Assistance Publique-Hôpitaux de Paris (AP-HP), Hôpital Pitié-Salpêtrière, Département de Génétique et de Cytogénétique, Unité Fonctionnelle de Neurogénétique Moléculaire et Cellulaire, Paris, France. [5].

    Hyperpolarization-activated, cyclic nucleotide-gated (HCN) channels contribute to cationic Ih current in neurons and regulate the excitability of neuronal networks. Studies in rat models have shown that the Hcn1 gene has a key role in epilepsy, but clinical evidence implicating HCN1 mutations in human epilepsy is lacking. We carried out exome sequencing for parent-offspring trios with fever-sensitive, intractable epileptic encephalopathy, leading to the discovery of two de novo missense HCN1 mutations. Screening of follow-up cohorts comprising 157 cases in total identified 4 additional amino acid substitutions. Patch-clamp recordings of Ih currents in cells expressing wild-type or mutant human HCN1 channels showed that the mutations had striking but divergent effects on homomeric channels. Individuals with mutations had clinical features resembling those of Dravet syndrome with progression toward atypical absences, intellectual disability and autistic traits. These findings provide clear evidence that de novo HCN1 point mutations cause a recognizable early-onset epileptic encephalopathy in humans.

    Nature genetics 2014

  • Olfactory bulb encoding during learning under anesthesia.

    Nicol AU, Sanchez-Andrade G, Collado P, Segonds-Pichon A and Kendrick KM

    Sub-department of Animal Behaviour, University of Cambridge, Cambridge, UK.

    Neural plasticity changes within the olfactory bulb are important for olfactory learning, although how neural encoding changes support new associations with specific odors and whether they can be investigated under anesthesia, remain unclear. Using the social transmission of food preference olfactory learning paradigm in mice in conjunction with in vivo microdialysis sampling we have shown firstly that a learned preference for a scented food odor smelled on the breath of a demonstrator animal occurs under isoflurane anesthesia. Furthermore, subsequent exposure to this cued odor under anesthesia promotes the same pattern of increased release of glutamate and gamma-aminobutyric acid (GABA) in the olfactory bulb as previously found in conscious animals following olfactory learning, and evoked GABA release was positively correlated with the amount of scented food eaten. In a second experiment, multiarray (24 electrodes) electrophysiological recordings were made from olfactory bulb mitral cells under isoflurane anesthesia before, during and after a novel scented food odor was paired with carbon disulfide. Results showed significant increases in overall firing frequency to the cued-odor during and after learning and decreases in response to an uncued odor. Analysis of patterns of changes in individual neurons revealed that a substantial proportion (>50%) of them significantly changed their response profiles during and after learning with most of those previously inhibited becoming excited. A large number of cells exhibiting no response to the odors prior to learning were either excited or inhibited afterwards. With the uncued odor many previously responsive cells became unresponsive or inhibited. Learning-associated changes only occurred in the posterior part of the olfactory bulb. Thus olfactory learning under anesthesia promotes extensive, but spatially distinct, changes in mitral cell networks to both cued and uncued odors as well as in evoked glutamate and GABA release.

    Frontiers in behavioral neuroscience 2014;8;193

  • Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer.

    Nik-Zainal S, Wedge DC, Alexandrov LB, Petljak M, Butler AP, Bolli N, Davies HR, Knappskog S, Martin S, Papaemmanuil E, Ramakrishna M, Shlien A, Simonic I, Xue Y, Tyler-Smith C, Campbell PJ and Stratton MR

    [1] Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. [2] Department of Medical Genetics, Addenbrooke's Hospital National Health Service (NHS) Trust, Cambridge, UK.

    The somatic mutations in a cancer genome are the aggregate outcome of one or more mutational processes operative through the lifetime of the individual with cancer. Each mutational process leaves a characteristic mutational signature determined by the mechanisms of DNA damage and repair that constitute it. A role was recently proposed for the APOBEC family of cytidine deaminases in generating particular genome-wide mutational signatures and a signature of localized hypermutation called kataegis. A germline copy number polymorphism involving APOBEC3A and APOBEC3B, which effectively deletes APOBEC3B, has been associated with modestly increased risk of breast cancer. Here we show that breast cancers in carriers of the deletion show more mutations of the putative APOBEC-dependent genome-wide signatures than cancers in non-carriers. The results suggest that the APOBEC3A-APOBEC3B germline deletion allele confers cancer susceptibility through increased activity of APOBEC-dependent mutational processes, although the mechanism by which this increase in activity occurs remains unknown.

    Funded by: Wellcome Trust: WT088340MA

    Nature genetics 2014;46;5;487-91

  • Changes in malaria parasite drug resistance in an endemic population over a 25-year period with resulting genomic evidence of selection.

    Nwakanma DC, Duffy CW, Amambua-Ngwa A, Oriero EC, Bojang KA, Pinder M, Drakeley CJ, Sutherland CJ, Milligan PJ, Macinnis B, Kwiatkowski DP, Clark TG, Greenwood BM and Conway DJ

    Medical Research Council Unit, Fajara, The Gambia.

    Background. Analysis of genome-wide polymorphism in many organisms has potential to identify genes under recent selection. However, data on historical allele frequency changes are rarely available for direct confirmation. Methods. We genotyped single nucleotide polymorphisms (SNPs) in 4 Plasmodium falciparum drug resistance genes in 668 archived parasite-positive blood samples of a Gambian population between 1984 and 2008. This covered a period before antimalarial resistance was detected locally, through subsequent failure of multiple drugs until introduction of artemisinin combination therapy. We separately performed genome-wide sequence analysis of 52 clinical isolates from 2008 to prospect for loci under recent directional selection. Results. Resistance alleles increased from very low frequencies, peaking in 2000 for chloroquine resistance-associated crt and mdr1 genes and at the end of the survey period for dhfr and dhps genes respectively associated with pyrimethamine and sulfadoxine resistance. Temporal changes fit a model incorporating likely selection coefficients over the period. Three of the drug resistance loci were in the top 4 regions under strong selection implicated by the genome-wide analysis. Conclusions. Genome-wide polymorphism analysis of an endemic population sample robustly identifies loci with detailed documentation of recent selection, demonstrating power to prospectively detect emerging drug resistance genes.

    Funded by: Medical Research Council: G1100123, MC_U190081987, MC_U190092708

    The Journal of infectious diseases 2014;209;7;1126-35

  • A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness.

    O'Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M, Traglia M, Huang J, Huffman JE, Rudan I, McQuillan R, Fraser RM, Campbell H, Polasek O, Asiki G, Ekoru K, Hayward C, Wright AF, Vitart V, Navarro P, Zagury JF, Wilson JF, Toniolo D, Gasparini P, Soranzo N, Sandhu MS and Marchini J

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Department of Statistics, University of Oxford, Oxford, United Kingdom.

    Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.

    PLoS genetics 2014;10;4;e1004234

  • Single nucleotide variants in the protein C pathway and mortality in dialysis patients.

    Ocak G, Drechsler C, Vossen CY, Vos HL, Rosendaal FR, Reitsma PH, Hoffmann MM, März W, Ouwehand WH, Krediet RT, Boeschoten EW, Dekker FW, Wanner C and Verduijn M

    Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.

    Background: The protein C pathway plays an important role in the maintenance of endothelial barrier function and in the inflammatory and coagulant processes that are characteristic of patients on dialysis. We investigated whether common single nucleotide variants (SNVs) in genes encoding protein C pathway components were associated with 5-year all-cause mortality risk in dialysis patients.

    Methods: Single nucleotide variants in the factor V gene (F5 rs6025; factor V Leiden), the thrombomodulin gene (THBD rs1042580), the protein C gene (PROC rs1799808 and rs1799809) and the endothelial protein C receptor gene (PROCR rs867186, rs2069951, and rs2069952) were genotyped in 1070 dialysis patients from the NEtherlands COoperative Study on the Adequacy of Dialysis (NECOSAD) cohort and in 1243 dialysis patients from the German 4D cohort.

    Results: Factor V Leiden was associated with a 1.5-fold (95% CI 1.1-1.9) increased 5-year all-cause mortality risk and carriers of the AG/GG genotypes of PROC rs1799809 had a 1.2-fold (95% CI 1.0-1.4) increased 5-year all-cause mortality risk. The other SNVs in THBD, PROC, and PROCR were not associated with 5-year mortality.

    Conclusion: Our study suggests that factor V Leiden and PROC rs1799809 contribute to an increased mortality risk in dialysis patients.

    PloS one 2014;9;5;e97251

  • Whole-genome scans provide evidence of adaptive evolution in Malawian Plasmodium falciparum isolates.

    Ocholla H, Preston MD, Mipando M, Jensen AT, Campino S, MacInnis B, Alcock D, Terlouw A, Zongo I, Oudraogo JB, Djimde AA, Assefa S, Doumbo OK, Borrmann S, Nzila A, Marsh K, Fairhurst RM, Nosten F, Anderson TJ, Kwiatkowski DP, Craig A, Clark TG and Montgomery J

    Malawi-Liverpool-Wellcome Trust Clinical Research Programme, College of Medicine, University of Malawi, Blantyre, Malawi; Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, UK.

    Background: Selection by host immunity and antimalarial drugs has driven extensive adaptive evolution in Plasmodium falciparum, and continues to produce ever-changing landscapes of genetic variation.

    Methods: We carried out whole-genome sequencing of 69 P. falciparum isolates from Malawi and used population genetics approaches to investigate genetic diversity and population structure, and identify loci under selection.

    Results: High genetic diversity (π = 2.4 × 10(-4)), moderately high multiplicity of infection (2.7), and low linkage disequilibrium (500-bp) were observed in Chikhwawa District, Malawi, an area of high malaria transmission. Allele frequency-based tests provided evidence of recent population growth in Malawi and detected potential targets of host immunity and candidate vaccine antigens. Comparing the sequence variation between isolates from Malawi and those from 5 geographically dispersed countries (Kenya, Burkina Faso, Mali, Cambodia, and Thailand) detected population genetic differences between Africa and Asia, within Southeast Asia, and within Africa. Applying haplotype-based tests of selection to sequence data from all 6 populations identified signals of directional selection at known drug-resistance loci, including pfcrt, pfdhps, pfmdr1, and pfgch1.

    Conclusions: The sequence variations observed at drug-resistance loci reflect differences in each country's historical use of antimalarial drugs, and may be useful in formulating local malaria treatment guidelines.

    The Journal of infectious diseases 2014

  • Using association rule mining to determine promising secondary phenotyping hypotheses.

    Oellrich A, Jacobsen J, Papatheodorou I, Sanger Mouse Genetics Project and Smedley D

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Motivation: Large-scale phenotyping projects such as the Sanger Mouse Genetics Project are ongoing efforts to help identify the influences of genes and their modification on phenotypes. Gene-phenotype relations are crucial to the improvement of our understanding of human heritable diseases as well as the development of drugs. However, given that there are ∼20 000 genes in higher vertebrate genomes and the experimental verification of gene-phenotype relations requires a lot of resources, methods are needed that determine good candidates for testing.

    Results: In this study, we applied an association rule mining approach to the identification of promising secondary phenotype candidates. The predictions rely on a large gene-phenotype annotation set that is used to find occurrence patterns of phenotypes. Applying an association rule mining approach, we could identify 1967 secondary phenotype hypotheses that cover 244 genes and 136 phenotypes. Using two automated and one manual evaluation strategies, we demonstrate that the secondary phenotype candidates possess biological relevance to the genes they are predicted for. From the results we conclude that the predicted secondary phenotypes constitute good candidates to be experimentally tested and confirmed. Availability: The secondary phenotype candidates can be browsed through at

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2014;30;12;i52-i59
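
    The association rule mining idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, phenotype labels, thresholds and both function names are hypothetical, and only rules between pairs of phenotypes are considered. A rule "phenotype A implies phenotype B" is kept when enough genes carry both annotations (support) and most genes with A also have B (confidence); genes annotated with A but not B then receive B as a secondary phenotype hypothesis.

```python
from itertools import permutations


def mine_pair_rules(annotations, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for phenotype pairs.

    annotations maps a gene name to the set of phenotypes annotated to it.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    n_genes = len(annotations)
    phenotypes = sorted(set().union(*annotations.values()))
    rules = []
    for a, b in permutations(phenotypes, 2):
        genes_with_a = [g for g, phens in annotations.items() if a in phens]
        genes_with_both = [g for g in genes_with_a if b in annotations[g]]
        support = len(genes_with_both) / n_genes
        confidence = len(genes_with_both) / len(genes_with_a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def secondary_candidates(annotations, rules):
    """Propose, per gene, rule consequents that are not yet annotated."""
    proposals = {}
    for gene, phens in annotations.items():
        proposed = {b for a, b, _, _ in rules if a in phens and b not in phens}
        if proposed:
            proposals[gene] = proposed
    return proposals


# Hypothetical toy annotation set (not data from the study).
toy = {
    "geneA": {"abnormal body fat", "abnormal glucose"},
    "geneB": {"abnormal body fat", "abnormal glucose"},
    "geneC": {"abnormal body fat", "abnormal glucose", "abnormal bone"},
    "geneD": {"abnormal body fat"},
    "geneE": {"abnormal bone"},
}
rules = mine_pair_rules(toy)
candidates = secondary_candidates(toy, rules)
print(candidates)
```

    In this toy set only the two rules linking body fat and glucose abnormalities pass both thresholds, so geneD, which has the first annotation but not the second, is the only gene to receive a hypothesis.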

  • The influence of disease categories on gene candidate predictions from model organism phenotypes.

    Oellrich A, Koehler S, Washington N, Sanger Mouse Genetics Project, Mungall C, Lewis S, Haendel M, Robinson PN and Smedley D

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA Hinxton, UK.

    Background: The molecular etiology is still to be identified for about half of the currently described Mendelian diseases in humans, thereby hindering efforts to find treatments or preventive measures. Advances, such as new sequencing technologies, have led to increasing amounts of data becoming available with which to address the problem of identifying disease genes. Therefore, automated methods are needed that reliably predict disease gene candidates based on available data. We have recently developed Exomiser as a tool for identifying causative variants from exome analysis results by filtering and prioritising using a number of criteria including the phenotype similarity between the disease and mouse mutants involving the gene candidates. Initial investigations revealed a variation in performance for different medical categories of disease, due in part to a varying contribution of the phenotype scoring component.

    Results: In this study, we further analyse the performance of our cross-species phenotype matching algorithm, and examine in more detail the reasons why disease gene filtering based on phenotype data works better for certain disease categories than others. We found that in addition to misleading phenotype alignments between species, some disease categories are still more amenable to automated predictions than others, and that this often ties in with community perceptions on how well the organism works as a model.

    Conclusions: In conclusion, our automated disease gene candidate predictions are highly dependent on the organism used for the predictions and the disease category being studied. Future work on computational disease gene prediction using phenotype data would benefit from methods that take into account the disease category and the source of model organism data.

    Journal of biomedical semantics 2014;5;Suppl 1 Proceedings of the Bio-Ontologies Spec Interest G;S4

  • Linking tissues to phenotypes using gene expression profiles.

    Oellrich A, Sanger Mouse Genetics Project and Smedley D

    Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases, a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between 'total body fat' abnormalities and genes expressed in the 'brain', which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool. For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh best candidate to the top hit when the associated tissues are taken into consideration. Database URL:

    Database : the journal of biological databases and curation 2014;2014;0;bau017

  • An Evaluation of HIV Elite Controller Definitions within a Large Seroconverter Cohort Collaboration.

    Olson AD, Meyer L, Prins M, Thiebaut R, Gurdasani D, Guiguet M, Chaix ML, Amornkul P, Babiker A, Sandhu MS, Porter K, for the CASCADE Collaboration in EuroCoord

    Medical Research Council Clinical Trials Unit at University College London, London, United Kingdom.

    Background: Understanding the mechanisms underlying viral control is highly relevant to vaccine studies and elite control (EC) of HIV infection. Although numerous definitions of EC exist, it is not clear which, if any, best identify this rare phenotype. Methods: We assessed a number of EC definitions used in the literature using CASCADE data of 25,692 HIV seroconverters. We estimated proportions maintaining EC of total ART-naïve follow-up time, and disease progression, comparing to non-EC. We also examined HIV-RNA and CD4 values and CD4 slope during EC and beyond (while ART naïve). Results: Most definitions classify ∼1% as ECs with median HIV-RNA 43-903 copies/ml and median CD4 >500 cells/mm(3). Beyond EC status, median HIV-RNA levels remained low, although often detectable, and CD4 values high but with strong evidence of decline for all definitions. Median % ART-naïve time as EC was ≥92% although overlap between definitions was low. EC definitions with consecutive HIV-RNA measurements <75 copies/ml with follow-up ≥ six months, or with 90% of measurements <400 copies/ml over ≥10 year follow-up performed best overall. Individuals thus defined were less likely to progress to endpoint (hazard ratios ranged from 12.5 to 19.0 for non-ECs compared to ECs). Conclusions: ECs are rare, less likely to progress to clinical disease, but may eventually lose control. We suggest definitions requiring individuals to have consecutive undetectable HIV-RNA measurements for ≥ six months or otherwise with >90% of measurements <400 copies/ml over ≥10 years be used to define this phenotype.

    PloS one 2014;9;1;e86719
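
    The two best-performing definitions named in the abstract above can be written as simple checks on a person's ART-naïve viral-load series. This is a rough sketch only: the abstract does not say how the definitions were operationalised, so the data layout (time in months paired with HIV-RNA in copies/ml), the function names, and the use of "at least 90%" for the proportion rule are all assumptions made here for illustration.

```python
def longest_suppressed_span(measurements, threshold=75.0):
    """Longest time span (months) covered by consecutive measurements below threshold.

    measurements: list of (months, hiv_rna_copies_per_ml), sorted by time.
    """
    best = 0.0
    run_start = None
    for months, rna in measurements:
        if rna < threshold:
            if run_start is None:
                run_start = months
            best = max(best, months - run_start)
        else:
            run_start = None  # a measurement at or above threshold breaks the run
    return best


def meets_consecutive_definition(measurements, threshold=75.0, min_months=6.0):
    """Consecutive HIV-RNA values below threshold spanning at least min_months."""
    return longest_suppressed_span(sorted(measurements), threshold) >= min_months


def meets_proportion_definition(measurements, threshold=400.0,
                                min_fraction=0.9, min_months=120.0):
    """At least min_fraction of values below threshold over >= min_months of follow-up."""
    if not measurements:
        return False
    times = [months for months, _ in measurements]
    follow_up = max(times) - min(times)
    below = sum(1 for _, rna in measurements if rna < threshold)
    return follow_up >= min_months and below / len(measurements) >= min_fraction


# Hypothetical series, not CASCADE data.
sustained = [(0, 50), (3, 40), (7, 60), (12, 500)]
interrupted = [(0, 50), (3, 200), (7, 60), (9, 40)]
long_term = [(i * 14, 100) for i in range(9)] + [(126, 1000)]
print(meets_consecutive_definition(sustained),
      meets_consecutive_definition(interrupted),
      meets_proportion_definition(long_term))
```

    Treating a single value at or above the threshold as ending the run is one reading of "consecutive"; a real analysis would also need rules for assay detection limits and gaps between visits, which the abstract does not describe.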

  • Paving the way towards a successful and fulfilling career in computational biology.

    Oluwagbemi O, Adebiyi M, Fatumo S and Macintyre G

    Department of Computer and Information Sciences, College of Science and Technology, Bioinformatics Unit, Covenant University, Ota, Ogun State, Nigeria.

    PLoS computational biology 2014;10;5;e1003593

  • BCKDH: The Missing Link in Apicomplexan Mitochondrial Metabolism Is Required for Full Virulence of Toxoplasma gondii and Plasmodium berghei.

    Oppenheim RD, Creek DJ, Macrae JI, Modrzynska KK, Pino P, Limenitakis J, Polonais V, Seeber F, Barrett MP, Billker O, McConville MJ and Soldati-Favre D

    Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland.

    While the apicomplexan parasites Plasmodium falciparum and Toxoplasma gondii are thought to primarily depend on glycolysis for ATP synthesis, recent studies have shown that they can fully catabolize glucose in a canonical TCA cycle. However, these parasites lack a mitochondrial isoform of pyruvate dehydrogenase and the identity of the enzyme that catalyses the conversion of pyruvate to acetyl-CoA remains enigmatic. Here we demonstrate that the mitochondrial branched chain ketoacid dehydrogenase (BCKDH) complex is the missing link, functionally replacing mitochondrial PDH in both T. gondii and P. berghei. Deletion of the E1a subunit of T. gondii and P. berghei BCKDH significantly impacted on intracellular growth and virulence of both parasites. Interestingly, disruption of the P. berghei E1a restricted parasite development to reticulocytes only and completely prevented maturation of oocysts during mosquito transmission. Overall this study highlights the importance of the molecular adaptation of BCKDH in this important class of pathogens.

    PLoS pathogens 2014;10;7;e1004263

  • Loss of FTO antagonises Wnt signaling and leads to developmental defects associated with ciliopathies.

    Osborn DP, Roccasecca RM, McMurray F, Hernandez-Hernandez V, Mukherjee S, Barroso I, Stemple D, Cox R, Beales PL and Christou-Savina S

    Biomedical Sciences, St George's University of London, London, United Kingdom.

    Common intronic variants in the human fat mass and obesity-associated gene (FTO) are found to be associated with an increased risk of obesity. Overexpression of FTO correlates with increased food intake and obesity, whilst loss-of-function results in lethality and severe developmental defects. Despite intense scientific discussions around the role of FTO in energy metabolism, the function of FTO during development remains undefined. Here, we show that loss of Fto leads to developmental defects such as growth retardation, craniofacial dysmorphism and aberrant neural crest cell migration in zebrafish. We find that the important developmental pathway, Wnt, is compromised in the absence of FTO, both in vivo (zebrafish) and in vitro (Fto(-/-) MEFs and HEK293T). Canonical Wnt signalling is downregulated by abrogated β-Catenin translocation to the nucleus whilst the non-canonical Wnt/Ca(2+) pathway is activated via its key signal mediators CaMKII and PKCδ. Moreover, we demonstrate that loss of Fto results in short, absent or disorganised cilia leading to situs inversus, renal cystogenesis, neural crest cell defects and microcephaly in zebrafish. Congruently, Fto knockout mice display aberrant tissue-specific cilia. These data identify FTO as a protein-regulator of the balanced activation between canonical and non-canonical branches of the Wnt pathway. Furthermore, we present the first evidence that FTO plays a role in development and cilia formation/function.

    Funded by: Medical Research Council: G0801843

    PloS one 2014;9;2;e87662

  • New antigens for a multicomponent blood-stage malaria vaccine.

    Osier FH, Mackinnon MJ, Crosnier C, Fegan G, Kamuyu G, Wanaguru M, Ogada E, McDade B, Rayner JC, Wright GJ and Marsh K

    Pathogen Vector and Human Biology Department, Kenya Medical Research Institute Centre for Geographical Medicine Research, Coast, P. O. Box 230, 80108 Kilifi, Kenya.

    An effective blood-stage vaccine against Plasmodium falciparum remains a research priority, but the number of antigens that have been translated into multicomponent vaccines for testing in clinical trials remains limited. Investigating the large number of potential targets found in the parasite proteome has been constrained by an inability to produce natively folded recombinant antigens for immunological studies. We overcame these constraints by generating a large library of biochemically active merozoite surface and secreted full-length ectodomain proteins. We then systematically examined the antibody reactivity against these proteins in a cohort of Kenyan children (n = 286) who were sampled at the start of a malaria transmission season and prospectively monitored for clinical episodes of malaria over the ensuing 6 months. We found that antibodies to previously untested or little-studied proteins had superior or equivalent potential protective efficacy to the handful of current leading malaria vaccine candidates. Moreover, cumulative responses to combinations comprising 5 of the 10 top-ranked antigens, including PF3D7_1136200, MSP2, RhopH3, P41, MSP11, MSP3, PF3D7_0606800, AMA1, Pf113, and MSRP1, were associated with 100% protection against clinical episodes of malaria. These data suggest not only that there are many more potential antigen candidates for the malaria vaccine development pipeline but also that effective vaccination may be achieved by combining a selection of these antigens.

    Science translational medicine 2014;6;247;247ra102

  • Unexplained diarrhoea in HIV-1 infected individuals.

    Oude Munnink BB, Canuti M, Deijs M, de Vries M, Jebbink MF, Rebers S, Molenkamp R, van Hemert FJ, Chung K, Cotten M, Snijders F, Sol CJ and van der Hoek L

    Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands.

    Background: Gastrointestinal symptoms, in particular diarrhoea, are common in non-treated HIV-1 infected individuals. Although various enteric pathogens have been implicated, the aetiology of diarrhoea remains unexplained in a large proportion of HIV-1 infected patients. Our aim is to identify the cause of diarrhoea for patients that remain negative in routine diagnostics.

    Methods: In this study stool samples of 196 HIV-1 infected persons, including 29 persons with diarrhoea, were examined for enteropathogens and HIV-1. A search for unknown and unexpected viruses was performed using virus discovery cDNA-AFLP combined with Roche-454 sequencing (VIDISCA-454).

    Results: HIV-1 RNA was detected in stool of 19 patients with diarrhoea (66%) compared to 75 patients (45%) without diarrhoea. In 19 of the 29 diarrhoea cases a known enteropathogen could be identified (66%). Next to these known causative agents, a range of recently identified viruses was identified via VIDISCA-454: cosavirus, Aichi virus, human gyrovirus, and non-A non-B hepatitis virus. Moreover, a novel virus was detected which was named immunodeficiency-associated stool virus (IASvirus). However, PCR based screening for these viruses showed that none of these novel viruses was associated with diarrhoea. Notably, among the 34% enteropathogen-negative cases, HIV-1 RNA shedding in stool was more frequently observed (80%) compared to enteropathogen-positive cases (47%), indicating that HIV-1 itself is the most likely candidate to be involved in diarrhoea.

    Conclusion: Unexplained diarrhoea in HIV-1 infected patients is probably not caused by recently described or previously unknown pathogens, but it is more likely that HIV-1 itself plays a role in intestinal mucosal abnormalities which lead to diarrhoea.

    BMC infectious diseases 2014;14;1;22

  • SDF-1 Chemokine Signalling Modulates the Apoptotic Responses to Iron Deprivation of Clathrin-Depleted DT40 Cells.

    Pance A, Morrissey-Wettey FR, Craig H, Downing A, Talbot R and Jackson AP

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    We have previously deleted both endogenous copies of the clathrin heavy-chain gene in the chicken pre B-cell-line DT40 and replaced them with clathrin under the control of a tetracycline-regulatable promoter (Tet-Off). The originally derived cell-line DKO-S underwent apoptosis when clathrin expression was repressed. We have also described a cell-line DKO-R derived from DKO-S cells that was less sensitive to clathrin-depletion. Here we show that the restriction of transferrin uptake, resulting in iron deprivation, is responsible for the lethal consequence of clathrin-depletion. We further show that the DKO-R cells have up-regulated an anti-apoptotic survival pathway based on the chemokine SDF-1 and its receptor CXCR4. Our work clarifies several puzzling features of clathrin-depleted DT40 cells and reveals an example of how SDF-1/CXCR4 signalling can abrogate pro-apoptotic pathways and increase cell survival. We propose that the phenomenon described here has implications for the therapeutic approach to a variety of cancers.

    PloS one 2014;9;8;e106278

  • RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia.

    Papaemmanuil E, Rapado I, Li Y, Potter NE, Wedge DC, Tubio J, Alexandrov LB, Van Loo P, Cooke SL, Marshall J, Martincorena I, Hinton J, Gundem G, van Delft FW, Nik-Zainal S, Jones DR, Ramakrishna M, Titley I, Stebbings L, Leroy C, Menzies A, Gamble J, Robinson B, Mudie L, Raine K, O'Meara S, Teague JW, Butler AP, Cazzaniga G, Biondi A, Zuna J, Kempski H, Muschen M, Ford AM, Stratton MR, Greaves M and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    The ETV6-RUNX1 fusion gene, found in 25% of childhood acute lymphoblastic leukemia (ALL) cases, is acquired in utero but requires additional somatic mutations for overt leukemia. We used exome and low-coverage whole-genome sequencing to characterize secondary events associated with leukemic transformation. RAG-mediated deletions emerge as the dominant mutational process, characterized by recombination signal sequence motifs near breakpoints, incorporation of non-templated sequence at junctions, ∼30-fold enrichment at promoters and enhancers of genes actively transcribed in B cell development and an unexpectedly high ratio of recurrent to non-recurrent structural variants. Single-cell tracking shows that this mechanism is active throughout leukemic evolution, with evidence of localized clustering and reiterated deletions. Integration of data on point mutations and rearrangements identifies ATF7IP and MGA as two new tumor-suppressor genes in ALL. Thus, a remarkably parsimonious mutational process transforms ETV6-RUNX1-positive lymphoblasts, targeting the promoters, enhancers and first exons of genes that normally regulate B cell differentiation.

    Funded by: NCI NIH HHS: R01 CA139032, R01 CA157644, R01 CA169458, R01 CA172558; Wellcome Trust: 077012/05/Z, 093867, WT088340MA, WT100183MA

    Nature genetics 2014;46;2;116-25

  • Prevalence and properties of mecC methicillin-resistant Staphylococcus aureus (MRSA) in bovine bulk tank milk in Great Britain.

    Paterson GK, Morgan FJ, Harrison EM, Peacock SJ, Parkhill J, Zadoks RN and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK.

    Objectives: mecC methicillin-resistant Staphylococcus aureus (MRSA) represent a newly recognized form of MRSA, distinguished by the possession of a divergent mecA homologue, mecC. The first isolate to be identified came from bovine milk, but there are few data on the prevalence of mecC MRSA among dairy cattle. The aim of this study was to conduct a prevalence study of mecC MRSA among dairy farms in Great Britain. Methods: Test farms were randomly selected by random order generation and bulk tank samples were tested for the presence of mecC MRSA by broth enrichment and plating onto chromogenic agar. All MRSA isolated were screened by PCR for mecA and mecC, and mecC MRSA were further characterized by multilocus sequence typing, spa typing and antimicrobial susceptibility testing. Results: mecC MRSA were detected on 10 of 465 dairy farms sampled in England and Wales (prevalence 2.15%, 95% CI 1.17%-3.91%), but not from 625 farms sampled in Scotland (95% CI of prevalence 0%-0.61%). Seven isolates belonged to sequence type (ST) 425, while the other three belonged to clonal complex 130. Resistance to non-β-lactam antibiotics was uncommon. All 10 isolates produced a negative result by slide agglutination for penicillin-binding protein 2a. mecA MRSA ST398 was detected on one farm in England. Conclusions: mecC MRSA is widely distributed among dairy farms in Great Britain, but this distribution is not uniform across the whole country. These results provide an important baseline dataset to monitor the epidemiology of this emerging form of MRSA.

    Funded by: Medical Research Council: G1001787

    The Journal of antimicrobial chemotherapy 2014;69;3;598-602
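
    The prevalence figures in the abstract above can be checked with a short calculation. The abstract does not state which interval method was used; the sketch below assumes a Wilson score interval, which reproduces the reported bounds for both the England and Wales sample (10 of 465) and the Scotland sample (0 of 625). The function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.959964 gives 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# England and Wales: mecC MRSA detected on 10 of 465 farms.
ew_low, ew_high = wilson_interval(10, 465)
# Scotland: mecC MRSA detected on 0 of 625 farms.
sc_low, sc_high = wilson_interval(0, 625)

print(f"England and Wales: {100 * 10 / 465:.2f}% "
      f"({100 * ew_low:.2f}%-{100 * ew_high:.2f}%)")
print(f"Scotland: 0% (upper bound {100 * sc_high:.2f}%)")
```

    The Wilson interval stays informative when no positives are observed, which is why the Scottish sample can still be given an upper bound even though the point estimate is zero.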

  • Functional interpretation of non-coding sequence variation: Concepts and challenges.

    Paul DS, Soranzo N and Beck S

    UCL Cancer Institute, University College London, London, United Kingdom.

    Understanding the functional mechanisms underlying genetic signals associated with complex traits and common diseases, such as cancer, diabetes and Alzheimer's disease, is a formidable challenge. Many genetic signals discovered through genome-wide association studies map to non-protein coding sequences, where their molecular consequences are difficult to evaluate. This article summarizes concepts for the systematic interpretation of non-coding genetic signals using genome annotation data sets in different cellular systems. We outline strategies for the global analysis of multiple association intervals and the in-depth molecular investigation of individual intervals. We highlight experimental techniques to validate candidate (potential causal) regulatory variants, with a focus on novel genome-editing techniques including CRISPR/Cas9. These approaches are also applicable to low-frequency and rare variants, which have become increasingly important in genomic studies of complex traits and diseases. There is a pressing need to translate genetic signals into biological mechanisms, leading to prognostic, diagnostic and therapeutic advances.

    BioEssays : news and reviews in molecular, cellular and developmental biology 2014;36;2;191-9

  • Mutations disrupting the Kennedy phosphatidylcholine pathway in humans with congenital lipodystrophy and fatty liver disease.

    Payne F, Lim K, Girousse A, Brown RJ, Kory N, Robbins A, Xue Y, Sleigh A, Cochran E, Adams C, Dev Borman A, Russel-Jones D, Gorden P, Semple RK, Saudek V, O'Rahilly S, Walther TC, Barroso I and Savage DB

    Metabolic Disease Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.

    Phosphatidylcholine (PC) is the major glycerophospholipid in eukaryotic cells and is an essential component in all cellular membranes. The biochemistry of de novo PC synthesis by the Kennedy pathway is well established, but less is known about the physiological functions of PC. We identified two unrelated patients with defects in the Kennedy pathway due to biallelic loss-of-function mutations in phosphate cytidylyltransferase 1 alpha (PCYT1A), the rate-limiting enzyme in this pathway. The mutations lead to a marked reduction in PCYT1A expression and PC synthesis. The phenotypic consequences include some features, such as severe fatty liver and low HDL cholesterol levels, that are predicted by the results of previously reported liver-specific deletion of murine Pcyt1a. Both patients also had lipodystrophy, severe insulin resistance, and diabetes, providing evidence for an additional and essential role for PCYT1A-generated PC in the normal function of white adipose tissue and insulin action.

    Funded by: Medical Research Council; NHLBI NIH HHS: HL-102923, HL-102926, HL-103010; NIGMS NIH HHS: R01 GM09719; Wellcome Trust: 098498, WT091310, WT091551, WT095515, WT098051, WT098498

    Proceedings of the National Academy of Sciences of the United States of America 2014;111;24;8901-6

  • The different shades of fat.

    Peirce V, Carobbio S and Vidal-Puig A

    University of Cambridge Metabolic Research Laboratories, Level 4, Wellcome Trust-MRC Institute of Metabolic Science, Box 289, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    Our understanding of adipose tissue biology has progressed rapidly since the turn of the century. White adipose tissue has emerged as a key determinant of healthy metabolism and metabolic dysfunction. This realization is paralleled only by the confirmation that adult humans have heat-dissipating brown adipose tissue, an important contributor to energy balance and a possible therapeutic target for the treatment of metabolic disease. We propose that the development of successful strategies to target brown and white adipose tissues will depend on investigations that elucidate their developmental origins and cell-type-specific functional regulators.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    Nature 2014;510;7503;76-83

  • Functional genomics reveals that Clostridium difficile Spo0A coordinates sporulation, virulence and metabolism.

    Pettit LJ, Browne HP, Yu L, Smits WK, Fagan RP, Barquist L, Martin MJ, Goulding D, Duncan SH, Flint HJ, Dougan G, Choudhary JS and Lawley TD

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Background: Clostridium difficile is an anaerobic, Gram-positive bacterium that can reside as a commensal within the intestinal microbiota of healthy individuals or cause life-threatening antibiotic-associated diarrhea in immunocompromised hosts. C. difficile can also form highly resistant spores that are excreted, facilitating host-to-host transmission. The C. difficile spo0A gene encodes a highly conserved transcriptional regulator of sporulation that is required for relapsing disease and transmission in mice.

    Results: Here we describe a genome-wide approach using a combined transcriptomic and proteomic analysis to identify Spo0A regulated genes. Our results validate Spo0A as a positive regulator of putative and novel sporulation genes as well as components of the mature spore proteome. We also show that Spo0A regulates a number of virulence-associated factors such as flagella and metabolic pathways including glucose fermentation leading to butyrate production.

    Conclusions: The C. difficile spo0A gene is a global transcriptional regulator that controls diverse sporulation, virulence and metabolic phenotypes coordinating pathogen adaptation to a wide range of host interactions. Additionally, the rich breadth of functional data allowed us to significantly update the annotation of the C. difficile 630 reference genome which will facilitate basic and applied research on this emerging pathogen.

    Funded by: Wellcome Trust: 086418, 098051

    BMC genomics 2014;15;160

  • Global dissemination of a multidrug resistant Escherichia coli clone.

    Petty NK, Ben Zakour NL, Stanton-Cook M, Skippington E, Totsika M, Forde BM, Phan MD, Gomes Moriel D, Peters KM, Davies M, Rogers BA, Dougan G, Rodriguez-Baño J, Pascual A, Pitout JD, Upton M, Paterson DL, Walsh TR, Schembri MA and Beatson SA

    Australian Infectious Diseases Research Centre, and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia.

    Escherichia coli sequence type 131 (ST131) is a globally disseminated, multidrug resistant (MDR) clone responsible for a high proportion of urinary tract and bloodstream infections. The rapid emergence and successful spread of E. coli ST131 is strongly associated with several factors, including resistance to fluoroquinolones, high virulence gene content, the possession of the type 1 fimbriae FimH30 allele, and the production of the CTX-M-15 extended spectrum β-lactamase (ESBL). Here, we used genome sequencing to examine the molecular epidemiology of a collection of E. coli ST131 strains isolated from six distinct geographical locations across the world spanning 2000-2011. The global phylogeny of E. coli ST131, determined from whole-genome sequence data, revealed a single lineage of E. coli ST131 distinct from other extraintestinal E. coli strains within the B2 phylogroup. Three closely related E. coli ST131 sublineages were identified, with little association to geographic origin. The majority of single-nucleotide variants associated with each of the sublineages were due to recombination in regions adjacent to mobile genetic elements (MGEs). The most prevalent sublineage of ST131 strains was characterized by fluoroquinolone resistance, and a distinct virulence factor and MGE profile. Four different variants of the CTX-M ESBL-resistance gene were identified in our ST131 strains, with acquisition of CTX-M-15 representing a defining feature of a discrete but geographically dispersed ST131 sublineage. This study confirms the global dispersal of a single E. coli ST131 clone and demonstrates the role of MGEs and recombination in the evolution of this important MDR pathogen.

    Proceedings of the National Academy of Sciences of the United States of America 2014;111;15;5694-9

  • Vying over spilt oil.

    Pham N TA and Anonye BO

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014;12;3;156

  • Emerging insights on intestinal dysbiosis during bacterial infections.

    Pham TA and Lawley TD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Infection of the gastrointestinal tract is commonly linked to pathological imbalances of the resident microbiota, termed dysbiosis. In recent years, advanced high-throughput genomic approaches have allowed us to examine the microbiota in an unprecedented manner, revealing novel biological insights about infection-associated dysbiosis at the community and individual species levels. A dysbiotic microbiota is typically reduced in taxonomic diversity and metabolic function, and can harbour pathobionts that exacerbate intestinal inflammation or manifest systemic disease. Dysbiosis can also promote pathogen genome evolution, while allowing the pathogens to persist at high density and transmit to new hosts. A deeper understanding of bacterial pathogenicity in the context of the intestinal microbiota should unveil new approaches for developing diagnostics and therapies for enteropathogens.

    Current opinion in microbiology 2014;17C;67-74

  • CtIP-mediated resection is essential for viability and can operate independently of BRCA1.

    Polato F, Callen E, Wong N, Faryabi R, Bunting S, Chen HT, Kozak M, Kruhlak MJ, Reczek CR, Lee WH, Ludwig T, Baer R, Feigenbaum L, Jackson S and Nussenzweig A

    Laboratory of Genome Integrity, Experimental Immunology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892.

    Homologous recombination (HR) is initiated by DNA end resection, a process in which stretches of single-strand DNA (ssDNA) are generated and used for homology search. Factors implicated in resection include nucleases MRE11, EXO1, and DNA2, which process DNA ends into 3' ssDNA overhangs; helicases such as BLM, which unwind DNA; and other proteins such as BRCA1 and CtIP whose functions remain unclear. CDK-mediated phosphorylation of CtIP on T847 is required to promote resection, whereas CDK-dependent phosphorylation of CtIP-S327 is required for interaction with BRCA1. Here, we provide evidence that CtIP functions independently of BRCA1 in promoting DSB end resection. First, using mouse models expressing S327A or T847A mutant CtIP as a sole species, and B cells deficient in CtIP, we show that loss of the CtIP-BRCA1 interaction does not detectably affect resection, maintenance of genomic stability or viability, whereas T847 is essential for these functions. Second, although loss of 53BP1 rescues the embryonic lethality and HR defects in BRCA1-deficient mice, it does not restore viability or genome integrity in CtIP(-/-) mice. Third, the increased resection afforded by loss of 53BP1 and the rescue of BRCA1-deficiency depend on CtIP but not EXO1. Finally, the sensitivity of BRCA1-deficient cells to poly ADP ribose polymerase (PARP) inhibition is partially rescued by the phospho-mimicking mutant CtIP (CtIP-T847E). Thus, in contrast to BRCA1, CtIP has indispensable roles in promoting resection and embryonic development.

    The Journal of experimental medicine 2014

  • Obituary: Professor Frederick Sanger: 13 August 1918 – 19 November 2013

    Powell, D

    Genetics Society News 2014;70;16-18

  • PlasmoView: A Web-based Resource to Visualise Global Plasmodium falciparum Genomic Variation.

    Preston MD, Assefa SA, Ocholla H, Sutherland CJ, Borrmann S, Nzila A, Michon P, Hien TT, Bousema T, Drakeley CJ, Zongo I, Ouédraogo JB, Djimde AA, Doumbo OK, Nosten F, Fairhurst RM, Conway DJ, Roper C and Clark TG

    Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom.

    Malaria is a global public health challenge, with drug resistance a major barrier to disease control and elimination. To meet the urgent need for better treatments and vaccines, a deeper knowledge of Plasmodium biology and malaria epidemiology is required. An improved understanding of the genomic variation of malaria parasites, especially the most virulent Plasmodium falciparum (Pf) species, has the potential to yield new insights in these areas. High-throughput sequencing and genotyping is generating large amounts of genomic data across multiple parasite populations. The resulting ability to identify informative variants, particularly single-nucleotide polymorphisms (SNPs), will lead to the discovery of intra- and inter-population differences and thus enable the development of genetic barcodes for diagnostic assays and clinical studies. Knowledge of genetic variability underlying drug resistance and other differential phenotypes will also facilitate the identification of novel mutations and contribute to surveillance and stratified medicine applications. The PlasmoView interactive web-browsing tool enables the research community to visualise genomic variation and annotation (eg, biological function) in a geographic setting. The first release contains over 600 000 high-quality SNPs in 631 Pf isolates from laboratory strains and four malaria-endemic regions (West Africa, East Africa, Southeast Asia and Oceania).

    The Journal of infectious diseases 2014;209;11;1808-15

  • A barcode of organellar genome polymorphisms identifies the geographic origin of Plasmodium falciparum strains.

    Preston MD, Campino S, Assefa SA, Echeverry DF, Ocholla H, Amambua-Ngwa A, Stewart LB, Conway DJ, Borrmann S, Michon P, Zongo I, Ouédraogo JB, Djimde AA, Doumbo OK, Nosten F, Pain A, Bousema T, Drakeley CJ, Fairhurst RM, Sutherland CJ, Roper C and Clark TG

    Immunology and Infection Department, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK.

    Malaria is a major public health problem that is actively being addressed in a global eradication campaign. Increased population mobility through international air travel has elevated the risk of re-introducing parasites to elimination areas and dispersing drug-resistant parasites to new regions. A simple genetic marker that quickly and accurately identifies the geographic origin of infections would be a valuable public health tool for locating the source of imported outbreaks. Here we analyse the mitochondrion and apicoplast genomes of 711 Plasmodium falciparum isolates from 14 countries, and find evidence that they are non-recombining and co-inherited. The high degree of linkage produces a panel of relatively few single-nucleotide polymorphisms (SNPs) that is geographically informative. We design a 23-SNP barcode that is highly predictive (~92%) and easily adapted to aid case management in the field and survey parasite migration worldwide.

    Nature communications 2014;5;4052

  • A central role for GRB10 in regulation of islet function in man.

    Prokopenko I, Poon W, Mägi R, Prasad B R, Salehi SA, Almgren P, Osmark P, Bouatia-Naji N, Wierup N, Fall T, Stančáková A, Barker A, Lagou V, Osmond C, Xie W, Lahti J, Jackson AU, Cheng YC, Liu J, O'Connell JR, Blomstedt PA, Fadista J, Alkayyali S, Dayeh T, Ahlqvist E, Taneera J, Lecoeur C, Kumar A, Hansson O, Hansson K, Voight BF, Kang HM, Levy-Marchal C, Vatin V, Palotie A, Syvänen AC, Mari A, Weedon MN, Loos RJ, Ong KK, Nilsson P, Isomaa B, Tuomi T, Wareham NJ, Stumvoll M, Widen E, Lakka TA, Langenberg C, Tönjes A, Rauramaa R, Kuusisto J, Frayling TM, Froguel P, Walker M, Eriksson JG, Ling C, Kovacs P, Ingelsson E, McCarthy MI, Shuldiner AR, Silver KD, Laakso M, Groop L and Lyssenko V

    Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Department of Genomics of Common Disease, School of Public Health, Imperial College London, Hammersmith Hospital, London, United Kingdom.

    In a GWAS meta-analysis, variants in the growth factor receptor-bound protein 10 (GRB10) gene were associated with reduced glucose-stimulated insulin secretion and increased risk of type 2 diabetes (T2D) if inherited from the father, but inexplicably reduced fasting glucose when inherited from the mother. GRB10 is a negative regulator of insulin signaling and imprinted in a parent-of-origin fashion in different tissues. GRB10 knock-down in human pancreatic islets showed reduced insulin and glucagon secretion, which together with changes in insulin sensitivity may explain the paradoxical reduction of glucose despite a decrease in insulin secretion. Together, these findings suggest that tissue-specific methylation and possibly imprinting of GRB10 can influence glucose metabolism and contribute to T2D pathogenesis. The data also emphasize the need in genetic studies to consider whether risk alleles are inherited from the mother or the father.

    Funded by: Medical Research Council: G0601261; NCRR NIH HHS: M01 RR02719, M01 RR16500; NHLBI NIH HHS: U01 HL84756; NIDDK NIH HHS: P30 DK072488, P60 DK79637, R01 DK54261, R01 DK68495; Wellcome Trust: 083270, 089062, 089062/Z/09/Z, 090367, 090532, 098381, 89061/Z/09/Z

    PLoS genetics 2014;10;4;e1004235

  • A genome-wide association analysis of a broad psychosis phenotype identifies three loci for further investigation.

    Psychosis Endophenotypes International Consortium, Wellcome Trust Case-Control Consortium 2, Bramon E, Pirinen M, Strange A, Lin K, Freeman C, Bellenguez C, Su Z, Band G, Pearson R, Vukcevic D, Langford C, Deloukas P, Hunt S, Gray E, Dronov S, Potter SC, Tashakkori-Ghanbaria A, Edkins S, Bumpstead SJ, Arranz MJ, Bakker S, Bender S, Bruggeman R, Cahn W, Chandler D, Collier DA, Crespo-Facorro B, Dazzan P, de Haan L, Di Forti M, Dragović M, Giegling I, Hall J, Iyegbe C, Jablensky A, Kahn RS, Kalaydjieva L, Kravariti E, Lawrie S, Linszen DH, Mata I, McDonald C, McIntosh A, Myin-Germeys I, Ophoff RA, Pariante CM, Paunio T, Picchioni M, Psychiatric Genomics Consortium, Ripke S, Rujescu D, Sauer H, Shaikh M, Sussmann J, Suvisaari J, Tosato S, Toulopoulou T, Van Os J, Walshe M, Weisbrod M, Whalley H, Wiersma D, Blackwell JM, Brown MA, Casas JP, Corvin A, Duncanson A, Jankowski JA, Markus HS, Mathew CG, Palmer CN, Plomin R, Rautanen A, Sawcer SJ, Trembath RC, Wood NW, Barroso I, Peltonen L, Lewis CM, Murray RM, Donnelly P, Powell J and Spencer CC

    Background: Genome-wide association studies (GWAS) have identified several loci associated with schizophrenia and/or bipolar disorder. We performed a GWAS of psychosis as a broad syndrome rather than within specific diagnostic categories.

    Methods: 1239 cases with schizophrenia, schizoaffective disorder, or psychotic bipolar disorder; 857 of their unaffected relatives, and 2739 healthy controls were genotyped with the Affymetrix 6.0 single nucleotide polymorphism (SNP) array. Analyses of 695,193 SNPs were conducted using UNPHASED, which combines information across families and unrelated individuals. We attempted to replicate signals found in 23 genomic regions using existing data on nonoverlapping samples from the Psychiatric GWAS Consortium and Schizophrenia-GENE-plus cohorts (10,352 schizophrenia patients and 24,474 controls).

    Results: No individual SNP showed compelling evidence for association with psychosis in our data. However, we observed a trend for association with same risk alleles at loci previously associated with schizophrenia (one-sided p = .003). A polygenic score analysis found that the Psychiatric GWAS Consortium's panel of SNPs associated with schizophrenia significantly predicted disease status in our sample (p = 5 × 10(-14)) and explained approximately 2% of the phenotypic variance.

    Conclusions: Although narrowly defined phenotypes have their advantages, we believe new loci may also be discovered through meta-analysis across broad phenotypes. The novel statistical methodology we introduced to model effect size heterogeneity between studies should help future GWAS that combine association evidence from related phenotypes. Applying these approaches, we highlight three loci that warrant further investigation. We found that SNPs conveying risk for schizophrenia are also predictive of disease status in our data.

    Funded by: Department of Health: PDA/02/06/016; Medical Research Council: G0000934, G0901310, G1100583; Wellcome Trust: 064971, 068545/Z/02, 072894/Z/03/Z, 075491/Z/04/B, 085475/B/08/Z, 085475/Z/08/Z, 090532, 090532/Z/09/Z, 095552, 097364/Z/11/Z

    Biological psychiatry 2014;75;5;386-97

  • A polygenic burden of rare disruptive mutations in schizophrenia.

    Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, O'Dushlaine C, Chambert K, Bergen SE, Kähler A, Duncan L, Stahl E, Genovese G, Fernández E, Collins MO, Komiyama NH, Choudhary JS, Magnusson PK, Banks E, Shakir K, Garimella K, Fennell T, Depristo M, Grant SG, Haggarty SJ, Gabriel S, Scolnick EM, Lander ES, Hultman CM, Sullivan PF, McCarroll SA and Sklar P

    [1] Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA [2] Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA [3] Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA [4] Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA [5] Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.

    Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.

    Nature 2014

  • A global analysis of Y-chromosomal haplotype diversity for 23 STR loci.

    Purps J, Siegert S, Willuweit S, Nagy M, Alves C, Salazar R, Angustia SM, Santos LH, Anslinger K, Bayer B, Ayub Q, Wei W, Xue Y, Tyler-Smith C, Bafalluy MB, Martínez-Jarreta B, Egyed B, Balitzki B, Tschumi S, Ballard D, Court DS, Barrantes X, Bäßler G, Wiest T, Berger B, Niederstätter H, Parson W, Davis C, Budowle B, Burri H, Borer U, Koller C, Carvalho EF, Domingues PM, Chamoun WT, Coble MD, Hill CR, Corach D, Caputo M, D'Amato ME, Davison S, Decorte R, Larmuseau MH, Ottoni C, Rickards O, Lu D, Jiang C, Dobosz T, Jonkisz A, Frank WE, Furac I, Gehrig C, Castella V, Grskovic B, Haas C, Wobst J, Hadzic G, Drobnic K, Honda K, Hou Y, Zhou D, Li Y, Hu S, Chen S, Immel UD, Lessig R, Jakovski Z, Ilievska T, Klann AE, García CC, de Knijff P, Kraaijenbrink T, Kondili A, Miniati P, Vouropoulou M, Kovacevic L, Marjanovic D, Lindner I, Mansour I, Al-Azem M, Andari AE, Marino M, Furfuro S, Locarno L, Martín P, Luque GM, Alonso A, Miranda LS, Moreira H, Mizuno N, Iwashima Y, Neto RS, Nogueira TL, Silva R, Nastainczyk-Wulf M, Edelmann J, Kohl M, Nie S, Wang X, Cheng B, Núñez C, Pancorbo MM, Olofsson JK, Morling N, Onofri V, Tagliabracci A, Pamjav H, Volgyi A, Barany G, Pawlowski R, Maciejewska A, Pelotti S, Pepinski W, Abreu-Glowacka M, Phillips C, Cárdenas J, Rey-Gonzalez D, Salas A, Brisighelli F, Capelli C, Toscanini U, Piccinini A, Piglionica M, Baldassarra SL, Ploski R, Konarzewska M, Jastrzebska E, Robino C, Sajantila A, Palo JU, Guevara E, Salvador J, Ungria MC, Rodriguez JJ, Schmidt U, Schlauderer N, Saukko P, Schneider PM, Sirker M, Shin KJ, Oh YN, Skitsa I, Ampati A, Smith TG, Calvit LS, Stenzl V, Capal T, Tillmar A, Nilsson H, Turrina S, De Leo D, Verzeletti A, Cortellini V, Wetton JH, Gwynne GM, Jobling MA, Whittle MR, Sumita DR, Wolańska-Nowak P, Yong RY, Krawczak M, Nothnagel M and Roewer L

    Department of Forensic Genetics, Institute of Legal Medicine and Forensic Sciences, Charité-Universitätsmedizin, Berlin, Germany.

    In a worldwide collaborative effort, 19,630 Y-chromosomes were sampled from 129 different populations in 51 countries. These chromosomes were typed for 23 short-tandem repeat (STR) loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385ab, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATAH4, DYS481, DYS533, DYS549, DYS570, DYS576, and DYS643) using the PowerPlex Y23 System (PPY23, Promega Corporation, Madison, WI). Locus-specific allelic spectra of these markers were determined and a consistently high level of allelic diversity was observed. A considerable number of null, duplicate and off-ladder alleles were revealed. Standard single-locus and haplotype-based parameters were calculated and compared between subsets of Y-STR markers established for forensic casework. The PPY23 marker set provides substantially stronger discriminatory power than other available kits but at the same time reveals the same general patterns of population structure as other marker sets. A strong correlation was observed between the number of Y-STRs included in a marker set and some of the forensic parameters under study. Interestingly, a weak but consistent trend toward smaller genetic distances resulting from larger numbers of markers became apparent.

    Funded by: Wellcome Trust: 087576

    Forensic science international. Genetics 2014;12;12-23

  • FTO genetic variants, dietary intake, and body mass index: insights from 177,330 individuals.

    Qi Q, Kilpeläinen TO, Downer MK, Tanaka T, Smith CE, Sluijs I, Sonestedt E, Chu AY, Renström F, Lin X, Angquist LH, Huang J, Liu Z, Li Y, Asif Ali M, Xu M, Ahluwalia TS, Boer JM, Chen P, Daimon M, Eriksson J, Perola M, Friedlander Y, Gao YT, Heppe DH, Holloway JW, Houston DK, Kanoni S, Kim YM, Laaksonen MA, Jääskeläinen T, Lee NR, Lehtimäki T, Lemaitre RN, Lu W, Luben RN, Manichaikul A, Männistö S, Marques-Vidal P, Monda KL, Ngwa JS, Perusse L, van Rooij FJ, Xiang YB, Wen W, Wojczynski MK, Zhu J, Borecki IB, Bouchard C, Cai Q, Cooper C, Dedoussis GV, Deloukas P, Ferrucci L, Forouhi NG, Hansen T, Christiansen L, Hofman A, Johansson I, Jørgensen T, Karasawa S, Khaw KT, Kim MK, Kristiansson K, Li H, Lin X, Liu Y, Lohman KK, Long J, Mikkilä V, Mozaffarian D, North K, Pedersen O, Raitakari O, Rissanen H, Tuomilehto J, van der Schouw YT, Uitterlinden AG, Zillikens MC, Franco OH, Tai ES, Shu XO, Siscovick DS, Toft U, Verschuren WM, Vollenweider P, Wareham NJ, Witteman JC, Zheng W, Ridker PM, Kang JH, Liang L, Jensen MK, Curhan GC, Pasquale LR, Hunter DJ, Mohlke KL, Uusitupa M, Cupples LA, Rankinen T, Orho-Melander M, Chasman DI, Franks PW, Sørensen TI, Hu FB, Loos RJ, Nettleton J and Qi L

    Department of Epidemiology, Albert Einstein College of Medicine, Bronx, New York, United States of America; Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, United States of America.

    FTO is the strongest known genetic susceptibility locus for obesity. Experimental studies in animals suggest potential roles of FTO in regulating food intake. The interactive relations among FTO variants, dietary intake, and body mass index (BMI) are complex, and results from previous, often small-scale, studies in humans are highly inconsistent. We performed large-scale analyses based on data from 177,330 adults (154,439 whites, 5,776 African Americans, and 17,115 Asians) from 40 studies to examine: 1) the association between the FTO-rs9939609 variant (or a proxy SNP) and total energy and macronutrient intake; and 2) the interaction between the FTO variant and dietary intake on BMI. The minor allele (A-allele) of the FTO-rs9939609 variant was associated with higher BMI in whites (effect per allele =0.34 [0.31, 0.37] kg/m(2), P=1.9×10(-105)), and all participants (0.30 [0.30, 0.35] kg/m(2), P=3.6×10(-107)). The BMI-increasing allele of the FTO variant showed a significant association with higher dietary protein intake (effect per allele =0.08[0.06, 0.10]%, P=2.4×10(-16)), and relatively weak associations with lower total energy intake (-6.4[-10.1, -2.6] kcal/day, P=0.001) and lower dietary carbohydrate intake (-0.07 [-0.11, -0.02]%, P=0.004). The associations with protein (P=7.5×10(-9)) and total energy (P =0.002) were attenuated but remained significant after adjustment for BMI. We did not find significant interactions between the FTO variant and dietary intake of total energy, protein, carbohydrate, or fat on BMI. Our findings suggest a positive association between the BMI-increasing allele of the FTO variant and higher dietary protein intake and offer insight into the potential link between FTO, dietary protein intake and adiposity.

    Human molecular genetics 2014

  • SASI-Seq: sample assurance Spike-Ins, and highly differentiating 384 barcoding for Illumina sequencing.

    Quail MA, Smith M, Jackson D, Leonard S, Skelly T, Swerdlow HP, Gu Y and Ellis P

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambs, UK.

    Background: A minor but significant fraction of samples subjected to next-generation sequencing methods are either mixed-up or cross-contaminated. These events can lead to false or inconclusive results. We have therefore developed SASI-Seq; a process whereby a set of uniquely barcoded DNA fragments are added to samples destined for sequencing. From the final sequencing data, one can verify that all the reads derive from the original sample(s) and not from contaminants or other samples.

    Results: By adding a mixture of three uniquely barcoded amplicons, of different sizes spanning the range of insert sizes one would normally use for Illumina sequencing, at a spike-in level of approximately 0.1%, we demonstrate that these fragments remain intimately associated with the sample. They can be detected following even the tightest size selection regimes or exome enrichment and can report the occurrence of sample mix-ups and cross-contamination. As a consequence of this work, we have designed a set of 384 eleven-base Illumina barcode sequences that are at least 5 changes apart from each other, allowing for single-error correction and very low levels of barcode misallocation due to sequencing error.

    Conclusion: SASI-Seq is a simple, inexpensive and flexible tool that enables sample assurance, allows deconvolution of sample mix-ups and reports levels of cross-contamination between samples throughout NGS workflows.

    BMC genomics 2014;15;1;110
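
    The barcode property described in the SASI-Seq abstract above (every pair of barcodes at least 5 changes apart, so a single sequencing error can be corrected) can be illustrated with a minimal sketch. The four 11-base barcodes below are hypothetical toy sequences chosen for illustration, not the published set of 384, and the function names are assumptions. In general, a set with minimum pairwise Hamming distance d allows correction of t errors whenever 2t + 1 <= d, so a distance of 5 leaves a safety margin for single-error correction.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def assign(read_tag, barcodes, max_mismatches=1):
    """Return the single barcode within max_mismatches of read_tag, else None."""
    hits = [b for b in barcodes if hamming(read_tag, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

# Hypothetical toy set of 11-base barcodes (NOT the published sequences).
TOY_BARCODES = ["AAAAAAAAAAA", "CCCCCAAAAAA", "AAAAAACCCCC", "GGGGGGGGGGG"]

if __name__ == "__main__":
    print(min_pairwise_distance(TOY_BARCODES))
    print(assign("AAAAAAAAAAT", TOY_BARCODES))  # one error: still assignable
    print(assign("CCCAAAAAAAA", TOY_BARCODES))  # ambiguous tag: rejected
```

    Rejecting tags that are more than one mismatch from every barcode, rather than assigning them to the nearest one, is a conservative design choice that trades a few discarded reads for a lower misallocation rate.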

  • Central regulation of bone mass.

    Quiros-Gonzalez I and Yadav VK

    Systems Biology of Bone Laboratory, Department of Mouse and Zebrafish Genetics, The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom.

    Bones are structures that give shape and defined features to vertebrates, protect several soft organs and exert multiple endocrine influences on other organs. To achieve these functions bones are first modeled early during life and then constantly remodeled throughout life. The process of bone (re)modeling happens simultaneously at a multitude of locations in the skeleton and ensures that vertebrates have a mechanically strong yet flexible skeleton for the most part of their life. Given the extent of its occurrence in the body, bone remodeling is a highly energy-demanding process and is co-ordinated with other physiological processes as diverse as energy metabolism, sleep-wake cycle and reproduction. Neuronal circuits in the brain play a very important role in the coordination of bone remodeling with other organ system functions, and perform this function in sync with environmental and peripheral hormonal cues. In this review, we will focus on the roles of hormonal signals and neural circuits that originate in, or impinge on, the brain in the regulation of bone mass. We will provide herein an updated view of how advances in molecular genetics have refined the neural circuits involved in the regulation of bone mass, from the whole brain level to the specific neuronal populations and their neurotransmitters. This will help to understand the mechanisms whereby the vertebrate brain regulates bone mass by fine-tuning metabolic signals that originate in the brain or elsewhere in the body.

    Archives of biochemistry and biophysics 2014

  • In utero undernourishment perturbs the adult sperm methylome and intergenerational metabolism.

    Radford EJ, Ito M, Shi H, Corish JA, Yamazawa K, Isganaitis E, Seisenberger S, Hore TA, Reik W, Erkek S, Peters AH, Patti ME and Ferguson-Smith AC

    Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK.

    Adverse prenatal environments can promote metabolic disease in offspring and subsequent generations. Animal models and epidemiological data implicate epigenetic inheritance, but the mechanisms remain unknown. In an intergenerational developmental programming model affecting F2 mouse metabolism, we demonstrate that the in utero nutritional environment of F1 embryos alters the germline DNA methylome of F1 adult males in a locus-specific manner. Differentially methylated regions are hypomethylated and enriched in nucleosome-retaining regions. A substantial fraction is resistant to early embryo methylation reprogramming, potentially impacting F2 development. Differential methylation is not maintained in F2 tissues, yet locus-specific expression is perturbed. Thus, in utero nutritional exposures during critical windows of germ cell development can impact the male germline methylome, associated with metabolic disease in offspring.

    Science (New York, N.Y.) 2014

  • Generation of primary human intestinal T cell transcriptomes reveals differential expression at genetic risk loci for immune-mediated disease.

    Raine T, Liu JZ, Anderson CA, Parkes M and Kaser A

    Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK.

    Objective: Genome-wide association studies (GWAS) have identified genetic variants within multiple risk loci as predisposing to intestinal inflammatory diseases, including Crohn's disease, ulcerative colitis and coeliac disease. Most risk variants affect regulation of transcription, but a critical challenge is to identify which genes and which cell types these variants affect. We aimed to characterise whole transcriptomes for each common T lymphocyte subset resident within the gut mucosa, and use these to infer biological insights and highlight candidate genes of interest within GWAS risk loci.

    Design: We isolated the four major intestinal T cell populations from pinch biopsies from healthy subjects and generated transcriptomes for each. We computationally integrated these transcriptomes with GWAS data from immune-related diseases.

    Results: Robust, high quality transcriptomic data were generated from 1 ng of RNA from precisely sorted cell subsets. Gene expression patterns clearly differentiated intestinal T cells from counterparts in peripheral blood and revealed distinct signalling pathways for each intestinal T cell subset. Intestinal-specific T cell transcripts were enriched in GWAS risk loci for Crohn's disease, ulcerative colitis and coeliac disease, but also specific extraintestinal immune-mediated diseases, allowing prediction of novel candidate genes.

    Conclusions: This is the first report of transcriptomes for minimally manipulated intestinal T lymphocyte subsets in humans. We have demonstrated that careful processing of mucosal biopsies allows the generation of transcriptomes from as few as 1000 highly purified cells with minimal interindividual variation. Bioinformatic integration of transcriptomic data with recent GWAS data identified specific candidate genes and cell types for inflammatory pathologies.

    Gut 2014

  • Monoallelic and Biallelic Mutations in MAB21L2 Cause a Spectrum of Major Eye Malformations.

    Rainger J, Pehlivan D, Johansson S, Bengani H, Sanchez-Pulido L, Williamson KA, Ture M, Barker H, Rosendahl K, Spranger J, Horn D, Meynert A, Floyd JA, Prescott T, Anderson CA, Rainger JK, Karaca E, Gonzaga-Jauregui C, Jhangiani S, Muzny DM, Seawright A, Soares DC, Kharbanda M, Murday V, Finch A, UK10K, Baylor-Hopkins Center for Mendelian Genomics, Gibbs RA, van Heyningen V, Taylor MS, Yakut T, Knappskog PM, Hurles ME, Ponting CP, Lupski JR, Houge G and FitzPatrick DR

    Medical Research Council Human Genetics Unit, Medical Research Council Institute of Genetics and Molecular Medicine, Edinburgh EH4 2XU, UK.

    We identified four different missense mutations in the single-exon gene MAB21L2 in eight individuals with bilateral eye malformations from five unrelated families via three independent exome sequencing projects. Three mutational events altered the same amino acid (Arg51), and two were identical de novo mutations (c.151C>T [p.Arg51Cys]) in unrelated children with bilateral anophthalmia, intellectual disability, and rhizomelic skeletal dysplasia. c.152G>A (p.Arg51His) segregated with autosomal-dominant bilateral colobomatous microphthalmia in a large multiplex family. The fourth heterozygous mutation (c.145G>A [p.Glu49Lys]) affected an amino acid within two residues of Arg51 in an adult male with bilateral colobomata. In a fifth family, a homozygous mutation (c.740G>A [p.Arg247Gln]) altering a different region of the protein was identified in two male siblings with bilateral retinal colobomata. In mouse embryos, Mab21l2 showed strong expression in the developing eye, pharyngeal arches, and limb bud. As predicted by structural homology, wild-type MAB21L2 bound single-stranded RNA, whereas this activity was lost in all altered forms of the protein. MAB21L2 had no detectable nucleotidyltransferase activity in vitro, and its function remains unknown. Induced expression of wild-type MAB21L2 in human embryonic kidney 293 cells increased phospho-ERK (pERK1/2) signaling. Compared to the wild-type and p.Arg247Gln proteins, the proteins with the Glu49 and Arg51 variants had increased stability. Abnormal persistence of pERK1/2 signaling in MAB21L2-expressing cells during development is a plausible pathogenic mechanism for the heterozygous mutations. The phenotype associated with the homozygous mutation might be a consequence of complete loss of MAB21L2 RNA binding, although the cellular function of this interaction remains unknown.

    American journal of human genetics 2014;94;6;915-23
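
    The variant names in the MAB21L2 abstract above pair a coding-DNA position with an amino-acid position, and the two can be cross-checked with simple arithmetic. The sketch below assumes the usual convention that coding position 1 is the A of the start codon, so that codon n covers positions 3n-2 to 3n; the function names are illustrative only.

```python
def codon_number(c_pos: int) -> int:
    """Codon (amino-acid) index containing 1-based coding position c_pos."""
    return (c_pos - 1) // 3 + 1

def codon_offset(c_pos: int) -> int:
    """Position within the codon (1, 2 or 3) of coding position c_pos."""
    return (c_pos - 1) % 3 + 1

# Variants as reported in the abstract: (coding position, residue number).
REPORTED = {
    "c.151C>T p.Arg51Cys": (151, 51),
    "c.152G>A p.Arg51His": (152, 51),
    "c.145G>A p.Glu49Lys": (145, 49),
    "c.740G>A p.Arg247Gln": (740, 247),
}

if __name__ == "__main__":
    for name, (c_pos, residue) in REPORTED.items():
        ok = codon_number(c_pos) == residue
        print(name, "codon", codon_number(c_pos), "base", codon_offset(c_pos), ok)
```

    Under that assumption, c.151 and c.152 fall at the first and second bases of codon 51, which is consistent with the two different substitutions reported at Arg51.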

  • Characterizing genetic variants for clinical action.

    Ramos EM, Din-Lovinescu C, Berg JS, Brooks LD, Duncanson A, Dunn M, Good P, Hubbard TJ, Jarvik GP, O'Donnell C, Sherry ST, Aronson N, Biesecker LG, Blumberg B, Calonge N, Colhoun HM, Epstein RS, Flicek P, Gordon ES, Green ED, Green RC, Hurles M, Kawamoto K, Knaus W, Ledbetter DH, Levy HP, Lyon E, Maglott D, McLeod HL, Rahman N, Randhawa G, Wicklund C, Manolio TA, Chisholm RL and Williams MS

    Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: (1) identify clinically valid genetic variants; (2) decide whether they are actionable and what the action should be; and (3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop. © 2014 Wiley Periodicals, Inc.

    American journal of medical genetics. Part C, Seminars in medical genetics 2014;166;1;93-104

  • Meta-analysis identifies loci affecting levels of the potential osteoarthritis biomarkers sCOMP and uCTX-II with genome wide significance.

    Ramos YF, Metrustry S, Arden N, Bay-Jensen AC, Beekman M, de Craen AJ, Cupples LA, Esko T, Evangelou E, Felson DT, Hart DJ, Ioannidis JP, Karsdal M, Kloppenburg M, Lafeber F, Metspalu A, Panoutsopoulou K, Slagboom PE, Spector TD, van Spil EW, Uitterlinden AG, Zhu Y, arcOGEN consortium, TreatOA collaborators, Valdes AM, van Meurs JB and Meulenbelt I

    Department of Molecular Epidemiology, LUMC, Leiden, The Netherlands; The Netherlands Genomics Initiative-Sponsored Netherlands Consortium for Healthy Aging, Leiden and Rotterdam, The Netherlands.

    Background: Research into the use of biomarkers in osteoarthritis (OA) is promising; however, adequate discrimination between patients and controls may be hampered due to innate differences. We set out to identify loci influencing levels of serum cartilage oligomeric protein (sCOMP) and urinary C-telopeptide of type II collagen (uCTX-II).

    Methods: Meta-analysis of genome-wide association studies was applied to standardised residuals of sCOMP (N=3316) and uCTX-II (N=4654) levels available in 6 and 7 studies, respectively, from TreatOA. Effects were estimated using a fixed-effects model. Six promising signals were followed up by de novo genotyping in the Cohort Hip and Cohort Knee study (N=964). Subsequently, their role in OA susceptibility was investigated in large-scale genome-wide association studies meta-analyses for OA. Differential expression of annotated genes was assessed in cartilage.

    Results: Genome-wide significant association with sCOMP levels was found for a SNP within MRC1 (rs691461, p=1.7×10(-12)) and a SNP within CSMD1 associated with variation in uCTX-II levels with borderline genome-wide significance (rs1983474, p=8.5×10(-8)). Indication for association with sCOMP levels was also found for a locus close to the COMP gene itself (rs10038, p=7.1×10(-6)). The latter SNP was subsequently found to be associated with hip OA whereas COMP expression appeared responsive to the OA pathophysiology in cartilage.

    Conclusions: We have identified genetic loci affecting either uCTX-II or sCOMP levels. The genome-wide significant association of MRC1 with sCOMP levels was found likely to act independently of OA subtypes. Increased sensitivity of biomarkers with OA may be accomplished by taking genetic variation into account.

    Journal of medical genetics 2014

  • Plasmid coding sequence-five influences infectivity and virulence in a mouse model of Chlamydia trachomatis urogenital infection.

    Ramsey KH, Schripsema JH, Smith BJ, Wang Y, Jham BC, O'Hagan KP, Thomson NR, Murthy AK, Skilton RJ, Chu P and Clarke IN

    Departments of Microbiology & Immunology

    The native plasmid of both Chlamydia muridarum and C. trachomatis has been shown to control virulence and infectivity in mice and in lower primates. We have recently described the development of a plasmid-based genetic transformation protocol for Chlamydia trachomatis that for the first time provides a platform for the molecular dissection of the function of the chlamydial plasmid and its individual genes or coding sequences (CDS). In the present study, we transformed a plasmid-free lymphogranuloma venereum isolate of C. trachomatis, serovar L2, with either the original shuttle vector (pGFP::SW2) or a derivative of pGFP::SW2 carrying a deletion of the plasmid CDS5 gene (pCDS5KO). Female mice were inoculated with these strains either intravaginally or transcervically. We found that transformation of the plasmid-free isolate with the intact pGFP::SW2 vector significantly enhanced infectivity and induction of host inflammatory responses when compared to the plasmid-free parental isolate. Transformation with pCDS5KO resulted in infection courses and inflammatory responses not significantly different from those observed in mice infected with the plasmid-free isolate. These results indicate a critical role of plasmid CDS5 in in vivo fitness and in induction of inflammatory responses. To our knowledge, these are the first in vivo observations ascribing infectivity and virulence to a specific plasmid gene.

    Infection and immunity 2014

  • A genomic island integrated into recA of Vibrio cholerae contains a divergent recA and provides multi-pathway protection from DNA damage.

    Rapa RA, Islam A, Monahan LG, Mutreja A, Thomson N, Charles IG, Stokes HW and Labbate M

    ithree Institute, University of Technology, PO Box 123 Broadway, Sydney, NSW, 2007, Australia; Department of Medical and Molecular Biosciences, University of Technology, Sydney, NSW, Australia.

    Lateral gene transfer (LGT) has been crucial in the evolution of the cholera pathogen, Vibrio cholerae. The two major virulence factors are present on two different mobile genetic elements, a bacteriophage containing the cholera toxin genes and a genomic island (GI) containing the intestinal adhesin genes. Non-toxigenic V. cholerae in the aquatic environment are a major source of novel DNA that allows the pathogen to morph via LGT. In this study, we report a novel GI from a non-toxigenic V. cholerae strain containing multiple genes involved in DNA repair including the recombination repair gene recA that is 23% divergent from the indigenous recA and genes involved in the translesion synthesis pathway. This is the first report of a GI containing the critical gene recA and the first report of a GI that targets insertion into a specific site within recA. We show that possession of the island in Escherichia coli is protective against DNA damage induced by UV-irradiation and DNA targeting antibiotics. This study highlights the importance of genetic elements such as GIs in the evolution of V. cholerae and emphasizes the importance of environmental strains as a source of novel DNA that can influence the pathogenicity of toxigenic strains.

    Environmental microbiology 2014

  • MEROPS: the database of proteolytic enzymes, their substrates and inhibitors.

    Rawlings ND, Waller M, Barrett AJ and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK and Proteins and Protein Families, EMBO European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfill the need for an integrated source of information about these. The database has hierarchical classifications in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. Recent developments include the following. A community annotation project has been instigated in which acknowledged experts are invited to contribute summaries for peptidases. Software has been written to provide an Internet-based data entry form. Contributors are acknowledged on the relevant web page. A new display showing the intron/exon structures of eukaryote peptidase genes and the phasing of the junctions has been implemented. It is now possible to filter the list of peptidases from a completely sequenced bacterial genome for a particular strain of the organism. The MEROPS filing pipeline has been altered to circumvent the restrictions imposed on non-interactive blastp searches, and a HMMER search using specially generated alignments to maximize the distribution of organisms returned in the search results has been added.

    Nucleic acids research 2014;42;1;D503-9

  • Deciphering the origin of the 2012 cholera epidemic in Guinea by integrating epidemiological and molecular analyses.

    Rebaudet S, Mengel MA, Koivogui L, Moore S, Mutreja A, Kande Y, Yattara O, Sarr Keita V, Njanpop-Lafourcade BM, Fournier PE, Garnotel E, Keita S and Piarroux R

    Aix-Marseille Université, UMD 3, Marseille, France.

    Cholera is typically considered endemic in West Africa, especially in the Republic of Guinea. However, a three-year lull period was observed from 2009 to 2011, before a new epidemic struck the country in 2012, which was officially responsible for 7,350 suspected cases and 133 deaths. To determine whether cholera re-emerged from the aquatic environment or was rather imported due to human migration, a comprehensive epidemiological and molecular survey was conducted. A spatiotemporal analysis of the national case databases established Kaback Island, located off the southern coast of Guinea, as the initial focus of the epidemic in early February. According to the field investigations, the index case was found to be a fisherman who had recently arrived from a coastal district of neighboring Sierra Leone, where a cholera outbreak had recently occurred. MLVA-based genotype mapping of 38 clinical Vibrio cholerae O1 El Tor isolates sampled throughout the epidemic demonstrated a progressive genetic diversification of the strains from a single genotype isolated on Kaback Island in February, which correlated with spatial epidemic spread. Whole-genome sequencing characterized this strain as an "atypical" El Tor variant. Furthermore, genome-wide SNP-based phylogeny analysis grouped the Guinean strain into a new clade of the third wave of the seventh pandemic, distinct from previously analyzed African strains and directly related to a Bangladeshi isolate. Overall, these results strongly suggest that the Guinean 2012 epidemic was caused by a V. cholerae clone that was likely imported from Sierra Leone by an infected individual. These results indicate the importance of promoting the cross-border identification and surveillance of mobile and vulnerable populations, including fishermen, to prevent, detect and control future epidemics in the region. Comprehensive epidemiological investigations should be expanded to better understand cholera dynamics and improve disease control strategies throughout the African continent.

    PLoS neglected tropical diseases 2014;8;6;e2898

  • Bioinformatic Analysis of Expression Data to Identify Effector Candidates.

    Reid AJ and Jones JT

    Parasite Genomics, Wellcome Trust Sanger Institute, Genome Campus, Cambridge, CB10 1SA, UK.

    Pathogens produce effectors that manipulate the host to the benefit of the pathogen. These effectors are often secreted proteins that are upregulated during the early phases of infection. These properties can be used to identify candidate effectors from genomes and transcriptomes of pathogens. Here we describe commonly used bioinformatic approaches that (1) allow identification of genes encoding predicted secreted proteins within a genome and (2) allow the identification of genes encoding predicted secreted proteins that are upregulated at important stages of the life cycle. Other approaches for bioinformatic identification of effector candidates, including OrthoMCL analysis to identify expanded gene families, are also described.

    Methods in molecular biology (Clifton, N.J.) 2014;1127;17-27

  • Genomic analysis of the causative agents of coccidiosis in domestic chickens.

    Reid AJ, Blake DP, Ansari HR, Billington K, Browne HP, Bryant JM, Dunn M, Hung SS, Kawahara F, Miranda-Saavedra D, Malas T, Mourier T, Naghra H, Nair M, Otto TD, Rawlings ND, Rivailler P, Sanchez-Flores A, Sanders M, Subramaniam C, Tay YL, Woo Y, Wu X, Barrell B, Dear PH, Doerig C, Gruber A, Ivens AC, Parkinson J, Rajandream MA, Shirley MW, Wan KL, Berriman M, Tomley FM and Pain A

    Wellcome Trust Sanger Institute;

    Global production of chickens has trebled in the past two decades and they are now the most important source of dietary animal protein worldwide. Chickens are subject to many infectious diseases that reduce their performance and productivity. Coccidiosis, caused by apicomplexan protozoa of the genus Eimeria, is one of the most important poultry diseases. Understanding the biology of Eimeria parasites underpins development of new drugs and vaccines needed to improve global food security. We have produced annotated genome sequences of all seven species of Eimeria that infect domestic chickens, which reveal the full extent of previously described repeat-rich and repeat-poor regions and show that these parasites possess the most repeat-rich proteomes ever described. Furthermore, while no other apicomplexan has been found to possess retrotransposons, Eimeria is home to a family of chromoviruses. Analysis of Eimeria genes involved in basic biology and host-parasite interaction highlights adaptations to a relatively simple developmental life cycle and a complex array of co-expressed surface proteins involved in host cell binding.

    Genome research 2014

  • Epigenetics: Cellular memory erased in human embryos.

    Reik W and Kelsey G

    [1] Epigenetics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK, and at the Centre for Trophoblast Research, University of Cambridge. [2] Wellcome Trust Sanger Institute, Cambridge.

    Nature 2014;511;7511;540-1

  • Parallel independent evolution of pathogenicity within the genus Yersinia.

    Reuter S, Connor TR, Barquist L, Walker D, Feltwell T, Harris SR, Fookes M, Hall ME, Petty NK, Fuchs TM, Corander J, Dufour M, Ringwood T, Savin C, Bouchier C, Martin L, Miettinen M, Shubin M, Riehm JM, Laukkanen-Ninios R, Sihvonen LM, Siitonen A, Skurnik M, Falcão JP, Fukushima H, Scholz HC, Prentice MB, Wren BW, Parkhill J, Carniel E, Achtman M, McNally A and Thomson NR

    Pathogen Research Group, Nottingham Trent University, Nottingham NG11 8NS, United Kingdom.

    The genus Yersinia has been used as a model system to study pathogen evolution. Using whole-genome sequencing of all Yersinia species, we delineate the gene complement of the whole genus and define patterns of virulence evolution. Multiple distinct ecological specializations appear to have split pathogenic strains from environmental, nonpathogenic lineages. This split demonstrates that contrary to hypotheses that all pathogenic Yersinia species share a recent common pathogenic ancestor, they have evolved independently but followed parallel evolutionary paths in acquiring the same virulence determinants as well as becoming progressively more limited metabolically. Shared virulence determinants are limited to the virulence plasmid pYV and the attachment invasion locus ail. These acquisitions, together with genomic variations in metabolic pathways, have resulted in the parallel emergence of related pathogens displaying an increasingly specialized lifestyle with a spectrum of virulence potential, an emerging theme in the evolution of other important human pathogens.

    Funded by: Wellcome Trust: 098051

    Proceedings of the National Academy of Sciences of the United States of America 2014;111;18;6768-73

  • Novel Genetic Associations with Serum Level Metabolites Identified by Phenotype Set Enrichment Analyses.

    Ried JS, Shin SY, Krumsiek J, Illig T, Theis FJ, Spector TD, Adamski J, Wichmann HE, Strauch K, Soranzo N, Suhre K and Gieger C

    Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764 Neuherberg, Germany.

    Availability of standardized metabolite panels and genome-wide single nucleotide polymorphism (SNP) data enables the comprehensive analysis of gene-metabolite association. Currently, many studies use genome-wide association analysis to investigate the genetic effects on single metabolites (mGWAS) separately. Such studies have identified several loci that are associated not only with one but with multiple metabolites, facilitated by the fact that metabolite panels often include metabolites of the same or related pathways. Strategies that analyse several phenotypes in a combined way were shown to be able to detect additional genetic loci. One of those methods is the phenotype set enrichment analysis (PSEA) that tests sets of metabolites for enrichment at genes. Here we applied PSEA on two different panels of serum metabolites together with genome-wide data. All analyses were performed as a two-step identification-validation approach, using data from the population-based KORA cohort and the TwinsUK study. In addition to confirming genes that were already known from mGWAS, we were able to identify and validate twelve new genes. Knowledge about gene function was supported by the enriched metabolite sets. For loci with unknown gene functions, the results suggest a function that is interrelated with the metabolites, and hint at the underlying pathways.

    Human molecular genetics 2014

  • Structure- and context-based analysis of the GxGYxYP family reveals a new putative class of glycoside hydrolase.

    Rigden DJ, Eberhardt RY, Gilbert HJ, Xu Q, Chang Y and Godzik A

    Institute of Integrative Biology, University of Liverpool, Liverpool, UK.

    Background: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain.

    Results: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333.

    Conclusions: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.

    Funded by: Howard Hughes Medical Institute; NIGMS NIH HHS: P41GM103393, U54 GM094586-03; Wellcome Trust: WT077044/Z/05/Z

    BMC bioinformatics 2014;15;196

  • Urbanicity and lifestyle risk factors for cardiometabolic diseases in rural Uganda: a cross-sectional study.

    Riha J, Karabarinde A, Ssenyomo G, Allender S, Asiki G, Kamali A, Young EH, Sandhu MS and Seeley J

    Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Background: Urban living is associated with unhealthy lifestyles that can increase the risk of cardiometabolic diseases. In sub-Saharan Africa (SSA), where the majority of people live in rural areas, it is still unclear if there is a corresponding increase in unhealthy lifestyles as rural areas adopt urban characteristics. This study examines the distribution of urban characteristics across rural communities in Uganda and their associations with lifestyle risk factors for chronic diseases.

    Using data collected in 2011, we examined cross-sectional associations between urbanicity and lifestyle risk factors in rural communities in Uganda, with 7,340 participants aged 13 y and above across 25 villages. Urbanicity was defined according to a multi-component scale, and Poisson regression models were used to examine associations between urbanicity and lifestyle risk factors by quartile of urbanicity. Despite all of the villages not having paved roads and running water, there was marked variation in levels of urbanicity across the villages, largely attributable to differences in economic activity, civil infrastructure, and availability of educational and healthcare services. In regression models, after adjustment for clustering and potential confounders including socioeconomic status, increasing urbanicity was associated with an increase in lifestyle risk factors such as physical inactivity (risk ratio [RR]: 1.19; 95% CI: 1.14, 1.24), low fruit and vegetable consumption (RR: 1.17; 95% CI: 1.10, 1.23), and high body mass index (RR: 1.48; 95% CI: 1.24, 1.77).

    Conclusions: This study indicates that even across rural communities in SSA, increasing urbanicity is associated with a higher prevalence of lifestyle risk factors for cardiometabolic diseases. This finding highlights the need to consider the health impact of urbanization in rural areas across SSA.

    PLoS medicine 2014;11;7;e1001683

  • Functional annotation of noncoding sequence variants.

    Ritchie GR, Dunham I, Zeggini E and Flicek P

    [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK. [2] Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants in protein-coding regions, our understanding of the genetic code and splicing allows us to identify likely candidates, but interpreting variants outside genic regions is more difficult. Here we present genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.

    Funded by: Wellcome Trust: 095908, 098051

    Nature methods 2014;11;3;294-6

  • Response to 'Predicting the diagnosis of autism spectrum disorder using gene pathway analysis'.

    Robinson EB, Howrigan D, Yang J, Ripke S, Anttila V, Duncan LE, Jostins L, Barrett JC, Medland SE, MacArthur DG, Breen G, O'Donovan MC, Wray NR, Devlin B, Daly MJ, Visscher PM, Sullivan PF and Neale BM

    [1] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA [2] Department of Medicine, Harvard Medical School, Boston, MA, USA [3] Medical and Population Genetics Program, Broad Institute for Harvard and MIT, Cambridge, MA, USA.

    Funded by: NCRR NIH HHS: UL1 RR025005; NHGRI NIH HHS: U01 HG004402; NHLBI NIH HHS: R01 HL059367, R01 HL086694, R01 HL087641

    Molecular psychiatry 2014;19;8;860-1

  • Improved exome prioritization of disease genes through cross-species phenotype comparison.

    Robinson PN, Köhler S, Oellrich A, Sanger Mouse Genetics Project, Wang K, Mungall CJ, Lewis SE, Washington N, Bauer S, Seelow D, Krawitz P, Gilissen C, Haendel M and Smedley D

    Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany;

    Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.

    Funded by: NIH HHS: R24 OD011883

    Genome research 2014;24;2;340-8

  • POT1 loss-of-function variants predispose to familial melanoma.

    Robles-Espinoza CD, Harland M, Ramsay AJ, Aoude LG, Quesada V, Ding Z, Pooley KA, Pritchard AL, Tiffen JC, Petljak M, Palmer JM, Symmons J, Johansson P, Stark MS, Gartside MG, Snowden H, Montgomery GW, Martin NG, Liu JZ, Choi J, Makowski M, Brown KM, Dunning AM, Keane TM, López-Otín C, Gruis NA, Hayward NK, Bishop DT, Newton-Bishop JA and Adams DJ

    [1] Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK. [2].

    Deleterious germline variants in CDKN2A account for around 40% of familial melanoma cases, and rare variants in CDK4, BRCA2, BAP1 and the promoter of TERT have also been linked to the disease. Here we set out to identify new high-penetrance susceptibility genes by sequencing 184 melanoma cases from 105 pedigrees recruited in the UK, The Netherlands and Australia that were negative for variants in known predisposition genes. We identified families where melanoma cosegregates with loss-of-function variants in the protection of telomeres 1 gene (POT1), with a proportion of family members presenting with an early age of onset and multiple primary tumors. We show that these variants either affect POT1 mRNA splicing or alter key residues in the highly conserved oligonucleotide/oligosaccharide-binding (OB) domains of POT1, disrupting protein-telomere binding and leading to increased telomere length. These findings suggest that POT1 variants predispose to melanoma formation via a direct effect on telomeres.

    Funded by: Cancer Research UK: 13031, C1287/A9540, C588/A10589, C588/A4994, C8197/A10123; Wellcome Trust: WT091310, WT098051

    Nature genetics 2014;46;5;478-81

  • Assessment of osteoarthritis candidate genes in a meta-analysis of nine genome-wide association studies.

    Rodriguez-Fontenla C, Calaza M, Evangelou E, Valdes AM, Arden N, Blanco FJ, Carr A, Chapman K, Deloukas P, Doherty M, Esko T, Garcés Aletá CM, Gomez-Reino Carnota JJ, Helgadottir H, Hofman A, Jonsdottir I, Kerkhof HJ, Kloppenburg M, McCaskie A, Ntzani EE, Ollier WE, Oreiro N, Panoutsopoulou K, Ralston SH, Ramos YF, Riancho JA, Rivadeneira F, Slagboom PE, Styrkarsdottir U, Thorsteinsdottir U, Thorleifsson G, Tsezou A, Uitterlinden AG, Wallis GA, Wilkinson JM, Zhai G, Zhu Y, arcOGEN Consortium, Felson DT, Ioannidis JP, Loughlin J, Metspalu A, Meulenbelt I, Stefansson K, van Meurs JB, Zeggini E, Spector TD and Gonzalez A

    Hospital Clinico Universitario de Santiago, Santiago de Compostela, Spain.

    Objective: To assess candidate genes for association with osteoarthritis (OA) and identify promising genetic factors and, secondarily, to assess the candidate gene approach in OA.

    Methods: A total of 199 candidate genes for association with OA were identified using Human Genome Epidemiology (HuGE) Navigator. All of their single-nucleotide polymorphisms (SNPs) with an allele frequency of >5% were assessed by fixed-effects meta-analysis of 9 genome-wide association studies (GWAS) that included 5,636 patients with knee OA and 16,972 control subjects and 4,349 patients with hip OA and 17,836 control subjects of European ancestry. An additional 5,921 individuals were genotyped for significantly associated SNPs in the meta-analysis. After correction for the number of independent tests, P values less than 1.58 × 10^-5 were considered significant.

    Results: SNPs at only 2 of the 199 candidate genes (COL11A1 and VEGF) were associated with OA in the meta-analysis. Two SNPs in COL11A1 showed association with hip OA in the combined analysis: rs4907986 (P = 1.29 × 10^-5, odds ratio [OR] 1.12, 95% confidence interval [95% CI] 1.06-1.17) and rs1241164 (P = 1.47 × 10^-5, OR 0.82, 95% CI 0.74-0.89). The sex-stratified analysis also showed association of COL11A1 SNP rs4908291 in women (P = 1.29 × 10^-5, OR 0.87, 95% CI 0.82-0.92); this SNP showed linkage disequilibrium with rs4907986. A single SNP of VEGF, rs833058, showed association with hip OA in men (P = 1.35 × 10^-5, OR 0.85, 95% CI 0.79-0.91). After additional samples were genotyped, association at one of the COL11A1 signals was reinforced, whereas association at VEGF was slightly weakened.

    Conclusion: Two candidate genes, COL11A1 and VEGF, were significantly associated with OA in this focused meta-analysis. The remaining candidate genes were not associated.

    Arthritis & rheumatology (Hoboken, N.J.) 2014;66;4;940-9

  • Genomic confirmation of hybridisation and recent inbreeding in a vector-isolated Leishmania population.

    Rogers MB, Downing T, Smith BA, Imamura H, Sanders M, Svobodova M, Volf P, Berriman M, Cotton JA and Smith DF

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom; Centre for Immunology and Infection, Department of Biology, University of York, York, United Kingdom.

    Although asexual reproduction via clonal propagation has been proposed as the principal reproductive mechanism across parasitic protozoa of the Leishmania genus, sexual recombination has long been suspected, based on hybrid marker profiles detected in field isolates from different geographical locations. The recent experimental demonstration of a sexual cycle in Leishmania within sand flies has confirmed the occurrence of hybridisation, but knowledge of the parasite life cycle in the wild still remains limited. Here, we use whole genome sequencing to investigate the frequency of sexual reproduction in Leishmania, by sequencing the genomes of 11 Leishmania infantum isolates from sand flies and 1 patient isolate in a focus of cutaneous leishmaniasis in the Çukurova province of southeast Turkey. This is the first genome-wide examination of a vector-isolated population of Leishmania parasites. A genome-wide pattern of patchy heterozygosity and SNP density was observed both within individual strains and across the whole group. Comparisons with other Leishmania donovani complex genome sequences suggest that these isolates are derived from a single cross of two diverse strains with subsequent recombination within the population. This interpretation is supported by a statistical model of the genomic variability for each strain compared to the L. infantum reference genome strain as well as genome-wide scans for recombination within the population. Further analysis of these heterozygous blocks indicates that the two parents were phylogenetically distinct. Patterns of linkage disequilibrium indicate that this population reproduced primarily clonally following the original hybridisation event, but that some recombination also occurred. This observation allowed us to estimate the relative rates of sexual and asexual reproduction within this population, to our knowledge the first quantitative estimate of these events during the Leishmania life cycle.

    Funded by: Wellcome Trust: 076355, 085822, 098051

    PLoS genetics 2014;10;1;e1004092

  • Vitamin B12-dependent taurine synthesis regulates growth and bone mass.

    Roman-Garcia P, Quiros-Gonzalez I, Mottram L, Lieben L, Sharan K, Wangwiwatsin A, Tubio J, Lewis K, Wilkinson D, Santhanam B, Sarper N, Clare S, Vassiliou GS, Velagapudi VR, Dougan G and Yadav VK

    Both maternal and offspring-derived factors contribute to lifelong growth and bone mass accrual, although the specific role of maternal deficiencies in the growth and bone mass of offspring is poorly understood. In the present study, we have shown that vitamin B12 (B12) deficiency in a murine genetic model results in severe postweaning growth retardation and osteoporosis, and the severity and time of onset of this phenotype in the offspring depends on the maternal genotype. Using integrated physiological and metabolomic analysis, we determined that B12 deficiency in the offspring decreases liver taurine production and associates with abrogation of a growth hormone/insulin-like growth factor 1 (GH/IGF1) axis. Taurine increased GH-dependent IGF1 synthesis in the liver, which subsequently enhanced osteoblast function, and in B12-deficient offspring, oral administration of taurine rescued their growth retardation and osteoporosis phenotypes. These results identify B12 as an essential vitamin that positively regulates postweaning growth and bone formation through taurine synthesis and suggest potential therapies to increase bone mass.

    The Journal of clinical investigation 2014;124;7;2988-3002

  • Allele-specific regulation of DISC1 expression by miR-135b-5p.

    Rossi M, Kilpinen H, Muona M, Surakka I, Ingle C, Lahtinen J, Hennah W, Ripatti S and Hovatta I

    [1] Institute for Molecular Medicine, Finland (FIMM), University of Helsinki, Helsinki, Finland [2] Public Health Genomics Unit, National Institute for Health and Welfare, Helsinki, Finland.

    The Disrupted-in-schizophrenia-1 (DISC1) gene has been established as a risk factor for various neuropsychiatric phenotypes. Both coding and regulatory variants in DISC1 have been identified and associated with these phenotypes in genetic studies. MicroRNAs (miRNAs) are important regulators of protein coding genes. Since the miRNA-mRNA target recognition mechanism is vulnerable to disruption by DNA polymorphisms, we investigated whether polymorphisms in the DISC1 3'UTR affect binding of miRNAs and lead to allele-specific regulation of DISC1. We identified four predicted polymorphic miRNA target sites in the DISC1 3'UTR, and demonstrated that miR-135b-5p regulates the level of DISC1 mRNA. Moreover, DISC1 regulation by miR-135b-5p is allele specific: miR-135b-5p only binds to the major allele (A) of rs11122396, not to the minor allele (G). Thus, the G allele may be functionally related to the DISC1-associated phenotypes by abolishing regulation by miR-135b-5p, leading to elevated DISC1 levels.

    European journal of human genetics : EJHG 2014;22;6;840-3

  • Genetic background drives transcriptional variation in human induced pluripotent stem cells.

    Rouhani F, Kumasaka N, de Brito MC, Bradley A, Vallier L and Gaffney D

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Human iPS cells have been generated using a diverse range of tissues from a variety of donors using different reprogramming vectors. However, these cell lines are heterogeneous, which presents a limitation for their use in disease modeling and personalized medicine. To explore the basis of this heterogeneity we generated 25 iPS cell lines under normalised conditions from the same set of somatic tissues across a number of donors. RNA-seq data sets from each cell line were compared to identify the major contributors to transcriptional heterogeneity. We found that genetic differences between individual donors were the major cause of transcriptional variation between lines. In contrast, residual signatures from the somatic cell of origin, so-called epigenetic memory, contributed relatively little to transcriptional variation. Thus, underlying genetic background variation is responsible for most heterogeneity between human iPS cell lines. We conclude that epigenetic effects in hIPSCs are minimal, and that hIPSCs are a stable, robust and powerful platform for large-scale studies of the function of genetic differences between individuals. Our data also suggest that future studies using hIPSCs as a model system should focus most effort on collection of large numbers of donors, rather than generating large numbers of lines from the same donor.

    Funded by: Wellcome Trust: 098051

    PLoS genetics 2014;10;6;e1004432

  • Rapid conversion of EUCOMM/KOMP-CSD alleles in mouse embryos using a cell-permeable Cre recombinase.

    Ryder E, Doe B, Gleeson D, Houghton R, Dalvi P, Grau E, Habib B, Miklejewska E, Newman S, Sethi D, Sinclair C, Vyas S, Wardle-Jones H, Sanger Mouse Genetics Project, Bottomley J, Bussell J, Galli A, Salisbury J and Ramirez-Solis R

    The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    We describe here the use of a cell-permeable Cre to efficiently convert the EUCOMM/KOMP-CSD tm1a allele to the tm1b form in preimplantation mouse embryos in a high-throughput manner, consistent with the requirements of the International Mouse Phenotyping Consortium-affiliated NIH KOMP2 project. This method results in rapid allele conversion and minimizes the use of experimental animals when compared to conventional Cre transgenic mouse breeding, resulting in a significant reduction in costs and time with increased welfare benefits.

    Transgenic research 2014;23;1;177-85

  • Deconstructing pheromone-mediated behavior one layer at a time.

    Sánchez-Andrade G and Logan DW

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The vomeronasal organ, a sensory structure within the nasal cavity of most tetrapods, detects pheromones that influence socio-sexual behavior. It has two neuronal layers, each patterned by distinct receptor sub-families coupled to different G-proteins. Work recently published in this journal found female mice with one layer genetically inactivated are deficient in a surprisingly wide range of reproductive behaviors, providing new insights into how the nose can influence the brain.

    BMC biology 2014;12;1;33

  • Sequence of a Complete Chicken BG Haplotype Shows Dynamic Expansion and Contraction of Two Gene Lineages with Particular Expression Patterns.

    Salomonsen J, Chattaway JA, Chan AC, Parker A, Huguet S, Marston DA, Rogers SL, Wu Z, Smith AL, Staines K, Butter C, Riegert P, Vainio O, Nielsen L, Kaspers B, Griffin DK, Yang F, Zoorob R, Guillemot F, Auffray C, Beck S, Skjødt K and Kaufman J

    Basel Institute for Immunology, Basel, Switzerland; Department of Veterinary Disease Biology, University of Copenhagen, Copenhagen, Denmark; Department of International Health, Immunology and Microbiology, University of Copenhagen, Copenhagen, Denmark.

    Many genes important in immunity are found as multigene families. The butyrophilin genes are members of the B7 family, playing diverse roles in co-regulation and perhaps in antigen presentation. In humans, a fixed number of butyrophilin genes are found in and around the major histocompatibility complex (MHC), and show striking association with particular autoimmune diseases. In chickens, BG genes encode homologues with somewhat different domain organisation. Only a few BG genes have been characterised, one involved in actin-myosin interaction in the intestinal brush border, and another implicated in resistance to viral diseases. We characterise all BG genes in B12 chickens, finding a multigene family organised as tandem repeats in the BG region outside the MHC, a single gene in the MHC (the BF-BL region), and another single gene on a different chromosome. There is a precise cell and tissue expression for each gene, but overall there are two kinds, those expressed by haemopoietic cells and those expressed in tissues (presumably non-haemopoietic cells), correlating with two different kinds of promoters and 5' untranslated regions (5'UTR). However, the multigene family in the BG region contains many hybrid genes, suggesting recombination and/or deletion as major evolutionary forces. We identify BG genes in the chicken whole genome shotgun sequence, as well as by comparison to other haplotypes by fibre fluorescence in situ hybridisation, confirming dynamic expansion and contraction within the BG region. Thus, the BG genes in chickens are undergoing much more rapid evolution compared to their homologues in mammals, for reasons yet to be understood.

    PLoS genetics 2014;10;6;e1004417

  • Disruption of the potassium channel regulatory subunit Kcne2 causes iron-deficient anemia.

    Salsbury G, Cambridge EL, McIntyre Z, Arends MJ, Karp NA, Isherwood C, Shannon C, Hooks Y, The Sanger Mouse Genetics Project, Ramirez-Solis R, Adams DJ, White JK and Speak AO

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Iron homeostasis is a dynamic process that is tightly controlled to balance iron uptake, storage and export. Reduction of dietary iron from the ferric to the ferrous form is required for uptake by solute carrier family 11 (proton-coupled divalent metal ion transporters) member 2 (Slc11a2) into the enterocytes. Both processes are proton dependent, and have led to the suggestion of the importance of acidic gastric pH for the absorption of dietary iron. Kcne2, in combination with Kcnq1, forms a gastric potassium channel essential for gastric acidification. Deficiency of either Kcne2 or Kcnq1 results in achlorhydria, gastric hyperplasia and neoplasia, but the impact on iron absorption has not been investigated. Here we report that Kcne2-deficient mice, in addition to the previously reported phenotypes, also present with iron-deficient anemia. Interestingly, impaired function of KCNQ1 results in iron-deficient anemia in Jervell and Lange-Nielsen syndrome patients. We speculate that impaired function of KCNE2 could result in the same clinical phenotype.

    Experimental hematology 2014

  • The food-borne identity.

    Salter SJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014

  • A framework for the interpretation of de novo mutation in human disease.

    Samocha KE, Robinson EB, Sanders SJ, Stevens C, Sabo A, McGrath LM, Kosmicki JA, Rehnström K, Mallick S, Kirby A, Wall DP, MacArthur DG, Gabriel SB, DePristo M, Purcell SM, Palotie A, Boerwinkle E, Buxbaum JD, Cook EH, Gibbs RA, Schellenberg GD, Sutcliffe JS, Devlin B, Roeder K, Neale BM and Daly MJ

    [1] Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [3] Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [4] Program in Genetics and Genomics, Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, USA.

    Spontaneously arising (de novo) mutations have an important role in medical genetics. For diseases with extensive locus heterogeneity, such as autism spectrum disorders (ASDs), the signal from de novo mutations is distributed across many genes, making it difficult to distinguish disease-relevant mutations from background variation. Here we provide a statistical framework for the analysis of excesses in de novo mutation per gene and gene set by calibrating a model of de novo mutation. We applied this framework to de novo mutations collected from 1,078 ASD family trios, and, whereas we affirmed a significant role for loss-of-function mutations, we found no excess of de novo loss-of-function mutations in cases with IQ above 100, suggesting that the role of de novo mutations in ASDs might reside in fundamental neurodevelopmental processes. We also used our model to identify ∼1,000 genes that are significantly lacking in functional coding variation in non-ASD samples and are enriched for de novo loss-of-function mutations identified in ASD cases.

    Nature genetics 2014

  • Impact of infectious diseases consultation on the management of Staphylococcus aureus bacteraemia in children.

    Saunderson RB, Gouliouris T, Cartwright EJ, Nickerson EJ, Aliyu SH, O'Donnell DR, Kelsall W, Limmathurotsakul D, Peacock SJ and Török ME

    Department of Medicine, University of Cambridge, Cambridge, UK.

    Objectives: Infectious diseases consultation (IDC) in adults with Staphylococcus aureus bacteraemia (SAB) has been shown to improve management and outcome. The aim of this study was to evaluate the impact of IDC on the management of SAB in children.

    Design: Observational cohort study of children with SAB.

    Setting: Cambridge University Hospitals National Health Service (NHS) Foundation Trust, a large acute NHS Trust in the UK.

    Participants: All children with SAB admitted to the Cambridge University Hospitals NHS Foundation Trust between 16 July 2006 and 31 December 2012.

    Methods: Children with SAB between 2006 and 31 October 2009 were managed by routine clinical care (pre-IDC group) and data were collected retrospectively by case notes review. An IDC service for SAB was introduced in November 2009. All children with SAB were reviewed regularly and data were collected prospectively (IDC group) until 31 December 2012. Baseline characteristics, quality metrics and outcome were compared between the pre-IDC group and IDC group.

    Results: There were 66 episodes of SAB in 63 children: 28 patients (30 episodes) in the pre-IDC group, and 35 patients (36 episodes) in the IDC group. The median age was 3.4 years (IQR 0.2-10.7 years). Patients in the IDC group were more likely to have echocardiography performed, a removable focus of infection identified and to receive a longer course of intravenous antimicrobial therapy. There were no differences in total duration of antibiotic therapy, duration of hospital admission or outcome at 30 or 90 days following onset of SAB.

    Conclusions: IDC resulted in improvements in the investigation and management of SAB in children.

    BMJ open 2014;4;7;e004659

  • The Yersinia pseudotuberculosis complex: Characterization and delineation of a new species, Yersinia wautersii.

    Savin C, Martin L, Bouchier C, Filali S, Chenau J, Zhou Z, Becher F, Fukushima H, Thomson NR, Scholz HC and Carniel E

    Yersinia Research Unit and National Reference Laboratory, Institut Pasteur, Paris, France.

    The genus Yersinia contains three species pathogenic for humans, one of which is the enteropathogen Yersinia pseudotuberculosis. A recent analysis by Multi Locus Sequence Typing (MLST) of the 'Y. pseudotuberculosis complex' revealed that this complex comprises three distinct populations: the Y. pestis/Y. pseudotuberculosis group, the recently described species Yersinia similis, and a third not yet characterized population designated 'Korean Group', because most strains were isolated in Korea. The aim of this study was to perform an in-depth phenotypic and genetic characterization of the three populations composing the Y. pseudotuberculosis complex (excluding Y. pestis, which belonged to the Y. pseudotuberculosis cluster in the MLST analysis). Using a set of strains representative of each group, we found that the three populations had close metabolic properties, but were nonetheless distinguishable based on D-raffinose and D-melibiose fermentation, and on pyrazinamidase activity. Moreover, high-resolution electrospray mass spectrometry highlighted protein peaks characteristic of each population. Their 16S rRNA gene sequences shared high identity (≥99.5%), but specific nucleotide signatures for each group were identified. Multi-Locus Sequence Analysis also identified three genetically closely related but distinct populations. Finally, an Average Nucleotide Identity (ANI) analysis performed after sequencing the genomes of a subset of strains of each group also showed that intragroup identity (average for each group ≥99%) was higher than intergroup diversity (94.6-97.4%). Therefore, all phenotypic and genotypic traits studied concurred with the initial MLST data indicating that the Y. pseudotuberculosis complex comprises a third and clearly distinct population of strains forming a novel Yersinia species that we propose to designate Yersinia wautersii sp. nov. The isolation of some strains from humans, the detection of virulence genes (on the pYV and pVM82 plasmids, or encoding the superantigen ypmA) in some isolates, and the absence of pyrazinamidase activity (a hallmark of pathogenicity in the genus Yersinia) argue for the pathogenic potential of Y. wautersii.

    International journal of medical microbiology : IJMM 2014

  • Inferring human population size and separation history from multiple genome sequences.

    Schiffels S and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20,000-30,000 years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.

    Nature genetics 2014;46;8;919-25

  • TreeFam v9: a new website, more species and orthology-on-the-fly.

    Schreiber F, Patricio M, Muffato M, Pignatelli M and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK and European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    TreeFam is a database of phylogenetic trees inferred from animal genomes. For every TreeFam family we provide homology predictions together with the evolutionary history of the genes. Here we describe an update of the TreeFam database. The TreeFam project was resurrected in 2012 and has seen two releases since. The latest release (TreeFam 9) was made available in March 2013. It has orthology predictions and gene trees for 109 species in 15 736 families covering ∼2.2 million sequences. With release 9 we made modifications to our production pipeline and redesigned our website with improved gene tree visualizations and Wikipedia integration. Furthermore, we now provide an HMM-based sequence search that places a user-provided protein sequence into a TreeFam gene tree and provides quick orthology prediction. The tool uses Mafft and RAxML for the fast insertion into a reference alignment and tree, respectively. Besides the aforementioned technical improvements, we present a new approach to visualize gene trees and alternative displays that focuses on showing homology information from a species tree point of view. From release 9 onwards, TreeFam is hosted at the EBI.

    Nucleic acids research 2014;42;1;D922-5

  • Common genetic variants highlight the role of insulin resistance and body fat distribution in type 2 diabetes, independently of obesity.

    Scott RA, Fall T, Pasko D, Barker A, Sharp SJ, Arriola L, Balkau B, Barricarte A, Barroso I, Boeing H, Clavel-Chapelon F, Crowe FL, Dekker JM, Fagherazzi G, Ferraninini E, Forouhi NG, Franks PW, Gavrila D, Giedraitis V, Grioni S, Groop LC, Kaaks R, Key TJ, Kühn T, Lotta L, Nilsson PM, Overvad K, Palli D, Panico S, Quirós JR, Rolandsson O, Roswall N, Sacerdote C, Sala N, Sánchez MJ, Schulze MB, Siddiq A, Slimani N, Sluijs I, Spijkerman AM, Tjonneland A, Tumino R, van der A DL, Yaghootkar H, The RISC study group, The InterAct consortium, McCarthy MI, Semple RK, Riboli E, Walker M, Ingelsson E, Frayling TM, Savage DB, Langenberg C and Wareham NJ

    MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom

    We aimed to validate genetic variants as instruments for insulin resistance and secretion, to characterise their association with intermediate phenotypes, and to investigate their role in T2D risk among normal-weight, overweight and obese individuals. We investigated the association of genetic scores with euglycaemic-hyperinsulinaemic clamp- and OGTT-based measures of insulin resistance and secretion, and a range of metabolic measures in up to 18,565 individuals. We also studied their association with T2D risk among normal-weight, overweight and obese individuals in up to 8,124 incident T2D cases. The insulin resistance score was associated with lower insulin sensitivity measured by M/I value (β in SDs per allele [95% CI]: -0.03 [-0.04, -0.01]; p=0.004). This score was associated with lower BMI (-0.01 [-0.01, -0.0]; p=0.02) and gluteofemoral fat mass (-0.03 [-0.05, -0.02]; p=1.4×10⁻⁶), and with higher ALT (0.02 [0.01, 0.03]; p=0.002) and gamma-GT (0.02 [0.01, 0.03]; p=0.001). While the secretion score had a stronger association with T2D in leaner individuals (p for interaction=0.001), we saw no difference in the association of the insulin resistance score with T2D among BMI or waist strata (p for interaction>0.31). While insulin resistance is often considered secondary to obesity, the association of the insulin resistance score with lower BMI and adiposity and with incident T2D even among individuals of normal weight highlights the role of insulin resistance and ectopic fat distribution in T2D, independently of body size.

    Diabetes 2014

  • Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci.

    Service SK, Teslovich TM, Fuchsberger C, Ramensky V, Yajnik P, Koboldt DC, Larson DE, Zhang Q, Lin L, Welch R, Ding L, McLellan MD, O'Laughlin M, Fronick C, Fulton LL, Magrini V, Swift A, Elliott P, Jarvelin MR, Kaakinen M, McCarthy MI, Peltonen L, Pouta A, Bonnycastle LL, Collins FS, Narisu N, Stringham HM, Tuomilehto J, Ripatti S, Fulton RS, Sabatti C, Wilson RK, Boehnke M and Freimer NB

    Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America.

    Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.

    PLoS genetics 2014;10;1;e1004147

  • Genomic Epidemiology of Vibrio cholerae O1 Associated with Floods, Pakistan, 2010.

    Shah MA, Mutreja A, Thomson N, Baker S, Parkhill J, Dougan G, Bokhari H and Wren BW

    In August 2010, Pakistan experienced major floods and a subsequent cholera epidemic. To clarify the population dynamics and transmission of Vibrio cholerae in Pakistan, we sequenced the genomes of all V. cholerae O1 El Tor isolates and compared the sequences to a global collection of 146 V. cholerae strains. Within the global phylogeny, all isolates from Pakistan formed 2 new subclades (PSC-1 and PSC-2), lying in the third transmission wave of the seventh-pandemic lineage that could be distinguished by signature deletions and their antimicrobial susceptibilities. Geographically, PSC-1 isolates originated from the coast, whereas PSC-2 isolates originated from inland areas flooded by the Indus River. Single-nucleotide polymorphism accumulation analysis correlated river flow direction with the spread of PSC-2. We found at least 2 sources of cholera in Pakistan during the 2010 epidemic and illustrate the value of a global genomic data bank in contextualizing cholera outbreaks.

    Emerging infectious diseases 2014;20;1;13-20

  • Efficient genome modification by CRISPR-Cas9 nickase with minimal off-target effects.

    Shen B, Zhang W, Zhang J, Zhou J, Wang J, Chen L, Wang L, Hodgkins A, Iyer V, Huang X and Skarnes WC

    Ministry of Education Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, Nanjing, China.

    Bacterial RNA-directed Cas9 endonuclease is a versatile tool for site-specific genome modification in eukaryotes. Co-microinjection of mouse embryos with Cas9 mRNA and single guide RNAs induces on-target and off-target mutations that are transmissible to offspring. However, Cas9 nickase can be used to efficiently mutate genes without detectable damage at known off-target sites. This method is applicable for genome editing of any model organism and minimizes confounding problems of off-target mutations.

    Nature methods 2014

  • Cryptic ecology among host generalist Campylobacter jejuni in domestic animals.

    Sheppard SK, Cheng L, Méric G, de Haan CP, Llarena AK, Marttinen P, Vidal A, Ridley A, Clifton-Hadley F, Connor TR, Strachan NJ, Forbes K, Colles FM, Jolley KA, Bentley SD, Maiden MC, Hänninen ML, Parkhill J, Hanage WP and Corander J

    Department of Zoology, University of Oxford, The Tinbergen Building, South Parks Road, Oxford, OX1 3PS, UK; Institute of Life Science, College of Medicine, Swansea University, Swansea, SA2 8PP, UK.

    Homologous recombination between bacterial strains is theoretically capable of preventing the separation of daughter clusters, and producing cohesive clouds of genotypes in sequence space. However, numerous barriers to recombination are known. Barriers may be essential, such as adaptive incompatibility, or ecological, that is, associated with the opportunities for recombination in the natural habitat. Campylobacter jejuni is a gut colonizer of numerous animal species and a major human enteric pathogen. We demonstrate that the two major generalist lineages of C. jejuni do not show evidence of recombination with each other in nature, despite having a high degree of host niche overlap and recombining extensively with specialist lineages. However, transformation experiments show that the generalist lineages readily recombine with one another in vitro. This suggests ecological rather than essential barriers to recombination, caused by a cryptic niche structure within the hosts.

    Molecular ecology 2014

  • Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase.

    Sheydina A, Eberhardt RY, Rigden DJ, Chang Y, Li Z, Zmasek CC, Axelrod HL and Godzik A

    Joint Center for Structural Genomics, 10550 North Torrey Pines Road, BCC-206, La Jolla, California 92037, USA.

    Background: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism.

    Results: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; however, these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and the possible functional implications.

    Conclusions: Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively.

    Funded by: NIGMS NIH HHS: U54 GM094586

    BMC bioinformatics 2014;15;1;112

  • An atlas of genetic influences on human blood metabolites.

    Shin SY, Fauman EB, Petersen AK, Krumsiek J, Santos R, Huang J, Arnold M, Erte I, Forgetta V, Yang TP, Walter K, Menni C, Chen L, Vasquez L, Valdes AM, Hyde CL, Wang V, Ziemek D, Roberts P, Xi L, Grundberg E, Multiple Tissue Human Expression Resource (MuTHER) Consortium, Waldenberger M, Richards JB, Mohney RP, Milburn MV, John SL, Trimmer J, Theis FJ, Overington JP, Suhre K, Brosnan MJ, Gieger C, Kastenmüller G, Spector TD and Soranzo N

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.

    Genome-wide association scans with high-throughput metabolic profiling provide unprecedented insights into how genetic variation influences metabolism and complex disease. Here we report the most comprehensive exploration of genetic loci influencing human metabolism thus far, comprising 7,824 adult individuals from 2 European population studies. We report genome-wide significant associations at 145 metabolic loci and their biochemical connectivity with more than 400 metabolites in human blood. We extensively characterize the resulting in vivo blueprint of metabolism in human blood by integrating it with information on gene expression, heritability and overlap with known loci for complex disorders, inborn errors of metabolism and pharmacological targets. We further developed a database and web-based resources for data mining and results visualization. Our findings provide new insights into the role of inherited variation in blood metabolic diversity and identify potential new opportunities for drug development and for understanding disease.

    Funded by: Wellcome Trust: WT091310, WT098051

    Nature genetics 2014;46;6;543-50

  • Plasmid deficiency in urogenital isolates of Chlamydia trachomatis reduces infectivity and virulence in a mouse model.

    Sigar IM, Schripsema JH, Wang Y, Clarke IN, Cutcliffe LT, Seth-Smith HM, Thomson NR, Bjartling C, Unemo M, Persson K and Ramsey KH

    Microbiology and Immunology Department, Chicago College of Osteopathic Medicine, Midwestern University, Downers Grove, IL, USA.

    We hypothesized that the plasmid of urogenital isolates of Chlamydia trachomatis would modulate infectivity and virulence in a mouse model. To test this hypothesis, we infected female mice in the respiratory or urogenital tract with graded doses of a human urogenital isolate of C. trachomatis, serovar F, possessing the cognate plasmid. For comparison, we inoculated mice with a plasmid-free serovar F isolate. Following urogenital inoculation, the plasmid-free isolate displayed significantly reduced infectivity compared with the wild-type strain, with the latter requiring a 17-fold lower infectious dose to yield 50% infection. When inoculated via the respiratory tract, the plasmid-free isolate exhibited reduced infectivity and virulence (as measured by weight change) when compared to the wild-type isolate. Further, differences in infectivity, but not in virulence, were observed in a C. trachomatis serovar E isolate with a deletion within the plasmid coding sequence 1 when compared to a serovar E isolate with no mutations in the plasmid. We conclude that plasmid loss reduces virulence and infectivity in this mouse model. These findings further support a role for the chlamydial plasmid in infectivity and virulence in vivo.

    Pathogens and disease 2014;70;1;61-9

  • Dissecting mammalian immunity through mutation.

    Siggs OM

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Although mutation and natural selection have given rise to our immune system, a well-placed mutation can also cripple it, and within an expanding population we are recognizing more and more cases of single-gene mutations that compromise immunity. These mutations are an ideal tool for understanding human immunology, and there are more ways than ever to measure their physiological effects. There are also more ways to create mutations in the laboratory, and to use these resources to systematically define the function of every gene in our genome. This review focuses on the discovery and creation of mutations in the context of mammalian immunity, with an emphasis on the use of genome-wide chemical and CRISPR/Cas9 mutagenesis to reveal gene function. Immunology and Cell Biology advance online publication, 11 February 2014; doi:10.1038/icb.2014.8.

    Immunology and cell biology 2014

  • A cascade of DNA-binding proteins for sexual commitment and development in Plasmodium.

    Sinha A, Hughes KR, Modrzynska KK, Otto TD, Pfander C, Dickens NJ, Religa AA, Bushell E, Graham AL, Cameron R, Kafsack BF, Williams AE, Llinás M, Berriman M, Billker O and Waters AP

    Wellcome Trust Centre for Molecular Parasitology, University of Glasgow, Glasgow G12 8QQ, UK.

    Commitment to and completion of sexual development are essential for malaria parasites (protists of the genus Plasmodium) to be transmitted through mosquitoes. The molecular mechanism(s) responsible for commitment have been hitherto unknown. Here we show that PbAP2-G, a conserved member of the apicomplexan AP2 (ApiAP2) family of DNA-binding proteins, is essential for the commitment of asexually replicating forms to sexual development in Plasmodium berghei, a malaria parasite of rodents. PbAP2-G was identified from mutations in its encoding gene, PBANKA_143750, which account for the loss of sexual development frequently observed in parasites transmitted artificially by blood passage. Systematic gene deletion of conserved ApiAP2 genes in Plasmodium confirmed the role of PbAP2-G and revealed a second ApiAP2 member (PBANKA_103430, here termed PbAP2-G2) that significantly modulates but does not abolish gametocytogenesis, indicating that a cascade of ApiAP2 proteins is involved in commitment to the production and maturation of gametocytes. The data suggest a mechanism of commitment to gametocytogenesis in Plasmodium consistent with a positive feedback loop involving PbAP2-G that could be exploited to prevent the transmission of this pernicious parasite.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0501670; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50GM071508; Wellcome Trust: 083811/Z/07/Z, 085349, 098051

    Nature 2014;507;7491;253-7

  • Comparative analysis of pseudogenes across three phyla.

    Sisu C, Pei B, Leng J, Frankish A, Zhang Y, Balasubramanian S, Harte R, Wang D, Rutenberg-Schoenberg M, Clark W, Diekhans M, Rozowsky J, Hubbard T, Harrow J and Gerstein MB

    Program in Computational Biology and Bioinformatics and Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520.

    Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

    Proceedings of the National Academy of Sciences of the United States of America 2014

  • Luminal Microbes Promote Monocyte-Stem Cell Interactions Across a Healthy Colonic Epithelium.

    Skoczek DA, Walczysko P, Horn N, Parris A, Clare S, Williams MR and Sobolewski A

    Gut Health and Food Safety Institute Strategic Program, Institute of Food Research, Norwich, Norfolk NR4 7UA, United Kingdom.

    The intestinal epithelium forms a vital barrier between luminal microbes and the underlying mucosal immune system. Epithelial barrier function is maintained by continuous renewal of the epithelium and is pivotal for gut homeostasis. Breaching of the barrier causes mobilization of immune cells to promote epithelial restitution. However, it is not known whether microbes at the luminal surface of a healthy epithelial barrier influence immune cell mobilization to modulate tissue homeostasis. Using a mouse colonic mucosal explant model, we demonstrate that close proximity of luminal microbes to a healthy, intact epithelium results in rapid mucus secretion and movement of Ly6C(+)7/4(+) monocytes closer to epithelial stem cells. These early events are driven by the epithelial MyD88-signaling pathway and result in increased crypt cell proliferation and intestinal stem cell number. Over time, stem cell number and monocyte-crypt stem cell juxtapositioning return to homeostatic levels observed in vivo. We also demonstrate that reduced numbers of tissue Ly6C(+) monocytes can suppress Lgr5EGFP(+) stem cell expression in vivo and abrogate the response to luminal microbes ex vivo. The functional link between monocyte recruitment and increased crypt cell proliferation was further confirmed using a crypt-monocyte coculture model. This work demonstrates that the healthy gut epithelium mediates communication between luminal bacteria and monocytes, and monocytes can modulate crypt stem cell number and promote crypt cell proliferation to help maintain gut homeostasis.

    Journal of immunology (Baltimore, Md. : 1950) 2014

  • Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity.

    Smallwood SA, Lee HJ, Angermueller C, Krueger F, Saadeh H, Peat J, Andrews SR, Stegle O, Reik W and Kelsey G

    Epigenetics Programme, Babraham Institute, Cambridge, UK.

    We report a single-cell bisulfite sequencing (scBS-seq) method that can be used to accurately measure DNA methylation at up to 48.4% of CpG sites. Embryonic stem cells grown in serum or in 2i medium displayed epigenetic heterogeneity, with '2i-like' cells present in serum culture. Integration of 12 individual mouse oocyte datasets largely recapitulated the whole DNA methylome, which makes scBS-seq a versatile tool to explore DNA methylation in rare cells and heterogeneous populations.

    Nature methods 2014

  • Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases.

    Smedley D, Köhler S, Czeschik JC, Amberger J, Bocchini C, Hamosh A, Veldboer J, Zemojtel T and Robinson PN

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, Baltimore, MD, USA, Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195 Berlin, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.

    Motivation: Whole exome sequencing (WES) has opened up previously unheard of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging.

    Results: Here, we analyze protein-protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, utilizing a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location, and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top ten candidates in 50% or more of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced prior to that date. We implement our method in a freely available web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency, and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation.
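
    As an illustration of the random-walk-with-restart idea mentioned in this abstract, the following is a minimal sketch and not the ExomeWalker implementation. The toy network, the restart probability of 0.7, and the function name are assumptions chosen for the example; the abstract does not state the parameters used. Genes close to the seed (a known disease gene) in the network receive higher scores.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes (hypothetical helper)."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = A / col_sums                   # column-normalised transition matrix
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds) # walker starts on, and restarts at, the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy undirected network: a chain of four genes, 0 - 1 - 2 - 3 (illustrative only).
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_network, seeds=[0])
print(scores)  # scores fall off with network distance from the seed gene
```

    In the setting the abstract describes, such a proximity score would be one ingredient of a composite gene score alongside variant rarity, location, and predicted pathogenicity; how those parts are combined is not specified here.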

    Bioinformatics (Oxford, England) 2014

  • IFITM proteins-cellular inhibitors of viral entry.

    Smith S, Weston S, Kellam P and Marsh M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Interferon inducible transmembrane (IFITM) proteins are a recently discovered family of cellular anti-viral proteins that restrict the replication of a number of enveloped and non-enveloped viruses. IFITM proteins are located in the plasma membrane and endosomal membranes, the main portals of entry for many viruses. Biochemical and membrane fusion studies suggest IFITM proteins have the ability to inhibit viral entry, possibly by modulating the fluidity of cellular membranes. Here we discuss the IFITM proteins, recent work on their mode of action, and future directions for research.

    Current opinion in virology 2014;4C;71-77

  • DrsG from Streptococcus dysgalactiae subsp. equisimilis Inhibits the Antimicrobial Peptide LL-37.

    Smyth D, Cameron A, Davies MR, McNeilly C, Hafner L, Sriprakash KS and McMillan DJ

    Bacterial Pathogenesis Laboratory, QIMR Berghofer Medical Research Institute, Herston, Queensland, Australia.

    SIC and DRS are related proteins present in only 4 of the >200 Streptococcus pyogenes emm types. These proteins inhibit complement-mediated lysis and/or the activity of certain antimicrobial peptides (AMPs). A gene encoding a homologue of these proteins, herein called DrsG, has been identified in the related bacterium Streptococcus dysgalactiae subsp. equisimilis. Here we show that geographically dispersed isolates representing 14 of 50 emm types examined possess variants of drsG. However, not all isolates within the drsG-positive emm types possess the gene. Sequence comparisons also revealed a high degree of conservation in different S. dysgalactiae subsp. equisimilis emm types. To examine the biological activity of DrsG, recombinant versions of two major DrsG variants, DrsGS and DrsGL, were expressed and purified. Western blot analysis using antisera raised to these proteins demonstrated both variants to be expressed and secreted into culture supernatants. Unlike SIC, but similar to DRS, DrsG does not inhibit complement-mediated lysis. However, like both SIC and DRS, DrsG is a ligand of the cathelicidin LL-37 and is inhibitory to its bactericidal activity in in vitro assays. Conservation of prolines in the C-terminal region also suggests that these residues are important in the biology of this family of proteins. This is the first report demonstrating the activity of an AMP-inhibitory protein in S. dysgalactiae subsp. equisimilis and suggests that inhibition of AMP activity is the primary function of this family of proteins. The acquisition of the complement-inhibitory activity of SIC may reflect its continuing evolution.

    Infection and immunity 2014;82;6;2337-44

  • Investigating the Feasibility of Scale up and Automation of Human Induced Pluripotent Stem Cells Cultured in Aggregates in Feeder Free Conditions.

    Soares F, Chandra A, Thomas R, Pedersen R, Vallier L and Williams D

    Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine and Department of Surgery, University of Cambridge, UK; Centre for Biological Engineering, Loughborough University, UK.

    The transfer of a laboratory process into a manufacturing facility is one of the most critical steps required for the large scale production of cell-based therapy products. This study describes the first published protocol for scalable automated expansion of human induced pluripotent stem cell lines growing in aggregates in feeder-free and chemically defined medium. Cells were successfully transferred between different sites representative of research and manufacturing settings, and passaged manually and using the CompacT Select automation platform. Modified protocols were developed for the automated system, and the management of cell aggregates (clumps) was identified as the critical step. Cellular morphology, pluripotency gene expression and differentiation into the three germ layers have been used to compare the outcomes of manual and automated processes.

    Journal of biotechnology 2014

  • Secreted proteomes of different developmental stages of the gastrointestinal nematode Nippostrongylus brasiliensis.

    Sotillo J, Sanchez-Flores A, Cantacessi C, Harcus Y, Pickering D, Bouchery T, Camberis M, Tang SC, Giacomin P, Mulvenna J, Mitreva M, Berriman M, LeGros G, Maizels RM and Loukas A

    James Cook University, Australia.

    Hookworms infect more than 700 million people worldwide and cause more morbidity than most other human parasitic infections. Nippostrongylus brasiliensis (the rat hookworm) has been used as an experimental model for human hookworm because of their similar life cycles and ease of maintenance in laboratory rodents. Adult N. brasiliensis, like the human hookworm, live in the intestine of the host and release excretory/secretory products (ESP), which represent the major host-parasite interface. We performed a comparative proteomic analysis of infective larval (L3) and adult worm stages of N. brasiliensis to gain insights into the molecular bases of host-parasite relationships and determine whether N. brasiliensis could indeed serve as an appropriate model for studying human hookworm infections. Proteomic data were matched to a transcriptomic database assembled from 245,874,892 Illumina reads from different developmental stages (eggs, L3, L4 and adult) of N. brasiliensis yielding ~18,426 unigenes with 39,063 possible isoform transcripts. From this analysis, 313 proteins were identified from ESPs by LC-MS/MS - 52 in the L3 and 261 in the adult worm. Most of the proteins identified in the study were stage-specific (only 13 proteins were shared by both stages); in particular two families of proteins - astacin metalloproteases and CAP-domain containing SCP/TAPS - were highly represented in both L3 and adult ESP. These protein families are present in most nematode groups, and where studied, appear to play roles in larval migration and evasion of the host's immune response. Phylogenetic analyses of defined protein families and global gene similarity analyses showed that N. brasiliensis is more closely related to human hookworm than are other model nematodes including the murine gastrointestinal parasite Heligmosomoides polygyrus. These findings validate the use of N. brasiliensis as a suitable parasite for the study of human hookworm infections in a tractable animal model.

    Molecular & cellular proteomics : MCP 2014

  • A genome-wide association study and biological pathway analysis of epilepsy prognosis in a prospective cohort of newly treated epilepsy.

    Speed D, Hoggart C, Petrovski S, Tachmazidou I, Coffey A, Jorgensen A, Eleftherohorinou H, De Iorio M, Todaro M, De T, Smith D, Smith PE, Jackson M, Cooper P, Kellett M, Howell S, Newton M, Yerra R, Tan M, French C, Reuber M, Sills GE, Chadwick D, Pirmohamed M, Bentley D, Scheffer I, Berkovic S, Balding D, Palotie A, Marson A, O'Brien TJ and Johnson MR

    UCL Genetics Institute, University College London WC1E 6BT, UK.

    We present the analysis of a prospective multicentre study to investigate genetic effects on the prognosis of newly treated epilepsy. Patients with a new clinical diagnosis of epilepsy requiring medication were recruited and followed up prospectively. The clinical outcome was defined as freedom from seizures for a minimum of 12 months in accordance with the consensus statement from the International League Against Epilepsy (ILAE). Genetic effects on remission of seizures after starting treatment were analysed with and without adjustment for significant clinical prognostic factors, and the results from each cohort were combined using a fixed-effects meta-analysis. After quality control (QC), we analysed 889 newly treated epilepsy patients using 472 450 genotyped and 6.9 × 10⁶ imputed single-nucleotide polymorphisms. Suggestive evidence for association (defined as Pmeta < 5.0 × 10⁻⁷) with remission of seizures after starting treatment was observed at three loci: 6p12.2 (rs492146, Pmeta = 2.1 × 10⁻⁷, OR[G] = 0.57), 9p23 (rs72700966, Pmeta = 3.1 × 10⁻⁷, OR[C] = 2.70) and 15q13.2 (rs143536437, Pmeta = 3.2 × 10⁻⁷, OR[C] = 1.92). Genes of biological interest at these loci include PTPRD and ARHGAP11B (encoding functions implicated in neuronal development) and GSTA4 (a phase II biotransformation enzyme). Pathway analysis using two independent methods implicated a number of pathways in the prognosis of epilepsy, including KEGG categories 'calcium signaling pathway' and 'phosphatidylinositol signaling pathway'. Through a series of power curves, we conclude that it is unlikely any single common variant explains >4.4% of the variation in the outcome of newly treated epilepsy.
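
    For readers unfamiliar with the fixed-effects meta-analysis mentioned in this abstract, the sketch below shows the common inverse-variance weighted form. It assumes per-cohort log odds ratios and standard errors as inputs; the numbers and the function name are hypothetical and are not taken from the study, whose exact procedure is not described here.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted combination of per-cohort effect estimates."""
    weights = [1.0 / se ** 2 for se in ses]          # more precise cohorts weigh more
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                      # standard error of pooled effect
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p-value
    return beta, se, p

# Hypothetical log odds ratios and standard errors from two cohorts.
beta, se, p = fixed_effects_meta([0.5, 0.3], [0.1, 0.2])
print(f"pooled log OR = {beta:.3f}, SE = {se:.3f}, OR = {math.exp(beta):.2f}, P = {p:.2g}")
```

    A fixed-effects model assumes that every cohort estimates the same underlying effect, so the pooled estimate is simply a precision-weighted average of the cohort estimates.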

    Human molecular genetics 2014;23;1;247-58

  • Neutrophils Recruited by IL-22 in Peripheral Tissues Function as TRAIL-Dependent Antiviral Effectors against MCMV.

    Stacey MA, Marsden M, Pham N TA, Clare S, Dolton G, Stack G, Jones E, Klenerman P, Gallimore AM, Taylor PR, Snelgrove RJ, Lawley TD, Dougan G, Benedict CA, Jones SA, Wilkinson GW and Humphreys IR

    Institute of Infection and Immunity, School of Medicine, Cardiff University, Cardiff CF14 4XN, Wales, UK.

    During primary infection, murine cytomegalovirus (MCMV) spreads systemically, resulting in virus replication and pathology in multiple organs. This disseminated infection is ultimately controlled, but the underlying immune defense mechanisms are unclear. Investigating the role of the cytokine IL-22 in MCMV infection, we discovered an unanticipated function for neutrophils as potent antiviral effector cells that restrict viral replication and associated pathogenesis in peripheral organs. NK-, NKT-, and T cell-secreted IL-22 orchestrated antiviral neutrophil-mediated responses via induction in stromal nonhematopoietic tissue of the neutrophil-recruiting chemokine CXCL1. The antiviral effector properties of infiltrating neutrophils were directly linked to the expression of TNF-related apoptosis-inducing ligand (TRAIL). Our data identify a role for neutrophils in antiviral defense, and establish a functional link between IL-22 and the control of antiviral neutrophil responses that prevents pathogenic herpesvirus infection in peripheral organs.

    Cell host & microbe 2014;15;4;471-83

  • Development of an antigen microarray for high throughput monoclonal antibody selection.

    Staudt N, Müller-Sienerth N and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, United Kingdom.

    Monoclonal antibodies are valuable laboratory reagents and are increasingly being exploited as therapeutics to treat a range of diseases. Selecting new monoclonal antibodies that are validated to work in particular applications, despite the availability of several different techniques, can be resource intensive with uncertain outcomes. To address this, we have developed an approach that enables early screening of hybridoma supernatants generated from an animal immunised with up to five different antigens followed by cloning of the antibody into a single expression plasmid. While this approach relieved the cellular cloning bottleneck and had the desirable ability to screen antibody function prior to cloning, the small volume of hybridoma supernatant available for screening limited the number of antigens for pooled immunisation. Here, we report the development of an antigen microarray that significantly reduces the volume of supernatant required for functional screening. This approach permits a significant increase in the number of antigens for parallel monoclonal antibody selection from a single animal. Finally, we show the successful use of a convenient small-scale transfection method to rapidly identify plasmids that encode functional cloned antibodies, addressing another bottleneck in this approach. In summary, we show that a hybrid approach of combining established hybridoma antibody technology with refined screening and antibody cloning methods can be used to select monoclonal antibodies of desired functional properties against many different antigens from a single immunised host.

    Biochemical and biophysical research communications 2014

  • Common variant at 16p11.2 conferring risk of psychosis.

    Steinberg S, de Jong S, Mattheisen M, Costas J, Demontis D, Jamain S, Pietiläinen OP, Lin K, Papiol S, Huttenlocher J, Sigurdsson E, Vassos E, Giegling I, Breuer R, Fraser G, Walker N, Melle I, Djurovic S, Agartz I, Tuulio-Henriksson A, Suvisaari J, Lönnqvist J, Paunio T, Olsen L, Hansen T, Ingason A, Pirinen M, Strengman E, GROUP, Hougaard DM, Orntoft T, Didriksen M, Hollegaard MV, Nordentoft M, Abramova L, Kaleda V, Arrojo M, Sanjuán J, Arango C, Etain B, Bellivier F, Méary A, Schürhoff F, Szoke A, Ribolsi M, Magni V, Siracusano A, Sperling S, Rossner M, Christiansen C, Kiemeney LA, Franke B, van den Berg LH, Veldink J, Curran S, Bolton P, Poot M, Staal W, Rehnstrom K, Kilpinen H, Freitag CM, Meyer J, Magnusson P, Saemundsen E, Martsenkovsky I, Bikshaieva I, Martsenkovska I, Vashchenko O, Raleva M, Paketchieva K, Stefanovski B, Durmishi N, Pejovic Milovancevic M, Lecic Tosevski D, Silagadze T, Naneishvili N, Mikeladze N, Surguladze S, Vincent JB, Farmer A, Mitchell PB, Wright A, Schofield PR, Fullerton JM, Montgomery GW, Martin NG, Rubino IA, van Winkel R, Kenis G, De Hert M, Réthelyi JM, Bitter I, Terenius L, Jönsson EG, Bakker S, van Os J, Jablensky A, Leboyer M, Bramon E, Powell J, Murray R, Corvin A, Gill M, Morris D, O'Neill FA, Kendler K, Riley B, Wellcome Trust Case Control Consortium 2, Craddock N, Owen MJ, O'Donovan MC, Thorsteinsdottir U, Kong A, Ehrenreich H, Carracedo A, Golimbet V, Andreassen OA, Børglum AD, Mors O, Mortensen PB, Werge T, Ophoff RA, Nöthen MM, Rietschel M, Cichon S, Ruggeri M, Tosato S, Palotie A, St Clair D, Rujescu D, Collier DA, Stefansson H and Stefansson K

    deCODE genetics, Reykjavik, Iceland.

    Epidemiological and genetic data support the notion that schizophrenia and bipolar disorder share genetic risk factors. In our previous genome-wide association study, meta-analysis and follow-up (totaling as many as 18 206 cases and 42 536 controls), we identified four loci showing genome-wide significant association with schizophrenia. Here we consider a mixed schizophrenia and bipolar disorder (psychosis) phenotype (addition of 7469 bipolar disorder cases, 1535 schizophrenia cases, 333 other psychosis cases, 808 unaffected family members and 46 160 controls). Combined analysis reveals a novel variant at 16p11.2 showing genome-wide significant association (rs4583255[T]; odds ratio=1.08; P=6.6 × 10⁻¹¹). The new variant is located within a 593-kb region that substantially increases risk of psychosis when duplicated. In line with the association of the duplication with reduced body mass index (BMI), rs4583255[T] is also associated with lower BMI (P=0.0039 in the public GIANT consortium data set; P=0.00047 in 22 651 additional Icelanders).

    Funded by: Medical Research Council: G0601030; NIMH NIH HHS: 1U24MH081810, 2N01MH080001-001, MH074027, N01 MH900001, R01 MH078075; Wellcome Trust: 075491/Z/04, 085475/B/08/Z, 085475/Z/08/Z, 085475PELTONEN, 098051

    Molecular psychiatry 2014;19;1;108-14

  • Zero tolerance for healthcare-associated MRSA bacteraemia: is it realistic?

    Török ME, Harris SR, Cartwright EJ, Raven KE, Brown NM, Allison ME, Greaves D, Quail MA, Limmathurotsakul D, Holden MT, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Cambridge, UK; Department of Microbiology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Public Health England, Clinical Microbiology and Public Health Laboratory, Cambridge, UK.

    Background: The term 'zero tolerance' has recently been applied to healthcare-associated infections, implying that such events are always preventable. This may not be the case for healthcare-associated infections such as methicillin-resistant Staphylococcus aureus (MRSA) bacteraemia.

    Methods: We combined information from an epidemiological investigation and bacterial whole-genome sequencing to evaluate a cluster of five MRSA bacteraemia episodes in four patients in a specialist hepatology unit.

    Results: The five MRSA bacteraemia isolates were highly related by multilocus sequence type (ST) (four isolates were ST22 and one isolate was a single-locus variant, ST2046). Whole-genome sequencing demonstrated unequivocally that the bacteraemia cases were unrelated. Placing the MRSA bacteraemia isolates within a local and global phylogenetic tree of MRSA ST22 genomes demonstrated that the five bacteraemia isolates were highly diverse. This was consistent with the acquisition and importation of MRSA from the wider referral network. Analysis of MRSA carriage and disease in patients within the hepatology service demonstrated both a higher risk of initial MRSA acquisition compared with the nephrology service and a higher risk of progression from MRSA carriage to bacteraemia compared with patients in nephrology or geriatric services. A root cause analysis failed to reveal any mechanism by which three of five MRSA bacteraemia episodes could have been prevented.

    Conclusions: This study illustrates the complex nature of MRSA carriage and bacteraemia in patients in a specialized hepatology unit. Despite numerous ongoing interventions to prevent MRSA bacteraemia in healthcare settings, these are unlikely to result in a zero incidence in referral centres that treat highly complex patients.

    The Journal of antimicrobial chemotherapy 2014;69;8;2238-45

  • Clinical and pharmacogenomic implications of genetic variation in a Southern Ethiopian population.

    Tekola-Ayele F, Adeyemo A, Aseffa A, Hailu E, Finan C, Davey G, Rotimi CN and Newport MJ

    Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

    Africa is home to genetically diverse human populations. We compared the genetic structure of the Wolaita ethnic population from Southern Ethiopia (WETH, n=120) with HapMap populations using genome-wide variants. We investigated allele frequencies of 443 clinically and pharmacogenomically relevant genetic variants in WETH compared with HapMap populations. We found that WETH were genetically most similar to the Kenya Maasai and least similar to the Japanese in HapMap. Variant alleles associated with increased risk of adverse reactions to drugs used for treating tuberculosis (rs1799929 and rs1495741 in NAT2), thromboembolism (rs7294, rs9923231 and rs9934438 in VKORC1), and HIV/AIDS and solid tumors (rs2242046 in SLC28A1) had significantly higher frequencies in WETH compared with African ancestry HapMap populations. Our results illustrate that clinically relevant pharmacogenomic loci display allele frequency differences among African populations. We conclude that drug dosage guidelines for important global health diseases should be validated in genetically diverse African populations. The Pharmacogenomics Journal advance online publication, 29 July 2014; doi:10.1038/tpj.2014.39.

    The pharmacogenomics journal 2014

  • Loss-of-function mutations in APOC3, triglycerides, and coronary disease.

    TG and HDL Working Group of the Exome Sequencing Project, National Heart, Lung, and Blood Institute, Crosby J, Peloso GM, Auer PL, Crosslin DR, Stitziel NO, Lange LA, Lu Y, Tang ZZ, Zhang H, Hindy G, Masca N, Stirrups K, Kanoni S, Do R, Jun G, Hu Y, Kang HM, Xue C, Goel A, Farrall M, Duga S, Merlini PA, Asselta R, Girelli D, Olivieri O, Martinelli N, Yin W, Reilly D, Speliotes E, Fox CS, Hveem K, Holmen OL, Nikpay M, Farlow DN, Assimes TL, Franceschini N, Robinson J, North KE, Martin LW, DePristo M, Gupta N, Escher SA, Jansson JH, Van Zuydam N, Palmer CN, Wareham N, Koch W, Meitinger T, Peters A, Lieb W, Erbel R, Konig IR, Kruppa J, Degenhardt F, Gottesman O, Bottinger EP, O'Donnell CJ, Psaty BM, Ballantyne CM, Abecasis G, Ordovas JM, Melander O, Watkins H, Orho-Melander M, Ardissino D, Loos RJ, McPherson R, Willer CJ, Erdmann J, Hall AS, Samani NJ, Deloukas P, Schunkert H, Wilson JG, Kooperberg C, Rich SS, Tracy RP, Lin DY, Altshuler D, Gabriel S, Nickerson DA, Jarvik GP, Cupples LA, Reiner AP, Boerwinkle E and Kathiresan S

    Background: Plasma triglyceride levels are heritable and are correlated with the risk of coronary heart disease. Sequencing of the protein-coding regions of the human genome (the exome) has the potential to identify rare mutations that have a large effect on phenotype.

    Methods: We sequenced the protein-coding regions of 18,666 genes in each of 3734 participants of European or African ancestry in the Exome Sequencing Project. We conducted tests to determine whether rare mutations in coding sequence, individually or in aggregate within a gene, were associated with plasma triglyceride levels. For mutations associated with triglyceride levels, we subsequently evaluated their association with the risk of coronary heart disease in 110,970 persons.

    Results: An aggregate of rare mutations in the gene encoding apolipoprotein C3 (APOC3) was associated with lower plasma triglyceride levels. Among the four mutations that drove this result, three were loss-of-function mutations: a nonsense mutation (R19X) and two splice-site mutations (IVS2+1G→A and IVS3+1G→T). The fourth was a missense mutation (A43T). Approximately 1 in 150 persons in the study was a heterozygous carrier of at least one of these four mutations. Triglyceride levels in the carriers were 39% lower than levels in noncarriers (P<1×10⁻²⁰), and circulating levels of APOC3 in carriers were 46% lower than levels in noncarriers (P=8×10⁻¹⁰). The risk of coronary heart disease among 498 carriers of any rare APOC3 mutation was 40% lower than the risk among 110,472 noncarriers (odds ratio, 0.60; 95% confidence interval, 0.47 to 0.75; P=4×10⁻⁶).

    Conclusions: Rare mutations that disrupt APOC3 function were associated with lower levels of plasma triglycerides and APOC3. Carriers of these mutations were found to have a reduced risk of coronary heart disease. (Funded by the National Heart, Lung, and Blood Institute and others.)

    Funded by: British Heart Foundation; Canadian Institutes of Health Research; NHLBI NIH HHS: K08-HL114642, R01HL107816, RC2 HL-102923, RC2 HL-102924, RC2 HL-102925, RC2 HL-102926, RC2 HL-103010, T32HL007208

    The New England journal of medicine 2014;371;1;22-31

  • Typhoid fever in Fiji: a reversible plague?

    Thompson CN, Kama M, Acharya S, Bera U, Clemens J, Crump JA, Dawainavesi A, Dougan G, Edmunds WJ, Fox K, Jenkins K, Khan MI, Koroivueta J, Levine MM, Martin LB, Nilles E, Pitzer VE, Singh S, Raiwalu RV, Baker S and Mulholland K

    Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam; Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, Oxford University, Oxford, UK.

    The country of Fiji, with a population of approximately 870 000 people, faces a growing burden of several communicable diseases including the bacterial infection typhoid fever. Surveillance data suggest that typhoid has become increasingly common in rural areas of Fiji and is more frequent amongst young adults. Transmission of the organisms that cause typhoid is facilitated by faecal contamination of food or water and may be influenced by local behavioural practices in Fiji. The Fijian Ministry of Health, with support from Australian Aid, hosted a meeting in August 2012 to develop comprehensive control and prevention strategies for typhoid fever in Fiji. International and local specialists were invited to share relevant data and discuss typhoid control options. The resultant recommendations focused on generating a clearer sense of the epidemiology of typhoid in Fiji and exploring the contribution of potential transmission pathways. Additionally, the panel suggested steps such as ensuring that recommended ciprofloxacin doses are appropriate to reduce the potential for relapse and reinfection in clinical cases, encouraging proper hand hygiene of food and drink handlers, working with water and sanitation agencies to review current sanitation practices and considering a vaccination policy targeting epidemiologically relevant populations.

    Tropical medicine & international health : TM & IH 2014

  • Complete Genome Sequences of Two Citrobacter rodentium Bacteriophages, CR8 and CR44b.

    Toribio AL, Pickard D, Cerdeño-Tárraga AM, Petty NK, Thomson N, Salmond G and Dougan G

    The complete genomes of two virulent phages infecting Citrobacter rodentium are reported here for the first time. Both bacteriophages were isolated from local sewage treatment plant effluents. Genome analyses revealed a close relationship between both phages and allowed their classification as members of the Autographivirinae subfamily in the T7-like genus.

    Genome announcements 2014;2;3

  • New mini-zincin structures provide a minimal scaffold for members of this metallopeptidase superfamily.

    Trame CB, Chang Y, Axelrod HL, Eberhardt RY, Coggill P, Punta M and Rawlings ND

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines and aspartic acid within the motif (HEXXHXXGXXD). The Acel_2062 protein was chosen by the Joint Center for Structural Genomics for crystal structure determination to explore novel protein sequence space and structure-based function annotation. Results: The crystal structure confirmed that the Acel_2062 protein consisted of a single, zincin-like metallopeptidase-like domain. The Met-turn, a structural feature thought to be important for a Met-zincin because it stabilizes the active site, is absent, and its stabilizing role may have been conferred to the C-terminal Tyr113. In our crystallographic model there are two molecules in the asymmetric unit and from size-exclusion chromatography, the protein dimerizes in solution. A water molecule is present in the putative zinc-binding site in one monomer, which is replaced by one of two observed conformations of His95 in the other. Conclusions: The Acel_2062 protein is structurally related to the zincins. It contains the minimum structural features of a member of this protein superfamily, and can be described as a "mini-zincin". There is a striking parallel with the structure of a mini-Glu-zincin, which represents the minimum structure of a Glu-zincin (a metallopeptidase in which the third zinc ligand is a glutamic acid). Rather than being an ancestral state, phylogenetic analysis suggests that the mini-zincins are derived from larger proteins.

    BMC bioinformatics 2014;15;1;1

  • Naturally Acquired Antibodies Specific for Plasmodium falciparum Reticulocyte-Binding Protein Homologue 5 Inhibit Parasite Growth and Predict Protection From Malaria.

    Tran TM, Ongoiba A, Coursen J, Crosnier C, Diouf A, Huang CY, Li S, Doumbo S, Doumtabe D, Kone Y, Bathily A, Dia S, Niangaly M, Dara C, Sangala J, Miller LH, Doumbo OK, Kayentao K, Long CA, Miura K, Wright GJ, Traore B and Crompton PD

    Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Maryland.

    Background. Plasmodium falciparum reticulocyte-binding protein homologue 5 (PfRH5) is a blood-stage parasite protein essential for host erythrocyte invasion. PfRH5-specific antibodies raised in animals inhibit parasite growth in vitro, but the relevance of naturally acquired PfRH5-specific antibodies in humans is unclear. Methods. We assessed pre-malaria season PfRH5-specific immunoglobulin G (IgG) levels in 357 Malian children and adults who were uninfected with Plasmodium. Subsequent P. falciparum infections were detected by polymerase chain reaction every 2 weeks and malaria episodes by weekly physical examination and self-referral for 7 months. The primary outcome was time between the first P. falciparum infection and the first febrile malaria episode. PfRH5-specific IgG was assayed for parasite growth-inhibitory activity. Results. The presence of PfRH5-specific IgG at enrollment was associated with a longer time between the first blood-stage infection and the first malaria episode (PfRH5-seropositive median: 71 days, PfRH5-seronegative median: 18 days; P = .001). This association remained significant after adjustment for age and other factors associated with malaria risk/exposure (hazard ratio, .62; P = .02). Concentrated PfRH5-specific IgG purified from Malians inhibited P. falciparum growth in vitro. Conclusions. Naturally acquired PfRH5-specific IgG inhibits parasite growth in vitro and predicts protection from malaria. These findings strongly support efforts to develop PfRH5 as an urgently needed blood-stage malaria vaccine. Clinical Trials Registration NCT01322581.

    The Journal of infectious diseases 2014;209;5;789-98

  • Summarizing specific profiles in Illumina sequencing from whole-genome amplified DNA.

    Tsai IJ, Hunt M, Holroyd N, Huckvale T, Berriman M and Kikuchi T

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK; Faculty of Medicine, Division of Parasitology, Department of Infectious Disease, University of Miyazaki, Miyazaki 889-1692, Japan.

    Advances in both high-throughput sequencing and whole-genome amplification (WGA) protocols have allowed genomes to be sequenced from femtograms of DNA, for example from individual cells or from precious clinical and archived samples. Using the highly curated Caenorhabditis elegans genome as a reference, we have sequenced and identified errors and biases associated with Illumina library construction, library insert size, different WGA methods and genome features such as GC bias and simple repeat content. Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that the majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, while GC content only influences sequencing in long-insert libraries. Nevertheless, single nucleotide polymorphism (SNP) calls and assembly metrics from reads in amplified libraries show comparable results with unamplified libraries. To realize the full potential of WGA for addressing biological questions, this article highlights the importance of recognizing additional sources of errors from amplified sequence reads and discusses the potential implications in downstream analyses.

    Funded by: Wellcome Trust: WT 098051

    DNA research : an international journal for rapid publication of reports on genes and genomes 2014;21;3;243-54

  • Cellular reprogramming by transcription factor engineering.

    Tsang JC, Gao X, Lu L and Liu P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Recent research has identified multiple transcription factors as permissible reprogramming factors for pluripotency and lineage switching. The current standard strategy of ectopic factor overexpression, however, has intrinsic limitations for studying the reprogramming mechanism. There is growing interest in engineering novel chimeric reprogramming factors and applying designer transcription factor technology to improve reprogramming efficiency and dissect the process of endogenous pluripotency network reactivation. Here, we provide a concise review of the latest progress in studying cellular reprogramming by transcription factor engineering.

    Current opinion in genetics & development 2014;28C;1-9

  • Mobile DNA in cancer. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes.

    Tubio JM, Li Y, Ju YS, Martincorena I, Cooke SL, Tojo M, Gundem G, Pipinikas CP, Zamora J, Raine K, Menzies A, Roman-Garcia P, Fullam A, Gerstung M, Shlien A, Tarpey PS, Papaemmanuil E, Knappskog S, Van Loo P, Ramakrishna M, Davies HR, Marshall J, Wedge DC, Teague JW, Butler AP, Nik-Zainal S, Alexandrov L, Behjati S, Yates LR, Bolli N, Mudie L, Hardy C, Martin S, McLaren S, O'Meara S, Anderson E, Maddison M, Gamble S, ICGC Breast Cancer Group, ICGC Bone Cancer Group, ICGC Prostate Cancer Group, Foster C, Warren AY, Whitaker H, Brewer D, Eeles R, Cooper C, Neal D, Lynch AG, Visakorpi T, Isaacs WB, van't Veer L, Caldas C, Desmedt C, Sotiriou C, Aparicio S, Foekens JA, Eyfjörd JE, Lakhani SR, Thomas G, Myklebost O, Span PN, Børresen-Dale AL, Richardson AL, Van de Vijver M, Vincent-Salomon A, Van den Eynden GG, Flanagan AM, Futreal PA, Janes SM, Bova GS, Stratton MR, McDermott U and Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Long interspersed nuclear element-1 (L1) retrotransposons are mobile repetitive elements that are abundant in the human genome. L1 elements propagate through RNA intermediates. In the germ line, neighboring, nonrepetitive sequences are occasionally mobilized by the L1 machinery, a process called 3' transduction. Because 3' transductions are potentially mutagenic, we explored the extent to which they occur somatically during tumorigenesis. Studying cancer genomes from 244 patients, we found that tumors from 53% of the patients had somatic retrotranspositions, of which 24% were 3' transductions. Fingerprinting of donor L1s revealed that a handful of source L1 elements in a tumor can spawn from tens to hundreds of 3' transductions, which can themselves seed further retrotranspositions. The activity of individual L1 elements fluctuated during tumor evolution and correlated with L1 promoter hypomethylation. The 3' transductions disseminated genes, exons, and regulatory elements to new locations, most often to heterochromatic regions of the genome.

    Funded by: Cancer Research UK: C5047/A14835; Department of Health; Wellcome Trust: WT100183MA

    Science (New York, N.Y.) 2014;345;6196;1251343

  • Chromosome X-wide association study identifies loci for fasting insulin and height and evidence for incomplete dosage compensation.

    Tukiainen T, Pirinen M, Sarin AP, Ladenvall C, Kettunen J, Lehtimäki T, Lokki ML, Perola M, Sinisalo J, Vlachopoulou E, Eriksson JG, Groop L, Jula A, Järvelin MR, Raitakari OT, Salomaa V and Ripatti S

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America.

    The X chromosome (chrX) represents one potential source for the "missing heritability" for complex phenotypes, which thus far has remained underanalyzed in genome-wide association studies (GWAS). Here we demonstrate the benefits of including chrX in GWAS by assessing the contribution of 404,862 chrX SNPs to levels of twelve commonly studied cardiometabolic and anthropometric traits in 19,697 Finnish and Swedish individuals with replication data on 5,032 additional Finns. By using a linear mixed model, we estimate that on average 2.6% of the additive genetic variance in these twelve traits is attributable to chrX, this being in proportion to the number of SNPs in the chromosome. In a chrX-wide association analysis, we identify three novel loci: two for height (rs182838724 near FGF16/ATRX/MAGT1, joint P-value = 2.71×10(-9), and rs1751138 near ITM2A, P-value = 3.03×10(-10)) and one for fasting insulin (rs139163435 in Xq23, P-value = 5.18×10(-9)). Further, we find that effect sizes for variants near ITM2A, a gene implicated in cartilage development, show evidence for a lack of dosage compensation. This observation is further supported by a sex-difference in ITM2A expression in whole blood (P-value = 0.00251), and is also in agreement with a previous report showing ITM2A escapes from X chromosome inactivation (XCI) in the majority of women. Hence, our results show one of the first links between phenotypic variation in a population sample and an XCI-escaping locus and pinpoint ITM2A as a potential contributor to the sexual dimorphism in height. In conclusion, our study provides a clear motivation for including chrX in large-scale genetic studies of complex diseases and traits.

    PLoS genetics 2014;10;2;e1004127

  • Loss-of-function mutations in MICU1 cause a brain and muscle disorder linked to primary alterations in mitochondrial calcium signaling.

    UK10K Consortium

    Mitochondrial Ca(2+) uptake has key roles in cell life and death. Physiological Ca(2+) signaling regulates aerobic metabolism, whereas pathological Ca(2+) overload triggers cell death. Mitochondrial Ca(2+) uptake is mediated by the Ca(2+) uniporter complex in the inner mitochondrial membrane, which comprises MCU, a Ca(2+)-selective ion channel, and its regulator, MICU1. Here we report mutations of MICU1 in individuals with a disease phenotype characterized by proximal myopathy, learning difficulties and a progressive extrapyramidal movement disorder. In fibroblasts from subjects with MICU1 mutations, agonist-induced mitochondrial Ca(2+) uptake at low cytosolic Ca(2+) concentrations was increased, and cytosolic Ca(2+) signals were reduced. Although resting mitochondrial membrane potential was unchanged in MICU1-deficient cells, the mitochondrial network was severely fragmented. Whereas the pathophysiology of muscular dystrophy and the core myopathies involves abnormal mitochondrial Ca(2+) handling, the phenotype associated with MICU1 deficiency is caused by a primary defect in mitochondrial Ca(2+) signaling, demonstrating the crucial role of mitochondrial Ca(2+) uptake in humans.

    Nature genetics 2014;46;2;188-93

  • Heps with pep: direct reprogramming into human hepatocytes.

    Vallier L

    Wellcome Trust-Medical Research Council Stem Cell Institute, Anne McLaren Institute for Regenerative Medicine, Department of Surgery, West Forvie Site, Robinson Way, Cambridge CB2 0SZ, UK; Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    The limited supply and expansion capacity of primary human hepatocytes presents major challenges for pharmaceutical applications and development of cell-based therapies for liver diseases. Now in Cell Stem Cell, two papers demonstrate efficient direct reprogramming of human fibroblasts into induced hepatocytes, which exhibit metabolic properties similar to primary hepatocytes.

    Cell stem cell 2014;14;3;267-9

  • Harmonization of Neuroticism and Extraversion phenotypes across inventories and cohorts in the Genetics of Personality Consortium: an application of Item Response Theory.

    van den Berg SM, de Moor MH, McGue M, Pettersson E, Terracciano A, Verweij KJ, Amin N, Derringer J, Esko T, van Grootheest G, Hansell NK, Huffman J, Konte B, Lahti J, Luciano M, Matteson LK, Viktorin A, Wouda J, Agrawal A, Allik J, Bierut L, Broms U, Campbell H, Smith GD, Eriksson JG, Ferrucci L, Franke B, Fox JP, de Geus EJ, Giegling I, Gow AJ, Grucza R, Hartmann AM, Heath AC, Heikkilä K, Iacono WG, Janzing J, Jokela M, Kiemeney L, Lehtimäki T, Madden PA, Magnusson PK, Northstone K, Nutile T, Ouwens KG, Palotie A, Pattie A, Pesonen AK, Polasek O, Pulkkinen L, Pulkki-Råback L, Raitakari OT, Realo A, Rose RJ, Ruggiero D, Seppälä I, Slutske WS, Smyth DC, Sorice R, Starr JM, Sutin AR, Tanaka T, Verhagen J, Vermeulen S, Vuoksimaa E, Widen E, Willemsen G, Wright MJ, Zgaga L, Rujescu D, Metspalu A, Wilson JF, Ciullo M, Hayward C, Rudan I, Deary IJ, Räikkönen K, Arias Vasquez A, Costa PT, Keltikangas-Järvinen L, van Duijn CM, Penninx BW, Krueger RF, Evans DM, Kaprio J, Pedersen NL, Martin NG and Boomsma DI

    Department of Research Methodology, Measurement and Data-Analysis, University of Twente, Enschede, The Netherlands.

    Mega- or meta-analytic studies (e.g. genome-wide association studies) are increasingly used in behavior genetics. An issue in such studies is that phenotypes are often measured by different instruments across study cohorts, requiring harmonization of measures so that more powerful fixed effect meta-analyses can be employed. Within the Genetics of Personality Consortium, we demonstrate for two clinically relevant personality traits, Neuroticism and Extraversion, how Item-Response Theory (IRT) can be applied to map item data from different inventories to the same underlying constructs. Personality item data were analyzed in >160,000 individuals from 23 cohorts across Europe, USA and Australia in which Neuroticism and Extraversion were assessed by nine different personality inventories. Results showed that harmonization was very successful for most personality inventories and moderately successful for some. Neuroticism and Extraversion inventories were largely measurement invariant across cohorts, in particular when comparing cohorts from countries where the same language is spoken. The IRT-based scores for Neuroticism and Extraversion were heritable (48% and 49%, respectively, based on a meta-analysis of six twin cohorts, total N = 29,496 and 29,501 twin pairs, respectively) with a significant part of the heritability due to non-additive genetic factors. For Extraversion, these genetic factors qualitatively differ across sexes. We showed that our IRT method can lead to a large increase in sample size and therefore statistical power. The IRT approach may be applied to any mega- or meta-analytic study in which item-based behavioral measures need to be harmonized.

    Behavior genetics 2014

  • TMEM129 is a Derlin-1 associated ERAD E3 ligase essential for virus-induced degradation of MHC-I.

    van den Boomen DJ, Timms RT, Grice GL, Stagg HR, Skødt K, Dougan G, Nathan JA and Lehner PJ

    Cambridge Institute for Medical Research, Department of Medicine, University of Cambridge, Cambridge CB2 0XY, United Kingdom.

    The US11 gene product of human cytomegalovirus promotes viral immune evasion by hijacking the endoplasmic reticulum (ER)-associated degradation (ERAD) pathway. US11 initiates dislocation of newly translocated MHC-I from the ER to the cytosol for proteasome-mediated degradation. Despite the critical role for ubiquitin in this degradation pathway, the responsible E3 ligase is unknown. In a forward genetic screen for host ERAD components hijacked by US11 in near-haploid KBM7 cells, we identified TMEM129, an uncharacterized polytopic membrane protein. TMEM129 is essential and rate-limiting for US11-mediated MHC-I degradation and acts as a novel ER resident E3 ubiquitin ligase. TMEM129 contains an unusual cysteine-only RING with intrinsic E3 ligase activity and is recruited to US11 via Derlin-1. Together with its E2 conjugase Ube2J2, TMEM129 is responsible for the ubiquitination, dislocation, and subsequent degradation of US11-associated MHC-I. US11 engages two degradation pathways: a Derlin-1/TMEM129-dependent pathway required for MHC-I degradation and a SEL1L/HRD1-dependent pathway required for "free" US11 degradation. Our data show that TMEM129 is a novel ERAD E3 ligase and the central component of a novel mammalian ERAD complex.

    Proceedings of the National Academy of Sciences of the United States of America 2014

  • Cancer gene discovery goes mobile.

    van der Weyden L, Ranzani M and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.

    A new study describes a tool, Lentihop, for somatic insertional mutagenesis in human cells and uses this system in combination with cancer genome data to define new genes and pathways involved in sarcoma development. Gene discovery in this way suggests that we are far from a complete catalog of cancer drivers.

    Nature genetics 2014;46;9;928-929

  • In vivo evolution of antimicrobial resistance in a series of Staphylococcus aureus patient isolates: the entire picture or a cautionary tale?

    van Hal SJ, Steen JA, Espedido BA, Grimmond SM, Cooper MA, Holden MT, Bentley SD, Gosbell IB and Jensen SO

    Antibiotic Resistance & Mobile Elements Group, School of Medicine, University of Western Sydney, Sydney, NSW, Australia.

    Objectives: To obtain an expanded understanding of antibiotic resistance evolution in vivo, particularly in the context of vancomycin exposure. Methods: The whole genomes of six consecutive methicillin-resistant Staphylococcus aureus blood culture isolates (ST239-MRSA-III) from a single patient exposed to various antimicrobials (over a 77 day period) were sequenced and analysed. Results: Variant analysis revealed the existence of non-susceptible sub-populations derived from a common susceptible ancestor, with the predominant circulating clone(s) selected for by type and duration of antimicrobial exposure. Conclusions: This study highlights the dynamic nature of bacterial evolution and that non-susceptible sub-populations can emerge from clouds of variation upon antimicrobial exposure. Diagnostically, this has direct implications for sample selection when using whole-genome sequencing as a tool to guide clinical therapy. In the context of bacteraemia, deep sequencing of bacterial DNA directly from patient blood samples would avoid culture 'bias' and identify mutations associated with circulating non-susceptible sub-populations, some of which may confer cross-resistance to alternate therapies.

    The Journal of antimicrobial chemotherapy 2014;69;2;363-7

  • Common genetic variants do not associate with CAD in familial hypercholesterolemia.

    van Iperen EP, Sivapalaratnam S, Boekholdt SM, Hovingh GK, Maiwald S, Tanck MW, Soranzo N, Stephens JC, Sambrook JG, Levi M, Ouwehand WH, Kastelein JJ, Trip MD and Zwinderman AH

    [1] Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, Amsterdam, The Netherlands [2] Durrer Center for Cardiogenetic Research, Amsterdam, The Netherlands.

    In recent years, multiple loci dispersed on the genome have been shown to be associated with coronary artery disease (CAD). We investigated whether these common genetic variants also hold value for CAD prediction in a large cohort of patients with familial hypercholesterolemia (FH). We genotyped a total of 41 single-nucleotide polymorphisms (SNPs) in 1701 FH patients, of whom 482 patients (28.3%) had at least one coronary event during an average follow up of 66 years. The association of each SNP with event-free survival time was calculated with a Cox proportional hazard model. In the cardiovascular disease risk factor adjusted analysis, the most significant SNP was rs1122608:G>T in the SMARCA4 gene near the LDL-receptor (LDLR) gene, with a hazard ratio for CAD risk of 0.74 (95% CI 0.49-0.99; P-value 0.021). However, none of the SNPs reached the Bonferroni threshold. Of all the known CAD loci analyzed, the SMARCA4 locus near the LDLR had the strongest negative association with CAD in this high-risk FH cohort. The effect is contrary to what was expected. None of the other loci showed association with CAD.

    European journal of human genetics : EJHG 2014;22;6;809-13
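
    The statement in the abstract above that no SNP reached the Bonferroni threshold can be checked with a short calculation. The sketch below is illustrative only: it assumes a family-wise error rate of 0.05, which the abstract does not state, and takes the number of tests (41 SNPs) and the SMARCA4 P-value (0.021) from the abstract. The names ALPHA, N_TESTS, P_SMARCA4 and bonferroni_threshold are hypothetical.

```python
# Illustrative check of the Bonferroni correction described in the abstract.
# ASSUMPTION: a family-wise error rate of 0.05 (not stated in the abstract).
ALPHA = 0.05
N_TESTS = 41        # number of SNPs genotyped, from the abstract
P_SMARCA4 = 0.021   # P-value reported for rs1122608, from the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold, alpha divided by n_tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)
passes = P_SMARCA4 < threshold
print(f"threshold = {threshold:.5f}, passes = {passes}")
```

    Under that assumption the per-test threshold is roughly 0.0012, so a P-value of 0.021 falls short of it, consistent with the abstract.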

  • Single cell analysis of cancer genomes.

    Van Loo P and Voet T

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Department of Human Genetics, VIB and KU Leuven, Leuven, Belgium.

    Genomic studies have provided key insights into how cancers develop, evolve, metastasize and respond to treatment. Cancers result from an interplay between mutation, selection and clonal expansions. In solid tumours, this Darwinian competition between subclones is also influenced by topological factors. Recent advances have made it possible to study cancers at the single cell level. These methods represent important tools to dissect cancer evolution and provide the potential to considerably change both cancer research and clinical practice. Here we discuss state-of-the-art methods for the isolation of a single cell, whole-genome and whole-transcriptome amplification of the cell's nucleic acids, as well as microarray and massively parallel sequencing analysis of such amplification products. We discuss the strengths and the limitations of the techniques, and explore single-cell methodologies for future cancer research, as well as diagnosis and treatment of the disease.

    Current opinion in genetics & development 2014;24C;82-91

  • The classical Lancefield antigen of group A Streptococcus is a virulence determinant with implications for vaccine design.

    van Sorge NM, Cole JN, Kuipers K, Henningham A, Aziz RK, Kasirer-Friede A, Lin L, Berends ET, Davies MR, Dougan G, Zhang F, Dahesh S, Shaw L, Gin J, Cunningham M, Merriman JA, Hütter J, Lepenies B, Rooijakkers SH, Malley R, Walker MJ, Shattil SJ, Schlievert PM, Choudhury B and Nizet V

    Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA; Medical Microbiology, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands.

    Group A Streptococcus (GAS) is a leading cause of infection-related mortality in humans. All GAS serotypes express the Lancefield group A carbohydrate (GAC), comprising a polyrhamnose backbone with an immunodominant N-acetylglucosamine (GlcNAc) side chain, which is the basis of rapid diagnostic tests. No biological function has been attributed to this conserved antigen. Here we identify and characterize the GAC biosynthesis genes, gacA through gacL. An isogenic mutant of the glycosyltransferase gacI, which is defective for GlcNAc side-chain addition, is attenuated for virulence in two infection models, in association with increased sensitivity to neutrophil killing, platelet-derived antimicrobials in serum, and the cathelicidin antimicrobial peptide LL-37. Antibodies to GAC lacking the GlcNAc side chain and containing only polyrhamnose promoted opsonophagocytic killing of multiple GAS serotypes and protected against systemic GAS challenge after passive immunization. Thus, the Lancefield antigen plays a functional role in GAS pathogenesis, and a deeper understanding of this unique polysaccharide has implications for vaccine development.

    Funded by: NHLBI NIH HHS: P01 HL107150; NIAID NIH HHS: R01 AI052453, R01 AI077780, R01 AI096837, U54 AI057153

    Cell host & microbe 2014;15;6;729-40

  • Defining the estimated core genome of bacterial populations using a Bayesian decision model.

    van Tonder AJ, Mistry S, Bray JE, Hill DM, Cody AJ, Farmer CL, Klugman KP, von Gottberg A, Bentley SD, Parkhill J, Jolley KA, Maiden MC and Brueggemann AB

    Nuffield Department of Medicine, The Peter Medawar Building for Pathogen Research, University of Oxford, Oxford, United Kingdom.

    The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.

    PLoS computational biology 2014;10;8;e1003788
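
    The abstract above summarises each core gene by the median pairwise p-distance among its sequences. As a rough illustration (not the authors' implementation), the sketch below uses the standard definition of p-distance as the proportion of compared sites that differ between two aligned sequences. The function names, the choice to skip gap and N positions, and the toy sequences are all assumptions made for this example.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites that differ between two aligned sequences.

    ASSUMPTION: positions holding a gap ('-') or ambiguous base ('N') in
    either sequence are skipped; the abstract does not say how these are handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(alleles: list[str]) -> float:
    """Median p-distance over all unordered pairs of aligned sequences."""
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))


# Hypothetical aligned alleles of one core gene (toy data, not from the study).
toy_alleles = ["ATGCATGCAT", "ATGCATGCAA", "ATGAATGCAA"]
print(median_p_distance(toy_alleles))
```

    Computing this value per core gene and plotting it against genome position would give the kind of diversity profile the abstract describes.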

  • Genetic Determinants of Long-Term Changes in Blood Lipid Concentrations: 10-Year Follow-Up of the GLACIER Study.

    Varga TV, Sonestedt E, Shungin D, Koivula RW, Hallmans G, Escher SA, Barroso I, Nilsson P, Melander O, Orho-Melander M, Renström F and Franks PW

    Department of Clinical Sciences, Genetic and Molecular Epidemiology Unit, Lund University, Skåne University Hospital Malmö, Malmö, Sweden.

    Recent genome-wide meta-analyses identified 157 loci associated with cross-sectional lipid traits. Here we tested whether these loci associate (singly and in trait-specific genetic risk scores [GRS]) with longitudinal changes in total cholesterol (TC) and triglyceride (TG) levels in a population-based prospective cohort from Northern Sweden (the GLACIER Study). We sought replication in a southern Swedish cohort (the MDC Study; N = 2,943). GLACIER Study participants (N = 6,064) were genotyped with the MetaboChip array. Up to 3,495 participants had 10-yr follow-up data available in the GLACIER Study. The TC- and TG-specific GRSs were strongly associated with change in lipid levels (β = 0.02 mmol/l per effect allele per decade follow-up, P = 2.0×10(-11) for TC; β = 0.02 mmol/l per effect allele per decade follow-up, P = 5.0×10(-5) for TG). In individual SNP analysis, one TC locus, apolipoprotein E (APOE) rs4420638 (β = 0.12 mmol/l per effect allele per decade follow-up, P = 2.0×10(-5)), and two TG loci, tribbles pseudokinase 1 (TRIB1) rs2954029 (β = 0.09 mmol/l per effect allele per decade follow-up, P = 5.1×10(-4)) and apolipoprotein A-I (APOA1) rs6589564 (β = 0.31 mmol/l per effect allele per decade follow-up, P = 1.4×10(-8)), remained significantly associated with longitudinal changes for the respective traits after correction for multiple testing. An additional 12 loci were nominally associated with TC or TG changes. In replication analyses, the APOE rs4420638, TRIB1 rs2954029, and APOA1 rs6589564 associations were confirmed (P≤0.001). In summary, trait-specific GRSs are robustly associated with 10-yr changes in lipid levels and three individual SNPs were strongly associated with 10-yr changes in lipid levels.

    PLoS genetics 2014;10;6;e1004388

  • RNA-seq analysis of the influence of anaerobiosis and FNR on Shigella flexneri.

    Vergara-Irigaray M, Fookes MC, Thomson NR and Tang CM

    Sir William Dunn School of Pathology, Oxford University, Oxford, United Kingdom.

    Background: Shigella flexneri is an important human pathogen that has to adapt to the anaerobic environment in the gastrointestinal tract to cause dysentery. To define the influence of anaerobiosis on the virulence of Shigella, we performed deep RNA sequencing to identify transcriptomic differences that are induced by anaerobiosis and modulated by the anaerobic Fumarate and Nitrate Reduction regulator, FNR.

    Results: We found that 528 chromosomal genes were differentially expressed in response to anaerobic conditions; of these, 228 genes were also influenced by FNR. Genes that were up-regulated in anaerobic conditions are involved in carbon transport and metabolism (e.g. ptsG, manX, murQ, cysP, cra), DNA topology and regulation (e.g. ygiP, stpA, hns), host interactions (e.g. yciD, nmpC, slyB, gapA, shf, msbB) and survival within the gastrointestinal tract (e.g. shiA, ospI, adiY, cysP). Interestingly, there was a marked effect of available oxygen on genes involved in the Type III secretion system (T3SS), which is required for host cell invasion and pathogenesis. These genes, located on the large Shigella virulence plasmid, were down-regulated in anaerobiosis in an FNR-dependent manner. We also confirmed anaerobic induction of csrB and csrC small RNAs in an FNR-independent manner.

    Conclusions: Anaerobiosis promotes survival and adaptation strategies of Shigella, while modulating virulence plasmid genes involved in T3SS-mediated host cell invasion. The influence of FNR on this process is more extensive than previously appreciated, although aside from the virulence plasmid, this transcriptional regulator does not govern expression of genes on other horizontally acquired sequences on the chromosome such as pathogenicity islands.

    BMC genomics 2014;15;1;438

  • Detecting and measuring selection from gene frequency data.

    Vitalis R, Gautier M, Dawson KJ and Beaumont MA

    Institut National de la Recherche Agronomique, Unité Mixte de Recherche CBGP, (Inra, Ird, Cirad, Montpellier-SupAgro) 34988 Montferrier-sur-Lez Cedex, France.

    The recent advent of high-throughput sequencing and genotyping technologies makes it possible to produce, easily and cost-effectively, large amounts of detailed data on the genotype composition of populations. Detecting locus-specific effects may help identify those genes that have been, or are currently, targeted by natural selection. How best to identify these selected regions, loci, or single nucleotides remains a challenging issue. Here, we introduce a new model-based method, called SelEstim, to distinguish putative selected polymorphisms from the background of neutral (or nearly neutral) ones and to estimate the intensity of selection at the former. The underlying population genetic model is a diffusion approximation for the distribution of allele frequency in a population subdivided into a number of demes that exchange migrants. We use a Markov chain Monte Carlo algorithm for sampling from the joint posterior distribution of the model parameters, in a hierarchical Bayesian framework. We present evidence from stochastic simulations, which demonstrates the good power of SelEstim to identify loci targeted by selection and to estimate the strength of selection acting on these loci, within each deme. We also reanalyze a subset of SNP data from the Stanford HGDP-CEPH Human Genome Diversity Cell Line Panel to illustrate the performance of SelEstim on real data. In agreement with previous studies, our analyses point to a very strong signal of positive selection upstream of the LCT gene, which encodes the enzyme lactase-phlorizin hydrolase and is associated with adult-type hypolactasia. The geographical distribution of the strength of positive selection across the Old World matches the interpolated map of lactase persistence phenotype frequencies, with the strongest selection coefficients in Europe and in the Indus Valley.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/12776

    Genetics 2014;196;3;799-817

  • Novel porcine-like human G26P[19] rotavirus identified in hospitalized pediatric diarrhea patients in Ho Chi Minh City.

    Vu Tra My P, Rabaa MA, Donato C, Cowley D, Vinh Phat V, Thi Ngoc Dung T, Hong Anh P, Vinh H, Bryant JE, Kellam P, Thwaites G, Woolhouse ME, Kirkwood CD and Baker S

    The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Vietnam.

    During a hospital-based diarrheal disease study conducted in Ho Chi Minh City, Vietnam from 2009 to 2010, we identified four symptomatic children infected with G26P[19] rotavirus, an atypical variant that has not previously been reported in human gastroenteritis. To determine the genetic structure and investigate the origin of this G26P[19] strain, the whole genome of a representative example was characterized, revealing a novel mixed genotype constellation of G26-P[19]-I5-R1-C1-M1-A8-N1-T1-E1-H1. The genome segments were most closely related to porcine (VP7, VP4, VP6 and NSP1) and Wa-like porcine rotaviruses (VP1-3, NSP2-5). We propose that this G26P[19] strain resulted from a pig-to-human zoonotic transmission, followed by a limited onward transmission chain in humans. The identification of such strains has potential implications for vaccine efficacy in Southeast Asia and outlines the utility of whole genome sequencing for studying rotavirus genetic diversity and zoonotic potential during disease surveillance.

    The Journal of general virology 2014

  • An outpatient, ambulant-design, controlled human infection model using escalating doses of Salmonella Typhi challenge delivered in sodium bicarbonate solution.

    Waddington CS, Darton TC, Jones C, Haworth K, Peters A, John T, Thompson BA, Kerridge SA, Kingsley RA, Zhou L, Holt KE, Yu LM, Lockhart S, Farrar JJ, Sztein MB, Dougan G, Angus B, Levine MM and Pollard AJ

    Oxford Vaccine Group, Department of Paediatrics, University of Oxford.

    Background. Typhoid fever is a major global health problem, the control of which is hindered by lack of a suitable animal model in which to study Salmonella Typhi infection. Until 1974, a human challenge model advanced understanding of typhoid and was used in vaccine development. We set out to establish a new human challenge model and ascertain the S. Typhi (Quailes strain) inoculum required for an attack rate of 60%-75% in typhoid-naive volunteers when ingested with sodium bicarbonate solution. Methods. Groups of healthy consenting adults ingested escalating dose levels of S. Typhi and were closely monitored in an outpatient setting for 2 weeks. Antibiotic treatment was initiated if typhoid diagnosis occurred (temperature ≥38°C sustained ≥12 hours or bacteremia) or at day 14 in those remaining untreated. Results. Two dose levels (10³ or 10⁴ colony-forming units) were required to achieve the primary objective, resulting in attack rates of 55% (11/20) or 65% (13/20), respectively. Challenge was well tolerated; 4 of 40 participants fulfilled prespecified criteria for severe infection. Most diagnoses (87.5%) were confirmed by blood culture, and asymptomatic bacteremia and stool shedding of S. Typhi were also observed. Participants who developed typhoid infection demonstrated serological responses to flagellin and lipopolysaccharide antigens by day 14; however, no anti-Vi antibody responses were detected. Conclusions. Human challenge with a small inoculum of virulent S. Typhi administered in bicarbonate solution can be performed safely using an ambulant-model design to advance understanding of host-pathogen interactions and immunity. This model should expedite development of diagnostics, vaccines, and therapeutics for typhoid control.

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2014;58;9;1230-40

  • Whole Exome Re-Sequencing Implicates CCDC38 and Cilia Structure and Function in Resistance to Smoking Related Airflow Obstruction.

    Wain LV, Sayers I, Soler Artigas M, Portelli MA, Zeggini E, Obeidat M, Sin DD, Bossé Y, Nickle D, Brandsma CA, Malarstig A, Vangjeli C, Jelinsky SA, John S, Kilty I, McKeever T, Shrine NR, Cook JP, Patel S, Spector TD, Hollox EJ, Hall IP and Tobin MD

    University of Leicester, Department of Health Sciences, Leicester, United Kingdom.

    Chronic obstructive pulmonary disease (COPD) is a leading cause of global morbidity and mortality and, whilst smoking remains the single most important risk factor, COPD risk is heritable. Of 26 independent genomic regions showing association with lung function in genome-wide association studies, eleven have been reported to show association with airflow obstruction. Although the main risk factor for COPD is smoking, some individuals are observed to have a high forced expired volume in 1 second (FEV1) despite many years of heavy smoking. We hypothesised that these "resistant smokers" may harbour variants which protect against lung function decline caused by smoking and provide insight into the genetic determinants of lung health. We undertook whole exome re-sequencing of 100 heavy smokers who had healthy lung function given their age, sex, height and smoking history and applied three complementary approaches to explore the genetic architecture of smoking resistance. Firstly, we identified novel functional variants in the "resistant smokers" and looked for enrichment of these novel variants within biological pathways. Secondly, we undertook association testing of all exonic variants individually with two independent control sets. Thirdly, we undertook gene-based association testing of all exonic variants. Our strongest signal of association with smoking resistance for a non-synonymous SNP was for rs10859974 (P = 2.34×10⁻⁴) in CCDC38, a gene which has previously been reported to show association with FEV1/FVC, and we demonstrate moderate expression of CCDC38 in bronchial epithelial cells. We identified an enrichment of novel putatively functional variants in genes related to cilia structure and function in resistant smokers. Ciliary function abnormalities are known to be associated with both smoking and reduced mucociliary clearance in patients with COPD. We suggest that genetic influences on the development or function of cilia in the bronchial epithelium may affect growth of cilia or the extent of damage caused by tobacco smoke.

    PLoS genetics 2014;10;5;e1004314

  • Adding genomic 'foliage' to the tree of life.

    Walker A

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014;12;