Sanger Institute - Publications 2014

Number of papers published in 2014: 163

  • Editorial overview: Cancer genomics: kill it. Kill it dead.

    Adams D and McDermott U

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK.

    Current opinion in genetics & development 2014

  • Rare Variants in NR2F2 Cause Congenital Heart Defects in Humans.

    Al Turki S, Manickaraj AK, Mercer CL, Gerety SS, Hitz MP, Lindsay S, D'Alessandro LC, Swaminathan GJ, Bentham J, Arndt AK, Low J, Breckpot J, Gewillig M, Thienpont B, Abdul-Khaliq H, Harnack C, Hoff K, Kramer HH, Schubert S, Siebert R, Toka O, Cosgrove C, Watkins H, Lucassen AM, O'Kelly IM, Salmon AP, Bu'lock FA, Granados-Riveron J, Setchfield K, Thornborough C, Brook JD, Mulder B, Klaassen S, Bhattacharya S, Devriendt K, Fitzpatrick DF, UK10K Consortium, Wilson DI, Mital S and Hurles ME

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Department of Pathology, King Abdulaziz Medical City, P.O. Box 22490, Riyadh 11426, Saudi Arabia.

    Congenital heart defects (CHDs) are the most common birth defect worldwide and are a leading cause of neonatal mortality. Nonsyndromic atrioventricular septal defects (AVSDs) are an important subtype of CHDs for which the genetic architecture is poorly understood. We performed exome sequencing in 13 parent-offspring trios and 112 unrelated individuals with nonsyndromic AVSDs and identified five rare missense variants (two of which arose de novo) in the highly conserved gene NR2F2, a very significant enrichment (p = 7.7 × 10^-7) compared to 5,194 control subjects. We identified three additional CHD-affected families with other variants in NR2F2 including a de novo balanced chromosomal translocation, a de novo substitution disrupting a splice donor site, and a 3 bp duplication that cosegregated in a multiplex family. NR2F2 encodes a pleiotropic developmental transcription factor, and decreased dosage of NR2F2 in mice has been shown to result in abnormal development of atrioventricular septa. Via luciferase assays, we showed that all six coding sequence variants observed in individuals significantly alter the activity of NR2F2 on target promoters.

    American journal of human genetics 2014;94;4;574-85

  • A molecular marker of artemisinin-resistant Plasmodium falciparum malaria.

    Ariey F, Witkowski B, Amaratunga C, Beghain J, Langlois AC, Khim N, Kim S, Duru V, Bouchier C, Ma L, Lim P, Leang R, Duong S, Sreng S, Suon S, Chuor CM, Bout DM, Ménard S, Rogers WO, Genton B, Fandeur T, Miotto O, Ringwald P, Le Bras J, Berry A, Barale JC, Fairhurst RM, Benoit-Vical F, Mercereau-Puijalon O and Ménard D

    [1] Institut Pasteur, Parasite Molecular Immunology Unit, 75724 Paris Cedex 15, France [2] Centre National de la Recherche Scientifique, Unité de Recherche Associée 2581, 75724 Paris Cedex 15, France [3] Institut Pasteur, Genetics and Genomics of Insect Vectors Unit, 75724 Paris Cedex 15, France (F.A.); Institut Pasteur, Functional Genetics of Infectious Diseases Unit, 75724 Paris Cedex 15, France (J.B.); Centre de Physiopathologie de Toulouse-Purpan, Institut National de la Santé et de la Recherche Médicale UMR1043, Centre National de la Recherche Scientifique UMR5282, Université Toulouse III, 31024 Toulouse Cedex 3, France; Institut Pasteur, Unité de Biologie et Génétique du Paludisme, Team Malaria Targets and Drug Development, 75724 Paris Cedex 15, France (J.-C.B.).

    Plasmodium falciparum resistance to artemisinin derivatives in southeast Asia threatens malaria control and elimination activities worldwide. To monitor the spread of artemisinin resistance, a molecular marker is urgently needed. Here, using whole-genome sequencing of an artemisinin-resistant parasite line from Africa and clinical parasite isolates from Cambodia, we associate mutations in the PF3D7_1343700 kelch propeller domain ('K13-propeller') with artemisinin resistance in vitro and in vivo. Mutant K13-propeller alleles cluster in Cambodian provinces where resistance is prevalent, and the increasing frequency of a dominant mutant K13-propeller allele correlates with the recent spread of resistance in western Cambodia. Strong correlations between the presence of a mutant allele, in vitro parasite survival rates and in vivo parasite clearance rates indicate that K13-propeller mutations are important determinants of artemisinin resistance. K13-propeller polymorphism constitutes a useful molecular marker for large-scale surveillance efforts to contain artemisinin resistance in the Greater Mekong Subregion and prevent its global spread.

    Funded by: Medical Research Council: G0600718; Wellcome Trust: 090770/Z/09/Z, 098051

    Nature 2014;505;7481;50-5

  • estMOI: estimating multiplicity of infection using parasite deep sequencing data.

    Assefa SA, Preston MD, Campino S, Ocholla H, Sutherland CJ and Clark TG

    London School of Hygiene and Tropical Medicine, WC1E 7HT, London, UK, Wellcome Trust Sanger Institute, CB10 1SA, Hinxton, UK and Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Box 30096 BT3, Blantyre, Malawi.

    Summary: Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallelizable sequencing technologies means that MOI can be detected genome wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission.

    Availability and implementation: estMOI is freely available.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2014

  • Epistasis between the haptoglobin common variant and α+thalassemia influences risk of severe malaria in Kenyan children.

    Atkinson SH, Uyoga SM, Nyatichi E, Macharia AW, Nyutu G, Ndila C, Kwiatkowski DP, Rockett KA and Williams TN

    Department of Paediatrics, Oxford University Hospitals National Health Service Trust, University of Oxford.

    Haptoglobin (Hp) scavenges free hemoglobin following malaria-induced hemolysis. Few studies have investigated the relationship between the common Hp variants and the risk of severe malaria, and their results are inconclusive. We conducted a case-control study of 996 children with severe Plasmodium falciparum malaria and 1220 community controls and genotyped for Hp, hemoglobin (Hb) S heterozygotes, and α+thalassemia. Hb S heterozygotes and α+thalassemia homozygotes were protected from severe malaria (odds ratio [OR], 0.12; 95% confidence interval [CI], 0.07-0.18 and OR, 0.69; 95% CI, 0.53-0.91, respectively). The risk of severe malaria also varied by Hp genotype: Hp2-1 was associated with the greatest protection against severe malaria and Hp2-2 with the greatest risk. Meta-analysis of the current and published studies suggests that Hp2-2 is associated with increased risk of severe malaria compared with Hp2-1. We found a significant interaction between Hp genotype and α+thalassemia in predicting risk of severe malaria: Hp2-1 in combination with heterozygous or homozygous α+thalassemia was associated with protection from severe malaria (OR, 0.73; 95% CI, 0.54-0.99 and OR, 0.48; 95% CI, 0.32-0.73, respectively), but α+thalassemia in combination with Hp2-2 was not protective. This epistatic interaction together with varying frequencies of α+thalassemia across Africa may explain the inconsistent relationship between Hp genotype and malaria reported in previous studies.

    Blood 2014;123;13;2008-16

  • Transcriptionally active chromatin recruits homologous recombination at DNA double-strand breaks.

    Aymard F, Bugler B, Schmidt CK, Guillou E, Caron P, Briois S, Iacovoni JS, Daburon V, Miller KM, Jackson SP and Legube G

    [1] Laboratoire de Biologie Cellulaire et Moléculaire du Contrôle de la Prolifération, Université de Toulouse, Université Paul Sabatier, Toulouse, France. [2] CNRS, Laboratoire de Biologie Cellulaire et Moléculaire du Contrôle de la Prolifération, Toulouse, France.

    Although both homologous recombination (HR) and nonhomologous end joining can repair DNA double-strand breaks (DSBs), the mechanisms by which one of these pathways is chosen over the other remain unclear. Here we show that transcriptionally active chromatin is preferentially repaired by HR. Using chromatin immunoprecipitation-sequencing (ChIP-seq) to analyze repair of multiple DSBs induced throughout the human genome, we identify an HR-prone subset of DSBs that recruit the HR protein RAD51, undergo resection and rely on RAD51 for efficient repair. These DSBs are located in actively transcribed genes and are targeted to HR repair via the transcription elongation-associated mark trimethylated histone H3 K36. Concordantly, depletion of SETD2, the main H3 K36 trimethyltransferase, severely impedes HR at such DSBs. Our study thereby demonstrates a primary role in DSB repair of the chromatin context in which a break occurs.

    Nature structural & molecular biology 2014

  • Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes.

    Ayub Q, Moutsianas L, Chen Y, Panoutsopoulou K, Colonna V, Pagani L, Prokopenko I, Ritchie GR, Tyler-Smith C, McCarthy MI, Zeggini E and Xue Y

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.

    We have investigated the evidence for positive selection in samples of African, European, and East Asian ancestry at 65 loci associated with susceptibility to type 2 diabetes (T2D) previously identified through genome-wide association studies. Selection early in human evolutionary history is predicted to lead to ancestral risk alleles shared between populations, whereas late selection would result in population-specific signals at derived risk alleles. By using a wide variety of tests based on the site frequency spectrum, haplotype structure, and population differentiation, we found no global signal of enrichment for positive selection when we considered all T2D risk loci collectively. However, in a locus-by-locus analysis, we found nominal evidence for positive selection at 14 of the loci. Selection favored the protective and risk alleles in similar proportions, rather than the risk alleles specifically as predicted by the thrifty gene hypothesis, and may not be related to influence on diabetes. Overall, we conclude that past positive selection has not been a powerful influence driving the prevalence of T2D risk alleles.

    Funded by: Wellcome Trust: 098051, 098381, WT090367MA

    American journal of human genetics 2014;94;2;176-85

  • Novel mutations in penicillin-binding protein genes in clinical Staphylococcus aureus isolates that are methicillin resistant on susceptibility testing, but lack the mec gene.

    Ba X, Harrison EM, Edwards GF, Holden MT, Larsen AR, Petersen A, Skov RL, Peacock SJ, Parkhill J, Paterson GK and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Objectives: Methicillin-resistant Staphylococcus aureus (MRSA) is an important global health problem. MRSA resistance to β-lactam antibiotics is mediated by the mecA or mecC genes, which encode an alternative penicillin-binding protein (PBP) 2a that has a low affinity to β-lactam antibiotics. Detection of mec genes or PBP2a is regarded as the gold standard for the diagnosis of MRSA. We identified four MRSA isolates that lacked mecA or mecC genes, but were still phenotypically resistant to penicillinase-resistant β-lactam antibiotics.

    Methods: The four human S. aureus isolates were investigated by whole genome sequencing and a range of phenotypic assays.

    Results: We identified a number of amino acid substitutions present in the endogenous PBPs 1, 2 and 3 that were found in the resistant isolates but were absent in closely related susceptible isolates and which may be the basis of resistance. Of particular interest are three identical amino acid substitutions in PBPs 1, 2 and 3, occurring independently in isolates from at least two separate multilocus sequence types. Two different non-conservative substitutions were also present in the same amino acid of PBP1 in two isolates from two different sequence types.

    Conclusions: This work suggests that phenotypically resistant MRSA could be misdiagnosed using molecular methods alone and provides evidence of alternative mechanisms for β-lactam resistance in MRSA that may need to be considered by diagnostic laboratories.

    Funded by: Medical Research Council: G1001787

    The Journal of antimicrobial chemotherapy 2014;69;3;594-7

  • Poxviruses in Bats … so What?

    Baker KS and Murcia PR

    Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Poxviruses are important pathogens of man and numerous domestic and wild animal species. Cross species (including zoonotic) poxvirus infections can have drastic consequences for the recipient host. Bats are a diverse order of mammals known to carry lethal viral zoonoses such as Rabies, Hendra, Nipah, and SARS. Consequent targeted research is revealing bats to be infected with a rich diversity of novel viruses. Poxviruses were recently identified in bats and the settings in which they were found were dramatically different. Here, we review the natural history of poxviruses in bats and highlight the relationship of the viruses to each other and their context in the Poxviridae family. In addition to considering the zoonotic potential of these viruses, we reflect on the broader implications of these findings, specifically the potential to explore and exploit this newfound relationship to study coevolution and cross species transmission together with fundamental aspects of poxvirus host tropism as well as bat virology and immunology.

    Viruses 2014;6;4;1564-77

  • Mutations in KPTN cause macrocephaly, neurodevelopmental delay, and seizures.

    Baple EL, Maroofian R, Chioza BA, Izadi M, Cross HE, Al-Turki S, Barwick K, Skrzypiec A, Pawlak R, Wagner K, Coblentz R, Zainy T, Patton MA, Mansour S, Rich P, Qualmann B, Hurles ME, Kessels MM and Crosby AH

    Monogenic Molecular Genetics, University of Exeter Medical School, St. Luke's Campus, Magdalen Road, Exeter EX1 2LU, UK.

    The proper development of neuronal circuits during neuromorphogenesis and neuronal-network formation is critically dependent on a coordinated and intricate series of molecular and cellular cues and responses. Although the cortical actin cytoskeleton is known to play a key role in neuromorphogenesis, relatively little is known about the specific molecules important for this process. Using linkage analysis and whole-exome sequencing on samples from families from the Amish community of Ohio, we have demonstrated that mutations in KPTN, encoding kaptin, cause a syndrome typified by macrocephaly, neurodevelopmental delay, and seizures. Our immunofluorescence analyses in primary neuronal cell cultures showed that endogenous and GFP-tagged kaptin associates with dynamic actin cytoskeletal structures and that this association is lost upon introduction of the identified mutations. Taken together, our studies have identified kaptin alterations responsible for macrocephaly and neurodevelopmental delay and define kaptin as a molecule crucial for normal human neuromorphogenesis.

    Funded by: Medical Research Council: G1001931, G1002279; Wellcome Trust: WT098051

    American journal of human genetics 2014;94;1;87-94

  • Transposon mutagenesis identifies genes driving hepatocellular carcinoma in a chronic hepatitis B mouse model.

    Bard-Chapeau EA, Nguyen AT, Rust AG, Sayadi A, Lee P, Chua BQ, New LS, de Jong J, Ward JM, Chin CK, Chew V, Toh HC, Abastado JP, Benoukraf T, Soong R, Bard FA, Dupuy AJ, Johnson RL, Radda GK, Chan EC, Wessels LF, Adams DJ, Jenkins NA and Copeland NG

    Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Biopolis, Singapore.

    The most common risk factor for developing hepatocellular carcinoma (HCC) is chronic infection with hepatitis B virus (HBV). To better understand the evolutionary forces driving HCC, we performed a near-saturating transposon mutagenesis screen in a mouse HBV model of HCC. This screen identified 21 candidate early stage drivers and a very large number (2,860) of candidate later stage drivers that were enriched for genes that are mutated, deregulated or functioning in signaling pathways important for human HCC, with a striking 1,199 genes being linked to cellular metabolic processes. Our study provides a comprehensive overview of the genetic landscape of HCC.

    Nature genetics 2014;46;1;24-32

  • Efficacy of a Plasmodium vivax Malaria Vaccine Using ChAd63 and Modified Vaccinia Ankara Expressing Thrombospondin-Related Anonymous Protein as Assessed with Transgenic Plasmodium berghei Parasites.

    Bauza K, Malinauskas T, Pfander C, Anar B, Jones EY, Billker O, Hill AV and Reyes-Sandoval A

    The Jenner Institute, University of Oxford, Oxford, United Kingdom.

    Plasmodium vivax is the world's most widely distributed malaria parasite and a potential cause of morbidity and mortality for approximately 2.85 billion people living mainly in Southeast Asia and Latin America. Despite this dramatic burden, very few vaccines have been assessed in humans. The clinically relevant vectors modified vaccinia virus Ankara (MVA) and the chimpanzee adenovirus ChAd63 are promising delivery systems for malaria vaccines due to their safety profiles and proven ability to induce protective immune responses against Plasmodium falciparum thrombospondin-related anonymous protein (TRAP) in clinical trials. Here, we describe the development of new recombinant ChAd63 and MVA vectors expressing P. vivax TRAP (PvTRAP) and show their ability to induce high antibody titers and T cell responses in mice. In addition, we report a novel way of assessing the efficacy of new candidate vaccines against P. vivax using a fully infectious transgenic Plasmodium berghei parasite expressing P. vivax TRAP to allow studies of vaccine efficacy and protective mechanisms in rodents. Using this model, we found that both CD8+ T cells and antibodies mediated protection against malaria using virus-vectored vaccines. Our data indicate that ChAd63 and MVA expressing PvTRAP are good preerythrocytic-stage vaccine candidates with potential for future clinical application.

    Infection and immunity 2014;82;3;1277-86

  • Recurrent PTPRB and PLCG1 mutations in angiosarcoma.

    Behjati S, Tarpey PS, Sheldon H, Martincorena I, Van Loo P, Gundem G, Wedge DC, Ramakrishna M, Cooke SL, Pillay N, Vollan HK, Papaemmanuil E, Koss H, Bunney TD, Hardy C, Joseph OR, Martin S, Mudie L, Butler A, Teague JW, Patil M, Steers G, Cao Y, Gumbs C, Ingram D, Lazar AJ, Little L, Mahadeshwar H, Protopopov A, Al Sannaa GA, Seth S, Song X, Tang J, Zhang J, Ravi V, Torres KE, Khatri B, Halai D, Roxanis I, Baumhoer D, Tirabosco R, Amary MF, Boshoff C, McDermott U, Katan M, Stratton MR, Futreal PA, Flanagan AM, Harris A and Campbell PJ

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. [2] Department of Paediatrics, University of Cambridge, Cambridge, UK.

    Angiosarcoma is an aggressive malignancy that arises spontaneously or secondarily to ionizing radiation or chronic lymphoedema. Previous work has identified aberrant angiogenesis, including occasional somatic mutations in angiogenesis signaling genes, as a key driver of angiosarcoma. Here we employed whole-genome, whole-exome and targeted sequencing to study the somatic changes underpinning primary and secondary angiosarcoma. We identified recurrent mutations in two genes, PTPRB and PLCG1, which are intimately linked to angiogenesis. The endothelial phosphatase PTPRB, a negative regulator of vascular growth factor tyrosine kinases, harbored predominantly truncating mutations in 10 of 39 tumors (26%). PLCG1, a signal transducer of tyrosine kinases, encoded a recurrent, likely activating p.Arg707Gln missense variant in 3 of 34 cases (9%). Overall, 15 of 39 tumors (38%) harbored at least one driver mutation in angiogenesis signaling genes. Our findings inform and reinforce current therapeutic efforts to target angiogenesis signaling in angiosarcoma.

    Nature genetics 2014

  • A High-Definition View of Functional Genetic Variation from Natural Yeast Genomes.

    Bergström A, Simpson JT, Salinas F, Barré B, Parts L, Zia A, Nguyen Ba AN, Moses AM, Louis EJ, Mustonen V, Warringer J, Durbin R and Liti G

    Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France.

    The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.

    Molecular biology and evolution 2014

  • Heterogeneity of genomic evolution and mutational profiles in multiple myeloma.

    Bolli N, Avet-Loiseau H, Wedge DC, Van Loo P, Alexandrov LB, Martincorena I, Dawson KJ, Iorio F, Nik-Zainal S, Bignell GR, Hinton JW, Li Y, Tubio JM, McLaren S, O'Meara S, Butler AP, Teague JW, Mudie L, Anderson E, Rashid N, Tai YT, Shammas MA, Sperling AS, Fulciniti M, Richardson PG, Parmigiani G, Magrangeas F, Minvielle S, Moreau P, Attal M, Facon T, Futreal PA, Anderson KC, Campbell PJ and Munshi NC

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK [2] Department of Haematology, University of Cambridge, CIMR, Cambridge CB2 0XY, UK.

    Multiple myeloma is an incurable plasma cell malignancy with a complex and incompletely understood molecular pathogenesis. Here we use whole-exome sequencing, copy-number profiling and cytogenetics to analyse 84 myeloma samples. Most cases have a complex subclonal structure and show clusters of subclonal variants, including subclonal driver mutations. Serial sampling reveals diverse patterns of clonal evolution, including linear evolution, differential clonal response and branching evolution. Diverse processes contribute to the mutational repertoire, including kataegis and somatic hypermutation, and their relative contribution changes over time. We find heterogeneity of mutational spectrum across samples, with few recurrent genes. We identify new candidate genes, including truncations of SP140, LTB, ROBO1 and clustered missense mutations in EGR1. The myeloma genome is heterogeneous across the cohort, and exhibits diversity in clonal admixture and in dynamics of evolution, which may impact prognostic stratification, therapeutic approaches and assessment of disease response to treatment.

    Nature communications 2014;5;2997

  • DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation.

    Bragin E, Chatzimichali EA, Wright CF, Hurles ME, Firth HV, Bevan AP and Swaminathan GJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and Cambridge University Department of Medical Genetics, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK.

    The DECIPHER database is an accessible online repository of genetic variation with associated phenotypes that facilitates the identification and interpretation of pathogenic genetic variation in patients with rare disorders. Contributing to DECIPHER is an international consortium of >200 academic clinical centres of genetic medicine and ≥1600 clinical geneticists and diagnostic laboratory scientists. Information integrated from a variety of bioinformatics resources, coupled with visualization tools, provides a comprehensive set of tools to identify other patients with similar genotype-phenotype characteristics and highlights potentially pathogenic genes. In a significant development, we have extended DECIPHER from a database of just copy-number variants to allow upload, annotation and analysis of sequence variants such as single nucleotide variants (SNVs) and InDels. Other notable developments in DECIPHER include a purpose-built, customizable and interactive genome browser to aid combined visualization and interpretation of sequence and copy-number variation against informative datasets of pathogenic and population variation. We have also introduced several new features to our deposition and analysis interface. This article provides an update to the DECIPHER database, an earlier instance of which has been described elsewhere [Swaminathan et al. (2012) DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum. Mol. Genet., 21, R37-R44].

    Nucleic acids research 2014;42;1;D993-D1000

  • Phosphoinositide Metabolism Links cGMP-Dependent Protein Kinase G to Essential Ca2+ Signals at Key Decision Points in the Life Cycle of Malaria Parasites.

    Brochet M, Collins MO, Smith TK, Thompson E, Sebastian S, Volkmann K, Schwach F, Chappell L, Gomes AR, Berriman M, Rayner JC, Baker DA, Choudhary J and Billker O

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Many critical events in the Plasmodium life cycle rely on the controlled release of Ca2+ from intracellular stores to activate stage-specific Ca2+-dependent protein kinases. Using the motility of Plasmodium berghei ookinetes as a signalling paradigm, we show that the cyclic guanosine monophosphate (cGMP)-dependent protein kinase, PKG, maintains the elevated level of cytosolic Ca2+ required for gliding motility. We find that the same PKG-dependent pathway operates upstream of the Ca2+ signals that mediate activation of P. berghei gametocytes in the mosquito and egress of Plasmodium falciparum merozoites from infected human erythrocytes. Perturbations of PKG signalling in gliding ookinetes have a marked impact on the phosphoproteome, with a significant enrichment of in vivo regulated sites in multiple pathways including vesicular trafficking and phosphoinositide metabolism. A global analysis of cellular phospholipids demonstrates that in gliding ookinetes PKG controls phosphoinositide biosynthesis, possibly through the subcellular localisation or activity of lipid kinases. Similarly, phosphoinositide metabolism links PKG to egress of P. falciparum merozoites, where inhibition of PKG blocks hydrolysis of phosphatidylinositol (4,5)-bisphosphate. In the face of an increasing complexity of signalling through multiple Ca2+ effectors, PKG emerges as a unifying factor to control multiple cellular Ca2+ signals essential for malaria parasite development and transmission.

    PLoS biology 2014;12;3;e1001806

  • Exome sequencing improves genetic diagnosis of structural fetal abnormalities revealed by ultrasound.

    Carss KJ, Hillman SC, Parthiban V, McMullan DJ, Maher ER, Kilby MD and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The genetic aetiology of non-aneuploid fetal structural abnormalities is typically investigated by karyotyping and array-based detection of microscopically detectable rearrangements, and submicroscopic copy number variants (CNVs), which collectively yield a pathogenic finding in up to 10% of cases. We propose that exome sequencing may substantially increase the identification of underlying aetiologies. We performed exome sequencing on a cohort of 30 non-aneuploid fetuses and neonates (along with their parents) with diverse structural abnormalities first identified by prenatal ultrasound. We identified candidate pathogenic variants with a range of inheritance models, and evaluated these in the context of detailed phenotypic information. We identified 35 de novo single nucleotide variants (SNVs), small indels, deletions or duplications, of which three (accounting for 10% of the cohort) are highly likely to be causative. These are de novo missense variants in FGFR3 and COL2A1, and a de novo 16.8 kb deletion that includes most of OFD1. In five further cases (17%) we identified de novo or inherited recessive or X-linked variants in plausible candidate genes, which require additional validation to determine pathogenicity. Our diagnostic yield of 10% is comparable to, and supplementary to, the diagnostic yield of existing microarray testing for large chromosomal rearrangements and targeted CNV detection. The de novo nature of these events could enable couples to be counselled as to their low recurrence risk. This study outlines the way for a substantial improvement in the diagnostic yield of prenatal genetic abnormalities through the application of next generation sequencing.

    Human molecular genetics 2014

  • Evolution and transmission of drug-resistant tuberculosis in a Russian population.

    Casali N, Nikolayevskyy V, Balabanova Y, Harris SR, Ignatyeva O, Kontsevaya I, Corander J, Bryant J, Parkhill J, Nejentsev S, Horstmann RD, Brown T and Drobniewski F

    Public Health England (PHE) National Mycobacterium Reference Laboratory, Clinical TB and HIV Group, Blizard Institute, Queen Mary University of London, London, UK.

    The molecular mechanisms determining the transmissibility and prevalence of drug-resistant tuberculosis in a population were investigated through whole-genome sequencing of 1,000 prospectively obtained patient isolates from Russia. Two-thirds belonged to the Beijing lineage, which was dominated by two homogeneous clades. Multidrug-resistant (MDR) genotypes were found in 48% of isolates overall and in 87% of the major clades. The most common rpoB mutation was associated with fitness-compensatory mutations in rpoA or rpoC, and a new intragenic compensatory substitution was identified. The proportion of MDR cases with extensively drug-resistant (XDR) tuberculosis was 16% overall, with 65% of MDR isolates harboring eis mutations, selected by kanamycin therapy, which may drive the expansion of strains with enhanced virulence. The combination of drug resistance and compensatory mutations displayed by the major clades confers clinical resistance without compromising fitness and transmissibility, showing that, in addition to weaknesses in the tuberculosis control program, biological factors drive the persistence and spread of MDR and XDR tuberculosis in Russia and beyond.

    Nature genetics 2014

  • Found in translation.

    Chappell L

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014;12;4;238

  • A reduction in Ptprq associated with specific features of the deafness phenotype of the miR-96 mutant mouse diminuendo.

    Chen J, Johnson SL, Lewis MA, Hilton JM, Huma A, Marcotti W and Steel KP

    Wellcome Trust Sanger Institute, Cambridge, UK; Wolfson Centre for Age-Related Diseases, King's College London, Guy's Campus, London, SE1 1UL, UK.

    miR-96 is a microRNA, a non-coding RNA gene which regulates a wide array of downstream genes. The miR-96 mouse mutant diminuendo exhibits deafness and arrested hair cell functional and morphological differentiation. We have previously shown that several genes are markedly downregulated in the diminuendo organ of Corti; one of these is Ptprq, a gene known to be important for maturation and maintenance of hair cells. In order to study the contribution that downregulation of Ptprq makes to the diminuendo phenotype, we carried out microarrays, scanning electron microscopy and single hair cell electrophysiology to compare diminuendo mutants (heterozygous and homozygous) with mice homozygous for a functional null allele of Ptprq. In terms of both morphology and electrophysiology, the auditory phenotype of mice lacking Ptprq resembles that of diminuendo heterozygotes, while diminuendo homozygotes are more severely affected. A comparison of transcriptomes indicates there is a broad similarity between diminuendo homozygotes and Ptprq-null mice. The reduction in Ptprq observed in diminuendo mice appears to be a major contributor to the morphological, transcriptional and electrophysiological phenotype, but does not account for the complete diminuendo phenotype.

    The European journal of neuroscience 2014

  • Dense genomic sampling identifies highways of pneumococcal recombination.

    Chewapreecha C, Harris SR, Croucher NJ, Turner C, Marttinen P, Cheng L, Pessia A, Aanensen DM, Mather AE, Page AJ, Salter SJ, Harris D, Nosten F, Goldblatt D, Corander J, Parkhill J, Turner P and Bentley SD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Evasion of clinical interventions by Streptococcus pneumoniae occurs through selection of non-susceptible genomic variants. We report whole-genome sequencing of 3,085 pneumococcal carriage isolates from a 2.4-km² refugee camp. This sequencing provides unprecedented resolution of the process of recombination and its impact on population evolution. Genomic recombination hotspots show remarkable consistency between lineages, indicating common selective pressures acting at certain loci, particularly those associated with antibiotic resistance. Temporal changes in antibiotic consumption are reflected in changes in recombination trends, demonstrating rapid spread of resistance when selective pressure is high. The highest frequencies of receipt and donation of recombined DNA fragments were observed in non-encapsulated lineages, implying that this largely overlooked pneumococcal group, which is beyond the reach of current vaccines, may have a major role in genetic exchange and the adaptation of the species as a whole. These findings advance understanding of pneumococcal population dynamics and provide information for the design of future intervention strategies.

    Nature genetics 2014;46;3;305-9

  • From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes.

    Clarke AC, Prost S, Stanton JA, White WT, Kaplan ME, Matisoo-Smith EA and Genographic Consortium

    Department of Anatomy, University of Otago, Dunedin, New Zealand.

    Background: Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users.

    Results: Here we present an 'A to Z' protocol for obtaining complete human mitochondrial (mtDNA) genomes - from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling).

    Conclusions: All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual 'modules' can be swapped out to suit available resources.

    BMC genomics 2014;15;68

  • PolyTB: A genomic variation map for Mycobacterium tuberculosis.

    Coll F, Preston M, Guerra-Assunção JA, Hill-Cawthorn G, Harris D, Perdigão J, Viveiros M, Portugal I, Drobniewski F, Gagneux S, Glynn JR, Pain A, Parkhill J, McNerney R, Martin N and Clark TG

    Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, WC1E 7HT London, UK.

    Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.

    Tuberculosis (Edinburgh, Scotland) 2014

  • Confident and sensitive phosphoproteomics using combinations of collision induced dissociation and electron transfer dissociation.

    Collins MO, Wright JC, Jones M, Rayner JC and Choudhary JS

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    We present a workflow using an ETD-optimised version of Mascot Percolator and a modified version of SLoMo (turbo-SLoMo) for analysis of phosphoproteomic data. We have benchmarked this against several database searching algorithms and phosphorylation site localisation tools and show that it offers highly sensitive and confident phosphopeptide identification and site assignment with PSM-level statistics, enabling rigorous comparison of data acquisition methods. We analysed the Plasmodium falciparum schizont phosphoproteome using, for the first time, a data-dependent neutral loss-triggered-ETD (DDNL) strategy and a conventional decision-tree method. At a posterior error probability threshold of 0.01, similar numbers of PSMs were identified using both methods with a 73% overlap in phosphopeptide identifications. The false discovery rate associated with spectral pairs where DDNL CID/ETD identified the same phosphopeptide was <1%. 72% of phosphorylation site assignments using turbo-SLoMo without any score filtering were identical, and 99.8% of these cases are associated with a false localisation rate of <5%. We show that DDNL acquisition is a useful approach for phosphoproteomics and results in increased confidence in phosphopeptide identification without compromising sensitivity or duty cycle. Furthermore, the combination of Mascot Percolator and turbo-SLoMo represents a robust workflow for phosphoproteomic data analysis using CID and ETD fragmentation.

    Protein phosphorylation is a ubiquitous post-translational modification that regulates protein function. Mass spectrometry-based approaches have revolutionised its analysis on a large scale, but phosphorylation sites are often identified by single phosphopeptides and therefore require more rigorous data analysis to ensure that sites are identified with high confidence for follow-up experiments to investigate their biological significance. The coverage and confidence of phosphoproteomic experiments can be enhanced by the use of multiple complementary fragmentation methods. Here we have benchmarked a data analysis pipeline for analysis of phosphoproteomic data generated using CID and ETD fragmentation and used it to demonstrate the utility of a data-dependent neutral loss triggered ETD fragmentation strategy for high confidence phosphopeptide identification and phosphorylation site localisation.

    Journal of proteomics 2014

  • Processed pseudogenes acquired somatically during cancer development.

    Cooke SL, Shlien A, Marshall J, Pipinikas CP, Martincorena I, Tubio JM, Li Y, Menzies A, Mudie L, Ramakrishna M, Yates L, Davies H, Bolli N, Bignell GR, Tarpey PS, Behjati S, Nik-Zainal S, Papaemmanuil E, Teixeira VH, Raine K, O'Meara S, Dodoran MS, Teague JW, Butler AP, Iacobuzio-Donahue C, Santarius T, Grundy RG, Malkin D, Greaves M, Munshi N, Flanagan AM, Bowtell D, Martin S, Larsimont D, Reis-Filho JS, Boussioutas A, Taylor JA, Hayes ND, Janes SM, Futreal PA, Stratton MR, McDermott U, Campbell PJ and ICGC Breast Cancer Group

    Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Cancer evolves by mutation, with somatic reactivation of retrotransposons being one such mutational process. Germline retrotransposition can cause processed pseudogenes, but whether this occurs somatically has not been evaluated. Here we screen sequencing data from 660 cancer samples for somatically acquired pseudogenes. We find 42 events in 17 samples, especially non-small cell lung cancer (5/27) and colorectal cancer (2/11). Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications (67%), consensus TTTTAA sites at insertion points, inverted rearrangements (21%), 5' truncation (74%) and polyA tails (88%). Transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes. In addition, a somatic pseudogene that integrated into the promoter and first exon of the tumour suppressor gene, MGA, abrogated expression from that allele. Thus, formation of processed pseudogenes represents a new class of mutation occurring during cancer development, with potentially diverse functional consequences depending on genomic context.

    Nature communications 2014;5;3644

  • Genomic identification of a novel co-trimoxazole resistance genotype and its prevalence amongst Streptococcus pneumoniae in Malawi.

    Cornick JE, Harris SR, Parry CM, Moore MJ, Jassi C, Kamng'ona A, Kulohoma B, Heyderman RS, Bentley SD and Everett DB

    Malawi-Liverpool-Wellcome Clinical Research Programme, University of Malawi, College of Medicine, Blantyre, Malawi.

    Objectives: This study aimed to define the molecular basis of co-trimoxazole resistance in Malawian pneumococci under the dual selective pressure of widespread co-trimoxazole and sulfadoxine/pyrimethamine use. Methods: We measured the trimethoprim and sulfamethoxazole MICs and analysed folA and folP nucleotide and translated amino acid sequences for 143 pneumococci isolated from carriage and invasive disease in Malawi (2002-08). Results: Pneumococci were highly resistant to both trimethoprim and sulfamethoxazole (96%, 137/143). Sulfamethoxazole-resistant isolates showed a 3 or 6 bp insertion in the sulphonamide-binding site of folP. The trimethoprim-resistant isolates fell into three genotypic groups based on dihydrofolate reductase (encoded by folA) mutations: Ile-100-Leu (10%), the Ile-100-Leu substitution together with a residue 92 substitution (56%) and those with a novel uncharacterized resistance genotype (34%). The nucleotide sequence divergence and dN/dS of folA and folP remained stable from 2004 onwards. Conclusions: S. pneumoniae exhibit almost universal co-trimoxazole resistance in vitro and in silico that we believe is driven by extensive co-trimoxazole and sulfadoxine/pyrimethamine use. More than one-third of pneumococci employ a novel mechanism of co-trimoxazole resistance. Resistance has now reached a point of stabilizing evolution. The use of co-trimoxazole to prevent pneumococcal infection in HIV/AIDS patients in sub-Saharan Africa should be re-evaluated.

    The Journal of antimicrobial chemotherapy 2014;69;2;368-74

  • Full genome virus detection in fecal samples using sensitive nucleic Acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    Cotten M, Oude Munnink B, Canuti M, Deijs M, Watson SJ, Kellam P and van der Hoek L

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm, SLIM, for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

    PloS one 2014;9;4;e93269

  • Recurrent mutations, including NPM1c, activate a BRD4-dependent core transcriptional program in acute myeloid leukemia.

    Dawson MA, Gudgin EJ, Horton SJ, Giotopoulos G, Meduri E, Robson S, Cannizzaro E, Osaki H, Wiese M, Putwain S, Fong CY, Grove C, Craig J, Dittmann A, Lugo D, Jeffrey P, Drewes G, Lee K, Bullinger L, Prinjha RK, Kouzarides T, Vassiliou GS and Huntly BJ

    [1] Department of Haematology, Cambridge Institute for Medical Research and Addenbrookes Hospital, University of Cambridge, Cambridge, UK [2] Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK [3] Gurdon Institute and Department of Pathology, University of Cambridge, Cambridge, UK.

    Recent evidence suggests that inhibition of bromodomain and extra-terminal (BET) epigenetic readers may have clinical utility against acute myeloid leukemia (AML). Here we validate this hypothesis, demonstrating the efficacy of the BET inhibitor I-BET151 across a variety of AML subtypes driven by disparate mutations. We demonstrate that a common 'core' transcriptional program, which is HOX gene independent, is downregulated in AML and underlies sensitivity to I-BET treatment. This program is enriched for genes that contain 'super-enhancers', recently described regulatory elements postulated to control key oncogenic driver genes. Moreover, our program can independently classify AML patients into distinct cytogenetic and molecular subgroups, suggesting that it contains biomarkers of sensitivity and response. We focus on AML with mutations of the Nucleophosmin gene (NPM1) and show evidence to suggest that wild-type NPM1 has an inhibitory influence on BRD4 that is relieved upon NPM1c mutation and cytoplasmic dislocation. This leads to the upregulation of the core transcriptional program facilitating leukemia development. This program is abrogated by I-BET therapy and by nuclear restoration of NPM1. Finally, we demonstrate the efficacy of I-BET151 in a unique murine model and in primary patient samples of NPM1c AML. Taken together, our data support the use of BET inhibitors in clinical trials in AML.

    Leukemia 2014;28;2;311-20

  • Chromatin landscapes of retroviral and transposon integration profiles.

    de Jong J, Akhtar W, Badhai J, Rust AG, Rad R, Hilkens J, Berns A, van Lohuizen M, Wessels LF and de Ridder J

    Computational Cancer Biology Group, Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands; Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands.

    The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have biases that may strongly affect research outcomes. To address this issue, we generated very large datasets consisting of [Formula: see text] to [Formula: see text] unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons, and the Mouse Mammary Tumor Virus (MMTV). We analyzed [Formula: see text] (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. More distinct preferences were observed for the two transposons, with PB showing remarkable resemblance to bias profiles of the Murine Leukemia Virus. Furthermore, we present a model where target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems, and defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are mainly observed at smaller scales, and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7%-33% putative false positive CISs, which are likely not the result of the oncogenic selection process. Moreover, results indicate that PB, compared to SB, is more suited to tag oncogenes.

    PLoS genetics 2014;10;4;e1004250

  • Mitochondrial Genome Sequencing in Mesolithic North East Europe Unearths a New Sub-Clade within the Broadly Distributed Human Haplogroup C1.

    Der Sarkissian C, Brotherton P, Balanovsky O, Templeton JE, Llamas B, Soubrier J, Moiseyev V, Khartanovich V, Cooper A, Haak W and Genographic Consortium

    Australian Centre for Ancient DNA, School of Earth and Environmental Sciences, University of Adelaide, Adelaide, South Australia, Australia.

    The human mitochondrial haplogroup C1 has a broad global distribution but is extremely rare in Europe today. Recent ancient DNA evidence has demonstrated its presence in European Mesolithic individuals. Three individuals from the 7,500 year old Mesolithic site of Yuzhnyy Oleni Ostrov, Western Russia, could be assigned to haplogroup C1 based on mitochondrial hypervariable region I sequences. However, hypervariable region I data alone could not provide enough resolution to establish the phylogenetic relationship of these Mesolithic haplotypes with haplogroup C1 mitochondrial DNA sequences found today in populations of Europe, Asia and the Americas. In order to obtain high-resolution data and shed light on the origin of this European Mesolithic C1 haplotype, we target-enriched and sequenced the complete mitochondrial genome of one Yuzhnyy Oleni Ostrov C1 individual. The updated phylogeny of C1 haplogroups indicated that the Yuzhnyy Oleni Ostrov haplotype represents a new distinct clade, provisionally coined "C1f". We show that all three C1 carriers of Yuzhnyy Oleni Ostrov belong to this clade. No haplotype closely related to the C1f sequence could be found in the large current database of ancient and present-day mitochondrial genomes. Hence, we have discovered past human mitochondrial diversity that has not been observed in modern-day populations so far. The lack of positive matches in modern populations may be explained by under-sampling of rare modern C1 carriers or by demographic processes, population extinction or replacement, that may have impacted on populations of Northeast Europe since prehistoric times.

    PloS one 2014;9;2;e87612

  • Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility.

    DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium

    To further understanding of the genetic basis of type 2 diabetes (T2D) susceptibility, we aggregated published meta-analyses of genome-wide association studies (GWAS), including 26,488 cases and 83,964 controls of European, east Asian, south Asian and Mexican and Mexican American ancestry. We observed a significant excess in the directional consistency of T2D risk alleles across ancestry groups, even at SNPs demonstrating only weak evidence of association. By following up the strongest signals of association from the trans-ethnic meta-analysis in an additional 21,491 cases and 55,647 controls of European ancestry, we identified seven new T2D susceptibility loci. Furthermore, we observed considerable improvements in the fine-mapping resolution of common variant association signals at several T2D susceptibility loci. These observations highlight the benefits of trans-ethnic GWAS for the discovery and characterization of complex trait loci and emphasize an exciting opportunity to extend insight into the genetic architecture and pathogenesis of human diseases across populations of diverse ancestry.

    Nature genetics 2014

  • DNA methylation and body-mass index: a genome-wide analysis.

    Dick KJ, Nelson CP, Tsaprouni L, Sandling JK, Aïssi D, Wahl S, Meduri E, Morange PE, Gagnon F, Grallert H, Waldenberger M, Peters A, Erdmann J, Hengstenberg C, Cambien F, Goodall AH, Ouwehand WH, Schunkert H, Thompson JR, Spector TD, Gieger C, Trégouët DA, Deloukas P and Samani NJ

    Department of Cardiovascular Sciences, University of Leicester, Leicester, UK; National Institute for Health Research Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester, UK.

    Background: Obesity is a major health problem that is determined by interactions between lifestyle and environmental and genetic factors. Although associations between several genetic variants and body-mass index (BMI) have been identified, little is known about epigenetic changes related to BMI. We undertook a genome-wide analysis of methylation at CpG sites in relation to BMI.

    Methods: 479 individuals of European origin recruited by the Cardiogenics Consortium formed our discovery cohort. We typed their whole-blood DNA with the Infinium HumanMethylation450 array. After quality control, methylation levels were tested for association with BMI. Methylation sites showing an association with BMI at a false discovery rate q value of 0·05 or less were taken forward for replication in a cohort of 339 unrelated white patients of northern European origin from the MARTHA cohort. Sites that remained significant in this primary replication cohort were tested in a second replication cohort of 1789 white patients of European origin from the KORA cohort. We examined whether methylation levels at identified sites also showed an association with BMI in DNA from adipose tissue (n=635) and skin (n=395) obtained from white female individuals participating in the MuTHER study. Finally, we examined the association of methylation at BMI-associated sites with genetic variants and with gene expression.

    Findings: 20 individuals from the discovery cohort were excluded from analyses after quality-control checks, leaving 459 participants. After adjustment for covariates, we identified an association (q value ≤0·05) between methylation at five probes across three different genes and BMI. The associations with three of these probes (cg22891070, cg27146050, and cg16672562, all of which are in intron 1 of HIF3A) were confirmed in both the primary and second replication cohorts. For every 0·1 increase in methylation β value at cg22891070, BMI was 3·6% (95% CI 2·4-4·9) higher in the discovery cohort, 2·7% (1·2-4·2) higher in the primary replication cohort, and 0·8% (0·2-1·4) higher in the second replication cohort. For the MuTHER cohort, methylation at cg22891070 was associated with BMI in adipose tissue (p=1·72 × 10^-5) but not in skin (p=0·882). We observed a significant inverse correlation (p=0·005) between methylation at cg22891070 and expression of one HIF3A gene-expression probe in adipose tissue. Two single nucleotide polymorphisms (rs8102595 and rs3826795) had independent associations with methylation at cg22891070 in all cohorts. However, these single nucleotide polymorphisms were not significantly associated with BMI.

    Interpretation: Increased BMI in adults of European origin is associated with increased methylation at the HIF3A locus in blood cells and in adipose tissue. Our findings suggest that perturbation of hypoxia inducible transcription factor pathways could have an important role in the response to increased weight in people.

    Funding: The European Commission, National Institute for Health Research, British Heart Foundation, and Wellcome Trust.

    Lancet 2014

  • Estimating telomere length from whole genome sequence data.

    Ding Z, Mangino M, Aviv A, UK10K Consortium, Spector T and Durbin R

    Genome Informatics, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Telomeres play a key role in replicative ageing and undergo age-dependent attrition in vivo. Here, we report a novel method, TelSeq, to measure average telomere length from whole genome or exome shotgun sequence data. In 260 leukocyte samples, we show that TelSeq results correlate with Southern blot measurements of the mean length of terminal restriction fragments (mTRFs) and display age-dependent attrition comparable to that seen with mTRFs.

    Nucleic acids research 2014
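
    As a rough illustration of how telomere content might be read off shotgun data, the sketch below flags reads that carry many copies of the canonical human telomeric hexamer TTAGGG (or its reverse complement CCCTAA) and reports the fraction of such reads. This is an assumption-laden sketch, not TelSeq itself: the abstract does not describe the method, the function names and the `min_repeats=7` threshold are hypothetical choices made here, and converting a read fraction into a length in kb needs a normalisation step that is deliberately left out.

```python
# Illustrative only: names and the repeat threshold are assumptions,
# not taken from the TelSeq abstract or software.
TELOMERE_REPEAT = "TTAGGG"   # canonical human telomeric hexamer
REVERSE_REPEAT = "CCCTAA"    # its reverse complement


def count_repeats(read, motif):
    """Count non-overlapping occurrences of motif in a read (case-insensitive)."""
    return read.upper().count(motif)


def is_telomeric(read, min_repeats=7):
    """Treat a read as telomeric if either strand's motif occurs min_repeats times or more."""
    best = max(count_repeats(read, TELOMERE_REPEAT),
               count_repeats(read, REVERSE_REPEAT))
    return best >= min_repeats


def telomeric_fraction(reads, min_repeats=7):
    """Fraction of reads flagged as telomeric; 0.0 for an empty input."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if is_telomeric(r, min_repeats))
    return hits / len(reads)
```

    Calling `telomeric_fraction(reads)` on an iterable of read strings gives a relative measure only; comparing samples meaningfully would require accounting for sequencing depth and composition biases.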

  • Neutralization of Plasmodium falciparum Merozoites by Antibodies against PfRH5.

    Douglas AD, Williams AR, Knuepfer E, Illingworth JJ, Furze JM, Crosnier C, Choudhary P, Bustamante LY, Zakutansky SE, Awuah DK, Alanine DG, Theron M, Worth A, Shimkets R, Rayner JC, Holder AA, Wright GJ and Draper SJ

    Jenner Institute, University of Oxford, Oxford OX3 7DQ, United Kingdom.

    There is intense interest in induction and characterization of strain-transcending neutralizing Ab against antigenically variable human pathogens. We have recently identified the human malaria parasite Plasmodium falciparum reticulocyte-binding protein homolog 5 (PfRH5) as a target of broadly neutralizing Abs, but there is little information regarding the functional mechanism(s) of Ab-mediated neutralization. In this study, we report that vaccine-induced polyclonal anti-PfRH5 Abs inhibit the tight attachment of merozoites to erythrocytes and are capable of blocking the interaction of PfRH5 with its receptor basigin. Furthermore, by developing anti-PfRH5 mAbs, we provide evidence of the following: 1) the ability to block the PfRH5-basigin interaction in vitro is predictive of functional activity, but absence of blockade does not predict absence of functional activity; 2) neutralizing mAbs bind spatially related epitopes on the folded protein, involving at least two defined regions of the PfRH5 primary sequence; 3) a brief exposure window of PfRH5 is likely to necessitate rapid binding of Ab to neutralize parasites; and 4) intact bivalent IgG contributes to but is not necessary for parasite neutralization. These data provide important insight into the mechanisms of broadly neutralizing anti-malaria Abs and further encourage anti-PfRH5-based malaria prevention efforts.

    Journal of immunology (Baltimore, Md. : 1950) 2014;192;1;245-58

  • Efficient haplotype matching and storage using the Positional Burrows-Wheeler Transform (PBWT).

    Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.

    Motivation: Over the last few years, methods based on suffix arrays using the Burrows-Wheeler Transform have been widely used for DNA sequence read matching and assembly. These provide very fast search algorithms, linear in the search pattern size, on a highly compressible representation of the data set being searched. Meanwhile, algorithmic development for genotype data has concentrated on statistical methods for phasing and imputation, based on probabilistic matching to hidden Markov model representations of the reference data, which while powerful are much less computationally efficient. Here I develop a theory of haplotype matching using suffix array ideas, which should scale to much larger data sets than those currently handled by genotype algorithms. Results: Given M sequences with N bi-allelic variable sites, I give an O(NM) algorithm to derive a representation of the data based on positional prefix arrays, which I term the Positional Burrows-Wheeler Transform (PBWT). On large data sets this compresses with run-length encoding to a size more than a hundred times smaller than gzip achieves on the raw data. Using this representation I show how to find all maximal haplotype matches within the set in O(NM) time rather than O(NM^2) as expected from naive pairwise comparison, and provide a fast algorithm, empirically independent of M given sufficient memory for indexes, to find maximal matches between a new sequence and the set. The discussion includes some proposals about how these approaches could be used for imputation and phasing.

    Bioinformatics (Oxford, England) 2014
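
    The positional prefix arrays mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming haplotypes are given as equal-length lists of 0/1 alleles; the function names are hypothetical and are not taken from the PBWT software. At each site the current ordering is stably split into haplotypes carrying allele 0 and those carrying allele 1, which keeps the haplotypes sorted by their reversed prefixes and costs O(M) per site, hence O(NM) overall.

```python
# Illustrative sketch of positional prefix arrays; names are assumptions.

def pbwt_prefix_arrays(haps):
    """Return N+1 orderings of haplotype indices.

    orders[k] lists the haplotypes sorted by their reversed prefix
    haps[h][:k], with ties broken by original index.
    """
    m = len(haps)
    n = len(haps[0]) if m else 0
    order = list(range(m))
    orders = [order]
    for k in range(n):
        # Stable partition on the allele at site k: zeros first, then ones.
        zeros = [h for h in order if haps[h][k] == 0]
        ones = [h for h in order if haps[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def pbwt_columns(haps):
    """Alleles at each site k, listed in the order given by orders[k]."""
    orders = pbwt_prefix_arrays(haps)
    n = len(haps[0]) if haps else 0
    return [[haps[h][k] for h in orders[k]] for k in range(n)]


def count_runs(column):
    """Number of maximal runs of equal values (fewer runs compress better)."""
    return sum(1 for i, v in enumerate(column) if i == 0 or v != column[i - 1])
```

    Because haplotypes that share a recent prefix sit next to each other, the permuted columns from `pbwt_columns` tend to contain long runs of identical alleles, which is what makes run-length encoding effective; `count_runs` gives a quick way to compare a permuted column with the original one.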

  • The peculiar epidemiology of dracunculiasis in Chad.

    Eberhard ML, Ruiz-Tiben E, Hopkins DR, Farrell C, Toe F, Weiss A, Withers PC, Jenks MH, Thiele EA, Cotton JA, Hance Z, Holroyd N, Cama VA, Tahir MA and Mounda T

    Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, Georgia; The Carter Center, Atlanta, Georgia; The Carter Center, N'Djamena, Chad; LifeSource Biomedical, Centreville, Virginia; The Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Ministry of Public Health, N'Djamena, Chad.

    Dracunculiasis was rediscovered in Chad in 2010 after an apparent absence of 10 years. In April 2012 active village-based surveillance was initiated to determine where, when, and how transmission of the disease was occurring, and to implement interventions to interrupt it. The current epidemiologic pattern of the disease in Chad is unlike that seen previously in Chad or other endemic countries, i.e., no clustering of cases by village or association with a common water source, the average number of worms per person was small, and a large number of dogs were found to be infected. Molecular sequencing suggests these infections were all caused by Dracunculus medinensis. It appears that the infection in dogs is serving as the major driving force sustaining transmission in Chad, that an aberrant life cycle involving a paratenic host common to people and dogs is occurring, and that the cases in humans are sporadic and incidental.

    Funded by: Wellcome Trust: 098051

    The American journal of tropical medicine and hygiene 2014;90;1;61-70

  • CYP6 P450 Enzymes and ACE-1 Duplication Produce Extreme and Multiple Insecticide Resistance in the Malaria Mosquito Anopheles gambiae.

    Edi CV, Djogbénou L, Jenkins AM, Regna K, Muskavitch MA, Poupardin R, Jones CM, Essandoh J, Kétoh GK, Paine MJ, Koudou BG, Donnelly MJ, Ranson H and Weetman D

    Vector Biology Department, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, United Kingdom; Centre Suisse de Recherches Scientifiques en Côte d'Ivoire, Abidjan, Côte d'Ivoire.

    Malaria control relies heavily on pyrethroid insecticides, to which susceptibility is declining in Anopheles mosquitoes. To combat pyrethroid resistance, application of alternative insecticides is advocated for indoor residual spraying (IRS), and carbamates are increasingly important. Emergence of a very strong carbamate resistance phenotype in Anopheles gambiae from Tiassalé, Côte d'Ivoire, West Africa, is therefore a potentially major operational challenge, particularly because these malaria vectors now exhibit resistance to multiple insecticide classes. We investigated the genetic basis of resistance to the most commonly-applied carbamate, bendiocarb, in An. gambiae from Tiassalé. Geographically-replicated whole genome microarray experiments identified elevated P450 enzyme expression as associated with bendiocarb resistance, most notably genes from the CYP6 subfamily. P450s were further implicated in resistance phenotypes by induction of significantly elevated mortality to bendiocarb by the synergist piperonyl butoxide (PBO), which also enhanced the action of pyrethroids and an organophosphate. CYP6P3 and especially CYP6M2 produced bendiocarb resistance via transgenic expression in Drosophila in addition to pyrethroid resistance for both genes, and DDT resistance for CYP6M2 expression. CYP6M2 can thus cause resistance to three distinct classes of insecticide although the biochemical mechanism for carbamates is unclear because, in contrast to CYP6P3, recombinant CYP6M2 did not metabolise bendiocarb in vitro. Strongly bendiocarb resistant mosquitoes also displayed elevated expression of the acetylcholinesterase ACE-1 gene, arising at least in part from gene duplication, which confers a survival advantage to carriers of additional copies of resistant ACE-1 G119S alleles. Our results are alarming for vector-based malaria control. Extreme carbamate resistance in Tiassalé An. gambiae results from coupling of over-expressed target site allelic variants with heightened CYP6 P450 expression, which also provides resistance across contrasting insecticides. Mosquito populations displaying such a diverse basis of extreme and cross-resistance are likely to be unresponsive to standard insecticide resistance management practices.

    PLoS genetics 2014;10;3;e1004236

  • Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.

    Fairfax BP, Humburg P, Makino S, Naranbhai V, Wong D, Lau E, Jostins L, Plant K, Andrews R, McGee C and Knight JC

    Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

    To systematically investigate the impact of immune stimulation upon regulatory variant activity, we exposed primary monocytes from 432 healthy Europeans to interferon-γ (IFN-γ) or differing durations of lipopolysaccharide and mapped expression quantitative trait loci (eQTLs). More than half of cis-eQTLs identified, involving hundreds of genes and associated pathways, are detected specifically in stimulated monocytes. Induced innate immune activity reveals multiple master regulatory trans-eQTLs including the major histocompatibility complex (MHC), coding variants altering enzyme and receptor function, an IFN-β cytokine network showing temporal specificity, and an interferon regulatory factor 2 (IRF2) transcription factor-modulated network. Induced eQTL are significantly enriched for genome-wide association study loci, identifying context-specific associations to putative causal genes including CARD9, ATM, and IRF8. Thus, applying pathophysiologically relevant immune stimuli assists resolution of functional genetic variants.

    Funded by: Medical Research Council: 98082; Wellcome Trust: 074318, 088891, 090532/Z/09/Z

    Science (New York, N.Y.) 2014;343;6175;1246949

  • Low copy number of the salivary amylase gene predisposes to obesity.

    Falchi M, El-Sayed Moustafa JS, Takousis P, Pesce F, Bonnefond A, Andersson-Assarsson JC, Sudmant PH, Dorajoo R, Al-Shafai MN, Bottolo L, Ozdemir E, So HC, Davies RW, Patrice A, Dent R, Mangino M, Hysi PG, Dechaume A, Huyvaert M, Skinner J, Pigeyre M, Caiazzo R, Raverdy V, Vaillant E, Field S, Balkau B, Marre M, Visvikis-Siest S, Weill J, Poulain-Godefroy O, Jacobson P, Sjostrom L, Hammond CJ, Deloukas P, Sham PC, McPherson R, Lee J, Tai ES, Sladek R, Carlsson LM, Walley A, Eichler EE, Pattou F, Spector TD and Froguel P

    [1] Department of Genomics of Common Disease, Imperial College London, London, UK. [2] [3] [4].

    Common multi-allelic copy number variants (CNVs) appear enriched for phenotypic associations compared to their biallelic counterparts. Here we investigated the influence of gene dosage effects on adiposity through a CNV association study of gene expression levels in adipose tissue. We identified significant association of a multi-allelic CNV encompassing the salivary amylase gene (AMY1) with body mass index (BMI) and obesity, and we replicated this finding in 6,200 subjects. Increased AMY1 copy number was positively associated with both amylase gene expression (P = 2.31 × 10(-14)) and serum enzyme levels (P < 2.20 × 10(-16)), whereas reduced AMY1 copy number was associated with increased BMI (change in BMI per estimated copy = -0.15 (0.02) kg/m(2); P = 6.93 × 10(-10)) and obesity risk (odds ratio (OR) per estimated copy = 1.19, 95% confidence interval (CI) = 1.13-1.26; P = 1.46 × 10(-10)). The OR value of 1.19 per copy of AMY1 translates into about an eightfold difference in risk of obesity between subjects in the top (copy number > 9) and bottom (copy number < 4) 10% of the copy number distribution. Our study provides a first genetic link between carbohydrate metabolism and BMI and demonstrates the power of integrated genomic approaches beyond genome-wide association studies.
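
    The step from an odds ratio of 1.19 per copy to an eightfold difference can be checked with a short calculation. The Python sketch below assumes that the per-copy odds ratio combines multiplicatively across copies (a log-additive model) and works on the odds scale; the function names are hypothetical. The abstract does not state how many copies separate the two groups, so the sketch solves for that gap rather than assuming it.

```python
import math

OR_PER_COPY = 1.19  # odds ratio per estimated AMY1 copy, as reported above


def fold_difference(copy_gap, or_per_copy=OR_PER_COPY):
    """Odds ratio between two groups separated by `copy_gap` copies,
    assuming the per-copy effect multiplies across copies."""
    return or_per_copy ** copy_gap


def copies_for_fold(target_fold, or_per_copy=OR_PER_COPY):
    """Copy-number gap needed to reach `target_fold` under the same model."""
    return math.log(target_fold) / math.log(or_per_copy)


if __name__ == "__main__":
    print(round(copies_for_fold(8.0), 2))   # gap implied by an eightfold difference
    print(round(fold_difference(12), 2))    # odds ratio for a 12-copy gap
```

    Under these assumptions an eightfold difference corresponds to a gap of roughly 12 copies between the low and high copy-number groups, since 1.19 raised to the 12th power is about 8.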

    Nature genetics 2014

  • Current status and new features of the Consensus Coding Sequence database.

    Farrell CM, O'Leary NA, Harte RA, Loveland JE, Wilming LG, Wallin C, Diekhans M, Barrell D, Searle SM, Aken B, Hiatt SM, Frankish A, Suner MM, Rajput B, Steward CA, Brown GR, Bennett R, Murphy M, Wu W, Kay MP, Hart J, Rajan J, Weber J, Snow C, Riddick LD, Hunt T, Webb D, Thomas M, Tamez P, Rangwala SH, McGarvey KM, Pujar S, Shkeda A, Mudge JM, Gonzalez JM, Gilbert JG, Trevanion SJ, Baertsch R, Harrow JL, Hubbard T, Ostell JM, Haussler D and Pruitt KD

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA, Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

    The Consensus Coding Sequence (CCDS) project is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

    Nucleic acids research 2014;42;1;D865-72

  • Workshops: a great way to enhance and supplement a degree.

    Fatumo S, Shome S and Macintyre G

    H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria; International Health Research Group, Dept of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    As part of the International Society for Computational Biology Student Council (ISCB-SC), Regional Student Groups (RSGs) have helped organise workshops in the emerging fields of bioinformatics and computational biology. Workshops are a great way for students to gain hands-on experience and rapidly acquire knowledge in advanced research topics where curriculum-based education is yet to be developed. RSG workshops have improved dissemination of knowledge of the latest bioinformatics techniques and resources among student communities and young scientists, especially in developing nations. This article highlights some of the benefits and challenges encountered while running RSG workshops. Examples cover a variety of subjects, including introductory bioinformatics and advanced bioinformatics, as well as soft skills such as networking, career development, and socializing. The collective experience condensed in this article is a useful starting point for students wishing to organise their own tailor-made workshops.

    PLoS computational biology 2014;10;2;e1003497

  • Pfam: the protein families database.

    Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J and Punta M

    HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA 20147 USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, OX1 3QX, UK, Institute of Biotechnology and Department of Biological and Environmental Sciences, University of Helsinki, PO Box 56 (Viikinkaari 5), 00014 Helsinki, Finland and Stockholm Bioinformatics Center, Swedish eScience Research Center, Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, PO Box 1031, SE-17121 Solna, Sweden.

    Pfam, available via servers in the UK and the USA, is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

    Nucleic acids research 2014;42;1;D222-30

  • Biomarker profiling by nuclear magnetic resonance spectroscopy for the prediction of all-cause mortality: an observational study of 17,345 persons.

    Fischer K, Kettunen J, Würtz P, Haller T, Havulinna AS, Kangas AJ, Soininen P, Esko T, Tammesoo ML, Mägi R, Smit S, Palotie A, Ripatti S, Salomaa V, Ala-Korpela M, Perola M and Metspalu A

    The Estonian Genome Center, University of Tartu, Tartu, Estonia.

    Background: Early identification of ambulatory persons at high short-term risk of death could benefit targeted prevention. To identify biomarkers for all-cause mortality and enhance risk prediction, we conducted high-throughput profiling of blood specimens in two large population-based cohorts.

    Methods and Findings: 106 candidate biomarkers were quantified by nuclear magnetic resonance spectroscopy of non-fasting plasma samples from a random subset of the Estonian Biobank (n = 9,842; age range 18-103 y; 508 deaths during a median of 5.4 y of follow-up). Biomarkers for all-cause mortality were examined using stepwise proportional hazards models. Significant biomarkers were validated and incremental predictive utility assessed in a population-based cohort from Finland (n = 7,503; 176 deaths during 5 y of follow-up). Four circulating biomarkers predicted the risk of all-cause mortality among participants from the Estonian Biobank after adjusting for conventional risk factors: alpha-1-acid glycoprotein (hazard ratio [HR] 1.67 per 1-standard deviation increment, 95% CI 1.53-1.82, p = 5×10(-31)), albumin (HR 0.70, 95% CI 0.65-0.76, p = 2×10(-18)), very-low-density lipoprotein particle size (HR 0.69, 95% CI 0.62-0.77, p = 3×10(-12)), and citrate (HR 1.33, 95% CI 1.21-1.45, p = 5×10(-10)). All four biomarkers were predictive of cardiovascular mortality, as well as death from cancer and other nonvascular diseases. One in five participants in the Estonian Biobank cohort with a biomarker summary score within the highest percentile died during the first year of follow-up, indicating prominent systemic reflections of frailty. The biomarker associations all replicated in the Finnish validation cohort. Including the four biomarkers in a risk prediction score improved risk assessment for 5-y mortality (increase in C-statistics 0.031, p = 0.01; continuous reclassification improvement 26.3%, p = 0.001).

    Conclusions: Biomarker associations with cardiovascular, nonvascular, and cancer mortality suggest novel systemic connectivities across seemingly disparate morbidities. The biomarker profiling improved prediction of the short-term risk of death from all causes above established risk factors. Further investigations are needed to clarify the biological mechanisms and the utility of these biomarkers for guiding screening and prevention.

    PLoS medicine 2014;11;2;e1001606

  • Ensembl 2014.

    Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Girón CG, Gordon L, Hourlier T, Hunt S, Johnson N, Juettemann T, Kähäri AK, Keenan S, Kulesha E, Martin FJ, Maurel T, McLaren WM, Murphy DN, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ruffier M, Sheppard D, Taylor K, Thormann A, Trevanion SJ, Vullo A, Wilder SP, Wilson M, Zadissa A, Aken BL, Birney E, Cunningham F, Harrow J, Herrero J, Hubbard TJ, Kinsella R, Muffato M, Parker A, Spudich G, Yates A, Zerbino DR and Searle SM

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Ensembl creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.

    Nucleic acids research 2014;42;1;D749-55

  • De novo mutations in schizophrenia implicate synaptic networks.

    Fromer M, Pocklington AJ, Kavanagh DH, Williams HJ, Dwyer S, Gormley P, Georgieva L, Rees E, Palta P, Ruderfer DM, Carrera N, Humphreys I, Johnson JS, Roussos P, Barker DD, Banks E, Milanova V, Grant SG, Hannon E, Rose SA, Chambert K, Mahajan M, Scolnick EM, Moran JL, Kirov G, Palotie A, McCarroll SA, Holmans P, Sklar P, Owen MJ, Purcell SM and O'Donovan MC

    [1] Division of Psychiatric Genomics in the Department of Psychiatry, and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA [2] Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.

    Inherited alleles account for most of the genetic risk for schizophrenia. However, new (de novo) mutations, in the form of large chromosomal copy number changes, occur in a small fraction of cases and disproportionally disrupt genes encoding postsynaptic proteins. Here we show that small de novo mutations, affecting one or a few nucleotides, are overrepresented among glutamatergic postsynaptic proteins comprising activity-regulated cytoskeleton-associated protein (ARC) and N-methyl-d-aspartate receptor (NMDAR) complexes. Mutations are additionally enriched in proteins that interact with these complexes to modulate synaptic strength, namely proteins regulating actin filament dynamics and those whose messenger RNAs are targets of fragile X mental retardation protein (FMRP). Genes affected by mutations in schizophrenia overlap those mutated in autism and intellectual disability, as do mutation-enriched synaptic pathways. Aligning our findings with a parallel case-control study, we demonstrate reproducible insights into aetiological mechanisms for schizophrenia and reveal pathophysiology shared with other neurodevelopmental disorders.

    Nature 2014

  • Complete Genome Sequence of the WHO International Standard for HIV-1 RNA Determined by Deep Sequencing.

    Gall A, Morris C, Kellam P and Berry N

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The World Health Organization (WHO) International Standard for HIV-1 RNA nucleic acid assays was characterized by complete genome deep sequencing analysis. The entire coding sequence and flanking long terminal repeats (LTRs), including minority species, were assigned subtype B. This information will aid the design, development, and evaluation of HIV-1 RNA amplification assays.

    Genome announcements 2014;2;1

  • The evolving role of cancer cell line-based screens to define the impact of cancer genomes on drug response.

    Garnett MJ and McDermott U

    Cancer Genome Project, Wellcome Trust Sanger Institute Hinxton, Cambridge, United Kingdom.

    Over the last decade we have witnessed the convergence of two powerful experimental designs toward a common goal of defining the molecular subtypes that underpin the likelihood of a cancer patient responding to treatment in the clinic. The first of these 'experiments' has been the systematic sequencing of large numbers of cancer genomes through the International Cancer Genome Consortium and The Cancer Genome Atlas. This endeavour is beginning to yield a complete catalogue of the cancer genes that are critical for tumourigenesis and amongst which we will find tomorrow's biomarkers and drug targets. The second 'experiment' has been the use of large-scale biological models such as cancer cell lines to correlate mutations in cancer genes with drug sensitivity, such that one could begin to develop rational clinical trials to test these hypotheses. It is at this intersection of cancer genome sequencing and biological models that there exists the opportunity to completely transform how we stratify cancer patients in the clinic for treatment.

    Current opinion in genetics & development 2014;24C;114-119

  • Subclonal variant calling with multiple samples and prior knowledge.

    Gerstung M, Papaemmanuil E and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK; Department of Haematology, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK; Department of Haematology, University of Cambridge, Cambridge CB2 2XY, UK.

    Motivation: Targeted resequencing of cancer genes in large cohorts of patients is important to understand the biological and clinical consequences of mutations. Cancers are often clonally heterogeneous and the detection of subclonal mutations is important from a diagnostic point of view, but presents strong statistical challenges. Results: Here we present a novel statistical approach for calling mutations from large cohorts of deeply resequenced cancer genes. These data allow for precisely estimating local error profiles and enable detecting mutations with high sensitivity and specificity. Our probabilistic method incorporates knowledge about the distribution of variants in terms of a prior probability. We show that our algorithm has a high accuracy of calling cancer mutations and demonstrate that the detected clonal and subclonal variants have important prognostic consequences. Availability: Code is available as part of the Bioconductor package deepSNV.
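
    How a prior probability can enter a variant call is easiest to see with Bayes' rule in odds form. The Python sketch below is a generic illustration, not the model implemented in deepSNV: it assumes the evidence at a site has already been summarised as a Bayes factor (likelihood of the data with a variant divided by the likelihood without one), and the function name posterior_probability is hypothetical.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability that a variant is present.

    prior         -- prior probability of a variant at this site (0 < prior < 1)
    bayes_factor  -- P(data | variant) / P(data | no variant), must be >= 0
    Generic Bayes' rule in odds form; illustrative only.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("bayes_factor must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Same evidence, different priors: a site with a higher prior
    # (for example a known mutation hotspot) gets a higher posterior.
    for prior in (0.0001, 0.01, 0.1):
        print(prior, posterior_probability(prior, bayes_factor=50.0))
```

    The design point is that identical read evidence yields different posterior probabilities depending on the prior, so weak subclonal signals at positions where variants are expected a priori can be called with more confidence than the same signal elsewhere.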

    Bioinformatics (Oxford, England) 2014

  • Maturation of Induced Pluripotent Stem Cell Derived Hepatocytes by 3D-Culture.

    Gieseck III RL, Hannan NR, Bort R, Hanley NA, Drake RA, Cameron GW, Wynn TA and Vallier L

    Wellcome Trust-Medical Research Council Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, United Kingdom; Immunopathogenesis Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America.

    Induced pluripotent stem cell derived hepatocytes (IPSC-Heps) have the potential to reduce the demand for a dwindling number of primary cells used in applications ranging from therapeutic cell infusions to in vitro toxicology studies. However, current differentiation protocols and culture methods produce cells with reduced functionality and fetal-like properties compared to adult hepatocytes. We report a culture method for the maturation of IPSC-Heps using 3-Dimensional (3D) collagen matrices compatible with high throughput screening. This culture method significantly increases functional maturation of IPSC-Heps towards an adult phenotype when compared to conventional 2D systems. Additionally, this approach spontaneously results in the presence of polarized structures necessary for drug metabolism and improves functional longevity to over 75 days. Overall, this research reveals a method to shift the phenotype of existing IPSC-Heps towards primary adult hepatocytes allowing such cells to be a more relevant replacement for the current primary standard.

    PloS one 2014;9;1;e86372

  • Expression and replication studies to identify new candidate genes involved in normal hearing function.

    Girotto G, Vuckovic D, Buniello A, Lorente-Cánovas B, Lewis M, Gasparini P and Steel KP

    Department of Medical Sciences, University of Trieste, Trieste, Italy.

    Considerable progress has been made in identifying deafness genes, but still little is known about the genetic basis of normal variation in hearing function. We recently carried out a Genome Wide Association Study (GWAS) of quantitative hearing traits in southern European populations and found several SNPs with suggestive but none with significant association. In the current study, we followed up these SNPs to investigate which of them might show a genuine association with auditory function using alternative approaches. Firstly, we generated a shortlist of 19 genes from the published GWAS results. Secondly, we carried out immunocytochemistry to examine expression of these 19 genes in the mouse inner ear. Twelve of them showed distinctive cochlear expression patterns. Four showed expression restricted to sensory hair cells (Csmd1, Arsg, Slc16a6 and Gabrg3), one was expressed only in marginal cells of the stria vascularis (Dclk1), while the others (Ptprd, Grm8, GlyBP, Evi5, Rimbp2, Ank2, Cdh13) were expressed in multiple cochlear cell types. In the third step, we tested these 12 genes for replication of association in an independent set of samples from the Caucasus and Central Asia. Nine of them showed nominally significant association (p<0.05). In particular, 4 were replicated at the same SNP and with the same effect direction while the remaining 5 showed a significant association in a gene-based test. Finally, to look for a genotype-phenotype relationship, the audiometric profiles of the three genotypes of the most strongly associated gene variants were analyzed. Seven of the 9 replicated genes (CDH13, GRM8, ANK2, SLC16A6, ARSG, RIMBP2 and DCLK1) showed an audiometric pattern with differences between genotypes, further supporting their role in hearing function. These data demonstrate the usefulness of this multistep approach in providing new insights into the molecular basis of hearing and may suggest new targets for treatment and prevention of hearing impairment.

    PloS one 2014;9;1;e85352

  • Genomic epidemiology of Neisseria gonorrhoeae with reduced susceptibility to cefixime in the USA: a retrospective observational study.

    Grad YH, Kirkcaldy RD, Trees D, Dordel J, Harris SR, Goldstein E, Weinstock H, Parkhill J, Hanage WP, Bentley S and Lipsitch M

    Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; Division of Infectious Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

    Background: The emergence of Neisseria gonorrhoeae with decreased susceptibility to extended spectrum cephalosporins raises the prospect of untreatable gonorrhoea. In the absence of new treatments, efforts to slow the increasing incidence of resistant gonococcus require insight into the factors that contribute to its emergence and spread. We assessed the relatedness between isolates in the USA and reconstructed likely spread of lineages through different sexual networks. Methods: We sequenced the genomes of 236 isolates of N gonorrhoeae collected by the Centers for Disease Control and Prevention's Gonococcal Isolate Surveillance Project (GISP) from sentinel public sexually transmitted disease clinics in the USA, including 118 (97%) of the isolates from 2009-10 in GISP with reduced susceptibility to cefixime (cef(RS)) and 118 cefixime-susceptible isolates from GISP matched as closely as possible by location, collection date, and sexual orientation. We assessed the association between antimicrobial resistance genotype and phenotype and correlated phylogenetic clustering with location and sexual orientation. Findings: Mosaic penA XXXIV had a high positive predictive value for cef(RS). We found that two of the 118 cef(RS) isolates lacked a mosaic penA allele, and rechecking showed that these two were susceptible to cefixime. Of the 116 remaining cef(RS) isolates, 114 (98%) fell into two distinct lineages that have independently acquired mosaic penA allele XXXIV. A major lineage of cef(RS) strains spread eastward, predominantly through a sexual network of men who have sex with men. Eight of nine inferred transitions between sexual networks were introductions from men who have sex with men into the heterosexual population. Interpretation: Genomic methods might aid efforts to slow the spread of antibiotic-resistant N gonorrhoeae through augmentation of gonococcal outbreak surveillance and identification of populations that could benefit from increased screening for asymptomatic infections. Funding: American Sexually Transmitted Disease Association, Wellcome Trust, National Institute of General Medical Sciences, and National Institute of Allergy and Infectious Diseases, National Institutes of Health.

    Funded by: NIGMS NIH HHS: U54 GM088558

    The Lancet infectious diseases 2014

  • De Novo Loss-of-Function Mutations in SETD5, Encoding a Methyltransferase in a 3p25 Microdeletion Syndrome Critical Region, Cause Intellectual Disability.

    Grozeva D, Carss K, Spasic-Boskovic O, Parker MJ, Archer H, Firth HV, Park SM, Canham N, Holder SE, Wilson M, Hackett A, Field M, Floyd JA, UK10K Consortium, Hurles M and Raymond FL

    Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK.

    To identify further Mendelian causes of intellectual disability (ID), we screened a cohort of 996 individuals with ID for variants in 565 known or candidate genes by using a targeted next-generation sequencing approach. Seven loss-of-function (LoF) mutations, comprising four nonsense mutations (c.1195A>T [p.Lys399*], c.1333C>T [p.Arg445*], c.1866C>G [p.Tyr622*], and c.3001C>T [p.Arg1001*]) and three frameshift mutations (c.2177_2178del [p.Thr726Asnfs*39], c.3771dup [p.Ser1258Glufs*65], and c.3856del [p.Ser1286Leufs*84]), were identified in SETD5, a gene predicted to encode a methyltransferase. All mutations were compatible with de novo dominant inheritance. The affected individuals had moderate to severe ID with additional variable features of brachycephaly; a prominent high forehead with synophrys or striking full and broad eyebrows; a long, thin, and tubular nose; long, narrow upslanting palpebral fissures; and large, fleshy low-set ears. Skeletal anomalies, including significant leg-length discrepancy, were a frequent finding in two individuals. Congenital heart defects, inguinal hernia, or hypospadias were also reported. Behavioral problems, including obsessive-compulsive disorder, hand flapping with ritualized behavior, and autism, were prominent features. SETD5 lies within the critical interval for 3p25 microdeletion syndrome. The individuals with SETD5 mutations showed phenotypic similarity to those previously reported with a deletion in 3p25, and thus loss of SETD5 might be sufficient to account for many of the clinical features observed in this condition. Our findings add to the growing evidence that mutations in genes encoding methyltransferases regulating histone modification are important causes of ID. This analysis provides sufficient evidence that rare de novo LoF mutations in SETD5 are a relatively frequent (0.7%) cause of ID.

    American journal of human genetics 2014

  • A systematic review of definitions of extreme phenotypes of HIV control and progression.

    Gurdasani D, Iles L, Dillon DG, Young EH, Olson AD, Naranbhai V, Fidler S, Gkrania-Klotsas E, Post FA, Kellam P, Porter K and Sandhu MS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton; Strangeways Research Laboratory, Department of Public Health and Primary Care, University of Cambridge, Wort's Causeway, Cambridge; Medical Research Council, Clinical Trials Unit, Aviation House, London, UK; Centre for the AIDS Programme of Research in South Africa (CAPRISA), Doris Duke Medical Research Institute, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa; Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford; Imperial College Healthcare NHS Trust, London; Cambridge University Hospitals NHS Foundation Trust, Department of Infectious Diseases, Addenbrooke's Hospital, Cambridge; King's College London, Weston Education Centre; Division of Infection and Immunity, University College London, London, UK.

    The study of individuals at opposite ends of the HIV clinical spectrum can provide invaluable insights into HIV biology. Heterogeneity in criteria used to define these individuals can introduce inconsistencies in results from research and make it difficult to identify biological mechanisms underlying these phenotypes. In this systematic review, we formally quantified the heterogeneity in definitions used for terms referring to extreme phenotypes in the literature, and identified common definitions and components used to describe these phenotypes. We assessed 714 definitions of HIV extreme phenotypes in 501 eligible studies published between 1 January 2000 and 15 March 2012, and identified substantial variation among these. This heterogeneity in definitions may represent important differences in biological endophenotypes and clinical progression profiles of individuals selected by these, suggesting the need for harmonized definitions. In this context, we were able to identify common components in existing definitions that may provide a framework for developing consensus definitions for these phenotypes in HIV infection.

    Funded by: Medical Research Council: G0901213; Wellcome Trust

    AIDS (London, England) 2014;28;2;149-62

  • A GC1 Acinetobacter baumannii isolate carrying AbaR3 and the aminoglycoside resistance transposon TnaphA6 in a conjugative plasmid.

    Hamidian M, Holt KE, Pickard D, Dougan G and Hall RM

    School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.

    Objectives: To locate the acquired antibiotic resistance genes, including the amikacin resistance transposon TnaphA6, in the genome of an Australian isolate belonging to Acinetobacter baumannii global clone 1 (GC1).

    Methods: A multiply antibiotic-resistant GC1 isolate harbouring TnaphA6 was sequenced using Illumina HiSeq, and reads were used to generate a de novo assembly and determine multilocus sequence types (STs). PCR was used to assemble the AbaR chromosomal resistance island and a large plasmid carrying TnaphA6. Plasmid DNA sequences were compared with ones available in GenBank. Conjugation experiments were conducted.

    Results: The A. baumannii GC1 isolate G7 was shown to include the AbaR3 antibiotic resistance island. It also contains an 8.7 kb cryptic plasmid, pAb-G7-1, and a 70,100 bp plasmid, pAb-G7-2, carrying TnaphA6. pAb-G7-2 belongs to the Aci6 Acinetobacter plasmid family. It encodes transfer functions and was shown to conjugate. Plasmids related to pAb-G7-2 were detected in further amikacin-resistant GC1 isolates using PCR. From the genome sequence, isolate G7 was ST1 (Institut Pasteur scheme) and ST231 (Oxford scheme). Using Oxford scheme PCR-based methods, the isolate was ST109 and this difference was traced to a single base difference resulting from the inclusion of the original primers in the gpi segment analysed.

    Conclusions: The multiply antibiotic-resistant GC1 isolate G7 carries most of its resistance genes in AbaR3 located in the chromosome. However, TnaphA6 is on a conjugative plasmid, pAb-G7-2. Primers developed to locate TnaphA6 in pAb-G7-2 will simplify the detection of plasmids related to pAb-G7-2 in A. baumannii isolates.

    The Journal of antimicrobial chemotherapy 2014;69;4;955-8

  • Identification of a marker for two lineages within the GC1 clone of Acinetobacter baumannii.

    Hamidian M, Wynn M, Holt KE, Pickard D, Dougan G and Hall RM

    School of Molecular Bioscience, The University of Sydney, NSW 2006, Australia.

    The Journal of antimicrobial chemotherapy 2014;69;2;557-8

  • Collateral damage.

    Hancock RE

    Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, British Columbia, Canada, and the Wellcome Trust Sanger Institute, Hinxton, UK.

    Nature biotechnology 2014;32;1;66-8

  • Haptoglobin (HP) and Haptoglobin-related protein (HPR) copy number variation, natural selection, and trypanosomiasis.

    Hardwick RJ, Ménard A, Sironi M, Milet J, Garcia A, Sese C, Yang F, Fu B, Courtin D and Hollox EJ

    Department of Genetics, University of Leicester, Leicester, UK.

    Haptoglobin, coded by the HP gene, is a plasma protein that acts as a scavenger for free heme, and haptoglobin-related protein (coded by the HPR gene) forms part of the trypanolytic factor TLF-1, together with apolipoprotein L1 (ApoL1). We analyse the polymorphic small intragenic duplication of the HP gene, with alleles Hp1 and Hp2, in 52 populations, and find no evidence for natural selection either from extended haplotype analysis or from correlation with pathogen richness matrices. Using fiber-FISH, the paralog ratio test, and array-CGH data, we also confirm that the HPR gene is copy number variable, with duplication of the whole HPR gene at polymorphic frequencies in west and central Africa, up to an allele frequency of 15%. The geographical distribution of the HPR duplication allele overlaps the region where the pathogen causing chronic human African trypanosomiasis, Trypanosoma brucei gambiense, is endemic. The HPR duplication has occurred on one SNP haplotype, but there is no strong evidence of extended homozygosity, a characteristic of recent natural selection. The HPR duplication shows a slight, non-significant undertransmission to human African trypanosomiasis-affected children of unaffected parents in the Democratic Republic of Congo. However, taken together with alleles of APOL1, there is an overall significant undertransmission of putative protective alleles to human African trypanosomiasis-affected children.

    Human genetics 2014;133;1;69-83

  • WormBase 2014: new views of curated biology.

    Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L and Sternberg PW

    Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada, Genome Sequencing Center, Washington University, School of Medicine, St Louis, MO 63108, USA, Division of Biology and Biological Engineering 156-29, California Institute of Technology, Pasadena, CA 91125, USA, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Department of Genetics Campus, Washington University School of Medicine, St. Louis, MO 63110, USA, Genetics Unit, Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, CA 91125, USA.

    WormBase is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G070119; NHGRI NIH HHS: U41-HG002223

    Nucleic acids research 2014;42;Database issue;D789-93

  • A novel hybrid SCCmec-mecC region in Staphylococcus sciuri.

    Harrison EM, Paterson GK, Holden MT, Ba X, Rolo J, Morgan FJ, Pichon B, Kearns A, Zadoks RN, Peacock SJ, Parkhill J and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Objectives: Methicillin resistance in Staphylococcus spp. results from the expression of an alternative penicillin-binding protein 2a (encoded by mecA) with a low affinity for β-lactam antibiotics. Recently, a novel variant of mecA known as mecC (formerly mecALGA251) was identified in Staphylococcus aureus isolates from both humans and animals. In this study, we identified two Staphylococcus sciuri subsp. carnaticus isolates from bovine infections that harbour three different mecA homologues: mecA, mecA1 and mecC.

    Methods: We subjected the two isolates to whole-genome sequencing to further understand the genetic context of the mec-containing region. We also used PCR and RT-PCR to investigate the excision and expression of the SCCmec element and mec genes, respectively.

    Results: Whole-genome sequencing revealed a novel hybrid SCCmec region at the orfX locus consisting of a class E mec complex (mecI-mecR1-mecC1-blaZ) located immediately downstream of a staphylococcal cassette chromosome mec (SCCmec) type VII element. A second SCCmec attL site (attL2), which was imperfect, was present downstream of the mecC region. PCR analysis of stationary-phase cultures showed that both the SCCmec type VII element and a hybrid SCCmec-mecC element were capable of excision from the genome and forming a circular intermediate. Transcriptional analysis showed that mecC and mecA, but not mecA1, were both expressed in liquid culture supplemented with oxacillin.

    Conclusions: Overall, this study further highlights that a range of staphylococcal species harbour the mecC gene and furthers the view that coagulase-negative staphylococci associated with animals may act as reservoirs of antibiotic resistance genes for more pathogenic staphylococcal species.

    The Journal of antimicrobial chemotherapy 2014;69;4;911-8

  • The Vertebrate Genome Annotation browser 10 years on.

    Harrow JL, Steward CA, Frankish A, Gilbert JG, Gonzalez JM, Loveland JE, Mudge J, Sheppard D, Thomas M, Trevanion S and Wilming LG

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, UK.

    The Vertebrate Genome Annotation (VEGA) database, initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).

    Funded by: Biotechnology and Biological Sciences Research Council: BB/K009524/1; NHGRI NIH HHS: 5U54HG004555, U41 HG007234, U54 HG004555; Wellcome Trust: WT098051

    Nucleic acids research 2014;42;Database issue;D771-9

  • Optoactivation of locus ceruleus neurons evokes bidirectional changes in thermal nociception in rats.

    Hickey L, Li Y, Fyson SJ, Watson TC, Perrins R, Hewinson J, Teschemacher AG, Furue H, Lumb BM and Pickering AE

    School of Physiology and Pharmacology, University of Bristol, Bristol BS8 1TD, United Kingdom, Department of Anesthesia, University Hospitals Bristol, Bristol BS2 8HW, United Kingdom, Department of Information Physiology, National Institute for Physiological Sciences, Myodaiji, Okazaki 444-8787, Japan, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom, and Sorbonne Universités, Université Pierre et Marie Curie Paris 6, Unité Mixte de Recherche-Scientifique 8246, Neuroscience Paris Seine, Navigation Memory and Aging team, F-75005 Paris, France.

    Pontospinal noradrenergic neurons are thought to form part of a descending endogenous analgesic system that exerts inhibitory influences on spinal nociception. Using optogenetic targeting, we tested the hypothesis that excitation of the locus ceruleus (LC) is antinociceptive. We transduced rat LC neurons by direct injection of a lentiviral vector expressing channelrhodopsin2 under the control of the PRS promoter. Subsequent optoactivation of the LC evoked repeatable, robust, antinociceptive (+4.7°C ± 1.0, p < 0.0001) or pronociceptive (-4.4°C ± 0.7, p < 0.0001) changes in hindpaw thermal withdrawal thresholds. Post hoc anatomical characterization of the distribution of transduced somata referenced against the position of the optical fiber and subsequent further functional analysis showed that antinociceptive actions were evoked from a distinct, ventral subpopulation of LC neurons. Therefore, the LC is capable of exerting potent, discrete, bidirectional influences on thermal nociception that are produced by specific subpopulations of noradrenergic neurons. This reflects an underlying functional heterogeneity of the influence of the LC on the processing of nociceptive information.

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2014;34;12;4148-60

  • Trypsin- and Chymotrypsin-Like Serine Proteases in Schistosoma mansoni - 'The Undiscovered Country'.

    Horn M, Fajtová P, Rojo Arreola L, Ulrychová L, Bartošová-Sojková P, Franta Z, Protasio AV, Opavský D, Vondrášek J, McKerrow JH, Mareš M, Caffrey CR and Dvořák J

    Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Prague, Czech Republic.

    Background: Blood flukes (Schistosoma spp.) are parasites that can survive for years or decades in the vasculature of permissive mammalian hosts, including humans. Proteolytic enzymes (proteases) are crucial for successful parasitism, including aspects of invasion, maturation and reproduction. Most attention has focused on the 'cercarial elastase' serine proteases that facilitate skin invasion by infective schistosome larvae, and the cysteine and aspartic proteases that worms use to digest the blood meal. Apart from the cercarial elastases, information regarding other S. mansoni serine proteases (SmSPs) is limited. To address this, we investigated SmSPs using genomic, transcriptomic, phylogenetic and functional proteomic approaches.

    Genes encoding five distinct SmSPs, termed SmSP1 to SmSP5, some of which comprise disparate protein domains, were retrieved from the S. mansoni genome database and annotated. Reverse transcription quantitative PCR (RT-qPCR) in various schistosome developmental stages indicated complex expression patterns for SmSPs, including their constituent protein domains. SmSP2 stood apart as being massively expressed in schistosomula and adult stages. Phylogenetic analysis segregated SmSPs into diverse clusters of family S1 proteases. SmSP1 to SmSP4 are trypsin-like proteases, whereas SmSP5 is chymotrypsin-like. In agreement, trypsin-like activities were shown to predominate in eggs, schistosomula and adults using peptidyl fluorogenic substrates. SmSP5 is particularly novel in the phylogenetics of family S1 schistosome proteases, as it is part of a cluster of sequences that fill a gap between the highly divergent cercarial elastases and other family S1 proteases.

    Our series of post-genomics analyses clarifies the complexity of schistosome family S1 serine proteases and highlights their interrelationships, including the cercarial elastases and, not least, the identification of a 'missing-link' protease cluster, represented by SmSP5. A framework is now in place to guide the characterization of individual proteases, their stage-specific expression and their contributions to parasitism, in particular, their possible modulation of host physiology.

    PLoS neglected tropical diseases 2014;8;3;e2766

  • Genome-Wide Association Study for Circulating Tissue Plasminogen Activator Levels and Functional Follow-Up Implicates Endothelial STXBP5 and STX2.

    Huang J, Huffman JE, Yamkauchi M, Trompet S, Asselbergs FW, Sabater-Lleal M, Trégouët DA, Chen WM, Smith NL, Kleber ME, Shin SY, Becker DM, Tang W, Dehghan A, Johnson AD, Truong V, Folkersen L, Yang Q, Oudot-Mellkah T, Buckley BM, Moore JH, Williams FM, Campbell H, Silbernagel G, Vitart V, Rudan I, Tofler GH, Navis GJ, Destefano A, Wright AF, Chen MH, de Craen AJ, Worrall BB, Rudnicka AR, Rumley A, Bookman EB, Psaty BM, Chen F, Keene KL, Franco OH, Böhm BO, Uitterlinden AG, Carter AM, Jukema JW, Sattar N, Bis JC, Ikram MA, the Cohorts for Heart and Aging Research in Genome Epidemiology (CHARGE) Consortium Neurology Working Group, Sale MM, McKnight B, Fornage M, Ford I, Taylor K, Slagboom PE, McArdle WL, Hsu FC, Franco-Cereceda A, Goodall AH, Yanek LR, Furie KL, Cushman M, Hofman A, Witteman JC, Folsom AR, Basu S, Matijevic N, van Gilst WH, Wilson JF, Westendorp RG, Kathiresan S, Reilly MP, the CARDIoGRAM Consortium, Tracy RP, Polasek O, Winkelmann BR, Grant PJ, Hillege HL, Cambien F, Stott DJ, Lowe GD, Spector TD, Meigs JB, Marz W, Eriksson P, Becker LC, Morange PE, Soranzo N, Williams SM, Hayward C, van der Harst P, Hamsten A, Lowenstein CJ, Strachan DP, O'Donnell CJ and the CHARGE Consortium Hemostatic Factor Working Group

    From National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA (J.H., A.D.J., C.J.O.); Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD (J.H., A.D.J., C.J.O.); MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, Scotland, United Kingdom (J.E.H., V.V., A.F.W., C.H.); The Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine and Dentistry, Rochester, NY (M.Y., C.J.L.); Departments of Cardiology (S.T., J.W.J.), Gerontology and Geriatrics (S.T., A.J.M.d.C., R.G.J.W.), and Molecular Epidemiology (P.E.S.), Leiden University Medical Center, the Netherlands; Department of Cardiology, Division of Heart and Lungs, University Medical Center Utrecht, Utrecht, the Netherlands (F.W.A.); Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, the Netherlands (F.W.A.); Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, United Kingdom (F.W.A.); Cardiovascular Genetics and Genomics Group, Atherosclerosis Research Unit, Department of Medicine (M.S.-L., L.F., P.E., A.H.), Karolinska Institutet, Karolinska University Hospital, Solna, Stockholm, Sweden; INSERM UMRS 937, Pierre et Marie Curie University, Paris, France (D.-A.T., V.T., T.O.M., F.C.); ICAN Institute for Cardiometabolism and Nutrition, Paris, France (D.-A.T., V.T., F.C.); Departments of Public Health Sciences (W.M.C., B.B.W., F.C.) and Biochemistry and Molecular Genetics (M.M.S.), Center for Public Health Genomics, University of Virginia, Charlottesville, VA; Departments of Epidemiology (N.L.S., B.M.P., B.M.), Medicine (B.M.P., J.C.B.), and Health Services (B.M.P.), University of Washington, Seattle, WA; Group Health Research Institute, Group Health Cooperative, Seattle, WA (N.L.S., B.M.P.); Seattle Epidemiologic Research and Information Center, VA Office of Research and Development, Seattle, WA (N.L.S.); Departments of Internal Medicine II-Cardiology (M.E.K.) and Internal Medicine I (B.O.B.), University of Ulm Medical Centre, Ulm, Germany; Mannheim Institute of Public Health, Medical Faculty of Mannheim, University of Heidelberg, Mannheim, Germany (M.E.K., W.M.); Wellcome Trust Sanger Institute, Hinxton, United Kingdom (S.-Y.S., N.S.); MRC Centre for CAiTE, School of Social and Community Medicine (S.-Y.S.), and ALSPAC Laboratory, Department of Social Medicine (W.L.M.), University of Bristol, Bristol, United Kingdom; Division of Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD (D.M.B., L.R.Y., L.C.B.); Divisions of Epidemiology and Community Health (W.T., A.R.F.) and Biostatistics (S.B.), University of Minnesota, Minneapolis, MN; Departments of Epidemiology (A.D., O.H.F., M.A.I., A.H., J.C.M.W.), Internal Medicine (A.G.U., P.J.G.), Radiology (M.A.I.), and Neurology (M.A.I.), Erasmus Medical Center, Rotterdam, the Netherlands; Netherlands Consortium of Healthy Aging sponsored by Netherlands Genomics Initiative, Leiden, the Netherlands (A.D., O.H.F., A.G.U., P.E.S., A.H., J.C.M.W., R.G.J.W.); Department of Biostatistics, Boston University, Boston, MA (Q.Y., A.D., M.-H.C.); Department of Pharmacology and Therapeutics, University College Cork, Ireland (B.M.B.); Departments of Genetics (J.H.M., S.M.W.) and Community and Family Medicine (J.H.M.), Geisel School of Medicine at Dartmouth, Lebanon, NH; Department of Twin Research and Genetic Epidemiology, King's College London, United Kingdom (F.M.K.W., T.D.S., N.S.); Centre for Population Health Sciences, University of Edinburgh, Scotland, United Kingdom (H.C., I.R., J.F.W.); Department of Angiology, Swiss Cardiovascular Center, Bern, Switzerland (G.S.); Royal North Shore Hospital, University of Sydney, Australia (G.H.T.); Departments of Internal Medicine (G.J.N.) and Cardiology (W.H.v.G., H.L.H., P.v.d.H.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; Division of Population Health Sciences and Education, St George's University of London, London, United Kingdom (A.R.R., D.P.S.); Institute of Cardiovascular and Medical Sciences (A.R., D.J.S., G.D.L.), and Robertson Center for Biostatistics (I.F.), University of Glasgow, Glasgow, United Kingdom; Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD (E.B.B.); Department of Biology and Center for Health Disparities Research, East Carolina University, Greenville, NC (K.L.K.); LKC School of Medicine, Nanyang Technological University, Singapore (B.O.B.); Division of Cardiovascular and Diabetes Research, Leeds University, Leeds, United Kingdom (A.M.C.); Durrer Center for Cardiogenetic Research, Amsterdam, the Netherlands (J.W.J.); Interuniversity Cardiology Institute of the Netherlands, Utrecht, the Netherlands (J.W.J.); BHF Glasgow Cardiovascular Research Centre, Faculty of Medicine, Glasgow, United Kingdom (N.S.); Brown Foundation Institute of Molecular Medicine and Human Genetics Center, Division of Epidemiology, School of Public Health (M.F.), and Hemostasis Laboratory (N.M.), University of Texas Health Science Center at Houston, Houston, TX; Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA (K.T.); Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, NC (F.-C.H.); Cardiothoracic Surgery Unit, Department of Molecular Medicine and Surgery (A.-F.C.), Karolinska Institutet, Stockholm, Sweden; Department of Cardiovascular Sciences, University of Leicester, Leicester, United Kingdom (A.H.G.); The Warren Alpert Medical School of Brown University, Providence, RI (K.L.F.); Departments of Medicine (M.C.) and Pathology (M.C., R.P.T.), University of Vermont, Burlington, VT; Cardiology Division (S.K., C.J.O.), Cardiovascular Research Center (S.K.), Center for Human Genetic Research (S.K.), and General Medicine Division (J.B.M.), Massachusetts General Hospital, Boston, MA; Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA (S.K.); The Cardiovascular Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA (M.R.P.); Department of Public Health, Faculty of Medicine, University of Split, Split, Croatia (O.P.); Cardiology Team Sachsenhausen, Frankfurt am Main, Germany (B.R.W.); Department of Medicine, Harvard Medical School, Boston, MA (J.B.M.); Synlab Academy, Mannheim, Germany (W.M.); Clinical Institute of Medical and Chemical Laboratory Diagnostics, Medical University of Graz, Graz, Austria (W.M.); and INSERM UMRS 1062, Aix-Marseille Université, Marseille, France (P.-E.M.).

    Objective: Tissue plasminogen activator (tPA), a serine protease, catalyzes the conversion of plasminogen to plasmin, the major enzyme responsible for endogenous fibrinolysis. In some populations, elevated plasma levels of tPA have been associated with myocardial infarction and other cardiovascular diseases. We conducted a meta-analysis of genome-wide association studies to identify novel correlates of circulating levels of tPA.

    Fourteen cohort studies with tPA measures (N=26 929) contributed to the meta-analysis. Three loci were significantly associated with circulating tPA levels (P<5.0×10(-8)). The first locus is on 6q24.3, with the lead single nucleotide polymorphism (SNP; rs9399599; P=2.9×10(-14)) within STXBP5. The second locus is on 8p11.21. The lead SNP (rs3136739; P=1.3×10(-9)) is intronic to POLB and <200 kb away from the tPA-encoding gene PLAT. We identified a nonsynonymous SNP (rs2020921) in modest linkage disequilibrium with rs3136739 (r(2)=0.50) within exon 5 of PLAT (P=2.0×10(-8)). The third locus is on 12q24.33, with the lead SNP (rs7301826; P=1.0×10(-9)) within intron 7 of STX2. We further found evidence for the association of lead SNPs in STXBP5 and STX2 with expression levels of the respective transcripts. In in vitro cell studies, silencing STXBP5 decreased the release of tPA from vascular endothelial cells, whereas silencing STX2 increased the tPA release. Through an in silico lookup, we found no associations of the 3 lead SNPs with coronary artery disease or stroke.

    Conclusions: We identified 3 loci associated with circulating tPA levels, the PLAT region, STXBP5, and STX2. Our functional studies implicate a novel role for STXBP5 and STX2 in regulating tPA release.

    Arteriosclerosis, thrombosis, and vascular biology 2014

  • Insertional Mutagenesis and Deep Profiling Reveals Gene Hierarchies and a Myc/p53-Dependent Bottleneck in Lymphomagenesis.

    Huser CA, Gilroy KL, de Ridder J, Kilbey A, Borland G, Mackay N, Jenkins A, Bell M, Herzyk P, van der Weyden L, Adams DJ, Rust AG, Cameron E and Neil JC

    Centre for Virus Research, Institute of Infection, Immunity and Inflammation, College of Medicine, Veterinary Medicine and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

    Retroviral insertional mutagenesis (RIM) is a powerful tool for cancer genomics that was combined in this study with deep sequencing (RIM/DS) to facilitate a comprehensive analysis of lymphoma progression. Transgenic mice expressing two potent collaborating oncogenes in the germ line (CD2-MYC, -Runx2) develop rapid onset tumours that can be accelerated and rendered polyclonal by neonatal Moloney murine leukaemia virus (MoMLV) infection. RIM/DS analysis of 28 polyclonal lymphomas identified 771 common insertion sites (CISs) defining a 'progression network' that encompassed a remarkably large fraction of known MoMLV target genes, with further strong indications of oncogenic selection above the background of MoMLV integration preference. Progression driven by RIM was characterised as a Darwinian process of clonal competition engaging proliferation control networks downstream of cytokine and T-cell receptor signalling. Enhancer mode activation accounted for the most efficiently selected CIS target genes, including Ccr7 as the most prominent of a set of chemokine receptors driving paracrine growth stimulation and lymphoma dissemination. Another large target gene subset including candidate tumour suppressors was disrupted by intragenic insertions. A second RIM/DS screen comparing lymphomas of wild-type and parental transgenics showed that CD2-MYC tumours are virtually dependent on activation of Runx family genes in strong preference to other potent Myc collaborating genes (Gfi1, Notch1). Ikzf1 was identified as a novel collaborating gene for Runx2 and illustrated the interface between integration preference and oncogenic selection. Lymphoma target genes for MoMLV can be classified into (a) a small set of master regulators that confer self-renewal; overcoming p53 and other failsafe pathways and (b) a large group of progression genes that control autonomous proliferation in transformed cells. These findings provide insights into retroviral biology, human cancer genetics and the safety of vector-mediated gene therapy.

    PLoS genetics 2014;10;2;e1004167

  • Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment.

    Iantorno S, Gori K, Goldman N, Gil M and Dessimoz C

    Wellcome Trust Sanger Institute, Cambridge, UK.

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

    Methods in molecular biology (Clifton, N.J.) 2014;1079;59-73

  • The genomic basis of vomeronasal-mediated behaviour.

    Ibarra-Soria X, Levitin MO and Logan DW

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    The vomeronasal organ (VNO) is a chemosensory subsystem found in the nose of most mammals. It is principally tasked with detecting pheromones and other chemical signals that initiate innate behavioural responses. The VNO expresses subfamilies of vomeronasal receptors (VRs) in a cell-specific manner: each sensory neuron expresses just one or two receptors and silences all the other receptor genes. VR genes vary greatly in number within mammalian genomes, from no functional genes in some primates to many hundreds in rodents. They bind semiochemicals, some of which are also encoded in gene families that are coexpanded in species with correspondingly large VR repertoires. Protein and peptide cues that activate the VNO tend to be expressed in exocrine tissues in sexually dimorphic, and sometimes individually variable, patterns. Few chemical ligand-VR-behaviour relationships have been fully elucidated to date, largely due to technical difficulties in working with large, homologous gene families with high sequence identity. However, analysis of mouse lines with mutations in genes involved in ligand-VR signal transduction has revealed that the VNO mediates a range of social behaviours, including male-male and maternal aggression, sexual attraction, lordosis, and selective pregnancy termination, as well as interspecific responses such as avoidance and defensive behaviours. The unusual logic of VR expression now offers an opportunity to map the specific neural circuits that drive these behaviours.

    Mammalian genome : official journal of the International Mammalian Genome Society 2014;25;1-2;75-86

  • A novel RCE1 isoform is required for H-Ras plasma membrane localization and is regulated by USP17.

    Jaworski J, Govender U, McFarlane C, de la Vega M, Greene MK, Rawlings ND, Johnston JA, Scott CJ and Burrows JF

    School of Pharmacy, Queen's University Belfast, McClay Research Building, 97 Lisburn Road, Belfast BT9 7BL, U.K.

    Processing of the 'CaaX' motif found on the C-termini of many proteins, including the proto-oncogene Ras, requires the ER (endoplasmic reticulum)-resident protease RCE1 (Ras-converting enzyme 1) and is necessary for the proper localization and function of many of these 'CaaX' proteins. In the present paper, we report that several mammalian species have a novel isoform (isoform 2) of RCE1 resulting from an alternate splice site and producing an N-terminally truncated protein. We demonstrate that both RCE1 isoform 1 and the newly identified isoform 2 are required to reinstate proper H-Ras processing and thus plasma membrane localization in RCE1-null cells. In addition, we show that the deubiquitinating enzyme USP17 (ubiquitin-specific protease 17), previously shown to modulate RCE1 activity, can regulate the abundance and localization of isoform 2. Furthermore, we show that isoform 2 is ubiquitinated on Lys43 and deubiquitinated by USP17. Collectively, the findings of the present study indicate that RCE1 isoform 2 is required for proper 'CaaX' processing and that USP17 can regulate this via its modulation of RCE1 isoform 2 ubiquitination.

    The Biochemical journal 2014;457;2;289-300

  • RNA-seq Analysis of Host and Viral Gene Expression Highlights Interaction between Varicella Zoster Virus and Keratinocyte Differentiation.

    Jones M, Dry IR, Frampton D, Singh M, Kanda RK, Yee MB, Kellam P, Hollinshead M, Kinchington PR, O'Toole EA and Breuer J

    Division of Infection and Immunity, University College London, London, United Kingdom.

    Varicella zoster virus (VZV) is the etiological agent of chickenpox and shingles, diseases characterized by epidermal skin blistering. Using a calcium-induced keratinocyte differentiation model we investigated the interaction between epidermal differentiation and VZV infection. RNA-seq analysis showed that VZV infection has a profound effect on differentiating keratinocytes, altering the normal process of epidermal gene expression to generate a signature that resembles patterns of gene expression seen in both heritable and acquired skin-blistering disorders. Further investigation by real-time PCR, protein analysis and electron microscopy revealed that VZV specifically reduced expression of specific suprabasal cytokeratins and desmosomal proteins, leading to disruption of epidermal structure and function. These changes were accompanied by an upregulation of kallikreins and serine proteases. Taken together VZV infection promotes blistering and desquamation of the epidermis, both of which are necessary to the viral spread and pathogenesis. At the same time, analysis of the viral transcriptome provided evidence that VZV gene expression was significantly increased following calcium treatment of keratinocytes. Using reporter viruses and immunohistochemistry we confirmed that VZV gene and protein expression in skin is linked with cellular differentiation. These studies highlight the intimate host-pathogen interaction following VZV infection of skin and provide insight into the mechanisms by which VZV remodels the epidermal environment to promote its own replication and spread.

    PLoS pathogens 2014;10;1;e1003896

  • The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.

    Köhler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, Black GC, Brown DL, Brudno M, Campbell J, Fitzpatrick DR, Eppig JT, Jackson AP, Freson K, Girdea M, Helbig I, Hurst JA, Jähn J, Jackson LG, Kelly AM, Ledbetter DH, Mansour S, Martin CL, Moss C, Mumford A, Ouwehand WH, Park SM, Riggs ER, Scott RH, Sisodiya S, Vooren SV, Wapner RJ, Wilkie AO, Wright CF, Vulto-van Silfhout AT, Leeuw Nd, de Vries BB, Washingthon NL, Smith CL, Westerfield M, Schofield P, Ruef BJ, Gkoutos GV, Haendel M, Smedley D, Lewis SE and Robinson PN

    Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany, Lawrence Berkeley National Laboratory, Mail Stop 84R0171, Berkeley, CA 94720, USA, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Department of Medical Genetics, Cambridge University Addenbrooke's Hospital, Cambridge CB2 2QQ, UK, Université Paul Sabatier, Faculté de Chirurgie Dentaire, CHU Toulouse, France, Centre for Genomic Medicine, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre (MAHSC), Manchester, UK, Centre for Genomic Medicine, Institute of Human Development, Faculty of Medical and Human Sciences, University of Manchester, MAHSC, Manchester M13 9WL, UK, Institute of Genetic Medicine. Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK, Department of Computer Science, University of Toronto, Ontario, Canada, Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada, Department of Clinical Genetics, Leeds Teaching Hospitals NHS Trust, Leeds LS2 9NS, UK, MRC Human Genetics Unit, MRC Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK, The Jackson Laboratory, Bar Harbor, ME 04609, USA, Center for Molecular and Vascular Biology, University of Leuven, Belgium, Department of Neuropediatrics, University Medical Center Schleswig-Holstein, Kiel Campus, 24105 Kiel, Germany, NE Thames Genetics Service, Great Ormond Street Hospital, London WC1N 3JH, UK, Drexel University College of Medicine, Philadelphia, PA 19102, USA, Department of Haematology, University of Cambridge and NHS Blood and Transplant Cambridge, CB2 0PT Cambridge, UK, Autism and Developmental Medicine Institute, Geisinger Health System

    The Human Phenotype Ontology (HPO) project provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.

    Funded by: NIH HHS: R24 OD011883

    Nucleic acids research 2014;42;1;D966-74

  • A transcriptional switch underlies commitment to sexual development in malaria parasites.

    Kafsack BF, Rovira-Graells N, Clark TG, Bancells C, Crowley VM, Campino SG, Williams AE, Drought LG, Kwiatkowski DP, Baker DA, Cortés A and Llinás M

    [1] Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA [2] Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA (B.F.C.K.); Department of Molecular Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, State College, Pennsylvania 16802, USA (V.M.C., M.L.).

    The life cycles of many parasites involve transitions between disparate host species, requiring these parasites to go through multiple developmental stages adapted to each of these specialized niches. Transmission of malaria parasites (Plasmodium spp.) from humans to the mosquito vector requires differentiation from asexual stages replicating within red blood cells into non-dividing male and female gametocytes. Although gametocytes were first described in 1880, our understanding of the molecular mechanisms involved in commitment to gametocyte formation is extremely limited, and disrupting this critical developmental transition remains a long-standing goal. Here we show that expression levels of the DNA-binding protein PfAP2-G correlate strongly with levels of gametocyte formation. Using independent forward and reverse genetics approaches, we demonstrate that PfAP2-G function is essential for parasite sexual differentiation. By combining genome-wide PfAP2-G cognate motif occurrence with global transcriptional changes resulting from PfAP2-G ablation, we identify early gametocyte genes as probable targets of PfAP2-G and show that their regulation by PfAP2-G is critical for their wild-type level expression. In the asexual blood-stage parasites pfap2-g appears to be among a set of epigenetically silenced loci prone to spontaneous activation. Stochastic activation presents a simple mechanism for a low baseline of gametocyte production. Overall, these findings identify PfAP2-G as a master regulator of sexual-stage development in malaria parasites and mark the first discovery of a transcriptional switch controlling a differentiation decision in protozoan parasites.

    Funded by: Biotechnology and Biological Sciences Research Council; Howard Hughes Medical Institute; Medical Research Council: G0600230, J005398; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50GM071508; Wellcome Trust: 090532/Z/09/Z, 094752, 098051

    Nature 2014;507;7491;248-52

  • Antibacterial resistance in sub-Saharan Africa: an underestimated emergency.

    Kariuki S and Dougan G

    Centre for Microbiology Research, Kenya Medical Research Institute, Nairobi, Kenya; Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Antibacterial resistance-associated infections are known to increase morbidity, mortality, and cost of treatment, and to potentially put others in the community at higher risk of infections. In high-income countries, where the burden of infectious diseases is relatively modest, resistance to first-line antibacterial agents is usually overcome by use of second- and third-line agents. However, in developing countries where the burden of infectious diseases is high, patients with antibacterial-resistant infections may be unable to obtain or afford effective second-line treatments. In sub-Saharan Africa (SSA), the situation is aggravated by poor hygiene, unreliable water supplies, civil conflicts, and increasing numbers of immunocompromised people, such as those with HIV, which facilitate both the evolution of resistant pathogens and their rapid spread in the community. Because of limited capacity for disease detection and surveillance, the burden of illnesses due to treatable bacterial infections, their specific etiologies, and the awareness of antibacterial resistance are less well established in most of SSA, and therefore the ability to mitigate their consequences is significantly limited.

    Annals of the New York Academy of Sciences 2014

  • Kdm3a lysine demethylase is an Hsp90 client required for cytoskeletal rearrangements during spermatogenesis.

    Kasioulis I, Syred HM, Tate P, Finch A, Shaw J, Seawright A, Fuszard M, Botting CH, Shirran S, Adams IR, Jackson IJ, van Heyningen V and Yeyati PL

    MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK; Edinburgh Cancer Research UK Centre, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1HH, UK; Biomedical Sciences Research Complex Mass Spectrometry and Proteomics Facility, BMS Annexe, North Haugh, University of St Andrews, St Andrews, Fife, KY16 9ST, UK.

    The lysine demethylase Kdm3a (Jhdm2a, Jmjd1a) is required for male fertility, sex determination and metabolic homeostasis through its nuclear role in chromatin remodeling. Many histone-modifying enzymes have additional non-histone substrates, as well as non-enzymatic functions, contributing to the full spectrum of events underlying their biological roles. Here, we present two Kdm3a mouse models that exhibit cytoplasmic defects that may account in part for the globozoospermia phenotype reported previously. Electron microscopy revealed abnormal acrosome, manchette and the absence of implantation fossa at the caudal end of the nucleus in mice without Kdm3a demethylase activity, thus affecting cytoplasmic structures required to elongate the sperm head. We describe an enzymatically active new Kdm3a isoform and show that subcellular distribution, protein levels and lysine demethylation activity of Kdm3a depended on Hsp90. We show that Kdm3a localizes to cytoplasmic structures of maturing spermatids affected in Kdm3a mutant mice which in turn display altered fractionation of β-actin and γ-tubulin. Kdm3a is therefore a multi-functional Hsp90 client protein that participates directly in the regulation of cytoskeletal components.

    Molecular biology of the cell 2014

  • Managing clinically significant findings in research: the UK10K example.

    Kaye J, Hurles M, Griffin H, Grewal J, Bobrow M, Timpson N, Smee C, Bolton P, Durbin R, Dyke S, Fitzpatrick D, Kennedy K, Kent A, Muddyman D, Muntoni F, Raymond LF, Semple R and Spector T

    Nuffield Department of Population Health, HeLEX - Centre for Health, Law and Emerging Technologies, University of Oxford, Oxford, UK.

    Recent advances in sequencing technology allow data on the human genome to be generated more quickly and in greater detail than ever before. Such detail includes findings that may be of significance to the health of the research participant involved. Although research studies generally do not feed back information on clinically significant findings (CSFs) to participants, this stance is increasingly being questioned. There may be difficulties and risks in feeding clinically significant information back to research participants; however, the UK10K consortium sought to address these by creating a detailed management pathway. This was not intended to create any obligation upon the researchers to feed back any CSFs they discovered. Instead, it provides a mechanism to ensure that any such findings can be passed on to the participant where appropriate. This paper describes this mechanism and the specific criteria that must be fulfilled in order for a finding and participant to qualify for feedback. This mechanism could be used by future research consortia, and may also assist in the development of sound principles for dealing with CSFs. European Journal of Human Genetics advance online publication, 15 January 2014; doi:10.1038/ejhg.2013.290.

    European journal of human genetics : EJHG 2014

  • Expression of phosphofructokinase in skeletal muscle is influenced by genetic variation and associated with insulin sensitivity.

    Keildson S, Fadista J, Ladenvall C, Hedman AK, Elgzyri T, Small KS, Grundberg E, Nica AC, Glass D, Richards JB, Barrett A, Nisbet J, Zheng HF, Rönn T, Ström K, Eriksson KF, Prokopenko I, MAGIC Consortium, DIAGRAM Consortium, MuTHER Consortium, Spector TD, Dermitzakis ET, Deloukas P, McCarthy MI, Rung J, Groop L, Franks PW, Lindgren CM and Hansson O

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, U.K.

    Using an integrative approach in which genetic variation, gene expression, and clinical phenotypes are assessed in relevant tissues may help functionally characterize the contribution of genetics to disease susceptibility. We sought to identify genetic variation influencing skeletal muscle gene expression (expression quantitative trait loci [eQTLs]) as well as expression associated with measures of insulin sensitivity. We investigated associations of 3,799,401 genetic variants in expression of >7,000 genes from three cohorts (n = 104). We identified 287 genes with cis-acting eQTLs (false discovery rate [FDR] <5%; P < 1.96 × 10(-5)) and 49 expression-insulin sensitivity phenotype associations (i.e., fasting insulin, homeostasis model assessment-insulin resistance, and BMI) (FDR <5%; P = 1.34 × 10(-4)). One of these associations, fasting insulin/phosphofructokinase (PFKM), overlaps with an eQTL. Furthermore, the expression of PFKM, a rate-limiting enzyme in glycolysis, was nominally associated with glucose uptake in skeletal muscle (P = 0.026; n = 42) and overexpressed (Bonferroni-corrected P = 0.03) in skeletal muscle of patients with T2D (n = 102) compared with normoglycemic controls (n = 87). The PFKM eQTL (rs4547172; P = 7.69 × 10(-6)) was nominally associated with glucose uptake, glucose oxidation rate, intramuscular triglyceride content, and metabolic flexibility (P = 0.016-0.048; n = 178). We explored eQTL results using published data from genome-wide association studies (DIAGRAM and MAGIC), and a proxy for the PFKM eQTL (rs11168327; r(2) = 0.75) was nominally associated with T2D (DIAGRAM P = 2.7 × 10(-3)). Taken together, our analysis highlights PFKM as a potential regulator of skeletal muscle insulin sensitivity.

    Diabetes 2014;63;3;1154-65

  • The Impact of Different DNA Extraction Kits and Laboratories upon the Assessment of Human Gut Microbiota Composition by 16S rRNA Gene Sequencing.

    Kennedy NA, Walker AW, Berry SH, Duncan SH, Farquarson FM, Louis P, Thomson JM, Other members not named within the manuscript author list (alphabetical by surname):, Satsangi J, Flint HJ, Parkhill J, Lees CW and Hold GL

    Gastrointestinal Unit, Centre for Genomic and Experimental Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom.

    Introduction: Determining bacterial community structure in fecal samples through DNA sequencing is an important facet of intestinal health research. The impact of different commercially available DNA extraction kits upon bacterial community structures has received relatively little attention. The aim of this study was to analyze bacterial communities in volunteer and inflammatory bowel disease (IBD) patient fecal samples extracted using widely used DNA extraction kits in established gastrointestinal research laboratories.

    Methods: Fecal samples from two healthy volunteers (H3 and H4) and two relapsing IBD patients (I1 and I2) were investigated. DNA extraction was undertaken using MoBio Powersoil and MP Biomedicals FastDNA SPIN Kit for Soil DNA extraction kits. PCR amplification for pyrosequencing of bacterial 16S rRNA genes was performed in both laboratories on all samples. Hierarchical clustering of sequencing data was done using the Yue and Clayton similarity coefficient.

    Results: DNA extracted using the FastDNA kit and the MoBio kit gave median DNA concentrations of 475 (interquartile range 228-561) and 22 (IQR 9-36) ng/µL respectively (p<0.0001). Hierarchical clustering of sequence data by Yue and Clayton coefficient revealed four clusters. Samples from individuals H3 and I2 clustered by patient; however, samples from patient I1 extracted with the MoBio kit clustered with samples from patient H4 rather than the other I1 samples. Linear modelling on relative abundance of common bacterial families revealed significant differences between kits; samples extracted with MoBio Powersoil showed significantly increased Bacteroidaceae, Ruminococcaceae and Porphyromonadaceae, and lower Enterobacteriaceae, Lachnospiraceae, Clostridiaceae, and Erysipelotrichaceae (p<0.05).

    Conclusion: This study demonstrates significant differences in DNA yield and bacterial DNA composition when comparing DNA extracted from the same fecal sample with different extraction kits. This highlights the importance of ensuring that samples in a study are prepared with the same method, and the need for caution when cross-comparing studies that use different methods.

    PloS one 2014;9;2;e88982

  • Ensembl Genomes 2013: scaling up access to genome-wide data.

    Kersey PJ, Allen JE, Christensen M, Davis P, Falin LJ, Grabmueller C, Hughes DS, Humphrey J, Kerhornou A, Khobova J, Langridge N, McDowall MD, Maheswari U, Maslen G, Nuhn M, Ong CK, Paulini M, Pedro H, Toneva I, Tuli MA, Walts B, Williams G, Wilson D, Youens-Clark K, Monaco MK, Stein J, Wei X, Ware D, Bolser DM, Howe KL, Kulesha E, Lawson D and Staines DM

    The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Wellcome Trust Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK, Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA and USDA-ARS, Cornell University, Ithaca, NY, 14853, USA.

    Ensembl Genomes is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.

    Nucleic acids research 2014;42;1;D546-52

  • Cancer mouse models: Past, present and future.

    Khaled WT and Liu P

    Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, UK. Electronic address:

    The development and advances in gene targeting technology over the past three decades have facilitated the generation of cancer mouse models that recapitulate features of human malignancies. These models have been and still remain instrumental in revealing the complexities of human cancer biology. However, they will need to evolve in the post-genomic era of cancer research. In this review we will highlight some of the key developments over the past decades and will discuss the new possibilities of cancer mouse models in the light of emerging powerful gene manipulating tools.

    Seminars in cell & developmental biology 2014

  • Determinants of invasiveness beneath the capsule of the pneumococcus.

    Klugman KP, Bentley SD and McGee L

    Department of Global Health, Emory University.

    The Journal of infectious diseases 2014;209;3;321-2

  • The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data.

    Koscielny G, Yaikhom G, Iyer V, Meehan TF, Morgan H, Atienza-Herrero J, Blake A, Chen CK, Easty R, Di Fenza A, Fiegel T, Grifiths M, Horne A, Karp NA, Kurbatova N, Mason JC, Matthews P, Oakley DJ, Qazi A, Regnart J, Retha A, Santos LA, Sneddon DJ, Warren J, Westerberg H, Wilson RJ, Melvin DG, Smedley D, Brown SD, Flicek P, Skarnes WC, Mallon AM and Parkinson H

    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Medical Research Council Harwell (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD, UK and Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The International Mouse Phenotyping Consortium (IMPC) web portal provides the biomedical community with a unified point of access to mutant mice and a rich collection of related emerging and existing mouse phenotype data. IMPC mouse clinics worldwide follow rigorous, highly structured and standardized protocols for the experimentation, collection and dissemination of data. Dedicated 'data wranglers' work with each phenotyping center to collate data and perform quality control of data. An automated statistical analysis pipeline has been developed to identify knockout strains with a significant change in the phenotype parameters. Annotation with biomedical ontologies allows biologists and clinicians to easily find mouse strains with phenotypic traits relevant to their research. Data integration with other resources will provide insights into mammalian gene function and human disease. As phenotype data become available for every gene in the mouse, the IMPC web portal will become an invaluable tool for researchers studying the genetic contributions of genes to human diseases.

    Nucleic acids research 2014;42;1;D802-9

  • A Linguistically Informed Autosomal STR Survey of Human Populations Residing in the Greater Himalayan Region.

    Kraaijenbrink T, van der Gaag KJ, Zuniga SB, Xue Y, Carvalho-Silva DR, Tyler-Smith C, Jobling MA, Parkin EJ, Su B, Shi H, Xiao CJ, Tang WR, Kashyap VK, Trivedi R, Sitalaximi T, Banerjee J, Gaselô KT, Tuladhar NM, Opgenort JR, van Driem GL, Barbujani G and de Knijff P

    MGC Department of Human and Clinical Genetics, Leiden University Medical Centre, Leiden, the Netherlands.

    The greater Himalayan region demarcates two of the most prominent linguistic phyla in Asia: Tibeto-Burman and Indo-European. Previous genetic surveys, mainly using Y-chromosome polymorphisms and/or mitochondrial DNA polymorphisms suggested a substantially reduced geneflow between populations belonging to these two phyla. These studies, however, have mainly focussed on populations residing far to the north and/or south of this mountain range, and have not been able to study geneflow patterns within the greater Himalayan region itself. We now report a detailed, linguistically informed, genetic survey of Tibeto-Burman and Indo-European speakers from the Himalayan countries Nepal and Bhutan based on autosomal microsatellite markers and compare these populations with surrounding regions. The genetic differentiation between populations within the Himalayas seems to be much higher than between populations in the neighbouring countries. We also observe a remarkable genetic differentiation between the Tibeto-Burman speaking populations on the one hand and Indo-European speaking populations on the other, suggesting that language and geography have played an equally large role in defining the genetic composition of present-day populations within the Himalayas.

    PloS one 2014;9;3;e91534

  • High risk population isolate reveals low frequency variants predisposing to intracranial aneurysms.

    Kurki MI, Gaál EI, Kettunen J, Lappalainen T, Menelaou A, Anttila V, van 't Hof FN, von Und Zu Fraunberg M, Helisalmi S, Hiltunen M, Lehto H, Laakso A, Kivisaari R, Koivisto T, Ronkainen A, Rinne J, Kiemeney LA, Vermeulen SH, Kaunisto MA, Eriksson JG, Aromaa A, Perola M, Lehtimäki T, Raitakari OT, Salomaa V, Gunel M, Dermitzakis ET, Ruigrok YM, Rinkel GJ, Niemelä M, Hernesniemi J, Ripatti S, de Bakker PI, Palotie A and Jääskeläinen JE

    Neurosurgery, NeuroCenter, Kuopio University Hospital, Kuopio, Finland; Neurosurgery, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland; Department of Neurobiology, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland.

    3% of the population develops saccular intracranial aneurysms (sIAs), a complex trait, with a sporadic and a familial form. Subarachnoid hemorrhage from sIA (sIA-SAH) is a devastating form of stroke. Certain rare genetic variants are enriched in the Finns, a population isolate with a small founder population and bottleneck events. As the sIA-SAH incidence in Finland is >2× increased, such variants may associate with sIA in the Finnish population. We tested 9.4 million variants for association in 760 Finnish sIA patients (enriched for familial sIA), and in 2,513 matched controls with case-control status and with the number of sIAs. The most promising loci (p<5E-6) were replicated in 858 Finnish sIA patients and 4,048 controls. The frequencies and effect sizes of the replicated variants were compared to a continental European population using 717 Dutch cases and 3,004 controls. We discovered four new high-risk loci with low frequency lead variants. Three were associated with the case-control status: 2q23.3 (MAF 2.1%, OR 1.89, p 1.42 × 10(-9)); 5q31.3 (MAF 2.7%, OR 1.66, p 3.17 × 10(-8)); 6q24.2 (MAF 2.6%, OR 1.87, p 1.87 × 10(-11)) and one with the number of sIAs: 7p22.1 (MAF 3.3%, RR 1.59, p 6.08 × 10(-9)). Two of the associations (5q31.3, 6q24.2) replicated in the Dutch sample. The 7p22.1 locus was strongly differentiated; the lead variant was more frequent in Finland (4.6%) than in the Netherlands (0.3%). Additionally, we replicated a previously inconclusive locus on 2q33.1 in all samples tested (OR 1.27, p 1.87 × 10(-12)). The five loci explain 2.1% of the sIA heritability in Finland, and may relate to, but not explain, the increased incidence of sIA-SAH in Finland. This study illustrates the utility of population isolates, familial enrichment, dense genotype imputation and alternate phenotyping in search for variants associated with complex diseases.

    PLoS genetics 2014;10;1;e1004134

  • Design of clone-specific probes from genome sequences for rapid PCR-typing of outbreak pathogens.

    López-Camacho E, Rentero Z, Ruiz-Carrascoso G, Wesselink JJ, Pérez-Vázquez M, Lusa-Bernal S, Gómez-Puertas P, Kingsley RA, Gómez-Sánchez P, Campos J, Oteo J and Mingorance J

    Servicio de Microbiología, Hospital Universitario La Paz, IdiPAZ, Madrid, Spain.

    The genome sequence of one OXA-48-producing Klebsiella pneumoniae belonging to sequence type (ST) 405, and three belonging to ST11, were used to design and test ST-specific PCR assays for typing OXA-48-producing K. pneumoniae. The approach proved to be useful for in-house development of rapid PCR typing assays for local outbreak surveillance.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2014

  • Complete humanization of the mouse immunoglobulin loci enables efficient therapeutic antibody discovery.

    Lee EC, Liang Q, Ali H, Bayliss L, Beasley A, Bloomfield-Gerdes T, Bonoli L, Brown R, Campbell J, Carpenter A, Chalk S, Davis A, England N, Fane-Dremucheva A, Franz B, Germaschewski V, Holmes H, Holmes S, Kirby I, Kosmac M, Legent A, Lui H, Manin A, O'Leary S, Paterson J, Sciarrillo R, Speak A, Spensberger D, Tuffery L, Waddell N, Wang W, Wells S, Wong V, Wood A, Owen MJ, Friedrich GA and Bradley A

    Kymab Ltd., Babraham Research Campus, Cambridge, UK.

    If immunized with an antigen of interest, transgenic mice with large portions of unrearranged human immunoglobulin loci can produce fully human antigen-specific antibodies; several such antibodies are in clinical use. However, technical limitations inherent to conventional transgenic technology and sequence divergence between the human and mouse immunoglobulin constant regions limit the utility of these mice. Here, using repetitive cycles of genome engineering in embryonic stem cells, we have inserted the entire human immunoglobulin variable-gene repertoire (2.7 Mb) into the mouse genome, leaving the mouse constant regions intact. These transgenic mice are viable and fertile, with an immune system resembling that of wild-type mice. Antigen immunization results in production of high-affinity antibodies with long human-like complementarity-determining region 3 (CDR3H), broad epitope coverage and strong signatures of somatic hypermutation. These mice provide a robust system for the discovery of therapeutic human monoclonal antibodies; as a surrogate readout of the human antibody response, they may also aid vaccine design efforts.

    Nature biotechnology 2014

  • Molecular genetic evidence for overlap between general cognitive ability and risk for schizophrenia: a report from the Cognitive Genomics consorTium (COGENT).

    Lencz T, Knowles E, Davies G, Guha S, Liewald DC, Starr JM, Djurovic S, Melle I, Sundet K, Christoforou A, Reinvang I, Mukherjee S, Derosse P, Lundervold A, Steen VM, John M, Espeseth T, Räikkönen K, Widen E, Palotie A, Eriksson JG, Giegling I, Konte B, Ikeda M, Roussos P, Giakoumaki S, Burdick KE, Payton A, Ollier W, Horan M, Donohoe G, Morris D, Corvin A, Gill M, Pendleton N, Iwata N, Darvasi A, Bitsios P, Rujescu D, Lahti J, Hellard SL, Keller MC, Andreassen OA, Deary IJ, Glahn DC and Malhotra AK

    [1] Division of Psychiatry Research, Zucker Hillside Hospital, Glen Oaks, NY, USA [2] Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, Manhasset, NY, USA [3] Hofstra North Shore-LIJ School of Medicine, Departments of Psychiatry and Molecular Medicine, Hempstead, NY, USA.

    It has long been recognized that generalized deficits in cognitive ability represent a core component of schizophrenia (SCZ), evident before full illness onset and independent of medication. The possibility of genetic overlap between risk for SCZ and cognitive phenotypes has been suggested by the presence of cognitive deficits in first-degree relatives of patients with SCZ; however, until recently, molecular genetic approaches to test this overlap have been lacking. Within the last few years, large-scale genome-wide association studies (GWAS) of SCZ have demonstrated that a substantial proportion of the heritability of the disorder is explained by a polygenic component consisting of many common single-nucleotide polymorphisms (SNPs) of extremely small effect. Similar results have been reported in GWAS of general cognitive ability. The primary aim of the present study is to provide the first molecular genetic test of the classic endophenotype hypothesis, which states that alleles associated with reduced cognitive ability should also serve to increase risk for SCZ. We tested the endophenotype hypothesis by applying polygenic SNP scores derived from a large-scale cognitive GWAS meta-analysis (~5000 individuals from nine nonclinical cohorts comprising the Cognitive Genomics consorTium (COGENT)) to four SCZ case-control cohorts. As predicted, cases had significantly lower cognitive polygenic scores compared to controls. In parallel, polygenic risk scores for SCZ were associated with lower general cognitive ability. In addition, using our large cognitive meta-analytic data set, we identified nominally significant cognitive associations for several SNPs that have previously been robustly associated with SCZ susceptibility. Results provide molecular confirmation of the genetic overlap between SCZ and general cognitive ability, and may provide additional insight into pathophysiology of the disorder.

    Molecular psychiatry 2014;19;2;168-74
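
    The polygenic SNP scores mentioned in the abstract above are, in general, weighted sums of allele counts. The following is a minimal sketch of that general idea only, not the authors' pipeline; the SNP names, weights and dosages are hypothetical values chosen for illustration.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum effect weight * allele dosage (0, 1 or 2) over SNPs present in both inputs."""
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)


# Hypothetical per-SNP effect weights, as might come from a GWAS meta-analysis.
weights = {"snp_a": 0.02, "snp_b": -0.01, "snp_c": 0.03}

# Hypothetical allele dosages for one individual.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(person, weights)
```

    In a case-control comparison, scores computed this way for each individual would then be compared between groups; that step is omitted here.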

  • Constitutional and somatic rearrangement of chromosome 21 in acute lymphoblastic leukaemia.

    Li Y, Schwab C, Ryan SL, Papaemmanuil E, Robinson HM, Jacobs P, Moorman AV, Dyer S, Borrow J, Griffiths M, Heerema NA, Carroll AJ, Talley P, Bown N, Telford N, Ross FM, Gaunt L, McNally RJ, Young BD, Sinclair P, Rand V, Teixeira MR, Joseph O, Robinson B, Maddison M, Dastugue N, Vandenberghe P, Haferlach C, Stephens PJ, Cheng J, Van Loo P, Stratton MR, Campbell PJ and Harrison CJ

    [1] Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK [2].

    Changes in gene dosage are a major driver of cancer, known to be caused by a finite, but increasingly well annotated, repertoire of mutational mechanisms. This can potentially generate correlated copy-number alterations across hundreds of linked genes, as exemplified by the 2% of childhood acute lymphoblastic leukaemia (ALL) with recurrent amplification of megabase regions of chromosome 21 (iAMP21). We used genomic, cytogenetic and transcriptional analysis, coupled with novel bioinformatic approaches, to reconstruct the evolution of iAMP21 ALL. Here we show that individuals born with the rare constitutional Robertsonian translocation between chromosomes 15 and 21, rob(15;21)(q10;q10)c, have approximately 2,700-fold increased risk of developing iAMP21 ALL compared to the general population. In such cases, amplification is initiated by a chromothripsis event involving both sister chromatids of the Robertsonian chromosome, a novel mechanism for cancer predisposition. In sporadic iAMP21, breakage-fusion-bridge cycles are typically the initiating event, often followed by chromothripsis. In both sporadic and rob(15;21)c-associated iAMP21, the final stages frequently involve duplications of the entire abnormal chromosome. The end-product is a derivative of chromosome 21 or the rob(15;21)c chromosome with gene dosage optimized for leukaemic potential, showing constrained copy-number levels over multiple linked genes. Thus, dicentric chromosomes may be an important precipitant of chromothripsis, as we show rob(15;21)c to be constitutionally dicentric and breakage-fusion-bridge cycles generate dicentric chromosomes somatically. Furthermore, our data illustrate that several cancer-specific mutational processes, applied sequentially, can coordinate to fashion copy-number profiles over large genomic scales, incrementally refining the fitness benefits of aggregated gene dosage changes.

    Nature 2014

  • Novel skin phenotypes revealed by a genome-wide mouse reverse genetic screen.

    Liakath-Ali K, Vancollie VE, Heath E, Smedley DP, Estabel J, Sunter D, Ditommaso T, White JK, Ramirez-Solis R, Smyth I, Steel KP and Watt FM

    [1] Centre for Stem Cells and Regenerative Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK [2] Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, UK [3] Wellcome Trust-Medical Research Council Stem Cell Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK.

    Permanent stop-and-shop large-scale mouse mutant resources provide an excellent platform to decipher tissue phenogenomics. Here we analyse skin from 538 knockout mouse mutants generated by the Sanger Institute Mouse Genetics Project. We optimize immunolabelling of tail epidermal wholemounts to allow systematic annotation of hair follicle, sebaceous gland and interfollicular epidermal abnormalities using ontology terms from the Mammalian Phenotype Ontology. Of the 50 mutants with an epidermal phenotype, 9 map to human genetic conditions with skin abnormalities. Some mutant genes are expressed in the skin, whereas others are not, indicating systemic effects. One phenotype is affected by diet and several are incompletely penetrant. In-depth analysis of three mutants, Krt76, Myo5a (a model of human Griscelli syndrome) and Mysm1, provides validation of the screen. Our study is the first large-scale genome-wide tissue phenotype screen from the International Knockout Mouse Consortium and provides an open access resource for the scientific community.

    Nature communications 2014;5;3540

  • African origin of the malaria parasite Plasmodium vivax.

    Liu W, Li Y, Shaw KS, Learn GH, Plenderleith LJ, Malenke JA, Sundararaman SA, Ramirez MA, Crystal PA, Smith AG, Bibollet-Ruche F, Ayouba A, Locatelli S, Esteban A, Mouacha F, Guichet E, Butel C, Ahuka-Mundeke S, Inogwabini BI, Ndjango JB, Speede S, Sanz CM, Morgan DB, Gonder MK, Kranzusch PJ, Walsh PD, Georgiev AV, Muller MN, Piel AK, Stewart FA, Wilson ML, Pusey AE, Cui L, Wang Z, Färnert A, Sutherland CJ, Nolder D, Hart JA, Hart TB, Bertolani P, Gillis A, LeBreton M, Tafon B, Kiyang J, Djoko CF, Schneider BS, Wolfe ND, Mpoudi-Ngole E, Delaporte E, Carter R, Culleton RL, Shaw GM, Rayner JC, Peeters M, Hahn BH and Sharp PM

    Department of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.

    Plasmodium vivax is the leading cause of human malaria in Asia and Latin America but is absent from most of central Africa due to the near fixation of a mutation that inhibits the expression of its receptor, the Duffy antigen, on human erythrocytes. The emergence of this protective allele is not understood because P. vivax is believed to have originated in Asia. Here we show, using a non-invasive approach, that wild chimpanzees and gorillas throughout central Africa are endemically infected with parasites that are closely related to human P. vivax. Sequence analyses reveal that ape parasites lack host specificity and are much more diverse than human parasites, which form a monophyletic lineage within the ape parasite radiation. These findings indicate that human P. vivax is of African origin and likely selected for the Duffy-negative mutation. All extant human P. vivax parasites are derived from a single ancestor that escaped out of Africa.

    Funded by: NIAID NIH HHS: P30 AI045008, R01 AI091595, R01 AI58715, R37 AI050529, T32 AI007532; Wellcome Trust: 098051

    Nature communications 2014;5;3346

  • A DERL3-associated defect in the degradation of SLC2A1 mediates the Warburg effect.

    Lopez-Serra P, Marcilla M, Villanueva A, Ramos-Fernandez A, Palau A, Leal L, Wahi JE, Setien-Baranda F, Szczesna K, Moutinho C, Martinez-Cardus A, Heyn H, Sandoval J, Puertas S, Vidal A, Sanjuan X, Martinez-Balibrea E, Viñals F, Perales JC, Bramsem JB, Ørntoft TF, Andersen CL, Tabernero J, McDermott U, Boxer MB, Heiden MG, Albar JP and Esteller M

    Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet, Barcelona, 08908 Catalonia, Spain.

    Cancer cells possess aberrant proteomes that can arise by the disruption of genes involved in physiological protein degradation. Here we demonstrate the presence of promoter CpG island hypermethylation-linked inactivation of DERL3 (Derlin-3), a key gene in the endoplasmic reticulum-associated protein degradation pathway, in human tumours. The restoration of in vitro and in vivo DERL3 activity highlights the tumour suppressor features of the gene. Using the stable isotopic labelling of amino acids in cell culture workflow for differential proteome analysis, we identify SLC2A1 (glucose transporter 1, GLUT1) as a downstream target of DERL3. Most importantly, SLC2A1 overexpression mediated by DERL3 epigenetic loss contributes to the Warburg effect in the studied cells and pinpoints a subset of human tumours with greater vulnerability to drugs targeting glycolysis.

    Nature communications 2014;5;3608

  • A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells.

    Ly T, Ahmad Y, Shlien A, Soroka D, Mills A, Emanuele MJ, Stratton MR and Lamond AI

    Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, United Kingdom.

    Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels for both bulk gene expression and for over ∼6000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics ( DOI:

    eLife 2014;3;e01630

  • Cloning of recombinant monoclonal antibodies from hybridomas in a single Mammalian expression plasmid.

    Müller-Sienerth N, Crosnier C, Wright GJ and Staudt N

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Antibodies are an integral part of biological and medical research. In addition, immunoglobulins are used in many diagnostic tests and are becoming increasingly important in the therapy of diseases. To express antibodies recombinantly, the immunoglobulin heavy and light chains are usually cloned into two different expression plasmids. Here, we describe a method for recombinant antibody expression from a single plasmid.

    Methods in molecular biology (Clifton, N.J.) 2014;1131;229-40

  • The rate of nonallelic homologous recombination in males is highly variable, correlated between monozygotic twins and independent of age.

    MacArthur JA, Spector TD, Lindsay SJ, Mangino M, Gill R, Small KS and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Nonallelic homologous recombination (NAHR) between highly similar duplicated sequences generates chromosomal deletions, duplications and inversions, which can cause diverse genetic disorders. Little is known about interindividual variation in NAHR rates and the factors that influence this. We estimated the rate of deletion at the CMT1A-REP NAHR hotspot in sperm DNA from 34 male donors, including 16 monozygotic (MZ) co-twins (8 twin pairs) aged 24 to 67 years old. The average NAHR rate was 3.5 × 10(-5) with a seven-fold variation across individuals. Despite good statistical power to detect even a subtle correlation, we observed no relationship between age of unrelated individuals and the rate of NAHR in their sperm, likely reflecting the meiotic-specific origin of these events. We then estimated the heritability of deletion rate by calculating the intraclass correlation (ICC) within MZ co-twins, revealing a significant correlation between MZ co-twins (ICC = 0.784, p = 0.0039), with MZ co-twins being significantly more correlated than unrelated pairs. We showed that this heritability cannot be explained by variation in PRDM9, a known regulator of NAHR, or variation within the NAHR hotspot itself. We also did not detect any correlation between Body Mass Index (BMI), smoking status or alcohol intake and rate of NAHR. Our results suggest that other, as yet unidentified, genetic or environmental factors play a significant role in the regulation of NAHR and are responsible for the extensive variation in the population for the probability of fathering a child with a genomic disorder resulting from a pathogenic deletion.

    Funded by: Wellcome Trust: 077014/Z/05/Z

    PLoS genetics 2014;10;3;e1004195
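
    The intraclass correlation (ICC) within twin pairs reported in the abstract above can be illustrated with a one-way ANOVA estimator. This is a sketch under the assumption of a one-way random-effects ICC with two measurements per pair; the abstract does not state which estimator the authors used, and the function name and input values are hypothetical.

```python
from typing import List, Tuple


def icc_oneway(pairs: List[Tuple[float, float]]) -> float:
    """One-way random-effects ICC for paired measurements (k = 2 per pair)."""
    n = len(pairs)
    k = 2
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / 2 for a, b in pairs]
    # Between-pair mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    # Within-pair mean square.
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical deletion rates (arbitrary units) for three twin pairs.
example_pairs = [(1.0, 1.2), (3.0, 2.8), (5.1, 4.9)]
example_icc = icc_oneway(example_pairs)
```

    Values near 1 indicate that co-twins resemble each other far more than unrelated individuals do; values near 0 or below indicate no such resemblance.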

  • Single cell genomics: advances and future perspectives.

    Macaulay IC and Voet T

    Single Cell Genomics Centre, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Advances in whole-genome and whole-transcriptome amplification have permitted the sequencing of the minute amounts of DNA and RNA present in a single cell, offering a window into the extent and nature of genomic and transcriptomic heterogeneity which occurs in both normal development and disease. Single-cell approaches stand poised to revolutionise our capacity to understand the scale of genomic, epigenomic, and transcriptomic diversity that occurs during the lifetime of an individual organism. Here, we review the major technological and biological breakthroughs achieved, describe the remaining challenges to overcome, and provide a glimpse into the promise of recent and future developments.

    PLoS genetics 2014;10;1;e1004126

  • Targeting of Slc25a21 Is Associated with Orofacial Defects and Otitis Media Due to Disrupted Expression of a Neighbouring Gene.

    Maguire S, Estabel J, Ingham N, Pearson S, Ryder E, Carragher DM, Walker N, Sanger MGP Slc25a21 Project Team, Bussell J, Chan WI, Keane TM, Adams DJ, Scudamore CL, Lelliott CJ, Ramírez-Solis R, Karp NA, Steel KP, White JK and Gerdin AK

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Homozygosity for Slc25a21tm1a(KOMP)Wtsi results in mice exhibiting orofacial abnormalities, alterations in carpal and rugae structures, hearing impairment and inflammation in the middle ear. In humans it has been hypothesised that the 2-oxoadipate mitochondrial carrier coded by SLC25A21 may be involved in the disease 2-oxoadipate acidaemia. Unexpectedly, no 2-oxoadipate acidaemia-like symptoms were observed in animals homozygous for Slc25a21tm1a(KOMP)Wtsi despite confirmation that this allele reduces Slc25a21 expression by 71.3%. To study the complete knockout, an allelic series was generated using the loxP and FRT sites typical of a Knockout Mouse Project allele. After removal of the critical exon and neomycin selection cassette, Slc25a21 knockout mice homozygous for the Slc25a21tm1b(KOMP)Wtsi and Slc25a21tm1d(KOMP)Wtsi alleles were phenotypically indistinguishable from wild-type. This led us to explore the genomic environment of Slc25a21 and to discover that expression of Pax9, located 3' of the target gene, was reduced in homozygous Slc25a21tm1a(KOMP)Wtsi mice. We hypothesize that the presence of the selection cassette is the cause of the down regulation of Pax9 observed. The phenotypes we observed in homozygous Slc25a21tm1a(KOMP)Wtsi mice were broadly consistent with a hypomorphic Pax9 allele with the exception of otitis media and hearing impairment which may be a novel consequence of Pax9 down regulation. We explore the ramifications associated with this particular targeted mutation and emphasise the need to interpret phenotypes taking into consideration all potential underlying genetic mechanisms.

    PloS one 2014;9;3;e91807

  • Fc gamma Receptor IIa-H131R Polymorphism and Malaria Susceptibility in Sympatric Ethnic Groups, Fulani and Dogon of Mali.

    Maiga B, Dolo A, Touré O, Dara V, Tapily A, Campino S, Sepulveda N, Corran P, Rockett K, Clark TG, Troye Blomberg M and Doumbo OK

    Malaria Research and Training Center/Department of Epidemiology of Parasitic Diseases/Faculty of Medicine, Pharmacy and Odonto - Stomatology, Bamako/USTTB, Mali; Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.

    It has been previously shown that there are some interethnic differences in susceptibility to malaria between two sympatric ethnic groups of Mali, the Fulani and the Dogon. The lower susceptibility to Plasmodium falciparum malaria seen in the Fulani has not been fully explained by genetic polymorphisms previously known to be associated with malaria resistance, including haemoglobin S (HbS), haemoglobin C (HbC), alpha-thalassaemia and glucose-6-phosphate dehydrogenase (G6PD) deficiency. Given the observed differences in the distribution of FcγRIIa allotypes among different ethnic groups and with malaria susceptibility that have been reported, we analysed the rs1801274-R131H polymorphism in the FcγRIIa gene in a study of Dogon and Fulani in Mali (n = 939). We confirm that the Fulani have lower parasite densities, lower parasite prevalence, more spleen enlargement and higher levels of total IgG antibodies (anti-CSP, anti-AMA1, anti-MSP1 and anti-MSP2) and more total IgE (P < 0.05) compared with the Dogon ethnic group. Furthermore, the Fulani exhibit higher frequencies of the blood group O (56.5%) compared with the Dogon (43.5%) (P < 0.001). With regard to the FcγRIIa polymorphism and allele frequency, the Fulani group have a higher frequency of the H allele (Fulani 0.474, Dogon 0.341, P < 0.0001), which was associated with greater total IgE production (P = 0.004). Our findings show that the FcγRIIa polymorphism might have an implication in the relative protection seen in the Fulani tribe, with confirmatory studies required in other malaria endemic settings.

    Scandinavian journal of immunology 2014;79;1;43-50

  • Characterization of Vibrio cholerae Bacteriophages Isolated from the Environmental Waters of the Lake Victoria Region of Kenya.

    Maina AN, Mwaura FB, Oyugi J, Goulding D, Toribio AL and Kariuki S

    School of Biological Sciences, University of Nairobi, Nairobi, Kenya.

    Over the last decade, cholera outbreaks have become common in some parts of Kenya. The most recent cholera outbreak occurred in the Coastal and Lake Victoria regions between January 2009 and May 2010, where a total of 11,769 cases and 274 deaths were reported by the Ministry of Public Health and Sanitation. The objective of this study is to isolate Vibrio cholerae bacteriophages from the environmental waters of the Lake Victoria region of Kenya with potential for use as a biocontrol for cholera outbreaks. Water samples from wells, ponds, sewage effluent, boreholes, rivers, and lakes of the Lake Victoria region of Kenya were enriched for 48 h at 37 °C in broth containing an environmental strain of V. cholerae. Bacteriophages were isolated from 5 out of the 42 environmental water samples taken. Isolated phages produced tiny, round, and clear plaques suggesting that these phages were lytic to V. cholerae. Transmission electron microscope examination revealed that all the nine phages belonged to the family Myoviridae, with typical icosahedral heads, long contractile tails, and fibers. Heads had an average diameter of 88.3 nm, and tails an average length and width of 84.9 and 16.1 nm, respectively. Vibriophages isolated from the Lake Victoria region of Kenya have been characterized and the isolated phages may have a potential to be used as antibacterial agents to control pathogenic V. cholerae bacteria in water reservoirs.

    Current microbiology 2014;68;1;64-70

  • Parallel dynamics and evolution: Protein conformational fluctuations and assembly reflect evolutionary changes in sequence and structure.

    Marsh JA and Teichmann SA

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Protein structure is dynamic: the intrinsic flexibility of polypeptides facilitates a range of conformational fluctuations, and individual protein chains can assemble into complexes. Proteins are also dynamic in evolution: significant variations in secondary, tertiary and quaternary structure can be observed among divergent members of a protein family. Recent work has highlighted intriguing similarities between these structural and evolutionary dynamics occurring at various levels. Here we review evidence showing how evolutionary changes in protein sequence and structure are often closely related to local protein flexibility and disorder, large-scale motions and quaternary structure assembly. We suggest that these correspondences can be largely explained by neutral evolution, while deviations between structural and evolutionary dynamics can provide valuable functional insights. Finally, we address future prospects for the field and practical applications that arise from a deeper understanding of the intimate relationship between protein structure, dynamics, function and evolution.

    BioEssays : news and reviews in molecular, cellular and developmental biology 2014;36;2;209-18

  • Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression.

    Marttinen P, Pirinen M, Sarin AP, Gillberg J, Kettunen J, Surakka I, Kangas AJ, Soininen P, O'Reilly PF, Kaakinen M, Kähönen M, Lehtimäki T, Ala-Korpela M, Raitakari OT, Salomaa V, Järvelin MR, Ripatti S and Kaski S

    Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Finland, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, Finland, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, University of Oulu, Finland, Biocenter Oulu, University of Oulu, Finland, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, University of Turku and Turku University Hospital, Turku, Finland, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Finland, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland, Unit of Primary Care, Oulu University Hospital, Finland, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute, University of Helsinki, Helsinki, Finland, Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Finland.

    Motivation: A typical genome-wide association study searches for associations between single nucleotide polymorphisms (SNPs) and a univariate phenotype. However, there is growing interest in investigating associations between genomics data and multivariate phenotypes, for example in gene expression or metabolomics studies. A common approach is to perform a univariate test between each genotype-phenotype pair, and then to apply a stringent significance cutoff to account for the large number of tests performed. However, this approach has limited ability to uncover dependencies involving multiple variables. Another trend in current genetics is the investigation of the impact of rare variants on the phenotype, where the standard methods often fail due to lack of power when the minor allele is present in only a limited number of individuals.

    Results: We propose a new statistical approach based on Bayesian reduced rank regression to assess the impact of multiple SNPs on a high-dimensional phenotype. Due to the method's ability to combine information over multiple SNPs and phenotypes, it is particularly suitable for detecting associations involving rare variants. We demonstrate the potential of our method and compare it with alternatives using the Northern Finland Birth Cohort with 4,702 individuals, for whom genome-wide SNP data along with lipoprotein profiles comprising 74 traits are available. We discovered two genes (XRCC4 and MTHFD2L) without previously reported associations, which replicated in a combined analysis of two additional cohorts: 2,390 individuals from the Cardiovascular Risk in Young Finns study and 3,659 individuals from the FINRISK Study.

    R-code freely available for download at


    Bioinformatics (Oxford, England) 2014

  • Identification of novel genetic Loci associated with thyroid peroxidase antibodies and clinical thyroid disease.

    Medici M, Porcu E, Pistis G, Teumer A, Brown SJ, Jensen RA, Rawal R, Roef GL, Plantinga TS, Vermeulen SH, Lahti J, Simmonds MJ, Husemoen LL, Freathy RM, Shields BM, Pietzner D, Nagy R, Broer L, Chaker L, Korevaar TI, Plia MG, Sala C, Völker U, Richards JB, Sweep FC, Gieger C, Corre T, Kajantie E, Thuesen B, Taes YE, Visser WE, Hattersley AT, Kratzsch J, Hamilton A, Li W, Homuth G, Lobina M, Mariotti S, Soranzo N, Cocca M, Nauck M, Spielhagen C, Ross A, Arnold A, van de Bunt M, Liyanarachchi S, Heier M, Grabe HJ, Masciullo C, Galesloot TE, Lim EM, Reischl E, Leedman PJ, Lai S, Delitala A, Bremner AP, Philips DI, Beilby JP, Mulas A, Vocale M, Abecasis G, Forsen T, James A, Widen E, Hui J, Prokisch H, Rietzschel EE, Palotie A, Feddema P, Fletcher SJ, Schramm K, Rotter JI, Kluttig A, Radke D, Traglia M, Surdulescu GL, He H, Franklyn JA, Tiller D, Vaidya B, de Meyer T, Jørgensen T, Eriksson JG, O'Leary PC, Wichmann E, Hermus AR, Psaty BM, Ittermann T, Hofman A, Bosi E, Schlessinger D, Wallaschofski H, Pirastu N, Aulchenko YS, de la Chapelle A, Netea-Maier RT, Gough SC, Meyer Zu Schwabedissen H, Frayling TM, Kaufman JM, Linneberg A, Räikkönen K, Smit JW, Kiemeney LA, Rivadeneira F, Uitterlinden AG, Walsh JP, Meisinger C, den Heijer M, Visser TJ, Spector TD, Wilson SG, Völzke H, Cappola A, Toniolo D, Sanna S, Naitza S and Peeters RP

    Department of Internal Medicine, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands.

    Autoimmune thyroid diseases (AITD) are common, affecting 2-5% of the general population. Individuals with positive thyroid peroxidase antibodies (TPOAbs) have an increased risk of autoimmune hypothyroidism (Hashimoto's thyroiditis), as well as autoimmune hyperthyroidism (Graves' disease). As the possible causative genes of TPOAbs and AITD remain largely unknown, we performed GWAS meta-analyses in 18,297 individuals for TPOAb-positivity (1769 TPOAb-positives and 16,528 TPOAb-negatives) and in 12,353 individuals for TPOAb serum levels, with replication in 8,990 individuals. Significant associations (P<5×10(-8)) were detected at TPO-rs11675434, ATXN2-rs653178, and BACH2-rs10944479 for TPOAb-positivity, and at TPO-rs11675434, MAGI3-rs1230666, and KALRN-rs2010099 for TPOAb levels. Individual and combined effects (genetic risk scores) of these variants on (subclinical) hypo- and hyperthyroidism, goiter and thyroid cancer were studied. Individuals with a high genetic risk score had, besides an increased risk of TPOAb-positivity (OR: 2.18, 95% CI 1.68-2.81, P = 8.1×10(-8)), a higher risk of increased thyroid-stimulating hormone levels (OR: 1.51, 95% CI 1.26-1.82, P = 2.9×10(-6)), as well as a decreased risk of goiter (OR: 0.77, 95% CI 0.66-0.89, P = 6.5×10(-4)). The MAGI3 and BACH2 variants were associated with an increased risk of hyperthyroidism, which was replicated in an independent cohort of patients with Graves' disease (OR: 1.37, 95% CI 1.22-1.54, P = 1.2×10(-7) and OR: 1.25, 95% CI 1.12-1.39, P = 6.2×10(-5)). The MAGI3 variant was also associated with an increased risk of hypothyroidism (OR: 1.57, 95% CI 1.18-2.10, P = 1.9×10(-3)). This first GWAS meta-analysis for TPOAbs identified five newly associated loci, three of which were also associated with clinical thyroid disease. With these markers we identified a large subgroup in the general population with a substantially increased risk of TPOAbs. 
    The results provide insight into why individuals with thyroid autoimmunity do or do not eventually develop thyroid disease, and these markers may therefore predict which TPOAb-positives are particularly at risk of developing clinical thyroid dysfunction.

    PLoS genetics 2014;10;2;e1004123

  • The sex-specific associations of the aromatase gene with Alzheimer's disease and its interaction with IL10 in the Epistasis Project.

    Medway C, Combarros O, Cortina-Borja M, Butler HT, Ibrahim-Verbaas CA, de Bruijn RF, Koudstaal PJ, van Duijn CM, Ikram MA, Mateo I, Sánchez-Juan P, Lehmann MG, Heun R, Kölsch H, Deloukas P, Hammond N, Coto E, Alvarez V, Kehoe PG, Barber R, Wilcock GK, Brown K, Belbin O, Warden DR, Smith AD, Morgan K and Lehmann DJ

    Human Genetics Research, Queens Medical Centre, School of Molecular Medical Sciences, University of Nottingham, Nottingham, UK.

    Epistasis between interleukin-10 (IL10) and aromatase gene polymorphisms has previously been reported to modify the risk of Alzheimer's disease (AD). However, although the main effects of aromatase variants suggest a sex-specific effect in AD, there has been insufficient power to detect sex-specific epistasis between these genes to date. Here we used the cohort of 1757 AD patients and 6294 controls in the Epistasis Project. We replicated the previously reported main effects of aromatase polymorphisms in AD risk in women, for example, adjusted odds ratio of disease for rs1065778 GG=1.22 (95% confidence interval: 1.01-1.48, P=0.03). We also confirmed a reported epistatic interaction between IL10 rs1800896 and aromatase (CYP19A1) rs1062033, again only in women: adjusted synergy factor=1.94 (95% confidence interval: 1.16-3.25, P=0.01). Aromatase, a rate-limiting enzyme in the synthesis of estrogens, is expressed in AD-relevant brain regions, and is downregulated during the disease. IL-10 is an anti-inflammatory cytokine. Given that estrogens have neuroprotective and anti-inflammatory activities and regulate microglial cytokine production, epistasis is biologically plausible. Diminishing serum estrogen in postmenopausal women, coupled with suboptimal brain estrogen synthesis, may contribute to the inflammatory state that is a pathological hallmark of AD.

    European journal of human genetics : EJHG 2014;22;2;216-20

  • Community Case Clusters of Middle East Respiratory Syndrome Coronavirus in Hafr Al-Batin, Kingdom of Saudi Arabia: A Descriptive Genomic study.

    Memish ZA, Cotten M, Watson SJ, Kellam P, Zumla A, Alhakeem RF, Assiri A, Rabeeah AA and Al-Tawfiq JA

    Global Centre for Mass Gatherings Medicine (GCMGM), Ministry of Health, Riyadh, Kingdom of Saudi Arabia; College of Medicine, Alfaisal University, Riyadh, Kingdom of Saudi Arabia. Electronic address:

    The Middle East respiratory syndrome coronavirus (MERS-CoV) was first described in September 2012 and has caused a total of 191 cases of MERS-CoV infection with 82 deaths. Camels have been implicated as the reservoir of MERS-CoV, but the exact source and mode of transmission for most patients remain unknown. During a 3 month period, June to August 2013, there were 12 positive MERS-CoV cases reported from the Hafr Al-Batin district in the north east region of the Kingdom of Saudi Arabia. In addition to the different regional camel festivals in neighboring countries, Hafr Al-Batin has the biggest camel market in the entire Kingdom and hosts an annual camel festival. Thus, we conducted a detailed epidemiological, clinical and genomic study to ascertain common exposure and transmission patterns of all cases of MERS-CoV reported from Hafr Al-Batin. The genetic data indicated that at least two of the infected contacts could not have been directly infected from the index patient and an alternate source should be considered. Camels appear as the likely source but other animals have not been ruled out. More detailed case control studies with detailed case histories, epidemiological information and genomic analysis are being conducted to delineate the missing pieces in the transmission dynamics of the MERS-CoV outbreak.

    International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases 2014

  • Online questionnaire development: Using film to engage participants and then gather attitudes towards the sharing of genomic data.

    Middleton A, Bragin E, Morley KI, Parker M and on behalf of the DDD Study

    Wellcome Trust Sanger Institute, Cambridge, UK. Electronic address:

    How can a researcher engage a participant in a survey, when the subject matter may be perceived as 'challenging' or even be totally unfamiliar to the participant? The Genomethics study addressed this via the creation and delivery of a novel online questionnaire containing 10 integrated films. The films documented various ethical dilemmas raised by genomic technologies and the survey ascertained attitudes towards these. Participants were recruited into the research using social media, traditional media and email invitation. The film-survey strategy was successful: 11,336 initial hits on the survey website led to 6944 completed surveys. Participants ranged from those who knew nothing of the subject matter through to experts in the field of genomics (61% compliance rate); 72% of participants answered every single question. This paper summarises the survey design process and validation methods applied. The recruitment strategy and results from the survey are presented elsewhere.

    Social science research 2014;44C;211-223

  • Finding people who will tell you their thoughts on genomics-recruitment strategies for social sciences research.

    Middleton A, Bragin E, Parker M and on behalf of the DDD Study

    Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK.

    This paper offers a description of how social media, traditional media and direct invitation were used as tools for the recruitment of 6,944 research participants for a social sciences study on genomics. The remit was to gather the views of various stakeholders towards sharing incidental findings from whole genome studies. This involved recruiting members of the public, genetic health professionals, genomic researchers and non-genetic health professionals. A novel survey was designed that contained ten integrated films; this was made available online and open for completion by anyone worldwide. The recruitment methods are described together with the convenience and snowballing sampling framework. The most successful strategy involved the utilisation of social media; Facebook, blogging, Twitter, LinkedIn and Google Ads led to the ascertainment of over 75% of the final sample. We conclude that the strategies used were successful in recruiting an eclectic mix of appropriate participants. Design of the survey and results from the study are presented separately.

    Journal of community genetics 2014

  • Position statement on opportunistic genomic screening from the Association of Genetic Nurses and Counsellors (UK and Ireland).

    Middleton A, Patch C, Wiggins J, Barnes K, Crawford G, Benjamin C and Bruce A

    Human Genetics, Wellcome Trust Sanger Institute, Cambridge, UK.

    The American College of Medical Genetics and Genomics released recommendations for reporting incidental findings (IFs) in clinical exome and genome sequencing. These suggest 'opportunistic genomic screening' should be available to both adults and children each time a sequence is done and would be undertaken without seeking preferences from the patient first. Should opportunistic genomic screening be implemented in the United Kingdom, the Association of Genetic Nurses and Counsellors (AGNC), which represents British and Irish genetic counsellors and nurses, feels strongly that the following must be considered (see article for complete list): (1) Following appropriate genetic counselling, patients should be allowed to consent to or opt out of opportunistic genomic screening. (2) If true IFs are discovered the AGNC are guided by the report from the Joint Committee on Medical Genetics about the sharing of genetic testing results. (3) Children should not be routinely tested for adult-onset conditions. (4) The formation of a list of variants should involve a representative from the AGNC as well as a patient support group. (5) The variants should be for serious or life-threatening conditions for which there are treatments or preventative strategies available. (6) There needs to be robust evidence that the benefits of opportunistic screening outweigh the potential harms. (7) The clinical validity and utility of variants should be known. (8) There must be a quality assurance framework that operates to international standards for laboratory testing. (9) Psychosocial research is urgently needed in this area to understand the impact on patients. European Journal of Human Genetics advance online publication, 8 January 2014; doi:10.1038/ejhg.2013.301.

    European journal of human genetics : EJHG 2014

  • Metabolic and Target-Site Mechanisms Combine to Confer Strong DDT Resistance in Anopheles gambiae.

    Mitchell SN, Rigden DJ, Dowd AJ, Lu F, Wilding CS, Weetman D, Dadzie S, Jenkins AM, Regna K, Boko P, Djogbenou L, Muskavitch MA, Ranson H, Paine MJ, Mayans O and Donnelly MJ

    Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom.

    The development of resistance to insecticides has become a classic exemplar of evolution occurring within human time scales. In this study we demonstrate how resistance to DDT in the major African malaria vector Anopheles gambiae is a result of both target-site resistance mechanisms that have introgressed between incipient species (the M- and S-molecular forms) and allelic variants in a DDT-detoxifying enzyme. Sequencing of the detoxification enzyme, Gste2, from DDT resistant and susceptible strains of An. gambiae, revealed a non-synonymous polymorphism (I114T), proximal to the DDT binding domain, which segregated with strain phenotype. Recombinant protein expression and DDT metabolism analysis revealed that the proteins from the susceptible strain lost activity at higher DDT concentrations, characteristic of substrate inhibition. The effect of I114T on GSTE2 protein structure was explored through X-ray crystallography. The amino acid exchange in the DDT-resistant strain introduced a hydroxyl group near the hydrophobic DDT-binding region. The exchange does not result in structural alterations but is predicted to facilitate local dynamics and enzyme activity. Expression of both wild-type and 114T alleles in Drosophila conferred an increase in DDT tolerance. The 114T mutation was significantly associated with DDT resistance in wild-caught M-form populations and acts in concert with target-site mutations in the voltage-gated sodium channel (Vgsc-1575Y and Vgsc-1014F) to confer extreme levels of DDT resistance in wild-caught An. gambiae.

    PloS one 2014;9;3;e92662

  • Genome-wide analysis of selection on the malaria parasite Plasmodium falciparum in West African populations of differing infection endemicity.

    Mobegi VA, Duffy CW, Amambua-Ngwa A, Loua KM, Laman E, Nwakanma DC, Macinnis B, Aspeling-Jones H, Murray L, Clark TG, Kwiatkowski DP and Conway DJ

    Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, London, UK; Medical Research Council Unit, Fajara, Banjul, The Gambia; National Institute of Public Health, Conakry, Republic of Guinea; The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Locally varying selection on pathogens may be due to differences in drug pressure, host immunity, transmission opportunities between hosts, or the intensity of between-genotype competition within hosts. Highly recombining populations of the human malaria parasite Plasmodium falciparum throughout West Africa are closely related, as gene flow is relatively unrestricted in this endemic region, but markedly varying ecology and transmission intensity should cause distinct local selective pressures. Genome-wide analysis of sequence variation was undertaken on a sample of 100 P. falciparum clinical isolates from a highly endemic region of the Republic of Guinea where transmission occurs for most of each year, and compared with data from 52 clinical isolates from a previously sampled population from The Gambia where there is relatively limited seasonal malaria transmission. Paired-end short read sequences were mapped against the 3D7 P. falciparum reference genome sequence, and data on 136144 SNPs were obtained. Within-population analyses identifying loci showing evidence of recent positive directional selection and balancing selection confirm that antimalarial drugs and host immunity have been major selective agents. Many of the signatures of recent directional selection reflected by standardised integrated haplotype scores (|iHS|) were population-specific, including differences at drug resistance loci due to historically different antimalarial use between the countries. In contrast, both populations showed a similar set of loci likely to be under balancing selection as indicated by very high Tajima's D values, including a significant over-representation of genes expressed at the merozoite stage that invades erythrocytes, and several previously validated targets of acquired immunity. 
    Between-population FST analysis identified exceptional differentiation of allele frequencies at a small number of loci, most markedly for five SNPs covering a 15kb region within and flanking the gdv1 gene that regulates the early stages of gametocyte development, which is likely related to the extreme differences in mosquito vector abundance and seasonality which determine the transmission opportunities for the sexual stage of the parasite.

    Molecular biology and evolution 2014

  • Widespread epidemic cholera caused by a restricted subset of Vibrio cholerae clones.

    Moore S, Thomson N, Mutreja A and Piarroux R

    Aix-Marseille University, UMR MD3, Marseilles, France.

    Since 1817, seven cholera pandemics have plagued humankind. As the causative agent, Vibrio cholerae, is autochthonous in the aquatic ecosystem and some studies have revealed links between outbreaks and fluctuations in climatic and aquatic conditions, it has been widely assumed that cholera epidemics are triggered by environmental factors that promote the growth of local bacterial reservoirs. However, mounting epidemiological findings and genome sequence analysis of clinical isolates have indicated that epidemics are largely unassociated with most of the V. cholerae strains in aquatic ecosystems. Instead, only a specific subset of V. cholerae El Tor 'types' appears to be responsible for current epidemics. A recent report examining the evolution of a variety of V. cholerae strains indicates that the current pandemic is monophyletic and originated from a single ancestral clone that has spread globally in successive waves. In this review, we examine the clonal nature of the disease, with the example of the recent history of cholera in the Americas. Epidemiological data and genome sequence-based analysis of V. cholerae isolates demonstrate that the cholera epidemics of the 1990s in South America were triggered by the importation of a pathogenic V. cholerae strain that gradually spread throughout the region until local outbreaks ceased in 2001. Latin America remained almost unaffected by the disease until a new toxigenic V. cholerae clone was imported into Haiti in 2010. Overall, cholera appears to be largely caused by a subset of specific V. cholerae clones rather than by the vast diversity of V. cholerae strains in the environment.

    Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2014

  • Reciprocal duplication of the Williams-Beuren syndrome deletion on chromosome 7q11.23 is associated with schizophrenia.

    Mulle JG, Pulver AE, McGrath JA, Wolyniec PS, Dodd AF, Cutler DJ, Sebat J, Malhotra D, Nestadt G, Conrad DF, Hurles M, Barnes CP, Ikeda M, Iwata N, Levinson DF, Gejman PV, Sanders AR, Duan J, Mitchell AA, Peter I, Sklar P, O'Dushlaine CT, Grozeva D, O'Donovan MC, Owen MJ, Hultman CM, Kähler AK, Sullivan PF, Molecular Genetics of Schizophrenia Consortium, Kirov G and Warren ST

    Department of Epidemiology, Rollins School of Public Health, Emory University; Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia. Electronic address:

    Background: Several copy number variants (CNVs) have been implicated as susceptibility factors for schizophrenia (SZ). Some of these same CNVs also increase risk for autism spectrum disorders, suggesting an etiologic overlap between these conditions. Recently, de novo duplications of a region on chromosome 7q11.23 were associated with autism spectrum disorders. The reciprocal deletion of this region causes Williams-Beuren syndrome.

    Methods: We assayed an Ashkenazi Jewish cohort of 554 SZ cases and 1014 controls for genome-wide CNV. An excess of large rare and de novo CNVs was observed, including a 1.4 Mb duplication on chromosome 7q11.23 identified in two unrelated patients. To test whether this 7q11.23 duplication is also associated with SZ, we obtained data for 14,387 SZ cases and 28,139 controls from seven additional studies with high-resolution genome-wide CNV detection. We performed a meta-analysis, correcting for study population of origin, to assess whether the duplication is associated with SZ.

    Results: We found duplications at 7q11.23 in 11 of 14,387 SZ cases with only 1 in 28,139 control subjects (unadjusted odds ratio 21.52, 95% confidence interval: 3.13-922.6, p value 5.5 × 10(-5); adjusted odds ratio 10.8, 95% confidence interval: 1.46-79.62, p value .007). Of three SZ duplication carriers with detailed retrospective data, all showed social anxiety and language delay premorbid to SZ onset, consistent with both human studies and animal models of the 7q11.23 duplication.

    Conclusions: We have identified a new CNV associated with SZ. Reciprocal duplication of the Williams-Beuren syndrome deletion at chromosome 7q11.23 confers an approximately tenfold increase in risk for SZ.
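
    As an illustrative check on the unadjusted figure quoted in the Results, the carrier counts given there (11 of 14,387 cases, 1 of 28,139 controls) can be turned into a simple cross-product odds ratio. This is a minimal sketch and not the authors' analysis: the function name is hypothetical, and the plain sample estimator used here is an assumption, which is probably why it gives about 21.5 rather than exactly the reported 21.52. The adjusted odds ratio of 10.8 corrects for study population and cannot be reproduced from these pooled counts.

```python
def cross_product_odds_ratio(case_carriers, case_total,
                             control_carriers, control_total):
    """Sample odds ratio (a*d)/(b*c) from a 2x2 carrier/non-carrier table.

    Hypothetical helper for illustration only; it applies no continuity
    correction and does not compute a confidence interval.
    """
    a = case_carriers                      # cases carrying the duplication
    b = case_total - case_carriers         # cases without it
    c = control_carriers                   # controls carrying the duplication
    d = control_total - control_carriers   # controls without it
    return (a * d) / (b * c)


# Counts as stated in the Results paragraph above.
unadjusted = cross_product_odds_ratio(11, 14387, 1, 28139)
print(f"unadjusted sample odds ratio: {unadjusted:.1f}")
```

    With a single carrier among the controls, the estimate is very sensitive to that one observation, which is consistent with the wide confidence interval reported.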

    Funded by: Medical Research Council: G0800509; NIMH NIH HHS: R01 MH080129, U01 MH094411

    Biological psychiatry 2014;75;5;371-7

  • Transmissible [corrected] dog cancer genome reveals the origin and history of an ancient cell lineage.

    Murchison EP, Wedge DC, Alexandrov LB, Fu B, Martincorena I, Ning Z, Tubio JM, Werner EI, Allen J, De Nardi AB, Donelan EM, Marino G, Fassati A, Campbell PJ, Yang F, Burt A, Weiss RA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Canine transmissible venereal tumor (CTVT) is the oldest known somatic cell lineage. It is a transmissible cancer that propagates naturally in dogs. We sequenced the genomes of two CTVT tumors and found that CTVT has acquired 1.9 million somatic substitution mutations and bears evidence of exposure to ultraviolet light. CTVT is remarkably stable and lacks subclonal heterogeneity despite thousands of rearrangements, copy-number changes, and retrotransposon insertions. More than 10,000 genes carry nonsynonymous variants, and 646 genes have been lost. CTVT first arose in a dog with low genomic heterozygosity that may have lived about 11,000 years ago. The cancer spawned by this individual dispersed across continents about 500 years ago. Our results provide a genetic identikit of an ancient dog and demonstrate the robustness of mammalian somatic cells to survive for millennia despite a massive mutation burden.

    Funded by: Wellcome Trust: 098051

    Science (New York, N.Y.) 2014;343;6169;437-40

  • Extreme Growth Failure is a Common Presentation of Ligase IV Deficiency.

    Murray JE, Bicknell LS, Yigit G, Duker AL, van Kogelenberg M, Haghayegh S, Wieczorek D, Kayserili H, Albert MH, Wise CA, Brandon J, Kleefstra T, Warris A, van der Flier M, Bamforth JS, Doonanco K, Adès L, Ma A, Field M, Johnson D, Shackley F, Firth H, Woods CG, Nürnberg P, Gatti RA, Hurles M, Bober MB, Wollnik B and Jackson AP

    MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.

    Ligase IV syndrome is a rare differential diagnosis for Nijmegen breakage syndrome owing to a shared predisposition to lympho-reticular malignancies, significant microcephaly, and radiation hypersensitivity. Only 16 cases with mutations in LIG4 have been described to date with phenotypes varying from malignancy in developmentally normal individuals, to severe combined immunodeficiency and early mortality. Here, we report the identification of biallelic truncating LIG4 mutations in 11 patients with microcephalic primordial dwarfism presenting with restricted prenatal growth and extreme postnatal global growth failure (average OFC -10.1 s.d., height -5.1 s.d.). Subsequently, most patients developed thrombocytopenia and leucopenia later in childhood and many were found to have previously unrecognized immunodeficiency following molecular diagnosis. None have yet developed malignancy, though all patients tested had cellular radiosensitivity. A genotype-phenotype correlation was also noted with position of truncating mutations corresponding to disease severity. This work extends the phenotypic spectrum associated with LIG4 mutations, establishing that extreme growth retardation with microcephaly is a common presentation of biallelic truncating mutations. Such growth failure is therefore sufficient to consider a diagnosis of LIG4 deficiency and early recognition of such cases is important as bone marrow failure, immunodeficiency, and sometimes malignancy are long term sequelae of this disorder.

    Human mutation 2014;35;1;76-85

  • Differential methylation of the TRPA1 promoter in pain sensitivity.

    MuTHER Consortium

    Chronic pain is a global public health problem, but the underlying molecular mechanisms are not fully understood. Here we examine genome-wide DNA methylation, first in 50 identical twins discordant for heat pain sensitivity and then in 50 further unrelated individuals. Whole-blood DNA methylation was characterized at 5.2 million loci by MeDIP sequencing and assessed longitudinally to identify differentially methylated regions associated with high or low pain sensitivity (pain DMRs). Nine meta-analysis pain DMRs show robust evidence for association (false discovery rate 5%) with the strongest signal in the pain gene TRPA1 (P=1.2 × 10(-13)). Several pain DMRs show longitudinal stability consistent with susceptibility effects, have similar methylation levels in the brain and altered expression in the skin. Our approach identifies epigenetic changes in both novel and established candidate genes that provide molecular insights into pain and may generalize to other complex traits.

    Nature communications 2014;5;2978

  • Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer.

    Nik-Zainal S, Wedge DC, Alexandrov LB, Petljak M, Butler AP, Bolli N, Davies HR, Knappskog S, Martin S, Papaemmanuil E, Ramakrishna M, Shlien A, Simonic I, Xue Y, Tyler-Smith C, Campbell PJ and Stratton MR

    [1] Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. [2] Department of Medical Genetics, Addenbrooke's Hospital National Health Service (NHS) Trust, Cambridge, UK.

    The somatic mutations in a cancer genome are the aggregate outcome of one or more mutational processes operative through the lifetime of the individual with cancer. Each mutational process leaves a characteristic mutational signature determined by the mechanisms of DNA damage and repair that constitute it. A role was recently proposed for the APOBEC family of cytidine deaminases in generating particular genome-wide mutational signatures and a signature of localized hypermutation called kataegis. A germline copy number polymorphism involving APOBEC3A and APOBEC3B, which effectively deletes APOBEC3B, has been associated with modestly increased risk of breast cancer. Here we show that breast cancers in carriers of the deletion show more mutations of the putative APOBEC-dependent genome-wide signatures than cancers in non-carriers. The results suggest that the APOBEC3A-APOBEC3B germline deletion allele confers cancer susceptibility through increased activity of APOBEC-dependent mutational processes, although the mechanism by which this increase in activity occurs remains unknown.

    Nature genetics 2014

  • Changes in malaria parasite drug resistance in an endemic population over a 25-year period with resulting genomic evidence of selection.

    Nwakanma DC, Duffy CW, Amambua-Ngwa A, Oriero EC, Bojang KA, Pinder M, Drakeley CJ, Sutherland CJ, Milligan PJ, Macinnis B, Kwiatkowski DP, Clark TG, Greenwood BM and Conway DJ

    Medical Research Council Unit, Fajara, The Gambia.

    Background. Analysis of genome-wide polymorphism in many organisms has potential to identify genes under recent selection. However, data on historical allele frequency changes are rarely available for direct confirmation. Methods. We genotyped single nucleotide polymorphisms (SNPs) in 4 Plasmodium falciparum drug resistance genes in 668 archived parasite-positive blood samples of a Gambian population between 1984 and 2008. This covered a period before antimalarial resistance was detected locally, through subsequent failure of multiple drugs until introduction of artemisinin combination therapy. We separately performed genome-wide sequence analysis of 52 clinical isolates from 2008 to prospect for loci under recent directional selection. Results. Resistance alleles increased from very low frequencies, peaking in 2000 for chloroquine resistance-associated crt and mdr1 genes and at the end of the survey period for dhfr and dhps genes respectively associated with pyrimethamine and sulfadoxine resistance. Temporal changes fit a model incorporating likely selection coefficients over the period. Three of the drug resistance loci were in the top 4 regions under strong selection implicated by the genome-wide analysis. Conclusions. Genome-wide polymorphism analysis of an endemic population sample robustly identifies loci with detailed documentation of recent selection, demonstrating power to prospectively detect emerging drug resistance genes.

    Funded by: Medical Research Council: G1100123, MC_U190081987, MC_U190092708

    The Journal of infectious diseases 2014;209;7;1126-35

  • Linking tissues to phenotypes using gene expression profiles.

    Oellrich A, Sanger Mouse Genetics Project and Smedley D

    Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases, a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between 'total body fat' abnormalities and genes expressed in the 'brain', which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool.
    For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh best candidate to the top hit when the associated tissues are taken into consideration. Database URL:

    Database : the journal of biological databases and curation 2014;2014;0;bau017

  • An Evaluation of HIV Elite Controller Definitions within a Large Seroconverter Cohort Collaboration.

    Olson AD, Meyer L, Prins M, Thiebaut R, Gurdasani D, Guiguet M, Chaix ML, Amornkul P, Babiker A, Sandhu MS, Porter K and for the CASCADE Collaboration in EuroCoord

    Medical Research Council Clinical Trials Unit at University College London, London, United Kingdom.

    Background: Understanding the mechanisms underlying viral control is highly relevant to vaccine studies and elite control (EC) of HIV infection. Although numerous definitions of EC exist, it is not clear which, if any, best identify this rare phenotype. Methods: We assessed a number of EC definitions used in the literature using CASCADE data of 25,692 HIV seroconverters. We estimated proportions maintaining EC of total ART-naïve follow-up time, and disease progression, comparing to non-EC. We also examined HIV-RNA and CD4 values and CD4 slope during EC and beyond (while ART naïve). Results: Most definitions classify ∼1% as ECs with median HIV-RNA 43-903 copies/ml and median CD4>500 cells/mm(3). Beyond EC status, median HIV-RNA levels remained low, although often detectable, and CD4 values high but with strong evidence of decline for all definitions. Median % ART-naïve time as EC was ≥92% although overlap between definitions was low. EC definitions with consecutive HIV-RNA measurements <75 copies/ml with follow-up ≥ six months, or with 90% of measurements <400 copies/ml over ≥10 year follow-up performed best overall. Individuals thus defined were less likely to progress to endpoint (hazard ratios ranged from 12.5-19.0 for non-ECs compared to ECs). Conclusions: ECs are rare, less likely to progress to clinical disease, but may eventually lose control. We suggest definitions requiring individuals to have consecutive undetectable HIV-RNA measurements for ≥ six months or otherwise with >90% of measurements <400 copies/ml over ≥10 years be used to define this phenotype.

    PloS one 2014;9;1;e86719

  • Loss of FTO Antagonises Wnt Signaling and Leads to Developmental Defects Associated with Ciliopathies.

    Osborn DP, Roccasecca RM, McMurray F, Hernandez-Hernandez V, Mukherjee S, Barroso I, Stemple D, Cox R, Beales PL and Christou-Savina S

    Biomedical Sciences, St George's University of London, London, United Kingdom.

    Common intronic variants in the human fat mass and obesity-associated gene (FTO) are found to be associated with an increased risk of obesity. Overexpression of FTO correlates with increased food intake and obesity, whilst loss-of-function results in lethality and severe developmental defects. Despite intense scientific discussions around the role of FTO in energy metabolism, the function of FTO during development remains undefined. Here, we show that loss of Fto leads to developmental defects such as growth retardation, craniofacial dysmorphism and aberrant neural crest cell migration in zebrafish. We find that the important developmental pathway, Wnt, is compromised in the absence of FTO, both in vivo (zebrafish) and in vitro (Fto(-/-) MEFs and HEK293T). Canonical Wnt signalling is downregulated by abrogated β-Catenin translocation to the nucleus whilst the non-canonical Wnt/Ca(2+) pathway is activated via its key signal mediators CaMKII and PKCδ. Moreover, we demonstrate that loss of Fto results in short, absent or disorganised cilia leading to situs inversus, renal cystogenesis, neural crest cell defects and microcephaly in zebrafish. Congruently, Fto knockout mice display aberrant tissue specific cilia. These data identify FTO as a protein-regulator of the balanced activation between canonical and non-canonical branches of the Wnt pathway. Furthermore, we present the first evidence that FTO plays a role in development and cilia formation/function.

    PloS one 2014;9;2;e87662

  • Unexplained diarrhoea in HIV-1 infected individuals.

    Oude Munnink BB, Canuti M, Deijs M, de Vries M, Jebbink MF, Rebers S, Molenkamp R, van Hemert FJ, Chung K, Cotten M, Snijders F, Sol CJ and van der Hoek L

    Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands.

    Background: Gastrointestinal symptoms, in particular diarrhoea, are common in non-treated HIV-1 infected individuals. Although various enteric pathogens have been implicated, the aetiology of diarrhoea remains unexplained in a large proportion of HIV-1 infected patients. Our aim is to identify the cause of diarrhoea for patients that remain negative in routine diagnostics.

    Methods: In this study stool samples of 196 HIV-1 infected persons, including 29 persons with diarrhoea, were examined for enteropathogens and HIV-1. A search for unknown and unexpected viruses was performed using virus discovery cDNA-AFLP combined with Roche-454 sequencing (VIDISCA-454).

    Results: HIV-1 RNA was detected in stool of 19 patients with diarrhoea (66%) compared to 75 patients (45%) without diarrhoea. In 19 of the 29 diarrhoea cases a known enteropathogen could be identified (66%). Next to these known causative agents, a range of recently identified viruses was identified via VIDISCA-454: cosavirus, Aichi virus, human gyrovirus, and non-A non-B hepatitis virus. Moreover, a novel virus was detected which was named immunodeficiency-associated stool virus (IASvirus). However, PCR based screening for these viruses showed that none of these novel viruses was associated with diarrhoea. Notably, among the 34% enteropathogen-negative cases, HIV-1 RNA shedding in stool was more frequently observed (80%) compared to enteropathogen-positive cases (47%), indicating that HIV-1 itself is the most likely candidate to be involved in diarrhoea.

    Conclusion: Unexplained diarrhoea in HIV-1 infected patients is probably not caused by recently described or previously unknown pathogens, but it is more likely that HIV-1 itself plays a role in intestinal mucosal abnormalities which leads to diarrhoea.

    BMC infectious diseases 2014;14;1;22

  • RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia.

    Papaemmanuil E, Rapado I, Li Y, Potter NE, Wedge DC, Tubio J, Alexandrov LB, Van Loo P, Cooke SL, Marshall J, Martincorena I, Hinton J, Gundem G, van Delft FW, Nik-Zainal S, Jones DR, Ramakrishna M, Titley I, Stebbings L, Leroy C, Menzies A, Gamble J, Robinson B, Mudie L, Raine K, O'Meara S, Teague JW, Butler AP, Cazzaniga G, Biondi A, Zuna J, Kempski H, Muschen M, Ford AM, Stratton MR, Greaves M and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    The ETV6-RUNX1 fusion gene, found in 25% of childhood acute lymphoblastic leukemia (ALL) cases, is acquired in utero but requires additional somatic mutations for overt leukemia. We used exome and low-coverage whole-genome sequencing to characterize secondary events associated with leukemic transformation. RAG-mediated deletions emerge as the dominant mutational process, characterized by recombination signal sequence motifs near breakpoints, incorporation of non-templated sequence at junctions, ∼30-fold enrichment at promoters and enhancers of genes actively transcribed in B cell development and an unexpectedly high ratio of recurrent to non-recurrent structural variants. Single-cell tracking shows that this mechanism is active throughout leukemic evolution, with evidence of localized clustering and reiterated deletions. Integration of data on point mutations and rearrangements identifies ATF7IP and MGA as two new tumor-suppressor genes in ALL. Thus, a remarkably parsimonious mutational process transforms ETV6-RUNX1-positive lymphoblasts, targeting the promoters, enhancers and first exons of genes that normally regulate B cell differentiation.

    Funded by: NCI NIH HHS: R01 CA157644; Wellcome Trust: 077012/05/Z, WT088340MA, WT100183MA

    Nature genetics 2014;46;2;116-25

  • Prevalence and properties of mecC methicillin-resistant Staphylococcus aureus (MRSA) in bovine bulk tank milk in Great Britain.

    Paterson GK, Morgan FJ, Harrison EM, Peacock SJ, Parkhill J, Zadoks RN and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK.

    Objectives: mecC methicillin-resistant Staphylococcus aureus (MRSA) represent a newly recognized form of MRSA, distinguished by the possession of a divergent mecA homologue, mecC. The first isolate to be identified came from bovine milk, but there are few data on the prevalence of mecC MRSA among dairy cattle. The aim of this study was to conduct a prevalence study of mecC MRSA among dairy farms in Great Britain. Methods: Test farms were randomly selected by random order generation and bulk tank samples were tested for the presence of mecC MRSA by broth enrichment and plating onto chromogenic agar. All MRSA isolated were screened by PCR for mecA and mecC, and mecC MRSA were further characterized by multilocus sequence typing, spa typing and antimicrobial susceptibility testing. Results: mecC MRSA were detected on 10 of 465 dairy farms sampled in England and Wales (prevalence 2.15%, 95% CI 1.17%-3.91%), but not from 625 farms sampled in Scotland (95% CI of prevalence 0%-0.61%). Seven isolates belonged to sequence type (ST) 425, while the other three belonged to clonal complex 130. Resistance to non-β-lactam antibiotics was uncommon. All 10 isolates produced a negative result by slide agglutination for penicillin-binding protein 2a. mecA MRSA ST398 was detected on one farm in England. Conclusions: mecC MRSA is widely distributed among dairy farms in Great Britain, but this distribution is not uniform across the whole country. These results provide an important baseline dataset to monitor the epidemiology of this emerging form of MRSA.

    Funded by: Medical Research Council: G1001787

    The Journal of antimicrobial chemotherapy 2014;69;3;598-602

  • Functional interpretation of non-coding sequence variation: Concepts and challenges.

    Paul DS, Soranzo N and Beck S

    UCL Cancer Institute, University College London, London, United Kingdom.

    Understanding the functional mechanisms underlying genetic signals associated with complex traits and common diseases, such as cancer, diabetes and Alzheimer's disease, is a formidable challenge. Many genetic signals discovered through genome-wide association studies map to non-protein coding sequences, where their molecular consequences are difficult to evaluate. This article summarizes concepts for the systematic interpretation of non-coding genetic signals using genome annotation data sets in different cellular systems. We outline strategies for the global analysis of multiple association intervals and the in-depth molecular investigation of individual intervals. We highlight experimental techniques to validate candidate (potential causal) regulatory variants, with a focus on novel genome-editing techniques including CRISPR/Cas9. These approaches are also applicable to low-frequency and rare variants, which have become increasingly important in genomic studies of complex traits and diseases. There is a pressing need to translate genetic signals into biological mechanisms, leading to prognostic, diagnostic and therapeutic advances.

    BioEssays : news and reviews in molecular, cellular and developmental biology 2014;36;2;191-9

  • Functional genomics reveals that Clostridium difficile Spo0A coordinates sporulation, virulence and metabolism.

    Pettit LJ, Browne HP, Yu L, Smits WK, Fagan RP, Barquist L, Martin MJ, Goulding D, Duncan SH, Flint HJ, Dougan G, Choudhary JS and Lawley TD

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Background: Clostridium difficile is an anaerobic, Gram-positive bacterium that can reside as a commensal within the intestinal microbiota of healthy individuals or cause life-threatening antibiotic-associated diarrhea in immunocompromised hosts. C. difficile can also form highly resistant spores that are excreted facilitating host-to-host transmission. The C. difficile spo0A gene encodes a highly conserved transcriptional regulator of sporulation that is required for relapsing disease and transmission in mice.

    Results: Here we describe a genome-wide approach using a combined transcriptomic and proteomic analysis to identify Spo0A regulated genes. Our results validate Spo0A as a positive regulator of putative and novel sporulation genes as well as components of the mature spore proteome. We also show that Spo0A regulates a number of virulence-associated factors such as flagella and metabolic pathways including glucose fermentation leading to butyrate production.

    Conclusions: The C. difficile spo0A gene is a global transcriptional regulator that controls diverse sporulation, virulence and metabolic phenotypes coordinating pathogen adaptation to a wide range of host interactions. Additionally, the rich breadth of functional data allowed us to significantly update the annotation of the C. difficile 630 reference genome which will facilitate basic and applied research on this emerging pathogen.

    Funded by: Wellcome Trust: 086418, 098051

    BMC genomics 2014;15;160

  • Vying over spilt oil.

    Pham N TA and Anonye BO

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014

  • Emerging insights on intestinal dysbiosis during bacterial infections.

    Pham TA and Lawley TD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Infection of the gastrointestinal tract is commonly linked to pathological imbalances of the resident microbiota, termed dysbiosis. In recent years, advanced high-throughput genomic approaches have allowed us to examine the microbiota in an unprecedented manner, revealing novel biological insights about infection-associated dysbiosis at the community and individual species levels. A dysbiotic microbiota is typically reduced in taxonomic diversity and metabolic function, and can harbour pathobionts that exacerbate intestinal inflammation or manifest systemic disease. Dysbiosis can also promote pathogen genome evolution, while allowing the pathogens to persist at high density and transmit to new hosts. A deeper understanding of bacterial pathogenicity in the context of the intestinal microbiota should unveil new approaches for developing diagnostics and therapies for enteropathogens.

    Current opinion in microbiology 2014;17C;67-74

  • Obituary: Professor Frederick Sanger: 13 August 1918 – 19 November 2013

    Powell, D

    Genetics Society News 2014;70;16-18

  • A Central Role for GRB10 in Regulation of Islet Function in Man.

    Prokopenko I, Poon W, Mägi R, Prasad B R, Salehi SA, Almgren P, Osmark P, Bouatia-Naji N, Wierup N, Fall T, Stančáková A, Barker A, Lagou V, Osmond C, Xie W, Lahti J, Jackson AU, Cheng YC, Liu J, O'Connell JR, Blomstedt PA, Fadista J, Alkayyali S, Dayeh T, Ahlqvist E, Taneera J, Lecoeur C, Kumar A, Hansson O, Hansson K, Voight BF, Kang HM, Levy-Marchal C, Vatin V, Palotie A, Syvänen AC, Mari A, Weedon MN, Loos RJ, Ong KK, Nilsson P, Isomaa B, Tuomi T, Wareham NJ, Stumvoll M, Widen E, Lakka TA, Langenberg C, Tönjes A, Rauramaa R, Kuusisto J, Frayling TM, Froguel P, Walker M, Eriksson JG, Ling C, Kovacs P, Ingelsson E, McCarthy MI, Shuldiner AR, Silver KD, Laakso M, Groop L and Lyssenko V

    Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Department of Genomics of Common Disease, School of Public Health, Imperial College London, Hammersmith Hospital, London, United Kingdom.

    In a GWAS meta-analysis, variants in the growth factor receptor-bound protein 10 (GRB10) gene were associated with reduced glucose-stimulated insulin secretion and increased risk of type 2 diabetes (T2D) if inherited from the father, but inexplicably reduced fasting glucose when inherited from the mother. GRB10 is a negative regulator of insulin signaling and is imprinted in a parent-of-origin fashion in different tissues. GRB10 knock-down in human pancreatic islets showed reduced insulin and glucagon secretion, which together with changes in insulin sensitivity may explain the paradoxical reduction of glucose despite a decrease in insulin secretion. Together, these findings suggest that tissue-specific methylation and possibly imprinting of GRB10 can influence glucose metabolism and contribute to T2D pathogenesis. The data also emphasize the need in genetic studies to consider whether risk alleles are inherited from the mother or the father.

    PLoS genetics 2014;10;4;e1004235

  • A Genome-wide Association Analysis of a Broad Psychosis Phenotype Identifies Three Loci for Further Investigation.

    Psychosis Endophenotypes International Consortium and Wellcome Trust Case-Control Consortium 2

    Background: Genome-wide association studies (GWAS) have identified several loci associated with schizophrenia and/or bipolar disorder. We performed a GWAS of psychosis as a broad syndrome rather than within specific diagnostic categories.

    Methods: 1239 cases with schizophrenia, schizoaffective disorder, or psychotic bipolar disorder; 857 of their unaffected relatives, and 2739 healthy controls were genotyped with the Affymetrix 6.0 single nucleotide polymorphism (SNP) array. Analyses of 695,193 SNPs were conducted using UNPHASED, which combines information across families and unrelated individuals. We attempted to replicate signals found in 23 genomic regions using existing data on nonoverlapping samples from the Psychiatric GWAS Consortium and Schizophrenia-GENE-plus cohorts (10,352 schizophrenia patients and 24,474 controls).

    Results: No individual SNP showed compelling evidence for association with psychosis in our data. However, we observed a trend for association with same risk alleles at loci previously associated with schizophrenia (one-sided p = .003). A polygenic score analysis found that the Psychiatric GWAS Consortium's panel of SNPs associated with schizophrenia significantly predicted disease status in our sample (p = 5 × 10(-14)) and explained approximately 2% of the phenotypic variance.

    Conclusions: Although narrowly defined phenotypes have their advantages, we believe new loci may also be discovered through meta-analysis across broad phenotypes. The novel statistical methodology we introduced to model effect size heterogeneity between studies should help future GWAS that combine association evidence from related phenotypes. Applying these approaches, we highlight three loci that warrant further investigation. We found that SNPs conveying risk for schizophrenia are also predictive of disease status in our data.

    Funded by: Medical Research Council: G0901310, G1100583

    Biological psychiatry 2014;75;5;386-97

  • A polygenic burden of rare disruptive mutations in schizophrenia.

    Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, O'Dushlaine C, Chambert K, Bergen SE, Kähler A, Duncan L, Stahl E, Genovese G, Fernández E, Collins MO, Komiyama NH, Choudhary JS, Magnusson PK, Banks E, Shakir K, Garimella K, Fennell T, Depristo M, Grant SG, Haggarty SJ, Gabriel S, Scolnick EM, Lander ES, Hultman CM, Sullivan PF, McCarroll SA and Sklar P

    [1] Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA [2] Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA [3] Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA [4] Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA [5] Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.

    Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.

    Nature 2014

  • Characterizing genetic variants for clinical action.

    Ramos EM, Din-Lovinescu C, Berg JS, Brooks LD, Duncanson A, Dunn M, Good P, Hubbard TJ, Jarvik GP, O'Donnell C, Sherry ST, Aronson N, Biesecker LG, Blumberg B, Calonge N, Colhoun HM, Epstein RS, Flicek P, Gordon ES, Green ED, Green RC, Hurles M, Kawamoto K, Knaus W, Ledbetter DH, Levy HP, Lyon E, Maglott D, McLeod HL, Rahman N, Randhawa G, Wicklund C, Manolio TA, Chisholm RL and Williams MS

    Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: (1) identify clinically valid genetic variants; (2) decide whether they are actionable and what the action should be; and (3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.

    American journal of medical genetics. Part C, Seminars in medical genetics 2014;166;1;93-104

  • MEROPS: the database of proteolytic enzymes, their substrates and inhibitors.

    Rawlings ND, Waller M, Barrett AJ and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK and Proteins and Protein Families, EMBO European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfill the need for an integrated source of information about these. The database has hierarchical classifications in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. Recent developments include the following. A community annotation project has been instigated in which acknowledged experts are invited to contribute summaries for peptidases. Software has been written to provide an Internet-based data entry form. Contributors are acknowledged on the relevant web page. A new display showing the intron/exon structures of eukaryote peptidase genes and the phasing of the junctions has been implemented. It is now possible to filter the list of peptidases from a completely sequenced bacterial genome for a particular strain of the organism. The MEROPS filing pipeline has been altered to circumvent the restrictions imposed on non-interactive blastp searches, and a HMMER search using specially generated alignments to maximize the distribution of organisms returned in the search results has been added.

    Nucleic acids research 2014;42;1;D503-9

  • Bioinformatic Analysis of Expression Data to Identify Effector Candidates.

    Reid AJ and Jones JT

    Parasite Genomics, Wellcome Trust Sanger Institute, Genome Campus, Cambridge, CB10 1SA, UK.

    Pathogens produce effectors that manipulate the host to the benefit of the pathogen. These effectors are often secreted proteins that are upregulated during the early phases of infection. These properties can be used to identify candidate effectors from genomes and transcriptomes of pathogens. Here we describe commonly used bioinformatic approaches that (1) allow identification of genes encoding predicted secreted proteins within a genome and (2) allow the identification of genes encoding predicted secreted proteins that are upregulated at important stages of the life cycle. Other approaches for bioinformatic identification of effector candidates, including OrthoMCL analysis to identify expanded gene families, are also described.

    Methods in molecular biology (Clifton, N.J.) 2014;1127;17-27

  • Functional annotation of noncoding sequence variants.

    Ritchie GR, Dunham I, Zeggini E and Flicek P

    [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK. [2] Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants in protein-coding regions, our understanding of the genetic code and splicing allows us to identify likely candidates, but interpreting variants outside genic regions is more difficult. Here we present genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.

    Funded by: Wellcome Trust: 095908, 098051

    Nature methods 2014;11;3;294-6

  • Improved exome prioritization of disease genes through cross-species phenotype comparison.

    Robinson PN, Köhler S, Oellrich A, Sanger Mouse Genetics Project, Wang K, Mungall CJ, Lewis SE, Washington N, Bauer S, Seelow D, Krawitz P, Gilissen C, Haendel M and Smedley D

    Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.

    Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.

    Funded by: NIH HHS: R24 OD011883

    Genome research 2014;24;2;340-8

  • POT1 loss-of-function variants predispose to familial melanoma.

    Robles-Espinoza CD, Harland M, Ramsay AJ, Aoude LG, Quesada V, Ding Z, Pooley KA, Pritchard AL, Tiffen JC, Petljak M, Palmer JM, Symmons J, Johansson P, Stark MS, Gartside MG, Snowden H, Montgomery GW, Martin NG, Liu JZ, Choi J, Makowski M, Brown KM, Dunning AM, Keane TM, López-Otín C, Gruis NA, Hayward NK, Bishop DT, Newton-Bishop JA and Adams DJ

    [1] Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK. [2].

    Deleterious germline variants in CDKN2A account for around 40% of familial melanoma cases, and rare variants in CDK4, BRCA2, BAP1 and the promoter of TERT have also been linked to the disease. Here we set out to identify new high-penetrance susceptibility genes by sequencing 184 melanoma cases from 105 pedigrees recruited in the UK, The Netherlands and Australia that were negative for variants in known predisposition genes. We identified families where melanoma cosegregates with loss-of-function variants in the protection of telomeres 1 gene (POT1), with a proportion of family members presenting with an early age of onset and multiple primary tumors. We show that these variants either affect POT1 mRNA splicing or alter key residues in the highly conserved oligonucleotide/oligosaccharide-binding (OB) domains of POT1, disrupting protein-telomere binding and leading to increased telomere length. These findings suggest that POT1 variants predispose to melanoma formation via a direct effect on telomeres.

    Nature genetics 2014

  • Genomic confirmation of hybridisation and recent inbreeding in a vector-isolated Leishmania population.

    Rogers MB, Downing T, Smith BA, Imamura H, Sanders M, Svobodova M, Volf P, Berriman M, Cotton JA and Smith DF

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom; Centre for Immunology and Infection, Department of Biology, University of York, York, United Kingdom.

    Although asexual reproduction via clonal propagation has been proposed as the principal reproductive mechanism across parasitic protozoa of the Leishmania genus, sexual recombination has long been suspected, based on hybrid marker profiles detected in field isolates from different geographical locations. The recent experimental demonstration of a sexual cycle in Leishmania within sand flies has confirmed the occurrence of hybridisation, but knowledge of the parasite life cycle in the wild still remains limited. Here, we use whole genome sequencing to investigate the frequency of sexual reproduction in Leishmania, by sequencing the genomes of 11 Leishmania infantum isolates from sand flies and 1 patient isolate in a focus of cutaneous leishmaniasis in the Çukurova province of southeast Turkey. This is the first genome-wide examination of a vector-isolated population of Leishmania parasites. A genome-wide pattern of patchy heterozygosity and SNP density was observed both within individual strains and across the whole group. Comparisons with other Leishmania donovani complex genome sequences suggest that these isolates are derived from a single cross of two diverse strains with subsequent recombination within the population. This interpretation is supported by a statistical model of the genomic variability for each strain compared to the L. infantum reference genome strain as well as genome-wide scans for recombination within the population. Further analysis of these heterozygous blocks indicates that the two parents were phylogenetically distinct. Patterns of linkage disequilibrium indicate that this population reproduced primarily clonally following the original hybridisation event, but that some recombination also occurred. This observation allowed us to estimate the relative rates of sexual and asexual reproduction within this population, to our knowledge the first quantitative estimate of these events during the Leishmania life cycle.

    Funded by: Wellcome Trust: 076355, 085822, 098051

    PLoS genetics 2014;10;1;e1004092

  • Rapid conversion of EUCOMM/KOMP-CSD alleles in mouse embryos using a cell-permeable Cre recombinase.

    Ryder E, Doe B, Gleeson D, Houghton R, Dalvi P, Grau E, Habib B, Miklejewska E, Newman S, Sethi D, Sinclair C, Vyas S, Wardle-Jones H, Sanger Mouse Genetics Project, Bottomley J, Bussell J, Galli A, Salisbury J and Ramirez-Solis R

    The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    We describe here use of a cell-permeable Cre to efficiently convert the EUCOMM/KOMP-CSD tm1a allele to the tm1b form in preimplantation mouse embryos in a high-throughput manner, consistent with the requirements of the International Mouse Phenotyping Consortium-affiliated NIH KOMP2 project. This method results in rapid allele conversion and minimizes the use of experimental animals when compared to conventional Cre transgenic mouse breeding, resulting in a significant reduction in costs and time with increased welfare benefits.

    Transgenic research 2014;23;1;177-85

  • The Yersinia pseudotuberculosis complex: Characterization and delineation of a new species, Yersinia wautersii.

    Savin C, Martin L, Bouchier C, Filali S, Chenau J, Zhou Z, Becher F, Fukushima H, Thomson NR, Scholz HC and Carniel E

    Yersinia Research Unit and National Reference Laboratory, Institut Pasteur, Paris, France.

    The genus Yersinia contains three species pathogenic for humans, one of which is the enteropathogen Yersinia pseudotuberculosis. A recent analysis by Multi Locus Sequence Typing (MLST) of the 'Y. pseudotuberculosis complex' revealed that this complex comprises three distinct populations: the Y. pestis/Y. pseudotuberculosis group, the recently described species Yersinia similis, and a third not yet characterized population designated 'Korean Group', because most strains were isolated in Korea. The aim of this study was to perform an in depth phenotypic and genetic characterization of the three populations composing the Y. pseudotuberculosis complex (excluding Y. pestis, which belonged to the Y. pseudotuberculosis cluster in the MLST analysis). Using a set of strains representative of each group, we found that the three populations had close metabolic properties, but were nonetheless distinguishable based on D-raffinose and D-melibiose fermentation, and on pyrazinamidase activity. Moreover, high-resolution electrospray mass spectrometry highlighted protein peaks characteristic of each population. Their 16S rRNA gene sequences shared high identity (≥99.5%), but specific nucleotide signatures for each group were identified. Multi-Locus Sequence Analysis also identified three genetically closely related but distinct populations. Finally, an Average Nucleotide Identity (ANI) analysis performed after sequencing the genomes of a subset of strains of each group also showed that intragroup identity (average for each group ≥99%) was higher than intergroup diversity (94.6-97.4%). Therefore, all phenotypic and genotypic traits studied concurred with the initial MLST data indicating that the Y. pseudotuberculosis complex comprises a third and clearly distinct population of strains forming a novel Yersinia species that we propose to designate Yersinia wautersii sp. nov. 
    The isolation of some strains from humans, the detection of virulence genes (on the pYV and pVM82 plasmids, or encoding the superantigen ypmA) in some isolates, and the absence of pyrazinamidase activity (a hallmark of pathogenicity in the genus Yersinia) argue for the pathogenic potential of Y. wautersii.

    International journal of medical microbiology : IJMM 2014

  • TreeFam v9: a new website, more species and orthology-on-the-fly.

    Schreiber F, Patricio M, Muffato M, Pignatelli M and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK and European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    TreeFam is a database of phylogenetic trees inferred from animal genomes. For every TreeFam family we provide homology predictions together with the evolutionary history of the genes. Here we describe an update of the TreeFam database. The TreeFam project was resurrected in 2012 and has seen two releases since. The latest release (TreeFam 9) was made available in March 2013. It has orthology predictions and gene trees for 109 species in 15 736 families covering ∼2.2 million sequences. With release 9 we made modifications to our production pipeline and redesigned our website with improved gene tree visualizations and Wikipedia integration. Furthermore, we now provide an HMM-based sequence search that places a user-provided protein sequence into a TreeFam gene tree and provides quick orthology prediction. The tool uses Mafft and RAxML for the fast insertion into a reference alignment and tree, respectively. Besides the aforementioned technical improvements, we present a new approach to visualize gene trees and alternative displays that focuses on showing homology information from a species tree point of view. From release 9 onwards, TreeFam is now hosted at the EBI.

    Nucleic acids research 2014;42;1;D922-5

  • Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci.

    Service SK, Teslovich TM, Fuchsberger C, Ramensky V, Yajnik P, Koboldt DC, Larson DE, Zhang Q, Lin L, Welch R, Ding L, McLellan MD, O'Laughlin M, Fronick C, Fulton LL, Magrini V, Swift A, Elliott P, Jarvelin MR, Kaakinen M, McCarthy MI, Peltonen L, Pouta A, Bonnycastle LL, Collins FS, Narisu N, Stringham HM, Tuomilehto J, Ripatti S, Fulton RS, Sabatti C, Wilson RK, Boehnke M and Freimer NB

    Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America.

    Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.

    PLoS genetics 2014;10;1;e1004147

  • Genomic Epidemiology of Vibrio cholerae O1 Associated with Floods, Pakistan, 2010.

    Shah MA, Mutreja A, Thomson N, Baker S, Parkhill J, Dougan G, Bokhari H and Wren BW

    In August 2010, Pakistan experienced major floods and a subsequent cholera epidemic. To clarify the population dynamics and transmission of Vibrio cholerae in Pakistan, we sequenced the genomes of all V. cholerae O1 El Tor isolates and compared the sequences to a global collection of 146 V. cholerae strains. Within the global phylogeny, all isolates from Pakistan formed 2 new subclades (PSC-1 and PSC-2), lying in the third transmission wave of the seventh-pandemic lineage that could be distinguished by signature deletions and their antimicrobial susceptibilities. Geographically, PSC-1 isolates originated from the coast, whereas PSC-2 isolates originated from inland areas flooded by the Indus River. Single-nucleotide polymorphism accumulation analysis correlated river flow direction with the spread of PSC-2. We found at least 2 sources of cholera in Pakistan during the 2010 epidemic and illustrate the value of a global genomic data bank in contextualizing cholera outbreaks.

    Emerging infectious diseases 2014;20;1;13-20

  • Efficient genome modification by CRISPR-Cas9 nickase with minimal off-target effects.

    Shen B, Zhang W, Zhang J, Zhou J, Wang J, Chen L, Wang L, Hodgkins A, Iyer V, Huang X and Skarnes WC

    [1] Ministry of Education Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, Nanjing, China. [2].

    Bacterial RNA-directed Cas9 endonuclease is a versatile tool for site-specific genome modification in eukaryotes. Co-microinjection of mouse embryos with Cas9 mRNA and single guide RNAs induces on-target and off-target mutations that are transmissible to offspring. However, Cas9 nickase can be used to efficiently mutate genes without detectable damage at known off-target sites. This method is applicable for genome editing of any model organism and minimizes confounding problems of off-target mutations.

    Nature methods 2014

  • Plasmid deficiency in urogenital isolates of Chlamydia trachomatis reduces infectivity and virulence in a mouse model.

    Sigar IM, Schripsema JH, Wang Y, Clarke IN, Cutcliffe LT, Seth-Smith HM, Thomson NR, Bjartling C, Unemo M, Persson K and Ramsey KH

    Microbiology and Immunology Department, Chicago College of Osteopathic Medicine, Midwestern University, Downers Grove, IL, USA.

    We hypothesized that the plasmid of urogenital isolates of Chlamydia trachomatis would modulate infectivity and virulence in a mouse model. To test this hypothesis, we infected female mice in the respiratory or urogenital tract with graded doses of a human urogenital isolate of C. trachomatis, serovar F, possessing the cognate plasmid. For comparison, we inoculated mice with a plasmid-free serovar F isolate. Following urogenital inoculation, the plasmid-free isolate displayed significantly reduced infectivity compared with the wild-type strain, with the latter requiring a 17-fold lower infectious dose to produce 50% infection. When inoculated via the respiratory tract, the plasmid-free isolate exhibited reduced infectivity and virulence (as measured by weight change) when compared to the wild-type isolate. Further, differences in infectivity, but not in virulence, were observed in a C. trachomatis, serovar E isolate with a deletion within the plasmid coding sequence 1 when compared to a serovar E isolate with no mutations in the plasmid. We conclude that plasmid loss reduces virulence and infectivity in this mouse model. These findings further support a role for the chlamydial plasmid in infectivity and virulence in vivo.

    Pathogens and disease 2014;70;1;61-9

  • Dissecting mammalian immunity through mutation.

    Siggs OM

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Although mutation and natural selection have given rise to our immune system, a well-placed mutation can also cripple it, and within an expanding population we are recognizing more and more cases of single-gene mutations that compromise immunity. These mutations are an ideal tool for understanding human immunology, and there are more ways than ever to measure their physiological effects. There are also more ways to create mutations in the laboratory, and to use these resources to systematically define the function of every gene in our genome. This review focuses on the discovery and creation of mutations in the context of mammalian immunity, with an emphasis on the use of genome-wide chemical and CRISPR/Cas9 mutagenesis to reveal gene function. Immunology and Cell Biology advance online publication, 11 February 2014; doi:10.1038/icb.2014.8.

    Immunology and cell biology 2014

  • A cascade of DNA-binding proteins for sexual commitment and development in Plasmodium.

    Sinha A, Hughes KR, Modrzynska KK, Otto TD, Pfander C, Dickens NJ, Religa AA, Bushell E, Graham AL, Cameron R, Kafsack BF, Williams AE, Llinás M, Berriman M, Billker O and Waters AP

    [1] Wellcome Trust Centre for Molecular Parasitology, University of Glasgow, Glasgow G12 8QQ, UK [2].

    Commitment to and completion of sexual development are essential for malaria parasites (protists of the genus Plasmodium) to be transmitted through mosquitoes. The molecular mechanism(s) responsible for commitment have been hitherto unknown. Here we show that PbAP2-G, a conserved member of the apicomplexan AP2 (ApiAP2) family of DNA-binding proteins, is essential for the commitment of asexually replicating forms to sexual development in Plasmodium berghei, a malaria parasite of rodents. PbAP2-G was identified from mutations in its encoding gene, PBANKA_143750, which account for the loss of sexual development frequently observed in parasites transmitted artificially by blood passage. Systematic gene deletion of conserved ApiAP2 genes in Plasmodium confirmed the role of PbAP2-G and revealed a second ApiAP2 member (PBANKA_103430, here termed PbAP2-G2) that significantly modulates but does not abolish gametocytogenesis, indicating that a cascade of ApiAP2 proteins is involved in commitment to the production and maturation of gametocytes. The data suggest a mechanism of commitment to gametocytogenesis in Plasmodium consistent with a positive feedback loop involving PbAP2-G that could be exploited to prevent the transmission of this pernicious parasite.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0501670; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50GM071508; Wellcome Trust: 083811/Z/07/Z, 085349, 098051

    Nature 2014;507;7491;253-7

  • IFITM proteins - cellular inhibitors of viral entry.

    Smith S, Weston S, Kellam P and Marsh M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Interferon inducible transmembrane (IFITM) proteins are a recently discovered family of cellular anti-viral proteins that restrict the replication of a number of enveloped and non-enveloped viruses. IFITM proteins are located in the plasma membrane and endosomal membranes, the main portals of entry for many viruses. Biochemical and membrane fusion studies suggest IFITM proteins have the ability to inhibit viral entry, possibly by modulating the fluidity of cellular membranes. Here we discuss the IFITM proteins, recent work on their mode of action, and future directions for research.

    Current opinion in virology 2014;4C;71-77

  • Investigating the Feasibility of Scale up and Automation of Human Induced Pluripotent Stem Cells Cultured in Aggregates in Feeder Free Conditions.

    Soares F, Chandra A, Thomas R, Pedersen R, Vallier L and Williams D

    Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine and Department of Surgery, University of Cambridge, UK; Centre for Biological Engineering, Loughborough University, UK.

    The transfer of a laboratory process into a manufacturing facility is one of the most critical steps required for the large-scale production of cell-based therapy products. This study describes the first published protocol for scalable automated expansion of human induced pluripotent stem cell lines growing in aggregates in feeder-free and chemically defined medium. Cells were successfully transferred between different sites representative of research and manufacturing settings, and passaged both manually and using the CompacT Select automation platform. Modified protocols were developed for the automated system, and the management of cell aggregates (clumps) was identified as the critical step. Cellular morphology, pluripotency gene expression and differentiation into the three germ layers have been used to compare the outcomes of manual and automated processes.

    Journal of biotechnology 2014

  • A genome-wide association study and biological pathway analysis of epilepsy prognosis in a prospective cohort of newly treated epilepsy.

    Speed D, Hoggart C, Petrovski S, Tachmazidou I, Coffey A, Jorgensen A, Eleftherohorinou H, De Iorio M, Todaro M, De T, Smith D, Smith PE, Jackson M, Cooper P, Kellett M, Howell S, Newton M, Yerra R, Tan M, French C, Reuber M, Sills GE, Chadwick D, Pirmohamed M, Bentley D, Scheffer I, Berkovic S, Balding D, Palotie A, Marson A, O'Brien TJ and Johnson MR

    UCL Genetics Institute, University College London WC1E 6BT, UK.

    We present the analysis of a prospective multicentre study to investigate genetic effects on the prognosis of newly treated epilepsy. Patients with a new clinical diagnosis of epilepsy requiring medication were recruited and followed up prospectively. The clinical outcome was defined as freedom from seizures for a minimum of 12 months in accordance with the consensus statement from the International League Against Epilepsy (ILAE). Genetic effects on remission of seizures after starting treatment were analysed with and without adjustment for significant clinical prognostic factors, and the results from each cohort were combined using a fixed-effects meta-analysis. After quality control (QC), we analysed 889 newly treated epilepsy patients using 472 450 genotyped and 6.9 × 10(6) imputed single-nucleotide polymorphisms. Suggestive evidence for association (defined as Pmeta < 5.0 × 10(-7)) with remission of seizures after starting treatment was observed at three loci: 6p12.2 (rs492146, Pmeta = 2.1 × 10(-7), OR[G] = 0.57), 9p23 (rs72700966, Pmeta = 3.1 × 10(-7), OR[C] = 2.70) and 15q13.2 (rs143536437, Pmeta = 3.2 × 10(-7), OR[C] = 1.92). Genes of biological interest at these loci include PTPRD and ARHGAP11B (encoding functions implicated in neuronal development) and GSTA4 (a phase II biotransformation enzyme). Pathway analysis using two independent methods implicated a number of pathways in the prognosis of epilepsy, including KEGG categories 'calcium signaling pathway' and 'phosphatidylinositol signaling pathway'. Through a series of power curves, we conclude that it is unlikely any single common variant explains >4.4% of the variation in the outcome of newly treated epilepsy.

    Human molecular genetics 2014;23;1;247-58

  • Neutrophils Recruited by IL-22 in Peripheral Tissues Function as TRAIL-Dependent Antiviral Effectors against MCMV.

    Stacey MA, Marsden M, Pham N TA, Clare S, Dolton G, Stack G, Jones E, Klenerman P, Gallimore AM, Taylor PR, Snelgrove RJ, Lawley TD, Dougan G, Benedict CA, Jones SA, Wilkinson GW and Humphreys IR

    Institute of Infection and Immunity, School of Medicine, Cardiff University, Cardiff CF14 4XN, Wales, UK.

    During primary infection, murine cytomegalovirus (MCMV) spreads systemically, resulting in virus replication and pathology in multiple organs. This disseminated infection is ultimately controlled, but the underlying immune defense mechanisms are unclear. Investigating the role of the cytokine IL-22 in MCMV infection, we discovered an unanticipated function for neutrophils as potent antiviral effector cells that restrict viral replication and associated pathogenesis in peripheral organs. NK-, NKT-, and T cell-secreted IL-22 orchestrated antiviral neutrophil-mediated responses via induction in stromal nonhematopoietic tissue of the neutrophil-recruiting chemokine CXCL1. The antiviral effector properties of infiltrating neutrophils were directly linked to the expression of TNF-related apoptosis-inducing ligand (TRAIL). Our data identify a role for neutrophils in antiviral defense, and establish a functional link between IL-22 and the control of antiviral neutrophil responses that prevents pathogenic herpesvirus infection in peripheral organs.

    Cell host & microbe 2014;15;4;471-83

  • Development of an antigen microarray for high throughput monoclonal antibody selection.

    Staudt N, Müller-Sienerth N and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, United Kingdom.

    Monoclonal antibodies are valuable laboratory reagents and are increasingly being exploited as therapeutics to treat a range of diseases. Selecting new monoclonal antibodies that are validated to work in particular applications, despite the availability of several different techniques, can be resource intensive with uncertain outcomes. To address this, we have developed an approach that enables early screening of hybridoma supernatants generated from an animal immunised with up to five different antigens followed by cloning of the antibody into a single expression plasmid. While this approach relieved the cellular cloning bottleneck and had the desirable ability to screen antibody function prior to cloning, the small volume of hybridoma supernatant available for screening limited the number of antigens for pooled immunisation. Here, we report the development of an antigen microarray that significantly reduces the volume of supernatant required for functional screening. This approach permits a significant increase in the number of antigens for parallel monoclonal antibody selection from a single animal. Finally, we show the successful use of a convenient small-scale transfection method to rapidly identify plasmids that encode functional cloned antibodies, addressing another bottleneck in this approach. In summary, we show that a hybrid approach of combining established hybridoma antibody technology with refined screening and antibody cloning methods can be used to select monoclonal antibodies of desired functional properties against many different antigens from a single immunised host.

    Biochemical and biophysical research communications 2014

  • Common variant at 16p11.2 conferring risk of psychosis.

    Steinberg S, de Jong S, Mattheisen M, Costas J, Demontis D, Jamain S, Pietiläinen OP, Lin K, Papiol S, Huttenlocher J, Sigurdsson E, Vassos E, Giegling I, Breuer R, Fraser G, Walker N, Melle I, Djurovic S, Agartz I, Tuulio-Henriksson A, Suvisaari J, Lönnqvist J, Paunio T, Olsen L, Hansen T, Ingason A, Pirinen M, Strengman E, GROUP, Hougaard DM, Orntoft T, Didriksen M, Hollegaard MV, Nordentoft M, Abramova L, Kaleda V, Arrojo M, Sanjuán J, Arango C, Etain B, Bellivier F, Méary A, Schürhoff F, Szoke A, Ribolsi M, Magni V, Siracusano A, Sperling S, Rossner M, Christiansen C, Kiemeney LA, Franke B, van den Berg LH, Veldink J, Curran S, Bolton P, Poot M, Staal W, Rehnstrom K, Kilpinen H, Freitag CM, Meyer J, Magnusson P, Saemundsen E, Martsenkovsky I, Bikshaieva I, Martsenkovska I, Vashchenko O, Raleva M, Paketchieva K, Stefanovski B, Durmishi N, Pejovic Milovancevic M, Lecic Tosevski D, Silagadze T, Naneishvili N, Mikeladze N, Surguladze S, Vincent JB, Farmer A, Mitchell PB, Wright A, Schofield PR, Fullerton JM, Montgomery GW, Martin NG, Rubino IA, van Winkel R, Kenis G, De Hert M, Réthelyi JM, Bitter I, Terenius L, Jönsson EG, Bakker S, van Os J, Jablensky A, Leboyer M, Bramon E, Powell J, Murray R, Corvin A, Gill M, Morris D, O'Neill FA, Kendler K, Riley B, Wellcome Trust Case Control Consortium 2, Craddock N, Owen MJ, O'Donovan MC, Thorsteinsdottir U, Kong A, Ehrenreich H, Carracedo A, Golimbet V, Andreassen OA, Børglum AD, Mors O, Mortensen PB, Werge T, Ophoff RA, Nöthen MM, Rietschel M, Cichon S, Ruggeri M, Tosato S, Palotie A, St Clair D, Rujescu D, Collier DA, Stefansson H and Stefansson K

    deCODE genetics, Reykjavik, Iceland.

    Epidemiological and genetic data support the notion that schizophrenia and bipolar disorder share genetic risk factors. In our previous genome-wide association study, meta-analysis and follow-up (totaling as many as 18 206 cases and 42 536 controls), we identified four loci showing genome-wide significant association with schizophrenia. Here we consider a mixed schizophrenia and bipolar disorder (psychosis) phenotype (addition of 7469 bipolar disorder cases, 1535 schizophrenia cases, 333 other psychosis cases, 808 unaffected family members and 46 160 controls). Combined analysis reveals a novel variant at 16p11.2 showing genome-wide significant association (rs4583255[T]; odds ratio=1.08; P=6.6 × 10(-11)). The new variant is located within a 593-kb region that substantially increases risk of psychosis when duplicated. In line with the association of the duplication with reduced body mass index (BMI), rs4583255[T] is also associated with lower BMI (P=0.0039 in the public GIANT consortium data set; P=0.00047 in 22 651 additional Icelanders).

    Funded by: Medical Research Council: G0601030; NIMH NIH HHS: 1U24MH081810, 2N01MH080001-001, MH074027, N01 MH900001, R01 MH078075; Wellcome Trust: 075491/Z/04, 085475/B/08/Z, 085475/Z/08/Z, 085475PELTONEN, 098051

    Molecular psychiatry 2014;19;1;108-14

  • New mini-zincin structures provide a minimal scaffold for members of this metallopeptidase superfamily.

    Trame CB, Chang Y, Axelrod HL, Eberhardt RY, Coggill P, Punta M and Rawlings ND

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Background: The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines and aspartic acid within the motif (HEXXHXXGXXD). The Acel_2062 protein was chosen by the Joint Center for Structural Genomics for crystal structure determination to explore novel protein sequence space and structure-based function annotation. Results: The crystal structure confirmed that the Acel_2062 protein consisted of a single, zincin-like metallopeptidase-like domain. The Met-turn, a structural feature thought to be important for a Met-zincin because it stabilizes the active site, is absent, and its stabilizing role may have been conferred to the C-terminal Tyr113. In our crystallographic model there are two molecules in the asymmetric unit and from size-exclusion chromatography, the protein dimerizes in solution. A water molecule is present in the putative zinc-binding site in one monomer, which is replaced by one of two observed conformations of His95 in the other. Conclusions: The Acel_2062 protein is structurally related to the zincins. It contains the minimum structural features of a member of this protein superfamily, and can be described as a "mini-zincin". There is a striking parallel with the structure of a mini-Glu-zincin, which represents the minimum structure of a Glu-zincin (a metallopeptidase in which the third zinc ligand is a glutamic acid). Rather than being an ancestral state, phylogenetic analysis suggests that the mini-zincins are derived from larger proteins.

    BMC bioinformatics 2014;15;1;1

  • Naturally Acquired Antibodies Specific for Plasmodium falciparum Reticulocyte-Binding Protein Homologue 5 Inhibit Parasite Growth and Predict Protection From Malaria.

    Tran TM, Ongoiba A, Coursen J, Crosnier C, Diouf A, Huang CY, Li S, Doumbo S, Doumtabe D, Kone Y, Bathily A, Dia S, Niangaly M, Dara C, Sangala J, Miller LH, Doumbo OK, Kayentao K, Long CA, Miura K, Wright GJ, Traore B and Crompton PD

    Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Maryland.

    Background. Plasmodium falciparum reticulocyte-binding protein homologue 5 (PfRH5) is a blood-stage parasite protein essential for host erythrocyte invasion. PfRH5-specific antibodies raised in animals inhibit parasite growth in vitro, but the relevance of naturally acquired PfRH5-specific antibodies in humans is unclear. Methods. We assessed pre-malaria season PfRH5-specific immunoglobulin G (IgG) levels in 357 Malian children and adults who were uninfected with Plasmodium. Subsequent P. falciparum infections were detected by polymerase chain reaction every 2 weeks and malaria episodes by weekly physical examination and self-referral for 7 months. The primary outcome was time between the first P. falciparum infection and the first febrile malaria episode. PfRH5-specific IgG was assayed for parasite growth-inhibitory activity. Results. The presence of PfRH5-specific IgG at enrollment was associated with a longer time between the first blood-stage infection and the first malaria episode (PfRH5-seropositive median: 71 days, PfRH5-seronegative median: 18 days; P = .001). This association remained significant after adjustment for age and other factors associated with malaria risk/exposure (hazard ratio, .62; P = .02). Concentrated PfRH5-specific IgG purified from Malians inhibited P. falciparum growth in vitro. Conclusions. Naturally acquired PfRH5-specific IgG inhibits parasite growth in vitro and predicts protection from malaria. These findings strongly support efforts to develop PfRH5 as an urgently needed blood-stage malaria vaccine. Clinical Trials Registration NCT01322581.

    The Journal of infectious diseases 2014;209;5;789-98

  • Chromosome X-wide association study identifies loci for fasting insulin and height and evidence for incomplete dosage compensation.

    Tukiainen T, Pirinen M, Sarin AP, Ladenvall C, Kettunen J, Lehtimäki T, Lokki ML, Perola M, Sinisalo J, Vlachopoulou E, Eriksson JG, Groop L, Jula A, Järvelin MR, Raitakari OT, Salomaa V and Ripatti S

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America.

    The X chromosome (chrX) represents one potential source for the "missing heritability" for complex phenotypes, which thus far has remained underanalyzed in genome-wide association studies (GWAS). Here we demonstrate the benefits of including chrX in GWAS by assessing the contribution of 404,862 chrX SNPs to levels of twelve commonly studied cardiometabolic and anthropometric traits in 19,697 Finnish and Swedish individuals with replication data on 5,032 additional Finns. By using a linear mixed model, we estimate that on average 2.6% of the additive genetic variance in these twelve traits is attributable to chrX, this being in proportion to the number of SNPs in the chromosome. In a chrX-wide association analysis, we identify three novel loci: two for height (rs182838724 near FGF16/ATRX/MAGT1, joint P-value = 2.71×10(-9), and rs1751138 near ITM2A, P-value = 3.03×10(-10)) and one for fasting insulin (rs139163435 in Xq23, P-value = 5.18×10(-9)). Further, we find that effect sizes for variants near ITM2A, a gene implicated in cartilage development, show evidence for a lack of dosage compensation. This observation is further supported by a sex-difference in ITM2A expression in whole blood (P-value = 0.00251), and is also in agreement with a previous report showing ITM2A escapes from X chromosome inactivation (XCI) in the majority of women. Hence, our results show one of the first links between phenotypic variation in a population sample and an XCI-escaping locus and pinpoint ITM2A as a potential contributor to the sexual dimorphism in height. In conclusion, our study provides a clear motivation for including chrX in large-scale genetic studies of complex diseases and traits.

    PLoS genetics 2014;10;2;e1004127

  • Loss-of-function mutations in MICU1 cause a brain and muscle disorder linked to primary alterations in mitochondrial calcium signaling.

    UK10K Consortium

    Mitochondrial Ca(2+) uptake has key roles in cell life and death. Physiological Ca(2+) signaling regulates aerobic metabolism, whereas pathological Ca(2+) overload triggers cell death. Mitochondrial Ca(2+) uptake is mediated by the Ca(2+) uniporter complex in the inner mitochondrial membrane, which comprises MCU, a Ca(2+)-selective ion channel, and its regulator, MICU1. Here we report mutations of MICU1 in individuals with a disease phenotype characterized by proximal myopathy, learning difficulties and a progressive extrapyramidal movement disorder. In fibroblasts from subjects with MICU1 mutations, agonist-induced mitochondrial Ca(2+) uptake at low cytosolic Ca(2+) concentrations was increased, and cytosolic Ca(2+) signals were reduced. Although resting mitochondrial membrane potential was unchanged in MICU1-deficient cells, the mitochondrial network was severely fragmented. Whereas the pathophysiology of muscular dystrophy and the core myopathies involves abnormal mitochondrial Ca(2+) handling, the phenotype associated with MICU1 deficiency is caused by a primary defect in mitochondrial Ca(2+) signaling, demonstrating the crucial role of mitochondrial Ca(2+) uptake in humans.

    Nature genetics 2014;46;2;188-93

  • Heps with pep: direct reprogramming into human hepatocytes.

    Vallier L

    Wellcome Trust-Medical Research Council Stem Cell Institute, Anne McLaren Institute for Regenerative Medicine, Department of Surgery, West Forvie Site, Robinson Way, Cambridge CB2 0SZ, UK; Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    The limited supply and expansion capacity of primary human hepatocytes presents major challenges for pharmaceutical applications and development of cell-based therapies for liver diseases. Now in Cell Stem Cell, two papers demonstrate efficient direct reprogramming of human fibroblasts into induced hepatocytes, which exhibit metabolic properties similar to primary hepatocytes.

    Cell stem cell 2014;14;3;267-9

  • In vivo evolution of antimicrobial resistance in a series of Staphylococcus aureus patient isolates: the entire picture or a cautionary tale?

    van Hal SJ, Steen JA, Espedido BA, Grimmond SM, Cooper MA, Holden MT, Bentley SD, Gosbell IB and Jensen SO

    Antibiotic Resistance & Mobile Elements Group, School of Medicine, University of Western Sydney, Sydney, NSW, Australia.

    Objectives: To obtain an expanded understanding of antibiotic resistance evolution in vivo, particularly in the context of vancomycin exposure. Methods: The whole genomes of six consecutive methicillin-resistant Staphylococcus aureus blood culture isolates (ST239-MRSA-III) from a single patient exposed to various antimicrobials (over a 77 day period) were sequenced and analysed. Results: Variant analysis revealed the existence of non-susceptible sub-populations derived from a common susceptible ancestor, with the predominant circulating clone(s) selected for by type and duration of antimicrobial exposure. Conclusions: This study highlights the dynamic nature of bacterial evolution and that non-susceptible sub-populations can emerge from clouds of variation upon antimicrobial exposure. Diagnostically, this has direct implications for sample selection when using whole-genome sequencing as a tool to guide clinical therapy. In the context of bacteraemia, deep sequencing of bacterial DNA directly from patient blood samples would avoid culture 'bias' and identify mutations associated with circulating non-susceptible sub-populations, some of which may confer cross-resistance to alternate therapies.

    The Journal of antimicrobial chemotherapy 2014;69;2;363-7

  • Single cell analysis of cancer genomes.

    Van Loo P and Voet T

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Department of Human Genetics, VIB and KU Leuven, Leuven, Belgium.

    Genomic studies have provided key insights into how cancers develop, evolve, metastasize and respond to treatment. Cancers result from an interplay between mutation, selection and clonal expansions. In solid tumours, this Darwinian competition between subclones is also influenced by topological factors. Recent advances have made it possible to study cancers at the single cell level. These methods represent important tools to dissect cancer evolution and provide the potential to considerably change both cancer research and clinical practice. Here we discuss state-of-the-art methods for the isolation of a single cell, whole-genome and whole-transcriptome amplification of the cell's nucleic acids, as well as microarray and massively parallel sequencing analysis of such amplification products. We discuss the strengths and the limitations of the techniques, and explore single-cell methodologies for future cancer research, as well as diagnosis and treatment of the disease.

    Current opinion in genetics & development 2014;24C;82-91

  • Adding genomic 'foliage' to the tree of life.

    Walker A

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014;12;2;78

  • Heterozygous Loss-of-Function Mutations in YAP1 Cause Both Isolated and Syndromic Optic Fissure Closure Defects.

    Williamson KA, Rainger J, Floyd JA, Ansari M, Meynert A, Aldridge KV, Rainger JK, Anderson CA, Moore AT, Hurles ME, Clarke A, van Heyningen V, Verloes A, Taylor MS, Wilkie AO, UK10K Consortium and Fitzpatrick DR

    Medical Research Council Human Genetics Unit, Medical Research Council Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.

    Exome sequence analysis of affected individuals from two families with autosomal-dominant inheritance of coloboma identified two different cosegregating heterozygous nonsense mutations (c.370C>T [p.Arg124(∗)] and c.1066G>T [p.Glu356(∗)]) in YAP1. The phenotypes of the affected families differed in that one included no extraocular features and the other manifested with highly variable multisystem involvement, including hearing loss, intellectual disability, hematuria, and orofacial clefting. A combined LOD score of 4.2 was obtained for the association between YAP1 loss-of-function mutations and the phenotype in these families. YAP1 encodes an effector of the HIPPO-pathway-induced growth response, and whole-mount in situ hybridization in mouse embryos has shown that Yap1 is strongly expressed in the eye, brain, and fusing facial processes. RT-PCR showed that an alternative transcription start site (TSS) in intron 1 of YAP1 and Yap1 is widely used in human and mouse development, respectively. Transcripts from the alternative TSS are predicted to initiate at codon Met179 relative to the canonical transcript (RefSeq NM_001130145). In these alternative transcripts, the c.370C>T mutation in family 1305 is within the 5' UTR and cannot result in nonsense-mediated decay (NMD). The c.1066G>T mutation in family 132 should result in NMD in transcripts from either TSS. Amelioration of the phenotype by the alternative transcripts provides a plausible explanation for the phenotypic differences between the families.

    American journal of human genetics 2014;94;2;295-302

  • Inactivating CUX1 mutations promote tumorigenesis.

    Wong CC, Martincorena I, Rust AG, Rashid M, Alifrangis C, Alexandrov LB, Tiffen JC, Kober C, Chronic Myeloid Disorders Working Group of the International Cancer Genome Consortium, Green AR, Massie CE, Nangalia J, Lempidaki S, Döhner H, Döhner K, Bray SJ, McDermott U, Papaemmanuil E, Campbell PJ and Adams DJ

    [1] Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. [2] Department of Haematology, University of Cambridge, Hills Road, Cambridge, UK.

    A major challenge in cancer genetics is to determine which low-frequency somatic mutations are drivers of tumorigenesis. Here we interrogate the genomes of 7,651 diverse human cancers and find inactivating mutations in the homeodomain transcription factor gene CUX1 (cut-like homeobox 1) in ~1-5% of various tumors. Meta-analysis of CUX1 mutational status in 2,519 cases of myeloid malignancies reveals disruptive mutations associated with poor survival, highlighting the clinical significance of CUX1 loss. In parallel, we validate CUX1 as a bona fide tumor suppressor using mouse transposon-mediated insertional mutagenesis and Drosophila cancer models. We demonstrate that CUX1 deficiency activates phosphoinositide 3-kinase (PI3K) signaling through direct transcriptional downregulation of the PI3K inhibitor PIK3IP1 (phosphoinositide-3-kinase interacting protein 1), leading to increased tumor growth and susceptibility to PI3K-AKT inhibition. Thus, our complementary approaches identify CUX1 as a pan-driver of tumorigenesis and uncover a potential strategy for treating CUX1-mutant tumors.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust: 100140

    Nature genetics 2014;46;1;33-8

  • Plasmodium falciparum Erythrocyte Invasion: Combining Function with Immune Evasion.

    Wright GJ and Rayner JC

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    PLoS pathogens 2014;10;3;e1003943

  • The Association Between Circulating Lipoprotein(a) and Type 2 Diabetes: Is It Causal?

    Ye Z, Haycock PC, Gurdasani D, Pomilla C, Boekholdt SM, Tsimikas S, Khaw KT, Wareham NJ, Sandhu MS and Forouhi NG

    MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, U.K.

    Epidemiological evidence supports a direct and causal association between lipoprotein(a) [Lp(a)] levels and coronary risk, but the nature of the association between Lp(a) levels and risk of type 2 diabetes (T2D) is unclear. In this study, we assessed the association of Lp(a) levels with risk of incident T2D and tested whether Lp(a) levels are causally linked to T2D. We analyzed data on 18,490 participants from the European Prospective Investigation of Cancer (EPIC)-Norfolk cohort that included adults aged 40-79 years at baseline 1993-1997. During an average 10 years of follow-up, 593 participants developed incident T2D. Cox regression models were used to estimate the association between Lp(a) levels and T2D. In Mendelian randomization analyses, based on EPIC-Norfolk combined with DIAbetes Genetics Replication And Meta-analysis data involving a total of 10,088 diabetes case participants and 68,346 control participants, we used a genetic variant (rs10455872) as an instrument to test whether the association between Lp(a) levels and T2D is causal. In adjusted analyses, there was an inverse association between Lp(a) levels and T2D: hazard ratio was 0.63 (95% CI 0.49-0.81; P trend = 0.003) comparing the top versus bottom quintile of Lp(a). In EPIC-Norfolk, a 1-SD increase in logLp(a) was associated with a lower risk of T2D (odds ratio [OR] 0.88 [95% CI: 0.80-0.95]). However, in Mendelian randomization analyses, a 1-SD increase in logLp(a) due to rs10455872, which explained 26.8% of the variability in Lp(a) levels, was not associated with risk of T2D (OR 1.03 [0.96-1.10]; P = 0.41). These prospective findings demonstrate a strong inverse association of Lp(a) levels with risk of T2D. However, a genetic variant that elevated Lp(a) levels was not associated with risk of T2D, suggesting that elevated Lp(a) levels are not causally associated with a lower risk of T2D.

    Diabetes 2014;63;1;332-42

  • One-step generation of different immunodeficient mice with multiple gene modifications by CRISPR/Cas9 mediated genome engineering.

    Zhou J, Shen B, Zhang W, Wang J, Yang J, Chen L, Zhang N, Zhu K, Xu J, Hu B, Leng Q and Huang X

    MOE Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, National Resource Center for Mutant Mice, Nanjing 210061, China.

    Taking advantage of the multiplexable genome engineering feature of the CRISPR/Cas9 system, we sought to generate different kinds of immunodeficient mouse strains by embryo co-microinjection of Cas9 mRNA and multiple sgRNAs targeting mouse B2m, Il2rg, Prf1, Prkdc, and Rag1. We successfully achieved multiple gene modifications, fragment deletion, double knockout of genes localizing on the same chromosome, and got different kinds of immunodeficient mouse models with different heritable genetic modifications at once, providing a one-step strategy for generating different immunodeficient mice which represents significant time-, labor-, and money-saving advantages over traditional approaches. Meanwhile, we improved the technology by optimizing the concentration of Cas9 and sgRNAs and designing two adjacent sgRNAs targeting one exon for each gene, which greatly increased the targeting efficiency and bi-allelic mutations.

    The international journal of biochemistry & cell biology 2014;46;49-55

  • Dual sgRNAs facilitate CRISPR/Cas9-mediated mouse genome targeting.

    Zhou J, Wang J, Shen B, Chen L, Su Y, Yang J, Zhang W, Tian X and Huang X

    MOE Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, National Resource Center for Mutant Mice, Nanjing, China.

    The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system is a versatile RNA-guided mammalian genome modification system. One-step generation of mouse genome targeting has been achieved by co-microinjection of one-cell stage embryos with Cas9 mRNA and small/single guide (sg)RNA. Many studies have focused on enhancing the efficiency of this system. In the present study, we report that simultaneous use of dual sgRNAs to target an individual gene significantly improved the Cas9-mediated genome targeting with a bi-allelic modification efficiency of up to 78%. We further observed that the target gene modifications were characterized by efficient germline transmission and site-dependent off-target effects, and also that the apolipoprotein E gene knockout-mediated defects in blood biochemical parameters were recapitulated by CRISPR/Cas9-mediated heritable gene modification. Our results provide a dual sgRNAs strategy to facilitate CRISPR/Cas9-mediated mouse genome targeting.

    The FEBS journal 2014
