Sanger Institute - Publications 2012

Number of papers published in 2012: 482

  • An integrated map of genetic variation from 1,092 human genomes.

    1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT and McVean GA

    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I021213/1; British Heart Foundation: RG/09/012/28096, RG/09/12/28096; Howard Hughes Medical Institute; Medical Research Council: G0701805, G0801823, G0900747, G0900747(91070); NCI NIH HHS: R01 CA166661, R01CA166661; NCRR NIH HHS: G12 RR003050, UL1RR024131; NHGRI NIH HHS: P01HG4120, P41HG2371, P41HG4221, R01 HG002898, R01 HG004960, R01 HG007022, R01HG2898, R01HG3698, R01HG4719, R01HG4960, R01HG5701, RC2HG5552, RC2HG5581, U01 HG005728, U01 HG006513, U01 HG006569, U01HG5208, U01HG5209, U01HG5211, U01HG5214, U01HG5715, U01HG5725, U01HG5728, U01HG6513, U01HG6569, U41HG4568, U54 HG003079, U54 HG003273, U54HG3067, U54HG3079, U54HG3273; NHLBI NIH HHS: HL078885, R01HL95045, RC2HL102925, T32HL94284; NIAID NIH HHS: AI077439, AI2009061; NIEHS NIH HHS: ES015794; NIGMS NIH HHS: R01GM59290, T32GM7748, T32GM8283; NIH HHS: DP2OD6514; NIMH NIH HHS: R01MH84698; NIMHD NIH HHS: G12 MD007579; NLM NIH HHS: T15LM7033; PHS HHS: HHSN268201100040C; Wellcome Trust: 085532, 086084, 090532, 095908, WT085475/Z/08/Z, WT085532AIA, WT086084/Z/08/Z, WT089250/Z/09/Z, WT090532/Z/09/Z, WT095552/Z/11/Z, WT098051

    Nature 2012;491;7422;56-65

  • Analysis of context-dependent errors for Illumina sequencing.

    Abnizova I, Leonard S, Skelly T, Brown A, Jackson D, Gourtovaia M, Qi G, Te Boekhorst R, Faruque N, Lewis K and Cox T

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. ia1@sanger.ac.uk

    The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. However, in the particular case of SNP calling, a great number of false-positive SNPs may be obtained. One needs to distinguish putative SNPs from sequencing or other errors. We found that not only the probability of sequencing errors (i.e. the quality value) is important to distinguish an FP-SNP but also the conditional probability of "correcting" this error (the "second best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors, and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequence errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggested a simple method to correct the majority of mismatches, based on conditional probability of their "second" best intensity call. We attach a corresponding second call confidence (quality value) of being corrected to each mismatch.

    Journal of bioinformatics and computational biology 2012;10;2;1241005

  • Multilocus sequence typing as a replacement for serotyping in Salmonella enterica.

    Achtman M, Wain J, Weill FX, Nair S, Zhou Z, Sangal V, Krauland MG, Hale JL, Harbottle H, Uesbeck A, Dougan G, Harrison LH, Brisse S and S. Enterica MLST Study Group

    Environmental Research Institute and Department of Microbiology, University College Cork, Cork, Ireland. m.achtman@ucc.ie

    Salmonella enterica subspecies enterica is traditionally subdivided into serovars by serological and nutritional characteristics. We used Multilocus Sequence Typing (MLST) to assign 4,257 isolates from 554 serovars to 1092 sequence types (STs). The majority of the isolates and many STs were grouped into 138 genetically closely related clusters called eBurstGroups (eBGs). Many eBGs correspond to a serovar, for example most Typhimurium are in eBG1 and most Enteritidis are in eBG4, but many eBGs contained more than one serovar. Furthermore, most serovars were polyphyletic and are distributed across multiple unrelated eBGs. Thus, serovar designations confounded genetically unrelated isolates and failed to recognize natural evolutionary groupings. An inability of serotyping to correctly group isolates was most apparent for Paratyphi B and its variant Java. Most Paratyphi B were included within a sub-cluster of STs belonging to eBG5, which also encompasses a separate sub-cluster of Java STs. However, diphasic Java variants were also found in two other eBGs and monophasic Java variants were in four other eBGs or STs, one of which is in subspecies salamae and a second of which includes isolates assigned to Enteritidis, Dublin and monophasic Paratyphi B. Similarly, Choleraesuis was found in eBG6 and is closely related to Paratyphi C, which is in eBG20. However, Choleraesuis var. Decatur consists of isolates from seven other, unrelated eBGs or STs. The serological assignment of these Decatur isolates to Choleraesuis likely reflects lateral gene transfer of flagellar genes between unrelated bacteria plus purifying selection. By confounding multiple evolutionary groups, serotyping can be misleading about the disease potential of S. enterica. Unlike serotyping, MLST recognizes evolutionary groupings and we recommend that Salmonella classification by serotyping should be replaced by MLST or its equivalents.

    Funded by: Wellcome Trust

    PLoS pathogens 2012;8;6;e1002776

  • BLUEPRINT to decode the epigenetic signature written in blood.

    Adams D, Altucci L, Antonarakis SE, Ballesteros J, Beck S, Bird A, Bock C, Boehm B, Campo E, Caricasole A, Dahl F, Dermitzakis ET, Enver T, Esteller M, Estivill X, Ferguson-Smith A, Fitzgibbon J, Flicek P, Giehl C, Graf T, Grosveld F, Guigo R, Gut I, Helin K, Jarvius J, Küppers R, Lehrach H, Lengauer T, Lernmark Å, Leslie D, Loeffler M, Macintyre E, Mai A, Martens JH, Minucci S, Ouwehand WH, Pelicci PG, Pendeville H, Porse B, Rakyan V, Reik W, Schrappe M, Schübeler D, Seifert M, Siebert R, Simmons D, Soranzo N, Spicuglia S, Stratton M, Stunnenberg HG, Tanay A, Torrents D, Valencia A, Vellenga E, Vingron M, Walter J and Willcocks S

    Funded by: Medical Research Council: G0801156, MR/J001597/1; Wellcome Trust: 079249, 095606, 095645, 095908

    Nature biotechnology 2012;30;3;224-6

  • A suggested new bacteriophage genus: "Viunalikevirus".

    Adriaenssens EM, Ackermann HW, Anany H, Blasdel B, Connerton IF, Goulding D, Griffiths MW, Hooton SP, Kutter EM, Kropinski AM, Lee JH, Maes M, Pickard D, Ryu S, Sepehrizadeh Z, Shahrbabak SS, Toribio AL and Lavigne R

    Laboratory of Gene Technology, Katholieke Universiteit Leuven, Kasteelpark Arenberg 21, Heverlee, Belgium.

    We suggest a bacteriophage genus, "Viunalikevirus", as a new genus within the family Myoviridae. To date, this genus includes seven sequenced members: Salmonella phages ViI, SFP10 and ΦSH19; Escherichia phages CBA120 and PhaxI; Shigella phage phiSboM-AG3; and Dickeya phage LIMEstone1. Their shared myovirus morphology, with comparable head sizes and tail dimensions, and genome organization are considered distinguishing features. They appear to have conserved regulatory sequences, a horizontally acquired tRNA set and the probable substitution of an alternate base for thymine in the DNA. A close examination of the tail spike region in the DNA revealed four distinct tail spike proteins, an arrangement which might lead to the umbrella-like structures of the tails visible on electron micrographs. These properties set the suggested genus apart from the recently ratified subfamily Tevenvirinae, although a significant evolutionary relationship can be observed.

    Funded by: NIGMS NIH HHS: 2R15GM63637-3A1

    Archives of virology 2012;157;10;2035-46

  • Work-related exhaustion and telomere length: a population-based study.

    Ahola K, Sirén I, Kivimäki M, Ripatti S, Aromaa A, Lönnqvist J and Hovatta I

    Work Organizations, Finnish Institute of Occupational Health, Helsinki, Finland.

    Background: Psychological stress is suggested to accelerate the rate of biological aging. We investigated whether work-related exhaustion, an indicator of prolonged work stress, is associated with accelerated biological aging, as indicated by shorter leukocyte telomeres, that is, the DNA-protein complexes that cap chromosomal ends in cells. Methods: We used data from a representative sample of the Finnish working-age population, the Health 2000 Study. Our sample consisted of 2911 men and women aged 30-64. Work-related exhaustion was assessed using the Maslach Burnout Inventory - General Survey. We determined relative leukocyte telomere length using a quantitative real-time polymerase chain reaction (PCR)-based method. Results: After adjustment for age and sex, individuals with severe exhaustion had leukocyte telomeres on average 0.043 relative units shorter (standard error of the mean 0.016) than those with no exhaustion (p = 0.009). The association between exhaustion and relative telomere length remained significant after additional adjustment for marital and socioeconomic status, smoking, body mass index, and morbidities (adjusted difference 0.044 relative units, standard error of the mean 0.017, p = 0.008). Conclusions: These data suggest that work-related exhaustion is related to the acceleration of the rate of biological aging. This hypothesis awaits confirmation in a prospective study measuring changes in relative telomere length over time.

    PloS one 2012;7;7;e40186

  • Phenotypic and Genomic Analysis of Hypervirulent Human-associated Bordetella bronchiseptica.

    Ahuja U, Liu M, Tomida S, Park J, Souda P, Whitelegge J, Li H, Harvill ET, Parkhill J and Miller JF

    Department of Microbiology, Immunology and Molecular Genetics, University of California, BSRB 254, 615 Charles E. Young Drive East, Los Angeles, CA, 90095-1747, USA. jfmiller@ucla.edu

    Background: B. bronchiseptica infections are usually associated with wild or domesticated animals, but infrequently with humans. A recent phylogenetic analysis distinguished two distinct B. bronchiseptica subpopulations, designated complexes I and IV. Complex IV isolates appear to have a bias for infecting humans; however, little is known regarding their epidemiology, virulence properties, or comparative genomics. Results: Here we report a characterization of the virulence of human-associated complex IV B. bronchiseptica strains. In in vitro cytotoxicity assays, complex IV strains showed increased cytotoxicity in comparison to a panel of complex I strains. Some complex IV isolates were remarkably cytotoxic, resulting in LDH release levels in A549 cells that were 10- to 20-fold greater than complex I strains. In vivo, a subset of complex IV strains was found to be hypervirulent, with an increased ability to cause lethal pulmonary infections in mice. Hypercytotoxicity in vitro and hypervirulence in vivo were both dependent on the activity of the bsc T3SS and the BteA effector. To clarify differences between lineages, representative complex IV isolates were sequenced and their genomes were compared to complex I isolates. Although our analysis showed there were no genomic sequences that can be considered unique to complex IV strains, there were several loci that were predominantly found in complex IV isolates. Conclusion: Our observations reveal a T3SS-dependent hypervirulence phenotype in human-associated complex IV isolates, highlighting the need for further studies on the epidemiology and evolutionary dynamics of this B. bronchiseptica lineage.

    BMC microbiology 2012;12;167

  • Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome.

    Albers CA, Paul DS, Schulze H, Freson K, Stephens JC, Smethurst PA, Jolley JD, Cvejic A, Kostadima M, Bertone P, Breuning MH, Debili N, Deloukas P, Favier R, Fiedler J, Hobbs CM, Huang N, Hurles ME, Kiddle G, Krapels I, Nurden P, Ruivenkamp CA, Sambrook JG, Smith K, Stemple DL, Strauss G, Thys C, van Geet C, Newbury-Ecob R, Ouwehand WH and Ghevaert C

    Department of Haematology, University of Cambridge, Cambridge, UK. caa@sanger.ac.uk

    The exon-junction complex (EJC) performs essential RNA processing tasks. Here, we describe the first human disorder, thrombocytopenia with absent radii (TAR), caused by deficiency in one of the four EJC subunits. Compound inheritance of a rare null allele and one of two low-frequency SNPs in the regulatory regions of RBM8A, encoding the Y14 subunit of EJC, causes TAR. We found that this inheritance mechanism explained 53 of 55 cases (P < 5 × 10(-228)) of the rare congenital malformation syndrome. Of the 53 cases with this inheritance pattern, 51 carried a submicroscopic deletion of 1q21.1 that has previously been associated with TAR, and two carried a truncation or frameshift null mutation in RBM8A. We show that the two regulatory SNPs result in diminished RBM8A transcription in vitro and that Y14 expression is reduced in platelets from individuals with TAR. Our data implicate Y14 insufficiency and, presumably, an EJC defect as the cause of TAR syndrome.

    Funded by: British Heart Foundation: FS/09/039, FS/09/039/27788, RG/09/012/28096, RG/09/12/28096; Wellcome Trust: 082597, 084183, WT-082597/Z/07/Z, WT-084183/2/07/2, WT091310

    Nature genetics 2012;44;4;435-9, S1-2

  • Powerful Identification of Cis-regulatory SNPs in Human Primary Monocytes Using Allele-Specific Gene Expression.

    Almlöf JC, Lundmark P, Lundmark A, Ge B, Maouche S, Göring HH, Liljedahl U, Enström C, Brocheton J, Proust C, Godefroy T, Sambrook JG, Jolley J, Crisp-Hihn A, Foad N, Lloyd-Jones H, Stephens J, Gwilliam R, Rice CM, Hengstenberg C, Samani NJ, Erdmann J, Schunkert H, Pastinen T, Deloukas P, Goodall AH, Ouwehand WH, Cambien F and Syvänen AC

    Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

    A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straightforward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function, while allele-specific gene expression (ASE) analysis has not yet gained widespread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers.

    PloS one 2012;7;12;e52260

  • High-throughput decoding of antitrypanosomal drug efficacy and resistance.

    Alsford S, Eckert S, Baker N, Glover L, Sanchez-Flores A, Leung KF, Turner DJ, Field MC, Berriman M and Horn D

    London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK.

    The concept of disease-specific chemotherapy was developed a century ago. Dyes and arsenical compounds that displayed selectivity against trypanosomes were central to this work, and the drugs that emerged remain in use for treating human African trypanosomiasis (HAT). The importance of understanding the mechanisms underlying selective drug action and resistance for the development of improved HAT therapies has been recognized, but these mechanisms have remained largely unknown. Here we use all five current HAT drugs for genome-scale RNA interference target sequencing (RIT-seq) screens in Trypanosoma brucei, revealing the transporters, organelles, enzymes and metabolic pathways that function to facilitate antitrypanosomal drug action. RIT-seq profiling identifies both known drug importers and the only known pro-drug activator, and links more than fifty additional genes to drug action. A bloodstream stage-specific invariant surface glycoprotein (ISG75) family mediates suramin uptake, and the AP1 adaptin complex, lysosomal proteases and major lysosomal transmembrane protein, as well as spermidine and N-acetylglucosamine biosynthesis, all contribute to suramin action. Further screens link ubiquinone availability to nitro-drug action, plasma membrane P-type H(+)-ATPases to pentamidine action, and trypanothione and several putative kinases to melarsoprol action. We also demonstrate a major role for aquaglyceroporins in pentamidine and melarsoprol cross-resistance. These advances in our understanding of mechanisms of antitrypanosomal drug efficacy and resistance will aid the rational design of new therapies and help to combat drug resistance, and provide unprecedented molecular insight into the mode of action of antitrypanosomal drugs.

    Funded by: Wellcome Trust: 085775, 085775/Z/08/Z, 090007, 090007/Z/09/Z, 093010, 093010/Z/10/Z

    Nature 2012;482;7384;232-6

  • Population genomic scan for candidate signatures of balancing selection to guide antigen characterization in malaria parasites.

    Amambua-Ngwa A, Tetteh KK, Manske M, Gomez-Escobar N, Stewart LB, Deerhake ME, Cheeseman IH, Newbold CI, Holder AA, Knuepfer E, Janha O, Jallow M, Campino S, Macinnis B, Kwiatkowski DP and Conway DJ

    Medical Research Council Unit, Fajara, Banjul, The Gambia.

    Acquired immunity in vertebrates maintains polymorphisms in endemic pathogens, leading to identifiable signatures of balancing selection. To comprehensively survey for genes under such selection in the human malaria parasite Plasmodium falciparum, we generated paired-end short-read sequences of parasites in clinical isolates from an endemic Gambian population, which were mapped to the 3D7 strain reference genome to yield high-quality genome-wide coding sequence data for 65 isolates. A minority of genes did not map reliably, including the hypervariable var, rifin, and stevor families, but 5,056 genes (90.9% of all in the genome) had >70% sequence coverage with minimum read depth of 5 for at least 50 isolates, of which 2,853 genes contained 3 or more single nucleotide polymorphisms (SNPs) for analysis of polymorphic site frequency spectra. Against an overall background of negatively skewed frequencies, as expected from historical population expansion combined with purifying selection, the outlying minority of genes with signatures indicating exceptionally intermediate frequencies were identified. Comparing genes with different stage-specificity, such signatures were most common in those with peak expression at the merozoite stage that invades erythrocytes. Members of clag, PfMC-2TM, surfin, and msp3-like gene families were highly represented, the strongest signature being in the msp3-like gene PF10_0355. Analysis of msp3-like transcripts in 45 clinical and 11 laboratory adapted isolates grown to merozoite-containing schizont stages revealed surprisingly low expression of PF10_0355. In diverse clonal parasite lines the protein product was expressed in a minority of mature schizonts (<1% in most lines and ∼10% in clone HB3), and eight sub-clones of HB3 cultured separately had an intermediate spectrum of positive frequencies (0.9 to 7.5%), indicating phase variable expression of this polymorphic antigen. This and other identified targets of balancing selection are now prioritized for functional study.

    Funded by: Medical Research Council: U117532067; Wellcome Trust: 074695/Z/04/B, 090770/Z/09/Z, 098051

    PLoS genetics 2012;8;11;e1002992

  • Genome-wide association analysis of coffee drinking suggests association with CYP1A1/CYP1A2 and NRCAM.

    Amin N, Byrne E, Johnson J, Chenevix-Trench G, Walter S, Nolte IM, kConFab Investigators, Vink JM, Rawal R, Mangino M, Teumer A, Keers JC, Verwoert G, Baumeister S, Biffar R, Petersmann A, Dahmen N, Doering A, Isaacs A, Broer L, Wray NR, Montgomery GW, Levy D, Psaty BM, Gudnason V, Chakravarti A, Sulem P, Gudbjartsson DF, Kiemeney LA, Thorsteinsdottir U, Stefansson K, van Rooij FJ, Aulchenko YS, Hottenga JJ, Rivadeneira FR, Hofman A, Uitterlinden AG, Hammond CJ, Shin SY, Ikram A, Witteman JC, Janssens AC, Snieder H, Tiemeier H, Wolfenbuttel BH, Oostra BA, Heath AC, Wichmann E, Spector TD, Grabe HJ, Boomsma DI, Martin NG and van Duijn CM

    Unit of Genetic Epidemiology, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands.

    Coffee consumption is a model for addictive behavior. We performed a meta-analysis of genome-wide association studies (GWASs) on coffee intake from 8 Caucasian cohorts (N=18,176) and sought replication of our top findings in a further 7929 individuals. We also performed a gene expression analysis treating different cell lines with caffeine. Genome-wide significant association was observed for two single-nucleotide polymorphisms (SNPs) in the 15q24 region. The two SNPs rs2470893 and rs2472297 (P-values=1.6 × 10(-11) and 2.7 × 10(-11)), which were also in strong linkage disequilibrium (r(2)=0.7) with each other, lie in the 23-kb long commonly shared 5' flanking region between CYP1A1 and CYP1A2 genes. CYP1A1 was found to be downregulated in lymphoblastoid cell lines treated with caffeine. CYP1A1 is known to metabolize polycyclic aromatic hydrocarbons, which are important constituents of coffee, whereas CYP1A2 is involved in the primary metabolism of caffeine. Significant evidence of association was also detected at rs382140 (P-value=3.9 × 10(-09)) near NRCAM, a gene implicated in vulnerability to addiction, and at another independent hit, rs6495122 (P-value=7.1 × 10(-09)), an SNP associated with blood pressure, in the 15q24 region near the gene ULK3, in the meta-analysis of discovery and replication cohorts. Our results from GWASs and expression analysis also strongly implicate CAB39L in coffee drinking. Pathway analysis of differentially expressed genes revealed significantly enriched ubiquitin proteasome (P-value=2.2 × 10(-05)) and Parkinson's disease pathways (P-value=3.6 × 10(-05)).

    Funded by: NIAAA NIH HHS: K05 AA017688

    Molecular psychiatry 2012;17;11;1116-29

  • Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II.

    Archer J, Baillie G, Watson SJ, Kellam P, Rambaut A and Robertson DL

    Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK. john.archer@manchester.ac.uk

    Background: Next generation sequencing provides detailed insight into the variation present within viral populations, introducing the possibility of treatment strategies that are both reactive and predictive. Current software tools, however, need to be scaled up to accommodate for high-depth viral data sets, which are often temporally or spatially linked. In addition, due to the development of novel sequencing platforms and chemistries, each with implicit strengths and weaknesses, it will be helpful for researchers to be able to routinely compare and combine data sets from different platforms/chemistries. In particular, error associated with a specific sequencing process must be quantified so that true biological variation may be identified.

    Results: Segminator II was developed to allow for the efficient comparison of data sets derived from different sources. We demonstrate its usage by comparing large data sets from 12 influenza H1N1 samples sequenced on both the 454 Life Sciences and Illumina platforms, permitting quantification of platform error. For mismatches median error rates at 0.10 and 0.12%, respectively, suggested that both platforms performed similarly. For insertions and deletions median error rates within the 454 data (at 0.3 and 0.2%, respectively) were significantly higher than those within the Illumina data (0.004 and 0.006%, respectively). In agreement with previous observations these higher rates were strongly associated with homopolymeric stretches on the 454 platform. Outside of such regions both platforms had similar indel error profiles. Additionally, we apply our software to the identification of low frequency variants.

    Conclusion: We have demonstrated, using Segminator II, that it is possible to distinguish platform specific error from biological variation using data derived from two different platforms. We have used this approach to quantify the amount of error present within the 454 and Illumina platforms in relation to genomic location as well as location on the read. Given that next generation data is increasingly important in the analysis of drug-resistance and vaccine trials, this software will be useful to the pathogen research community. A zip file containing the source code and jar file is freely available for download from http://www.bioinf.manchester.ac.uk/segminator/.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H012419/1; Wellcome Trust: 095831

    BMC bioinformatics 2012;13;47

  • Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study.

    arcOGEN Consortium, arcOGEN Collaborators, Zeggini E, Panoutsopoulou K, Southam L, Rayner NW, Day-Williams AG, Lopes MC, Boraska V, Esko T, Evangelou E, Hoffman A, Houwing-Duistermaat JJ, Ingvarsson T, Jonsdottir I, Jonnson H, Kerkhof HJ, Kloppenburg M, Bos SD, Mangino M, Metrustry S, Slagboom PE, Thorleifsson G, Raine EV, Ratnayake M, Ricketts M, Beazley C, Blackburn H, Bumpstead S, Elliott KS, Hunt SE, Potter SC, Shin SY, Yadav VK, Zhai G, Sherburn K, Dixon K, Arden E, Aslam N, Battley PK, Carluke I, Doherty S, Gordon A, Joseph J, Keen R, Koller NC, Mitchell S, O'Neill F, Paling E, Reed MR, Rivadeneira F, Swift D, Walker K, Watkins B, Wheeler M, Birrell F, Ioannidis JP, Meulenbelt I, Metspalu A, Rai A, Salter D, Stefansson K, Stykarsdottir U, Uitterlinden AG, van Meurs JB, Chapman K, Deloukas P, Ollier WE, Wallis GA, Arden N, Carr A, Doherty M, McCaskie A, Willkinson JM, Ralston SH, Valdes AM, Spector TD and Loughlin J

    Wellcome Trust Sanger Institute, Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. eleftheria@sanger.ac.uk

    Background: Osteoarthritis is the most common form of arthritis worldwide and is a major cause of pain and disability in elderly people. The health economic burden of osteoarthritis is increasing commensurate with obesity prevalence and longevity. Osteoarthritis has a strong genetic component but the success of previous genetic studies has been restricted due to insufficient sample sizes and phenotype heterogeneity.

    Methods: We undertook a large genome-wide association study (GWAS) in 7410 unrelated and retrospectively and prospectively selected patients with severe osteoarthritis in the arcOGEN study, 80% of whom had undergone total joint replacement, and 11,009 unrelated controls from the UK. We replicated the most promising signals in an independent set of up to 7473 cases and 42,938 controls, from studies in Iceland, Estonia, the Netherlands, and the UK. All patients and controls were of European descent.

    Findings: We identified five genome-wide significant loci (binomial test p≤5·0×10(-8)) for association with osteoarthritis and three loci just below this threshold. The strongest association was on chromosome 3 with rs6976 (odds ratio 1·12 [95% CI 1·08-1·16]; p=7·24×10(-11)), which is in perfect linkage disequilibrium with rs11177. This SNP encodes a missense polymorphism within the nucleostemin-encoding gene GNL3. Levels of nucleostemin were raised in chondrocytes from patients with osteoarthritis in functional studies. Other significant loci were on chromosome 9 close to ASTN2, chromosome 6 between FILIP1 and SENP6, chromosome 12 close to KLHDC5 and PTHLH, and in another region of chromosome 12 close to CHST11. One of the signals close to genome-wide significance was within the FTO gene, which is involved in regulation of bodyweight, a strong risk factor for osteoarthritis. All risk variants were common in frequency and exerted small effects.

    Interpretation: Our findings provide insight into the genetics of arthritis and identify new pathways that might be amenable to future therapeutic intervention.

    Funding: arcOGEN was funded by a special purpose grant from Arthritis Research UK.

    Funded by: Arthritis Research UK: 18030; Medical Research Council: G0100594, G0901461

    Lancet 2012;380;9844;815-23

  • Population differentiation of southern Indian male lineages correlates with agricultural expansions predating the caste system.

    Arunkumar G, Soria-Hernanz DF, Kavitha VJ, Arun VS, Syama A, Ashokan KS, Gandhirajan KT, Vijayakumar K, Narayanan M, Jayalakshmi M, Ziegle JS, Royyuru AK, Parida L, Wells RS, Renfrew C, Schurr TG, Smith CT, Platt DE, Pitchappan R and Genographic Consortium

    The Genographic Laboratory, School of Biological Sciences, Madurai Kamaraj University, Madurai, Tamil Nadu, India.

    Previous studies that pooled Indian populations from a wide variety of geographical locations have obtained contradictory conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions we took advantage of the fact that both Y chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10-30 Kya), suggesting that more recent Holocene migrations from western Eurasia contributed <20% of the male lineages. We found strong evidence for genetic structure, associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was established 4-6 Kya and there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna (caste) system from the historically-documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation were best explained by the emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events for the population structure of the whole of modern India.

    Funded by: Wellcome Trust: 098051

    PloS one 2012;7;11;e50269

  • An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity.

    Asimit J, Day-Williams A, Zgaga L, Rudan I, Boraska V and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.

    Meta-analysis has proven a useful tool in genetic association studies. Allelic heterogeneity can arise from ethnic background differences across populations being meta-analyzed (for example, in search of common frequency variants through genome-wide association studies), and through the presence of multiple low frequency and rare associated variants in the same functional unit of interest (for example, within a gene or a regulatory region). The latter challenge will be increasingly relevant in whole-genome and whole-exome sequencing studies investigating association with complex traits. Here, we evaluate the performance of different approaches to meta-analysis in the presence of allelic heterogeneity. We simulate allelic heterogeneity scenarios in three populations and examine the performance of current approaches to the analysis of these data. We show that current approaches can detect only a small fraction of common frequency causal variants. We also find that for low-frequency variants with large effects (odds ratios 2-3), single-point tests have high power, but also high false-positive rates. P-value based meta-analysis of summary results from allele-matching locus-wide tests outperforms collapsing approaches. We conclude that current strategies for the combination of genetic association data in the presence of allelic heterogeneity are insufficiently powered.

    Funded by: Wellcome Trust: 098051

    European journal of human genetics : EJHG 2012;20;6;709-12

  • Imputation of rare variants in next-generation association studies.

    Asimit JL and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK. ja11@sanger.ac.uk

    The role of rare variants has become a focus in the search for association with complex traits. Imputation is a powerful and cost-efficient tool to access variants that have not been directly typed, but there are several challenges when imputing rare variants, most notably reference panel selection. Extensions to rare variant association tests to incorporate genotype uncertainty from imputation are discussed, as well as the use of imputed low-frequency and rare variants in the study of population isolates.

    Funded by: Wellcome Trust: 098051

    Human heredity 2012;74;3-4;196-204

  • ARIEL and AMELIA: testing for an accumulation of rare variants using next-generation sequencing data.

    Asimit JL, Day-Williams AG, Morris AP and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Objectives: There is increasing evidence that rare variants play a role in some complex traits, but their analysis is not straightforward. Locus-based tests become necessary due to low power in rare variant single-point association analyses. In addition, variant quality scores are available for sequencing data, but are rarely taken into account. Here, we propose two locus-based methods that incorporate variant quality scores: a regression-based collapsing approach and an allele-matching method.

    Methods: Using simulated sequencing data we compare 4 locus-based tests of trait association under different scenarios of data quality. We test two collapsing-based approaches and two allele-matching-based approaches, taking into account variant quality scores and ignoring variant quality scores. We implement the collapsing and allele-matching approaches accounting for variant quality in the freely available ARIEL and AMELIA software.

    Results: The incorporation of variant quality scores in locus-based association tests has power advantages over weighting each variant equally. The allele-matching methods are robust to the presence of both protective and risk variants in a locus, while collapsing methods exhibit a dramatic loss of power in this scenario.

    Conclusions: The incorporation of variant quality scores should be a standard protocol when performing locus-based association analysis on sequencing data. The ARIEL and AMELIA software implement collapsing and allele-matching locus association analysis methods, respectively, that allow the incorporation of variant quality scores.

    Funded by: Wellcome Trust: 088885, 090532, 098051

    Human heredity 2012;73;2;84-94

  • Large-Scale Gene-Centric Meta-analysis across 32 Studies Identifies Multiple Lipid Loci.

    Asselbergs FW, Guo Y, van Iperen EP, Sivapalaratnam S, Tragante V, Lanktree MB, Lange LA, Almoguera B, Appelman YE, Barnard J, Baumert J, Beitelshees AL, Bhangale TR, Chen YD, Gaunt TR, Gong Y, Hopewell JC, Johnson T, Kleber ME, Langaee TY, Li M, Li YR, Liu K, McDonough CW, Meijs MF, Middelberg RP, Musunuru K, Nelson CP, O'Connell JR, Padmanabhan S, Pankow JS, Pankratz N, Rafelt S, Rajagopalan R, Romaine SP, Schork NJ, Shaffer J, Shen H, Smith EN, Tischfield SE, van der Most PJ, van Vliet-Ostaptchouk JV, Verweij N, Volcik KA, Zhang L, Bailey KR, Bailey KM, Bauer F, Boer JM, Braund PS, Burt A, Burton PR, Buxbaum SG, Chen W, Cooper-Dehoff RM, Cupples LA, Dejong JS, Delles C, Duggan D, Fornage M, Furlong CE, Glazer N, Gums JG, Hastie C, Holmes MV, Illig T, Kirkland SA, Kivimaki M, Klein R, Klein BE, Kooperberg C, Kottke-Marchant K, Kumari M, Lacroix AZ, Mallela L, Murugesan G, Ordovas J, Ouwehand WH, Post WS, Saxena R, Scharnagl H, Schreiner PJ, Shah T, Shields DC, Shimbo D, Srinivasan SR, Stolk RP, Swerdlow DI, Taylor HA, Topol EJ, Toskala E, van Pelt JL, van Setten J, Yusuf S, Whittaker JC, Zwinderman AH, LifeLines Cohort Study, Anand SS, Balmforth AJ, Berenson GS, Bezzina CR, Boehm BO, Boerwinkle E, Casas JP, Caulfield MJ, Clarke R, Connell JM, Cruickshanks KJ, Davidson KW, Day IN, de Bakker PI, Doevendans PA, Dominiczak AF, Hall AS, Hartman CA, Hengstenberg C, Hillege HL, Hofker MH, Humphries SE, Jarvik GP, Johnson JA, Kaess BM, Kathiresan S, Koenig W, Lawlor DA, März W, Melander O, Mitchell BD, Montgomery GW, Munroe PB, Murray SS, Newhouse SJ, Onland-Moret NC, Poulter N, Psaty B, Redline S, Rich SS, Rotter JI, Schunkert H, Sever P, Shuldiner AR, Silverstein RL, Stanton A, Thorand B, Trip MD, Tsai MY, van der Harst P, van der Schoot E, van der Schouw YT, Verschuren WM, Watkins H, Wilde AA, Wolffenbuttel BH, Whitfield JB, Hovingh GK, Ballantyne CM, Wijmenga C, Reilly MP, Martin NG, Wilson JG, Rader DJ, Samani NJ, Reiner AP, Hegele RA, Kastelein JJ, Hingorani AD, Talmud PJ, Hakonarson H, Elbers CC, Keating BJ and Drenos F

    Department of Cardiology, Division of Heart and Lungs, University Medical Center Utrecht, 3508 GA Utrecht, The Netherlands; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3508 GA Utrecht, The Netherlands; Department of Medical Genetics, Biomedical Genetics, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands.

    Genome-wide association studies (GWASs) have identified many SNPs underlying variations in plasma-lipid levels. We explore whether additional loci associated with plasma-lipid phenotypes, such as high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), and triglycerides (TGs), can be identified by a dense gene-centric approach. Our meta-analysis of 32 studies in 66,240 individuals of European ancestry was based on the custom ∼50,000 SNP genotyping array (the ITMAT-Broad-CARe array) covering ∼2,000 candidate genes. SNP-lipid associations were replicated either in a cohort comprising an additional 24,736 samples or within the Global Lipid Genetic Consortium. We identified four, six, ten, and four unreported SNPs in established lipid genes for HDL-C, LDL-C, TC, and TGs, respectively. We also identified several lipid-related SNPs in previously unreported genes: DGAT2, HCAR2, GPIHBP1, PPARG, and FTO for HDL-C; SOCS3, APOH, SPTY2D1, BRCA2, and VLDLR for LDL-C; SOCS3, UGT1A1, BRCA2, UBE3B, FCGR2A, CHUK, and INSIG2 for TC; and SERPINF2, C4B, GCK, GATA4, INSR, and LPAL2 for TGs. The proportion of explained phenotypic variance in the subset of studies providing individual-level data was 9.9% for HDL-C, 9.5% for LDL-C, 10.3% for TC, and 8.0% for TGs. This large meta-analysis of lipid phenotypes with the use of a dense gene-centric approach identified multiple SNPs not previously described in established lipid genes and several previously unknown loci. The explained phenotypic variance from this approach was comparable to that from a meta-analysis of GWAS data, suggesting that a focused genotyping approach can further increase the understanding of heritability of plasma lipids.

    American journal of human genetics 2012

  • Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data.

    Auburn S, Campino S, Miotto O, Djimde AA, Zongo I, Manske M, Maslen G, Mangano V, Alcock D, MacInnis B, Rockett KA, Clark TG, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk

    The composition of multi-clonal malarial infections and the epidemiological factors which shape their diversity remain poorly understood. Traditionally within-host diversity has been defined in terms of the multiplicity of infection (MOI) derived by PCR-based genotyping. Massively parallel, single molecule sequencing technologies now enable individual read counts to be derived on genome-wide datasets facilitating the development of new statistical approaches to describe within-host diversity. In this class of measures the F(WS) metric characterizes within-host diversity and its relationship to population level diversity. Utilizing P. falciparum field isolates from patients in West Africa we here explore the relationship between the traditional MOI and F(WS) approaches. F(WS) statistics were derived from read count data at 86,158 SNPs in 64 samples sequenced on the Illumina GA platform. MOI estimates were derived by PCR at the msp-1 and -2 loci. Significant correlations were observed between the two measures, particularly with the msp-1 locus (P = 5.92×10(-5)). The F(WS) metric should be more robust than the PCR-based approach owing to reduced sensitivity to potential locus-specific artifacts. Furthermore the F(WS) metric captures information on a range of parameters which influence out-crossing risk including the number of clones (MOI), their relative proportions and genetic divergence. This approach should provide novel insights into the factors which correlate with, and shape within-host diversity.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 089275, 090532, 090770

    PloS one 2012;7;2;e32891

  • Mouse large-scale phenotyping initiatives: overview of the European Mouse Disease Clinic (EUMODIC) and of the Wellcome Trust Sanger Institute Mouse Genetics Project.

    Ayadi A, Birling MC, Bottomley J, Bussell J, Fuchs H, Fray M, Gailus-Durner V, Greenaway S, Houghton R, Karp N, Leblanc S, Lengger C, Maier H, Mallon AM, Marschall S, Melvin D, Morgan H, Pavlovic G, Ryder E, Skarnes WC, Selloum M, Ramirez-Solis R, Sorg T, Teboul L, Vasseur L, Walling A, Weaver T, Wells S, White JK, Bradley A, Adams DJ, Steel KP, Hrabě de Angelis M, Brown SD and Herault Y

    Institut Clinique de la Souris, PHENOMIN, IGBMC/ICS-MCI, CNRS, INSERM, Université de Strasbourg, UMR7104, UMR964, 1 rue Laurent Fries, 67404 Illkirch, France.

    Two large-scale phenotyping efforts, the European Mouse Disease Clinic (EUMODIC) and the Wellcome Trust Sanger Institute Mouse Genetics Project (SANGER-MGP), started during the late 2000s with the aim to deliver a comprehensive assessment of phenotypes or to screen for robust indicators of diseases in mouse mutants. They both took advantage of available mouse mutant lines but predominantly of the embryonic stem (ES) cells resources derived from the European Conditional Mouse Mutagenesis programme (EUCOMM) and the Knockout Mouse Project (KOMP) to produce and study 799 mouse models that were systematically analysed with a comprehensive set of physiological and behavioural paradigms. They captured more than 400 variables and an additional panel of metadata describing the conditions of the tests. All the data are now available through EuroPhenome database (www.europhenome.org) and the WTSI mouse portal (http://www.sanger.ac.uk/mouseportal/), and the corresponding mouse lines are available through the European Mouse Mutant Archive (EMMA), the International Knockout Mouse Consortium (IKMC), or the Knockout Mouse Project (KOMP) Repository. Overall conclusions from both studies converged, with at least one phenotype scored in at least 80% of the mutant lines. In addition, 57% of the lines were viable, 13% subviable, 30% embryonic lethal, and 7% displayed fertility impairments. These efforts provide an important underpinning for a future global programme that will undertake the complete functional annotation of the mammalian genome in the mouse model.

    Funded by: Medical Research Council: G0300212, MC_QA137918, MC_U142684172, MC_U142684175; Wellcome Trust: 098051

    Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;9-10;600-10

  • Ubiquitous Hepatocystis infections, but no evidence of Plasmodium falciparum-like malaria parasites in wild greater spot-nosed monkeys (Cercopithecus nictitans).

    Ayouba A, Mouacha F, Learn GH, Mpoudi-Ngole E, Rayner JC, Sharp PM, Hahn BH, Delaporte E and Peeters M

    Institut de Recherche pour le Développement, University of Montpellier, 34394 Montpellier, France.

    Western gorillas (Gorilla gorilla) have been identified as the natural reservoir of the parasites that were the immediate precursor of Plasmodium falciparum infecting humans. Recently, a P. falciparum-like sequence was reported in a sample from a captive greater spot-nosed monkey (Cercopithecus nictitans), and was taken to indicate that this species may also be a natural reservoir for P. falciparum-related parasites. To test this hypothesis we screened blood samples from 292 wild C. nictitans monkeys that had been hunted for bushmeat in Cameroon. We detected Hepatocystis spp. in 49% of the samples, as well as one sequence from a clade of Plasmodium spp. previously found in birds, lizards and bats. However, none of the 292 wild C. nictitans harbored P. falciparum-like parasites.

    Funded by: NIAID NIH HHS: AI91595, R01 AI50529; Wellcome Trust: 090851

    International journal for parasitology 2012;42;8;709-13

  • Rare and Low Frequency Variant Stratification in the UK Population: Description and Impact on Association Tests.

    Babron MC, de Tayrac M, Rutledge DN, Zeggini E and Génin E

    Inserm UMRS-946, Genetic variability and human diseases, Paris, France ; Institut Universitaire d'Hématologie, Univ Paris Diderot, Paris, France.

    Although variations in allele frequencies at common SNPs have been extensively studied in different populations, little is known about the stratification of rare variants and its impact on association tests. In this paper, we used Affymetrix 500K genotype data from the WTCCC to investigate if variants in three different frequency categories (below 1%, between 1 and 5%, above 5%) show different stratification patterns in the UK population. We found that these patterns are indeed different. The top principal component extracted from the rare variant category shows poor correlations with any principal component or combination of principal components from the low frequency or common variant categories. These results could suggest that a suitable solution to avoid false positive association due to population stratification would involve adjusting for the respective PCs when testing for variants in different allele frequency categories. However, we found this was not the case both on type 2 diabetes data and on simulated data. Indeed, adjusting rare variant association tests on PCs derived from rare variants does no better to correct for population stratification than adjusting on PCs derived from more common variants. Mixed models perform slightly better for low frequency variants than PC based adjustments but less well for the rarest variants. These results call for the need of new methodological developments specifically devoted to address rare variant stratification issues in association tests.

    PloS one 2012;7;10;e46519

  • A dominantly acting murine allele of Mcm4 causes chromosomal abnormalities and promotes tumorigenesis.

    Bagley BN, Keane TM, Maklakova VI, Marshall JG, Lester RA, Cancel MM, Paulsen AR, Bendzick LE, Been RA, Kogan SC, Cormier RT, Kendziorski C, Adams DJ and Collier LS

    School of Pharmacy and UW Carbone Cancer Center, University of Wisconsin Madison, Madison, WI, USA.

    Here we report the isolation of a murine model for heritable T cell lymphoblastic leukemia/lymphoma (T-ALL) called Spontaneous dominant leukemia (Sdl). Sdl heterozygous mice develop disease with a short latency and high penetrance, while mice homozygous for the mutation die early during embryonic development. Sdl mice exhibit an increase in the frequency of micronucleated reticulocytes, and T-ALLs from Sdl mice harbor small amplifications and deletions, including activating deletions at the Notch1 locus. Using exome sequencing it was determined that Sdl mice harbor a spontaneously acquired mutation in Mcm4 (Mcm4(D573H)). MCM4 is part of the heterohexameric complex of MCM2-7 that is important for licensing of DNA origins prior to S phase and also serves as the core of the replicative helicase that unwinds DNA at replication forks. Previous studies in murine models have discovered that genetic reductions of MCM complex levels promote tumor formation by causing genomic instability. However, Sdl mice possess normal levels of Mcms, and there is no evidence for loss-of-heterozygosity at the Mcm4 locus in Sdl leukemias. Studies in Saccharomyces cerevisiae indicate that the Sdl mutation produces a biologically inactive helicase. Together, these data support a model in which chromosomal abnormalities in Sdl mice result from the ability of MCM4(D573H) to incorporate into MCM complexes and render them inactive. Our studies indicate that dominantly acting alleles of MCMs can be compatible with viability but have dramatic oncogenic consequences by causing chromosomal abnormalities.

    Funded by: Cancer Research UK; NCI NIH HHS: K01CA122183, P30 CA014520, P30CA014520, R03CA137751; NIGMS NIH HHS: GM102756; Wellcome Trust

    PLoS genetics 2012;8;11;e1003034

  • Evolutionary dynamics of local pandemic H1N1/2009 influenza virus lineages revealed by whole-genome analysis.

    Baillie GJ, Galiano M, Agapow PM, Myers R, Chiam R, Gall A, Palser AL, Watson SJ, Hedge J, Underwood A, Platt S, McLean E, Pebody RG, Rambaut A, Green J, Daniels R, Pybus OG, Kellam P and Zambon M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Virus gene sequencing and phylogenetics can be used to study the epidemiological dynamics of rapidly evolving viruses. With complete genome data, it becomes possible to identify and trace individual transmission chains of viruses such as influenza virus during the course of an epidemic. Here we sequenced 153 pandemic influenza H1N1/09 virus genomes from United Kingdom isolates from the first (127 isolates) and second (26 isolates) waves of the 2009 pandemic and used their sequences, dates of isolation, and geographical locations to infer the genetic epidemiology of the epidemic in the United Kingdom. We demonstrate that the epidemic in the United Kingdom was composed of many cocirculating lineages, among which at least 13 were exclusively or predominantly United Kingdom clusters. The estimated divergence times of two of the clusters predate the detection of pandemic H1N1/09 virus in the United Kingdom, suggesting that the pandemic H1N1/09 virus was already circulating in the United Kingdom before the first clinical case. Crucially, three clusters contain isolates from the second wave of infections in the United Kingdom, two of which represent chains of transmission that appear to have persisted within the United Kingdom between the first and second waves. This demonstrates that whole-genome analysis can track in fine detail the behavior of individual influenza virus lineages during the course of a single epidemic or pandemic.

    Funded by: Medical Research Council: MC_U117512723; Wellcome Trust: 095831

    Journal of virology 2012;86;1;11-8

  • Frequency and patterns of protease gene resistance mutations in HIV-infected patients treated with lopinavir/ritonavir as their first protease inhibitor.

    Barber TJ, Harrison L, Asboe D, Williams I, Kirk S, Gilson R, Bansi L, Pillay D, Dunn D and UK HIV Drug Resistance Database and UK Collaborative HIV Cohort (UK CHIC) Study Steering Committees

    Medical Research Council Clinical Trials Unit, St Stephen's Centre, Chelsea and Westminster Hospital, 125 Kingsway, London, UK. t.barber@nhs.net

    Background: Selection of protease mutations on antiretroviral therapy (ART) including a ritonavir-boosted protease inhibitor (PI) has been reported infrequently. Scarce data exist from long-term cohorts on resistance incidence or mutational patterns emerging to different PIs.

    Methods: We studied UK patients receiving lopinavir/ritonavir as their first PI, either while naive to ART or having previously received non-PI-based ART. Virological failure was defined as viral load ≥ 400 copies/mL after previous suppression <400 copies/mL, or failure to achieve <400 copies/mL during the first 6 months. pol sequences obtained whilst failing lopinavir or within 30 days after stopping were analysed. Major and minor mutations (IAS-USA 2008, after exclusion of polymorphisms) were considered. Predicted susceptibility was determined using the Stanford HIVdb algorithm.

    Results: Three thousand and fifty-six patients were followed for a median (IQR) of 14 (6-30) months, of whom 811 (27%) experienced virological failure. Of these, resistance test results were available on 291 (36%). One or more protease mutations were detected in 32 (11%) patients; the most frequent were I54V (n = 12), M46I (n = 11), V82A (n = 7) and L76V (n = 3). No association with viral subtype was evident. Many patients retained virus predicted to be susceptible to lopinavir (14, 44%), tipranavir (26, 81%) and darunavir (27, 84%).

    Conclusions: This study reflects the experience of patients in routine care. Selection of protease gene mutations by lopinavir/ritonavir occurred at a much higher rate than in clinical trials. The mutations observed showed only partial overlap with those previously identified by structural chemistry models, serial cell culture passage and genotype-phenotype analyses. There remained a low degree of predicted cross-resistance to other widely used PIs.

    Funded by: Medical Research Council: G00001999, G0600337, G0900274

    The Journal of antimicrobial chemotherapy 2012;67;4;995-1000

  • From HLA association to function.

    Barrett JC

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. barrett@sanger.ac.uk

    A new study refines the association signals for rheumatoid arthritis susceptibility in the major histocompatibility complex (MHC) region to five amino-acid positions encoded in three HLA genes, all within peptide-binding grooves. By adapting statistical methods from genome-wide association studies (GWAS) and using imputation from a large reference panel, they demonstrate the potential for this approach to identify functional variants in associated regions.

    Nature genetics 2012;44;3;235-6

  • Semaphorin-7A Is an Erythrocyte Receptor for P. falciparum Merozoite-Specific TRAP Homolog, MTRAP.

    Bartholdson SJ, Bustamante LY, Crosnier C, Johnson S, Lea S, Rayner JC and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom ; Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    The motility and invasion of Plasmodium parasites is believed to require a cytoplasmic actin-myosin motor associated with a cell surface ligand belonging to the TRAP (thrombospondin-related anonymous protein) family. Current models of invasion usually invoke the existence of specific receptors for the TRAP-family ligands on the surface of the host cell; however, the identities of these receptors remain largely unknown. Here, we identify the GPI-linked protein Semaphorin-7A (CD108) as an erythrocyte receptor for the P. falciparum merozoite-specific TRAP homolog (MTRAP) by using a systematic screening approach designed to detect extracellular protein interactions. The specificity of the interaction was demonstrated by showing that binding was saturable and by quantifying the equilibrium and kinetic biophysical binding parameters using surface plasmon resonance. We found that two MTRAP monomers interact via their tandem TSR domains with the Sema domains of a Semaphorin-7A homodimer. Known naturally-occurring polymorphisms in Semaphorin-7A did not quantitatively affect MTRAP binding nor did the presence of glycans on the receptor. Attempts to block the interaction during in vitro erythrocyte invasion assays using recombinant proteins and antibodies showed no significant inhibitory effect, suggesting the inaccessibility of the complex to proteinaceous blocking agents. These findings now provide important experimental evidence to support the model that parasite TRAP-family ligands interact with specific host receptors during cellular invasion.

    PLoS pathogens 2012;8;11;e1003031

  • Deficiency for the ubiquitin ligase UBE3B in a blepharophimosis-ptosis-intellectual-disability syndrome.

    Basel-Vanagaite L, Dallapiccola B, Ramirez-Solis R, Segref A, Thiele H, Edwards A, Arends MJ, Miró X, White JK, Désir J, Abramowicz M, Dentici ML, Lepri F, Hofmann K, Har-Zahav A, Ryder E, Karp NA, Estabel J, Gerdin AK, Podrini C, Ingham NJ, Altmüller J, Nürnberg G, Frommolt P, Abdelhak S, Pasmanik-Chor M, Konen O, Kelley RI, Shohat M, Nürnberg P, Flint J, Steel KP, Hoppe T, Kubisch C, Adams DJ and Borck G

    Raphael Recanati Genetics Institute, Rabin Medical Center, Beilinson Campus, Petah Tikva 49100, Israel. basel@post.tau.ac.il

    Ubiquitination plays a crucial role in neurodevelopment as exemplified by Angelman syndrome, which is caused by genetic alterations of the ubiquitin ligase-encoding UBE3A gene. Although the function of UBE3A has been widely studied, little is known about its paralog UBE3B. By using exome and capillary sequencing, we here identify biallelic UBE3B mutations in four patients from three unrelated families presenting an autosomal-recessive blepharophimosis-ptosis-intellectual-disability syndrome characterized by developmental delay, growth retardation with a small head circumference, facial dysmorphisms, and low cholesterol levels. UBE3B encodes an uncharacterized E3 ubiquitin ligase. The identified UBE3B variants include one frameshift and two splice-site mutations as well as a missense substitution affecting the highly conserved HECT domain. Disruption of mouse Ube3b leads to reduced viability and recapitulates key aspects of the human disorder, such as reduced weight and brain size and a downregulation of cholesterol synthesis. We establish that the probable Caenorhabditis elegans ortholog of UBE3B, oxi-1, functions in the ubiquitin/proteasome system in vivo and is especially required under oxidative stress conditions. Our data reveal the pleiotropic effects of UBE3B deficiency and reinforce the physiological importance of ubiquitination in neuronal development and function in mammals.

    Funded by: Medical Research Council: G0300212, MC_QA137918

    American journal of human genetics 2012;91;6;998-1010

  • Rapid-throughput skeletal phenotyping of 100 knockout mice identifies 9 new genes that determine bone strength.

    Bassett JH, Gogakos A, White JK, Evans H, Jacques RM, van der Spek AH, Sanger Mouse Genetics Project, Ramirez-Solis R, Ryder E, Sunter D, Boyde A, Campbell MJ, Croucher PI and Williams GR

    Molecular Endocrinology Group, Department of Medicine, Imperial College London, London, United Kingdom.

    Osteoporosis is a common polygenic disease and global healthcare priority but its genetic basis remains largely unknown. We report a high-throughput multi-parameter phenotype screen to identify functionally significant skeletal phenotypes in mice generated by the Wellcome Trust Sanger Institute Mouse Genetics Project and discover novel genes that may be involved in the pathogenesis of osteoporosis. The integrated use of primary phenotype data with quantitative x-ray microradiography, micro-computed tomography, statistical approaches and biomechanical testing in 100 unselected knockout mouse strains identified nine new genetic determinants of bone mass and strength. These nine new genes include five whose deletion results in low bone mass and four whose deletion results in high bone mass. None of the nine genes have been implicated previously in skeletal disorders and detailed analysis of the biomechanical consequences of their deletion revealed a novel functional classification of bone structure and strength. The organ-specific and disease-focused strategy described in this study can be applied to any biological system or tractable polygenic disease, thus providing a general basis to define gene function in a system-specific manner. Application of the approach to diseases affecting other physiological systems will help to realize the full potential of the International Mouse Phenotyping Consortium.

    Funded by: Arthritis Research UK: 18292; Medical Research Council: G0800261; Wellcome Trust: 094134, 77157/Z/05/Z

    PLoS genetics 2012;8;8;e1002858

  • Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population.

    Bell JT, Tsai PC, Yang TP, Pidsley R, Nisbet J, Glass D, Mangino M, Zhai G, Zhang F, Valdes A, Shin SY, Dempster EL, Murray RM, Grundberg E, Hedman AK, Nica A, Small KS, MuTHER Consortium, Dermitzakis ET, McCarthy MI, Mill J, Spector TD and Deloukas P

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. jordana.bell@kcl.ac.uk

    Age-related changes in DNA methylation have been implicated in cellular senescence and longevity, yet the causes and functional consequences of these variants remain unclear. To elucidate the role of age-related epigenetic changes in healthy ageing and potential longevity, we tested for association between whole-blood DNA methylation patterns in 172 female twins aged 32 to 80 with age and age-related phenotypes. Twin-based DNA methylation levels at 26,690 CpG-sites showed evidence for mean genome-wide heritability of 18%, which was supported by the identification of 1,537 CpG-sites with methylation QTLs in cis at FDR 5%. We performed genome-wide analyses to discover differentially methylated regions (DMRs) for sixteen age-related phenotypes (ap-DMRs) and chronological age (a-DMRs). Epigenome-wide association scans (EWAS) identified age-related phenotype DMRs (ap-DMRs) associated with LDL (STAT5A), lung function (WT1), and maternal longevity (ARL4A, TBX20). In contrast, EWAS for chronological age identified hundreds of predominantly hyper-methylated age DMRs (490 a-DMRs at FDR 5%), of which only one (TBX20) was also associated with an age-related phenotype. Therefore, the majority of age-related changes in DNA methylation are not associated with phenotypic measures of healthy ageing in later life. We replicated a large proportion of a-DMRs in a sample of 44 younger adult MZ twins aged 20 to 61, suggesting that a-DMRs may initiate at an earlier age. We next explored potential genetic and environmental mechanisms underlying a-DMRs and ap-DMRs. Genome-wide overlap across cis-meQTLs, genotype-phenotype associations, and EWAS ap-DMRs identified CpG-sites that had cis-meQTLs with evidence for genotype-phenotype association, where the CpG-site was also an ap-DMR for the same phenotype. Monozygotic twin methylation difference analyses identified one potential environmentally-mediated ap-DMR associated with total cholesterol and LDL (CSMD1). 
    Our results suggest that in a small set of genes DNA methylation may be a candidate mechanism of mediating not only environmental, but also genetic effects on age-related phenotypes.

    Funded by: European Research Council: 250157; Medical Research Council: G0900339; Wellcome Trust: 090532

    PLoS genetics 2012;8;4;e1002629

  • A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

    Bellenguez C, Strange A, Freeman C, Wellcome Trust Case Control Consortium, Donnelly P and Spencer CC

    Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

    Summary: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.

    Availability: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer

    Contact: chris.spencer@well.ox.ac.uk

    Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust: 075491/Z/04/B, 084575/Z/08/Z, 090532/Z/09/Z

    Bioinformatics (Oxford, England) 2012;28;1;134-5

  • A genomic approach to bacterial taxonomy: an examination and proposed reclassification of species within the genus Neisseria.

    Bennett JS, Jolley KA, Earle SG, Corton C, Bentley SD, Parkhill J and Maiden MC

    Department of Zoology, University of Oxford, Oxford, UK. julia.bennett@zoo.ox.ac.uk

    In common with other bacterial taxa, members of the genus Neisseria are classified using a range of phenotypic and biochemical approaches, which are not entirely satisfactory in assigning isolates to species groups. Recently, there has been increasing interest in using nucleotide sequences for bacterial typing and taxonomy, but to date, no broadly accepted alternative to conventional methods is available. Here, the taxonomic relationships of 55 representative members of the genus Neisseria have been analysed using whole-genome sequence data. As genetic material belonging to the accessory genome is widely shared among different taxa but not present in all isolates, this analysis indexed nucleotide sequence variation within sets of genes, specifically protein-coding genes that were present and directly comparable in all isolates. Variation in these genes identified seven species groups, which were robust to the choice of genes and phylogenetic clustering methods used. The groupings were largely, but not completely, congruent with current species designations, with some minor changes in nomenclature and the reassignment of a few isolates necessary. In particular, these data showed that isolates classified as Neisseria polysaccharea are polyphyletic and probably include more than one taxonomically distinct organism. The seven groups could be reliably and rapidly generated with sequence variation within the 53 ribosomal protein subunit (rps) genes, further demonstrating that ribosomal multilocus sequence typing (rMLST) is a practicable and powerful means of characterizing bacteria at all levels, from domain to strain.

    Funded by: Wellcome Trust: 087622

    Microbiology (Reading, England) 2012;158;Pt 6;1570-80

  • The genome of Mycobacterium africanum West African 2 reveals a lineage-specific locus and genome erosion common to the M. tuberculosis complex.

    Bentley SD, Comas I, Bryant JM, Walker D, Smith NH, Harris SR, Thurston S, Gagneux S, Wood J, Antonio M, Quail MA, Gehre F, Adegbola RA, Parkhill J and de Jong BC

    Wellcome Trust Genome Campus, Wellcome Trust Sanger Institute, Hinxton, UK.

    Background: M. africanum West African 2 constitutes an ancient lineage of the M. tuberculosis complex that commonly causes human tuberculosis in West Africa and has an attenuated phenotype relative to M. tuberculosis.

    In search of candidate genes underlying these differences, the genome of M. africanum West African 2 was sequenced using classical capillary sequencing techniques. Our findings reveal a unique sequence, RD900, that was independently lost during the evolution of two important lineages within the complex: the "modern" M. tuberculosis group and the lineage leading to M. bovis. Closely related to M. bovis and other animal strains within the M. tuberculosis complex, M. africanum West African 2 shares an abundance of pseudogenes with M. bovis but also with M. africanum West African clade 1. Comparison with other strains of the M. tuberculosis complex revealed pseudogene events in all the known lineages, pointing toward ongoing genome erosion, likely due to increased genetic drift and relaxed selection linked to serial transmission bottlenecks and an intracellular lifestyle.

    The genomic differences identified between M. africanum West African 2 and the other strains of the Mycobacterium tuberculosis complex may explain its attenuated phenotype, and pave the way for targeted experiments to elucidate the phenotypic characteristic of M. africanum. Moreover, availability of the whole genome data allows for verification of conservation of targets used for the next generation of diagnostics and vaccines, in order to ensure similar efficacy in West Africa.

    Funded by: Medical Research Council: MC_U190071468, MC_U190074190, MC_U190081982, MC_U190081991, MC_U190085850; Wellcome Trust

    PLoS neglected tropical diseases 2012;6;2;e1552

  • An insertional mutagenesis screen identifies genes that cooperate with Mll-AF9 in a murine leukemogenesis model.

    Bergerson RJ, Collier LS, Sarver AL, Been RA, Lugthart S, Diers MD, Zuber J, Rappaport AR, Nixon MJ, Silverstein KA, Fan D, Lamblin AF, Wolff L, Kersey JH, Delwel R, Lowe SW, O'Sullivan MG, Kogan SC, Adams DJ and Largaespada DA

    Department of Genetics, Cell Biology and Development, Masonic Cancer Center, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA.

    Patients with a t(9;11) translocation (MLL-AF9) develop acute myeloid leukemia (AML), and while in mice the expression of this fusion oncogene also results in the development of myeloid leukemia, it does so only after a long latency. To identify mutations that cooperate with Mll-AF9, we infected neonatal wild-type (WT) or Mll-AF9 mice with a murine leukemia virus (MuLV). MuLV-infected Mll-AF9 mice succumbed to disease significantly faster than controls, presenting predominantly with myeloid leukemia, while infected WT animals developed predominantly lymphoid leukemia. We identified 88 candidate cancer genes near common sites of proviral insertion. Analysis of transcript levels revealed significantly elevated expression of Mn1, and a trend toward increased expression of Bcl11a and Fosb in Mll-AF9 murine leukemia samples with proviral insertions proximal to these genes. Accordingly, FOSB and BCL11A were also overexpressed in human AML harboring MLL gene translocations. FOSB was revealed to be essential for growth in mouse and human myeloid leukemia cells using shRNA lentiviral vectors in vitro. Importantly, MN1 cooperated with Mll-AF9 in leukemogenesis in an in vivo BM viral transduction and transplantation assay. Together, our data identified genes that define transcription factor networks and important genetic pathways acting during progression of leukemia induced by MLL fusion oncogenes.

    Funded by: Cancer Research UK; Howard Hughes Medical Institute; NCI NIH HHS: CA009138, F32 CA106192, K01 CA122183, U01 CA84221; Wellcome Trust

    Blood 2012;119;19;4512-23

  • Genomic Comparison of the Closely Related Salmonella enterica Serovars Enteritidis and Dublin.

    Betancor L, Yim L, Martínez A, Fookes M, Sasias S, Schelotto F, Thomson N, Maskell D and Chabalgoity JA

    Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Av. A. Navarro 3051, CP 11600, Montevideo, Uruguay.

    The Enteritidis and Dublin serovars of Salmonella enterica are closely related, yet they differ significantly in pathogenicity and epidemiology. S. Enteritidis is a broad host range serovar that commonly causes gastroenteritis and infrequently causes invasive disease in humans. S. Dublin mainly colonizes cattle but upon infecting humans often results in invasive disease. To gain a broader view of the extent of these differences, we conducted microarray-based comparative genomics between several field isolates from each serovar. Genome degradation has been correlated with host adaptation in Salmonella; thus we also compared the available genome sequences of the two serovars at whole-genome scale to evaluate pseudogene composition within each serovar. Microarray analysis revealed 3771 CDS shared by both serovars, while 33 were only present in Enteritidis and 87 were exclusive to Dublin. Pseudogene evaluation showed 177 inactive CDS in S. Dublin which correspond to active genes in S. Enteritidis, nine of which are also inactive in the host adapted S. Gallinarum and S. Choleraesuis serovars. Sequencing of these 9 CDS in several S. Dublin clinical isolates revealed that they are pseudogenes in all of them, indicating that this feature is not peculiar to the sequenced strain. Among these CDS, shdA (Peyer's patch colonization factor) and mglA (galactoside transport ATP binding protein) appear also to be inactive in the human adapted S. Typhi and S. Paratyphi A, suggesting that functionality of these genes may be relevant for the capacity of certain Salmonella serovars to infect a broad range of hosts.

    The open microbiology journal 2012;6;5-13

  • Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    Biankin AV, Waddell N, Kassahn KS, Gingras MC, Muthuswamy LB, Johns AL, Miller DK, Wilson PJ, Patch AM, Wu J, Chang DK, Cowley MJ, Gardiner BB, Song S, Harliwong I, Idrisoglu S, Nourse C, Nourbakhsh E, Manning S, Wani S, Gongora M, Pajic M, Scarlett CJ, Gill AJ, Pinho AV, Rooman I, Anderson M, Holmes O, Leonard C, Taylor D, Wood S, Xu Q, Nones K, Fink JL, Christ A, Bruxner T, Cloonan N, Kolle G, Newell F, Pinese M, Mead RS, Humphris JL, Kaplan W, Jones MD, Colvin EK, Nagrial AM, Humphrey ES, Chou A, Chin VT, Chantrill LA, Mawson A, Samra JS, Kench JG, Lovell JA, Daly RJ, Merrett ND, Toon C, Epari K, Nguyen NQ, Barbour A, Zeps N, Australian Pancreatic Cancer Genome Initiative, Kakkar N, Zhao F, Wu YQ, Wang M, Muzny DM, Fisher WE, Brunicardi FC, Hodges SE, Reid JG, Drummond J, Chang K, Han Y, Lewis LR, Dinh H, Buhay CJ, Beck T, Timms L, Sam M, Begley K, Brown A, Pai D, Panchal A, Buchner N, De Borja R, Denroche RE, Yung CK, Serra S, Onetto N, Mukhopadhyay D, Tsao MS, Shaw PA, Petersen GM, Gallinger S, Hruban RH, Maitra A, Iacobuzio-Donahue CA, Schulick RD, Wolfgang CL, Morgan RA, Lawlor RT, Capelli P, Corbo V, Scardoni M, Tortora G, Tempero MA, Mann KM, Jenkins NA, Perez-Mancera PA, Adams DJ, Largaespada DA, Wessels LF, Rust AG, Stein LD, Tuveson DA, Copeland NG, Musgrove EA, Scarpa A, Eshleman JR, Hudson TJ, Sutherland RL, Wheeler DA, Pearson JV, McPherson JD, Gibbs RA and Grimmond SM

    The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, Sydney, New South Wales 2010, Australia.

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

    Funded by: Cancer Research UK; NCI NIH HHS: 2P50CA101955, P01CA134292, P50 CA062924, P50 CA101955, P50 CA102701, P50CA062924, R01 CA97075; NHGRI NIH HHS: U54 HG003273; Wellcome Trust

    Nature 2012;491;7424;399-405

  • Impact of common variation in bone-related genes on type 2 diabetes and related traits.

    Billings LK, Hsu YH, Ackerman RJ, Dupuis J, Voight BF, Rasmussen-Torvik LJ, Hercberg S, Lathrop M, Barnes D, Langenberg C, Hui J, Fu M, Bouatia-Naji N, Lecoeur C, An P, Magnusson PK, Surakka I, Ripatti S, Christiansen L, Dalgård C, Folkersen L, Grundberg E, MAGIC Investigators, DIAGRAM + Consortium, MuTHER Consortium, ASCOT Investigators, GEFOS Consortium, Eriksson P, Kaprio J, Ohm Kyvik K, Pedersen NL, Borecki IB, Province MA, Balkau B, Froguel P, Shuldiner AR, Palmer LJ, Wareham N, Meneton P, Johnson T, Pankow JS, Karasik D, Meigs JB, Kiel DP and Florez JC

    Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA.

    Exploring genetic pleiotropy can provide clues to a mechanism underlying the observed epidemiological association between type 2 diabetes and heightened fracture risk. We examined genetic variants associated with bone mineral density (BMD) for association with type 2 diabetes and glycemic traits in large well-phenotyped and -genotyped consortia. We undertook follow-up analysis in ∼19,000 individuals and assessed gene expression. We queried single nucleotide polymorphisms (SNPs) associated with BMD at levels of genome-wide significance, variants in linkage disequilibrium (r² > 0.5), and BMD candidate genes. SNP rs6867040, at the ITGA1 locus, was associated with a 0.0166 mmol/L (0.004) increase in fasting glucose per C allele in the combined analysis. Genetic variants in the ITGA1 locus were associated with its expression in the liver but not in adipose tissue. ITGA1 variants appeared among the top loci associated with type 2 diabetes, fasting insulin, β-cell function by homeostasis model assessment, and 2-h post-oral glucose tolerance test glucose and insulin levels. ITGA1 has demonstrated genetic pleiotropy in prior studies, and its suggested role in liver fibrosis, insulin secretion, and bone healing lends credence to its contribution to both osteoporosis and type 2 diabetes. These findings further underscore the link between skeletal and glucose metabolism and highlight a locus to direct future investigations.

    Funded by: Medical Research Council: G0900339, MC_U106179471; NCRR NIH HHS: 1-S10-RR-163736-01A1, M01-RR-16500, UL1-RR-025005; NHGRI NIH HHS: U01-HG-004402; NHLBI NIH HHS: 5-R01-HL-08770003, 5-R01-HL-08821502, N01-HC-25195, N02-HL-6-4278, R01-HL-086694, R01-HL-087641, R01-HL-59367, U01-HL-72515; NIA NIH HHS: R01-AG-18728, R01-AR/AG-41398; NIAMS NIH HHS: R21-AR-056405; NIDDK NIH HHS: 1-L30-DK-089944-01, 5-R01-DK-06833603, 5-R01-DK-07568102, K24 DK080140, K24-DK-080140, P30-DK-072488, P60-DK-079637, R01-DK-04261, R01-DK-078616, T32-DK-007028-35; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust

    Diabetes 2012;61;8;2176-86

  • Rare MTNR1B variants impairing melatonin receptor 1B function contribute to type 2 diabetes.

    Bonnefond A, Clément N, Fawcett K, Yengo L, Vaillant E, Guillaume JL, Dechaume A, Payne F, Roussel R, Czernichow S, Hercberg S, Hadjadj S, Balkau B, Marre M, Lantieri O, Langenberg C, Bouatia-Naji N, Meta-Analysis of Glucose and Insulin-Related Traits Consortium (MAGIC), Charpentier G, Vaxillaire M, Rocheleau G, Wareham NJ, Sladek R, McCarthy MI, Dina C, Barroso I, Jockers R and Froguel P

    Centre National de la Recherche Scientifique Unité Mixte de Recherche, Lille Pasteur Institute, France.

    Genome-wide association studies have revealed that common noncoding variants in MTNR1B (encoding melatonin receptor 1B, also known as MT2) increase type 2 diabetes (T2D) risk [1,2]. Although the strongest association signal was highly significant (P < 1 × 10⁻²⁰), its contribution to T2D risk was modest (odds ratio (OR) of ∼1.10-1.15) [1-3]. We performed large-scale exon resequencing in 7,632 Europeans, including 2,186 individuals with T2D, and identified 40 nonsynonymous variants, including 36 very rare variants (minor allele frequency (MAF) <0.1%), associated with T2D (OR = 3.31, 95% confidence interval (CI) = 1.78-6.18; P = 1.64 × 10⁻⁴). A four-tiered functional investigation of all 40 mutants revealed that 14 were non-functional and rare (MAF < 1%), and 4 were very rare with complete loss of melatonin binding and signaling capabilities. Among the very rare variants, the partial- or total-loss-of-function variants but not the neutral ones contributed to T2D (OR = 5.67, CI = 2.17-14.82; P = 4.09 × 10⁻⁴). Genotyping the four complete loss-of-function variants in 11,854 additional individuals revealed their association with T2D risk (8,153 individuals with T2D and 10,100 controls; OR = 3.88, CI = 1.49-10.07; P = 5.37 × 10⁻³). This study establishes a firm functional link between MTNR1B and T2D risk.

    Funded by: Medical Research Council: MC_U106179471; Wellcome Trust: 077016, 077016/Z/05/Z, 090532

    Nature genetics 2012;44;3;297-301

  • Genome-wide association analysis of eating disorder-related symptoms, behaviors, and personality traits.

    Boraska V, Davis OS, Cherkas LF, Helder SG, Harris J, Krug I, Pei-Chi Liao T, Treasure J, Ntalla I, Karhunen L, Keski-Rahkonen A, Christakopoulou D, Raevuori A, Shin SY, Dedoussis GV, Kaprio J, Soranzo N, Spector TD, Collier DA and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Department of Medical Biology, University of Split School of Medicine, Split, Croatia. vboraska@mefst.hr, vb2@sanger.ac.uk.

    Eating disorders (EDs) are common, complex psychiatric disorders thought to be caused by both genetic and environmental factors. They share many symptoms, behaviors, and personality traits, which may have overlapping heritability. The aim of the present study is to perform a genome-wide association scan (GWAS) of six ED phenotypes comprising three symptom traits from the Eating Disorders Inventory 2 [Drive for Thinness (DT), Body Dissatisfaction (BD), and Bulimia], Weight Fluctuation symptom, Breakfast Skipping behavior and Childhood Obsessive-Compulsive Personality Disorder trait (CHIRP). Investigated traits were derived from standardized self-report questionnaires completed by the TwinsUK population-based cohort. We tested 283,744 directly typed SNPs across six phenotypes of interest in the TwinsUK discovery dataset and followed up signals from various strata using a two-stage replication strategy in two independent cohorts of European ancestry. We meta-analyzed a total of 2,698 individuals for DT, 2,680 for BD, 2,789 (821 cases/1,968 controls) for Bulimia, 1,360 (633 cases/727 controls) for Childhood Obsessive-Compulsive Personality Disorder trait, 2,773 (761 cases/2,012 controls) for Breakfast Skipping, and 2,967 (798 cases/2,169 controls) for Weight Fluctuation symptom. In this GWAS analysis of six ED-related phenotypes, we detected association of eight genetic variants with P < 10⁻⁵. Genetic variants that showed suggestive evidence of association were previously associated with several psychiatric disorders and ED-related phenotypes. Our study indicates that larger-scale collaborative studies will be needed to achieve the necessary power to detect loci underlying ED-related traits. © 2012 Wiley Periodicals, Inc.

    American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics 2012;159B;7;803-11

  • Genome-wide association study to identify common variants associated with brachial circumference: a meta-analysis of 14 cohorts.

    Boraska V, Day-Williams A, Franklin CS, Elliott KS, Panoutsopoulou K, Tachmazidou I, Albrecht E, Bandinelli S, Beilin LJ, Bochud M, Cadby G, Ernst F, Evans DM, Hayward C, Hicks AA, Huffman J, Huth C, James AL, Klopp N, Kolcic I, Kutalik Z, Lawlor DA, Musk AW, Pehlic M, Pennell CE, Perry JR, Peters A, Polasek O, St Pourcain B, Ring SM, Salvi E, Schipf S, Staessen JA, Teumer A, Timpson N, Vitart V, Warrington NM, Yaghootkar H, Zemunik T, Zgaga L, An P, Anttila V, Borecki IB, Holmen J, Ntalla I, Palotie A, Pietiläinen KH, Wedenoja J, Winsvold BS, Dedoussis GV, Kaprio J, Province MA, Zwart JA, Burnier M, Campbell H, Cusi D, Smith GD, Frayling TM, Gieger C, Palmer LJ, Pramstaller PP, Rudan I, Völzke H, Wichmann HE, Wright AF and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom. vb2@sanger.ac.uk

    Brachial circumference (BC), also known as upper arm or mid arm circumference, can be used as an indicator of muscle mass and fat tissue, which are distributed differently in men and women. Analysis of anthropometric measures of peripheral fat distribution such as BC could help in understanding the complex pathophysiology behind overweight and obesity. The purpose of this study is to identify genetic variants associated with BC through a large-scale genome-wide association scan (GWAS) meta-analysis. We used fixed-effects meta-analysis to synthesise summary results across 14 GWAS discovery and 4 replication cohorts comprising overall 22,376 individuals (12,031 women and 10,345 men) of European ancestry. Individual analyses were carried out for men, women, and combined across sexes using linear regression and an additive genetic model, adjusted either for age or for age and BMI. We prioritised signals for follow-up in two stages. We did not detect any signals reaching genome-wide significance. The FTO rs9939609 SNP showed nominal evidence for association (p<0.05) in the age-adjusted strata for men and across both sexes. In this first GWAS meta-analysis for BC to date, we have not identified any genome-wide significant signals and do not observe robust association of previously established obesity loci with BC. Large-scale collaborations will be necessary to achieve higher power to detect loci underlying BC.

    Funded by: Canadian Institutes of Health Research: MOP-82893; Medical Research Council: G0800582, G9815508, MC_PC_U127561128, MC_U127561128; NHLBI NIH HHS: R01-HL-087700, R01-HL-088215; NIA NIH HHS: N01-AG-5-0002, N1-AG-1-1, N1-AG-1-2111; NIDDK NIH HHS: R01-DK-075681, R01-DK-8925601; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164; Wellcome Trust: 092731, 098051, WT089062, WT092731

    PloS one 2012;7;3;e31369

  • Genome-wide meta-analysis of common variant differences between men and women.

    Boraska V, Jerončić A, Colonna V, Southam L, Nyholt DR, Rayner NW, Perry JR, Toniolo D, Albrecht E, Ang W, Bandinelli S, Barbalic M, Barroso I, Beckmann JS, Biffar R, Boomsma D, Campbell H, Corre T, Erdmann J, Esko T, Fischer K, Franceschini N, Frayling TM, Girotto G, Gonzalez JR, Harris TB, Heath AC, Heid IM, Hoffmann W, Hofman A, Horikoshi M, Zhao JH, Jackson AU, Hottenga JJ, Jula A, Kähönen M, Khaw KT, Kiemeney LA, Klopp N, Kutalik Z, Lagou V, Launer LJ, Lehtimäki T, Lemire M, Lokki ML, Loley C, Luan J, Mangino M, Mateo Leach I, Medland SE, Mihailov E, Montgomery GW, Navis G, Newnham J, Nieminen MS, Palotie A, Panoutsopoulou K, Peters A, Pirastu N, Polasek O, Rehnström K, Ripatti S, Ritchie GR, Rivadeneira F, Robino A, Samani NJ, Shin SY, Sinisalo J, Smit JH, Soranzo N, Stolk L, Swinkels DW, Tanaka T, Teumer A, Tönjes A, Traglia M, Tuomilehto J, Valsesia A, van Gilst WH, van Meurs JB, Smith AV, Viikari J, Vink JM, Waeber G, Warrington NM, Widen E, Willemsen G, Wright AF, Zanke BW, Zgaga L, Wellcome Trust Case Control Consortium, Boehnke M, d'Adamo AP, de Geus E, Demerath EW, den Heijer M, Eriksson JG, Ferrucci L, Gieger C, Gudnason V, Hayward C, Hengstenberg C, Hudson TJ, Järvelin MR, Kogevinas M, Loos RJ, Martin NG, Metspalu A, Pennell CE, Penninx BW, Perola M, Raitakari O, Salomaa V, Schreiber S, Schunkert H, Spector TD, Stumvoll M, Uitterlinden AG, Ulivi S, van der Harst P, Vollenweider P, Völzke H, Wareham NJ, Wichmann HE, Wilson JF, Rudan I, Xue Y and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. vboraska@mefst.hr

    The male-to-female sex ratio at birth is constant across world populations with an average of 1.06 (106 male to 100 female live births) for populations of European descent. The sex ratio is considered to be affected by numerous biological and environmental factors and to have a heritable component. The aim of this study was to investigate the presence of modest effects of common alleles at autosomal and chromosome X variants that could explain the observed sex ratio at birth. We conducted a large-scale genome-wide association scan (GWAS) meta-analysis across 51 studies, comprising overall 114,863 individuals (61,094 women and 53,769 men) of European ancestry and 2,623,828 common (minor allele frequency >0.05) single-nucleotide polymorphisms (SNPs). Allele frequencies were compared between men and women for directly-typed and imputed variants within each study. Forward-time simulations for unlinked, neutral, autosomal, common loci were performed under the demographic model for European populations with a fixed sex ratio and a random mating scheme to assess the probability of detecting significant allele frequency differences. We do not detect any genome-wide significant (P < 5 × 10⁻⁸) common SNP differences between men and women in this well-powered meta-analysis. The simulated data provided results entirely consistent with these findings. This large-scale investigation across ~115,000 individuals shows no detectable contribution from common genetic variants to the observed skew in the sex ratio. The absence of sex-specific differences is useful in guiding genetic association study design, for example when using mixed controls for sex-biased traits.

    Funded by: Canadian Institutes of Health Research: MOP-82893; Cancer Research UK; Chief Scientist Office: CZB/4/710; Medical Research Council: G0401527, G1000143, G1001799, MC_PC_U127561128, MC_U106179471, MC_U127561128; NCRR NIH HHS: RR018787, UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: HL65234, HL67466, R01HL086694, R01HL087641, R01HL59367; NIA NIH HHS: N.1-AG-1-1, N.1-AG-1-2111, N01-AG-1-2100, N01-AG-5-0002; NIAAA NIH HHS: AA07535, AA10248, AA13320, AA13321, AA13326, AA14041; NIDDK NIH HHS: DK062370; NIMH NIH HHS: MH081802, MH66206, R01 MH059160, U24 MH068457-06; NLM NIH HHS: LM010098; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust: 076113, 089062/Z/09/Z, 092447/Z/10/Z, 095831, 098051, 89061/Z/09/Z

    Human molecular genetics 2012;21;21;4805-15

  • Omi, a recessive mutation on chromosome 10, is a novel allele of Ostm1.

    Bosman EA, Estabel J, Ismail O, Podrini C, White JK and Steel KP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK, eabosman@yahoo.com.

    Large-scale N-ethyl-N-nitrosourea (ENU) mutagenesis has provided many rodent models for human disease. Here we describe the initial characterization and mapping of a recessive mutation that leads to degeneration of the incisors, failure of molars to erupt, a grey coat colour, and mild osteopetrosis. We mapped the omi mutation to chromosome 10 between D10Mit214 and D10Mit194. The Ostm1 gene is a likely candidate gene in this region and the grey-lethal allele, Ostm1(gl), and omi mutations fail to complement each other. We show that omi/omi mice have reduced levels of Ostm1 protein. To date we have not been able to identify the causative mutation. We propose that omi is a novel hypomorphic mutation affecting Ostm1 expression, potentially in a regulatory element.

    Mammalian genome : official journal of the International Mammalian Genome Society 2012

  • A genome-wide association meta-analysis identifies new childhood obesity loci.

    Bradfield JP, Taal HR, Timpson NJ, Scherag A, Lecoeur C, Warrington NM, Hypponen E, Holst C, Valcarcel B, Thiering E, Salem RM, Schumacher FR, Cousminer DL, Sleiman PM, Zhao J, Berkowitz RI, Vimaleswaran KS, Jarick I, Pennell CE, Evans DM, St Pourcain B, Berry DJ, Mook-Kanamori DO, Hofman A, Rivadeneira F, Uitterlinden AG, van Duijn CM, van der Valk RJ, de Jongste JC, Postma DS, Boomsma DI, Gauderman WJ, Hassanein MT, Lindgren CM, Mägi R, Boreham CA, Neville CE, Moreno LA, Elliott P, Pouta A, Hartikainen AL, Li M, Raitakari O, Lehtimäki T, Eriksson JG, Palotie A, Dallongeville J, Das S, Deloukas P, McMahon G, Ring SM, Kemp JP, Buxton JL, Blakemore AI, Bustamante M, Guxens M, Hirschhorn JN, Gillman MW, Kreiner-Møller E, Bisgaard H, Gilliland FD, Heinrich J, Wheeler E, Barroso I, O'Rahilly S, Meirhaeghe A, Sørensen TI, Power C, Palmer LJ, Hinney A, Widen E, Farooqi IS, McCarthy MI, Froguel P, Meyre D, Hebebrand J, Jarvelin MR, Jaddoe VW, Smith GD, Hakonarson H, Grant SF and Early Growth Genetics Consortium

    Center for Applied Genomics, Abramson Research Center, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

    Multiple genetic variants have been associated with adult obesity and a few with severe obesity in childhood; however, less progress has been made in establishing genetic influences on common early-onset obesity. We performed a North American, Australian and European collaborative meta-analysis of 14 studies consisting of 5,530 cases (≥95th percentile of body mass index (BMI)) and 8,318 controls (<50th percentile of BMI) of European ancestry. Taking forward the eight newly discovered signals yielding association with P < 5 × 10⁻⁶ in nine independent data sets (2,818 cases and 4,083 controls), we observed two loci that yielded genome-wide significant combined P values near OLFM4 at 13q14 (rs9568856; P = 1.82 × 10⁻⁹; odds ratio (OR) = 1.22) and within HOXB5 at 17q21 (rs9299; P = 3.54 × 10⁻⁹; OR = 1.14). Both loci continued to show association when two extreme childhood obesity cohorts were included (2,214 cases and 2,674 controls). These two loci also yielded directionally consistent associations in a previous meta-analysis of adult BMI [1].

    Funded by: British Heart Foundation: PG/09/023, PG/09/023/26806, PG/1996183/9569; Canadian Institutes of Health Research: MOP-82893; Medical Research Council: 74882, G0000934, G0100103, G0500539, G0600705, G0601653, G0800582, G0801056, G0900554, MC_UP_A620_1014; NHLBI NIH HHS: 1RC2HL101543, 1RC2HL101651, 5R01HL061768, 5R01HL076647, 5R01HL087679-02, 5R01HL087680; NICHD NIH HHS: R01 HD056465, R01 HD056465-01A1, R01 HD056465-02, R01 HD056465-03, R01 HD056465-04, R01 HD056465-05; NIDDK NIH HHS: R01 DK075787, U01 DK062418; NIEHS NIH HHS: 5P01ES009581, 5P01ES011627, 5P30ES007048, 5R01ES014447, 5R01ES014708, 5R01ES016535, 5R03ES014046; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; ORD VA: RD831861-01; PHS HHS: R826708-01; Wellcome Trust: 052515/2/97/2, 068545/Z/02, 076467, 077016/Z/05/Z, 083948, 086596/Z/08/Z, 090532, 092731, GR069224, WT088431MA

    Nature genetics 2012;44;5;526-31

  • The mammalian gene function resource: the international knockout mouse consortium.

    Bradley A, Anastassiadis K, Ayadi A, Battey JF, Bell C, Birling MC, Bottomley J, Brown SD, Bürger A, Bult CJ, Bushell W, Collins FS, Desaintes C, Doe B, Economides A, Eppig JT, Finnell RH, Fletcher C, Fray M, Frendewey D, Friedel RH, Grosveld FG, Hansen J, Hérault Y, Hicks G, Hörlein A, Houghton R, Hrabé de Angelis M, Huylebroeck D, Iyer V, de Jong PJ, Kadin JA, Kaloff C, Kennedy K, Koutsourakis M, Kent Lloyd KC, Marschall S, Mason J, McKerlie C, McLeod MP, von Melchner H, Moore M, Mujica AO, Nagy A, Nefedov M, Nutter LM, Pavlovic G, Peterson JL, Pollock J, Ramirez-Solis R, Rancourt DE, Raspa M, Remacle JE, Ringwald M, Rosen B, Rosenthal N, Rossant J, Ruiz Noppinger P, Ryder E, Schick JZ, Schnütgen F, Schofield P, Seisenberger C, Selloum M, Simpson EM, Skarnes WC, Smedley D, Stanford WL, Francis Stewart A, Stone K, Swan K, Tadepally H, Teboul L, Tocchini-Valentini GP, Valenzuela D, West AP, Yamamura KI, Yoshinaga Y and Wurst W

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, UK, abradley@sanger.ac.uk.

    In 2007, the International Knockout Mouse Consortium (IKMC) made the ambitious promise to generate mutations in virtually every protein-coding gene of the mouse genome in a concerted worldwide action. Now, 5 years later, the IKMC members have developed high-throughput gene trapping and, in particular, gene-targeting pipelines and generated more than 17,400 mutant murine embryonic stem (ES) cell clones and more than 1,700 mutant mouse strains, most of them conditional. A common IKMC web portal (www.knockoutmouse.org) has been established, allowing easy access to this unparalleled biological resource. The IKMC materials considerably enhance functional gene annotation of the mammalian genome and will have a major impact on future biomedical research.

    Mammalian genome : official journal of the International Mammalian Genome Society 2012

  • Image-based characterization of thrombus formation in time-lapse DIC microscopy.

    Brieu N, Navab N, Serbanovic-Canic J, Ouwehand WH, Stemple DL, Cvejic A and Groher M

    Computer Aided Medical Procedures, Technische Universität München (TUM), Garching bei München 85748, Germany. brieu@in.tum.de

    The characterization of thrombus formation in time-lapse DIC microscopy is of increased interest for identifying genes which account for atherothrombosis and coronary artery diseases (CADs). In particular, we are interested in large-scale studies on zebrafish, which result in large amounts of data and require automatic processing. In this work, we present an image-based solution for the automated extraction of parameters quantifying the temporal development of thrombotic plugs. Our system is based on the joint segmentation of thrombotic and aortic regions over time. This task is made difficult by the low contrast and the highly dynamic conditions observed in in vivo DIC microscopic scenes. Our key idea is to perform this segmentation by distinguishing the different motion patterns in image time series rather than by solving standard image segmentation tasks in each image frame. Thus, we are able to compensate for the poor imaging conditions. We model motion patterns by energies based on the idea of dynamic textures, and regularize the model by two prior energies on the shape of the aortic region and on the topological relationship between the thrombus and the aorta. We demonstrate the performance of our segmentation algorithm by qualitative and quantitative experiments on synthetic examples as well as on real in vivo microscopic sequences.

    Medical image analysis 2012;16;4;915-31

  • Analysis of Detailed Phenotype Profiles Reveals CHRNA5-CHRNA3-CHRNB4 Gene Cluster Association With Several Nicotine Dependence Traits.

    Broms U, Wedenoja J, Largeau MR, Korhonen T, Pitkäniemi J, Keskitalo-Vuokko K, Häppölä A, Heikkilä KH, Heikkilä K, Ripatti S, Sarin AP, Salminen O, Paunio T, Pergadia ML, Madden PA, Kaprio J and Loukola A

    Department of Public Health, Hjelt Institute, University of Helsinki, P.O.Box 41, FI-00014, Helsinki, Finland. anu.loukola@helsinki.fi.

    Introduction: The role of the nicotinic acetylcholine receptor gene cluster on chromosome 15q24-25 in the etiology of nicotine dependence (ND) is still being defined. In this study, we included all 15 tagging single nucleotide polymorphisms (SNPs) within the CHRNA5-CHRNA3-CHRNB4 cluster and tested associations with 30 smoking-related phenotypes.

    Methods: The study sample was ascertained from the Finnish Twin Cohort study. Twin pairs born 1938-1957 and concordant for a history of cigarette smoking were recruited along with their family members (mainly siblings), as part of the Nicotine Addiction Genetics consortium. The study sample consisted of 1,428 individuals (59% males) from 735 families, with mean age 55.6 years.

    Results: We detected multiple novel associations for ND. DSM-IV ND symptoms associated significantly with the proxy SNP Locus 1 (rs2036527, p = .000009) and Locus 2 (rs578776, p = .0001) and tolerance factor of the Nicotine Dependence Syndrome Scale (NDSS) showed suggestive association to rs11636753 (p = .0059), rs11634351 (p = .0069), and rs1948 (p = .0071) in CHRNB4. Furthermore, we report significant association with DSM-IV ND diagnosis (rs2036527, p = .0003) for the first time in a Caucasian population. Several SNPs indicated suggestive association for traits related to ages at smoking initiation. Also, rs11636753 in CHRNB4 showed suggestive association with regular drinking (p = .0029) and the comorbidity of depression and ND (p = .0034).

    Conclusions: We demonstrate novel associations of DSM-IV ND symptoms and the NDSS tolerance subscale. Our results confirm and extend association findings for other ND measures. We show pleiotropic effects of this gene cluster on multiple measures of ND and also regular drinking and the comorbidity of ND and depression.

    Funded by: NIDA NIH HHS: R01 DA012854-10

    Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 2012;14;6;720-33

  • Developing insights into the mechanisms of evolution of bacterial pathogens from whole-genome sequences.

    Bryant J, Chewapreecha C and Bentley SD

    Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Evolution of bacterial pathogen populations has been detected in a variety of ways including phenotypic tests, such as metabolic activity, reaction to antisera and drug resistance and genotypic tests that measure variation in chromosome structure, repetitive loci and individual gene sequences. While informative, these methods only capture a small subset of the total variation and, therefore, have limited resolution. Advances in sequencing technologies have made it feasible to capture whole-genome sequence variation for each sample under study, providing the potential to detect all changes at all positions in the genome from single nucleotide changes to large-scale insertions and deletions. In this review, we focus on recent work that has applied this powerful new approach and summarize some of the advances that this has brought in our understanding of the details of how bacterial pathogens evolve.

    Future microbiology 2012;7;11;1283-96

  • Discovering genes involved in alcohol dependence and other alcohol responses: role of animal models.

    Buck KJ, Milner LC, Denmark DL, Grant SG and Kozell LB

    Oregon Health & Science University, Portland, Oregon and the Department of Veterans Affairs Medical Center, Portland, Oregon.

    The genetic determinants of alcoholism are still largely unknown, hindering effective treatment and prevention. Systematic approaches to gene discovery are critical if novel genes and mechanisms involved in alcohol dependence are to be identified. Although no animal model can duplicate all aspects of alcoholism in humans, robust animal models for specific alcohol-related traits, including physiological alcohol dependence and associated withdrawal, have been invaluable resources. Using a variety of genetic animal models, the identification of regions of chromosomal DNA that contain a gene or genes which affect a complex phenotype (i.e., quantitative trait loci [QTLs]) has allowed unbiased searches for candidate genes. Several QTLs with large effects on alcohol withdrawal severity in mice have been detected, and fine mapping of these QTLs has placed them in small intervals on mouse chromosomes 1 and 4 (which correspond to certain regions on human chromosomes 1 and 9). Subsequent work led to the identification of underlying quantitative trait genes (QTGs) (e.g., Mpdz) and high-quality QTG candidates (e.g., Kcnj9 and genes involved in mitochondrial respiration and oxidative stress) and their plausible mechanisms of action. Human association studies provide supporting evidence that these QTLs and QTGs may be directly relevant to alcohol risk factors in clinical populations.

    Alcohol research : current reviews 2012;34;3;367-74

  • Tissue-Specific Splicing of Disordered Segments that Embed Binding Motifs Rewires Protein Interaction Networks.

    Buljan M, Chalancon G, Eustermann S, Wagner GP, Fuxreiter M, Bateman A and Babu MM

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Alternative inclusion of exons increases the functional diversity of proteins. Among alternatively spliced exons, tissue-specific exons play a critical role in maintaining tissue identity. This raises the question of how tissue-specific protein-coding exons influence protein function. Here we investigate the structural, functional, interaction, and evolutionary properties of constitutive, tissue-specific, and other alternative exons in human. We find that tissue-specific protein segments often contain disordered regions, are enriched in posttranslational modification sites, and frequently embed conserved binding motifs. Furthermore, genes containing tissue-specific exons tend to occupy central positions in interaction networks and display distinct interaction partners in the respective tissues, and are enriched in signaling, development, and disease genes. Based on these findings, we propose that tissue-specific inclusion of disordered segments that contain binding motifs rewires interaction networks and signaling pathways. In this way, tissue-specific splicing may contribute to functional versatility of proteins and increase the diversity of interaction networks across tissues.

    Molecular cell 2012;46;6;871-83

  • Biocurators and biocuration: surveying the 21st century challenges.

    Burge S, Attwood TK, Bateman A, Berardini TZ, Cherry M, O'Donovan C, Xenarios L and Gaudet P

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Curated databases are an integral part of the tool set that researchers use on a daily basis for their work. For most users, however, how databases are maintained, and by whom, is rather obscure. The International Society for Biocuration (ISB) represents biocurators, software engineers, developers and researchers with an interest in biocuration. Its goals include fostering communication between biocurators, promoting and describing their work, and highlighting the added value of biocuration to the world. The ISB recently conducted a survey of biocurators to better understand their educational and scientific backgrounds, their motivations for choosing a curatorial job and their career goals. The results are reported here. From the responses received, it is evident that biocuration is performed by highly trained scientists and perceived to be a stimulating career, offering both intellectual challenges and the satisfaction of performing work essential to the modern scientific community. It is also apparent that the ISB has at least a dual role to play to facilitate biocurators' work: (i) to promote biocuration as a career within the greater scientific community; (ii) to aid the development of resources for biomedical research through promotion of nomenclature and data-sharing standards that will allow interconnection of biological databases and better exploit the pivotal contributions that biocurators are making. DATABASE URL: http://biocurator.org.

    Database : the journal of biological databases and curation 2012;2012;bar059

  • Rfam 11.0: 10 years of RNA families.

    Burge SW, Daub J, Eberhardt R, Tate J, Barquist L, Nawrocki EP, Eddy SR, Gardner PP and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA and School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, 8140 New Zealand.

    The Rfam database (available via the website at http://rfam.sanger.ac.uk and through our mirror at http://rfam.janelia.org) is a collection of non-coding RNA families, primarily RNAs with a conserved RNA secondary structure, including both RNA genes and mRNA cis-regulatory elements. Each family is represented by a multiple sequence alignment, predicted secondary structure and covariance model. Here we discuss updates to the database in the latest release, Rfam 11.0, including the introduction of genome-based alignments for large families, the introduction of the Rfam Biomart as well as other user interface improvements. Rfam is available under the Creative Commons Zero license.

    Nucleic acids research 2012

  • New translational assays for preclinical modelling of cognition in schizophrenia: the touchscreen testing method for mice and rats.

    Bussey TJ, Holmes A, Lyon L, Mar AC, McAllister KA, Nithianantharajah J, Oomen CA and Saksida LM

    Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK. tjb1000@cam.ac.uk

    We describe a touchscreen method that satisfies a proposed 'wish-list' of desirables for a cognitive testing method for assessing rodent models of schizophrenia. A number of tests relevant to schizophrenia research are described which are currently being developed and validated using this method. These tests can be used to study reward learning, memory, perceptual discrimination, object-place associative learning, attention, impulsivity, compulsivity, extinction, simple Pavlovian conditioning, and other constructs. The tests can be deployed using a 'flexible battery' approach to establish a cognitive profile for a particular mouse or rat model. We have found these tests to be capable of detecting not just impairments in function, but enhancements as well, which is essential for testing putative cognitive therapies. New tests are being continuously developed, many of which may prove particularly valuable for schizophrenia research.

    Neuropharmacology 2012;62;3;1191-203

  • A full-length recombinant Plasmodium falciparum PfRH5 protein induces inhibitory antibodies that are effective across common PfRH5 genetic variants.

    Bustamante LY, Bartholdson SJ, Crosnier C, Campos MG, Wanaguru M, Nguon C, Kwiatkowski DP, Wright GJ and Rayner JC

    Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    The lack of an effective licensed vaccine remains one of the most significant gaps in the portfolio of tools being developed to eliminate Plasmodium falciparum malaria. Vaccines targeting erythrocyte invasion - an essential step for both parasite development and malaria pathogenesis - have faced the particular challenge of genetic diversity. Immunity-driven balancing selection pressure on parasite invasion proteins often results in the presence of multiple, antigenically distinct, variants within a population, leading to variant-specific immune responses. Such variation makes it difficult to design a vaccine that covers the full range of diversity, and could potentially facilitate the evolution of vaccine-resistant parasite strains. In this study, we investigate the effect of genetic diversity on invasion inhibition by antibodies to a high priority P. falciparum invasion candidate antigen, P. falciparum Reticulocyte Binding Protein Homologue 5 (PfRH5). Previous work has shown that virally delivered PfRH5 can induce antibodies that protect against a wide range of genetic variants. Here, we show that a full-length recombinant PfRH5 protein expressed in mammalian cells is biochemically active, as judged by saturable binding to its receptor, basigin, and is able to induce antibodies that strongly inhibit P. falciparum growth and invasion. Whole genome sequencing of 290 clinical P. falciparum isolates from across the world identifies only five non-synonymous PfRH5 SNPs that are present at frequencies of 10% or more in at least one geographical region. Antibodies raised against the 3D7 variant of PfRH5 were able to inhibit nine different P. falciparum strains, which between them included all of the five most common PfRH5 SNPs in this dataset, with no evidence for strain-specific immunity. We conclude that protein-based PfRH5 vaccines are an urgent priority for human efficacy trials.

    Vaccine 2012

  • The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium.

    Côté RG, Griss J, Dianes JA, Wang R, Wright JC, van den Toorn HW, van Breukelen B, Heck AJ, Hulstaert N, Martens L, Reisinger F, Csordas A, Ovelleiro D, Perez-Riverol Y, Barsnes H, Hermjakob H and Vizcaíno JA

    Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I024204/1; Wellcome Trust: WT085949MA

    Molecular & cellular proteomics : MCP 2012;11;12;1682-9

  • Telomeres and cancer: from crisis to stability to crisis to stability.

    Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.

    Telomere attrition unleashes genomic instability, promoting cancer development. Once established, however, the malignant clone often re-establishes genomic stability through overexpression of telomerase. In two papers, one in this issue of Cell and one in the subsequent issue, DePinho and colleagues explore the consequences of telomerase re-expression and its validity as a therapeutic target in mouse models of cancer.

    Funded by: Wellcome Trust: 093867

    Cell 2012;148;4;633-5

  • Correlation of blood counts with vascular complications in essential thrombocythemia: analysis of the prospective PT1 cohort.

    Campbell PJ, Maclean C, Beer PA, Buck G, Wheatley K, Kiladjian JJ, Forsyth C, Harrison CN and Green AR

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Essential thrombocythemia, a myeloproliferative neoplasm, is associated with increased platelet count and risk of thrombosis or hemorrhage. Cytoreductive therapy aims to normalize platelet counts, despite minimal association between platelet count and complication rates. Evidence is increasing for correlation between white cell count and thrombosis but prospective data are lacking. We studied the relationship between vascular complications and 21,887 longitudinal blood counts in a prospective, multicenter cohort of 776 ET patients. After correction for confounding variables, no association was seen between blood counts at diagnosis and future complications. However, platelet count outside normal range during follow-up was associated with immediate risk of major hemorrhage (p=0.0005) but not thrombosis (p=0.7). Elevated white cell count during follow-up correlated with thrombosis (p=0.05) and major hemorrhage (p=0.01). These data imply that the aim of cytoreduction in essential thrombocythemia should be to keep the platelet count, and arguably the white cell count, within the normal range. This study is registered at http://isrctn.org as #72251782.

    Blood 2012

  • Specific expression of Kcna10, Pxn and Odf2 in the organ of Corti.

    Carlisle FA, Steel KP and Lewis MA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The development of the organ of Corti and the highly specialized cells required for hearing involves a multitude of genes, many of which remain unknown. Here we describe the expression pattern of three genes not previously studied in the inner ear in mice at a range of ages both embryonic and early postnatal. Kcna10, a tetrameric Shaker-like potassium channel, is expressed strongly in the hair cells themselves. Odf2, as its centriolar isoform Cenexin, marks the dendrites extending to and contacting hair cells, and Pxn, a focal adhesion scaffold protein, is most strongly expressed in pillar cells during the ages studied. The roles of these genes are yet to be elucidated, but their specific expression patterns imply potential functional significance in the inner ear.

    Gene expression patterns : GEP 2012;12;5-6;172-9

  • Association study of nonsynonymous single nucleotide polymorphisms in schizophrenia.

    Carrera N, Arrojo M, Sanjuán J, Ramos-Ríos R, Paz E, Suárez-Rama JJ, Páramo M, Agra S, Brenlla J, Martínez S, Rivero O, Collier DA, Palotie A, Cichon S, Nöthen MM, Rietschel M, Rujescu D, Stefansson H, Steinberg S, Sigurdsson E, St Clair D, Tosato S, Werge T, Stefansson K, González JC, Valero J, Gutiérrez-Zotes A, Labad A, Martorell L, Vilella E, Carracedo Á and Costas J

    Fundación Pública Galega de Medicina Xenómica-SERGAS, Hospital Clínico Universitario, Santiago de Compostela, Spain.

    Background: Genome-wide association studies using several hundred thousand anonymous markers present limited statistical power. Alternatively, association studies restricted to common nonsynonymous single nucleotide polymorphisms (nsSNPs) have the advantage of strongly reducing the multiple testing problem, while increasing the probability of testing functional single nucleotide polymorphisms (SNPs).

    Methods: We performed a case-control association study of common nsSNPs in Galician (northwest Spain) samples using the Affymetrix GeneChip Human 20k cSNP Kit, followed by a replication study of the more promising results. After quality control procedures, the discovery sample consisted of 5100 nsSNPs at minor allele frequency >5% analyzed in 476 schizophrenia patients and 447 control subjects. The replication sample consisted of 4069 cases and 15,128 control subjects of European origin. We also performed multilocus analysis, using aggregated scores of nsSNPs at liberal significance thresholds and cross-validation procedures.

    Results: The 5 independent nsSNPs with false discovery rate q ≤ .25, as well as 13 additional nsSNPs at p < .01 and located in functional candidate genes, were genotyped in the replication samples. One SNP, rs13107325, located at the metal ions transporter gene SLC39A8, reached significance in the combined sample after Bonferroni correction (trend test, p = 2.7 × 10⁻⁶, allelic odds ratio = 1.32). This SNP presents minor allele frequency of 5% to 10% in many European populations but is rare outside Europe. We also confirmed the polygenic component of susceptibility.

    Conclusions: Taking into account that another metal ions transporter gene, SLC39A3, is associated with bipolar disorder, our findings reveal a role for brain metal homeostasis in psychosis.

    Biological psychiatry 2012;71;2;169-77

  • Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data.

    Carver T, Harris SR, Berriman M, Parkhill J and McQuillan JA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. artemis@sanger.ac.uk

    Motivation: High-throughput sequencing (HTS) technologies have made low-cost sequencing of large numbers of samples commonplace. An explosion in the type, not just number, of sequencing experiments has also taken place including genome re-sequencing, population-scale variation detection, whole transcriptome sequencing and genome-wide analysis of protein-bound nucleic acids.

    Results: We present Artemis as a tool for integrated visualization and computational analysis of different types of HTS datasets in the context of a reference genome and its corresponding annotation.

    Availability: Artemis is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute websites: http://www.sanger.ac.uk/resources/software/artemis/.

    Funded by: Wellcome Trust: WT 076964

    Bioinformatics (Oxford, England) 2012;28;4;464-9

  • Microevolution of extensively drug-resistant tuberculosis in Russia.

    Casali N, Nikolayevskyy V, Balabanova Y, Ignatyeva O, Kontsevaya I, Harris SR, Bentley SD, Parkhill J, Nejentsev S, Hoffner SE, Horstmann RD, Brown T and Drobniewski F

    National Mycobacterium Reference Laboratory, Blizard Institute, Queen Mary, University of London, London E1 2AT, United Kingdom.

    Extensively drug-resistant (XDR) tuberculosis (TB), which is resistant to both first- and second-line antibiotics, is an escalating problem, particularly in the Russian Federation. Molecular fingerprinting of 2348 Mycobacterium tuberculosis isolates collected in Samara Oblast, Russia, revealed that 72% belonged to the Beijing lineage, a genotype associated with enhanced acquisition of drug resistance and increased virulence. Whole-genome sequencing of 34 Samaran isolates, plus 25 isolates representing global M. tuberculosis complex diversity, revealed that Beijing isolates originating in Eastern Europe formed a monophyletic group. Homoplasic polymorphisms within this clade were almost invariably associated with antibiotic resistance, indicating that the evolution of this population is primarily driven by drug therapy. Resistance genotypes showed a strong correlation with drug susceptibility phenotypes. A novel homoplasic mutation in rpoC, found only in isolates carrying a common rpoB rifampicin-resistance mutation, may play a role in fitness compensation. Most multidrug-resistant (MDR) isolates also had mutations in the promoter of a virulence gene, eis, which increase its expression and confer kanamycin resistance. Kanamycin therapy may thus select for mutants with increased virulence, helping preserve bacterial fitness and promoting transmission of drug-resistant TB strains. The East European clade was dominated by two MDR clusters, each disseminated across Samara. Polymorphisms conferring fluoroquinolone resistance were independently acquired multiple times within each cluster, indicating that XDR TB is currently not widely transmitted.

    Genome research 2012;22;4;735-45

  • Phylogeographic variation in recombination rates within a global clone of methicillin-resistant Staphylococcus aureus.

    Castillo-Ramírez S, Corander J, Marttinen P, Aldeljawi M, Hanage WP, Westh H, Boye K, Gulay Z, Bentley SD, Parkhill J, Holden MT and Feil EJ

    Department of Biology and Biochemistry, University of Bath, Claverton Down Bath, Bath and North East Somerset BA2 7AY, UK. E.Feil@bath.ac.uk.

    Background: Next-generation sequencing (NGS) is a powerful tool for understanding both patterns of descent over time and space (phylogeography) and the molecular processes underpinning genome divergence in pathogenic bacteria. Here, we describe a synthesis between these perspectives by employing a recently developed Bayesian approach, BRATNextGen, for detecting recombination on an expanded NGS dataset of the globally disseminated methicillin-resistant Staphylococcus aureus (MRSA) clone ST239.

    Results: The data confirm strong geographical clustering at continental, national and city scales and demonstrate that the rate of recombination varies significantly between phylogeographic sub-groups representing independent introductions from Europe. These differences are most striking when mobile non-core genes are included, but remain apparent even when only considering the stable core genome. The monophyletic ST239 sub-group corresponding to isolates from South America shows heightened recombination, the sub-group predominantly from Asia shows an intermediate level, and a very low level of recombination is noted in a third sub-group representing a large collection from Turkey.

    Conclusions: We show that the rapid global dissemination of a single pathogenic bacterial clone results in local variation in measured recombination rates. Possible explanatory variables include the size and time since emergence of each defined sub-population (as determined by the sampling frame), variation in transmission dynamics due to host movement, and changes in the bacterial genome affecting the propensity for recombination.

    Genome biology 2012;13;12;R126

  • Association of the GGCX (CAA)16/17 repeat polymorphism with higher warfarin dose requirements in African Americans.

    Cavallari LH, Perera M, Wadelius M, Deloukas P, Taube G, Patel SR, Aquino-Michaels K, Viana MA, Shapiro NL and Nutescu EA

    Department of Pharmacy Practice, University of Illinois, Chicago, IL 60612-7230, USA. humma@uic.edu

    Objective: Little is known about genetic contributors to higher than usual warfarin dose requirements, particularly for African Americans. This study tested the hypothesis that the γ-glutamyl carboxylase (GGCX) genotype contributes to warfarin dose requirements greater than 7.5 mg/day in an African American population.

    Methods: A total of 338 African Americans on a stable dose of warfarin were enrolled. The GGCX rs10654848 (CAA)n, rs12714145 (G>A), and rs699664 (p.R325Q); VKORC1 c.-1639G>A and rs61162043; and CYP2C9*2, *3, *5, *8, *11, and rs7089580 genotypes were tested for their association with dose requirements greater than 7.5 mg/day alone and in the context of other variables known to influence dose variability.

    Results: The GGCX rs10654848 (CAA)16 or 17 repeat occurred at a frequency of 2.6% in African Americans and was overrepresented among patients requiring greater than 7.5 mg/day versus those who required lower doses (12 vs. 3%, P=0.003; odds ratio 4.0, 95% confidence interval, 1.5-10.5). The GGCX rs10654848 genotype remained associated with high dose requirements on regression analysis including age, body size, and VKORC1 genotype. On linear regression, the GGCX rs10654848 genotype explained 2% of the overall variability in warfarin dose in African Americans. An examination of the GGCX rs10654848 genotype in warfarin-treated Caucasians revealed a (CAA)16 repeat frequency of only 0.27% (P=0.008 compared with African Americans).

    Conclusion: These data support the GGCX rs10654848 genotype as a predictor of higher than usual warfarin doses in African Americans, who have a 10-fold higher frequency of the (CAA)16/17 repeat compared with Caucasians.

    Funded by: NHLBI NIH HHS: K23 HL089808-01A2; Wellcome Trust

    Pharmacogenetics and genomics 2012;22;2;152-8

  • A common X-linked inborn error of carnitine biosynthesis may be a risk factor for nondysmorphic autism.

    Celestino-Soper PB, Violante S, Crawford EL, Luo R, Lionel AC, Delaby E, Cai G, Sadikovic B, Lee K, Lo C, Gao K, Person RE, Moss TJ, German JR, Huang N, Shinawi M, Treadwell-Deering D, Szatmari P, Roberts W, Fernandez B, Schroer RJ, Stevenson RE, Buxbaum JD, Betancur C, Scherer SW, Sanders SJ, Geschwind DH, Sutcliffe JS, Hurles ME, Wanders RJ, Shaw CA, Leal SM, Cook EH, Goin-Kochel RP, Vaz FM and Beaudet AL

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.

    We recently reported a deletion of exon 2 of the trimethyllysine hydroxylase epsilon (TMLHE) gene in a proband with autism. TMLHE maps to the X chromosome and encodes the first enzyme in carnitine biosynthesis, 6-N-trimethyllysine dioxygenase. Deletion of exon 2 of TMLHE causes enzyme deficiency, resulting in increased substrate concentration (6-N-trimethyllysine) and decreased product levels (3-hydroxy-6-N-trimethyllysine and γ-butyrobetaine) in plasma and urine. TMLHE deficiency is common in control males (24 in 8,787 or 1 in 366) and was not significantly increased in frequency in probands from simplex autism families (9 in 2,904 or 1 in 323). However, it was 2.82-fold more frequent in probands from male-male multiplex autism families compared with controls (7 in 909 or 1 in 130; P = 0.023). Additionally, six of seven autistic male siblings of probands in male-male multiplex families had the deletion, suggesting that TMLHE deficiency is a risk factor for autism (metaanalysis Z-score = 2.90 and P = 0.0037), although with low penetrance (2-4%). These data suggest that dysregulation of carnitine metabolism may be important in nondysmorphic autism; that abnormalities of carnitine intake, loss, transport, or synthesis may be important in a larger fraction of nondysmorphic autism cases; and that the carnitine pathway may provide a novel target for therapy or prevention of autism.

    Funded by: NICHD NIH HHS: HD-37283, P30HD-0240640; NIMH NIH HHS: 1U24MH081810, R01 MH061009; NINDS NIH HHS: R01 NS049261; Wellcome Trust: 076113, 077014/Z/05/Z

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;21;7974-81
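
    The frequencies quoted in the TMLHE abstract above can be re-derived from the reported counts. The following is a minimal arithmetic sketch, not the authors' analysis: the dictionary `groups` and the helper names `one_in` and `fold_vs_controls` are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Counts quoted in the abstract: (males carrying the exon 2 deletion, males tested).
# The dictionary and helper names are illustrative, not taken from the paper.
groups = {
    "control males": (24, 8787),
    "simplex probands": (9, 2904),
    "male-male multiplex probands": (7, 909),
}


def one_in(carriers, total):
    """Express a carrier frequency as '1 in N', rounded to the nearest integer."""
    return round(total / carriers)


def fold_vs_controls(carriers, total):
    """Ratio of a group's carrier frequency to the control carrier frequency."""
    control_carriers, control_total = groups["control males"]
    return (carriers / total) / (control_carriers / control_total)


for name, (carriers, total) in groups.items():
    print(
        f"{name}: 1 in {one_in(carriers, total)}, "
        f"{fold_vs_controls(carriers, total):.2f}-fold vs controls"
    )
```

    Under these counts, the multiplex group works out to roughly 1 in 130 and about 2.82 times the control frequency, matching the figures in the abstract. The sketch does not reproduce the reported P value or meta-analysis Z-score, which depend on tests not described here.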

  • Finding a needle in a haystack. Microbial metatranscriptomes.

    Chappell L

    This month's Genome Watch highlights some of the technical challenges that need to be overcome to gain further insight into microbial metatranscriptomes.

    Nature reviews. Microbiology 2012;10;7;446

  • Inheritance of coronary artery disease in men: an analysis of the role of the Y chromosome.

    Charchar FJ, Bloomer LD, Barnes TA, Cowley MJ, Nelson CP, Wang Y, Denniff M, Debiec R, Christofidou P, Nankervis S, Dominiczak AF, Bani-Mustafa A, Balmforth AJ, Hall AS, Erdmann J, Cambien F, Deloukas P, Hengstenberg C, Packard C, Schunkert H, Ouwehand WH, Ford I, Goodall AH, Jobling MA, Samani NJ and Tomaszewski M

    School of Health Sciences, University of Ballarat, Ballarat, VIC, Australia.

    Background: A sexual dimorphism exists in the incidence and prevalence of coronary artery disease--men are more commonly affected than are age-matched women. We explored the role of the Y chromosome in coronary artery disease in the context of this sexual inequity.

    Methods: We genotyped 11 markers of the male-specific region of the Y chromosome in 3233 biologically unrelated British men from three cohorts: the British Heart Foundation Family Heart Study (BHF-FHS), West of Scotland Coronary Prevention Study (WOSCOPS), and Cardiogenics Study. On the basis of this information, each Y chromosome was tracked back into one of 13 ancient lineages defined as haplogroups. We then examined associations between common Y chromosome haplogroups and the risk of coronary artery disease in cross-sectional BHF-FHS and prospective WOSCOPS. Finally, we undertook functional analysis of Y chromosome effects on monocyte and macrophage transcriptome in British men from the Cardiogenics Study.

    Findings: Of nine haplogroups identified, two (R1b1b2 and I) accounted for roughly 90% of the Y chromosome variants among British men. Carriers of haplogroup I had about a 50% higher age-adjusted risk of coronary artery disease than did men with other Y chromosome lineages in BHF-FHS (odds ratio 1·75, 95% CI 1·20-2·54, p=0·004), WOSCOPS (1·45, 1·08-1·95, p=0·012), and joint analysis of both populations (1·56, 1·24-1·97, p=0·0002). The association between haplogroup I and increased risk of coronary artery disease was independent of traditional cardiovascular and socioeconomic risk factors. Analysis of macrophage transcriptome in the Cardiogenics Study revealed that 19 molecular pathways showing strong differential expression between men with haplogroup I and other lineages of the Y chromosome were interconnected by common genes related to inflammation and immunity, and that some of them have a strong relevance to atherosclerosis.

    Interpretation: The human Y chromosome is associated with risk of coronary artery disease in men of European ancestry, possibly through interactions of immunity and inflammation.

    Funding: British Heart Foundation; UK National Institute for Health Research; LEW Carty Charitable Fund; National Health and Medical Research Council of Australia; European Union 6th Framework Programme; Wellcome Trust.

    Funded by: British Heart Foundation: PG/06/097/21331; Wellcome Trust: 087576, WT-0841383/2/07/2

    Lancet 2012;379;9819;915-22

  • Using family-based imputation in genome-wide association studies with large complex pedigrees: the Framingham Heart Study.

    Chen MH, Huang J, Chen WM, Larson MG, Fox CS, Vasan RS, Seshadri S, O'Donnell CJ and Yang Q

    Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, United States of America ; Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America ; The National Heart, Lung, Blood Institute's Framingham Heart Study, Framingham, Massachusetts, United States of America.

    Imputation has been widely used in genome-wide association studies (GWAS) to infer genotypes of un-genotyped variants based on the linkage disequilibrium in external reference panels such as the HapMap and 1000 Genomes. However, imputation has only rarely been performed based on family relationships to infer genotypes of un-genotyped individuals. Using 8998 Framingham Heart Study (FHS) participants genotyped with Affymetrix 550K SNPs, we imputed genotypes of the same set of SNPs for an additional 3121 participants, most of whom were never genotyped due to lack of a DNA sample. Prior to imputation, 122 pedigrees were too large to be handled by the imputation software Merlin. Therefore, we developed a novel pedigree splitting algorithm that can maximize the number of genotyped relatives for imputing each un-genotyped individual, while keeping new sub-pedigrees under a pre-specified size. In GWAS of four phenotypes available in FHS (Alzheimer disease, circulating levels of fibrinogen, high-density lipoprotein cholesterol, and uric acid), we compared results using genotyped individuals only with results using both genotyped and imputed individuals. We studied the impact of applying different imputation quality filtering thresholds on the association results and did not find a universal threshold that always resulted in a more significant p-value for previously identified loci. However, most of these loci had a lower p-value when we only included imputed genotypes with ≥60% SNP- and ≥50% person-specific imputation certainty. In summary, we developed a novel algorithm for splitting large pedigrees for imputation and found a plausible imputation quality filtering threshold based on FHS. Further examination may be required to generalize this threshold to other studies.

    PloS one 2012;7;12;e51589

  • Bayesian estimation of bacterial community composition from 454 sequencing data.

    Cheng L, Walker AW and Corander J

    Department of Mathematics and Statistics, P.O.Box 68 (Gustaf Hällströmin katu 2b), University of Helsinki, 00014 Helsinki, Finland and Pathogen Genomics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Estimating bacterial community composition from a mixed sample in different applied contexts is an important task for many microbiologists. The bacterial community composition is commonly estimated by clustering polymerase chain reaction amplified 16S rRNA gene sequences. Current taxonomy-independent clustering methods for analyzing these sequences, such as UCLUST, ESPRIT-Tree and CROP, have two limitations: (i) expert knowledge is needed, i.e. a difference cutoff between species needs to be specified; (ii) closely related species cannot be separated. The first limitation imposes a burden on the user, since considerable effort is needed to select appropriate parameters, whereas the second limitation leads to an inaccurate description of the underlying bacterial community composition. We propose a probabilistic model-based method to estimate bacterial community composition which tackles these limitations. Our method requires very little expert knowledge: only the maximum possible number of clusters needs to be specified. In addition, our method separates closely related species in two experiments, in spite of sequencing errors and individual variations.

    Nucleic acids research 2012

  • Copy number variation of the APC gene is associated with regulation of bone mineral density.

    Chew S, Dastani Z, Brown SJ, Lewis JR, Dudbridge F, Soranzo N, Surdulescu GL, Richards JB, Spector TD and Wilson SG

    Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Nedlands 6009, Australia. Shelby.Chew@health.wa.gov.au

    Introduction: Genetic studies of osteoporosis have commonly examined SNPs in candidate genes or whole genome analyses, but insertions and deletions of DNA, collectively called copy number variations (CNVs), also comprise a large amount of the genetic variability between individuals. Previously, SNPs in the APC gene have been strongly associated with femoral neck and lumbar spine volumetric bone mineral density in older men. In addition, familial adenomatous polyposis patients carrying heterozygous mutations in the APC gene have been shown to have significantly higher mean bone mineral density than age- and sex-matched controls suggesting the importance of this gene in regulating bone mineral density. We examined CNV within the APC gene region to test for association with bone mineral density.

    Methods: DNA was extracted from venous blood, genotyped using the Human Hap610 arrays and CNV determined from the fluorescence intensity data in 2070 Caucasian men and women aged 47.0 ± 13.0 (mean ± SD) years, to assess the effects of the CNV on bone mineral density at the forearm, spine and total hip sites.

    Results: Data for covariate adjusted bone mineral density from subjects grouped by APC CNV genotype showed a significant difference (P=0.02-0.002). Subjects with a single copy loss of APC had a 7.95%, 13.10% and 13.36% increase in bone mineral density at the forearm, spine and total hip sites respectively, compared to subjects with two copies of the APC gene.

    Conclusions: These data support previous findings of APC regulating bone mineral density and demonstrate that a novel CNV of the APC gene is significantly associated with bone mineral density in Caucasian men and women.

    Funded by: Canadian Institutes of Health Research; Wellcome Trust

    Bone 2012;51;5;939-43

  • Genome watch: Natural transformers.

    Chewapreecha C

    Nature reviews. Microbiology 2012;10;9;598

  • Conversion from mouse embryonic to extra-embryonic endoderm stem cells reveals distinct differentiation capacities of pluripotent stem cell states.

    Cho LT, Wamaitha SE, Tsai IJ, Artus J, Sherwood RI, Pedersen RA, Hadjantonakis AK and Niakan KK

    The Anne McLaren Laboratory for Regenerative Medicine, Stem Cell Institute, University of Cambridge, Cambridge CB2 0SZ, UK.

    The inner cell mass of the mouse pre-implantation blastocyst comprises epiblast progenitor and primitive endoderm cells from which cognate embryonic (mESCs) or extra-embryonic (XEN) stem cell lines can be derived. Importantly, each stem cell type retains the defining properties and lineage restriction of its in vivo tissue of origin. Recently, we demonstrated that XEN-like cells arise within mESC cultures. This raises the possibility that mESCs can generate self-renewing XEN cells without the requirement for gene manipulation. We have developed a novel approach to convert mESCs to XEN cells (cXEN) using growth factors. We confirm that the downregulation of the pluripotency transcription factor Nanog and the expression of primitive endoderm-associated genes Gata6, Gata4, Sox17 and Pdgfra are necessary for cXEN cell derivation. This approach highlights an important function for Fgf4 in cXEN cell derivation. Paracrine FGF signalling compensates for the loss of endogenous Fgf4, which is necessary to exit mESC self-renewal, but not for XEN cell maintenance. Our cXEN protocol also reveals that distinct pluripotent stem cells respond uniquely to differentiation promoting signals. cXEN cells can be derived from mESCs cultured with Erk and Gsk3 inhibitors (2i), and LIF, similar to conventional mESCs. However, we find that epiblast stem cells (EpiSCs) derived from the post-implantation embryo are refractory to cXEN cell establishment, consistent with the hypothesis that EpiSCs represent a pluripotent state distinct from mESCs. In all, these findings suggest that the potential of mESCs includes the capacity to give rise to both extra-embryonic and embryonic lineages.

    Development (Cambridge, England) 2012;139;16;2866-77

  • Enhanced susceptibility to Citrobacter rodentium infection in miR-155-deficient mice.

    Clare S, John V, Walker AW, Hill JL, Abreu-Goodger C, Hale C, Goulding D, Lawley TD, Mastroeni P, Frankel G, Enright AJ, Vigorito E and Dougan G

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    MicroRNAs (miRNAs) are small non-coding molecules that control gene expression post-transcriptionally, with miR-155 one of the first to be implicated in immune regulation. Here we show that miR-155-deficient mice are less able to eradicate a mucosal Citrobacter rodentium infection compared with wild-type C57BL/6 mice. miR-155-deficient mice exhibited prolonged colonisation associated with a higher C. rodentium burden in gastrointestinal tissue and spread into systemic tissues. Germinal centre formation and humoral immune responses against C. rodentium were severely impaired in infected miR-155-deficient mice. A similarly susceptible phenotype was observed in μMT mice reconstituted with miR-155-deficient B cells, indicating that miR-155 is required intrinsically for mediating protection against this predominantly luminal bacterial pathogen.

    Infection and immunity 2012

  • 'Sifting the significance from the data' - the impact of high-throughput genomic technologies on human genetics and health care.

    Clarke AJ, Cooper DN, Krawczak M, Tyler-Smith C, Wallace HM, Wilkie AO, Raymond FL, Chadwick R, Craddock N, John R, Gallacher J and Chiano M

    Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, Wales CF14 4XN, UK. clarkeaj@cardiff.ac.uk

    This report is of a round-table discussion held in Cardiff in September 2009 for Cesagen, a research centre within the Genomics Network of the UK's Economic and Social Research Council. The meeting was arranged to explore ideas as to the likely future course of human genomics. The achievements of genomics research were reviewed, and the likely constraints on the pace of future progress were explored. New knowledge is transforming biology and our understanding of evolution and human disease. The difficulties we face now concern the interpretation rather than the generation of new sequence data. Our understanding of gene-environment interaction is held back by our current primitive tools for measuring environmental factors, and in addition, there may be fundamental constraints on what can be known about these complex interactions.

    Funded by: Wellcome Trust

    Human genomics 2012;6;11

  • Cfp1 integrates both CpG content and gene activity for accurate H3K4me3 deposition in embryonic stem cells.

    Clouaire T, Webb S, Skene P, Illingworth R, Kerr A, Andrews R, Lee JH, Skalnik D and Bird A

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3JR, United Kingdom;

    Trimethylation of histone H3 Lys 4 (H3K4me3) is a mark of active and poised promoters. The Set1 complex is responsible for most somatic H3K4me3 and contains the conserved subunit CxxC finger protein 1 (Cfp1), which binds to unmethylated CpGs and links H3K4me3 with CpG islands (CGIs). Here we report that Cfp1 plays unanticipated roles in organizing genome-wide H3K4me3 in embryonic stem cells. Cfp1 deficiency caused two contrasting phenotypes: drastic loss of H3K4me3 at expressed CGI-associated genes, with minimal consequences for transcription, and creation of "ectopic" H3K4me3 peaks at numerous regulatory regions. DNA binding by Cfp1 was dispensable for targeting H3K4me3 to active genes but was required to prevent ectopic H3K4me3 peaks. The presence of ectopic peaks at enhancers often coincided with increased expression of nearby genes. This suggests that CpG targeting prevents "leakage" of H3K4me3 to inappropriate chromatin compartments. Our results demonstrate that Cfp1 is a specificity factor that integrates multiple signals, including promoter CpG content and gene activity, to regulate genome-wide patterns of H3K4me3.

    Genes & development 2012;26;15;1714-28

  • TNiK is required for postsynaptic and nuclear signaling pathways and cognitive function.

    Coba MP, Komiyama NH, Nithianantharajah J, Kopanitsa MV, Indersmitten T, Skene NG, Tuck EJ, Fricker DG, Elsegood KA, Stanford LE, Afinowi NO, Saksida LM, Bussey TJ, O'Dell TJ and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Traf2 and NcK interacting kinase (TNiK) contains serine-threonine kinase and scaffold domains and has been implicated in cell proliferation and glutamate receptor regulation in vitro. Here we report its role in vivo using mice carrying a knock-out mutation. TNiK binds protein complexes in the synapse linking it to the NMDA receptor (NMDAR) via AKAP9. NMDAR and metabotropic receptors bidirectionally regulate TNiK phosphorylation and TNiK is required for AMPA expression and synaptic function. TNiK also organizes nuclear complexes and in the absence of TNiK, there was a marked elevation in GSK3β and phosphorylation levels of its cognate phosphorylation sites on NeuroD1 with alterations in Wnt pathway signaling. We observed impairments in dentate gyrus neurogenesis in TNiK knock-out mice and cognitive testing using the touchscreen apparatus revealed impairments in pattern separation on a test of spatial discrimination. Object-location paired associate learning, which is dependent on glutamatergic signaling, was also impaired. Additionally, TNiK knock-out mice displayed hyperlocomotor behavior that could be rapidly reversed by GSK3β inhibitors, indicating the potential for pharmacological rescue of a behavioral phenotype. These data establish TNiK as a critical regulator of cognitive functions and suggest it may play a regulatory role in diseases impacting on its interacting proteins and complexes.

    Funded by: Medical Research Council: G0802238; NIMH NIH HHS: MH609197, R01 MH060919; Wellcome Trust

    The Journal of neuroscience : the official journal of the Society for Neuroscience 2012;32;40;13987-99

  • The YARHG Domain: An Extracellular Domain in Search of a Function.

    Coggill P and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    We have identified a new bacterial protein domain that we hypothesise binds to peptidoglycan. This domain is called the YARHG domain after the most highly conserved sequence-segment. The domain is found in the extracellular space and is likely to be composed of four alpha-helices. The domain is found associated with protein kinase domains, suggesting it is associated with signalling in some bacteria. The domain is also found associated with three different families of peptidases. The large number of different domains that are found associated with YARHG suggests that it is a useful functional module that nature has recombined multiple times.

    PloS one 2012;7;5;e35575

  • SpolPred: rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences.

    Coll F, Mallard K, Preston MD, Bentley S, Parkhill J, McNerney R, Martin N and Clark TG

    Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton and Department of Computer Science and Information Systems, Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Birkbeck College, London, UK.

    Summary: Spoligotyping is a well-established genotyping technique based on the presence of unique DNA sequences in Mycobacterium tuberculosis (Mtb), the causal agent of tuberculosis disease (TB). Although advances in sequencing technologies are leading to whole-genome bacterial characterization, tens of thousands of isolates have been spoligotyped, giving a global view of Mtb strain diversity. To bridge the gap, we have developed SpolPred, a software tool to predict the spoligotype from raw sequence reads. Our approach is compared with strain types determined experimentally and by de novo assembly in a set of 44 Mtb isolates. In silico and experimental results are identical for almost all isolates (39/44). However, SpolPred detected five experimentally false spoligotypes and was more accurate and faster than the assembling strategy. Application of SpolPred to an additional seven isolates with no laboratory data led to types that clustered with identical experimental types in a phylogenetic analysis using single-nucleotide polymorphisms. Our results demonstrate the usefulness of the tool and its role in revealing experimental limitations. Availability and implementation: SpolPred is written in C and is available from www.pathogenseq.org/spolpred. Contact: francesc.coll@lshtm.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online.

    Bioinformatics (Oxford, England) 2012;28;22;2991-3
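
    The general idea behind predicting a spoligotype from raw reads, as described in the SpolPred abstract above, can be sketched as follows. This is a simplified illustration and not SpolPred's implementation (which is written in C): exact substring matching, the `min_hits` threshold, and all function names are assumptions made here. The sketch relies only on the standard convention that a spoligotype is the presence/absence pattern of 43 spacers, commonly summarised as a 15-digit octal code.

```python
# Simplified sketch: score each spacer as present or absent in a set of reads,
# then encode the 43-spacer pattern as the conventional 15-digit octal code.
# Exact matching and the min_hits threshold are assumptions for illustration.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spacer_presence(reads, spacers, min_hits=1):
    """Return one boolean per spacer: True if at least min_hits reads contain
    the spacer or its reverse complement as an exact substring."""
    pattern = []
    for spacer in spacers:
        rc = reverse_complement(spacer)
        hits = sum(1 for read in reads if spacer in read or rc in read)
        pattern.append(hits >= min_hits)
    return pattern


def to_binary(pattern):
    """Render a presence/absence pattern as a string of 1s and 0s."""
    return "".join("1" if present else "0" for present in pattern)


def to_octal(binary):
    """Convert a 43-digit binary spoligotype to its 15-digit octal code:
    14 triplets become octal digits and the final bit is appended as is."""
    if len(binary) != 43:
        raise ValueError("a spoligotype pattern has exactly 43 positions")
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    digits.append(binary[42])
    return "".join(digits)


if __name__ == "__main__":
    # Toy data only: these are not real spacer sequences.
    toy_spacers = ["ACGTAC", "GGGTTT", "TTTTTT"]
    toy_reads = ["TTACGTACTT", "CCAAACCCGG"]
    print(to_binary(spacer_presence(toy_reads, toy_spacers)))
    print(to_octal("1" * 43))
```

    A real tool would need to tolerate sequencing errors and choose the hit threshold according to read depth; those details are outside this sketch.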

  • Fibrinogen-binding and platelet-aggregation activities of a Lactobacillus salivarius septicaemia isolate are mediated by a novel fibrinogen-binding protein.

    Collins J, van Pijkeren JP, Svensson L, Claesson MJ, Sturme M, Li Y, Cooney JC, van Sinderen D, Walker AW, Parkhill J, Shannon O and O'Toole PW

    Department of Microbiology and Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland.

    The marketplace for probiotic foods is burgeoning, measured in billions of euro per annum. It is imperative, however, that all bacterial strains are fully assessed for human safety. The ability to bind fibrinogen is considered a potential pathogenicity trait that can lead to platelet aggregation, serious medical complications, and in some instances, death. Here we examined strains from species frequently used as probiotics for their ability to bind human fibrinogen. Only one strain (CCUG 47825), a Lactobacillus salivarius isolate from a case of septicaemia, was found to strongly adhere to fibrinogen. Furthermore, this strain was found to aggregate human platelets at a level comparable to the human pathogen Staphylococcus aureus. By sequencing the genome of CCUG 47825, we were able to identify candidate genes responsible for fibrinogen binding. Complementing the genetic analysis with traditional molecular microbiological techniques enabled the identification of the novel fibrinogen receptor, CCUG_2371. Although only strain CCUG 47825 bound fibrinogen under laboratory conditions, homologues of the novel fibrinogen binding gene CCUG_2371 are widespread among L. salivarius strains, maintaining their potential to bind fibrinogen if expressed. We highlight the fact that without a full genetic analysis of strains for human consumption, potential pathogenicity traits may go undetected.

    Funded by: Wellcome Trust: WT098051

    Molecular microbiology 2012;85;5;862-77

  • Incorporating RNA-seq data into the zebrafish Ensembl genebuild.

    Collins JE, White S, Searle SM and Stemple DL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.

    Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component, which can be achieved cost-effectively using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3'-end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.

    Genome research 2012;22;10;2067-78

  • Early environment and neurobehavioral development predict adult temperament clusters.

    Congdon E, Service S, Wessman J, Seppänen JK, Schönauer S, Miettunen J, Turunen H, Koiranen M, Joukamaa M, Järvelin MR, Peltonen L, Veijola J, Mannila H, Paunio T and Freimer NB

    University of California Los Angeles Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America.

    Background: Investigation of the environmental influences on human behavioral phenotypes is important for our understanding of the causation of psychiatric disorders. However, there are complexities associated with the assessment of environmental influences on behavior. We conducted a series of analyses using a prospective, longitudinal study of a nationally representative birth cohort from Finland (the Northern Finland 1966 Birth Cohort). Participants included a total of 3,761 male and female cohort members who were living in Finland at the age of 16 years and who had complete temperament scores. Our initial analyses (Wessman et al., in press) provide evidence in support of four stable and robust temperament clusters. Using these temperament clusters, as well as independent temperament dimensions for comparison, we conducted a data-driven analysis to assess the influence of a broad set of life course measures, assessed pre-natally, in infancy, and during adolescence, on adult temperament. Results: Measures of early environment, neurobehavioral development, and adolescent behavior significantly predict adult temperament, classified by both cluster membership and temperament dimensions. Specifically, our results suggest that a relatively consistent set of life course measures is associated with adult temperament profiles, including maternal education, characteristics of the family's location and residence, adolescent academic performance, and adolescent smoking. Conclusions: Our finding that a consistent set of life course measures predicts temperament clusters indicates that these clusters represent distinct developmental temperament trajectories and that information about a subset of life course measures has implications for adult health outcomes.

    PloS one 2012;7;7;e38065

  • Circulating DNA and next-generation sequencing.

    Cooke S and Campbell P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    Personalising cancer medicine depends upon the implementation of personalised diagnostics and therapeutics. Detailed genomic screening is likely to play a central role in this. As the range of drugs and other therapies for cancer continues to increase, there is an increasingly urgent need for sensitive and specific measures of disease burden to guide treatment regimens. The ability to quantify disease burden with high accuracy and sensitivity in patients with cancer would open many potential routes to personalising therapeutic choices. For example, the intensity of therapy could be guided by the amount of disease at diagnosis; monitoring the response of patients to drugs could allow extension of the period of treatment in responders or early changeover of therapy in nonresponders; and early prediction of recurrence could allow salvage therapy to be instituted before complications of relapse develop. The detection of tumour-specific rearrangements in DNA free in the serum or plasma may provide a substantial advance in the accuracy of monitoring disease burden in patients with solid tumours.

    Recent results in cancer research. Fortschritte der Krebsforschung. Progrès dans les recherches sur le cancer 2012;195;143-9

  • The rocky road to personalized medicine: computational and statistical challenges.

    Corander J, Aittokallio T, Ripatti S and Kaski S

    Personalized medicine 2012;9;109-14

  • Streptococcus pneumoniae: the evolution of antimicrobial resistance to beta-lactams, fluoroquinolones and macrolides.

    Cornick JE and Bentley SD

    Malawi-Liverpool-Wellcome Clinical Research Programme, PO Box 30096, Chichiri, Blantyre 3, Malawi; Institute of Infection and Global Health, The University of Liverpool, Liverpool, UK.

    Multidrug-resistant Streptococcus pneumoniae constitutes a major public health concern worldwide. In this review, we discuss how the transformable nature of the pneumococcus, in parallel with antimicrobial-induced stress, contributes to the evolution of antimicrobial resistance, and how the introduction of the pneumococcal conjugate vaccine has affected the situation.

    Microbes and infection / Institut Pasteur 2012

  • Interpretation of genomic copy number variants using DECIPHER.

    Corpas M, Bragin E, Clayton S, Bevan P and Firth HV

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Many patients suffering from developmental disorders have submicroscopic deletions or duplications affecting the copy number of dosage-sensitive genes or disrupting normal gene expression. Many of these changes are novel or extremely rare, making clinical interpretation problematic and genotype/phenotype correlations difficult. Identification of patients sharing a genomic rearrangement and having phenotypes in common increases certainty in the diagnosis and allows characterization of new syndromes. The DECIPHER database is an online repository of genotype and phenotype data whose chief objective is to facilitate the association of genomic variation with phenotype to enable the clinical interpretation of copy number variation (CNV). This unit shows how DECIPHER can be used to (1) search for consented patients sharing a defined chromosomal location, (2) navigate regions of interest using in-house visualization tools and the Ensembl genome browser, (3) analyze affected genes and prioritize them according to their likelihood of haploinsufficiency, (4) upload patient aberrations and phenotypes, and (5) create printouts at different levels of detail. By following this protocol, clinicians and researchers alike will be able to learn how to characterize their patients' chromosomal imbalances using DECIPHER.

    Funded by: Wellcome Trust: WT077008

    Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] 2012;Chapter 8;Unit 8.14

  • Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation.

    Coupland P, Chandra T, Quail M, Reik W and Swerdlow H

    Wellcome Trust Sanger Institute, Hinxton, UK. pc10@sanger.ac.uk

    We have developed a sequencing method on the Pacific Biosciences RS sequencer (the PacBio) for small DNA molecules that avoids the need for a standard library preparation. To date this approach has been applied toward sequencing single-stranded and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA-modification analysis, and linear DNA fragments covering an entire bacterial genome. Using direct sequencing it is possible to generate sequence data from as little as 1 ng of DNA, offering a significant advantage over current protocols which typically require 400-500 ng of sheared DNA for the library preparation.

    Funded by: Medical Research Council: G0801156; Wellcome Trust: 095645, 098051

    BioTechniques 2012;53;6;365-72

  • A genome-wide association meta-analysis of circulating sex hormone-binding globulin reveals multiple loci implicated in sex steroid hormone regulation.

    Coviello AD, Haring R, Wellons M, Vaidya D, Lehtimäki T, Keildson S, Lunetta KL, He C, Fornage M, Lagou V, Mangino M, Onland-Moret NC, Chen B, Eriksson J, Garcia M, Liu YM, Koster A, Lohman K, Lyytikäinen LP, Petersen AK, Prescott J, Stolk L, Vandenput L, Wood AR, Zhuang WV, Ruokonen A, Hartikainen AL, Pouta A, Bandinelli S, Biffar R, Brabant G, Cox DG, Chen Y, Cummings S, Ferrucci L, Gunter MJ, Hankinson SE, Martikainen H, Hofman A, Homuth G, Illig T, Jansson JO, Johnson AD, Karasik D, Karlsson M, Kettunen J, Kiel DP, Kraft P, Liu J, Ljunggren Ö, Lorentzon M, Maggio M, Markus MR, Mellström D, Miljkovic I, Mirel D, Nelson S, Morin Papunen L, Peeters PH, Prokopenko I, Raffel L, Reincke M, Reiner AP, Rexrode K, Rivadeneira F, Schwartz SM, Siscovick D, Soranzo N, Stöckl D, Tworoger S, Uitterlinden AG, van Gils CH, Vasan RS, Wichmann HE, Zhai G, Bhasin S, Bidlingmaier M, Chanock SJ, De Vivo I, Harris TB, Hunter DJ, Kähönen M, Liu S, Ouyang P, Spector TD, van der Schouw YT, Viikari J, Wallaschofski H, McCarthy MI, Frayling TM, Murray A, Franks S, Järvelin MR, de Jong FH, Raitakari O, Teumer A, Ohlsson C, Murabito JM and Perry JR

    Section of Preventive Medicine and Epidemiology, Boston University School of Medicine, Boston, Massachusetts, United States of America.

    Sex hormone-binding globulin (SHBG) is a glycoprotein responsible for the transport and biologic availability of sex steroid hormones, primarily testosterone and estradiol. SHBG has been associated with chronic diseases including type 2 diabetes (T2D) and with hormone-sensitive cancers such as breast and prostate cancer. We performed a genome-wide association study (GWAS) meta-analysis of 21,791 individuals from 10 epidemiologic studies and validated these findings in 7,046 individuals in an additional six studies. We identified twelve genomic regions (SNPs) associated with circulating SHBG concentrations. Loci near the identified SNPs included SHBG (rs12150660, 17p13.1, p = 1.8 × 10(-106)), PRMT6 (rs17496332, 1p13.3, p = 1.4 × 10(-11)), GCKR (rs780093, 2p23.3, p = 2.2 × 10(-16)), ZBTB10 (rs440837, 8q21.13, p = 3.4 × 10(-09)), JMJD1C (rs7910927, 10q21.3, p = 6.1 × 10(-35)), SLCO1B1 (rs4149056, 12p12.1, p = 1.9 × 10(-08)), NR2F2 (rs8023580, 15q26.2, p = 8.3 × 10(-12)), ZNF652 (rs2411984, 17q21.32, p = 3.5 × 10(-14)), TDGF3 (rs1573036, Xq22.3, p = 4.1 × 10(-14)), LHCGR (rs10454142, 2p16.3, p = 1.3 × 10(-07)), BAIAP2L1 (rs3779195, 7q21.3, p = 2.7 × 10(-08)), and UGT2B15 (rs293428, 4q13.2, p = 5.5 × 10(-06)). These genes encompass multiple biologic pathways, including hepatic function, lipid metabolism, carbohydrate metabolism and T2D, androgen and estrogen receptor function, epigenetic effects, and the biology of sex steroid hormone-responsive cancers including breast and prostate cancer. We found evidence of sex-differentiated genetic influences on SHBG. In a sex-specific GWAS, the locus 4q13.2-UGT2B15 was significant in men only (men p = 2.5 × 10(-08), women p = 0.66, heterogeneity p = 0.003). Additionally, three loci showed strong sex-differentiated effects: 17p13.1-SHBG and Xq22.3-TDGF3 were stronger in men, whereas 8q21.12-ZBTB10 was stronger in women. Conditional analyses identified additional signals at the SHBG gene that together almost double the proportion of variance explained at the locus. Using an independent study of 1,129 individuals, all SNPs identified in the overall or sex-differentiated or conditional analyses explained ~15.6% and ~8.4% of the genetic variation of SHBG concentrations in men and women, respectively. The evidence for sex-differentiated effects and allelic heterogeneity highlights the importance of considering these features when estimating complex trait variance.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; Medical Research Council: G0500539, G0600705, QLG1-CT-2000-01643; NCI NIH HHS: CA128034, CA40356, CA49449, CA87969, U01-CA98233; NCRR NIH HHS: RR-024156, U54 RR020278; NHGRI NIH HHS: U01 HG005152, U01-HG-004424, U01-HG-004446, U01-HG-004729; NHLBI NIH HHS: 5-K23-HL087114, 5R01HL087679-02, HL074338, HL074406, HL087679, N01 HC-95159, N01-HC-05187, N01-HC-25195, N01-HC-45134, N01-HC-45204, N01-HC-45205, N01-HC-48047, N01-HC-48048, N01-HC-48049, N01-HC-48050, N01-HC-95095, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, N02-HL-6-4278, R01-HL-084099, R01-HL065611, R01HL094755; NIA NIH HHS: 1-AG-1-2111, 1R01AG032098-01A1, N.1-AG-1-1, N01-AG-5-0002, N01AG62101, N01AG62103, N01AG62106, R01AG31206, R21AG032598; NIAMS NIH HHS: AR 41398; NIGMS NIH HHS: GM053275-14; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706, MH083268; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164; PHS HHS: 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 41071, 42107-26, 42129-32, 44221, HHSN268200782096C, NFBC1966; WHI NIH HHS: N01WH22110; Wellcome Trust: GR069224

    PLoS genetics 2012;8;7;e1002805

  • LRP1B deletion in high-grade serous ovarian cancers is associated with acquired chemotherapy resistance to liposomal doxorubicin.

    Cowin PA, George J, Fereday S, Loehrer E, Van Loo P, Cullinane C, Etemadmoghadam D, Ftouni S, Galletta L, Anglesio MS, Hendley J, Bowes L, Sheppard KE, Christie EL, Pearson RB, Harnett PR, Heinzelmann-Schwarz V, Friedlander M, McNally O, Quinn M, Campbell P, deFazio A, Bowtell DD and Australian Ovarian Cancer Study

    Cancer Genomics and Genetics Program, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.

    High-grade serous cancer (HGSC), the most common subtype of ovarian cancer, often becomes resistant to chemotherapy, leading to poor patient outcomes. Intratumoral heterogeneity occurs in nearly all solid cancers, including ovarian cancer, contributing to the development of resistance mechanisms. In this study, we examined the spatial and temporal genomic variation in HGSC using high-resolution single-nucleotide polymorphism arrays. Multiple metastatic lesions from individual patients were analyzed along with 22 paired pretreatment and posttreatment samples. We documented regions of differential DNA copy number between multiple tumor biopsies that correlated with altered expression of genes involved in cell polarity and adhesion. In the paired primary and relapse cohort, we observed a greater degree of genomic change in tumors from patients that were initially sensitive to chemotherapy and had longer progression-free interval compared with tumors from patients that were resistant to primary chemotherapy. Notably, deletion or downregulation of the lipid transporter LRP1B emerged as a significant correlate of acquired resistance in our analysis. Functional studies showed that reducing LRP1B expression was sufficient to reduce the sensitivity of HGSC cell lines to liposomal doxorubicin, but not to doxorubicin, whereas LRP1B overexpression was sufficient to increase sensitivity to liposomal doxorubicin. Together, our findings underscore the large degree of variation in DNA copy number in spatially and temporally separated tumors in HGSC patients, and they define LRP1B as a potential contributor to the emergence of chemotherapy resistance in these patients.

    Cancer research 2012;72;16;4060-73

  • A small predatory core genome in the divergent marine Bacteriovorax marinus SJ and the terrestrial Bdellovibrio bacteriovorus.

    Crossman LC, Chen H, Cerdeño-Tárraga AM, Brooks K, Quail MA, Pineiro SA, Hobley L, Sockett RE, Bentley SD, Parkhill J, Williams HN and Stine OC

    [1] Department of Bioinformatics, The Genome Analysis Centre, Norwich Research Park, Norwich, UK [2] School of Biological Sciences, University of East Anglia, Norwich, UK.

    Bacteriovorax marinus SJ is a predatory delta-proteobacterium isolated from a marine environment. The genome sequence of this strain provides an interesting contrast to that of the terrestrial predatory bacterium Bdellovibrio bacteriovorus HD100. Based on their predatory lifestyle, Bacteriovorax were originally designated as members of the genus Bdellovibrio but subsequently were re-assigned to a new genus and family based on genetic and phenotypic differences. B. marinus attaches to Gram-negative bacteria, penetrates through the cell wall to form a bdelloplast, in which it replicates, as shown using microscopy. Bacteriovorax is distinct, as it shares only 30% of its gene products with its closest sequenced relatives. Remarkably, 34% of predicted genes over 500 nt in length were completely unique with no significant matches in the databases. As expected, Bacteriovorax shares several characteristic loci with the other delta-proteobacteria. A geneset shared between Bacteriovorax and Bdellovibrio that is not conserved among other delta-proteobacteria such as Myxobacteria (which destroy prey bacteria externally via lysis), or the non-predatory Desulfo-bacteria and Geobacter species was identified. These 291 gene orthologues common to both Bacteriovorax and Bdellovibrio may be the key indicators of host-interaction predatory-specific processes required for prey entry. The locus from Bdellovibrio bacteriovorus is implicated in the switch from predatory to prey/host-independent growth. Although the locus is conserved in B. marinus, the sequence has only limited similarity. The results of this study advance understanding of both the similarities and differences between Bdellovibrio and Bacteriovorax and confirm the distant relationship between the two and their separation into different families. The ISME Journal advance online publication, 6 September 2012; doi:10.1038/ismej.2012.90.

    The ISME journal 2012

  • A high-resolution view of genome-wide pneumococcal transformation.

    Croucher NJ, Harris SR, Barquist L, Parkhill J and Bentley SD

    Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Transformation is an important mechanism of microbial evolution through which bacteria have been observed to rapidly adapt in response to clinical interventions; examples include facilitating vaccine evasion and the development of penicillin resistance in the major respiratory pathogen Streptococcus pneumoniae. To characterise the process in detail, the genomes of 124 S. pneumoniae isolates produced through in vitro transformation were sequenced and recombination events detected. Those recombinations importing the selected marker were independent of unselected events elsewhere in the genome, the positions of which were not significantly affected by local sequence similarity between donor and recipient or mismatch repair processes. However, both types of recombinations were sometimes mosaic, with multiple non-contiguous segments originating from the same molecule of donor DNA. The lengths of the unselected events were exponentially distributed with a mean of 2.3 kb, implying that recombinations are stochastically resolved with a fixed per base probability of 4.4×10(-4) bp(-1). This distribution of recombination sizes, coupled with an observed under-representation of large insertions within transferred sequence, suggests transformation has the potential to reduce the size of bacterial genomes, and is unlikely to act as an efficient mechanism for the uptake of accessory genomic loci.

    PLoS pathogens 2012;8;6;e1002745

  • Investigation of Host Candidate Malaria-Associated Risk/Protective SNPs in a Brazilian Amazonian Population.

    da Silva Santos S, Clark TG, Campino S, Suarez-Mutis MC, Rockett KA, Kwiatkowski DP and Fernandes O

    Laboratório Interdisciplinar de Pesquisas Médicas, IOC, Fiocruz, Rio de Janeiro, Brazil.

    The Brazilian Amazon is a hypo-endemic malaria region with nearly 300,000 cases each year. A variety of genetic polymorphisms, particularly in erythrocyte receptors and immune response related genes, have been described to be associated with susceptibility and resistance to malaria. In order to identify polymorphisms that might be associated with malaria clinical outcomes in a Brazilian Amazonian population, sixty-four human single nucleotide polymorphisms in 37 genes were analyzed using a Sequenom massARRAY iPLEX platform. A total of 648 individuals from two malaria endemic areas were studied, including 535 malaria cases (113 individuals with clinical mild malaria, 122 individuals with asymptomatic infection and 300 individuals with history of previous mild malaria) and 113 healthy controls with no history of malaria. The data revealed significant associations (p<0.003) between one SNP in the IL10 gene (rs1800896) and one SNP in the TLR4 gene (rs4986790) with reduced risk for clinical malaria, one SNP in the IRF1 gene (rs2706384) with increased risk for clinical malaria, one SNP in the LTA gene (rs909253) with protection from clinical malaria and one SNP in the TNF gene (rs1800750) associated with susceptibility to clinical malaria. Also, a new association was found between a SNP in the CTL4 gene (rs2242665), located at the major histocompatibility complex III region, and reduced risk for clinical malaria. This study represents the first association study from an Amazonian population involving a large number of host genetic polymorphisms with susceptibility or resistance to Plasmodium infection and malaria outcomes. Further studies should include a larger number of individuals, refined parameters and a fine-scale map obtained through DNA sequencing to increase the knowledge of the Amazonian population genetic diversity.

    PloS one 2012;7;5;e36692

  • High levels of RNA-editing site conservation amongst 15 laboratory mouse strains.

    Danecek P, Nellåker C, McIntyre RE, Buendia-Buendia JE, Bumpstead S, Ponting CP, Flint J, Durbin R, Keane TM and Adams DJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.

    Background: Adenosine-to-inosine (A-to-I) editing is a site-selective post-transcriptional alteration of double-stranded RNA by ADAR deaminases that is crucial for homeostasis and development. Recently the Mouse Genomes Project generated genome sequences for 17 laboratory mouse strains and rich catalogues of variants. We also generated RNA-seq data from whole brain RNA from 15 of the sequenced strains.

    Results: Here we present a computational approach that takes an initial set of transcriptome/genome mismatch sites and filters these calls taking into account systematic biases in alignment, single nucleotide variant calling, and sequencing depth to identify RNA editing sites with high accuracy. We applied this approach to our panel of mouse strain transcriptomes identifying 7,389 editing sites with an estimated false-discovery rate of between 2.9 and 10.5%. The overwhelming majority of these edits were of the A-to-I type, with less than 2.4% not of this class, and only three of these edits could not be explained as alignment artifacts. We validated 24 novel RNA editing sites in coding sequence, including two non-synonymous edits in the Cacna1d gene that fell into the IQ domain portion of the Cav1.2 voltage-gated calcium channel, indicating a potential role for editing in the generation of transcript diversity.

    Conclusions: We show that despite over two million years of evolutionary divergence, the sites edited and the level of editing at each site is remarkably consistent across the 15 strains. In the Cds2 gene we find evidence for RNA editing acting to preserve the ancestral transcript sequence despite genomic sequence divergence.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust: 079912

    Genome biology 2012;13;4;26

  • Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals.

    Dastani Z, Hivert MF, Timpson N, Perry JR, Yuan X, Scott RA, Henneman P, Heid IM, Kizer JR, Lyytikäinen LP, Fuchsberger C, Tanaka T, Morris AP, Small K, Isaacs A, Beekman M, Coassin S, Lohman K, Qi L, Kanoni S, Pankow JS, Uh HW, Wu Y, Bidulescu A, Rasmussen-Torvik LJ, Greenwood CM, Ladouceur M, Grimsby J, Manning AK, Liu CT, Kooner J, Mooser VE, Vollenweider P, Kapur KA, Chambers J, Wareham NJ, Langenberg C, Frants R, Willems-Vandijk K, Oostra BA, Willems SM, Lamina C, Winkler TW, Psaty BM, Tracy RP, Brody J, Chen I, Viikari J, Kähönen M, Pramstaller PP, Evans DM, St Pourcain B, Sattar N, Wood AR, Bandinelli S, Carlson OD, Egan JM, Böhringer S, van Heemst D, Kedenko L, Kristiansson K, Nuotio ML, Loo BM, Harris T, Garcia M, Kanaya A, Haun M, Klopp N, Wichmann HE, Deloukas P, Katsareli E, Couper DJ, Duncan BB, Kloppenburg M, Adair LS, Borja JB, DIAGRAM+ Consortium, MAGIC Consortium, GLGC Investigators, MuTHER Consortium, Wilson JG, Musani S, Guo X, Johnson T, Semple R, Teslovich TM, Allison MA, Redline S, Buxbaum SG, Mohlke KL, Meulenbelt I, Ballantyne CM, Dedoussis GV, Hu FB, Liu Y, Paulweber B, Spector TD, Slagboom PE, Ferrucci L, Jula A, Perola M, Raitakari O, Florez JC, Salomaa V, Eriksson JG, Frayling TM, Hicks AA, Lehtimäki T, Smith GD, Siscovick DS, Kronenberg F, van Duijn C, Loos RJ, Waterworth DM, Meigs JB, Dupuis J, Richards JB, Voight BF, Scott LJ, Steinthorsdottir V, Dina C, Welch RP, Zeggini E, Huth C, Aulchenko YS, Thorleifsson G, McCulloch LJ, Ferreira T, Grallert H, Amin N, Wu G, Willer CJ, Raychaudhuri S, McCarroll SA, Hofmann OM, Segrè AV, van Hoek M, Navarro P, Ardlie K, Balkau B, Benediktsson R, Bennett AJ, Blagieva R, Boerwinkle E, Bonnycastle LL, Boström KB, Bravenboer B, Bumpstead S, Burtt NP, Charpentier G, Chines PS, Cornelis M, Crawford G, Doney AS, Elliott KS, Elliott AL, Erdos MR, Fox CS, Franklin CS, Ganser M, Gieger C, Grarup N, Green T, Griffin S, Groves CJ, Guiducci C, Hadjadj S, Hassanali N, Herder C, Isomaa B, Jackson AU, Johnson PR, Jørgensen T, Kao WH, Kong A, Kraft P, Kuusisto J, Lauritzen T, Li M, Lieverse A, Lindgren CM, Lyssenko V, Marre M, Meitinger T, Midthjell K, Morken MA, Narisu N, Nilsson P, Owen KR, Payne F, Petersen AK, Platou C, Proença C, Prokopenko I, Rathmann W, Rayner NW, Robertson NR, Rocheleau G, Roden M, Sampson MJ, Saxena R, Shields BM, Shrader P, Sigurdsson G, Sparsø T, Strassburger K, Stringham HM, Sun Q, Swift AJ, Thorand B, Tichet J, Tuomi T, van Dam RM, van Haeften TW, van Herpt T, van Vliet-Ostaptchouk JV, Walters GB, Weedon MN, Wijmenga C, Witteman J, Bergman RN, Cauchi S, Collins FS, Gloyn AL, Gyllensten U, Hansen T, Hide WA, Hitman GA, Hofman A, Hunter DJ, Hveem K, Laakso M, Morris AD, Palmer CN, Rudan I, Sijbrands E, Stein LD, Tuomilehto J, Uitterlinden A, Walker M, Watanabe RM, Abecasis GR, Boehm BO, Campbell H, Daly MJ, Hattersley AT, Pedersen O, Barroso I, Groop L, Sladek R, Thorsteinsdottir U, Wilson JF, Illig T, Froguel P, van Duijn CM, Stefansson K, Altshuler D, Boehnke M, McCarthy MI, Soranzo N, Wheeler E, Glazer NL, Bouatia-Naji N, Mägi R, Randall J, Elliott P, Rybin D, Dehghan A, Hottenga JJ, Song K, Goel A, Lajunen T, Doney A, Cavalcanti-Proença C, Kumari M, Timpson NJ, Zabena C, Ingelsson E, An P, O'Connell J, Luan J, Elliott A, McCarroll SA, Roccasecca RM, Pattou F, Sethupathy P, Ariyurek Y, Barter P, Beilby JP, Ben-Shlomo Y, Bergmann S, Bochud M, Bonnefond A, Borch-Johnsen K, Böttcher Y, Brunner E, Bumpstead SJ, Chen YD, Chines P, Clarke R, Coin LJ, Cooper MN, Crisponi L, Day IN, de Geus EJ, Delplanque J, Fedson AC, Fischer-Rosinsky A, Forouhi NG, Franzosi MG, Galan P, Goodarzi MO, Graessler J, Grundy S, Gwilliam R, Hallmans G, Hammond N, Han X, Hartikainen AL, Hayward C, Heath SC, Hercberg S, Hillman DR, Hingorani AD, Hui J, Hung J, Kaakinen M, Kaprio J, Kesaniemi YA, Kivimaki M, Knight B, Koskinen S, Kovacs P, Kyvik KO, Lathrop GM, Lawlor DA, Le Bacquer O, Lecoeur C, Li Y, Mahley R, Mangino M, Martínez-Larrad MT, McAteer JB, McPherson R, Meisinger C, Melzer D, Meyre D, Mitchell BD, Mukherjee S, Naitza S, Neville MJ, Orrù M, Pakyz R, Paolisso G, Pattaro C, Pearson D, Peden JF, Pedersen NL, Pfeiffer AF, Pichler I, Polasek O, Posthuma D, Potter SC, Pouta A, Province MA, Rayner NW, Rice K, Ripatti S, Rivadeneira F, Rolandsson O, Sandbaek A, Sandhu M, Sanna S, Sayer AA, Scheet P, Seedorf U, Sharp SJ, Shields B, Sigurðsson G, Sijbrands EJ, Silveira A, Simpson L, Singleton A, Smith NL, Sovio U, Swift A, Syddall H, Syvänen AC, Tönjes A, Uitterlinden AG, van Dijk KW, Varma D, Visvikis-Siest S, Vitart V, Vogelzangs N, Waeber G, Wagner PJ, Walley A, Ward KL, Watkins H, Wild SH, Willemsen G, Witteman JC, Yarnell JW, Zelenika D, Zethelius B, Zhai G, Zhao JH, Zillikens MC, DIAGRAM Consortium, GIANT Consortium, Global B Pgen Consortium, Borecki IB, Meneton P, Magnusson PK, Nathan DM, Williams GH, Silander K, Bornstein SR, Schwarz P, Spranger J, Karpe F, Shuldiner AR, Cooper C, Serrano-Ríos M, Lind L, Palmer LJ, Hu FB, Franks PW, Ebrahim S, Marmot M, Kao WH, Pramstaller PP, Wright AF, Stumvoll M, Hamsten A, Procardis Consortium, Buchanan TA, Valle TT, Rotter JI, Penninx BW, Boomsma DI, Cao A, Scuteri A, Schlessinger D, Uda M, Ruokonen A, Jarvelin MR, Peltonen L, Mooser V, Sladek R, MAGIC investigators, GLGC Consortium, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Chasman DI, Johansen CT, Fouchier SW, Peloso GM, Barbalic M, Ricketts SL, Bis JC, Feitosa MF, Orho-Melander M, Melander O, Li X, Li M, Cho YS, Go MJ, Kim YJ, Lee JY, Park T, Kim K, Sim X, Ong RT, Croteau-Chonka DC, Lange LA, Smith JD, Ziegler A, Zhang W, Zee RY, Whitfield JB, Thompson JR, Surakka I, Spector TD, Smit JH, Sinisalo J, Scott J, Saharinen J, Sabatti C, Rose LM, Roberts R, Rieder M, Parker AN, Pare G, O'Donnell CJ, Nieminen MS, Nickerson DA, Montgomery GW, McArdle W, Masson D, Martin NG, Marroni F, Lucas G, Luben R, Lokki ML, Lettre G, Launer LJ, Lakatta EG, Laaksonen R, Kyvik KO, König IR, Khaw KT, Kaplan LM, Johansson Å, Janssens AC, Igl W, Hovingh GK, Hengstenberg C, Havulinna AS, Hastie ND, Harris TB, Haritunians T, Hall AS, Groop LC, Gonzalez E, Freimer NB, Erdmann J, Ejebe KG, Döring A, Dominiczak AF, Demissie S, Deloukas P, de Faire U, Crawford G, Chen YD, Caulfield MJ, Boekholdt SM, Assimes TL, Quertermous T, Seielstad M, Wong TY, Tai ES, Feranil AB, Kuzawa CW, Taylor HA, Gabriel SB, Holm H, Gudnason V, Krauss RM, Ordovas JM, Munroe PB, Kooner JS, Tall AR, Hegele RA, Kastelein JJ, Schadt EE, Strachan DP, Reilly MP, Samani NJ, Schunkert H, Cupples LA, Sandhu MS, Ridker PM, Rader DJ and Kathiresan S

    Department of Epidemiology, Biostatistics, and Occupational Health, Jewish General Hospital, Lady Davis Institute, McGill University, Montreal, Canada.

    Circulating levels of adiponectin, a hormone produced predominantly by adipocytes, are highly heritable and are inversely associated with type 2 diabetes mellitus (T2D) and other metabolic traits. We conducted a meta-analysis of genome-wide association studies in 39,883 individuals of European ancestry to identify genes associated with metabolic disease. We identified 8 novel loci associated with adiponectin levels and confirmed 2 previously reported loci (P = 4.5×10(-8)-1.2×10(-43)). Using a novel method to combine data across ethnicities (N = 4,232 African Americans, N = 1,776 Asians, and N = 29,347 Europeans), we identified two additional novel loci. Expression analyses of 436 human adipocyte samples revealed that mRNA levels of 18 genes at candidate regions were associated with adiponectin concentrations after accounting for multiple testing (p<3×10(-4)). We next developed a multi-SNP genotypic risk score to test the association of adiponectin-decreasing risk alleles with metabolic traits and diseases using consortia-level meta-analytic data. This risk score was associated with increased risk of T2D (p = 4.3×10(-3), n = 22,044), increased triglycerides (p = 2.6×10(-14), n = 93,440), increased waist-to-hip ratio (p = 1.8×10(-5), n = 77,167), increased glucose two hours post oral glucose tolerance testing (p = 4.4×10(-3), n = 15,234), increased fasting insulin (p = 0.015, n = 48,238), but with lower HDL-cholesterol concentrations (p = 4.5×10(-13), n = 96,748) and decreased BMI (p = 1.4×10(-4), n = 121,335). These findings identify novel genetic determinants of adiponectin levels, which, taken together, influence risk of T2D and markers of insulin resistance.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: PG/08/094/26019, PG/09/002/26056, RG/07/008/23674; Canadian Institutes of Health Research; Chief Scientist Office: CZB/4/710; Department of Health: DHCS/07/07/008; FIC NIH HHS: TW05596; Medical Research Council: G0401527, G0600705, G0601966, G0700931, G0701863, G0800582, G0801056, G0900339, G0901213, G0902037, G1000143, G19/35, G8802774, MC_PC_U127561128, MC_U106179471, MC_U127561128, MC_U127592696, MC_UP_A100_1003, MC_UP_A620_1014, MC_UP_A620_1015; NCATS NIH HHS: UL1 TR000124; NCRR NIH HHS: M01-RR00425, RR-024156, RR20649, UL1 RR025008, UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: HL085144, HL094555, HL105756, N01 HC-15103, N01 HC-55222, N01 HC-95159, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-65226, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, N01-HC-95170, N01-HC-95171, N01-HC-95172, N02-HL-6-4278, R01 HL087652, R01-HL085251, R01HL086694, R01HL087641, R01HL59367, RC2 HL102419, U01 HL080295; NIA NIH HHS: 1R01AG032098-01A1, N01AG62101, N01AG62103, N01AG62106; NICHD NIH HHS: R24 HD050924; NIDDK NIH HHS: 1 R01 DK075787-01A1, DK063491, DK078150, DK56350, K24 DK080140, R01 DK078150, R01DK056918; NIEHS NIH HHS: ES10126; PHS HHS: HHSN268200625226C, HHSN268200782096C; Wellcome Trust: 064890, 081682, 090532, 092731

    PLoS genetics 2012;8;3;e1002607

  • Large-Scale Identification of MicroRNA Targets in Murine Dgcr8-Deficient Embryonic Stem Cell Lines.

    Davis MP, Abreu-Goodger C, van Dongen S, Lu D, Tate PH, Bartonicek N, Kutter C, Liu P, Skarnes WC, Enright AJ and Dunham I

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Small RNAs such as microRNAs play important roles in embryonic stem cell maintenance and differentiation. A broad range of microRNAs is expressed in embryonic stem cells while only a fraction of their targets have been identified. We have performed large-scale identification of embryonic stem cell microRNA targets using a murine embryonic stem cell line deficient in the expression of Dgcr8. These cells are heavily depleted for microRNAs, allowing us to reintroduce specific microRNA duplexes and identify refined target sets. We used deep sequencing of small RNAs, mRNA expression profiling and bioinformatics analysis of microRNA seed matches in 3' UTRs to identify target transcripts. Consequently, we have identified a network of microRNAs that converge on the regulation of several important cellular pathways. Additionally, our experiments have revealed a novel candidate for Dgcr8-independent microRNA genesis and highlighted the challenges currently facing miRNA annotation.

    PloS one 2012;7;8;e41762

  • Long-range DNA looping and gene expression analyses identify DEXI as an autoimmune disease candidate gene.

    Davison LJ, Wallace C, Cooper JD, Cope NF, Wilson NK, Smyth DJ, Howson JM, Saleh N, Al-Jeffery A, Angus KL, Stevens HE, Nutland S, Duley S, Coulson RM, Walker NM, Burren OS, Rice CM, Cambien F, Zeller T, Munzel T, Lackner K, Blankenberg S, Cardiogenics Consortium, Fraser P, Gottgens B, Todd JA, Attwood T, Belz S, Braund P, Cambien F, Cooper J, Crisp-Hihn A, Diemert P, Deloukas P, Foad N, Erdmann J, Goodall AH, Gracey J, Gray E, Gwilliams R, Heimerl S, Hengstenberg C, Jolley J, Krishnan U, Lloyd-Jones H, Lugauer I, Lundmark P, Maouche S, Moore JS, Muir D, Murray E, Nelson CP, Neudert J, Niblett D, O'Leary K, Ouwehand WH, Pollard H, Rankin A, Rice CM, Sager H, Samani NJ, Sambrook J, Schmitz G, Scholz M, Schroeder L, Schunkert H, Syvannen AC, Tennstedt S and Wallace C

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, University of Cambridge, Cambridge, UK. lucy.davison@cimr.cam.ac.uk

    The chromosome 16p13 region has been associated with several autoimmune diseases, including type 1 diabetes (T1D) and multiple sclerosis (MS). CLEC16A has been reported as the most likely candidate gene in the region, since it contains the most disease-associated single-nucleotide polymorphisms (SNPs), as well as an immunoreceptor tyrosine-based activation motif. However, here we report that intron 19 of CLEC16A, containing the most autoimmune disease-associated SNPs, appears to behave as a regulatory sequence, affecting the expression of a neighbouring gene, DEXI. The CLEC16A alleles that are protective from T1D and MS are associated with increased expression of DEXI, and no other genes in the region, in two independent monocyte gene expression data sets. Critically, using chromosome conformation capture (3C), we identified physical proximity between the DEXI promoter region and intron 19 of CLEC16A, separated by a loop of >150 kb. In reciprocal experiments, a 20 kb fragment of intron 19 of CLEC16A, containing SNPs associated with T1D and MS, as well as with DEXI expression, interacted with the promoter region of DEXI but not with candidate DNA fragments containing other potential causal genes in the region, including CLEC16A. Intron 19 of CLEC16A is highly enriched for transcription-factor-binding events and markers associated with enhancer activity. Taken together, these data indicate that although the causal variants in the 16p13 region lie within CLEC16A, DEXI is an unappreciated autoimmune disease candidate gene, and illustrate the power of the 3C approach in progressing from genome-wide association studies results to candidate causal genes.

    Funded by: Medical Research Council; Wellcome Trust: 061858, 076113, 076113/C/04/Z, 079895, 082549/Z/07/Z, 089989/Z/09/Z

    Human molecular genetics 2012;21;2;322-33

  • Diagnostic interpretation of array data using public databases and internet sources.

    de Leeuw N, Dijkhuizen T, Hehir-Kwa JY, Carter NP, Feuk L, Firth HV, Kuhn RM, Ledbetter DH, Martin CL, van Ravenswaaij-Arts CM, Scherer SW, Shams S, Van Vooren S, Sijmons R, Swertz M and Hastings R

    Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands. N.deLeeuw@antrg.umcn.nl.

    The range of commercially available array platforms and analysis software packages is expanding and their utility is improving, making reliable detection of copy number variation (CNV) relatively straightforward. Reliable interpretation of CNV data, however, is often difficult and requires expertise. With our knowledge of the human genome growing rapidly, applications for array testing continuously broadening, and the resolution of CNV detection increasing, this leads to great complexity in interpreting what can be daunting data. Correct CNV interpretation and optimal use of the genotype information provided by SNP probes on an array depends largely on knowledge present in various resources. In addition to the availability of host laboratories' own datasets and national registries, there are several public databases and Internet resources with genotype and phenotype information that can be used for array data interpretation. With so many resources now available, it is important to know which are fit-for-purpose in a diagnostic setting. We summarise the characteristics of the most commonly used Internet databases and resources, and propose a general data interpretation strategy that can be used for comparative hybridisation, comparative intensity and genotype-based array data.

    Human mutation 2012

  • Meta-analysis of genome-wide association studies for personality.

    de Moor MH, Costa PT, Terracciano A, Krueger RF, de Geus EJ, Toshiko T, Penninx BW, Esko T, Madden PA, Derringer J, Amin N, Willemsen G, Hottenga JJ, Distel MA, Uda M, Sanna S, Spinhoven P, Hartman CA, Sullivan P, Realo A, Allik J, Heath AC, Pergadia ML, Agrawal A, Lin P, Grucza R, Nutile T, Ciullo M, Rujescu D, Giegling I, Konte B, Widen E, Cousminer DL, Eriksson JG, Palotie A, Peltonen L, Luciano M, Tenesa A, Davies G, Lopez LM, Hansell NK, Medland SE, Ferrucci L, Schlessinger D, Montgomery GW, Wright MJ, Aulchenko YS, Janssens AC, Oostra BA, Metspalu A, Abecasis GR, Deary IJ, Räikkönen K, Bierut LJ, Martin NG, van Duijn CM and Boomsma DI

    Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands. mhm.de.moor@psy.vu.nl

    Personality can be thought of as a set of characteristics that influence people's thoughts, feelings and behavior across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in 10 discovery samples (17,375 adults) and five in silico replication samples (3294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data of ≈ 2.4M single-nucleotide polymorphisms (SNPs; directly typed and imputed using HapMap data) were available. In the discovery samples, classical association analyses were performed under an additive model followed by meta-analysis using the weighted inverse variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P=2.8 × 10(-8) and 3.1 × 10(-8)) and for Conscientiousness in the brain-expressed KATNAL2 gene on 18q21.1 (rs2576037, P=4.9 × 10(-8)). We further conducted a gene-based test that confirmed the association of KATNAL2 to Conscientiousness. In silico replication did not, however, show significant associations of the top SNPs with Openness and Conscientiousness, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples. Larger scale GWA studies and alternative approaches are required for confirmation of KATNAL2 as a novel gene affecting Conscientiousness.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; NCI NIH HHS: CA89392, P01 CA089392, P01 CA089392-08; NHGRI NIH HHS: U01 HG004422, U01 HG004422-02, U01 HG004446, U01HG004438; NIA NIH HHS: N01-AG-1-2109, Z99 AG999999, ZIA AG000180-25, ZIA AG000180-26, ZIA AG000196-03, ZIA AG000196-04, ZIA AG000197-03, ZIA AG000197-04; NIAAA NIH HHS: AA07580, AA07728, AA11998, AA13320, AA13321, K05 AA017688-04, U10 AA008401, U10AA008401; NIDA NIH HHS: DA019951, DA12854, R01 DA012854-10, R01 DA013423, R01 DA013423-05, R01 DA019963-01A2, R01 DA019963-02, R01 DA019963-03; NIMH NIH HHS: MH081802, R01 MH059160; Wellcome Trust: 089062/Z/09/Z, 89061/Z/09/Z

    Molecular psychiatry 2012;17;3;337-49

  • Investigating the potential for ethnic group harm in collaborative genomics research in Africa: Is ethnic stigmatisation likely?

    de Vries J, Jallow M, Williams TN, Kwiatkowski D, Parker M and Fitzpatrick R

    The Ethox Centre, Department of Public Health, University of Oxford, UK; Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, South Africa.

    A common assumption in genomics research is that the use of ethnic categories has the potential to lead to ethnic stigmatisation - particularly when the research is done on minority populations. Yet few empirical studies have sought to investigate the relation between genomics and stigma, and fewer still with a focus on Africa. In this paper, we investigate the potential for genomics research to lead to harms to ethnic groups. We carried out 49 semi-structured, open-ended interviews with stakeholders in a current medical genomics research project in Africa, MalariaGEN. Interviews were conducted with MalariaGEN researchers, fieldworkers, members of three ethics committees who reviewed MalariaGEN project proposals, and with members of the two funding bodies providing support to the MalariaGEN project. Interviews were conducted in Kenya, The Gambia and the UK between June 2008 and October 2009. They covered a range of aspects relating to the use of ethnicity in the genomics project, including views on adverse effects of the inclusion of ethnicity in such research. Drawing on the empirical data, we argue that the risk of harm to ethnic groups is likely to be more acute in specific types of genomics research. We develop a typology of research questions and projects that carry a greater risk of harm to the populations included in genomics research. We conclude that the potential of generating harm to ethnic groups in genomics research is present if research includes populations that are already stigmatised or discriminated against, or where the research investigates questions with particular normative implications. We identify a clear need for genomics researchers to take account of the social context of the work they are proposing to do, including understanding the local realities and relations between ethnic groups, and whether diseases are already stigmatised.

    Social science & medicine (1982) 2012

  • Genomic restructuring in the Tasmanian devil facial tumour: chromosome painting and gene mapping provide clues to evolution of a transmissible tumour.

    Deakin JE, Bender HS, Pearse AM, Rens W, O'Brien PC, Ferguson-Smith MA, Cheng Y, Morris K, Taylor R, Stuart A, Belov K, Amemiya CT, Murchison EP, Papenfuss AT and Graves JA

    Research School of Biology, The Australian National University, Canberra, Australia. janine.deakin@anu.edu.au

    Devil facial tumour disease (DFTD) is a fatal, transmissible malignancy that threatens the world's largest marsupial carnivore, the Tasmanian devil, with extinction. First recognised in 1996, DFTD has had a catastrophic effect on wild devil numbers, and intense research efforts to understand and contain the disease have since demonstrated that the tumour is a clonal cell line transmitted by allograft. We used chromosome painting and gene mapping to deconstruct the DFTD karyotype and determine the chromosome and gene rearrangements involved in carcinogenesis. Chromosome painting on three different DFTD tumour strains determined the origins of marker chromosomes and provided a general overview of the rearrangement in DFTD karyotypes. Mapping of 105 BAC clones by fluorescence in situ hybridisation provided a finer level of resolution of genome rearrangements in DFTD strains. Our findings demonstrate that only limited regions of the genome, mainly chromosomes 1 and X, are rearranged in DFTD. Regions rearranged in DFTD are also highly rearranged between different marsupials. Differences between strains are limited, reflecting the unusually stable nature of DFTD. Finally, our detailed maps of both the devil and tumour karyotypes provide a physical framework for future genomic investigations into DFTD.

    PLoS genetics 2012;8;2;e1002483

  • The Clostridium difficile spo0A gene is a persistence and transmission factor.

    Deakin LJ, Clare S, Fagan RP, Dawson LF, Pickard DJ, West MR, Wren BW, Fairweather NF, Dougan G and Lawley TD

    Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Clostridium difficile is a major cause of chronic antibiotic-associated diarrhea and a significant health care-associated pathogen that forms highly resistant and infectious spores. Spo0A is a highly conserved transcriptional regulator that plays a key role in initiating sporulation in Bacillus and Clostridium species. Here, we use a murine model to study the role of the C. difficile spo0A gene during infection and transmission. We demonstrate that C. difficile spo0A mutant derivatives can cause intestinal disease but are unable to persist within and effectively transmit between mice. Thus, the C. difficile Spo0A protein plays a key role in persistent infection, including recurrence and host-to-host transmission in mice.

    Funded by: Medical Research Council: G0800170, G0901743; Wellcome Trust: 098051, WT086418MA

    Infection and immunity 2012;80;8;2704-11

  • Molecular mechanisms of drug resistance in natural Leishmania populations vary with genetic background.

    Decuypere S, Vanaerschot M, Brunker K, Imamura H, Müller S, Khanal B, Rijal S, Dujardin JC and Coombs GH

    Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium.

    The evolution of drug-resistance in pathogens is a major global health threat. Elucidating the molecular basis of pathogen drug-resistance has been the focus of many studies but rarely is it known whether a drug-resistance mechanism identified is universal for the studied pathogen; it has seldom been clarified whether drug-resistance mechanisms vary with the pathogen's genotype. Nevertheless this is of critical importance in gaining an understanding of the complexity of this global threat and in underpinning epidemiological surveillance of pathogen drug resistance in the field. This study aimed to assess the molecular and phenotypic heterogeneity that emerges in natural parasite populations under drug treatment pressure. We studied lines of the protozoan parasite Leishmania (L.) donovani with differential susceptibility to antimonial drugs; the lines being derived from clinical isolates belonging to two distinct genetic populations that circulate in the leishmaniasis endemic region of Nepal. Parasite pathways known to be affected by antimonial drugs were characterised on five experimental levels in the lines of the two populations. Characterisation of DNA sequence, gene expression, protein expression and thiol levels revealed a number of molecular features that mark antimonial-resistant parasites in only one of the two populations studied. A final series of in vitro stress phenotyping experiments confirmed this heterogeneity amongst drug-resistant parasites from the two populations. These data provide evidence that the molecular changes associated with antimonial-resistance in natural Leishmania populations depend on the genetic background of the Leishmania population, which has resulted in a divergent set of resistance markers in the Leishmania populations. This heterogeneity of parasite adaptations provides severe challenges for the control of drug resistance in the field and the design of molecular surveillance tools for widespread applicability.

    Funded by: Wellcome Trust: 085349, WT061173MA-SM

    PLoS neglected tropical diseases 2012;6;2;e1514

  • Angiopoietin-2 Is a Direct Transcriptional Target of TAL1, LYL1 and LMO2 in Endothelial Cells.

    Deleuze V, El-Hajj R, Chalhoub E, Dohet C, Pinet V, Couttet P and Mathieu D

    Institut de Génétique Moléculaire de Montpellier UMR 5535 CNRS, Montpellier, France.

    The two related basic helix-loop-helix, TAL1 and LYL1, and their cofactor LIM-only-2 protein (LMO2) are present in blood and endothelial cells. While their crucial role in early hematopoiesis is well established, their function in endothelial cells and especially in angiogenesis is less understood. Here, we identified ANGIOPOIETIN-2 (ANG-2), which encodes a major regulator of angiogenesis, as a direct transcriptional target of TAL1, LYL1 and LMO2. Knockdown of any of the three transcription factors in human blood and lymphatic endothelial cells caused ANG-2 mRNA and protein down-regulation. Transient transfections showed that the full activity of the ANG-2 promoter required the integrity of a highly conserved Ebox-GATA composite element. Accordingly, chromatin immunoprecipitation assays demonstrated that TAL1, LYL1, LMO2 and GATA2 occupied this region of ANG-2 promoter in human endothelial cells. Furthermore, we showed that LMO2 played a central role in assembling TAL1-E47, LYL1-LYL1 or/and LYL1-TAL1 dimers with GATA2. The resulting complexes were able to activate endogenous ANG-2 expression in endothelial cells as well as in non-endothelial cells. Finally, we showed that ANG-2 gene activation during angiogenesis concurred with the up-regulation of TAL1 and LMO2. Altogether, we identified ANG-2 as a bona fide target gene of LMO2-complexes with TAL1 and/or LYL1, highlighting a new function of the three hematopoietic factors in the endothelial lineage.

    PloS one 2012;7;7;e40484

  • Genome-wide association study identifies novel loci associated with circulating phospho- and sphingolipid concentrations.

    Demirkan A, van Duijn CM, Ugocsai P, Isaacs A, Pramstaller PP, Liebisch G, Wilson JF, Johansson Å, Rudan I, Aulchenko YS, Kirichenko AV, Janssens AC, Jansen RC, Gnewuch C, Domingues FS, Pattaro C, Wild SH, Jonasson I, Polasek O, Zorkoltseva IV, Hofman A, Karssen LC, Struchalin M, Floyd J, Igl W, Biloglav Z, Broer L, Pfeufer A, Pichler I, Campbell S, Zaboli G, Kolcic I, Rivadeneira F, Huffman J, Hastie ND, Uitterlinden A, Franke L, Franklin CS, Vitart V, DIAGRAM Consortium, Nelson CP, Preuss M, CARDIoGRAM Consortium, Bis JC, O'Donnell CJ, Franceschini N, CHARGE Consortium, Witteman JC, Axenovich T, Oostra BA, Meitinger T, Hicks AA, Hayward C, Wright AF, Gyllensten U, Campbell H, Schmitz G and EUROSPAN consortium

    Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands.

    Phospho- and sphingolipids are crucial cellular and intracellular compounds. These lipids are required for active transport, a number of enzymatic processes, membrane formation, and cell signalling. Disruption of their metabolism leads to several diseases, with diverse neurological, psychiatric, and metabolic consequences. A large number of phospholipid and sphingolipid species can be detected and measured in human plasma. We conducted a meta-analysis of five European family-based genome-wide association studies (N = 4034) on plasma levels of 24 sphingomyelins (SPM), 9 ceramides (CER), 57 phosphatidylcholines (PC), 20 lysophosphatidylcholines (LPC), 27 phosphatidylethanolamines (PE), and 16 PE-based plasmalogens (PLPE), as well as their proportions in each major class. This effort yielded 25 genome-wide significant loci for phospholipids (smallest P-value = 9.88×10(-204)) and 10 loci for sphingolipids (smallest P-value = 3.10×10(-57)). After a correction for multiple comparisons (P-value<2.2×10(-9)), we observed four novel loci significantly associated with phospholipids (PAQR9, AGPAT1, PKD2L1, PDXDC1) and two with sphingolipids (PLD2 and APOE) explaining up to 3.1% of the variance. Further analysis of the top findings with respect to within class molar proportions uncovered three additional loci for phospholipids (PNLIPRP2, PCDH20, and ABDH3) suggesting their involvement in either fatty acid elongation/saturation processes or fatty acid specific turnover mechanisms. Among those, 14 loci (KCNH7, AGPAT1, PNLIPRP2, SYT9, FADS1-2-3, DLG2, APOA1, ELOVL2, CDK17, LIPC, PDXDC1, PLD2, LASS4, and APOE) mapped into the glycerophospholipid and 12 loci (ILKAP, ITGA9, AGPAT1, FADS1-2-3, APOA1, PCDH20, LIPC, PDXDC1, SGPP1, APOE, LASS4, and PLD2) to the sphingolipid pathways. In large meta-analyses, associations between FADS1-2-3 and carotid intima media thickness, AGPAT1 and type 2 diabetes, and APOA1 and coronary artery disease were observed. In conclusion, our study identified nine novel phospho- and sphingolipid loci, substantially increasing our knowledge of the genetic basis for these traits.

    Funded by: Medical Research Council; Wellcome Trust

    PLoS genetics 2012;8;2;e1002490

  • The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.

    Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J and Guigó R

    Bioinformatics and Genomics, Centre for Genomic Regulation and UPF, 08003 Barcelona, Catalonia, Spain.

    The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.

    Funded by: NHGRI NIH HHS: 1U54HG004555-01, 1U54HG004557-01, K99 HG006698

    Genome research 2012;22;9;1775-89

  • Next-generation sequencing in breast cancer: first take home messages.

    Desmedt C, Voet T, Sotiriou C and Campbell PJ

    Breast Cancer Translational Laboratory, Université Libre de Bruxelles, Jules Bordet Institute, Brussels; Department of Human Genetics, University of Leuven, Leuven, Belgium; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    PURPOSE OF REVIEW: We are currently on the threshold of a revolution in breast cancer research, thanks to the emergence of novel technologies based on next-generation sequencing (NGS). In this review, we will describe the different sequencing technologies and platforms, and summarize the main findings from the latest sequencing articles in breast cancer. RECENT FINDINGS: Firstly, the sequencing of a few hundreds of breast tumors has revealed new cancer genes. Although these were not frequently mutated, mutated genes from different patients could be grouped into the deregulation of similar pathways. Secondly, NGS allowed further exploration of intratumor heterogeneity and revealed that although subclonal mutations were present in all tumors, there was always a dominant clone, which comprised at least 50% of the tumor cells. Finally, tumor-specific DNA rearrangements could be detected in the patient's plasma, suggesting that NGS could be used to personalize the monitoring of the disease. SUMMARY: The application of NGS to breast cancer has been associated with tremendous advances and promises for increasing the understanding of the disease. However, there still remain many unanswered questions, such as the role of structural changes of tumor genomes in cancer progression and treatment response/resistance.

    Current opinion in oncology 2012

  • Genetic polymorphisms associated with anti-malarial antibody levels in a low and unstable malaria transmission area in southern Sri Lanka.

    Dewasurendra RL, Suriyaphol P, Fernando SD, Carter R, Rockett K, Corran P, Kwiatkowski D, Karunaweera ND and MalariaGEN Consortium

    Department of Parasitology, Faculty of Medicine, University of Colombo, Colombo, Sri Lanka.

    Background: The incidence of malaria in Sri Lanka has significantly declined in recent years. Similar trends were seen in Kataragama, a known malaria endemic location within the southern province of the country, over the past five years. This is a descriptive study of anti-malarial antibody levels and selected host genetic mutations in residents of Kataragama, under low malaria transmission conditions. Methods: Sera were collected from 1,011 individuals residing in Kataragama and anti-malarial antibodies and total IgE levels were measured by a standardized ELISA technique. Host DNA was extracted and used for genotyping of selected SNPs in known genes associated with malaria. The antibody levels were analysed in relation to the past history of malaria (during past 10 years), age, sex, the location of residence within Kataragama and selected host genetic markers. Results: A significant increase in antibodies against Plasmodium falciparum antigens AMA1, MSP2, NANP and Plasmodium vivax antigen MSP1 was observed in individuals with a past history of malaria when compared to those without. A marked increase of anti-MSP1(Pf) and anti-AMA1(Pv) was also evident in individuals between 45-59 years (when compared to other age groups). Allele frequencies for two SNPs in genes that code for IL-13 and TRIM-5 were found to be significantly different between those who have experienced one or more malaria attacks within past 10 years and those who did not. When antibody levels were classified into a low-high binary trait, significant associations were found with four SNPs for anti-AMA1(Pf); two SNPs for anti-MSP1(Pf); eight SNPs for anti-NANP(Pf); three SNPs for anti-AMA1(Pv); seven SNPs for anti-MSP1(Pv); and nine SNPs for total IgE. Eleven of these SNPs with significant associations with anti-malarial antibody levels were found to be non-synonymous. Conclusions: Evidence is suggestive of an age-acquired immunity in this study population in spite of low malaria transmission levels. Several SNPs were in linkage disequilibrium and had a significant association with elevated antibody levels, suggesting that these host genetic mutations might have an individual or collective effect on inducing or/and maintaining high anti-malarial antibody levels.

    Funded by: Medical Research Council: G0600230, G0600718; Wellcome Trust: 075491/Z04, 077012/Z/05/Z, 090532/Z/09/Z, 090770/Z/09/Z, WT077383/Z/05/Z

    Malaria journal 2012;11;281

  • Landscape of transcription in human cells.

    Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, Derrien T, Drenkow J, Dumais E, Dumais J, Duttagupta R, Falconnet E, Fastuca M, Fejes-Toth K, Ferreira P, Foissac S, Fullwood MJ, Gao H, Gonzalez D, Gordon A, Gunawardena H, Howald C, Jha S, Johnson R, Kapranov P, King B, Kingswood C, Luo OJ, Park E, Persaud K, Preall JB, Ribeca P, Risk B, Robyr D, Sammeth M, Schaffer L, See LH, Shahab A, Skancke J, Suzuki AM, Takahashi H, Tilgner H, Trout D, Walters N, Wang H, Wrobel J, Yu Y, Ruan X, Hayashizaki Y, Harrow J, Gerstein M, Hubbard T, Reymond A, Antonarakis SE, Hannon G, Giddings MC, Ruan Y, Wold B, Carninci P, Guigó R and Gingeras TR

    Centre for Genomic Regulation and UPF, Doctor Aiguader 88, Barcelona 08003, Catalonia, Spain.

    Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

    Funded by: NCI NIH HHS: P30 CA045508; NHGRI NIH HHS: 1RC2HG005591, R01 HG003700, R01HG003700, RC2 HG005591, U01 HG003147, U54 HG004555, U54 HG004557, U54 HG004558, U54 HG004576, U54 HG007004, U54HG004555, U54HG004557, U54HG004558, U54HG004576; Wellcome Trust: 062023

    Nature 2012;489;7414;101-8

  • Evidence for transcript networks composed of chimeric RNAs in human cells.

    Djebali S, Lagarde J, Kapranov P, Lacroix V, Borel C, Mudge JM, Howald C, Foissac S, Ucla C, Chrast J, Ribeca P, Martin D, Murray RR, Yang X, Ghamsari L, Lin C, Bell I, Dumais E, Drenkow J, Tress ML, Gelpí JL, Orozco M, Valencia A, van Berkum NL, Lajoie BR, Vidal M, Stamatoyannopoulos J, Batut P, Dobin A, Harrow J, Hubbard T, Dekker J, Frankish A, Salehi-Ashtiani K, Reymond A, Antonarakis SE, Guigó R and Gingeras TR

    Bioinformatics and Genomics, Centre for Genomic Regulation and Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.

    The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5' and 3' transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.

    Funded by: NHGRI NIH HHS: HG003143, R01 HG003143, R01 HG003143-08, U01HG003147, U01HG003150, U54 HG004592, U54HG004557; Wellcome Trust

    PloS one 2012;7;1;e28213

  • Human SH2B1 mutations are associated with maladaptive behaviors and obesity.

    Doche ME, Bochukova EG, Su HW, Pearce LR, Keogh JM, Henning E, Cline JM, Saeed S, Dale A, Cheetham T, Barroso I, Argetsinger LS, O'Rahilly S, Rui L, Carter-Su C and Farooqi IS

    Department of Molecular and Integrative Physiology, University of Michigan Medical School, Ann Arbor, Michigan 48109-5622, USA.

    Src homology 2 B adapter protein 1 (SH2B1) modulates signaling by a variety of ligands that bind to receptor tyrosine kinases or JAK-associated cytokine receptors, including leptin, insulin, growth hormone (GH), and nerve growth factor (NGF). Targeted deletion of Sh2b1 in mice results in increased food intake, obesity, and insulin resistance, with an intermediate phenotype seen in heterozygous null mice on a high-fat diet. We identified SH2B1 loss-of-function mutations in a large cohort of patients with severe early-onset obesity. Mutation carriers exhibited hyperphagia, childhood-onset obesity, disproportionate insulin resistance, and reduced final height as adults. Unexpectedly, mutation carriers exhibited a spectrum of behavioral abnormalities that were not reported in controls, including social isolation and aggression. We conclude that SH2B1 plays a critical role in the control of human food intake and body weight and is implicated in maladaptive human behavior.

    Funded by: Medical Research Council: G0900554, G9824984; NCI NIH HHS: P30-CA46592; NIDDK NIH HHS: P30 DK020572, P60-DK20572, R01 DK054222, R01-DK065122, R01-DK073601, R01-DK54222; NIGMS NIH HHS: T32-GM008322; Wellcome Trust: 077016/Z/05/Z, 082390/Z/07/Z, 098497

    The Journal of clinical investigation 2012;122;12;4732-6

  • Hyperactive piggyBac gene transfer in human cells and in vivo.

    Doherty JE, Huye LE, Yusa K, Zhou L, Craig NL and Wilson MH

    Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA.

    We characterized a recently developed hyperactive piggyBac (pB) transposase enzyme [containing seven mutations (7pB)] for gene transfer in human cells in vitro and to somatic cells in mice in vivo. Despite a protein level expression similar to that of native pB, 7pB significantly increased the gene transfer efficiency of a neomycin resistance cassette transposon in both HEK293 and HeLa cultured human cells. Native pB and SB100X, the most active transposase of the Sleeping Beauty transposon system, exhibited similar transposition efficiency in cultured human cell lines. When delivered to primary human T cells ex vivo, 7pB increased gene delivery two- to threefold compared with piggyBac and SB100X. The activity of hyperactive 7pB transposase was not affected by the addition of a 24-kDa N-terminal tag, whereas SB100X manifested a 50% reduction in transposition. Hyperactive 7pB was compared with native pB and SB100X in vivo in mice using hydrodynamic tail-vein injection of a limiting dose of transposase DNA combined with luciferase reporter transposons. We followed transgene expression for up to 6 months and observed approximately 10-fold greater long-term gene expression in mice injected with a codon-optimized version of 7pB compared with mice injected with native pB or SB100X. We conclude that hyperactive piggyBac elements can increase gene transfer in human cells and in vivo and should enable improved gene delivery using the piggyBac transposon system in a variety of cell and gene-therapy applications.

    Funded by: NIDDK NIH HHS: T32DK064717; NIGMS NIH HHS: T32GM007330

    Human gene therapy 2012;23;3;311-20

  • The microbiological and clinical characteristics of invasive Salmonella in gallbladders from cholecystectomy patients in Kathmandu, Nepal.

    Dongol S, Thompson CN, Clare S, Nga TV, Duy PT, Karkey A, Arjyal A, Koirala S, Khatri NS, Maskey P, Poudel S, Jaiswal VK, Vaidya S, Dougan G, Farrar JJ, Dolecek C, Basnyat B and Baker S

    Oxford University Clinical Research Unit, Patan Academy of Health Sciences, Kathmandu, Nepal.

    Gallbladder carriage of invasive Salmonella is considered fundamental in sustaining typhoid fever transmission. Bile and tissue was obtained from 1,377 individuals undergoing cholecystectomy in Kathmandu to investigate the prevalence, characteristics and relevance of invasive Salmonella in the gallbladder in an endemic area. Twenty percent of bile samples contained a Gram-negative organism, with Salmonella Typhi and Salmonella Paratyphi A isolated from 24 and 22 individuals, respectively. Gallbladders that contained Salmonella were more likely to show evidence of acute inflammation with extensive neutrophil infiltrate than those without Salmonella, corresponding with higher neutrophil and lower lymphocyte counts in the blood of Salmonella positive individuals. Antimicrobial resistance in the invasive Salmonella isolates was limited, indicating that gallbladder colonization is unlikely to be driven by antimicrobial resistance. The overall role of invasive Salmonella carriage in the gallbladder is not understood; here we show that 3.5% of individuals undergoing cholecystectomy in this setting have a high concentration of antimicrobial sensitive, invasive Salmonella in their bile. We predict that such individuals will become increasingly important if current transmission mechanisms are disturbed; prospectively identifying these individuals is, therefore, paramount for rapid local and regional elimination.

    PloS one 2012;7;10;e47342

  • The battle of the SNPs.

    Downing T

    Nature reviews. Microbiology 2012;10;1;6

  • Genome-wide SNP and microsatellite variation illuminate population-level epidemiology in the Leishmania donovani species complex.

    Downing T, Stark O, Vanaerschot M, Imamura H, Sanders M, Decuypere S, de Doncker S, Maes I, Rijal S, Sundar S, Dujardin JC, Berriman M and Schönian G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. Tim.Downing@sanger.ac.uk

    The species of the Leishmania donovani species complex cause visceral leishmaniasis, a debilitating infectious disease transmitted by sandflies. Understanding molecular changes associated with population structure in these parasites can help unravel their epidemiology and spread in humans. In this study, we used a panel of standard microsatellite loci and genome-wide SNPs to investigate population-level diversity in L. donovani strains recently isolated from a small geographic area spanning India, Bihar and Nepal, and compared their variation to that found in diverse strains of the L. donovani complex isolates from Europe, Africa and Asia. Microsatellites and SNPs could clearly resolve the phylogenetic relationships of the strains between continents, and microsatellite phylogenies indicated that certain older Indian strains were closely related to African strains. In the context of the anti-malaria spraying campaigns in the 1960s, this was consistent with a pattern of episodic population size contractions and clonal expansions in these parasites that was supported by population history simulations. In sharp contrast to the low resolution provided by microsatellites, SNPs retained a much more fine-scale resolution of population-level variability to the extent that they identified four different lineages from the same region one of which was more closely related to African and European strains than to Indian or Nepalese ones. Joining results of in vitro testing the antimonial drug sensitivity with the phylogenetic signals from the SNP data highlighted protein-level mutations revealing a distinct drug-resistant group of Nepalese and Indian L. donovani. This study demonstrates the power of genomic data for exploring parasite population structure. Furthermore, markers defining different genetic groups have been discovered that could potentially be applied to investigate drug resistance in clinical Leishmania strains.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 2012;12;1;149-59

  • Fluorescence-based phenotypic selection allows forward genetic screens in haploid human cells.

    Duncan LM, Timms RT, Zavodszky E, Cano F, Dougan G, Randow F and Lehner PJ

    Cambridge Institute for Medical Research, Addenbrooke's Hospital, Cambridge, United Kingdom.

    The isolation of haploid cell lines has recently allowed the power of forward genetic screens to be applied to mammalian cells. The interest in applying this powerful genetic approach to a mammalian system is only tempered by the limited utility of these screens, if confined to lethal phenotypes. Here we expand the scope of these approaches beyond live/dead screens and show that selection for a cell surface phenotype via fluorescence-activated cell sorting can identify the key molecules in an intracellular pathway, in this case MHC class I antigen presentation. Non-lethal haploid genetic screens are widely applicable to identify genes involved in essentially any cellular pathway.

    Funded by: Medical Research Council; Wellcome Trust

    PloS one 2012;7;6;e39651

  • Variation in human genes encoding adhesion and proinflammatory molecules are associated with severe malaria in the Vietnamese.

    Dunstan SJ, Rockett KA, Quyen NT, Teo YY, Thai CQ, Hang NT, Jeffreys A, Clark TG, Small KS, Simmons CP, Day N, O'Riordan SE, Kwiatkowski DP, Farrar J, Phu NH, Hien TT and MalariaGEN Consortium

    Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam. sdunstan@oucru.org

    The genetic basis for susceptibility to malaria has been studied widely in African populations but less is known of the contribution of specific genetic variants in Asian populations. We genotyped 67 single-nucleotide polymorphisms (SNPs) in 1030 severe malaria cases and 2840 controls from Vietnam. After data quality control, genotyping data of 956 cases and 2350 controls were analysed for 65 SNPs (3 gender confirmation, 62 positioned in/near 42 malarial candidate genes). A total of 14 SNPs were monomorphic and 2 (rs8078340 and rs33950507) were not in Hardy-Weinberg equilibrium in controls (P<0.01). In all, 7/46 SNPs in 6 genes (ICAM1, IL1A, IL17RC, IL13, LTA and TNF) were associated with severe malaria, with 3/7 SNPs in the TNF/LTA region. Genotype-phenotype correlations between SNPs and clinical parameters revealed that genotypes of rs708567 (IL17RC) correlate with parasitemia (P=0.028, r²=0.0086), with GG homozygotes having the lowest parasite burden. Additionally, rs708567 GG homozygotes had a decreased risk of severe malaria (P=0.007, OR=0.78 (95% CI; 0.65-0.93)) and death (P=0.028, OR=0.58 (95% CI; 0.37-0.93)) than those with AA and AG genotypes. In summary, variants in six genes encoding adhesion and proinflammatory molecules are associated with severe malaria in the Vietnamese. Further replicative studies in independent populations will be necessary to confirm these findings.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 089276/Z/09/Z, 090532/Z/09/Z, 098051, WT077383/Z/05/Z

    Genes and immunity 2012;13;6;503-8

  • AntiFam: a tool to help identify spurious ORFs in protein annotation.

    Eberhardt RY, Haft DH, Punta M, Martin M, O'Donovan C and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK. re3@sanger.ac.uk

    As the deluge of genomic DNA sequence grows the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories with the ability to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and annotate all genes within a genome may often be lacking. These issues have led to fears about transitive annotation errors making sequence databases less reliable. During the lifetime of the Pfam protein families database a number of protein families have been built, which were later identified as composed solely of spurious open reading frames (ORFs) either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that these may perform a useful function to identify new spurious ORFs. We have collected these families together in AntiFam along with additional custom-made families of spurious ORFs. This resource currently contains 23 families that identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as a part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion.

    Funded by: NHGRI NIH HHS: R01 HG004881; Wellcome Trust: WT077044/Z/05/Z

    Database : the journal of biological databases and curation 2012;2012;bas003

  • Genomics: ENCODE explained.

    Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y and Segal E

    Howard Hughes Medical Institute and the Salk Institute for Biological Studies, La Jolla, California 92037, USA. ecker@salk.edu

    Nature 2012;489;7414;52-5

  • An integrated encyclopedia of DNA elements in the human genome.

    ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C and Snyder M

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

    Funded by: NCI NIH HHS: P30 CA016086, P30 CA045508; NHGRI NIH HHS: K99 HG006698, R01 HG003143, R01 HG003988, R01 HG005085, R01HG003143, R01HG003541, R01HG003700, R01HG003988, R01HG004456-03, RC2 HG005573, RC2HG005591, RC2HG005679, U01 HG004695, U01HG004561, U01HG004571, U01HG004695, U41HG004568, U54 HG006997, U54HG004555, U54HG004557, U54HG004558, U54HG004563, U54HG004570, U54HG004576, U54HG004592; NIDDK NIH HHS: R01 DK054369, R01 DK065806, R37 DK044746; PHS HHS: ZIAHG200323, ZIAHG200341; Wellcome Trust: 095908

    Nature 2012;489;7414;57-74

  • Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity.

    Esko T, Mezzavilla M, Nelis M, Borel C, Debniak T, Jakkula E, Julia A, Karachanak S, Khrunin A, Kisfali P, Krulisova V, Aušrelé Kučinskiené Z, Rehnström K, Traglia M, Nikitina-Zake L, Zimprich F, Antonarakis SE, Estivill X, Glavač D, Gut I, Klovins J, Krawczak M, Kučinskas V, Lathrop M, Macek M, Marsal S, Meitinger T, Melegh B, Limborska S, Lubinski J, Paolotie A, Schreiber S, Toncheva D, Toniolo D, Wichmann HE, Zimprich A, Metspalu M, Gasparini P, Metspalu A and D'Adamo P

    [1] Estonian Genome Center, University of Tartu, Tartu, Estonia [2] Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia [3] Estonian Biocentre, Tartu, Estonia.

    Population genetic studies on European populations have highlighted Italy as one of the genetically most diverse regions. This is possibly due to the country's complex demographic history and large variability in terrain throughout the territory. This is the reason why Italy is enriched for population isolates, Sardinia being the best-known example. As population isolates have great potential for identifying disease-causing genetic variants, we aimed to genetically characterize a region from northeastern Italy, which is known for isolated communities. A total of 1310 samples, collected from six geographically isolated villages, were genotyped at >145 000 single-nucleotide polymorphism positions. Newly genotyped data were analyzed jointly with the available genome-wide data sets of individuals of European descent, including several population isolates. Despite the linguistic differences and geographical isolation, the village populations still show the greatest genetic similarity to other Italian samples. The genetic isolation and small effective population size of the village populations are manifested by higher levels of genomic homozygosity and elevated linkage disequilibrium. These estimates become even more striking when the detected substructure is taken into account. The observed level of genetic isolation in Friuli-Venezia Giulia region is more extreme according to several measures of isolation compared with Sardinians, French Basques and northern Finns, thus proving the status of an isolate. European Journal of Human Genetics advance online publication, 19 December 2012; doi:10.1038/ejhg.2012.229.

    European journal of human genetics : EJHG 2012

  • Comparison of methods for competitive tests of pathway analysis.

    Evangelou M, Rendon A, Ouwehand WH, Wernisch L and Dudbridge F

    Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom.

    It has been suggested that pathway analysis can complement single-SNP analysis in exploring genomewide association data. Pathway analysis incorporates the available biological knowledge of genes and SNPs and is expected to improve the chances of revealing the underlying genetic architecture of complex traits. Methods for pathway analysis can be classified as competitive (enrichment) or self-contained (association) according to the hypothesis tested. Although association tests are statistically more powerful than enrichment tests they can be difficult to calibrate because biases in analysis accumulate across multiple SNPs or genes. Furthermore, enrichment tests can be more scientifically relevant than association tests, as they detect pathways with relatively more evidence for association than the remaining genes. Here we show how some well known association tests can be simply adapted to test for enrichment, and compare their performance to some established enrichment tests. We propose versions of the Adaptive Rank Truncated Product (ARTP), Tail Strength Measure and Fisher's combination of p-values for testing the enrichment null hypothesis. We compare the behaviour of these proposed methods with the established Hypergeometric Test and Gene-Set Enrichment Analysis (GSEA). The results of the simulation study show that the modified version of the ARTP method has generally the best performance across the situations considered. The methods were also applied for finding enriched pathways for body mass index (BMI) and platelet function phenotypes. The pathway analysis of BMI identified the Vasoactive Intestinal Peptide pathway as significantly associated with BMI. This pathway has been previously reported as associated with BMI and the risk of obesity. The ARTP method was the method that identified the largest number of enriched pathways across all tested pathway databases and phenotypes. 
    The simulation and data application results are in agreement with previous work on association tests and suggest that the ARTP should be preferred for both enrichment and association testing.

    PloS one 2012;7;7;e41018

  • IFITM3 restricts the morbidity and mortality associated with influenza.

    Everitt AR, Clare S, Pertel T, John SP, Wash RS, Smith SE, Chin CR, Feeley EM, Sims JS, Adams DJ, Wise HM, Kane L, Goulding D, Digard P, Anttila V, Baillie JK, Walsh TS, Hume DA, Palotie A, Xue Y, Colonna V, Tyler-Smith C, Dunning J, Gordon SB, GenISIS Investigators, MOSAIC Investigators, Smyth RL, Openshaw PJ, Dougan G, Brass AL and Kellam P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    The 2009 H1N1 influenza pandemic showed the speed with which a novel respiratory virus can spread and the ability of a generally mild infection to induce severe morbidity and mortality in a subset of the population. Recent in vitro studies show that the interferon-inducible transmembrane (IFITM) protein family members potently restrict the replication of multiple pathogenic viruses. Both the magnitude and breadth of the IFITM proteins' in vitro effects suggest that they are critical for intrinsic resistance to such viruses, including influenza viruses. Using a knockout mouse model, we now test this hypothesis directly and find that IFITM3 is essential for defending the host against influenza A virus in vivo. Mice lacking Ifitm3 display fulminant viral pneumonia when challenged with a normally low-pathogenicity influenza virus, mirroring the destruction inflicted by the highly pathogenic 1918 'Spanish' influenza. Similar increased viral replication is seen in vitro, with protection rescued by the re-introduction of Ifitm3. To test the role of IFITM3 in human influenza virus infection, we assessed the IFITM3 alleles of individuals hospitalized with seasonal or pandemic influenza H1N1/09 viruses. We find that a statistically significant number of hospitalized subjects show enrichment for a minor IFITM3 allele (SNP rs12252-C) that alters a splice acceptor site, and functional assays show the minor CC genotype IFITM3 has reduced influenza virus restriction in vitro. Together these data reveal that the action of a single intrinsic immune effector, IFITM3, profoundly alters the course of influenza virus infection in mouse and humans.

    Funded by: Chief Scientist Office; Medical Research Council: G0600511, G0800767, G0800777, G0802752, G0901697, MC_G1001212, MC_U122785833; NIAID NIH HHS: R01 AI091786, R01AI091786; Wellcome Trust: 090382, 090382/Z/09/Z, 090385/Z/09/Z, 098051

    Nature 2012;484;7395;519-23

  • High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis.

    Eyre S, Bowes J, Diogo D, Lee A, Barton A, Martin P, Zhernakova A, Stahl E, Viatte S, McAllister K, Amos CI, Padyukov L, Toes RE, Huizinga TW, Wijmenga C, Trynka G, Franke L, Westra HJ, Alfredsson L, Hu X, Sandor C, de Bakker PI, Davila S, Khor CC, Heng KK, Andrews R, Edkins S, Hunt SE, Langford C, Symmons D, Biologics in Rheumatoid Arthritis Genetics and Genomics Study Syndicate, Wellcome Trust Case Control Consortium, Concannon P, Onengut-Gumuscu S, Rich SS, Deloukas P, Gonzalez-Gay MA, Rodriguez-Rodriguez L, Arlsetig L, Martin J, Rantapää-Dahlqvist S, Plenge RM, Raychaudhuri S, Klareskog L, Gregersen PK and Worthington J

    [1] Arthritis Research UK Epidemiology Unit, Centre for Musculoskeletal Research, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK. [2] National Institute for Health Research, Manchester Musculoskeletal Biomedical Research Unit, Central Manchester University Hospitals National Health Service Foundation Trust, Manchester Academic Health Sciences Centre, Manchester, UK. [3].

    Using the Immunochip custom SNP array, which was designed for dense genotyping of 186 loci identified through genome-wide association studies (GWAS), we analyzed 11,475 individuals with rheumatoid arthritis (cases) of European ancestry and 15,870 controls for 129,464 markers. We combined these data in a meta-analysis with GWAS data from additional independent cases (n = 2,363) and controls (n = 17,872). We identified 14 new susceptibility loci, 9 of which were associated with rheumatoid arthritis overall and five of which were specifically associated with disease that was positive for anticitrullinated peptide antibodies, bringing the number of confirmed rheumatoid arthritis risk loci in individuals of European ancestry to 46. We refined the peak of association to a single gene for 19 loci, identified secondary independent effects at 6 loci and identified association to low-frequency variants at 4 loci. Bioinformatic analyses generated strong hypotheses for the causal SNP at seven loci. This study illustrates the advantages of dense SNP mapping analysis to inform subsequent functional investigations.

    Nature genetics 2012

  • Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function.

    Ezkurdia I, del Pozo A, Frankish A, Rodriguez JM, Harrow J, Ashman K, Valencia A and Tress ML

    Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.

    Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of "novel" and "putative" protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. 
    This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.

    Funded by: NHGRI NIH HHS: U54 HG0004555

    Molecular biology and evolution 2012;29;9;2265-83

  • Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles.

    Fairfax BP, Makino S, Radhakrishnan J, Plant K, Leslie S, Dilthey A, Ellis P, Langford C, Vannberg FO and Knight JC

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. bfairfax@well.ox.ac.uk

    Trans-acting genetic variants have a substantial, albeit poorly characterized, role in the heritable determination of gene expression. Using paired purified primary monocytes and B cells, we identify new predominantly cell type-specific cis and trans expression quantitative trait loci (eQTLs), including multi-locus trans associations to LYZ and KLF4 in monocytes and B cells, respectively. Additionally, we observe a B cell-specific trans association of rs11171739 at 12q13.2, a known autoimmune disease locus, with IP6K2 (P = 5.8 × 10⁻¹⁵), PRIC285 (P = 3.0 × 10⁻¹⁰) and an upstream region of CDKN1A (P = 2 × 10⁻⁵²), suggesting roles for cell cycle regulation and peroxisome proliferator-activated receptor γ (PPARγ) signaling in autoimmune pathogenesis. We also find that specific human leukocyte antigen (HLA) alleles form trans associations with the expression of AOAH and ARHGAP24 in monocytes but not in B cells. In summary, we show that mapping gene expression in defined primary cell populations identifies new cell type-specific trans-regulated networks and provides insights into the genetic basis of disease susceptibility.

    Funded by: Wellcome Trust: 074318, 075491/Z/04, 088891

    Nature genetics 2012;44;5;502-10

  • Generation of anti-Notch antibodies and their application in blocking Notch signalling in neural stem cells.

    Falk R, Falk A, Dyson MR, Melidoni A, Parthiban K, Young J, Roake W and McCafferty J

    University of Cambridge, Department of Biochemistry, Tennis Court Road, CB2 1QW Cambridge, UK.

    Notch signalling occurs via direct cell-cell interactions and plays an important role in linking the fates of neighbouring cells. There are four different mammalian Notch receptors that can be activated by five cell surface ligands. The ability to inhibit specific Notch receptors would help identify the roles of individual family members and potentially provide a means to study and control cell differentiation. Anti-Notch antibodies in the form of single chain Fvs were generated from an antibody phage display library by selection on either the ligand binding domain or the negative regulatory region (NRR) of Notch1 and Notch2. Six antibodies targeting the NRR of Notch1 and four antibodies recognising the NRR of Notch2 were found to prevent receptor activation in cell-based luciferase reporter assays. These antibodies were potent, highly specific inhibitors of individual Notch receptors and interfered with endogenous signalling in stem cell systems of both human and mouse origin. Antibody-mediated inhibition of Notch efficiently down-regulated transcription of the immediate Notch target gene hairy and enhancer of split 5 (Hes5) in both mouse and human neural stem cells and revealed a redundant regulation of Hes5 in these cells as complete down-regulation was seen only after simultaneous blocking of Notch1 and Notch2. In addition, these antibodies promoted differentiation of neural stem cells towards a neuronal fate. In contrast to the widely used small molecule γ-secretase inhibitors, which block all 4 Notch receptors (and a multitude of other signalling pathways), antibodies allow blockade of individual Notch family members in a highly specific way. Specific inhibition will allow examination of the effect of individual Notch receptors in complex differentiation schemes regulated by the co-ordinated action of multiple signalling pathways.

    Methods (San Diego, Calif.) 2012

  • Automatic categorization of diverse experimental information in the bioscience literature.

    Fang R, Schindelman G, Van Auken K, Fernandes J, Chen W, Wang X, Davis P, Tuli MA, Marygold SJ, Millburn G, Matthews B, Zhang H, Brown N, Gelbart WM and Sternberg PW

    Howard Hughes Medical Institute and Biology Division, California Institute of Technology, Pasadena, CA 91125, USA.

    Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus, is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental datatypes in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reducing time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.

    Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genomics Informatics (MGI). It is being used in the curation work flow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.

    Conclusions: Our methods are applicable to a variety of data types with training set containing several hundreds to a few thousand documents. It is completely automatic and, thus can be readily incorporated to different workflow at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P41 HG000739, P41 HG002223, P41 HG002223-10S1, R01 HG004090

    BMC bioinformatics 2012;13;16

  • Bifidobacterial surface-exopolysaccharide facilitates commensal-host interaction through immune modulation and pathogen protection.

    Fanning S, Hall LJ, Cronin M, Zomer A, MacSharry J, Goulding D, Motherway MO, Shanahan F, Nally K, Dougan G and van Sinderen D

    Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland.

    Bifidobacteria comprise a significant proportion of the human gut microbiota. Several bifidobacterial strains are currently used as therapeutic interventions, claiming various health benefits by acting as probiotics. However, the precise mechanisms by which they maintain habitation within their host and consequently provide these benefits are not fully understood. Here we show that Bifidobacterium breve UCC2003 produces a cell surface-associated exopolysaccharide (EPS), the biosynthesis of which is directed by either half of a bidirectional gene cluster, thus leading to production of one of two possible EPSs. Alternate transcription of the two opposing halves of this cluster appears to be the result of promoter reorientation. Surface EPS provided stress tolerance and promoted in vivo persistence, but not initial colonization. Marked differences were observed in host immune response: strains producing surface EPS (EPS(+)) failed to elicit a strong immune response compared with EPS-deficient variants. Specifically, EPS production was shown to be linked to the evasion of adaptive B-cell responses. Furthermore, presence of EPS(+) B. breve reduced colonization levels of the gut pathogen Citrobacter rodentium. Our data thus assigns a pivotal and beneficial role for EPS in modulating various aspects of bifidobacterial-host interaction, including the ability of commensal bacteria to remain immunologically silent and in turn provide pathogen protection. This finding enforces the probiotic concept and provides mechanistic insights into health-promoting benefits for both animal and human hosts.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;6;2108-13

  • Cohesin regulates tissue-specific expression by stabilising highly occupied cis-regulatory modules.

    Faure AJ, Schmidt D, Watt S, Schwalie PC, Wilson MD, Xu H, Ramsay RG, Odom DT and Flicek P

    European Bioinformatics Institute.

    The cohesin protein complex contributes to transcriptional regulation in a CTCF-independent manner by colocalising with master regulators at tissue-specific loci. The regulation of transcription involves the concerted action of multiple transcription factors (TFs) and cohesin's role in this context of combinatorial TF binding remains unexplored. To investigate cohesin-non-CTCF (CNC) binding events in vivo we mapped cohesin and CTCF, as well as a collection of tissue-specific and ubiquitous transcriptional regulators, using ChIP-seq in primary mouse liver. We observe a positive correlation between the number of distinct TFs bound and the presence of CNC sites. In contrast to regions of the genome where cohesin and CTCF colocalise, CNC sites coincide with the binding of master regulators and enhancer-markers and are significantly associated with liver-specific expressed genes. We also show that cohesin presence partially explains the commonly observed discrepancy between TF motif score and ChIP signal. Evidence from these statistical analyses in wild type cells, and comparisons to maps of TF binding in Rad21-cohesin haploinsufficient mouse liver, suggests that cohesin helps to stabilise large protein-DNA complexes. Finally, we observe that the presence of mirrored CTCF binding events at promoters and their nearby cohesin-bound enhancers is associated with elevated expression levels.

    Genome research 2012

  • Invasive non-typhoidal salmonella disease: an emerging and neglected tropical disease in Africa.

    Feasey NA, Dougan G, Kingsley RA, Heyderman RS and Gordon MA

    Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi; Department of Gastroenterology, Institute of Translational Medicine, University of Liverpool, Liverpool, UK.

    Invasive strains of non-typhoidal salmonellae have emerged as a prominent cause of bloodstream infection in African adults and children, with an associated case fatality of 20-25%. The clinical presentation of invasive non-typhoidal salmonella disease in Africa is diverse: fever, hepatosplenomegaly, and respiratory symptoms are common, and features of enterocolitis are often absent. The most important risk factors are HIV infection in adults, and malaria, HIV, and malnutrition in children. A distinct genotype of Salmonella enterica var Typhimurium, ST313, has emerged as a new pathogenic clade in sub-Saharan Africa, and might have adapted to cause invasive disease in human beings. Multidrug-resistant ST313 has caused epidemics in several African countries, and has driven the use of expensive antimicrobial drugs in the poorest health services in the world. Studies of systemic cellular and humoral immune responses in adults infected with HIV have revealed key host immune defects contributing to invasive non-typhoidal salmonella disease. This emerging pathogen might therefore have adapted to occupy an ecological and immunological niche provided by HIV, malaria, and malnutrition in Africa. A good understanding of the epidemiology of this neglected disease will open new avenues for development and implementation of vaccine and public health strategies to prevent infections and interrupt transmission.

    Lancet 2012

  • Making your database available through Wikipedia: the pros and cons.

    Finn RD, Gardner PP and Bateman A

    HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA, USA. finnr@janelia.hhmi.org

    Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content, with many pages written on scientific subject matters that include peer-reviewed citations, yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and above all, large community of authors (or Wikipedians). The limitations to the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question about the future role of dedicated database biocurators in the context of the thousands of crowdsourced, community annotations that are now being stored in wikis.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: WT098051

    Nucleic acids research 2012;40;Database issue;D9-12

  • Progressive cross-reactivity in IgE responses: an explanation for the slow development of human immunity to schistosomiasis?

    Fitzsimmons CM, Jones FM, Pinot de Moira A, Protasio AV, Khalife J, Dickinson HA, Tukahebwa EM and Dunne DW

    Department of Pathology, University of Cambridge, Cambridge, United Kingdom. cmf1000@cam.ac.uk

    People in regions of Schistosoma mansoni endemicity slowly acquire immunity, but why this takes years to develop is still not clear. It has been associated with increases in parasite-specific IgE, induced, some investigators propose, to antigens exposed during the death of adult worms. These antigens include members of the tegumental-allergen-like protein family (TAL1 to TAL13). Previously, in a group of S. mansoni-infected Ugandan males, we showed that IgE responses to three TALs expressed in worms (TAL1, -3, and -5) became more prevalent with age. Now, in a subcohort we examined associations of these responses with resistance to reinfection and use the data to propose a mechanism for the slow development of immunity. IgE was measured 9 weeks posttreatment and at reinfection at 2 years (n = 144). An anti-TAL5 IgE (herein referred to as TAL5 IgE) response was associated with reduced reinfection even after adjusting for age using regression analysis (geometric mean odds ratio, 0.24; P = 0.016). TAL5 IgE responders were a subset of TAL3 IgE responders, themselves a subset of TAL1 responders. TAL3 IgE and TAL5 IgE were highly cross-reactive, with TAL3 the immunizing antigen and TAL5 the cross-reactive antigen. Transcriptional and translational studies show that TAL3 is most abundant in adult worms and that TAL5 is most abundant in infectious larvae. We propose that in chronic schistosomiasis, older individuals have repeatedly experienced IgE antigens exposed when adult worms die (e.g., TAL3) and that this leads to increasing cross-reactivity with antigens of invading larvae (e.g., TAL5). Progressive accumulation of worm/larvae cross-reactivity could explain the age-dependent immunity observed in areas of endemicity.

    Funded by: Wellcome Trust: 083931/∼/07/Z

    Infection and immunity 2012;80;12;4264-70

  • Ensembl 2012.

    Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, Ritchie GR, Ruffier M, Schuster M, Sobral D, Tang YA, Taylor K, Trevanion S, Vandrovcova J, White S, Wilson M, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Harrow J, Herrero J, Hubbard TJ, Parker A, Proctor G, Spudich G, Vogel J, Yates A, Zadissa A and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK. flicek@ebi.ac.uk

    The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.

    Funded by: NHGRI NIH HHS: U01HG004695, U41HG006104, U54HG004563; Wellcome Trust: 095908, WT062023, WT079643

    Nucleic acids research 2012;40;Database issue;D84-90

  • Resisting resistance.

    Foth BJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. microbes@sanger.ac.uk.

    This month's Genome Watch describes how knowledge of the malaria parasite genome can be used to better understand and mitigate the emergence of drug resistance.

    Nature reviews. Microbiology 2012;10;8;524

  • The importance of identifying alternative splicing in vertebrate genome annotation.

    Frankish A, Mudge JM, Thomas M and Harrow J

    Human and Vertebrate Analysis and Annotation Team, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. af2@sanger.ac.uk

    While alternative splicing (AS) can potentially expand the functional repertoire of vertebrate genomes, relatively few AS transcripts have been experimentally characterized. We describe our detailed manual annotation of vertebrate genomes, which is generating a publicly available geneset rich in AS. In order to achieve this we have adopted a highly sensitive approach to annotating gene models supported by correctly mapped, canonically spliced transcriptional evidence combined with a highly cautious approach to adding unsupported extensions to models and making decisions on their functional potential. We use information about the predicted functional potential and structural properties of every AS transcript annotated at a protein-coding or non-coding locus to place them into one of eleven subclasses. We describe the incorporation of new sequencing and proteomics technologies into our annotation pipelines, which are used to identify and validate AS. Combining all data sources has led to the production of a rich geneset containing an average of 6.3 AS transcripts for every human multi-exon protein-coding gene. The datasets produced have proved very useful in providing context to studies investigating the functional potential of genes and the effect that variation may have on gene structure and function. DATABASE URL: http://www.ensembl.org/index.html, http://vega.sanger.ac.uk/index.html.

    Funded by: NHGRI NIH HHS: 5U54HG004555-04S1; Wellcome Trust: WT077198

    Database : the journal of biological databases and curation 2012;2012;bas014

  • Genome-wide association analysis identifies susceptibility loci for migraine without aura.

    Freilinger T, Anttila V, de Vries B, Malik R, Kallela M, Terwindt GM, Pozo-Rosich P, Winsvold B, Nyholt DR, van Oosterhout WP, Artto V, Todt U, Hämäläinen E, Fernández-Morales J, Louter MA, Kaunisto MA, Schoenen J, Raitakari O, Lehtimäki T, Vila-Pueyo M, Göbel H, Wichmann E, Sintas C, Uitterlinden AG, Hofman A, Rivadeneira F, Heinze A, Tronvik E, van Duijn CM, Kaprio J, Cormand B, Wessman M, Frants RR, Meitinger T, Müller-Myhsok B, Zwart JA, Färkkilä M, Macaya A, Ferrari MD, Kubisch C, Palotie A, Dichgans M, van den Maagdenberg AM and International Headache Genetics Consortium

    Institute for Stroke and Dementia Research, Klinikum der Universität München, Munich, Germany.

    Migraine without aura is the most common form of migraine, characterized by recurrent disabling headache and associated autonomic symptoms. To identify common genetic variants associated with this migraine type, we analyzed genome-wide association data of 2,326 clinic-based German and Dutch individuals with migraine without aura and 4,580 population-matched controls. We selected SNPs from 12 loci with 2 or more SNPs associated with P values of <1 × 10(-5) for replication testing in 2,508 individuals with migraine without aura and 2,652 controls. SNPs at two of these loci showed convincing replication: at 1q22 (in MEF2D; replication P = 4.9 × 10(-4); combined P = 7.06 × 10(-11)) and at 3p24 (near TGFBR2; replication P = 1.0 × 10(-4); combined P = 1.17 × 10(-9)). In addition, SNPs at the PHACTR1 and ASTN2 loci showed suggestive evidence of replication (P = 0.01; combined P = 3.20 × 10(-8) and P = 0.02; combined P = 3.86 × 10(-8), respectively). We also replicated associations at two previously reported migraine loci in or near TRPM8 and LRP1. This study identifies the first susceptibility loci for migraine without aura, thereby expanding our knowledge of this debilitating neurological disorder.

    Funded by: Wellcome Trust: 098051

    Nature genetics 2012;44;7;777-82

  • A familial case with interstitial 2q36 deletion: variable phenotypic expression in full and mosaic state.

    Freitas ÉL, Gribble SM, Simioni M, Vieira TP, Prigmore E, Krepischi AC, Rosenberg C, Pearson PL, Melo DG and Gil-da-Silva-Lopes VL

    Department of Medical Genetics, Faculty of Medical Sciences, University of Campinas, Rua Tessália Vieira de Camargo, 126 CEP 13083-887 Campinas, São Paulo, Brazil.

    Submicroscopic chromosomal anomalies play an important role in the etiology of craniofacial malformations, including midline facial defects with hypertelorism (MFDH). MFDH is a common feature combination in several conditions, of which Frontonasal Dysplasia is the most frequently encountered manifestation; in most cases the etiology remains unknown. We identified a parent to child transmission of a 6.2 Mb interstitial deletion of chromosome region 2q36.1q36.3 by array-CGH and confirmed by FISH and microsatellite analysis. The patient and her mother both presented an MFDH phenotype, although the phenotype in the mother was much milder than that in her daughter. Inspection of haplotype segregation of the 2q36.1 region within the family suggests that the deletion arose on a chromosome derived from the maternal grandfather. Evidence based on FISH, microsatellite and array-CGH analysis points to high-frequency mosaicism for the presence of a deleted 2q36 region in the blood of the mother. The frequency of mosaicism in other tissues could not be determined. We here suggest that the milder phenotype observed in the proband's mother can be explained by the mosaic state of the deletion. This most likely arose by an early embryonic deletion in the maternal embryo resulting in both gonadal and somatic mosaicism of two cell lines, with and without the deleted chromosome. The occurrence of gonadal mosaicism increases the recurrence risk significantly and is often either underestimated or not even taken into account in genetic counseling where new mutation is suspected.

    European journal of medical genetics 2012;55;11;660-5

  • Intracranial aneurysm risk locus 5q23.2 is associated with elevated systolic blood pressure.

    Gaál EI, Salo P, Kristiansson K, Rehnström K, Kettunen J, Sarin AP, Niemelä M, Jula A, Raitakari OT, Lehtimäki T, Eriksson JG, Widen E, Günel M, Kurki M, von und Zu Fraunberg M, Jääskeläinen JE, Hernesniemi J, Järvelin MR, Pouta A, International Consortium for Blood Pressure Genome-Wide Association Studies, Newton-Cheh C, Salomaa V, Palotie A and Perola M

    Public Health Genomics Unit, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland. emilia.gaal@helsinki.fi

    Although genome-wide association studies (GWAS) have identified hundreds of complex trait loci, the pathomechanisms of most remain elusive. Studying the genetics of risk factors predisposing to disease is an attractive approach to identify targets for functional studies. Intracranial aneurysms (IA) are rupture-prone pouches at cerebral artery branching sites. IA is a complex disease for which GWAS have identified five loci with strong association and a further 14 loci with suggestive association. To decipher potential underlying disease mechanisms, we tested whether there are IA loci that convey their effect through elevating blood pressure (BP), a strong risk factor of IA. We performed a meta-analysis of four population-based Finnish cohorts (n(FIN) = 11 266) not selected for IA, to assess the association of previously identified IA candidate loci (n = 19) with BP. We defined systolic BP (SBP), diastolic BP, mean arterial pressure, and pulse pressure as quantitative outcome variables. The most significant result was further tested for association in the ICBP-GWAS cohort of 200 000 individuals. We found that the suggestive IA locus at 5q23.2 in PRDM6 was significantly associated with SBP in individuals of European descent (p(FIN) = 3.01E-05, p(ICBP-GWAS) = 0.0007, p(ALL) = 8.13E-07). The risk allele of IA was associated with higher SBP. PRDM6 encodes a protein predominantly expressed in vascular smooth muscle cells. Our study connects a complex disease (IA) locus with a common risk factor for the disease (SBP). We hypothesize that common variants in PRDM6 can contribute to altered vascular wall structure, hence increasing SBP and predisposing to IA. True positive associations often fail to reach genome-wide significance in GWAS. Our findings show that analysis of traditional risk factors as intermediate phenotypes is an effective tool for deciphering hidden heritability. Further, we demonstrate that common disease loci identified in a population isolate may bear wider significance.

    Funded by: Medical Research Council: G0500539, G0600705; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: GR069224

    PLoS genetics 2012;8;3;e1002563

  • Controls of nucleosome positioning in the human genome.

    Gaffney DJ, McVicker G, Pai AA, Fondufe-Mittendorf YN, Lewellen N, Michelini K, Widom J, Gilad Y and Pritchard JK

    Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America. dg13@sanger.ac.uk

    Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase-seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase-seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10 bp periodic patterns in dinucleotide frequency and DNase I sensitivity; and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that they often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned--in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA binding proteins.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: HG006123; NIMH NIH HHS: MH090951

    PLoS genetics 2012;8;11;e1003036

  • Dissecting the regulatory architecture of gene expression QTLs.

    Gaffney DJ, Veyrieras JB, Degner JF, Pique-Regi R, Pai AA, Crawford GE, Stephens M, Gilad Y and Pritchard JK

    Department of Human Genetics, University of Chicago, 920 E58th Street, Chicago, IL 60637, USA. dg13@sanger.ac.uk

    Background: Expression quantitative trait loci (eQTLs) are likely to play an important role in the genetics of complex traits; however, their functional basis remains poorly understood. Using the HapMap lymphoblastoid cell lines, we combine 1000 Genomes genotypes and an extensive catalogue of human functional elements to investigate the biological mechanisms that eQTLs perturb.

    Results: We use a Bayesian hierarchical model to estimate the enrichment of eQTLs in a wide variety of regulatory annotations. We find that approximately 40% of eQTLs occur in open chromatin, and that they are particularly enriched in transcription factor binding sites, suggesting that many directly impact protein-DNA interactions. Analysis of core promoter regions shows that eQTLs also frequently disrupt some known core promoter motifs but, surprisingly, are not enriched in other well-known motifs such as the TATA box. We also show that information from regulatory annotations alone, when weighted by the hierarchical model, can provide a meaningful ranking of the SNPs that are most likely to drive gene expression variation.

    Conclusions: Our study demonstrates how regulatory annotation and the association signal derived from eQTL-mapping can be combined into a single framework. We used this approach to further our understanding of the biology that drives human gene expression variation, and of the putatively causal SNPs that underlie it.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: R01 HG006123; NHLBI NIH HHS: R01 HL092206-04; NIGMS NIH HHS: GM077959; NIMH NIH HHS: MH084703, MH090951

    Genome biology 2012;13;1;R7

  • Universal amplification, next-generation sequencing, and assembly of HIV-1 genomes.

    Gall A, Ferns B, Morris C, Watson S, Cotten M, Robinson M, Berry N, Pillay D and Kellam P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Whole HIV-1 genome sequences are pivotal for large-scale studies of inter- and intrahost evolution, including the acquisition of drug resistance mutations. The ability to rapidly and cost-effectively generate large numbers of HIV-1 genome sequences from different populations and geographical locations and determine the effect of minority genetic variants is, however, a limiting factor. Next-generation sequencing promises to bridge this gap but is hindered by the lack of methods for the enrichment of virus genomes across the phylogenetic breadth of HIV-1 and methods for the robust assembly of the virus genomes from short-read data. Here we report a method for the amplification, next-generation sequencing, and unbiased de novo assembly of HIV-1 genomes of groups M, N, and O, as well as recombinants, that does not require prior knowledge of the sequence or subtype. A sensitivity of at least 3,000 copies/ml was determined by using plasma virus samples of known copy numbers. We applied our novel method to compare the genome diversities of HIV-1 groups, subtypes, and genes. The highest level of diversity was found in the env, nef, vpr, tat, and rev genes and parts of the gag gene. Furthermore, we used our method to investigate mutations associated with HIV-1 drug resistance in clinical samples at the level of the complete genome. Drug resistance mutations were detected as both major variant and minor species. In conclusion, we demonstrate the feasibility of our method for large-scale HIV-1 genome sequencing. This will enable the phylogenetic and phylodynamic resolution of the ongoing pandemic and efficient monitoring of complex HIV-1 drug resistance genotypes.

    Funded by: Wellcome Trust: S0753

    Journal of clinical microbiology 2012;50;12;3838-44

  • Exploiting genetic complexity in cancer to improve therapeutic strategies.

    Garnett MJ and McDermott U

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK. mj12@sanger.ac.uk

    Advances in genome sequencing technologies are enabling researchers to make rapid progress in defining the entire repertoire of causal genetic changes in cancer. The response of patients with cancer to therapy is often highly variable and there is an increasing number of examples where mutations in cancer genomes have been shown to have a profound effect on the clinical effectiveness of drugs. An urgent challenge for the research and clinical communities is how to translate these genomic data sets into new and improved therapeutic strategies for the treatment of patients. The use of large-scale cell line-based drug screens to identify genomic 'biomarkers' of drug response for the stratification of patients has the potential to transform how patients with cancer are treated.

    Drug discovery today 2012;17;5-6;188-93

  • Systematic identification of genomic markers of drug sensitivity in cancer cells.

    Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, Greninger P, Thompson IR, Luo X, Soares J, Liu Q, Iorio F, Surdez D, Chen L, Milano RJ, Bignell GR, Tam AT, Davies H, Stevenson JA, Barthorpe S, Lutz SR, Kogera F, Lawrence K, McLaren-Douglas A, Mitropoulos X, Mironenko T, Thi H, Richardson L, Zhou W, Jewitt F, Zhang T, O'Brien P, Boisvert JL, Price S, Hur W, Yang W, Deng X, Butler A, Choi HG, Chang JW, Baselga J, Stamenkovic I, Engelman JA, Sharma SV, Delattre O, Saez-Rodriguez J, Gray NS, Settleman J, Futreal PA, Haber DA, Stratton MR, Ramaswamy S, McDermott U and Benes CH

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Clinical responses to anticancer therapies are often restricted to a subset of patients. In some cases, mutated cancer genes are potent biomarkers for responses to targeted agents. Here, to uncover new biomarkers of sensitivity and resistance to cancer therapeutics, we screened a panel of several hundred cancer cell lines--which represent much of the tissue-type and genetic diversity of human cancers--with 130 drugs under clinical and preclinical investigation. In aggregate, we found that mutated cancer genes were associated with cellular response to most currently available cancer drugs. Classic oncogene addiction paradigms were modified by additional tissue-specific or expression biomarkers, and some frequently mutated genes were associated with sensitivity to a broad range of therapeutic agents. Unexpected relationships were revealed, including the marked sensitivity of Ewing's sarcoma cells harbouring the EWS (also known as EWSR1)-FLI1 gene translocation to poly(ADP-ribose) polymerase (PARP) inhibitors. By linking drug activity to the functional complexity of cancer genomes, systematic pharmacogenomic profiling in cancer cell lines provides a powerful biomarker discovery platform to guide rational cancer therapeutic strategies.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: 1U54HG006097-01; NIGMS NIH HHS: P41GM079575-02; Wellcome Trust: 086357

    Nature 2012;483;7391;570-5

  • Recent advances in biocuration: meeting report from the fifth International Biocuration Conference.

    Gaudet P, Arighi C, Bastian F, Bateman A, Blake JA, Cherry MJ, D'Eustachio P, Finn R, Giglio M, Hirschman L, Kania R, Klimke W, Martin MJ, Karsch-Mizrachi I, Munoz-Torres M, Natale D, O'Donovan C, Ouellette F, Pruitt KD, Robinson-Rechavi M, Sansone SA, Schofield P, Sutton G, Van Auken K, Vasudevan S, Wu C, Young J and Mazumder R

    International Society for Biocuration and CALIPHO Group, Swiss Institute of Bioinformatics, 1 Rue Michel Servet, Geneva, Switzerland. pascale.gaudet@isb-sib.ch

    The 5th International Biocuration Conference brought together over 300 scientists to exchange ideas on their work, as well as discuss issues relevant to the International Society for Biocuration's (ISB) mission. Recurring themes this year included the creation and promotion of gold standards, the need for more ontologies, and more formal interactions with journals. The conference is an essential part of the ISB's goal to support exchanges among members of the biocuration community. Next year's conference will be held in Cambridge, UK, from 7 to 10 April 2013. In the meantime, the ISB website provides information about the society's activities (http://biocurator.org), as well as related events of interest.

    Database : the journal of biological databases and curation 2012;2012;bas036

  • Genetic analysis of Xenopus tropicalis.

    Geach TJ, Stemple DL and Zimmerman LB

    National Institute for Medical Research, London, England, UK.

    The pipid frog Xenopus tropicalis has emerged as a powerful new model system for combining genetic and genomic analysis of tetrapod development with robust embryological, molecular, and biochemical assays. Its early development closely resembles that of its well-understood relative X. laevis, from which techniques and reagents can be readily transferred. In contrast to the tetraploid X. laevis, X. tropicalis has a compact diploid genome with strong synteny to those of amniotes. Recently, advances in high-throughput sequencing together with solution-hybridization whole-exome enrichment technology offer powerful strategies for cloning novel mutations as well as reverse genetic identification of sequence lesions in specific genes of interest. Further advantages include the wide range of functional and molecular assays available, the large number of embryos/meioses produced, and the ease of haploid genetics and gynogenesis. The addition of these genetic tools to X. tropicalis provides a uniquely flexible platform for analysis of gene function in vertebrate development.

    Funded by: Medical Research Council: MC_U117560482, U117560482; Wellcome Trust: WT 077047/Z/05/Z

    Methods in molecular biology (Clifton, N.J.) 2012;917;69-110

  • Experimental and husbandry procedures as potential modifiers of the results of phenotyping tests.

    Gerdin AK, Igosheva N, Roberson LA, Ismail O, Karp N, Sanderson M, Cambridge E, Shannon C, Sunter D, Ramirez-Solis R, Bussell J and White JK

    To maximize the sensitivity of detecting effects of genetic variants in mice, variables have been minimized through the use of inbred mouse lines, by eliminating infectious organisms and controlling environmental variables. However, the impact of standard animal husbandry and experimental procedures on the validity of experimental data is underappreciated. In this study we monitored the impact of these procedures by using parameters that reflect stress and physiological responses to it. Short-term measures included telemetered heart rate and systolic arterial pressure, core body temperature and blood glucose, while longer-term parameters such as body weight were also assessed. Male and female C57BL6/NTac mice were subjected to a range of stressors with different perceived severities ranging from repeated blood glucose and core temperature measurement procedures, intra-peritoneal injection and overnight fasting to cage transport and cage changing. Our studies reveal that common husbandry and experimental procedures significantly influence mouse physiology and behaviour. Systolic arterial pressure, heart rate, locomotor activity, core temperature and blood glucose were elevated in response to a range of experimental procedures. Differences between sexes were evident: female mice displayed more sustained cardiovascular responses and locomotor activity than male mice. These results have important implications for the design and implementation of multiple component experiments where the lasting effects of stress from previous tests may modify the outcomes of subsequent ones.

    Physiology & behavior 2012;106;5;602-611

  • Intratumor heterogeneity and branched evolution revealed by multiregion sequencing.

    Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, Tarpey P, Varela I, Phillimore B, Begum S, McDonald NQ, Butler A, Jones D, Raine K, Latimer C, Santos CR, Nohadani M, Eklund AC, Spencer-Dene B, Clark G, Pickering L, Stamp G, Gore M, Szallasi Z, Downward J, Futreal PA and Swanton C

    Cancer Research UK London Research Institute, London, United Kingdom.

    Background: Intratumor heterogeneity may foster tumor evolution and adaptation and hinder personalized-medicine strategies that depend on results from single tumor-biopsy samples.

    Methods: To examine intratumor heterogeneity, we performed exome sequencing, chromosome aberration analysis, and ploidy profiling on multiple spatially separated samples obtained from primary renal carcinomas and associated metastatic sites. We characterized the consequences of intratumor heterogeneity using immunohistochemical analysis, mutation functional analysis, and profiling of messenger RNA expression.

    Results: Phylogenetic reconstruction revealed branched evolutionary tumor growth, with 63 to 69% of all somatic mutations not detectable across every tumor region. Intratumor heterogeneity was observed for a mutation within an autoinhibitory domain of the mammalian target of rapamycin (mTOR) kinase, correlating with S6 and 4EBP phosphorylation in vivo and constitutive activation of mTOR kinase activity in vitro. Mutational intratumor heterogeneity was seen for multiple tumor-suppressor genes converging on loss of function; SETD2, PTEN, and KDM5C underwent multiple distinct and spatially separated inactivating mutations within a single tumor, suggesting convergent phenotypic evolution. Gene-expression signatures of good and poor prognosis were detected in different regions of the same tumor. Allelic composition and ploidy profiling analysis revealed extensive intratumor heterogeneity, with 26 of 30 tumor samples from four tumors harboring divergent allelic-imbalance profiles and with ploidy heterogeneity in two of four tumors.

    Conclusions: Intratumor heterogeneity can lead to underestimation of the tumor genomics landscape portrayed from single tumor-biopsy samples and may present major challenges to personalized-medicine and biomarker development. Intratumor heterogeneity, associated with heterogeneous protein function, may foster tumor adaptation and therapeutic failure through Darwinian selection. (Funded by the Medical Research Council and others.).

    Funded by: Cancer Research UK; Medical Research Council: G0701935, G0902275; Wellcome Trust

    The New England journal of medicine 2012;366;10;883-92

  • The role of variation at AβPP, PSEN1, PSEN2, and MAPT in late onset Alzheimer's disease.

    Gerrish A, Russo G, Richards A, Moskvina V, Ivanov D, Harold D, Sims R, Abraham R, Hollingworth P, Chapman J, Hamshere M, Pahwa JS, Dowzell K, Williams A, Jones N, Thomas C, Stretton A, Morgan AR, Lovestone S, Powell J, Proitsi P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Morgan K, Brown KS, Passmore PA, Craig D, McGuinness B, Todd S, Johnston JA, Holmes C, Mann D, Smith AD, Love S, Kehoe PG, Hardy J, Mead S, Fox N, Rossor M, Collinge J, Maier W, Jessen F, Kölsch H, Heun R, Schürmann B, van den Bussche H, Heuser I, Kornhuber J, Wiltfang J, Dichgans M, Frölich L, Hampel H, Hüll M, Rujescu D, Goate AM, Kauwe JS, Cruchaga C, Nowotny P, Morris JC, Mayo K, Livingston G, Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, Davies G, Harris SE, Starr JM, Deary IJ, Al-Chalabi A, Shaw CE, Tsolaki M, Singleton AB, Guerreiro R, Mühleisen TW, Nöthen MM, Moebus S, Jöckel KH, Klopp N, Wichmann HE, Carrasquillo MM, Pankratz VS, Younkin SG, Jones L, Holmans PA, O'Donovan MC, Owen MJ and Williams J

    MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK.

    Rare mutations in AβPP, PSEN1, and PSEN2 cause uncommon early onset forms of Alzheimer's disease (AD), and common variants in MAPT are associated with risk of other neurodegenerative disorders. We sought to establish whether common genetic variation in these genes confers risk to the common form of AD which occurs later in life (>65 years). We therefore tested single-nucleotide polymorphisms at these loci for association with late-onset AD (LOAD) in a large case-control sample consisting of 3,940 cases and 13,373 controls. Single-marker analysis did not identify any variants that reached genome-wide significance, a result which is supported by other recent genome-wide association studies. However, we did observe a significant association at the MAPT locus using a gene-wide approach (p = 0.009). We also observed suggestive association between AD and the marker rs9468, which defines the H1 haplotype, an extended haplotype that spans the MAPT gene and has previously been implicated in other neurodegenerative disorders including Parkinson's disease, progressive supranuclear palsy, and corticobasal degeneration. In summary, common variants at AβPP, PSEN1, PSEN2, and MAPT are unlikely to make strong contributions to susceptibility for LOAD. However, the gene-wide effect observed at MAPT indicates a possible contribution to disease risk which requires further study.

    Funded by: Biotechnology and Biological Sciences Research Council: G0700704/84698; Chief Scientist Office; Medical Research Council: G0601846, G0701075, G0900688, MC_U123160651, MC_U123160657; Wellcome Trust: 095317

    Journal of Alzheimer's disease : JAD 2012;28;2;377-87

  • Haplotype analyses of haemoglobin C and haemoglobin S and the dynamics of the evolutionary response to malaria in Kassena-Nankana District of Ghana.

    Ghansah A, Rockett KA, Clark TG, Wilson MD, Koram KA, Oduro AR, Amenga-Etego L, Anyorigiya T, Hodgson A, Milligan P, Rogers WO and Kwiatkowski DP

    Noguchi Memorial Institute for Medical Research, University of Ghana, Accra, Ghana.

    Background: Haemoglobin S (HbS) and C (HbC) are variants of the HBB gene which both protect against malaria. It is not clear, however, how these two alleles have evolved in the West African countries where they co-exist at high frequencies. Here we use haplotypic signatures of selection to investigate the evolutionary history of the malaria-protective alleles HbS and HbC in the Kassena-Nankana District (KND) of Ghana.

    The haplotypic structure of HbS and HbC alleles was investigated by genotyping 56 SNPs around the HBB locus. We found that, in the KND population, both alleles reside on extended haplotypes (approximately 1.5 Mb for HbS and 650 Kb for HbC) that are significantly less diverse than those of the ancestral HbA allele. The extended haplotypes span a recombination hotspot that is known to exist in this region of the genome.

    Significance: Our findings show strong support for recent positive selection of both the HbS and HbC alleles and provide insights into how these two alleles have both evolved in the population of northern Ghana.

    PloS one 2012;7;4;e34565

  • JAK2V617F homozygosity arises commonly and recurrently in PV and ET, but PV is characterized by expansion of a dominant homozygous subclone.

    Godfrey AL, Chen E, Pagano F, Ortmann CA, Silber Y, Bellosillo B, Guglielmelli P, Harrison CN, Reilly JT, Stegelmann F, Bijou F, Lippert E, McMullin MF, Boiron JM, Döhner K, Vannucchi AM, Besses C, Campbell PJ and Green AR

    Cambridge Institute for Medical Research and Department of Haematology, University of Cambridge, Cambridge, United Kingdom.

    Subclones homozygous for JAK2V617F are more common in polycythemia vera (PV) than essential thrombocythemia (ET), but their prevalence and significance remain unclear. The JAK2 mutation status of 6495 BFU-E, grown in low erythropoietin conditions, was determined in 77 patients with PV or ET. Homozygous-mutant colonies were common in patients with JAK2V617F-positive PV and were surprisingly prevalent in JAK2V617F-positive ET and JAK2 exon 12-mutated PV. Using microsatellite PCR to map loss-of-heterozygosity breakpoints within individual colonies, we demonstrate that recurrent acquisition of JAK2V617F homozygosity occurs frequently in both PV and ET. PV was distinguished from ET by expansion of a dominant homozygous subclone, the selective advantage of which is likely to reflect additional genetic or epigenetic lesions. Our results suggest a model in which development of a dominant JAK2V617F-homozygous subclone drives erythrocytosis in many PV patients, with alternative mechanisms operating in those with small or undetectable homozygous-mutant clones.

    Funded by: Medical Research Council; Wellcome Trust

    Blood 2012;120;13;2704-7

  • Extensive compensatory cis-trans regulation in the evolution of mouse gene expression.

    Goncalves A, Leigh-Brown S, Thybert D, Stefflova K, Turro E, Flicek P, Brazma A, Odom DT and Marioni JC

    European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.

    Gene expression levels are thought to diverge primarily via regulatory mutations in trans within species, and in cis between species. To test this hypothesis in mammals we used RNA-sequencing to measure gene expression divergence between C57BL/6J and CAST/EiJ mouse strains and allele-specific expression in their F1 progeny. We identified 535 genes with parent-of-origin specific expression patterns, although few of these showed full allelic silencing. This suggests that the number of imprinted genes in a typical mouse somatic tissue is relatively small. In the set of nonimprinted genes, 32% showed evidence of divergent expression between the two strains. Of these, 2% could be attributed purely to variants acting in trans, while 43% were attributable only to variants acting in cis. The genes with expression divergence driven by changes in trans showed significantly higher sequence constraint than genes where the divergence was explained by variants acting in cis. The remaining genes with divergent patterns of expression (55%) were regulated by a combination of variants acting in cis and variants acting in trans. Intriguingly, the changes in expression induced by the cis and trans variants were in opposite directions more frequently than expected by chance, implying that compensatory regulation to stabilize gene expression levels is widespread. We propose that expression levels of genes regulated by this mechanism are fine-tuned by cis variants that arise following regulatory changes in trans, suggesting that many cis variants are not the primary targets of natural selection.

    Funded by: Cancer Research UK: A15603; Wellcome Trust

    Genome research 2012;22;12;2376-84

  • Estimation of rearrangement phylogeny for cancer genomes.

    Greenman CD, Pleasance ED, Newman S, Yang F, Fu B, Nik-Zainal S, Jones D, Lau KW, Carter N, Edwards PA, Futreal PA, Stratton MR and Campbell PJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. C.Greenman@uea.ac.uk

    Cancer genomes are complex, carrying thousands of somatic mutations including base substitutions, insertions and deletions, rearrangements, and copy number changes that have been acquired over decades. Recently, technologies have been introduced that allow generation of high-resolution, comprehensive catalogs of somatic alterations in cancer genomes. However, analyses of these data sets generally do not indicate the order in which mutations have occurred, or the resulting karyotype. Here, we introduce a mathematical framework that begins to address this problem. By using samples with accurate data sets, we can reconstruct relatively complex temporal sequences of rearrangements and provide an assembly of genomic segments into digital karyotypes. For cancer genes mutated in rearranged regions, this information can provide a chronological examination of the selective events that have taken place.

    Funded by: Medical Research Council; Wellcome Trust

    Genome research 2012;22;2;346-61

  • Comprehensive Exploration of the Effects of miRNA SNPs on Monocyte Gene Expression.

    Greliche N, Zeller T, Wild PS, Rotival M, Schillert A, Ziegler A, Deloukas P, Erdmann J, Hengstenberg C, Ouwehand WH, Samani NJ, Schunkert H, Munzel T, Lackner KJ, Cambien F, Goodall AH, Tiret L, Blankenberg S, Trégouët DA and Cardiogenics Consortium

    INSERM UMR_S 937, Pierre and Marie Curie University (UPMC, Paris 6), Paris, France ; Université Paris-Sud, Paris, France.

    We aimed to assess whether pri-miRNA SNPs (miSNPs) could influence monocyte gene expression, either through marginal association or by interacting with polymorphisms located in 3'UTR regions (3utrSNPs). We then conducted a genome-wide search for marginal miSNPs effects and pairwise miSNPs × 3utrSNPs interactions in a sample of 1,467 individuals for which genome-wide monocyte expression and genotype data were available. Statistical associations that survived multiple testing correction were tested for replication in an independent sample of 758 individuals with both monocyte gene expression and genotype data. In both studies, the hsa-mir-1279 rs1463335 was found to modulate in cis the expression of LYZ and in trans the expression of CNTN6, CTRC, COPZ2, KRT9, LRRFIP1, NOD1, PCDHA6, ST5 and TRAF3IP2 genes, supporting the role of hsa-mir-1279 as a regulator of several genes in monocytes. In addition, we identified two robust miSNPs × 3utrSNPs interactions, one involving HLA-DPB1 rs1042448 and hsa-mir-219-1 rs107822, the second the H1F0 rs1894644 and hsa-mir-659 rs5750504, modulating the expression of the associated genes. As some of the aforementioned genes have previously been reported to reside at disease-associated loci, our findings provide novel arguments supporting the hypothesis that the genetic variability of miRNAs could also contribute to the susceptibility to human diseases.

    PloS one 2012;7;9;e45863

  • Detection of cytoplasmic nucleophosmin expression by imaging flow cytometry.

    Grimwade L, Gudgin E, Bloxham D, Bottley G, Vassiliou G, Huntly B, Scott MA and Erber WN

    Haemato-Oncology Diagnostics Service, Department of Haematology, Addenbrooke's Hospital, Cambridge, United Kingdom. lizz.grimwade@addenbrookes.nhs.uk

    Mutations within the nucleophosmin NPM1 gene occur in approximately one-third of cases of acute myeloid leukemia (AML). These mutations result in cytoplasmic accumulation of the mutant NPM protein. NPM1 mutations are currently detected by molecular methods. Using samples from 37 AML patients, we investigated whether imaging flow cytometry could be a viable alternative to this current technique. Bone marrow/peripheral blood cells were stained with anti-NPM antibody and DRAQ5 nuclear stain, and data were acquired on an ImageStream imaging flow cytometer (Amnis Corp., Seattle, USA). Using the similarity feature for data analysis, we demonstrated that this technique could successfully identify cases of AML with an NPM1 mutation based on cytoplasmic NPM protein staining (at a similarity threshold of 1.1, sensitivity was 88% and specificity 90%). Combining data of mean fluorescence intensity and % dissimilar staining in a 0-2 scoring system further improved the sensitivity (100%). Imaging flow cytometry has the potential to be included as part of a standard flow cytometry antibody panel to identify potential NPM1 mutations as part of diagnosis and minimal residual disease monitoring. Imaging flow cytometry is an exciting technology that has many possible applications in the diagnosis of hematological malignancies, including the potential to integrate modalities.

    Funded by: Wellcome Trust: 095663

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2012;81;10;896-900

  • Analyses of pig genomes provide insight into porcine demography and evolution.

    Groenen MA, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens HJ, Li S, Larkin DM, Kim H, Frantz LA, Caccamo M, Ahn H, Aken BL, Anselmo A, Anthon C, Auvil L, Badaoui B, Beattie CW, Bendixen C, Berman D, Blecha F, Blomberg J, Bolund L, Bosse M, Botti S, Bujie Z, Bystrom M, Capitanu B, Carvalho-Silva D, Chardon P, Chen C, Cheng R, Choi SH, Chow W, Clark RC, Clee C, Crooijmans RP, Dawson HD, Dehais P, De Sapio F, Dibbits B, Drou N, Du ZQ, Eversole K, Fadista J, Fairley S, Faraut T, Faulkner GJ, Fowler KE, Fredholm M, Fritz E, Gilbert JG, Giuffra E, Gorodkin J, Griffin DK, Harrow JL, Hayward A, Howe K, Hu ZL, Humphray SJ, Hunt T, Hornshøj H, Jeon JT, Jern P, Jones M, Jurka J, Kanamori H, Kapetanovic R, Kim J, Kim JH, Kim KW, Kim TH, Larson G, Lee K, Lee KT, Leggett R, Lewin HA, Li Y, Liu W, Loveland JE, Lu Y, Lunney JK, Ma J, Madsen O, Mann K, Matthews L, McLaren S, Morozumi T, Murtaugh MP, Narayan J, Nguyen DT, Ni P, Oh SJ, Onteru S, Panitz F, Park EW, Park HS, Pascal G, Paudel Y, Perez-Enciso M, Ramirez-Gonzalez R, Reecy JM, Rodriguez-Zas S, Rohrer GA, Rund L, Sang Y, Schachtschneider K, Schraiber JG, Schwartz J, Scobie L, Scott C, Searle S, Servin B, Southey BR, Sperber G, Stadler P, Sweedler JV, Tafer H, Thomsen B, Wali R, Wang J, Wang J, White S, Xu X, Yerle M, Zhang G, Zhang J, Zhang J, Zhao S, Rogers J, Churcher C and Schook LB

    Animal Breeding and Genomics Centre, Wageningen University, De Elst 1, 6708 WD, Wageningen, The Netherlands. martien.groenen@wur.nl

    For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ∼1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E010520/1, BB/E010520/2, BB/G004013/1, BB/H005935/1, BB/I025328/1; European Research Council: 249894; Medical Research Council: G0900950; NCRR NIH HHS: P20-RR017686, R13 RR020283A, R13 RR032267A; NHGRI NIH HHS: R21 HG006464; NIAID NIH HHS: T32 AI083196; NIDA NIH HHS: P30 DA018310, R21 DA027548; NLM NIH HHS: 5 P41 LM006252, 5 P41LM006252; Wellcome Trust: 095908

    Nature 2012;491;7424;393-8

  • Mapping cis- and trans-regulatory effects across multiple tissues in twins.

    Grundberg E, Small KS, Hedman ÅK, Nica AC, Buil A, Keildson S, Bell JT, Yang TP, Meduri E, Barrett A, Nisbett J, Sekowska M, Wilk A, Shin SY, Glass D, Travers M, Min JL, Ring S, Ho K, Thorleifsson G, Kong A, Thorsteindottir U, Ainali C, Dimas AS, Hassanali N, Ingle C, Knowles D, Krestyaninova M, Lowe CE, Di Meglio P, Montgomery SB, Parts L, Potter S, Surdulescu G, Tsaprouni L, Tsoka S, Bataille V, Durbin R, Nestle FO, O'Rahilly S, Soranzo N, Lindgren CM, Zondervan KT, Ahmadi KR, Schadt EE, Stefansson K, Smith GD, McCarthy MI, Deloukas P, Dermitzakis ET, Spector TD and Multiple Tissue Human Expression Resource (MuTHER) Consortium

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.

    Funded by: Medical Research Council: G0900339, G9815508; Wellcome Trust: 081917/Z/07/Z, 085235, 090532, 092731

    Nature genetics 2012;44;10;1084-9

  • Chado controller: advanced annotation management with a community annotation system.

    Guignon V, Droc G, Alaux M, Baurens FC, Garsmeur O, Poiron C, Carver T, Rouard M and Bocs S

    CIRAD, UMR AGAP, F-34398 Montpellier, France. valentin.guignon@cirad.fr

    Summary: We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl.

    Availability: The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc. The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form

    Contact: valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr

    Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2012;28;7;1054-6

  • Lipoprotein(a) and risk of coronary, cerebrovascular, and peripheral artery disease: the EPIC-Norfolk prospective population study.

    Gurdasani D, Sjouke B, Tsimikas S, Hovingh GK, Luben RN, Wainwright NW, Pomilla C, Wareham NJ, Khaw KT, Boekholdt SM and Sandhu MS

    Department of Public Health and Primary Care, Institute of Public Health, University of Cambridge, Cambridge, United Kingdom.

    Objective: Although the association between circulating levels of lipoprotein(a) [Lp(a)] and risk of coronary artery disease (CAD) and stroke is well established, its role in risk of peripheral arterial disease (PAD) remains unclear. Here, we examine the association between Lp(a) levels and PAD in a large prospective cohort. To contextualize these findings, we also examined the association between Lp(a) levels and risk of stroke and CAD and studied the role of low-density lipoprotein as an effect modifier of Lp(a)-associated cardiovascular risk.

    Lp(a) levels were measured in apparently healthy participants in the European Prospective Investigation of Cancer (EPIC)-Norfolk cohort. Cox regression was used to quantify the association between Lp(a) levels and risk of PAD, stroke, and CAD outcomes. During 212 981 person-years at risk, a total of 2365 CAD, 284 ischemic stroke, and 596 PAD events occurred in 18 720 participants. Lp(a) was associated with PAD and CAD outcomes but not with ischemic stroke (hazard ratio per 2.7-fold increase in Lp(a) of 1.37, 95% CI 1.25-1.50; 1.13, 95% CI 1.04-1.22; and 0.91, 95% CI 0.79-1.03, respectively). Low-density lipoprotein cholesterol levels did not modify these associations.

    Conclusions: Lp(a) levels were associated with future PAD and CAD events. The association between Lp(a) and cardiovascular disease was not modified by low-density lipoprotein cholesterol levels.

    Funded by: Medical Research Council: G0801566

    Arteriosclerosis, thrombosis, and vascular biology 2012;32;12;3058-65

  • Targeted deletion of microRNA-22 promotes stress-induced cardiac dilation and contractile dysfunction.

    Gurha P, Abreu-Goodger C, Wang T, Ramirez MO, Drumond AL, van Dongen S, Chen Y, Bartonicek N, Enright AJ, Lee B, Kelm RJ, Reddy AK, Taffet GE, Bradley A, Wehrens XH, Entman ML and Rodriguez A

    Baylor College of Medicine, Department of Molecular and Human Genetics, One Baylor Plaza, Houston, TX, 77030, USA.

    Background: Delineating the role of microRNAs (miRNAs) in the posttranscriptional gene regulation offers new insights into how the heart adapts to pathological stress. We developed a knockout of miR-22 in mice and investigated its function in the heart. Here, we show that miR-22-deficient mice are impaired in inotropic and lusitropic response to acute stress by dobutamine. Furthermore, the absence of miR-22 sensitized mice to cardiac decompensation and left ventricular dilation after long-term stimulation by pressure overload. Calcium transient analysis revealed reduced sarcoplasmic reticulum Ca(2+) load in association with repressed sarcoplasmic reticulum Ca(2+) ATPase activity in mutant myocytes. Genetic ablation of miR-22 also led to a decrease in cardiac expression levels for Serca2a and muscle-restricted genes encoding proteins in the vicinity of the cardiac Z disk/titin cytoskeleton. These phenotypes were attributed in part to inappropriate repression of serum response factor activity in stressed hearts. Global analysis revealed increased expression of the transcriptional/translational repressor purine-rich element binding protein B, a highly conserved miR-22 target implicated in the negative control of muscle expression. Conclusion: These data indicate that miR-22 functions as an integrator of Ca(2+) homeostasis and myofibrillar protein content during stress in the heart and shed light on the mechanisms that enhance propensity toward heart failure.

    Funded by: NHLBI NIH HHS: HL089598, HL089792, HL091947, K25 HL73041, R01 HL22512

    Circulation 2012;125;22;2751-61

  • Afghanistan's ethnic groups share a Y-chromosomal heritage structured by historical events.

    Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, Martínez-Cruz B, Douaihy B, Ghassibe-Sabbagh M, Rafatpanah H, Ghanbari M, Whale J, Balanovsky O, Wells RS, Comas D, Tyler-Smith C, Zalloua PA and Genographic Consortium

    The Lebanese American University, Chouran, Beirut, Lebanon.

    Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.

    Funded by: Wellcome Trust

    PloS one 2012;7;3;e34288

  • Toward a roadmap in global biobanking for health.

    Harris JR, Burton P, Knoppers BM, Lindpaintner K, Bledsoe M, Brookes AJ, Budin-Ljøsne I, Chisholm R, Cox D, Deschênes M, Fortier I, Hainaut P, Hewitt R, Kaye J, Litton JE, Metspalu A, Ollier B, Palmer LJ, Palotie A, Pasterk M, Perola M, Riegman PH, van Ommen GJ, Yuille M and Zatloukal K

    Department of Genes and Environment, Division of Epidemiology, The Norwegian Institute of Public Health, Oslo, Norway.

    Biobanks can have a pivotal role in elucidating disease etiology, translation, and advancing public health. However, meeting these challenges hinges on a critical shift in the way science is conducted and requires biobank harmonization. There is growing recognition that a common strategy is imperative to develop biobanking globally and effectively. To help guide this strategy, we articulate key principles, goals, and priorities underpinning a roadmap for global biobanking to accelerate health science, patient care, and public health. The need to manage and share very large amounts of data has driven innovations on many fronts. Although technological solutions are allowing biobanks to reach new levels of integration, increasingly powerful data-collection tools, analytical techniques, and the results they generate raise new ethical and legal issues and challenges, necessitating a reconsideration of previous policies, practices, and ethical norms. These manifold advances and the investments that support them are also fueling opportunities for biobanks to ultimately become integral parts of health-care systems in many countries. International harmonization to increase interoperability and sustainability are two strategic priorities for biobanking. Tackling these issues requires an environment favorably inclined toward scientific funding and equipped to address socio-ethical challenges. Cooperation and collaboration must extend beyond systems to enable the exchange of data and samples to strategic alliances between many organizations, including governmental bodies, funding agencies, public and private science enterprises, and other stakeholders, including patients. A common vision is required and we articulate the essential basis of such a vision herein. European Journal of Human Genetics advance online publication, 20 June 2012; doi:10.1038/ejhg.2012.96.

    European journal of human genetics : EJHG 2012

  • Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing.

    Harris SR, Clarke IN, Seth-Smith HM, Solomon AW, Cutcliffe LT, Marsh P, Skilton RJ, Holland MJ, Mabey D, Peeling RW, Lewis DA, Spratt BG, Unemo M, Persson K, Bjartling C, Brunham R, de Vries HJ, Morré SA, Speksnijder A, Bébéar CM, Clerc M, de Barbeyrac B, Parkhill J and Thomson NR

    Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. sh16@sanger.ac.uk

    Chlamydia trachomatis is responsible for both trachoma and sexually transmitted infections, causing substantial morbidity and economic cost globally. Despite this, our knowledge of its population and evolutionary genetics is limited. Here we present a detailed phylogeny based on whole-genome sequencing of representative strains of C. trachomatis from both trachoma and lymphogranuloma venereum (LGV) biovars from temporally and geographically diverse sources. Our analysis shows that predicting phylogenetic structure using ompA, which is traditionally used to classify Chlamydia, is misleading because extensive recombination in this region masks any true relationships present. We show that in many instances, ompA is a chimera that can be exchanged in part or as a whole both within and between biovars. We also provide evidence for exchange of, and recombination within, the cryptic plasmid, which is another key diagnostic target. We used our phylogenetic framework to show how genetic exchange has manifested itself in ocular, urogenital and LGV C. trachomatis strains, including the epidemic LGV serotype L2b.

    Funded by: Wellcome Trust: 080348, 089472, 098051

    Nature genetics 2012;44;4;413-9, S1

  • GENCODE: the reference human genome annotation for The ENCODE Project.

    Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigó R and Hubbard TJ

    Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. jla1@sanger.ac.uk

    The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

    Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 095908, WT098051

    Genome research 2012;22;9;1760-74

  • Tracking and coordinating an international curation effort for the CCDS Project.

    Harte RA, Farrell CM, Loveland JE, Suner MM, Wilming L, Aken B, Barrell D, Frankish A, Wallin C, Searle S, Diekhans M, Harrow J and Pruitt KD

    Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.

    The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. DATABASE URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi.

    Funded by: NHGRI NIH HHS: 5U54 HG004555, 5U54HG00455-04; Wellcome Trust: 095908, WT062023, WT077198

    Database : the journal of biological databases and curation 2012;2012;bas008

  • The future of technologies for personalised medicine.

    Harvey A, Brand A, Holgate ST, Kristiansen LV, Lehrach H, Palotie A and Prainsack B

    Finchley Court, Norwich, UK.

    Personalised medicine promises prediction, prevention and treatment of illness that is targeted to individuals’ needs. New technologies for detailed biological profiling of individuals at the molecular level have been crucial in initiating the move to personalised medicine; further novel technologies will be necessary if the vision is to become a reality. We will need to develop new technologies to collect and analyse data in a way that is not just linear but integrated (understanding system level functioning) and dynamic (understanding system in flux). Key factors in the development of technologies for personalised medicine are standardisation, integration and harmonisation. For example, the tools and processes for data collection and analysis must be standardised across research sites. Research activity at different sites must be integrated to maximise synergies, and scientific research must be integrated with healthcare to ensure effective translation. There must also be harmonisation between scientific practices in different research sites, between science and healthcare and between science, healthcare and wider society, including the ethical and regulatory frameworks, the prevailing political and cultural ethos and the expectations of patients/citizens.

    New biotechnology 2012;29;6;625-33

  • Comparative genomic analyses of the Taylorellae.

    Hauser H, Richter DC, van Tonder A, Clark L and Preston A

    Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK.

    Contagious equine metritis (CEM) is an important venereal disease of horses that is of concern to the thoroughbred industry. Taylorella equigenitalis is a causative agent of CEM but very little is known about it or its close relative Taylorella asinigenitalis. To reveal novel information about Taylorella biology, comparative genomic analyses were undertaken. Whole genome sequencing was performed for the T. equigenitalis type strain, NCTC11184. Draft genome sequences were produced for a second T. equigenitalis strain and for a strain of T. asinigenitalis. These genome sequences were analysed and compared to each other and the recently released genome sequence of T. equigenitalis MCE9. These analyses revealed that T. equigenitalis strains appear to be very similar to each other with relatively little strain-specific DNA content. A number of genes were identified that encode putative toxins and adhesins that are possibly involved in infection. Analysis of T. asinigenitalis revealed that it has a very similar gene repertoire to that of T. equigenitalis but shares surprisingly little DNA sequence identity with it. The generation of genome sequence information greatly increases knowledge of these poorly characterised bacteria and greatly facilitates study of them.

    Veterinary microbiology 2012;159;1-2;195-203

  • Comparison of the Exomes of Common Carp (Cyprinus carpio) and Zebrafish (Danio rerio).

    Henkel CV, Dirks RP, Jansen HJ, Forlenza M, Wiegertjes GF, Howe K, van den Thillart GE and Spaink HP

    ZF-Screens B.V., Leiden, The Netherlands.

    Research on common carp, Cyprinus carpio, is beneficial for zebrafish research because of resources available owing to its large body size, such as the availability of sufficient organ material for transcriptomics, proteomics, and metabolomics. Here we describe the shotgun sequencing of a clonal double-haploid common carp line. The assembly consists of 511891 scaffolds with an N50 of 17 kb, predicting a total genome size of 1.4-1.5 Gb. A detailed analysis of the ten largest scaffolds indicates that the carp genome has a considerably lower repeat coverage than zebrafish, whilst the average intron size is significantly smaller, making it comparable to the fugu genome. The quality of the scaffolding was confirmed by comparisons with RNA deep sequencing data sets and a manual analysis for synteny with the zebrafish, especially the Hox gene clusters. In the ten largest scaffolds analyzed, the synteny of genes is almost complete. Comparisons of predicted exons of common carp with those of the zebrafish revealed only few genes specific for either zebrafish or carp, most of these being of unknown function. This supports the hypothesis of an additional genome duplication event in the carp evolutionary history, which, due to a higher degree of compactness, did not result in a genome larger than that of zebrafish.

    Zebrafish 2012;9;2;59-67

  • Next-generation sequencing and large genome assemblies.

    Henson J, Tischler G and Ning Z

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.

    Pharmacogenomics 2012;13;8;901-15

  • Interactions between PPAR-α and inflammation-related cytokine genes on the development of Alzheimer's disease, observed by the Epistasis Project.

    Heun R, Kölsch H, Ibrahim-Verbaas CA, Combarros O, Aulchenko YS, Breteler M, Schuur M, van Duijn CM, Hammond N, Belbin O, Cortina-Borja M, Wilcock GK, Brown K, Barber R, Kehoe PG, Coto E, Alvarez V, Lehmann MG, Deloukas P, Mateo I, Morgan K, Warden DR, Smith AD and Lehmann DJ

    Department of Psychiatry, University of Bonn, Bonn, Germany; Department of Psychiatry, Royal Derby Hospital, Uttoxeter Road, Derby DE22 3WQ, UK.

    Objective: Neuroinflammation contributes to the pathogenesis of sporadic Alzheimer's disease (AD). Variations in genes relevant to inflammation may be candidate genes for AD risk. Whole-genome association studies have identified relevant new and known genes. Their combined effects do not explain 100% of the risk; genetic interactions may contribute. We investigated whether genes involved in inflammation, i.e. PPAR-α, interleukins (IL) IL-1α, IL-1β, IL-6, and IL-10 may interact to increase AD risk.

    Methods: The Epistasis Project identifies interactions that affect the risk of AD. Genotyping of single nucleotide polymorphisms (SNPs) in PPARA, IL1A, IL1B, IL6 and IL10 was performed. Possible associations were analyzed by fitting logistic regression models with AD as outcome, controlling for centre, age, sex and presence of apolipoprotein ε4 allele (APOEε4). Adjusted synergy factors were derived from interaction terms (p<0.05 two-sided).

    Results: We observed four significant interactions between different SNPs in PPARA and in interleukins IL1A, IL1B, IL10 that may affect AD risk. There were no significant interactions between PPARA and IL6.

    Conclusions: In addition to an association of the PPARA L162V polymorphism with the AD risk, we observed four significant interactions between SNPs in PPARA and SNPs in IL1A, IL1B and IL10 affecting AD risk. We prove that gene-gene interactions explain part of the heritability of AD and are to be considered when assessing the genetic risk. Necessary replications will require between 1450 and 2950 of both cases and controls, depending on the prevalence of the SNP, to have 80% power to detect the observed synergy factors.

    International journal of molecular epidemiology and genetics 2012;3;1;39-47

  • A review of the role of stem cells in the development and treatment of glioma.

    Heywood RM, Marcus HJ, Ryan DJ, Piccirillo SG, Al-Mayhani TM and Watts C

    Centre for Brain Repair, Department Clinical Neurosciences, Cambridge University, E.D. Adrian Building, Forvie Site Hills Road, Cambridge, CB2 0PY, UK.

    The neurosurgical management of patients with intrinsic glial cancers is one of the most rapidly evolving areas of practice. This has been fuelled by advances in surgical technique not only in cytoreduction but also in drug delivery. Further innovation will depend on a deeper understanding of the biology of the disease and an appreciation of the limitations of current knowledge. Here we review the controversial topic of cancer stem cells applied to glioma to provide neurosurgeons with a working overview. It is now recognised that the adult human brain contains regionally specified cell populations capable of self-renewal that may contribute to tumour growth and maintenance following accumulated mutational change. Tumour cells adapted to maintain growth demonstrate some stem-like characteristics and as such constitute a legitimate therapeutic target. Cellular reprogramming technologies raise the potential of developing stem cells as novel surgical tools to target disease and possibly ameliorate some of the consequences of treatment. Achieving these goals remains a significant challenge to neurosurgical oncologists, not least in challenging how we think about treating brain cancer. This review will briefly examine our understanding of adult stem cells within the brain, the evidence that they contribute to the development of brain tumours as tumour-initiating cells, and the potential implications for therapy. It will also look at the role stem cells may play in the future management of glioma.

    Acta neurochirurgica 2012;154;6;951-69

  • Genome mapping and genomics of Caenorhabditis elegans

    Hodgkin J, Paulini M and Tuli MA

    Genome Mapping and Genomics in Animals 2012;4;17-30

  • Discovery of new treatments in the context of delivering personalized medicine

    Holgate ST and Palotie A

    Personalized Medicine 2012;9;101-4

  • Producing parasitic helminth reference and draft genomes at the Wellcome Trust Sanger Institute.

    Holroyd N and Sanchez-Flores A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. neh@sanger.ac.uk

    The Wellcome Trust Sanger Institute (WTSI) is producing de novo reference quality genomes for parasitic helminth species from platyhelminth tapeworms (cestodes), flukes (trematodes) and roundworms (nematodes) primarily using second-generation (Illumina and 454) sequencing technologies. The reference genomes will be followed with draft coverage from a number of related strains or species. Comparing species- or strain-specific differences will help to unravel the genomic basis for differences in the organism's biology and ultimately contribute towards identifying potential novel targets for vaccine therapies. Second-generation sequencing technologies are revolutionizing parasite genomics. This article reviews the impact that sequencing technologies have had on genomics and how they have shaped the parasitic helminth genome sequencing initiative at WTSI.

    Parasite immunology 2012;34;2-3;100-7

  • Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe.

    Holt KE, Baker S, Weill FX, Holmes EC, Kitchen A, Yu J, Sangal V, Brown DJ, Coia JE, Kim DW, Choi SY, Kim SH, da Silveira WD, Pickard DJ, Farrar JJ, Parkhill J, Dougan G and Thomson NR

    Department of Microbiology and Immunology, University of Melbourne, Victoria, Australia. kholt@unimelb.edu.au

    Shigella are human-adapted Escherichia coli that have gained the ability to invade the human gut mucosa and cause dysentery(1,2), spreading efficiently via low-dose fecal-oral transmission(3,4). Historically, S. sonnei has been predominantly responsible for dysentery in developed countries but is now emerging as a problem in the developing world, seeming to replace the more diverse Shigella flexneri in areas undergoing economic development and improvements in water quality(4-6). Classical approaches have shown that S. sonnei is genetically conserved and clonal(7). We report here whole-genome sequencing of 132 globally distributed isolates. Our phylogenetic analysis shows that the current S. sonnei population descends from a common ancestor that existed less than 500 years ago and that diversified into several distinct lineages with unique characteristics. Our analysis suggests that the majority of this diversification occurred in Europe and was followed by more recent establishment of local pathogen populations on other continents, predominantly due to the pandemic spread of a single, rapidly evolving, multidrug-resistant lineage.

    Funded by: Medical Research Council: G0800173, G0800173(86345); Wellcome Trust: 0689

    Nature genetics 2012;44;9;1056-9

  • High-resolution genotyping of the endemic Salmonella Typhi population during a Vi (typhoid) vaccination trial in Kolkata.

    Holt KE, Dutta S, Manna B, Bhattacharya SK, Bhaduri B, Pickard DJ, Ochiai RL, Ali M, Clemens JD and Dougan G

    Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia. kholt@unimelb.edu.au

    Background: Typhoid fever, caused by Salmonella enterica serovar Typhi (S. Typhi), is a major health problem especially in developing countries. Vaccines against typhoid are commonly used by travelers but less so by residents of endemic areas.

    Methodology: We used single nucleotide polymorphism (SNP) typing to investigate the population structure of 372 S. Typhi isolated during a typhoid disease burden study and Vi vaccine trial in Kolkata, India. Approximately sixty thousand people were enrolled for fever surveillance for 19 months prior to, and 24 months following, Vi vaccination of one third of the study population (May 2003-December 2006, vaccinations given December 2004).

    A diverse S. Typhi population was detected, including 21 haplotypes. The most common were of the H58 haplogroup (69%), which included all multidrug resistant isolates (defined as resistance to chloramphenicol, ampicillin and co-trimoxazole). Quinolone resistance was particularly high among H58-G isolates (97% nalidixic acid resistant, 30% with reduced susceptibility to ciprofloxacin). Multiple typhoid fever episodes were detected in 22 households; however, household clustering was not associated with specific S. Typhi haplotypes.

    Conclusions: Typhoid fever in Kolkata is caused by a diverse population of S. Typhi; however, H58 haplotypes dominate and are associated with multidrug and quinolone resistance. Vi vaccination did not obviously impact on the haplotype population structure of the S. Typhi circulating during the study period.

    Funded by: Wellcome Trust

    PLoS neglected tropical diseases 2012;6;1;e1490

  • Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.

    Howald C, Tanzer A, Chrast J, Kokocinski F, Derrien T, Walters N, Gonzalez JM, Frankish A, Aken BL, Hourlier T, Vogel JH, White S, Searle S, Harrow J, Hubbard TJ, Guigó R and Reymond A

    Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.

    Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.

    Funded by: NHGRI NIH HHS: U54 HG004555, U54 HG004557; Wellcome Trust: 095908, WT077198/Z/05/Z

    Genome research 2012;22;9;1698-710

  • WormBase: Annotating many nematode genomes.

    Howe K, Davis P, Paulini M, Tuli MA, Williams G, Yook K, Durbin R, Kersey P and Sternberg PW

    European Bioinformatics Institute; Wellcome Trust Genome Campus; Hinxton, Cambridge UK.

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

    Worm 2012;1;1;15-21

  • Exploration of signals of positive selection derived from genotype-based human genome scans using re-sequencing data.

    Hu M, Ayub Q, Guerra-Assunção JA, Long Q, Ning Z, Huang N, Romero IG, Mamanova L, Akan P, Liu X, Coffey AJ, Turner DJ, Swerdlow H, Burton J, Quail MA, Conrad DF, Enright AJ, Tyler-Smith C and Xue Y

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    We have investigated whether regions of the genome showing signs of positive selection in scans based on haplotype structure also show evidence of positive selection when sequence-based tests are applied, whether the target of selection can be localized more precisely, and whether such extra evidence can lead to increased biological insights. We used two tools: simulations under neutrality or selection, and experimental investigation of two regions identified by the HapMap2 project as putatively selected in human populations. Simulations suggested that neutral and selected regions should be readily distinguished and that it should be possible to localize the selected variant to within 40 kb at least half of the time. Re-sequencing of two ~300 kb regions (chr4:158Mb and chr10:22Mb) lacking known targets of selection in HapMap CHB individuals provided strong evidence for positive selection within each and suggested the micro-RNA gene hsa-miR-548c as the best candidate target in one region, and changes in regulation of the sperm protein gene SPAG6 in the other.

    Funded by: Wellcome Trust: 077009

    Human genetics 2012;131;5;665-74

  • 1000 Genomes-based imputation identifies novel and refined associations for the Wellcome Trust Case Control Consortium phase 1 Data.

    Huang J, Ellinghaus D, Franke A, Howie B and Li Y

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two 'missed' disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other in proximity with the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10⁻¹⁶. The new lead SNP for the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, 'missed' because they were detected and replicated by studies after 2007) and refined signals. We anticipate the findings derived from this study to provide timely information when individual groups and consortia are beginning to engage in 1000 genomes-based imputation.

    Funded by: NCI NIH HHS: R01 CA082659-11S1

    European journal of human genetics : EJHG 2012;20;7;801-5

  • Genome-wide association study for circulating levels of PAI-1 provides novel insights into its regulation.

    Huang J, Sabater-Lleal M, Asselbergs FW, Tregouet D, Shin SY, Ding J, Baumert J, Oudot-Mellakh T, Folkersen L, Johnson AD, Smith NL, Williams SM, Ikram MA, Kleber ME, Becker DM, Truong V, Mychaleckyj JC, Tang W, Yang Q, Sennblad B, Moore JH, Williams FM, Dehghan A, Silbernagel G, Schrijvers EM, Smith S, Karakas M, Tofler GH, Silveira A, Navis GJ, Lohman K, Chen MH, Peters A, Goel A, Hopewell JC, Chambers JC, Saleheen D, Lundmark P, Psaty BM, Strawbridge RJ, Boehm BO, Carter AM, Meisinger C, Peden JF, Bis JC, McKnight B, Öhrvik J, Taylor K, Franzosi MG, Seedorf U, Collins R, Franco-Cereceda A, Syvänen AC, Goodall AH, Yanek LR, Cushman M, Müller-Nurasyid M, Folsom AR, Basu S, Matijevic N, van Gilst WH, Kooner JS, Hofman A, Danesh J, Clarke R, Meigs JB, DIAGRAM Consortium, Kathiresan S, Reilly MP, CARDIoGRAM Consortium, Klopp N, Harris TB, Winkelmann BR, Grant PJ, Hillege HL, Watkins H, C4D Consortium, Spector TD, Becker LC, Tracy RP, März W, Uitterlinden AG, Eriksson P, Cambien F, CARDIOGENICS Consortium, Morange PE, Koenig W, Soranzo N, van der Harst P, Liu Y, O'Donnell CJ and Hamsten A

    National Heart, Lung, and Blood Institute (NHLBI) Framingham Heart Study, Framingham, MA 01702, USA.

    We conducted a genome-wide association study to identify novel associations between genetic variants and circulating plasminogen activator inhibitor-1 (PAI-1) concentration, and examined functional implications of variants and genes that were discovered. A discovery meta-analysis was performed in 19 599 subjects, followed by replication analysis of genome-wide significant (P < 5 × 10⁻⁸) single nucleotide polymorphisms (SNPs) in 10 796 independent samples. We further examined associations with type 2 diabetes and coronary artery disease, assessed the functional significance of the SNPs for gene expression in human tissues, and conducted RNA-silencing experiments for one novel association. We confirmed the association of the 4G/5G proxy SNP rs2227631 in the promoter region of SERPINE1 (7q22.1) and discovered genome-wide significant associations at 3 additional loci: chromosome 7q22.1 close to SERPINE1 (rs6976053, discovery P = 3.4 × 10⁻¹⁰); chromosome 11p15.2 within ARNTL (rs6486122, discovery P = 3.0 × 10⁻⁸); and chromosome 3p25.2 within PPARG (rs11128603, discovery P = 2.9 × 10⁻⁸). Replication was achieved for the 7q22.1 and 11p15.2 loci. There was nominal association with type 2 diabetes and coronary artery disease at ARNTL (P < .05). Functional studies identified MUC3 as a candidate gene for the second association signal on 7q22.1. In summary, SNPs in SERPINE1 and ARNTL and an SNP associated with the expression of MUC3 were robustly associated with circulating levels of PAI-1.

    Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation: RG/08/014/24067; Department of Health; Medical Research Council: G0700931; NCRR NIH HHS: M01 RR00052, RR-024156, RR018787, UL1-RR-025005; NHGRI NIH HHS: U01-HG-004402; NHLBI NIH HHS: 1U01 HL072518, HL080295, HL087652, HL105756, HL65234, HL67466, N01 HC-15103, N01 HC-55222, N01 HC085086, N01 HC095166, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-85239, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, N02-HL-6-4278, P01 HL074940, R01 HL085251, R01 HL087652, R01 HL092196, R01 HL095603, R01 HL095796, R01 HL59684, R01-HL-086694, R01-HL-087641, R01-HL-59367, R01-HL59367, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, U01 HL072518, U01 HL080295; NIA NIH HHS: 1R01AG032098-01A1, AG-023629, AG-027058, AG-15928, AG-20098, N01AG62101, N01AG62103, N01AG62106; NIDDK NIH HHS: DK063491, K24 DK080140; NIGMS NIH HHS: T32 GM080178; NLM NIH HHS: LM010098, R01 LM010098; PHS HHS: HHSN268200625226C, HHSN268200782096C, HHSN268201200036C; Wellcome Trust

    Blood 2012;120;24;4873-81

  • MED12 controls the response to multiple cancer drugs through regulation of TGF-β receptor signaling.

    Huang S, Hölzel M, Knijnenburg T, Schlicker A, Roepman P, McDermott U, Garnett M, Grernrum W, Sun C, Prahallad A, Groenendijk FH, Mittempergher L, Nijkamp W, Neefjes J, Salazar R, Ten Dijke P, Uramoto H, Tanaka F, Beijersbergen RL, Wessels LF and Bernards R

    Division of Molecular Carcinogenesis, Cancer Genomics Center and Cancer Systems Biology Center, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands.

    Inhibitors of the ALK and EGF receptor tyrosine kinases provoke dramatic but short-lived responses in lung cancers harboring EML4-ALK translocations or activating mutations of EGFR, respectively. We used a large-scale RNAi screen to identify MED12, a component of the transcriptional MEDIATOR complex that is mutated in cancers, as a determinant of response to ALK and EGFR inhibitors. MED12 is in part cytoplasmic where it negatively regulates TGF-βR2 through physical interaction. MED12 suppression therefore results in activation of TGF-βR signaling, which is both necessary and sufficient for drug resistance. TGF-β signaling causes MEK/ERK activation, and consequently MED12 suppression also confers resistance to MEK and BRAF inhibitors in other cancers. MED12 loss induces an EMT-like phenotype, which is associated with chemotherapy resistance in colon cancer patients and to gefitinib in lung cancer. Inhibition of TGF-βR signaling restores drug responsiveness in MED12(KD) cells, suggesting a strategy to treat drug-resistant tumors that have lost MED12.

    Funded by: Wellcome Trust: 093868

    Cell 2012;151;5;937-50

  • Isolation of homozygous mutant mouse embryonic stem cells using a dual selection system.

    Huang Y, Pettitt SJ, Guo G, Liu G, Li MA, Yang F and Bradley A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. huangyue@pumc.edu.cn

    Obtaining random homozygous mutants in mammalian cells for forward genetic studies has always been problematic due to the diploid genome. With one mutation per cell, only one allele of an autosomal gene can be disrupted, and the resulting heterozygous mutant is unlikely to display a phenotype. In cells with a genetic background deficient for the Bloom's syndrome helicase, such heterozygous mutants segregate homozygous daughter cells at a low frequency due to an elevated rate of crossover following mitotic recombination between homologous chromosomes. We constructed DNA vectors that are selectable based on their copy number and used these to isolate these rare homozygous mutant cells independent of their phenotype. We use the piggyBac transposon to limit the initial mutagenesis to one copy per cell, and select for cells that have increased the transposon copy number to two or more. This yields homozygous mutants with two allelic mutations, but also cells that have duplicated the mutant chromosome and become aneuploid during culture. On average, 26% of the copy number gain events occur by the mitotic recombination pathway. We obtained homozygous cells from 40% of the heterozygous mutants tested. This method can provide homozygous mammalian loss-of-function mutants for forward genetic applications.

    Funded by: Wellcome Trust: 79643

    Nucleic acids research 2012;40;3;e21

  • No evidence of an association between mitochondrial DNA variants and osteoarthritis in 7393 cases and 5122 controls.

    Hudson G, Panoutsopoulou K, Wilson I, Southam L, Rayner NW, Arden N, Birrell F, Carluke I, Carr A, Chapman K, Deloukas P, Doherty M, McCaskie A, Ollier WE, Ralston SH, Reed MR, Spector TD, Valdes AM, Wallis GA, Wilkinson JM, Zeggini E, Samuels DC, Loughlin J, Chinnery PF and arcOGEN Consortium

    Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK.

    OBJECTIVES: Osteoarthritis (OA) has a complex aetiology with a strong genetic component. Genome-wide association studies implicate several nuclear genes in the aetiology, but a major component of the heritability has yet to be defined at the molecular level. Initial studies implicate maternally inherited variants of mitochondrial DNA (mtDNA) in subgroups of patients with OA based on gender and specific joint involvement, but these findings have not been replicated. METHODS: The authors studied 138 maternally inherited mtDNA variants genotyped in a two cohort genetic association study across a total of 7393 OA cases from the arcOGEN consortium and 5122 controls genotyped in the Wellcome Trust Case Control consortium 2 study. RESULTS: Following data quality control we examined 48 mtDNA variants that were common in cohort 1 and cohort 2, and found no association with OA. None of the phenotypic subgroups previously associated with mtDNA haplogroups were associated in this study. CONCLUSIONS: We were not able to replicate previously published findings in the largest mtDNA association study to date. The evidence linking OA to mtDNA is not compelling at present.

    Annals of the rheumatic diseases 2012

  • Fitness of Escherichia coli strains carrying expressed and partially silent IncN and IncP1 plasmids.

    Humphrey B, Thomson NR, Thomas CM, Brooks K, Sanders M, Delsol AA, Roe JM, Bennett PM and Enne VI

    Bristol Centre for Antimicrobial Research, Department of Cellular and Molecular Medicine, University of Bristol, Medical Sciences Building, University Walk, Bristol, BS8 1TD, UK.

    Background: Understanding the survival of resistance plasmids in the absence of selective pressure for the antibiotic resistance genes they carry is important for assessing the value of interventions to combat resistant bacteria. Here, several poorly explored questions regarding the fitness impact of IncP1 and IncN broad host range plasmids on their bacterial hosts are examined; namely, whether related plasmids have similar fitness impacts, whether this varies according to host genetic background, and what effect antimicrobial resistance gene silencing has on fitness. Results: For the IncP1 group, pairwise in vitro growth competition demonstrated that the fitness cost of plasmid RP1 depends on the host strain. For the IncN group, plasmids R46 and N3, whose sequence is presented for the first time, conferred remarkably different fitness costs despite sharing closely related backbone structures, implicating the accessory genes in fitness. Silencing of antimicrobial resistance genes was found to be beneficial for host fitness with RP1 but not for IncN plasmid pVE46. Conclusions: These findings suggest that the fitness impact of a given plasmid on its host cannot be inferred from results obtained with other host-plasmid combinations, even if these are closely related.

    Funded by: Wellcome Trust: 076964, 089222

    BMC microbiology 2012;12;53

  • Rare and functional SIAE variants are not associated with autoimmune disease risk in up to 66,924 individuals of European ancestry.

    Hunt KA, Smyth DJ, Balschun T, Ban M, Mistry V, Ahmad T, Anand V, Barrett JC, Bhaw-Rosun L, Bockett NA, Brand OJ, Brouwer E, Concannon P, Cooper JD, Dias KR, van Diemen CC, Dubois PC, Edkins S, Fölster-Holst R, Fransen K, Glass DN, Heap GA, Hofmann S, Huizinga TW, Hunt S, Langford C, Lee J, Mansfield J, Marrosu MG, Mathew CG, Mein CA, Müller-Quernheim J, Nutland S, Onengut-Gumuscu S, Ouwehand W, Pearce K, Prescott NJ, Posthumus MD, Potter S, Rosati G, Sambrook J, Satsangi J, Schreiber S, Shtir C, Simmonds MJ, Sudman M, Thompson SD, Toes R, Trynka G, Vyse TJ, Walker NM, Weidinger S, Zhernakova A, Zoledziewska M, Type 1 Diabetes Genetics Consortium, UK Inflammatory Bowel Disease (IBD) Genetics Consortium, Wellcome Trust Case Control Consortium, Weersma RK, Gough SC, Sawcer S, Wijmenga C, Parkes M, Cucca F, Franke A, Deloukas P, Rich SS, Todd JA and van Heel DA

    Nature genetics 2012;44;1;3-5

  • InterPro in 2011: new developments in the family and domain prediction database.

    Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, de Castro E, Coggill P, Corbett M, Das U, Daugherty L, Duquenne L, Finn RD, Fraser M, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, McMenamin C, Mi H, Mutowo-Muellenet P, Mulder N, Natale D, Orengo C, Pesseat S, Punta M, Quinn AF, Rivoire C, Sangrador-Vegas A, Selengut JD, Sigrist CJ, Scheremetjew M, Tate J, Thimmajanarthanan M, Thomas PD, Wu CH, Yeats C and Yong SY

    EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK. hunter@ebi.ac.uk

    InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010508/1; NIGMS NIH HHS: GM081084

    Nucleic acids research 2012;40;Database issue;D306-12

  • Common variants at 6q22 and 17q21 are associated with intracranial volume.

    Ikram MA, Fornage M, Smith AV, Seshadri S, Schmidt R, Debette S, Vrooman HA, Sigurdsson S, Ropele S, Taal HR, Mook-Kanamori DO, Coker LH, Longstreth WT, Niessen WJ, DeStefano AL, Beiser A, Zijdenbos AP, Struchalin M, Jack CR, Rivadeneira F, Uitterlinden AG, Knopman DS, Hartikainen AL, Pennell CE, Thiering E, Steegers EA, Hakonarson H, Heinrich J, Palmer LJ, Jarvelin MR, McCarthy MI, Grant SF, St Pourcain B, Timpson NJ, Smith GD, Sovio U, Early Growth Genetics Consortium, Nalls MA, Au R, Hofman A, Gudnason H, van der Lugt A, Harris TB, Meeks WM, Vernooij MW, van Buchem MA, Catellier D, Jaddoe VW, Gudnason V, Windham BG, Wolf PA, van Duijn CM, Mosley TH, Schmidt H, Launer LJ, Breteler MM, DeCarli C and Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium

    Department of Epidemiology, Erasmus Medical Center University Medical Center, Rotterdam, The Netherlands. m.a.ikram@erasmusmc.nl

    During aging, intracranial volume remains unchanged and represents maximally attained brain size, while various interacting biological phenomena lead to brain volume loss. Consequently, intracranial volume and brain volume in late life reflect different genetic influences. Our genome-wide association study (GWAS) in 8,175 community-dwelling elderly persons did not reveal any associations at genome-wide significance (P < 5 × 10(-8)) for brain volume. In contrast, intracranial volume was significantly associated with two loci: rs4273712 (P = 3.4 × 10(-11)), a known height-associated locus on chromosome 6q22, and rs9915547 (P = 1.5 × 10(-12)), localized to the inversion on chromosome 17q21. We replicated the associations of these loci with intracranial volume in a separate sample of 1,752 elderly persons (P = 1.1 × 10(-3) for 6q22 and 1.2 × 10(-3) for 17q21). Furthermore, we also found suggestive associations of the 17q21 locus with head circumference in 10,768 children (mean age of 14.5 months). Our data identify two loci associated with head size, with the inversion at 17q21 also likely to be involved in attaining maximal brain size.

    Funded by: Canadian Institutes of Health Research: MOP 82893; Medical Research Council: 74882, G0500539; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: U01-HG004402; NHLBI NIH HHS: 5R01HL087679-02, HL093029, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02-HL-6-4278, R01-HL087641, R01-HL093029; NIA NIH HHS: AG031287, AG033040, AG033193, AG08122, AG16495, N01-AG-12100, P30AG013846; NICHD NIH HHS: 1R01HD056465-01A1; NIMH NIH HHS: 1RL1MH083268-01; NINDS NIH HHS: NS17950; PHS HHS: HHSN268200625226C; Wellcome Trust: 076467, GR069224

    Nature genetics 2012;44;5;539-44

  • A method to infer positive selection from marker dynamics in an asexual population.

    Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: The observation of positive selection acting on a mutant indicates that the corresponding mutation has some form of functional relevance. Determining the fitness effects of mutations thus has relevance to many interesting biological questions. One means of identifying beneficial mutations in an asexual population is to observe changes in the frequency of marked subsets of the population. We here describe a method to estimate the establishment times and fitnesses of beneficial mutations from neutral marker frequency data.

    Results: The method accurately reproduces complex marker frequency trajectories. In simulations for which positive selection is close to 5% per generation, we obtain correlations upwards of 0.91 between correct and inferred haplotype establishment times. Where mutation selection coefficients are exponentially distributed, the inferred distribution of haplotype fitnesses is close to being correct. Applied to data from a bacterial evolution experiment, our method reproduces an observed correlation between evolvability and initial fitness defect.

    Funded by: Wellcome Trust: 098051

    Bioinformatics (Oxford, England) 2012;28;6;831-7

  • Components of selection in the evolution of the influenza virus: linkage effects beat inherent selection.

    Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom. ci3@sanger.ac.uk

    The influenza virus is an important human pathogen, with a rapid rate of evolution in the human population. The rate of homologous recombination within genes of influenza is essentially zero. As such, where two alleles within the same gene are in linkage disequilibrium, interference between alleles will occur, whereby selection acting upon one allele has an influence upon the frequency of the other. We here measured the relative importance of selection and interference effects upon the evolution of influenza. We considered time-resolved allele frequency data from the global evolutionary history of the haemagglutinin gene of human influenza A/H3N2, conducting an in-depth analysis of sequences collected since 1996. Using a model that accounts for selection-caused interference between alleles in linkage disequilibrium, we estimated the inherent selective benefit of individual polymorphisms in the viral population. These inherent selection coefficients were in turn used to calculate the total selective effect of interference acting upon each polymorphism, considering the effect of the initial background upon which a mutation arose, and the subsequent effect of interference from other alleles that were under selection. Viewing events in retrospect, we estimated the influence of each of these components in determining whether a mutant allele eventually fixed or died in the global viral population. Our inherent selection coefficients, when combined across different regions of the protein, were consistent with previous measurements of dN/dS for the same system. Alleles going on to fix in the global population tended to be under more positive selection, to arise on more beneficial backgrounds, and to avoid strong negative interference from other alleles under selection. However, on average, the fate of a polymorphism was determined more by the combined influence of interference effects than by its inherent selection coefficient.

    Funded by: Wellcome Trust: 098051

    PLoS pathogens 2012;8;12;e1003091

  • Quantifying selection acting on a complex trait using allele frequency time series data.

    Illingworth CJ, Parts L, Schiffels S, Liti G and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    When selection is acting on a large genetically diverse population, beneficial alleles increase in frequency. This fact can be used to map quantitative trait loci by sequencing the pooled DNA from the population at consecutive time points and observing allele frequency changes. Here, we present a population genetic method to analyze time series data of allele frequencies from such an experiment. Beginning with a range of proposed evolutionary scenarios, the method measures the consistency of each with the observed frequency changes. Evolutionary theory is utilized to formulate equations of motion for the allele frequencies, following which likelihoods for having observed the sequencing data under each scenario are derived. Comparison of these likelihoods gives an insight into the prevailing dynamics of the system under study. We illustrate the method by quantifying selective effects from an experiment, in which two phenotypically different yeast strains were first crossed and then propagated under heat stress (Parts L, Cubillos FA, Warringer J, et al. [14 co-authors]. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res). From these data, we discover that about 6% of polymorphic sites evolve nonneutrally under heat stress conditions, either because of their linkage to beneficial (driver) alleles or because they are drivers themselves. We further identify 44 genomic regions containing one or more candidate driver alleles, quantify their apparent selective advantage, obtain estimates of recombination rates within the regions, and show that the dynamics of the drivers display a strong signature of selection going beyond additive models. Our approach is applicable to study adaptation in a range of systems under different evolutionary pressures.

    Funded by: Wellcome Trust: 098051, WT077192/Z/05/Z

    Molecular biology and evolution 2012;29;4;1187-97

  • Novel loci for metabolic networks and multi-tissue expression studies reveal genes for atherosclerosis.

    Inouye M, Ripatti S, Kettunen J, Lyytikäinen LP, Oksala N, Laurila PP, Kangas AJ, Soininen P, Savolainen MJ, Viikari J, Kähönen M, Perola M, Salomaa V, Raitakari O, Lehtimäki T, Taskinen MR, Järvelin MR, Ala-Korpela M, Palotie A and de Bakker PI

    Medical Systems Biology, Departments of Pathology and of Microbiology and Immunology, The University of Melbourne, Parkville, Victoria, Australia.

    Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. In comparison to univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis.

    PLoS genetics 2012;8;8;e1002907

  • Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke.

    International Stroke Genetics Consortium (ISGC), Wellcome Trust Case Control Consortium 2 (WTCCC2), Bellenguez C, Bevan S, Gschwendtner A, Spencer CC, Burgess AI, Pirinen M, Jackson CA, Traylor M, Strange A, Su Z, Band G, Syme PD, Malik R, Pera J, Norrving B, Lemmens R, Freeman C, Schanz R, James T, Poole D, Murphy L, Segal H, Cortellini L, Cheng YC, Woo D, Nalls MA, Müller-Myhsok B, Meisinger C, Seedorf U, Ross-Adams H, Boonen S, Wloch-Kopec D, Valant V, Slark J, Furie K, Delavaran H, Langford C, Deloukas P, Edkins S, Hunt S, Gray E, Dronov S, Peltonen L, Gretarsdottir S, Thorleifsson G, Thorsteinsdottir U, Stefansson K, Boncoraglio GB, Parati EA, Attia J, Holliday E, Levi C, Franzosi MG, Goel A, Helgadottir A, Blackwell JM, Bramon E, Brown MA, Casas JP, Corvin A, Duncanson A, Jankowski J, Mathew CG, Palmer CN, Plomin R, Rautanen A, Sawcer SJ, Trembath RC, Viswanathan AC, Wood NW, Worrall BB, Kittner SJ, Mitchell BD, Kissela B, Meschia JF, Thijs V, Lindgren A, Macleod MJ, Slowik A, Walters M, Rosand J, Sharma P, Farrall M, Sudlow CL, Rothwell PM, Dichgans M, Donnelly P and Markus HS

    Wellcome Trust Centre for Human Genetics, University of Oxford, UK.

    Genetic factors have been implicated in stroke risk, but few replicated associations have been reported. We conducted a genome-wide association study (GWAS) for ischemic stroke and its subtypes in 3,548 affected individuals and 5,972 controls, all of European ancestry. Replication of potential signals was performed in 5,859 affected individuals and 6,281 controls. We replicated previous associations for cardioembolic stroke near PITX2 and ZFHX3 and for large vessel stroke at a 9p21 locus. We identified a new association for large vessel stroke within HDAC9 (encoding histone deacetylase 9) on chromosome 7p21.1 (including further replication in an additional 735 affected individuals and 28,583 controls) (rs11984041; combined P = 1.87 × 10(-11); odds ratio (OR) = 1.42, 95% confidence interval (CI) = 1.28-1.57). All four loci exhibited evidence for heterogeneity of effect across the stroke subtypes, with some and possibly all affecting risk for only one subtype. This suggests distinct genetic architectures for different stroke subtypes.

    Funded by: Medical Research Council: G0000934; NCATS NIH HHS: UL1 TR000077; Wellcome Trust: 068545/Z/02, 084724, 085475, 085475/B/08/Z, 085475/Z/08/Z, WT084724MA

    Nature genetics 2012;44;3;328-33

  • Transcriptional data: a new gateway to drug repositioning?

    Iorio F, Rittman T, Ge H, Menden M and Saez-Rodriguez J

    EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK; Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.

    Recent advances in computational biology suggest that any perturbation to the transcriptional programme of the cell can be summarised by a proper 'signature': a set of genes combined with a pattern of expression. Therefore, it should be possible to generate proxies of clinicopathological phenotypes and drug effects through signatures acquired via DNA microarray technology. Gene expression signatures have recently been assembled and compared through genome-wide metrics, unveiling unexpected drug-disease and drug-drug 'connections' by matching corresponding signatures. Consequently, novel applications for existing drugs have been predicted and experimentally validated. Here, we describe related methods, case studies and resources while discussing challenges and benefits of exploiting existing repositories of microarray data that could serve as a search space for systematic drug repositioning.

    Drug discovery today 2012

  • Genome-wide association study implicates HLA-C*01:02 as a risk factor at the major histocompatibility complex locus in schizophrenia.

    Irish Schizophrenia Genomics Consortium and the Wellcome Trust Case Control Consortium 2

    Background: We performed a genome-wide association study (GWAS) to identify common risk variants for schizophrenia. Methods: The discovery scan included 1606 patients and 1794 controls from Ireland, using 6,212,339 directly genotyped or imputed single nucleotide polymorphisms (SNPs). A subset of this sample (270 cases and 860 controls) was subsequently included in the Psychiatric GWAS Consortium-schizophrenia GWAS meta-analysis. Results: One hundred eight SNPs were taken forward for replication in an independent sample of 13,195 cases and 31,021 control subjects. The most significant associations in discovery, corrected for genomic inflation (rs204999, p combined = 1.34 × 10(-9)), and in combined samples (rs2523722, p combined = 2.88 × 10(-16)) mapped to the major histocompatibility complex (MHC) region. We imputed classical human leukocyte antigen (HLA) alleles at the locus; the most significant finding was with HLA-C*01:02. This association was distinct from the top SNP signal. The HLA alleles DRB1*03:01 and B*08:01 were protective, replicating a previous study. Conclusions: This study provides further support for involvement of MHC class I molecules in schizophrenia. We found evidence of association with previously reported risk alleles at the TCF4, VRK2, and ZNF804A loci.

    Funded by: Howard Hughes Medical Institute: 072894/Z/03/Z, 075491/Z/04/B; Medical Research Council: G0000934; NIMH NIH HHS: MH 41953, MH083094; Wellcome Trust: 068545/Z/02, 085475/B/08/Z, 085475/Z/08/Z, 090532/Z/09/Z

    Biological psychiatry 2012;72;8;620-8

  • Guidelines for reporting novel mecA gene homologues.

    Ito T, Hiramatsu K, Tomasz A, de Lencastre H, Perreten V, Holden MT, Coleman DC, Goering R, Giffard PM, Skov RL, Zhang K, Westh H, O'Brien F, Tenover FC, Oliveira DC, Boyle-Vavra S, Laurent F, Kearns AM, Kreiswirth B, Ko KS, Grundmann H, Sollid JE, John JF, Daum R, Soderquist B, Buist G and on behalf of the International Working Group on the Classification of Staphylococcal Cassette Chromosome Elements (IWG-SCC)

    Department of Bacteriology, Juntendo University, Tokyo, Japan.

    Antimicrobial agents and chemotherapy 2012;56;10;4997-4999

  • Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species.

    Jackson AP, Berry A, Aslett M, Allison HC, Burton P, Vavrova-Anderson J, Brown R, Browne H, Corton N, Hauser H, Gamble J, Gilderthorp R, Marcello L, McQuillan J, Otto TD, Quail MA, Sanders MJ, van Tonder A, Ginger ML, Field MC, Barry JD, Hertz-Fowler C and Berriman M

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. andrew.jackson@sanger.ac.uk

    Antigenic variation enables pathogens to avoid the host immune response by continual switching of surface proteins. The protozoan blood parasite Trypanosoma brucei causes human African trypanosomiasis ("sleeping sickness") across sub-Saharan Africa and is a model system for antigenic variation, surviving by periodically replacing a monolayer of variant surface glycoproteins (VSG) that covers its cell surface. We compared the genome of Trypanosoma brucei with two closely related parasites Trypanosoma congolense and Trypanosoma vivax, to reveal how the variant antigen repertoire has evolved and how it might affect contemporary antigenic diversity. We reconstruct VSG diversification showing that Trypanosoma congolense uses variant antigens derived from multiple ancestral VSG lineages, whereas in Trypanosoma brucei VSG have recent origins, and ancestral gene lineages have been repeatedly co-opted to novel functions. These historical differences are reflected in fundamental differences between species in the scale and mechanism of recombination. Using phylogenetic incompatibility as a metric for genetic exchange, we show that the frequency of recombination is comparable between Trypanosoma congolense and Trypanosoma brucei but is much lower in Trypanosoma vivax. Furthermore, in showing that the C-terminal domain of Trypanosoma brucei VSG plays a crucial role in facilitating exchange, we reveal substantial species differences in the mechanism of VSG diversification. Our results demonstrate how past VSG evolution indirectly determines the ability of contemporary parasites to generate novel variant antigens through recombination and suggest that the current model for antigenic variation in Trypanosoma brucei is only one means by which these parasites maintain chronic infections.

    Funded by: Wellcome Trust: 085349/Z/08/Z, WT 055558/Z/98/A, WT 055558/Z/98/C, WT 085775/Z/08/Z

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;9;3416-21

  • Exome sequencing and genetic testing for MODY.

    Johansson S, Irgens H, Chudasama KK, Molnes J, Aerts J, Roque FS, Jonassen I, Levy S, Lima K, Knappskog PM, Bell GI, Molven A and Njølstad PR

    Department of Clinical Medicine, University of Bergen, Bergen, Norway.

    Context: Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive. Objective: The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results. We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism. Results: On average, we obtained 45 X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively), thus exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes. Conclusion: Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.

    PloS one 2012;7;5;e38050

  • Bcl11a is required for neuronal morphogenesis and sensory circuit formation in dorsal spinal cord development.

    John A, Brylka H, Wiegreffe C, Simon R, Liu P, Jüttner R, Crenshaw EB, Luyten FP, Jenkins NA, Copeland NG, Birchmeier C and Britsch S

    Institute of Molecular and Cellular Anatomy, Ulm University, 89081 Ulm, Germany.

    Dorsal spinal cord neurons receive and integrate somatosensory information provided by neurons located in dorsal root ganglia. Here we demonstrate that dorsal spinal neurons require the Krüppel-C(2)H(2) zinc-finger transcription factor Bcl11a for terminal differentiation and morphogenesis. The disrupted differentiation of dorsal spinal neurons observed in Bcl11a mutant mice interferes with their correct innervation by cutaneous sensory neurons. To understand the mechanism underlying the innervation deficit, we characterized changes in gene expression in the dorsal horn of Bcl11a mutants and identified dysregulated expression of the gene encoding secreted frizzled-related protein 3 (sFRP3, or Frzb). Frzb mutant mice show a deficit in the innervation of the spinal cord, suggesting that the dysregulated expression of Frzb can account in part for the phenotype of Bcl11a mutants. Thus, our genetic analysis of Bcl11a reveals essential functions of this transcription factor in neuronal morphogenesis and sensory wiring of the dorsal spinal cord and identifies Frzb, a component of the Wnt pathway, as a downstream acting molecule involved in this process.

    Development (Cambridge, England) 2012;139;10;1831-41

  • Resolution of a meningococcal disease outbreak from whole-genome sequence data with rapid web-based analysis methods.

    Jolley KA, Hill DM, Bratcher HB, Harrison OB, Feavers IM, Parkhill J and Maiden MC

    Department of Zoology, University of Oxford, Oxford, United Kingdom.

    The increase in the capacity and reduction in cost of whole-genome sequencing methods present the imminent prospect of such data being used routinely in real time for investigations of bacterial disease outbreaks. For this to be realized, however, it is necessary that generic, portable, and robust analysis frameworks be available, which can be readily interpreted and used in real time by microbiologists, clinicians, and public health epidemiologists. We have achieved this with a set of analysis tools integrated into the PubMLST.org website, which can in principle be used for the analysis of any pathogen. The approach is demonstrated with genomic data from isolates obtained during a well-characterized meningococcal disease outbreak at the University of Southampton, United Kingdom, that occurred in 1997. Whole-genome sequence data were collected, de novo assembled, and deposited into the PubMLST Neisseria BIGSdb database, which automatically annotated the sequences. This enabled the immediate and backwards-compatible classification of the isolates with a number of schemes, including the following: conventional, extended, and ribosomal multilocus sequence typing (MLST, eMLST, and rMLST); antigen gene sequence typing (AGST); analysis based on genes conferring antibiotic susceptibility. The isolates were also compared to a reference isolate belonging to the same clonal complex (ST-11) at 1,975 loci. Visualization of the data with the NeighborNet algorithm, implemented in SplitsTree 4 within the PubMLST website, permitted complete resolution of the outbreak and related isolates, demonstrating that multiple closely related but distinct strains were simultaneously present in asymptomatic carriage and disease, with two causing disease and one responsible for the outbreak itself.

    Journal of clinical microbiology 2012;50;9;3046-53

  • The genomic basis of adaptive evolution in threespine sticklebacks.

    Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J, Swofford R, Pirun M, Zody MC, White S, Birney E, Searle S, Schmutz J, Grimwood J, Dickson MC, Myers RM, Miller CT, Summers BR, Knecht AK, Brady SD, Zhang H, Pollen AA, Howes T, Amemiya C, Broad Institute Genome Sequencing Platform & Whole Genome Assembly Team, Baldwin J, Bloom T, Jaffe DB, Nicol R, Wilkinson J, Lander ES, Di Palma F, Lindblad-Toh K and Kingsley DM

    Department of Developmental Biology, Beckman Center B300, Stanford University School of Medicine, Stanford California 94305, USA.

    Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine-freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine-freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P50-HG002568

    Nature 2012;484;7392;55-61

  • Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.

    Jones ML, Collins MO, Goulding D, Choudhary JS and Rayner JC

    Malaria Programme, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Asexual stage Plasmodium falciparum replicates and undergoes a tightly regulated developmental process in human erythrocytes. One mechanism involved in the regulation of this process is posttranslational modification (PTM) of parasite proteins. Palmitoylation is a PTM in which cysteine residues undergo a reversible lipid modification, which can regulate target proteins in diverse ways. Using complementary palmitoyl protein purification approaches and quantitative mass spectrometry, we examined protein palmitoylation in asexual-stage P. falciparum parasites and identified over 400 palmitoylated proteins, including those involved in cytoadherence, drug resistance, signaling, development, and invasion. Consistent with the prevalence of palmitoylated proteins, palmitoylation is essential for P. falciparum asexual development and influences erythrocyte invasion by directly regulating the stability of components of the actin-myosin invasion motor. Furthermore, P. falciparum uses palmitoylation in diverse ways, stably modifying some proteins while dynamically palmitoylating others. Palmitoylation therefore plays a central role in regulating P. falciparum blood stage development.

    Funded by: Wellcome Trust: 079643/Z/06/Z, 089084

    Cell host & microbe 2012;12;2;246-58

  • Getting stuck in: protein palmitoylation in Plasmodium.

    Jones ML, Tay CL and Rayner JC

    Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Palmitoylation is the reversible post-translational addition of a lipid moiety to cysteine residues on targeted proteins. The recent use of proteomic-scale techniques to study protein palmitoylation in multiple organisms has radically changed our understanding of the diversity of proteins and signaling pathways that are affected by palmitoylation. These experiments have made clear that, similarly to phosphorylation, palmitoylation is a regulatory tool that has an impact upon a wide range of essential eukaryotic processes. A recent proteome-level analysis of protein palmitoylation in Plasmodium has revealed the importance of palmitoylation in parasite biology and has raised new and exciting questions about several Plasmodium-specific and virulence-associated processes.

    Trends in parasitology 2012

  • Dispatches from the functional phase of genome biology.

    Jostins L

    Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, UK. lj4@sanger.ac.uk.

    A report on the 25th annual meeting on The Biology of Genomes, Cold Spring Harbor, USA, 8-12 May 2012.

    Genome biology 2012;13;6;316

  • Misuse of hierarchical linear models overstates the significance of a reported association between OXTR and prosociality.

    Jostins L, Pickrell JK, MacArthur DG and Barrett JC

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;18;E1048

  • Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease.

    Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, Lee JC, Schumm LP, Sharma Y, Anderson CA, Essers J, Mitrovic M, Ning K, Cleynen I, Theatre E, Spain SL, Raychaudhuri S, Goyette P, Wei Z, Abraham C, Achkar JP, Ahmad T, Amininejad L, Ananthakrishnan AN, Andersen V, Andrews JM, Baidoo L, Balschun T, Bampton PA, Bitton A, Boucher G, Brand S, Büning C, Cohain A, Cichon S, D'Amato M, De Jong D, Devaney KL, Dubinsky M, Edwards C, Ellinghaus D, Ferguson LR, Franchimont D, Fransen K, Gearry R, Georges M, Gieger C, Glas J, Haritunians T, Hart A, Hawkey C, Hedl M, Hu X, Karlsen TH, Kupcinskas L, Kugathasan S, Latiano A, Laukens D, Lawrance IC, Lees CW, Louis E, Mahy G, Mansfield J, Morgan AR, Mowat C, Newman W, Palmieri O, Ponsioen CY, Potocnik U, Prescott NJ, Regueiro M, Rotter JI, Russell RK, Sanderson JD, Sans M, Satsangi J, Schreiber S, Simms LA, Sventoraityte J, Targan SR, Taylor KD, Tremelling M, Verspaget HW, De Vos M, Wijmenga C, Wilson DC, Winkelmann J, Xavier RJ, Zeissig S, Zhang B, Zhang CK, Zhao H, International IBD Genetics Consortium (IIBDGC), Silverberg MS, Annese V, Hakonarson H, Brant SR, Radford-Smith G, Mathew CG, Rioux JD, Schadt EE, Daly MJ, Franke A, Parkes M, Vermeire S, Barrett JC and Cho JH

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    Crohn's disease and ulcerative colitis, the two common forms of inflammatory bowel disease (IBD), affect over 2.5 million people of European ancestry, with rising prevalence in other populations. Genome-wide association studies and subsequent meta-analyses of these two diseases as separate phenotypes have implicated previously unsuspected mechanisms, such as autophagy, in their pathogenesis and showed that some IBD loci are shared with other inflammatory diseases. Here we expand on the knowledge of relevant pathways by undertaking a meta-analysis of Crohn's disease and ulcerative colitis genome-wide association scans, followed by extensive validation of significant findings, with a combined total of more than 75,000 cases and controls. We identify 71 new associations, for a total of 163 IBD loci, that meet genome-wide significance thresholds. Most loci contribute to both phenotypes, and both directional (consistently favouring one allele over the course of human history) and balancing (favouring the retention of both alleles within populations) selection effects are evident. Many IBD loci are also implicated in other immune-mediated disorders, most notably with ankylosing spondylitis and psoriasis. We also observe considerable overlap between susceptibility loci for IBD and mycobacterial infection. Gene co-expression network analysis emphasizes this relationship, with pathways shared between host responses to mycobacteria and those predisposing to IBD.

    Funded by: British Heart Foundation: G0000934; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCATS NIH HHS: UL1 TR000124, UL1 TR000124-01, UL1 TR000142; NCI NIH HHS: CA141743, R01 CA141743; NCRR NIH HHS: M01-RR00425; NIAID NIH HHS: AI062773; NIDDK NIH HHS: DK043351, DK062413, DK062420, DK062422, DK062423, DK062429, DK062429-S1, DK062431, DK062432, DK063491, DK076984, DK084554, DK83756, K23 DK097142, P01DK046763, P30 DK043351, R01 DK055731, R01 DK064869, U01 DK062418, U01 DK062420, U01 DK062422, U01 DK062429, U01 DK062431, U01 DK062432; NIGMS NIH HHS: T32GM07205; Wellcome Trust: 068545/Z/02, 083948/Z/07/Z, 085475/B/08/Z, 085475/Z/08/Z, 089120, 098051

    Nature 2012;491;7422;119-24

  • Reactome - a curated knowledgebase of biological pathways: megakaryocytes and platelets.

    Jupe S, Akkerman JW, Soranzo N and Ouwehand WH

    European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom; Department of Clinical Chemistry and Haematology, University Medical Centre Utrecht, Utrecht, the Netherlands; The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom; Department of Haematology, University of Cambridge and National Health Service Blood and Transplant, Cambridge, United Kingdom.

    The platelet field is undergoing a radical transformation from reductionist simplification to large scale integration. Following the era of simplification whereby biological processes were dissected at the molecular and atomic level, new technologies have now generated an overwhelming flow of information that can only be comprehended in an integrated approach. High throughput analyses of transcription and translation in megakaryocytes and platelets, individual analyses of membranes and secretory granules, the clustering of pathways for platelet activation and inhibition in signalosomes all add to a complexity that requires platforms for knowledge accumulation.

    Journal of thrombosis and haemostasis : JTH 2012

  • Interaction of insulin and PPAR-α genes in Alzheimer's disease: the Epistasis Project.

    Kölsch H, Lehmann DJ, Ibrahim-Verbaas CA, Combarros O, van Duijn CM, Hammond N, Belbin O, Cortina-Borja M, Lehmann MG, Aulchenko YS, Schuur M, Breteler M, Wilcock GK, Brown K, Kehoe PG, Barber R, Coto E, Alvarez V, Deloukas P, Mateo I, Maier W, Morgan K, Warden DR, Smith AD and Heun R

    Department of Psychiatry, University of Bonn, Bonn, Germany. heike.koelsch@uni-bonn.de

    Altered glucose metabolism has been described in Alzheimer's disease (AD). We re-investigated the interaction of the insulin (INS) and the peroxisome proliferator-activated receptor alpha (PPARA) genes in AD risk in the Epistasis Project, including 1,757 AD cases and 6,294 controls. Allele frequencies of both SNPs (PPARA L162V, INS intron 0 A/T) differed between Northern Europeans and Northern Spanish. The PPARA 162LL genotype increased AD risk in Northern Europeans (p = 0.04), but not in Northern Spanish (p = 0.2). There was no association of the INS intron 0 TT genotype with AD. We observed an interaction on AD risk between PPARA 162LL and INS intron 0 TT genotypes in Northern Europeans (Synergy factor 2.5, p = 0.016), but not in Northern Spanish. We suggest that dysregulation of glucose metabolism contributes to the development of AD and might be due in part to genetic variations in INS and PPARA and their interaction especially in Northern Europeans.

    Journal of neural transmission (Vienna, Austria : 1996) 2012;119;4;473-9

  • Routine use of microbial whole genome sequencing in diagnostic and public health microbiology.

    Köser CU, Ellington MJ, Cartwright EJ, Gillespie SH, Brown NM, Farrington M, Holden MT, Dougan G, Bentley SD, Parkhill J and Peacock SJ

    Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, United Kingdom.

    PLoS pathogens 2012;8;8;e1002824

  • The B10 Idd9.3 locus mediates accumulation of functionally superior CD137(+) regulatory T cells in the nonobese diabetic type 1 diabetes model.

    Kachapati K, Adams DE, Wu Y, Steward CA, Rainbow DB, Wicker LS, Mittler RS and Ridgway WM

    Division of Immunology, Allergy and Rheumatology, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA.

    CD137 is a T cell costimulatory molecule encoded by the prime candidate gene (designated Tnfrsf9) in NOD.B10 Idd9.3 congenic mice protected from type 1 diabetes (T1D). NOD T cells show decreased CD137-mediated T cell signaling compared with NOD.B10 Idd9.3 T cells, but it has been unclear how this decreased CD137 T cell signaling could mediate susceptibility to T1D. We and others have shown that a subset of regulatory T cells (Tregs) constitutively expresses CD137 (whereas effector T cells do not, and only express CD137 briefly after activation). In this study, we show that the B10 Idd9.3 region intrinsically contributes to accumulation of CD137(+) Tregs with age. NOD.B10 Idd9.3 mice showed significantly increased percentages and numbers of CD137(+) peripheral Tregs compared with NOD mice. Moreover, Tregs expressing the B10 Idd9.3 region preferentially accumulated in mixed bone marrow chimeric mice reconstituted with allotypically marked NOD and NOD.B10 Idd9.3 bone marrow. We demonstrate a possible significance of increased numbers of CD137(+) Tregs by showing functional superiority of FACS-purified CD137(+) Tregs in vitro compared with CD137(-) Tregs in T cell-suppression assays. Increased functional suppression was also associated with increased production of the alternatively spliced CD137 isoform, soluble CD137, which has been shown to suppress T cell proliferation. We show for the first time, to our knowledge, that CD137(+) Tregs are the primary cellular source of soluble CD137. NOD.B10 Idd9.3 mice showed significantly increased serum soluble CD137 compared with NOD mice with age, consistent with their increased numbers of CD137(+) Tregs with age. These studies demonstrate the importance of CD137(+) Tregs in T1D and offer a new hypothesis for how the NOD Idd9.3 region could act to increase T1D susceptibility.

    Funded by: NIAID NIH HHS: U19 AI056374, U19AI56374; Wellcome Trust: 079895, 091157

    Journal of immunology (Baltimore, Md. : 1950) 2012;189;10;5001-15

  • A Potential Novel Spontaneous Preterm Birth Gene, AR, Identified by Linkage and Association Analysis of X Chromosomal Markers.

    Karjalainen MK, Huusko JM, Ulvila J, Sotkasiira J, Luukkonen A, Teramo K, Plunkett J, Anttila V, Palotie A, Haataja R, Muglia LJ and Hallman M

    Department of Pediatrics, Institute of Clinical Medicine, University of Oulu, Oulu, Finland.

    Preterm birth is the major cause of neonatal mortality and morbidity. In many cases, it has severe life-long consequences for the health and neurological development of the newborn child. More than 50% of all preterm births are spontaneous, and currently there is no effective prevention. Several studies suggest that genetic factors play a role in spontaneous preterm birth (SPTB). However, its genetic background is insufficiently characterized. The aim of the present study was to perform a linkage analysis of X chromosomal markers in SPTB in large northern Finnish families with recurrent SPTBs. We found a significant linkage signal (HLOD = 3.72) on chromosome locus Xq13.1 when the studied phenotype was being born preterm. There were no significant linkage signals when the studied phenotype was giving preterm deliveries. Two functional candidate genes, those encoding the androgen receptor (AR) and the interleukin-2 receptor gamma subunit (IL2RG), located near this locus were analyzed as candidates for SPTB in subsequent case-control association analyses. Nine single-nucleotide polymorphisms (SNPs) within these genes and an AR exon-1 CAG repeat, which was previously demonstrated to be functionally significant, were analyzed in mothers with preterm delivery (n = 272) and their offspring (n = 269), and in mothers with exclusively term deliveries (n = 201) and their offspring (n = 199), all originating from northern Finland. A replication study population consisting of individuals born preterm (n = 111) and term (n = 197) from southern Finland was also analyzed. Long AR CAG repeats (≥26) were overrepresented and short repeats (≤19) underrepresented in individuals born preterm compared to those born at term. Thus, our linkage and association results emphasize the role of the fetal genome in genetic predisposition to SPTB and implicate AR as a potential novel fetal susceptibility gene for SPTB.

    PloS one 2012;7;12;e51378

  • Robust and sensitive analysis of mouse knockout phenotypes.

    Karp NA, Melvin D, Sanger Mouse Genetics Project and Mott RF

    Mouse Informatics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    A significant challenge of in-vivo studies is the identification of phenotypes with a method that is robust and reliable. The challenge arises from practical issues that lead to experimental designs which are not ideal. Breeding issues, particularly in the presence of fertility or fecundity problems, frequently lead to data being collected in multiple batches. This problem is acute in high throughput phenotyping programs. In addition, in a high throughput environment operational issues lead to controls not being measured on the same day as knockouts. We highlight how application of traditional methods, such as a Student's t-Test or a 2-way ANOVA, in these situations gives flawed results and should not be used. We explore the use of mixed models using worked examples from Sanger Mouse Genome Project focusing on Dual-Energy X-Ray Absorptiometry data for the analysis of mouse knockout data and compare to a reference range approach. We show that mixed model analysis is more sensitive and less prone to artefacts allowing the discovery of subtle quantitative phenotypes essential for correlating a gene's function to human disease. We demonstrate how a mixed model approach has the additional advantage of being able to include covariates, such as body weight, to separate effect of genotype from these covariates. This is a particular issue in knockout studies, where body weight is a common phenotype and will enhance the precision of assigning phenotypes and the subsequent selection of lines for secondary phenotyping. The use of mixed models with in-vivo studies has value not only in improving the quality and sensitivity of the data analysis but also ethically as a method suitable for small batches which reduces the breeding burden of a colony. This will reduce the use of animals, increase throughput, and decrease cost whilst improving the quality and depth of knowledge gained.

    PloS one 2012;7;12;e52410

  • The fallacy of ratio correction to address confounding factors.

    Karp NA, Segonds-Pichon A, Gerdin AK, Ramírez-Solis R and White JK

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Scientists aspire to measure cause and effect. Unfortunately confounding variables, ones that are associated with both the probable cause and the outcome, can lead to an association that is true but potentially misleading. For example, altered body weight is often observed in a gene knockout; however, many other variables, such as lean mass, will also change as the body weight changes. This leaves the researcher asking whether the change in that variable is expected for that change in weight. Ratio correction, which is often referred to as normalization, is a method used commonly to remove the effect of a confounding variable. Although ratio correction is used widely in biological research, it is not the method recommended in the statistical literature to address confounding factors; instead regression methods such as the analysis of covariance (ANCOVA) are proposed. This method examines the difference in means after adjusting for the confounding relationship. Using real data, this manuscript demonstrates how the ratio correction approach is flawed and can result in erroneous calls of significance leading to inappropriate biological conclusions. This arises as some of the underlying assumptions are not met. The manuscript goes on to demonstrate that researchers should use ANCOVA, and discusses how graphical tools can be used readily to judge the robustness of this method. This study is therefore a clear example of why assumption testing is an important component of a study and thus why it is included in the Animal Research: Reporting of In Vivo Experiment (ARRIVE) guidelines.

    Funded by: Wellcome Trust: WT077157/Z/05/Z

    Laboratory animals 2012;46;3;245-52

  • Using genome-wide complex trait analysis to quantify 'missing heritability' in Parkinson's disease.

    Keller MF, Saad M, Bras J, Bettella F, Nicolaou N, Simón-Sánchez J, Mittag F, Büchel F, Sharma M, Gibbs JR, Schulte C, Moskvina V, Durr A, Holmans P, Kilarski LL, Guerreiro R, Hernandez DG, Brice A, Ylikotila P, Stefánsson H, Majamaa K, Morris HR, Williams N, Gasser T, Heutink P, Wood NW, Hardy J, Martinez M, Singleton AB, Nalls MA and for the International Parkinson's Disease Genomics Consortium (IPDGC) and The Wellcome Trust Case Control Consortium 2 (WTCCC2)

    A full list of The International Parkinson's Disease Genomics Consortium (IPDGC) and The Wellcome Trust Case Control Consortium 2 (WTCCC2) members and affiliations appears at the end of this manuscript.

    Genome-wide association studies (GWASs) have been successful at identifying single-nucleotide polymorphisms (SNPs) highly associated with common traits; however, a great deal of the heritable variation associated with common traits remains unaccounted for within the genome. Genome-wide complex trait analysis (GCTA) is a statistical method that applies a linear mixed model to estimate phenotypic variance of complex traits explained by genome-wide SNPs, including those not associated with the trait in a GWAS. We applied GCTA to 8 cohorts containing 7096 case and 19 455 control individuals of European ancestry in order to examine the missing heritability present in Parkinson's disease (PD). We meta-analyzed our initial results to produce robust heritability estimates for PD types across cohorts. Our results identify 27% (95% CI 17-38, P = 8.08E-08) phenotypic variance associated with all types of PD, 15% (95% CI -0.2 to 33, P = 0.09) phenotypic variance associated with early-onset PD and 31% (95% CI 17-44, P = 1.34E-05) phenotypic variance associated with late-onset PD. This is a substantial increase from the genetic variance identified by top GWAS hits alone (between 3 and 5%) and indicates there are substantially more risk loci to be identified. Our results suggest that although GWASs are a useful tool in identifying the most common variants associated with complex disease, a great deal of common variants of small effect remain to be discovered.

    Human molecular genetics 2012;21;22;4996-5009

  • Reply to "Human genetic studies on osteoarthritis from clinicians' viewpoints".

    Kerkhof HJ, Evangelou E, Meulenbelt I, van Meurs JB, Zeggini E and Valdes AM

    Osteoarthritis and cartilage / OARS, Osteoarthritis Research Society 2012;20;3;250-1; author reply 252

  • Avidity-based extracellular interaction screening (AVEXIS) for the scalable detection of low-affinity extracellular receptor-ligand interactions.

    Kerr JS and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute.

    Extracellular protein:protein interactions between secreted or membrane-tethered proteins are critical for both initiating intercellular communication and ensuring cohesion within multicellular organisms. Proteins predicted to form extracellular interactions are encoded by approximately a quarter of human genes, but despite their importance and abundance, the majority of these proteins have no documented binding partner. Primarily, this is due to their biochemical intractability: membrane-embedded proteins are difficult to solubilise in their native conformation and contain structurally-important posttranslational modifications. Also, the interaction affinities between receptor proteins are often characterised by extremely low interaction strengths (half-lives < 1 second) precluding their detection with many commonly-used high throughput methods. Here, we describe an assay, AVEXIS (AVidity-based EXtracellular Interaction Screen) that overcomes these technical challenges enabling the detection of very weak protein interactions (t(1/2) ≤ 0.1 sec) with a low false positive rate. The assay is usually implemented in a high throughput format to enable the systematic screening of many thousands of interactions in a convenient microtitre plate format (Fig. 1). It relies on the production of soluble recombinant protein libraries that contain the ectodomain fragments of cell surface receptors or secreted proteins within which to screen for interactions; therefore, this approach is suitable for type I, type II, GPI-linked cell surface receptors and secreted proteins but not for multipass membrane proteins such as ion channels or transporters. The recombinant protein libraries are produced using a convenient and high-level mammalian expression system, to ensure that important posttranslational modifications such as glycosylation and disulphide bonds are added. 
    Expressed recombinant proteins are secreted into the medium and produced in two forms: a biotinylated bait which can be captured on a streptavidin-coated solid phase suitable for screening, and a pentamerised enzyme-tagged (β-lactamase) prey. The bait and prey proteins are presented to each other in a binary fashion to detect direct interactions between them, similar to a conventional ELISA (Fig. 1). The pentamerisation of the proteins in the prey is achieved through a peptide sequence from the cartilage oligomeric matrix protein (COMP) and increases the local concentration of the ectodomains thereby providing significant avidity gains to enable even very transient interactions to be detected. By normalising the activities of both the bait and prey to predetermined levels prior to screening, we have shown that interactions having monomeric half-lives of 0.1 sec can be detected with low false positive rates.

    Funded by: Wellcome Trust: 077108

    Journal of visualized experiments : JoVE 2012;61;e3881

  • Genome-wide association study identifies multiple loci influencing human serum metabolite levels.

    Kettunen J, Tukiainen T, Sarin AP, Ortega-Alonso A, Tikkanen E, Lyytikäinen LP, Kangas AJ, Soininen P, Würtz P, Silander K, Dick DM, Rose RJ, Savolainen MJ, Viikari J, Kähönen M, Lehtimäki T, Pietiläinen KH, Inouye M, McCarthy MI, Jula A, Eriksson J, Raitakari OT, Salomaa V, Kaprio J, Järvelin MR, Peltonen L, Perola M, Freimer NB, Ala-Korpela M, Palotie A and Ripatti S

    Institute for Molecular Medicine Finland, University of Helsinki, Finland.

    Nuclear magnetic resonance assays allow for measurement of a wide range of metabolic phenotypes. We report here the results of a GWAS on 8,330 Finnish individuals genotyped and imputed at 7.7 million SNPs for a range of 216 serum metabolic phenotypes assessed by NMR of serum samples. We identified significant associations (P < 2.31 × 10⁻¹⁰) at 31 loci, including 11 for which there have not been previous reports of associations to a metabolic trait or disorder. Analyses of Finnish twin pairs suggested that the metabolic measures reported here show higher heritability than comparable conventional metabolic phenotypes. In accordance with our expectations, SNPs at the 31 loci associated with individual metabolites account for a greater proportion of the genetic component of trait variance (up to 40%) than is typically observed for conventional serum metabolic phenotypes. The identification of such associations may provide substantial insight into cardiometabolic disorders.

    Funded by: Medical Research Council: G0500539, G0600705; NHLBI NIH HHS: 5R01HL087679; NIAAA NIH HHS: AA-08315, AA-09203, AA-12502, AA-15416, R01 AA015416, R37 AA012502; NIMH NIH HHS: 1RL1MH083268; Wellcome Trust: 089062/Z/09/Z, 090532, 098051, 89061/Z/09/Z, GR069224

    Nature genetics 2012;44;3;269-76

  • How next-generation sequencing is transforming complex disease genetics.

    Kilpinen H and Barrett JC

    Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva 1211, Switzerland; Swiss Institute of Bioinformatics, Geneva 1211, Switzerland.

    Progress in understanding the genetics of human disease is closely tied to technological developments in DNA sequencing. Recently, next-generation technology has transformed the scale of sequencing; compared to the methods used in the Human Genome Project, modern sequencers are 50000-fold faster. Complex disease genetics presents an immediate opportunity to use this technology to move from approaches using only partial information (linkage and genome-wide association studies, GWAS) to complete analysis of the relationship between genomic variation and phenotype. We first describe sequence-based improvements to existing study designs, followed by prioritization of both samples and genomic regions to be sequenced, and then address the ultimate goal of analyzing thousands of whole-genome sequences. Finally, we discuss how the same technology will also fundamentally change the way we understand the biological mechanisms underlying disease associations discovered through sequencing.

    Trends in genetics : TIG 2012

  • Properties of a Novel PBP2A Protein Homolog from Staphylococcus aureus Strain LGA251 and Its Contribution to the β-Lactam-resistant Phenotype.

    Kim C, Milheiriço C, Gardete S, Holmes MA, Holden MT, de Lencastre H and Tomasz A

    From the Laboratory of Microbiology and Infectious Diseases, The Rockefeller University, New York, New York 10065.

    Methicillin-resistant Staphylococcus aureus (MRSA) strains show strain-to-strain variation in resistance level, in genetic background, and also in the structure of the chromosomal cassette (SCCmec) that carries the resistance gene mecA. In contrast, strain-to-strain variation in the sequence of the mecA determinant was found to be much more limited among MRSA isolates examined so far. The first exception to this came with the recent identification of MRSA strain LGA251, which carries a new homolog of this gene together with regulatory elements mecI/mecR that also have novel, highly divergent structures. After cloning and purification in Escherichia coli, PBP2A(LGA), the protein product of the new mecA homolog, showed aberrant mobility in SDS-PAGE, structural instability and loss of activity at 37 °C, and a higher relative affinity for oxacillin as compared with cefoxitin. The mecA homolog free of its regulatory elements was cloned into a plasmid and introduced into the background of the β-lactam-susceptible S. aureus strain COL-S. In this background, the mecA homolog expressed a high-level resistance to cefoxitin (MIC = 400 μg/ml) and a somewhat lower resistance to oxacillin (minimal inhibitory concentration = 200 μg/ml). Similar to PBP2A, the protein homolog PBP2A(LGA) was able to replace the essential function of the S. aureus PBP2 for growth. In contrast to PBP2A, PBP2A(LGA) did not depend on the transglycosylase activity of the native PBP2 for expression of high level resistance to oxacillin, suggesting that the PBP2A homolog may preferentially cooperate with a monofunctional transglycosylase as the alternative source of transglycosylase activity.

    The Journal of biological chemistry 2012;287;44;36854-63

  • De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia.

    Kirov G, Pocklington AJ, Holmans P, Ivanov D, Ikeda M, Ruderfer D, Moran J, Chambert K, Toncheva D, Georgieva L, Grozeva D, Fjodorova M, Wollerton R, Rees E, Nikolov I, van de Lagemaat LN, Bayés A, Fernandez E, Olason PI, Böttcher Y, Komiyama NH, Collins MO, Choudhary J, Stefansson K, Stefansson H, Grant SG, Purcell S, Sklar P, O'Donovan MC and Owen MJ

    Department of Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK. kirov@cardiff.ac.uk

    A small number of rare, recurrent genomic copy number variants (CNVs) are known to substantially increase susceptibility to schizophrenia. As a consequence of the low fecundity in people with schizophrenia and other neurodevelopmental phenotypes to which these CNVs contribute, CNVs with large effects on risk are likely to be rapidly removed from the population by natural selection. Accordingly, such CNVs must frequently occur as recurrent de novo mutations. In a sample of 662 schizophrenia proband-parent trios, we found that rare de novo CNV mutations were significantly more frequent in cases (5.1% all cases, 5.5% family history negative) compared with 2.2% among 2623 controls, confirming the involvement of de novo CNVs in the pathogenesis of schizophrenia. Eight de novo CNVs occurred at four known schizophrenia loci (3q29, 15q11.2, 15q13.3 and 16p11.2). De novo CNVs of known pathogenic significance in other genomic disorders were also observed, including deletion at the TAR (thrombocytopenia absent radius) region on 1q21.1 and duplication at the WBS (Williams-Beuren syndrome) region at 7q11.23. Multiple de novos spanned genes encoding members of the DLG (discs large) family of membrane-associated guanylate kinases (MAGUKs) that are components of the postsynaptic density (PSD). Two de novos also affected EHMT1, a histone methyl transferase known to directly regulate DLG family members. Using a systems biology approach and merging novel CNV and proteomics data sets, systematic analysis of synaptic protein complexes showed that, compared with control CNVs, case de novos were significantly enriched for the PSD proteome (P=1.72 × 10⁻⁶). This was largely explained by enrichment for members of the N-methyl-D-aspartate receptor (NMDAR) (P=4.24 × 10⁻⁶) and neuronal activity-regulated cytoskeleton-associated protein (ARC) (P=3.78 × 10⁻⁸) postsynaptic signalling complexes. In an analysis of 18 492 subjects (7907 cases and 10 585 controls), case CNVs were enriched for members of the NMDAR complex (P=0.0015) but not ARC (P=0.14). Our data indicate that defects in NMDAR postsynaptic signalling and, possibly, ARC complexes, which are known to be important in synaptic plasticity and cognition, play a significant role in the pathogenesis of schizophrenia.

    Funded by: Medical Research Council: G0800509; NIMH NIH HHS: MH066392-05A1

    Molecular psychiatry 2012;17;2;142-53

  • Gene expression profiles in white blood cells of volunteers exposed to a 50 Hz electromagnetic field.

    Kirschenlohr H, Ellis P, Hesketh R and Metcalfe J

    Department of Biochemistry, University of Cambridge, Sanger Building, Cambridge, CB2 1GA, United Kingdom.

    Consistent and independently replicated laboratory evidence to support a causative relationship between environmental exposure to extremely low-frequency electromagnetic fields (EMFs) at power line frequencies and the associated increase in risk of childhood leukemia has not been obtained. In particular, although gene expression responses have been reported in a wide variety of cells, none has emerged as robust, widely replicated effects. DNA microarrays facilitate comprehensive searches for changes in gene expression without a requirement to select candidate responsive genes. To determine if gene expression changes occur in white blood cells of volunteers exposed to an ELF-EMF, each of 17 pairs of male volunteers age 20-30 was subjected either to a 50 Hz EMF exposure of 62.0 ± 7.1 μT for 2 h or to a sham exposure (0.21 ± 0.05 μT) at the same time (11:00 to 13:00). The alternative regime for each volunteer was repeated on the following day and the two-day sequence was repeated 6 days later, with the exception that a null exposure (0.085 ± 0.01 μT) replaced the sham exposure. Five blood samples (10 ml) were collected at 2 h intervals from 9:00 to 17:00 with five additional samples during the exposure and sham or null exposure periods on each study day. RNA samples were pooled for the same time on each study day for the group of 17 volunteers that were subjected to the ELF-EMF exposure/sham or null exposure sequence and were analyzed on Illumina microarrays. Time courses for 16 mammalian genes previously reported to be responsive to ELF-EMF exposure, including immediate early genes, stress response, cell proliferation and apoptotic genes were examined in detail. No genes or gene sets showed consistent response profiles to repeated ELF-EMF exposures. A stress response was detected as a transient increase in plasma cortisol at the onset of either exposure or sham exposure on the first study day. The cortisol response diminished progressively on subsequent exposures or sham exposures, and was attributable to mild stress associated with the experimental protocol.

    Radiation research 2012;178;3;138-49

  • Structural genomics plucks high-hanging membrane proteins.

    Kloppmann E, Punta M and Rost B

    Department of Bioinformatics and Computational Biology, Technical University Munich, Boltzmannstr. 3, 85748 Garching/Munich, Germany; New York Consortium on Membrane Protein Structure, New York Structural Biology Center, 89 Convent Avenue, New York, NY 10027, USA.

    Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the set up of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in absence of homology.

    Current opinion in structural biology 2012

  • The impact of diabetes on the pathogenesis of sepsis.

    Koh GC, Peacock SJ, van der Poll T and Wiersinga WJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. gavin.koh@gmail.com

    Diabetes is associated with an increased susceptibility to infection and sepsis. Conflicting data exist on whether the mortality of patients with sepsis is influenced by the presence of diabetes, fuelling the ongoing debate on the benefit of tight glucose regulation in patients with sepsis. The main reason for which diabetes predisposes to infection appears to be abnormalities of the host response, particularly in neutrophil chemotaxis, adhesion and intracellular killing, defects that have been attributed to the effect of hyperglycaemia. There is also evidence for defects in humoral immunity, and this may play a larger role than previously recognised. We review the literature on the immune response in diabetes and its potential contribution to the pathogenesis of sepsis. In addition, the effect of diabetes treatment on the immune response is discussed, with specific reference to insulin, metformin, sulphonylureas and thiazolidinediones.

    Funded by: Wellcome Trust: 086532/Z/08/Z

    European journal of clinical microbiology & infectious diseases : official publication of the European Society of Clinical Microbiology 2012;31;4;379-88

  • Comparative genomics of vancomycin-resistant Staphylococcus aureus strains and their positions within the clade most commonly associated with Methicillin-resistant S. aureus hospital-acquired infection in the United States.

    Kos VN, Desjardins CA, Griggs A, Cerqueira G, Van Tonder A, Holden MT, Godfrey P, Palmer KL, Bodi K, Mongodin EF, Wortman J, Feldgarden M, Lawley T, Gill SR, Haas BJ, Birren B and Gilmore MS

    Department of Ophthalmology, Harvard Microbial Sciences Initiative, Harvard Medical School, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts, USA.

    Methicillin-resistant Staphylococcus aureus (MRSA) strains are leading causes of hospital-acquired infections in the United States, and clonal cluster 5 (CC5) is the predominant lineage responsible for these infections. Since 2002, there have been 12 cases of vancomycin-resistant S. aureus (VRSA) infection in the United States, all CC5 strains. To understand this genetic background and what distinguishes it from other lineages, we generated and analyzed high-quality draft genome sequences for all available VRSA strains. Sequence comparisons show unambiguously that each strain independently acquired Tn1546 and that all VRSA strains last shared a common ancestor over 50 years ago, well before the occurrence of vancomycin resistance in this species. In contrast to existing hypotheses on what predisposes this lineage to acquire Tn1546, the barrier posed by restriction systems appears to be intact in most VRSA strains. However, VRSA (and other CC5) strains were found to possess a constellation of traits that appears to be optimized for proliferation in precisely the types of polymicrobic infection where transfer could occur. They lack a bacteriocin operon that would be predicted to limit the occurrence of non-CC5 strains in mixed infection and harbor a cluster of unique superantigens and lipoproteins to confound host immunity. A frameshift in dprA, which in other microbes influences uptake of foreign DNA, may also make this lineage conducive to foreign DNA acquisition. IMPORTANCE: Invasive methicillin-resistant Staphylococcus aureus (MRSA) infection now ranks among the leading causes of death in the United States. Vancomycin is a key last-line bactericidal drug for treating these infections. However, since 2002, vancomycin resistance has entered this species. Of the now 12 cases of vancomycin-resistant S. aureus (VRSA), each was believed to represent a new acquisition of the vancomycin-resistant transposon Tn1546 from enterococcal donors. All acquisitions of Tn1546 so far have occurred in MRSA strains of the clonal cluster 5 genetic background, the most common hospital lineage causing hospital-acquired MRSA infection. To understand the nature of these strains, we determined and examined the nucleotide sequences of the genomes of all available VRSA. Genome comparison identified candidate features that position strains of this lineage well for acquiring resistance to antibiotics in mixed infection.

    Funded by: NEI NIH HHS: EY017381, R01 EY017381; NIAID NIH HHS: AI083214, P01 AI083214, R01 AI072360; PHS HHS: HHSN272200700055C, HHSN27220090018C

    mBio 2012;3;3

  • Transposon mutagenesis identifies genes that transform neural stem cells into glioma-initiating cells.

    Koso H, Takeda H, Yew CC, Ward JM, Nariai N, Ueno K, Nagasaki M, Watanabe S, Rust AG, Adams DJ, Copeland NG and Jenkins NA

    Division of Genetics and Genomics, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore 138673.

    Neural stem cells (NSCs) are considered to be the cell of origin of glioblastoma multiforme (GBM). However, the genetic alterations that transform NSCs into glioma-initiating cells remain elusive. Using a unique transposon mutagenesis strategy that mutagenizes NSCs in culture, followed by additional rounds of mutagenesis to generate tumors in vivo, we have identified genes and signaling pathways that can transform NSCs into glioma-initiating cells. Mobilization of Sleeping Beauty transposons in NSCs induced the immortalization of astroglial-like cells, which were then able to generate tumors with characteristics of the mesenchymal subtype of GBM on transplantation, consistent with a potential astroglial origin for mesenchymal GBM. Sequence analysis of transposon insertion sites from tumors and immortalized cells identified more than 200 frequently mutated genes, including human GBM-associated genes, such as Met and Nf1, and made it possible to discriminate between genes that function during astroglial immortalization vs. later stages of tumor development. We also functionally validated five GBM candidate genes using a previously undescribed high-throughput method. Finally, we show that even clonally related tumors derived from the same immortalized line have acquired distinct combinations of genetic alterations during tumor development, suggesting that tumor formation in this model system involves competition among genetically variant cells, which is similar to the Darwinian evolutionary processes now thought to generate many human cancers. This mutagenesis strategy is faster and simpler than conventional transposon screens and can potentially be applied to any tissue stem/progenitor cells that can be grown and differentiated in vitro.

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;44;E2998-3007

  • Novel mutations consolidate KCTD7 as a progressive myoclonus epilepsy gene.

    Kousi M, Anttila V, Schulz A, Calafato S, Jakkula E, Riesch E, Myllykangas L, Kalimo H, Topçu M, Gökben S, Alehan F, Lemke JR, Alber M, Palotie A, Kopra O and Lehesjoki AE

    Folkhälsan Institute of Genetics, Biomedicum Helsinki, PO Box 63, Haartmaninkatu 8, University of Helsinki, FIN-00014 Helsinki, Finland.

    Background: The progressive myoclonus epilepsies (PMEs) comprise a group of clinically and genetically heterogeneous disorders characterised by myoclonus, epilepsy, and neurological deterioration. This study aimed to identify the underlying gene(s) in childhood onset PME patients with unknown molecular genetic background.

    Methods: Homozygosity mapping was applied on genome-wide single nucleotide polymorphism data of 18 Turkish patients. The potassium channel tetramerisation domain-containing 7 (KCTD7) gene, previously associated with PME in a single inbred family, was screened for mutations. The spatiotemporal expression of KCTD7 was assessed in cellular cultures and mouse brain tissue.

    Results: Overlapping homozygosity in 8/18 patients defined a 1.5 Mb segment on 7q11.21 as the major candidate locus. Screening of the positional candidate gene KCTD7 revealed homozygous missense mutations in two of the eight cases. Screening of KCTD7 in a further 132 PME patients revealed four additional mutations (two missense, one in-frame deletion, and one frameshift-causing) in five families. Eight patients presented with myoclonus and epilepsy and one with ataxia, the mean age of onset being 19 months. Within 2 years after onset, progressive loss of mental and motor skills ensued leading to severe dementia and motor handicap. KCTD7 showed cytosolic localisation and predominant neuronal expression, with widespread expression throughout the brain. None of three polypeptides carrying patient missense mutations affected the subcellular distribution of KCTD7.

    Discussion: These data confirm the causality of KCTD7 defects in PME, and imply that KCTD7 mutation screening should be considered in PME patients with onset around 2 years of age followed by rapid mental and motor deterioration.

    Journal of medical genetics 2012;49;6;391-9

  • The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium.

    Kröger C, Dillon SC, Cameron AD, Papenfort K, Sivasankaran SK, Hokamp K, Chao Y, Sittka A, Hébrard M, Händler K, Colgan A, Leekitcharoenphon P, Langridge GC, Lohan AJ, Loftus B, Lucchini S, Ussery DW, Dorman CJ, Thomson NR, Vogel J and Hinton JC

    Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin 2, Ireland.

    More than 50 y of research have provided great insight into the physiology, metabolism, and molecular biology of Salmonella enterica serovar Typhimurium (S. Typhimurium), but important gaps in our knowledge remain. It is clear that a precise choreography of gene expression is required for Salmonella infection, but basic genetic information such as the global locations of transcription start sites (TSSs) has been lacking. We combined three RNA-sequencing techniques and two sequencing platforms to generate a robust picture of transcription in S. Typhimurium. Differential RNA sequencing identified 1,873 TSSs on the chromosome of S. Typhimurium SL1344 and 13% of these TSSs initiated antisense transcripts. Unique findings include the TSSs of the virulence regulators phoP, slyA, and invF. Chromatin immunoprecipitation revealed that RNA polymerase was bound to 70% of the TSSs, and two-thirds of these TSSs were associated with σ(70) (including phoP, slyA, and invF) from which we identified the -10 and -35 motifs of σ(70)-dependent S. Typhimurium gene promoters. Overall, we corrected the location of important genes and discovered 18 times more promoters than identified previously. S. Typhimurium expresses 140 small regulatory RNAs (sRNAs) at early stationary phase, including 60 newly identified sRNAs. Almost half of the experimentally verified sRNAs were found to be unique to the Salmonella genus, and <20% were found throughout the Enterobacteriaceae. This description of the transcriptional map of SL1344 advances our understanding of S. Typhimurium, arguably the most important bacterial infection model.

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;20;E1277-86

  • Integrated molecular profiles of invasive breast tumors and ductal carcinoma in situ (DCIS) reveal differential vascular and interleukin signaling.

    Kristensen VN, Vaske CJ, Ursini-Siegel J, Van Loo P, Nordgard SH, Sachidanandam R, Sørlie T, Wärnberg F, Haakensen VD, Helland Å, Naume B, Perou CM, Haussler D, Troyanskaya OG and Børresen-Dale AL

    Department of Genetics, Institute for Cancer Research, and Cancer Clinic, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway.

    We use an integrated approach to understand breast cancer heterogeneity by modeling mRNA, copy number alterations, microRNAs, and methylation in a pathway context utilizing the pathway recognition algorithm using data integration on genomic models (PARADIGM). We demonstrate that combining mRNA expression and DNA copy number classified the patients in groups that provide the best predictive value with respect to prognosis and identified key molecular and stromal signatures. A chronic inflammatory signature, which promotes the development and/or progression of various epithelial tumors, is uniformly present in all breast cancers. We further demonstrate that within the adaptive immune lineage, the strongest predictor of good outcome is the acquisition of a gene signature that favors a high T-helper 1 (Th1)/cytotoxic T-lymphocyte response at the expense of Th2-driven humoral immunity. Patients who have breast cancer with a basal HER2-negative molecular profile (PDGM2) are characterized by high expression of protumorigenic Th2/humoral-related genes (24-38%) and a low Th1/Th2 ratio. The luminal molecular subtypes are again differentiated by low or high FOXM1 and ERBB4 signaling. We show that the interleukin signaling profiles observed in invasive cancers are absent or weakly expressed in healthy tissue but already prominent in ductal carcinoma in situ, together with ECM and cell-cell adhesion regulating pathways. The most prominent difference between low and high mammographic density in healthy breast tissue by PARADIGM was that of STAT4 signaling. In conclusion, by means of a pathway-based modeling methodology (PARADIGM) integrating different layers of molecular data from whole-tumor samples, we demonstrate that we can stratify immune signatures that predict patient survival.

    Funded by: Howard Hughes Medical Institute; NCI NIH HHS: P50-CA58223, R01 CA138255, R01-CA138255

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;8;2802-7

  • Auditory function in the Tc1 mouse model of Down syndrome suggests a limited region of human chromosome 21 involved in otitis media.

    Kuhn S, Ingham N, Pearson S, Gribble SM, Clayton S, Steel KP and Marcotti W

    Department of Biomedical Science, University of Sheffield, Sheffield, United Kingdom.

    Down syndrome is one of the most common congenital disorders leading to a wide range of health problems in humans, including frequent otitis media. The Tc1 mouse carries a significant part of human chromosome 21 (Hsa21) in addition to the full set of mouse chromosomes and shares many phenotypes observed in humans affected by Down syndrome with trisomy of chromosome 21. However, it is unknown whether Tc1 mice exhibit a hearing phenotype and might thus represent a good model for understanding the hearing loss that is common in Down syndrome. In this study we carried out a structural and functional assessment of hearing in Tc1 mice. Auditory brainstem response (ABR) measurements in Tc1 mice showed normal thresholds compared to littermate controls and ABR waveform latencies and amplitudes were equivalent to controls. The gross anatomy of the middle and inner ears was also similar between Tc1 and control mice. The physiological properties of cochlear sensory receptors (inner and outer hair cells: IHCs and OHCs) were investigated using single-cell patch clamp recordings from the acutely dissected cochleae. Adult Tc1 IHCs exhibited normal resting membrane potentials and expressed all K(+) currents characteristic of control hair cells. However, the size of the large conductance (BK) Ca(2+) activated K(+) current (I(K,f)), which enables rapid voltage responses essential for accurate sound encoding, was increased in Tc1 IHCs. All physiological properties investigated in OHCs were indistinguishable between the two genotypes. The normal functional hearing and the gross structural anatomy of the middle and inner ears in the Tc1 mouse contrast to that observed in the Ts65Dn model of Down syndrome which shows otitis media. Genes that are trisomic in Ts65Dn but disomic in Tc1 may predispose to otitis media when an additional copy is active.

    Funded by: Action on Hearing Loss: G41; Wellcome Trust: 077189, 088719

    PloS one 2012;7;2;e31433

  • Rapid Turnover of Long Noncoding RNAs and the Evolution of Gene Expression.

    Kutter C, Watt S, Stefflova K, Wilson MD, Goncalves A, Ponting CP, Odom DT and Marques AC

    Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom.

    A large proportion of functional sequence within mammalian genomes falls outside protein-coding exons and can be transcribed into long RNAs. However, the roles in mammalian biology of long noncoding RNA (lncRNA) are not well understood. Few lncRNAs have experimentally determined roles, with some of these being lineage-specific. Determining the extent by which transcription of lncRNA loci is retained or lost across multiple evolutionary lineages is essential if we are to understand their contribution to mammalian biology and to lineage-specific traits. Here, we experimentally investigated the conservation of lncRNA expression among closely related rodent species, allowing the evolution of DNA sequence to be uncoupled from evolution of transcript expression. We generated total RNA (RNAseq) and H3K4me3-bound (ChIPseq) DNA data, and combined both to construct catalogues of transcripts expressed in the adult liver of Mus musculus domesticus (C57BL/6J), Mus musculus castaneus, and Rattus norvegicus. We estimated the rate of transcriptional turnover of lncRNAs and investigated the effects of their lineage-specific birth or death. LncRNA transcription showed considerably greater gain and loss during rodent evolution, compared with protein-coding genes. Nucleotide substitution rates were found to mirror the in vivo transcriptional conservation of intergenic lncRNAs between rodents: only the sequences of noncoding loci with conserved transcription were constrained. Finally, we found that lineage-specific intergenic lncRNAs appear to be associated with modestly elevated expression of genomically neighbouring protein-coding genes. Our findings show that nearly half of intergenic lncRNA loci have been gained or lost since the last common ancestor of mouse and rat, and they predict that such rapid transcriptional turnover contributes to the evolution of tissue- and lineage-specific gene expression.

    PLoS genetics 2012;8;7;e1002841

  • The population pharmacokinetics of R- and S-warfarin: effect of genetic and clinical factors.

    Lane S, Al-Zubiedi S, Hatch E, Matthews I, Jorgensen AL, Deloukas P, Daly AK, Park BK, Aarons L, Ogungbenro K, Kamali F, Hughes D and Pirmohamed M

    Department of Biostatistics, Brownlow Street, University of Liverpool, Liverpool L69 3GS, UK. slane@liverpool.ac.uk

    Background: Warfarin is a drug with a narrow therapeutic index and large interindividual variability in daily dosing requirements. Patients commencing warfarin treatment are at risk of bleeding due to excessive anticoagulation caused by overdosing. The interindividual variability in dose requirements is influenced by a number of factors, including polymorphisms in genes mediating warfarin pharmacology, co-medication, age, sex, body size and diet.

    Aims: To develop population pharmacokinetic models of both R- and S-warfarin using clinical and genetic factors and to identify the covariates which influence the interindividual variability in the pharmacokinetic parameters of clearance and volume of distribution in patients on long-term warfarin therapy.

    Methods: Patients commencing warfarin therapy were followed up for 26 weeks. Plasma warfarin enantiomer concentrations were determined in 306 patients for S-warfarin and in 309 patients for R-warfarin at 1, 8 and 26 weeks. Patients were also genotyped for CYP2C9 variants (CYP2C9*1,*2 and *3), two single-nucleotide polymorphisms (SNPs) in CYP1A2, one SNP in CYP3A4 and six SNPs in CYP2C19. A base pharmacokinetic model was developed using NONMEM software to determine the warfarin clearance and volume of distribution. The model was extended to include covariates that influenced the between-subject variability.

    Results: Bodyweight, age, sex and CYP2C9 genotype significantly influenced S-warfarin clearance. The S-warfarin clearance was estimated to be 0.144 l h⁻¹ (95% confidence interval 0.131, 0.157) in a 70 kg woman aged 69.8 years with the wild-type CYP2C9 genotype, and the volume of distribution was 16.6 l (95% confidence interval 13.5, 19.7). Bodyweight and age, along with the SNPs rs3814637 (in CYP2C19) and rs2242480 (in CYP3A4), significantly influenced R-warfarin clearance. The R-warfarin clearance was estimated to be 0.125 l h⁻¹ (95% confidence interval 0.115, 0.135) in a 70 kg individual aged 69.8 years with the wild-type CYP2C19 and CYP3A4 genotypes, and the volume of distribution was 10.9 l (95% confidence interval 8.63, 13.2).

    Conclusions: Our analysis, based on exposure rather than dose, provides quantitative estimates of the clinical and genetic factors impacting on the clearance of both the S- and R-enantiomers of warfarin, which can be used in developing improved dosing algorithms.

    Funded by: Department of Health; Wellcome Trust

    British journal of clinical pharmacology 2012;73;1;66-76

  • Targeted restoration of the intestinal microbiota with a simple, defined bacteriotherapy resolves relapsing Clostridium difficile disease in mice.

    Lawley TD, Clare S, Walker AW, Stares MD, Connor TR, Raisen C, Goulding D, Rad R, Schreiber F, Brandt C, Deakin LJ, Pickard DJ, Duncan SH, Flint HJ, Clark TG, Parkhill J and Dougan G

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom. tl2@sanger.ac.uk

    Relapsing C. difficile disease in humans is linked to a pathological imbalance within the intestinal microbiota, termed dysbiosis, which remains poorly understood. We show that mice infected with epidemic C. difficile (genotype 027/BI) develop highly contagious, chronic intestinal disease and persistent dysbiosis characterized by a distinct, simplified microbiota containing opportunistic pathogens and altered metabolite production. Chronic C. difficile 027/BI infection was refractory to vancomycin treatment leading to relapsing disease. In contrast, treatment of C. difficile 027/BI infected mice with feces from healthy mice rapidly restored a diverse, healthy microbiota and resolved C. difficile disease and contagiousness. We used this model to identify a simple mixture of six phylogenetically diverse intestinal bacteria, including novel species, which can re-establish a health-associated microbiota and clear C. difficile 027/BI infection from mice. Thus, targeting a dysbiotic microbiota with a defined mixture of phylogenetically diverse bacteria can trigger major shifts in the microbial community structure that displaces C. difficile and, as a result, resolves disease and contagiousness. Further, we demonstrate a rational approach to harness the therapeutic potential of health-associated microbial communities to treat C. difficile disease and potentially other forms of intestinal dysbiosis.

    Funded by: Medical Research Council: 93614, G0901743; Wellcome Trust: 076964, 098051

    PLoS pathogens 2012;8;10;e1002995

  • Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS).

    Lawton J, Brugat T, Yan YX, Reid AJ, Böhme U, Otto TD, Pain A, Jackson A, Berriman M, Cunningham D, Preiser P and Langhorne J

    Division of Parasitology, MRC National Institute for Medical Research, London, UK.

    Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required.

    Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages.

    Conclusions: The large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicate that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein functions, including their contribution to antigenic variation and immune evasion.

    Funded by: Medical Research Council: MC_EX_G0901345, U117584248

    BMC genomics 2012;13;125

  • Comprehensive sequence analysis of nine Usher syndrome genes in the UK National Collaborative Usher Study.

    Le Quesne Stabej P, Saihan Z, Rangesh N, Steele-Stallard HB, Ambrose J, Coffey A, Emmerson J, Haralambous E, Hughes Y, Steel KP, Luxon LM, Webster AR and Bitner-Glindzicz M

    Clinical and Molecular Genetics, Institute of Child Health, UCL, London, UK.

    Background: Usher syndrome (USH) is an autosomal recessive disorder comprising retinitis pigmentosa, hearing loss and, in some cases, vestibular dysfunction. It is clinically and genetically heterogeneous with three distinctive clinical types (I-III) and nine Usher genes identified. This study is a comprehensive clinical and genetic analysis of 172 Usher patients and evaluates the contribution of digenic inheritance.

    Methods: The genes MYO7A, USH1C, CDH23, PCDH15, USH1G, USH2A, GPR98, WHRN, CLRN1 and the candidate gene SLC4A7 were sequenced in 172 UK Usher patients, regardless of clinical type.

    Results: No subject had definite mutations (nonsense, frameshift or consensus splice site mutations) in two different USH genes. Novel missense variants were classified UV1-4 (unclassified variant): UV4 is 'probably pathogenic', based on control frequency <0.23%, identification in trans to a pathogenic/probably pathogenic mutation and segregation with USH in only one family; and UV3 ('likely pathogenic') as above, but no information on phase. Overall 79% of identified pathogenic/UV4/UV3 variants were truncating and 21% were missense changes. MYO7A accounted for 53.2%, and USH1C for 14.9% of USH1 families (USH1C:c.496+1G>A being the most common USH1 mutation in the cohort). USH2A was responsible for 79.3% of USH2 families and GPR98 for only 6.6%. No mutations were found in USH1G, WHRN or SLC4A7.

    Conclusions: One or two pathogenic/likely pathogenic variants were identified in 86% of cases. No convincing cases of digenic inheritance were found. It is concluded that digenic inheritance does not make a significant contribution to Usher syndrome; the observation of multiple variants in different genes is likely to reflect polymorphic variation, rather than digenic effects.

    Funded by: Wellcome Trust

    Journal of medical genetics 2012;49;1;27-36

  • Characterising chromosome rearrangements: recent technical advances in molecular cytogenetics.

    Le Scouarnec S and Gribble SM

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. sls2@sanger.ac.uk

    Genomic rearrangements can result in losses, amplifications, translocations and inversions of DNA fragments thereby modifying genome architecture, and potentially having clinical consequences. Many genomic disorders caused by structural variation have initially been uncovered by early cytogenetic methods. The last decade has seen significant progression in molecular cytogenetic techniques, allowing rapid and precise detection of structural rearrangements on a whole-genome scale. The high resolution attainable with these recently developed techniques has also uncovered the role of structural variants in normal genetic variation alongside single-nucleotide polymorphisms (SNPs). We describe how array-based comparative genomic hybridisation, SNP arrays, array painting and next-generation sequencing analytical methods (read depth, read pair and split read) allow the extensive characterisation of chromosome rearrangements in human genomes.

    Funded by: Wellcome Trust: WT098051

    Heredity 2012;108;1;75-85

  • Genotype-based test in mapping cis-regulatory variants from allele-specific expression data.

    Lefebvre JF, Vello E, Ge B, Montgomery SB, Dermitzakis ET, Pastinen T and Labuda D

    Centre de Recherche du CHU Sainte-Justine, Université de Montréal, Montréal, Québec, Canada.

    Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation [1] and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which make them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning genotypes of individuals exhibiting AI and those not expressing AI in a 2×3 contingency table. The performance of this test to detect linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing the simulated and the empirical AI datasets. In simulation experiments, the genotype-based test outperforms the haplotype-based tests with the increasing distance separating the regulatory region from its regulated transcript. The genotype-based test performed equally well with the experimental AI datasets, either from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need of haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.

    PloS one 2012;7;6;e38667
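
    The 2×3 partitioning described in the abstract above (individuals with and without AI, split by genotype at a candidate SNP) can be illustrated with a generic Pearson chi-square test. This is a minimal sketch only: the paper's exact test statistic may differ, and the function name and genotype counts below are made up for illustration.

```python
import math


def chi_square_2x3(table):
    """Pearson chi-square test on a 2x3 table of counts.

    Rows: individuals showing AI, individuals not showing AI.
    Columns: genotype counts at the tested SNP (e.g. AA, AB, BB).
    Returns (statistic, p_value), assuming 2 degrees of freedom,
    i.e. every genotype column is non-empty.
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2x3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table is empty")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical counts: heterozygotes are enriched among individuals with AI.
ai_counts = [5, 30, 5]
no_ai_counts = [25, 10, 25]
stat, p_value = chi_square_2x3([ai_counts, no_ai_counts])
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")
```

    Because only unphased genotype counts enter the table, no haplotype inference is needed, which is the property the abstract emphasises.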

  • Transferrin and HFE genes interact in Alzheimer's disease risk: the Epistasis Project.

    Lehmann DJ, Schuur M, Warden DR, Hammond N, Belbin O, Kölsch H, Lehmann MG, Wilcock GK, Brown K, Kehoe PG, Morris CM, Barker R, Coto E, Alvarez V, Deloukas P, Mateo I, Gwilliam R, Combarros O, Arias-Vásquez A, Aulchenko YS, Ikram MA, Breteler MM, van Duijn CM, Oulhaj A, Heun R, Cortina-Borja M, Morgan K, Robson K and Smith AD

    Oxford Project to Investigate Memory and Ageing, University Department of Physiology, Anatomy and Genetics, Oxford, UK. donald.lehmann@pharm.ox.ac.uk

    Iron overload may contribute to the risk of Alzheimer's disease (AD). In the Epistasis Project, with 1757 cases of AD and 6295 controls, we studied 4 variants in 2 genes of iron metabolism: hemochromatosis (HFE) C282Y and H63D, and transferrin (TF) C2 and -2G/A. We replicated the reported interaction between HFE 282Y and TF C2 in the risk of AD: synergy factor, 1.75 (95% confidence interval, 1.1-2.8, p = 0.02) in Northern Europeans. The synergy factor was 3.1 (1.4-6.9; 0.007) in subjects with the APOEε4 allele. We found another interaction, between HFE 63HH and TF -2AA, markedly modified by age. Both interactions were found mainly or only in Northern Europeans. The interaction between HFE 282Y and TF C2 has now been replicated twice, in altogether 2313 cases of AD and 7065 controls, and has also been associated with increased iron load. We therefore suggest that iron overload may be a causative factor in the development of AD. Treatment for iron overload might thus be protective in some cases.

    Funded by: Medical Research Council: G0400074, G0400546, G0502157, G0900652, G1100540

    Neurobiology of aging 2012;33;1;202.e1-13
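
    The synergy factor quoted in the abstract above is commonly defined as the odds ratio for carrying both factors divided by the product of the odds ratios for each factor alone. The sketch below assumes that definition and a simple Wald-type confidence interval on the log scale; the function name, group labels and counts are hypothetical and are not taken from the study.

```python
import math

GROUPS = ("neither", "a_only", "b_only", "both")


def synergy_factor(cases, controls, z=1.96):
    """Synergy factor with an approximate confidence interval.

    cases / controls: dicts of counts keyed by GROUPS, where 'a_only'
    means carrying factor A only (e.g. one risk variant), and so on.
    All eight counts must be non-zero.
    """
    def odds_ratio(group):
        # Odds ratio of each group relative to carriers of neither factor.
        return (cases[group] * controls["neither"]) / (
            controls[group] * cases["neither"]
        )

    sf = odds_ratio("both") / (odds_ratio("a_only") * odds_ratio("b_only"))
    # Standard error of ln(SF): square root of the summed reciprocal counts.
    se = math.sqrt(sum(1.0 / cases[g] + 1.0 / controls[g] for g in GROUPS))
    log_sf = math.log(sf)
    return sf, math.exp(log_sf - z * se), math.exp(log_sf + z * se)


# Made-up counts for illustration only.
cases = {"neither": 100, "a_only": 50, "b_only": 40, "both": 70}
controls = {"neither": 200, "a_only": 100, "b_only": 80, "both": 40}
sf, lower, upper = synergy_factor(cases, controls)
print(f"SF = {sf:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

    A synergy factor above 1 whose interval excludes 1 suggests that the two factors together raise risk more than their separate effects would predict on a multiplicative scale.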

  • A cornucopia of candidates for deafness.

    Lewis MA and Steel KP

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Many genes involved in deafness are yet to be discovered. Here, Senthilan et al. focus on the Drosophila Johnston's organ to uncover a wide variety of genes, including several unexpected candidates as well as those already known to underlie deafness in mice and humans.

    Cell 2012;150;5;879-81

  • Six novel susceptibility loci for early-onset androgenetic alopecia and their unexpected association with common diseases.

    Li R, Brockschmidt FF, Kiefer AK, Stefansson H, Nyholt DR, Song K, Vermeulen SH, Kanoni S, Glass D, Medland SE, Dimitriou M, Waterworth D, Tung JY, Geller F, Heilmann S, Hillmer AM, Bataille V, Eigelshoven S, Hanneken S, Moebus S, Herold C, den Heijer M, Montgomery GW, Deloukas P, Eriksson N, Heath AC, Becker T, Sulem P, Mangino M, Vollenweider P, Spector TD, Dedoussis G, Martin NG, Kiemeney LA, Mooser V, Stefansson K, Hinds DA, Nöthen MM and Richards JB

    Departments of Medicine, Human Genetics, Epidemiology, and Biostatistics, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Quebec, Canada.

    Androgenetic alopecia (AGA) is a highly heritable condition and the most common form of hair loss in humans. Susceptibility loci have been described on the X chromosome and chromosome 20, but these loci explain a minority of its heritable variance. We conducted a large-scale meta-analysis of seven genome-wide association studies for early-onset AGA in 12,806 individuals of European ancestry. While replicating the two AGA loci on the X chromosome and chromosome 20, six novel susceptibility loci reached genome-wide significance (p = 2.62×10(-9)-1.01×10(-12)). Unexpectedly, we identified a risk allele at 17q21.31 that was recently associated with Parkinson's disease (PD) at a genome-wide significant level. We then tested the association between early-onset AGA and the risk of PD in a cross-sectional analysis of 568 PD cases and 7,664 controls. Early-onset AGA cases had significantly increased odds of subsequent PD (OR = 1.28, 95% confidence interval: 1.06-1.55, p = 8.9×10(-3)). Further, the AGA susceptibility alleles at the 17q21.31 locus are on the H1 haplotype, which is under negative selection in Europeans and has been linked to decreased fertility. Combining the risk alleles of six novel and two established susceptibility loci, we created a genotype risk score and tested its association with AGA in an additional sample. Individuals in the highest risk quartile of a genotype score had an approximately six-fold increased risk of early-onset AGA [odds ratio (OR) = 5.78, p = 1.4×10(-88)]. Our results highlight unexpected associations between early-onset AGA, Parkinson's disease, and decreased fertility, providing important insights into the pathophysiology of these conditions.

    PLoS genetics 2012;8;5;e1002746

  • Design and implementation of ProteinWorldDB

    Lifschitz S, Viana CJM, Tristão C, Catanho M, Degrave WM, De Miranda AB, Bezerra M and Otto TD

    PUC-Rio, Departamento de Informática, Rio de Janeiro, RJ, Brazil; Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Rio de Janeiro, RJ, Brazil; Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Rio de Janeiro, RJ, Brazil; Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    This work involves the comparison of protein information on a genomic scale. The main goal is to improve the quality and interpretation of biological data, as well as our understanding of biological systems and their interactions. Stringent comparisons were obtained by applying the Smith-Waterman algorithm in a pairwise manner to all predicted proteins encoded in both completely sequenced and unfinished genomes available in the public database RefSeq. Comparisons were run on a computational grid, and the complete result exceeds 900 GB. Consequently, database system design is a critical step in storing and managing the comparison results. This paper describes conceptual design issues for a database that represents a data set of protein sequence cross-comparisons. We show that our conceptual schema and its relational mapping enable users to extract relevant information, from simple to complex queries integrating distinct data sources.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)  2012;7409;144-55
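
    The pairwise Smith-Waterman comparison mentioned in the abstract above can be sketched as a small dynamic program. This is an illustrative score-only version with a simple match/mismatch/linear-gap scheme; these scoring parameters are assumptions, and protein comparisons in practice typically use a substitution matrix and affine gap penalties, which the sketch omits.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix, so memory grows with len(b) rather than len(a) * len(b).
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))  # → 12
```

    Running such a comparison for every pair of predicted proteins scales quadratically with the number of sequences, which is consistent with the abstract's use of a computational grid and a result set of hundreds of gigabytes.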

  • The 2-methylcitrate cycle is implicated in the detoxification of propionate in Toxoplasma gondii.

    Limenitakis J, Oppenheim RD, Creek DJ, Foth BJ, Barrett MP and Soldati-Favre D

    Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU 1 Rue Michel Servet, 1211, Geneva, Switzerland.

    Toxoplasma gondii belongs to the coccidian subgroup of the Apicomplexa phylum. The Coccidia are obligate intracellular pathogens that establish infection in their mammalian host via the enteric route. These parasites lack a mitochondrial pyruvate dehydrogenase complex but have preserved the degradation of branched-chain amino acids (BCAA) as a possible pathway to generate acetyl-CoA. Importantly, degradation of leucine, isoleucine and valine could lead to concomitant accumulation of propionyl-CoA, a toxic metabolite that inhibits cell growth. Like fungi and bacteria, the Coccidia possess the complete set of enzymes necessary to metabolize and detoxify propionate by oxidation to pyruvate via the 2-methylcitrate cycle (2-MCC). Phylogenetic analysis provides evidence that the 2-MCC was acquired via horizontal gene transfer. In T. gondii tachyzoites, this pathway is split between the cytosol and the mitochondrion. Although the rate-limiting enzyme 2-methylisocitrate lyase is dispensable for parasite survival, its substrates accumulate in parasites deficient in the enzyme and its absence confers increased sensitivity to propionic acid. BCAA is also dispensable in tachyzoites, leaving unresolved the source of mitochondrial acetyl-CoA.

    Molecular microbiology 2012

  • Muscle diseases in the zebrafish.

    Lin YY

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, United Kingdom. YYL@sanger.ac.uk

    Animal models in biomedical research are important for understanding the pathological mechanisms of human diseases at a molecular and cellular level. Several aspects of mammalian animals, however, may limit their use in modelling neuromuscular disorders. Many attributes of zebrafish (Danio rerio) are complementary to mammalian experimental systems, establishing the zebrafish as a powerful model organism in disease biology. This review focuses on a number of key studies using the zebrafish to model hereditary muscle diseases with additional emphasis on recent advances in zebrafish functional genomics and drug discovery. Increasing research in zebrafish disease models, combined with knowledge from mammalian models, will bring novel insights into the disease pathogenesis of neuromuscular disorders, as well as facilitate the development of effective therapeutic strategies.

    Funded by: Wellcome Trust

    Neuromuscular disorders : NMD 2012;22;8;673-84

  • Mosaic overgrowth with fibroadipose hyperplasia is caused by somatic activating mutations in PIK3CA.

    Lindhurst MJ, Parker VE, Payne F, Sapp JC, Rudge S, Harris J, Witkowski AM, Zhang Q, Groeneveld MP, Scott CE, Daly A, Huson SM, Tosi LL, Cunningham ML, Darling TN, Geer J, Gucev Z, Sutton VR, Tziotzios C, Dixon AK, Helliwell T, O'Rahilly S, Savage DB, Wakelam MJ, Barroso I, Biesecker LG and Semple RK

    The National Human Genome Research Institute, US National Institutes of Health, Bethesda, Maryland, USA.

    The phosphatidylinositol 3-kinase (PI3K)-AKT signaling pathway is critical for cellular growth and metabolism. Correspondingly, loss of function of PTEN, a negative regulator of PI3K, or activating mutations in AKT1, AKT2 or AKT3 have been found in distinct disorders featuring overgrowth or hypoglycemia. We performed exome sequencing of DNA from unaffected and affected cells from an individual with an unclassified syndrome of congenital progressive segmental overgrowth of fibrous and adipose tissue and bone and identified the cancer-associated mutation encoding p.His1047Leu in PIK3CA, the gene that encodes the p110α catalytic subunit of PI3K, only in affected cells. Sequencing of PIK3CA in ten additional individuals with overlapping syndromes identified either the p.His1047Leu alteration or a second cancer-associated alteration, p.His1047Arg, in nine cases. Affected dermal fibroblasts showed enhanced basal and epidermal growth factor (EGF)-stimulated phosphatidylinositol 3,4,5-trisphosphate (PIP(3)) generation and concomitant activation of downstream signaling relative to their unaffected counterparts. Our findings characterize a distinct overgrowth syndrome, biochemically demonstrate activation of PI3K signaling and thereby identify a rational therapeutic target.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; NCATS NIH HHS: UL1 TR000423; Wellcome Trust: 077016, 078986, 078986/Z/06/Z, 080952, 091551, 091551/Z/10/Z, 095515, 097721, 097721/Z/11/Z, 098051/Z/05/Z, 80952/Z/06/Z

    Nature genetics 2012;44;8;928-33

  • Metal binding in proteins: Machine learning complements X-ray absorption spectroscopy

    Lippi M, Passerini A, Punta M and Frasconi P

    Lecture Notes in Computer Science 2012;7524;854-857

  • High prevalence of posterior polymorphous corneal dystrophy in the Czech Republic; linkage disequilibrium mapping and dating an ancestral mutation.

    Liskova P, Gwilliam R, Filipec M, Jirsova K, Reinstein Merjava S, Deloukas P, Webb TR, Bhattacharya SS, Ebenezer ND, Morris AG and Hardcastle AJ

    Laboratory of the Biology and Pathology of the Eye, Institute of Inherited Metabolic Diseases, First Faculty of Medicine, Charles University in Prague and General University Hospital in Prague, Prague, Czech Republic ; Department of Ophthalmology, First Faculty of Medicine, Charles University in Prague and General University Hospital in Prague, Prague, Czech Republic ; UCL Institute of Ophthalmology, London, United Kingdom.

    Posterior polymorphous corneal dystrophy (PPCD) is a rare autosomal dominant genetically heterogeneous disorder. Nineteen Czech PPCD pedigrees with 113 affected family members were identified, and 17 of these kindreds were genotyped for markers on chromosome 20p12.1-20q12. Comparison of haplotypes in 81 affected members, 20 unaffected first degree relatives and 13 spouses, as well as 55 unrelated controls, supported the hypothesis of a shared ancestor in 12 families originating from one geographic location. In 38 affected individuals from nine of these pedigrees, a common haplotype was observed between D20S48 and D20S107 spanning approximately 23 Mb, demonstrating segregation of disease with the PPCD1 locus. This haplotype was not detected in 110 ethnically matched control chromosomes. Within the common founder haplotype, a core mini-haplotype was detected for D20S605, D20S182 and M189K2 in all 67 affected members from families 1-12, however alleles representing the core mini-haplotype were also detected in population matched controls. The most likely location of the responsible gene within the disease interval, and estimated mutational age, were inferred by linkage disequilibrium mapping (DMLE+2.3). The appearance of a disease-causing mutation was dated between 64-133 generations. The inferred ancestral locus carrying a PPCD1 disease-causing variant within the disease interval spans 60 Kb on 20p11.23, which contains a single known protein coding gene, ZNF133. However, direct sequence analysis of coding and untranslated exons did not reveal a potential pathogenic mutation. Microdeletion or duplication was also excluded by comparative genomic hybridization using a dense chromosome 20 specific array. Geographical origin, haplotype and statistical analysis suggest that in 14 unrelated families an as yet undiscovered mutation on 20p11.23 was inherited from a common ancestor. Prevalence of PPCD in the Czech Republic appears to be the highest worldwide and our data suggests that at least one other novel locus for PPCD also exists.

    PloS one 2012;7;9;e45495

  • Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis.

    Liu JZ, Almarri MA, Gaffney DJ, Mells GF, Jostins L, Cordell HJ, Ducker SJ, Day DB, Heneghan MA, Neuberger JM, Donaldson PT, Bathgate AJ, Burroughs A, Davies MH, Jones DE, Alexander GJ, Barrett JC, Sandford RN, Anderson CA, UK Primary Biliary Cirrhosis (PBC) Consortium and Wellcome Trust Case Control Consortium 3

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We genotyped 2,861 cases of primary biliary cirrhosis (PBC) from the UK PBC Consortium and 8,514 UK population controls across 196,524 variants within 186 known autoimmune risk loci. We identified 3 loci newly associated with PBC (at P<5×10(-8)), increasing the number of known susceptibility loci to 25. The most associated variant at 19p12 is a low-frequency nonsynonymous SNP in TYK2, further implicating JAK-STAT and cytokine signaling in disease pathogenesis. An additional five loci contained nonsynonymous variants in high linkage disequilibrium (LD; r2>0.8) with the most associated variant at the locus. We found multiple independent common, low-frequency and rare variant association signals at five loci. Of the 26 independent non-human leukocyte antigen (HLA) signals tagged on the Immunochip, 15 have SNPs in B-lymphoblastoid open chromatin regions in high LD (r2>0.8) with the most associated variant. This study shows how data from dense fine-mapping arrays coupled with functional genomic data can be used to identify candidate causal variants for functional follow-up.

    Funded by: Medical Research Council: G0000934, G0500020, G0800460, G0802068; NIDDK NIH HHS: U01-DK-062418; Wellcome Trust: 068545/Z/02, 076113/C/04/Z, 085925/Z/08/Z, 098051, WT090355/A/09/Z, WT090355/B/09/Z

    Nature genetics 2012;44;10;1137-41

  • Expression of chemosensory proteins in the tsetse fly Glossina morsitans morsitans is related to female host-seeking behaviour.

    Liu R, He X, Lehane S, Lehane M, Hertz-Fowler C, Berriman M, Field LM and Zhou JJ

    Department of Biological Chemistry, Rothamsted Research, Harpenden, UK.

    Chemosensory proteins (CSPs) are a class of soluble proteins present in high concentrations in the sensilla of insect antennae. It has been proposed that they play an important role in insect olfaction by mediating interactions between odorants and odorant receptors. Here we report, for the first time, the presence of five CSP genes in the tsetse fly Glossina morsitans morsitans, a major vector transmitting nagana in livestock. Real-time quantitative reverse transcription PCR showed that three of the CSPs are expressed in antennae. One of them, GmmCSP2, is transcribed at a very high level and could be involved in olfaction. We also determined expression in the antennae of both males and females at different life stages and with different blood feeding regimes. The transcription of GmmCSP2 was lower in male antennae than in females, with a sharp increase in 10-week-old flies, 48 h after a bloodmeal. Thus there is a clear relationship between CSP gene transcription and host searching behaviour. Genome annotation and phylogenetic analyses comparing G. morsitans morsitans CSPs with those of other Diptera showed rapid evolution after speciation of mosquitoes.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: WT085775/Z/08/Z

    Insect molecular biology 2012;21;1;41-8

  • Learned recognition of maternal signature odors mediates the first suckling episode in mice.

    Logan DW, Brunet LJ, Webb WR, Cutforth T, Ngai J and Stowers L

    Department of Cell Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.

    Background: Soon after birth, all mammals must initiate milk suckling to survive. In rodents, this innate behavior is critically dependent on uncharacterized maternally derived chemosensory ligands. Recently, the first pheromone sufficient to initiate suckling was isolated from the rabbit. Identification of the olfactory cues that trigger first suckling in the mouse would provide the means to determine the neural mechanisms that generate innate behavior.

    Results: Here we use behavioral analysis, metabolomics, and calcium imaging of primary sensory neurons and find no evidence of ligands with intrinsic bioactivity, such as pheromones, acting to promote first suckling in the mouse. Instead, we find that the initiation of suckling is dependent on variable blends of maternal "signature odors" that are learned and recognized prior to first suckling.

    Conclusions: As observed with pheromone-mediated behavior, the response to signature odors releases innate behavior. However, this mechanism tolerates variability in both the signaling ligands and sensory neurons, which may maximize the probability that this first essential behavior is successfully initiated. These results suggest that mammalian species have evolved multiple strategies to ensure the onset of this critical behavior.

    Funded by: NIDCD NIH HHS: R01 DC006885, R01 DC009413; Wellcome Trust: 098051

    Current biology : CB 2012;22;21;1998-2007

  • GeneDB--an annotation database for pathogens.

    Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. fl2@sanger.ac.uk

    GeneDB (http://www.genedb.org) is a genome database for prokaryotic and eukaryotic pathogens and closely related organisms. The resource provides a portal to genome sequence and annotation data, which is primarily generated by the Pathogen Genomics group at the Wellcome Trust Sanger Institute. It combines data from completed and ongoing genome projects with curated annotation, which is readily accessible from a web based resource. The development of the database in recent years has focused on providing database-driven annotation tools and pipelines, as well as catering for increasingly frequent assembly updates. The website has been significantly redesigned to take advantage of current web technologies, and improve usability. The current release stores 41 data sets, of which 17 are manually curated and maintained by biologists, who review and incorporate data from the scientific literature, as well as other sources. GeneDB is primarily a production and annotation database for the genomes of predominantly pathogenic organisms.

    Funded by: Wellcome Trust: WT 043565, WT 085775/Z/08/Z

    Nucleic acids research 2012;40;Database issue;D98-108

  • A combined functional annotation score for non-synonymous variants.

    Lopes MC, Joyce C, Ritchie GR, John SL, Cunningham F, Asimit J and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK. ml10@sanger.ac.uk

    Aims: Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious or neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools: PolyPhen-2 and SIFT, in order to improve the prediction of the effect of non-synonymous coding variants.

    Methods: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined 2 dataset pairs to train and test CAROL using information from the dbSNP: 'HGMD-PUBLIC' and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls.

    Results: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than each individual annotation tool (PolyPhen-2 and SIFT) and benefits from higher coverage.

    Conclusion: The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.

    Funded by: Wellcome Trust: 095908, 098051, WT088885/Z/09/Z

    Human heredity 2012;73;1;47-51
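
    The abstract above describes combining two probabilistic scores with a weighted Z method. The sketch below shows a generic weighted Z (Stouffer-type) combination and is not CAROL's published formula: the function name, the weights, and the conversion of each tool's score into a "probability of being neutral" are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def weighted_z(p_neutral, weights, eps=1e-12):
    """Combine per-tool probabilities with a weighted Z method.

    p_neutral: probabilities (one per tool) that the variant is neutral.
    weights:   non-negative weights, one per tool (hypothetical values).
    Returns (combined_z, combined_p); a small combined_p suggests the
    variant is deleterious.
    """
    if len(p_neutral) != len(weights) or not p_neutral:
        raise ValueError("need one weight per probability")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("at least one weight must be non-zero")
    z_scores = []
    for p in p_neutral:
        p = min(max(p, eps), 1.0 - eps)  # keep inv_cdf inside (0, 1)
        z_scores.append(_STD_NORMAL.inv_cdf(1.0 - p))
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / norm
    return combined_z, 1.0 - _STD_NORMAL.cdf(combined_z)


# Assumed conversion: a SIFT score is read as P(tolerated), and a
# PolyPhen-2 score as P(damaging), hence 1 - score for the latter.
sift_score, polyphen_score = 0.02, 0.97
z, p = weighted_z([sift_score, 1.0 - polyphen_score], weights=[1.0, 1.0])
print(f"combined Z = {z:.2f}, combined p = {p:.4f}")
```

    When both tools agree, the combined evidence is stronger than either alone; when one tool has no prediction, a scheme of this kind can still return a score from the other, which is one way a combined score could gain coverage.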

  • Do genome-wide association scans have potential for translation?

    Lopes MC, Zeggini E and Panoutsopoulou K

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    The success of genome-wide association studies (GWAS) in identifying replicating associations has greatly contributed to understanding of the genetic aetiology of complex diseases. This review discusses and provides examples of the potential of GWAS findings to be translated into clinical practice, i.e., diagnosis, prediction, prognosis, novel treatments and response to treatment of common diseases. The biological insights afforded by newly-identified robust associations represent the largest, albeit indirect, translational contribution of GWAS.

    Funded by: Arthritis Research UK: 18030; Wellcome Trust: WT088885/Z/09/Z

    Clinical chemistry and laboratory medicine : CCLM / FESCC 2012;50;2;255-60

  • Mice deficient in the H+-ATPase a4 subunit have severe hearing impairment associated with enlarged endolymphatic compartments within the inner ear.

    Lorente-Cánovas B, Ingham N, Norgett EE, Golder ZJ, Karet Frankl FE and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Mutations in the ATP6V0A4 gene lead to autosomal recessive distal renal tubular acidosis in patients who often show sensorineural hearing impairment. A first Atp6v0a4 knockout mouse model that recapitulates the loss of H(+)-ATPase function seen in humans has been generated and recently published (Norgett et al., 2012). Here we present the first detailed analysis of the structure and function of the auditory system in Atp6v0a4(-/-) knockout mice. Measurements of the auditory brainstem response (ABR) showed significantly elevated thresholds in homozygous mutant mice, which indicate severe hearing impairment. Heterozygote thresholds were normal. Analysis of paint-filled inner ears and sections from E16.5 embryos revealed a marked expansion of cochlear and endolymphatic ducts in Atp6v0a4(-/-) mice. A regulatory link between Atp6v0a4, Foxi1 and Pds has been reported and we found that the endolymphatic sac of Atp6v0a4(-/-) mice expresses both Foxi1 and Pds, which suggests a downstream position of Atp6v0a4. These mutants also showed a lack of endocochlear potential, suggesting a functional defect of the stria vascularis on the lateral wall of the cochlear duct. However, the main K(+) channels involved in the generation of endocochlear potential, Kcnj10 and Kcnq1, are strongly expressed in Atp6v0a4(-/-) mice. Our results lead to a better understanding of the role of this proton pump in hearing function.

    Disease models & mechanisms 2012

  • Community gene annotation in practice.

    Loveland JE, Gilbert JG, Griffiths E and Harrow JL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. jel@sanger.ac.uk

    Manual annotation of genomic data is extremely valuable to produce an accurate reference gene set but is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools that have been developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the 'Blessed' annotator and 'Gatekeeper' approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. DATABASE URL: http://vega.sanger.ac.uk/index.html.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F02195X/1; Wellcome Trust: WT077198

    Database : the journal of biological databases and curation 2012;2012;bas009

  • MiR-25 Regulates Wwp2 and Fbxw7 and Promotes Reprogramming of Mouse Fibroblast Cells to iPSCs.

    Lu D, Davis MP, Abreu-Goodger C, Wang W, Campos LS, Siede J, Vigorito E, Skarnes WC, Dunham I, Enright AJ and Liu P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Background: miRNAs are a class of small non-coding RNAs that regulate gene expression and have critical functions in various biological processes. Hundreds of miRNAs have been identified in mammalian genomes but only a small number of them have been functionally characterized. Recent studies also demonstrate that some miRNAs have important roles in reprogramming somatic cells to induced pluripotent stem cells (iPSCs). Methods: We screened 52 miRNAs cloned in a piggybac (PB) vector for their roles in reprogramming of mouse embryonic fibroblast cells to iPSCs. To identify targets of miRNAs, we made Dgcr8-deficient embryonic stem (ES) cells and introduced miRNA mimics to these cells, which lack miRNA biogenesis. The direct target genes of miRNA were identified through global gene expression analysis and target validation. We found that over-expressing miR-25 or introducing miR-25 mimics enhanced production of iPSCs. We identified a number of miR-25 candidate gene targets. Of particular interest were two ubiquitin ligases, Wwp2 and Fbxw7, which have been proposed to regulate Oct4, c-Myc and Klf5, respectively. Our findings thus highlight the complex interplay between miRNAs and transcription factors involved in reprogramming, stem cell self-renewal and maintenance of pluripotency.

    PloS one 2012;7;8;e40938

  • Genome-wide Transcriptome Profiling Reveals the Functional Impact of Rare De Novo and Recurrent CNVs in Autism Spectrum Disorders.

    Luo R, Sanders SJ, Tian Y, Voineagu I, Huang N, Chu SH, Klei L, Cai C, Ou J, Lowe JK, Hurles ME, Devlin B, State MW and Geschwind DH

    Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.

    Copy-number variants (CNVs) are a major contributor to the pathophysiology of autism spectrum disorders (ASDs), but the functional impact of CNVs remains largely unexplored. Because brain tissue is not available from most samples, we interrogated gene expression in lymphoblasts from 244 families with discordant siblings in the Simons Simplex Collection in order to identify potentially pathogenic variation. Our results reveal that the overall frequency of significantly misexpressed genes (which we refer to here as outliers) identified in probands and unaffected siblings does not differ. However, in probands, but not their unaffected siblings, the group of outlier genes is significantly enriched in neural-related pathways, including neuropeptide signaling, synaptogenesis, and cell adhesion. We demonstrate that outlier genes cluster within the most pathogenic CNVs (rare de novo CNVs) and can be used for the prioritization of rare CNVs of potentially unknown significance. Several nonrecurrent CNVs with significant gene-expression alterations are identified (these include deletions in chromosomal regions 3q27, 3p13, and 3p26 and duplications at 2p15), suggesting that these are potential candidate ASD loci. In addition, we identify distinct expression changes in 16p11.2 microdeletions, 16p11.2 microduplications, and 7q11.23 duplications, and we show that specific genes within the 16p CNV interval correlate with differences in head circumference, an ASD-relevant phenotype. This study provides evidence that pathogenic structural variants have a functional impact via transcriptome alterations in ASDs at a genome-wide level and demonstrates the utility of integrating gene expression with mutation data for the prioritization of genes disrupted by potentially pathogenic mutations.

    American journal of human genetics 2012

  • A balanced translocation truncates Neurotrimin in a family with intracranial and thoracic aortic aneurysm.

    Luukkonen TM, Pöyhönen M, Palotie A, Ellonen P, Lagström S, Lee JH, Terwilliger JD, Salonen R and Varilo T

    Institute for Molecular Medicine Finland FIMM, Helsinki, Finland.

    Background: Balanced chromosomal rearrangements occasionally have strong phenotypic effects, which may be useful in understanding pathobiology. However, conventional strategies for characterising breakpoints are laborious and inaccurate. We present here a proband with a thoracic aortic aneurysm (TAA) and a balanced translocation t(10;11) (q23.2;q24.2). Our purpose was to sequence the chromosomal breaks in this family to reveal a novel candidate gene for aneurysm.

    Intracranial aneurysm (IA) and TAAs appear to run in the family in an autosomal dominant manner: After exploring the family history, we observed that the proband's two siblings both died from cerebral haemorrhage, and the proband's parent and parent's sibling died from aortic rupture. After application of a genome-wide paired-end DNA sequencing method for breakpoint mapping, we demonstrate that this translocation breaks intron 1 of a splicing isoform of Neurotrimin at 11q25 in a previously implicated candidate region for IAs and AAs (OMIM 612161).

    Conclusions: Our results demonstrate the feasibility of genome-wide paired-end sequencing for the characterisation of balanced rearrangements and identification of candidate genes in patients with potentially disease-associated chromosome rearrangements. The family samples were gathered as a part of our recently launched National Registry of Reciprocal Balanced Translocations and Inversions in Finland (n=2575), and we believe that such a registry will be a powerful resource for the localisation of chromosomal aberrations, which can bring insight into the aetiology of related phenotypes.

    Funded by: NIMH NIH HHS: MH084995, R01 MH084995; Wellcome Trust: 098051

    Journal of medical genetics 2012;49;10;621-9

  • Estimating reassortment rates in co-circulating Eurasian swine influenza viruses.

    Lycett SJ, Baillie G, Coulter E, Bhatt S, Kellam P, McCauley JW, Wood JL, Brown IH, Pybus OG, Leigh Brown AJ and Combating Swine Influenza Initiative-COSI Consortium

    Institute of Evolutionary Biology, University of Edinburgh, Kings Buildings, West Mains Road, Edinburgh EH9 3JT, UK. samantha.lycett@ed.ac.uk

    Swine have often been considered as a mixing vessel for different influenza strains. In order to assess their role in more detail, we undertook a retrospective sequencing study to detect and characterize the reassortants present in European swine and to estimate the rate of reassortment between H1N1, H1N2 and H3N2 subtypes with Eurasian (avian-like) internal protein-coding segments. We analysed 69 newly obtained whole genome sequences of subtypes H1N1-H3N2 from swine influenza viruses sampled between 1982 and 2008, using Illumina and 454 platforms. Analyses of these genomes, together with previously published genomes, revealed a large monophyletic clade of Eurasian swine-lineage polymerase segments containing H1N1, H1N2 and H3N2 subtypes. We subsequently examined reassortments between the haemagglutinin and neuraminidase segments and estimated the reassortment rates between lineages using a recently developed evolutionary analysis method. High rates of reassortment between H1N2 and H1N1 Eurasian swine lineages were detected in European strains, with an average of one reassortment every 2-3 years. This rapid reassortment results from co-circulating lineages in swine, and in consequence we should expect further reassortments between currently circulating swine strains and the recent swine-origin H1N1v pandemic strain.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H014306/1; Medical Research Council: MC_G0902096, MC_U117512723; Wellcome Trust

    The Journal of general virology 2012;93;Pt 11;2326-36

  • Genetically distinct subsets within ANCA-associated vasculitis.

    Lyons PA, Rayner TF, Trivedi S, Holle JU, Watts RA, Jayne DR, Baslund B, Brenchley P, Bruchfeld A, Chaudhry AN, Cohen Tervaert JW, Deloukas P, Feighery C, Gross WL, Guillevin L, Gunnarsson I, Harper L, Hrušková Z, Little MA, Martorana D, Neumann T, Ohlsson S, Padmanabhan S, Pusey CD, Salama AD, Sanders JS, Savage CO, Segelmark M, Stegeman CA, Tesař V, Vaglio A, Wieczorek S, Wilde B, Zwerina J, Rees AJ, Clayton DG and Smith KG

    Cambridge Institute for Medical Research, and Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, United Kingdom. kgcs2@cam.ac.uk

    Background: Antineutrophil cytoplasmic antibody (ANCA)-associated vasculitis is a severe condition encompassing two major syndromes: granulomatosis with polyangiitis (formerly known as Wegener's granulomatosis) and microscopic polyangiitis. Its cause is unknown, and there is debate about whether it is a single disease entity and what role ANCA plays in its pathogenesis. We investigated its genetic basis.

    Methods: A genomewide association study was performed in a discovery cohort of 1233 U.K. patients with ANCA-associated vasculitis and 5884 controls and was replicated in 1454 Northern European case patients and 1666 controls. Quality control, population stratification, and statistical analyses were performed according to standard criteria.

    Results: We found both major-histocompatibility-complex (MHC) and non-MHC associations with ANCA-associated vasculitis and also that granulomatosis with polyangiitis and microscopic polyangiitis were genetically distinct. The strongest genetic associations were with the antigenic specificity of ANCA, not with the clinical syndrome. Anti-proteinase 3 ANCA was associated with HLA-DP and the genes encoding α(1)-antitrypsin (SERPINA1) and proteinase 3 (PRTN3) (P=6.2×10(-89), P=5.6×10(-12), and P=2.6×10(-7), respectively). Anti-myeloperoxidase ANCA was associated with HLA-DQ (P=2.1×10(-8)).

    Conclusions: This study confirms that the pathogenesis of ANCA-associated vasculitis has a genetic component, shows genetic distinctions between granulomatosis with polyangiitis and microscopic polyangiitis that are associated with ANCA specificity, and suggests that the response against the autoantigen proteinase 3 is a central pathogenic feature of proteinase 3 ANCA-associated vasculitis. These data provide preliminary support for the concept that proteinase 3 ANCA-associated vasculitis and myeloperoxidase ANCA-associated vasculitis are distinct autoimmune syndromes. (Funded by the British Heart Foundation and others.)

    Funded by: British Heart Foundation: SP/09/001/27117; Medical Research Council; Wellcome Trust: 083650/Z/07/Z

    The New England journal of medicine 2012;367;3;214-23

  • Genome-Wide Association Analysis of Imputed Rare Variants: Application to Seven Common Complex Diseases.

    Mägi R, Asimit JL, Day-Williams AG, Zeggini E and Morris AP

    Estonian Genome Centre, University of Tartu, Tartu, Estonia.

    Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.

    Funded by: Wellcome Trust: 090532, 098017

    Genetic epidemiology 2012

  • A systematic survey of loss-of-function variants in human protein-coding genes.

    MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IH, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, 1000 Genomes Project Consortium, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB and Tyler-Smith C

    Wellcome Trust Sanger Institute, Hinxton, UK. macarthur@atgu.mgh.harvard.edu

    Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.

    Funded by: British Heart Foundation: RG/09/012/28096; NHGRI NIH HHS: U54 HG003273; Wellcome Trust: 085532, 090532, 090532/Z/09/Z, 098051

    Science (New York, N.Y.) 2012;335;6070;823-8

  • A New Isolation with Migration Model along Complete Genomes Infers Very Different Divergence Processes among Closely Related Great Ape Species.

    Mailund T, Halager AE, Westergaard M, Dutheil JY, Munch K, Andersen LN, Lunter G, Prüfer K, Scally A, Hobolth A and Schierup MH

    Bioinformatics Research Center, Aarhus University, Aarhus, Denmark.

    We present a hidden Markov model (HMM) for inferring gradual isolation between two populations during speciation, modelled as a time interval with restricted gene flow. The HMM describes the history of adjacent nucleotides in two genomic sequences, such that the nucleotides can be separated by recombination, can migrate between populations, or can coalesce at variable time points, all dependent on the parameters of the model, which are the effective population sizes, splitting times, recombination rate, and migration rate. We show by extensive simulations that the HMM can accurately infer all parameters except the recombination rate, which is biased downwards. Inference is robust to variation in the mutation rate and the recombination rate over the sequence and also robust to unknown phase of genomes unless they are very closely related. We provide a test for whether divergence is gradual or instantaneous, and we apply the model to three key divergence processes in great apes: (a) the bonobo and common chimpanzee, (b) the eastern and western gorilla, and (c) the Sumatran and Bornean orang-utan. We find that the bonobo and chimpanzee appear to have undergone a clear split, whereas the divergence processes of the gorilla and orang-utan species occurred over several hundred thousand years with gene flow stopping quite recently. We also apply the model to the Homo/Pan speciation event and find that the most likely scenario involves an extended period of gene flow during speciation.

    PLoS genetics 2012;8;12;e1003125

  • Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans.

    Mallon AM, Iyer V, Melvin D, Morgan H, Parkinson H, Brown SD, Flicek P and Skarnes WC

    Mammalian Genetics Unit, Medical Research Council Harwell, Harwell, Oxfordshire OX11 0RD, UK. a.mallon@har.mrc.ac.uk

    The International Mouse Phenotyping Consortium (IMPC) (http://www.mousephenotype.org) will reveal the pleiotropic functions of every gene in the mouse genome and uncover the wider role of genetic loci within diverse biological systems. Comprehensive informatics solutions are vital to ensuring that this vast array of data is captured in a standardised manner and made accessible to the scientific community for interrogation and analysis. Here we review the existing EuroPhenome and WTSI phenotype informatics systems and the IKMC portal, and present plans for extending these systems and lessons learned to the development of a robust IMPC informatics infrastructure.

    Funded by: Medical Research Council: MC_U142684172, MC_U142684175; NHGRI NIH HHS: U54 HG006370

    Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;9-10;641-52

  • A Neuronal Transcriptome Response Involving Stress Pathways is Buffered by Neuronal microRNAs.

    Manakov SA, Morton A, Enright AJ and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute Cambridge, UK ; RNA Genomics Lab, European Molecular Biology Laboratory-European Bioinformatics Institute Cambridge, UK.

    A single microRNA (miRNA) can inhibit a large number of mRNA transcripts. This widespread regulatory function has been experimentally demonstrated for a number of miRNAs. However, even when a multitude of targets is confirmed, the function of a miRNA is frequently interpreted through the prism of a handful of arbitrarily selected "interesting" targets. In this work we first show that hundreds of transcripts with target sites for two miRNAs expressed endogenously in neurons, miR-124 and miR-434-3p, are coordinately upregulated in a variety of neuronal stresses. This creates a landscape where these two miRNAs can exert their widespread inhibitory potential on stress-induced transcripts. Next, we experimentally demonstrate that overexpression of these two miRNAs indeed significantly inhibits expression of hundreds of stress-induced transcripts, thus confirming that these transcripts are enriched in true targets of the examined miRNAs. A number of miRNAs were previously shown to have important roles in the regulation of stress responses, and our results suggest that these roles should be understood in light of a widespread activation of miRNA targets during stresses. Importantly, a popular cationic lipid transfection reagent triggers such induction of miRNA targets. Therefore, when a transfection paradigm is employed to study miRNA function, the results of such studies should be interpreted with consideration for the inadvertent induction of miRNA targets.

    Frontiers in neuroscience 2012;6;156

  • Candidate human genetic polymorphisms and severe malaria in a Tanzanian population.

    Manjurano A, Clark TG, Nadjm B, Mtove G, Wangai H, Sepulveda N, Campino SG, Maxwell C, Olomi R, Rockett KR, Jeffreys A, MalariaGen Consortium, Riley EM, Reyburn H and Drakeley C

    Joint Malaria Programme, Kilimanjaro Christian Medical Centre, Moshi, Tanzania.

    Human genetic background strongly influences susceptibility to malaria infection and progression to severe disease and death. Classical genetic studies identified haemoglobinopathies and erythrocyte-associated polymorphisms as protective against severe disease. High throughput genotyping by mass spectrometry allows multiple single nucleotide polymorphisms (SNPs) to be examined simultaneously. We compared the prevalence of 65 human SNPs, previously associated with altered risk of malaria, between Tanzanian children with and without severe malaria. Five hundred children, aged 1-10 years, with severe malaria were recruited from those admitted to hospital in Muheza, Tanzania and compared with matched controls. Genotyping was performed by Sequenom MassArray, and conventional PCR was used to detect deletions in the alpha-thalassaemia gene. SNPs in two X-linked genes were associated with altered risk of severe malaria in females but not in males: heterozygosity for one or other of two SNPs in the G6PD gene was associated with protection from all forms of severe disease whilst two SNPs in the gene encoding CD40L were associated with respiratory distress. A SNP in the adenyl cyclase 9 (ADCY9) gene was associated with protection from acidosis whilst a polymorphism in the IL-1α gene (IL1A) was associated with an increased risk of acidosis. SNPs in the genes encoding IL-13 and reticulon-3 (RTN3) were associated with increased risk of cerebral malaria. This study confirms previously known genetic associations with protection from severe malaria (HbS, G6PD). It identifies two X-linked genes associated with altered risk of severe malaria in females, identifies mutations in ADCY9, IL1A and CD40L as being associated with altered risk of severe respiratory distress and acidosis, both of which are characterised by high serum lactate levels, and also identifies novel genetic associations with severe malaria (TRIM5) and cerebral malaria (IL-13 and RTN3). Further studies are required to test the generality of these associations and to understand their functional consequences.

    Funded by: Medical Research Council: G0600230, G0600718; Wellcome Trust: 075491/Z/04, 087285, 090532/Z/09/Z, 090770/Z/09/Z, 096527, 77383/Z/05/Z

    PloS one 2012;7;10;e47463

  • Sleeping Beauty mutagenesis reveals cooperating mutations and pathways in pancreatic adenocarcinoma.

    Mann KM, Ward JM, Yew CC, Kovochich A, Dawson DW, Black MA, Brett BT, Sheetz TE, Dupuy AJ, Australian Pancreatic Cancer Genome Initiative, Chang DK, Biankin AV, Waddell N, Kassahn KS, Grimmond SM, Rust AG, Adams DJ, Jenkins NA and Copeland NG

    Division of Genetics and Genomics, Institute of Molecular and Cell Biology, Singapore 138673.

    Pancreatic cancer is one of the most deadly cancers affecting the Western world. Because the disease is highly metastatic and difficult to diagnose until late stages, the 5-y survival rate is around 5%. The identification of molecular cancer drivers is critical for furthering our understanding of the disease and development of improved diagnostic tools and therapeutics. We have conducted a mutagenic screen using Sleeping Beauty (SB) in mice to identify new candidate cancer genes in pancreatic cancer. By combining SB with an oncogenic Kras allele, we observed highly metastatic pancreatic adenocarcinomas. Using two independent statistical methods to identify loci commonly mutated by SB in these tumors, we identified 681 loci that comprise 543 candidate cancer genes (CCGs); 75 of these CCGs, including Mll3 and Ptk2, have known mutations in human pancreatic cancer. We identified point mutations in human pancreatic patient samples for another 11 CCGs, including Acvr2a and Map2k4. Importantly, 10% of the CCGs are involved in chromatin remodeling, including Arid4b, Kdm6a, and Nsd3, and all SB tumors have at least one mutated gene involved in this process; 20 CCGs, including Ctnnd1, Fbxo11, and Vgll4, are also significantly associated with poor patient survival. SB mutagenesis provides a rich resource of mutations in potential cancer drivers for cross-comparative analyses with ongoing sequencing efforts in human pancreatic adenocarcinoma.

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;16;5934-41

  • Adaptive mechanisms in pathogens: universal aneuploidy in Leishmania.

    Mannaert A, Downing T, Imamura H and Dujardin JC

    Unit of Molecular Parasitology, Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium.

    Genomic stability and maintenance of the correct chromosome number are assumed to be essential for normal development in eukaryotes. Aneuploidy is usually associated with severe abnormalities and decrease of cell fitness, but some organisms appear to rely on aneuploidy for rapid adaptation to changing environments. This phenomenon is mostly described in pathogenic fungi and cancer cells. However, recent genome studies highlight the importance of Leishmania as a new model for studies on aneuploidy. Several reports revealed extensive variation in chromosome copy number, indicating that aneuploidy is a constitutive feature of this protozoan parasite genus. Aneuploidy appears to be beneficial in organisms that are primarily asexual, unicellular, and that undergo sporadic epidemic expansions, including common pathogens as well as cancer.

    Trends in parasitology 2012

  • A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance.

    Manning AK, Hivert MF, Scott RA, Grimsby JL, Bouatia-Naji N, Chen H, Rybin D, Liu CT, Bielak LF, Prokopenko I, Amin N, Barnes D, Cadby G, Hottenga JJ, Ingelsson E, Jackson AU, Johnson T, Kanoni S, Ladenvall C, Lagou V, Lahti J, Lecoeur C, Liu Y, Martinez-Larrad MT, Montasser ME, Navarro P, Perry JR, Rasmussen-Torvik LJ, Salo P, Sattar N, Shungin D, Strawbridge RJ, Tanaka T, van Duijn CM, An P, de Andrade M, Andrews JS, Aspelund T, Atalay M, Aulchenko Y, Balkau B, Bandinelli S, Beckmann JS, Beilby JP, Bellis C, Bergman RN, Blangero J, Boban M, Boehnke M, Boerwinkle E, Bonnycastle LL, Boomsma DI, Borecki IB, Böttcher Y, Bouchard C, Brunner E, Budimir D, Campbell H, Carlson O, Chines PS, Clarke R, Collins FS, Corbatón-Anchuelo A, Couper D, de Faire U, Dedoussis GV, Deloukas P, Dimitriou M, Egan JM, Eiriksdottir G, Erdos MR, Eriksson JG, Eury E, Ferrucci L, Ford I, Forouhi NG, Fox CS, Franzosi MG, Franks PW, Frayling TM, Froguel P, Galan P, de Geus E, Gigante B, Glazer NL, Goel A, Groop L, Gudnason V, Hallmans G, Hamsten A, Hansson O, Harris TB, Hayward C, Heath S, Hercberg S, Hicks AA, Hingorani A, Hofman A, Hui J, Hung J, Jarvelin MR, Jhun MA, Johnson PC, Jukema JW, Jula A, Kao WH, Kaprio J, Kardia SL, Keinanen-Kiukaanniemi S, Kivimaki M, Kolcic I, Kovacs P, Kumari M, Kuusisto J, Kyvik KO, Laakso M, Lakka T, Lannfelt L, Lathrop GM, Launer LJ, Leander K, Li G, Lind L, Lindstrom J, Lobbens S, Loos RJ, Luan J, Lyssenko V, Mägi R, Magnusson PK, Marmot M, Meneton P, Mohlke KL, Mooser V, Morken MA, Miljkovic I, Narisu N, O'Connell J, Ong KK, Oostra BA, Palmer LJ, Palotie A, Pankow JS, Peden JF, Pedersen NL, Pehlic M, Peltonen L, Penninx B, Pericic M, Perola M, Perusse L, Peyser PA, Polasek O, Pramstaller PP, Province MA, Räikkönen K, Rauramaa R, Rehnberg E, Rice K, Rotter JI, Rudan I, Ruokonen A, Saaristo T, Sabater-Lleal M, Salomaa V, Savage DB, Saxena R, Schwarz P, Seedorf U, Sennblad B, Serrano-Rios M, Shuldiner AR, Sijbrands EJ, Siscovick DS, Smit JH, Small KS, Smith NL, Smith AV, Stančáková A, Stirrups K, Stumvoll M, Sun YV, Swift AJ, Tönjes A, Tuomilehto J, Trompet S, Uitterlinden AG, Uusitupa M, Vikström M, Vitart V, Vohl MC, Voight BF, Vollenweider P, Waeber G, Waterworth DM, Watkins H, Wheeler E, Widen E, Wild SH, Willems SM, Willemsen G, Wilson JF, Witteman JC, Wright AF, Yaghootkar H, Zelenika D, Zemunik T, Zgaga L, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Multiple Tissue Human Expression Resource (MUTHER) Consortium, Wareham NJ, McCarthy MI, Barroso I, Watanabe RM, Florez JC, Dupuis J, Meigs JB and Langenberg C

    Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA.

    Recent genome-wide association studies have described many loci implicated in type 2 diabetes (T2D) pathophysiology and β-cell dysfunction but have contributed little to the understanding of the genetic basis of insulin resistance. We hypothesized that genes implicated in insulin resistance pathways might be uncovered by accounting for differences in body mass index (BMI) and potential interactions between BMI and genetic variants. We applied a joint meta-analysis approach to test associations with fasting insulin and glucose on a genome-wide scale. We present six previously unknown loci associated with fasting insulin at P < 5 × 10(-8) in combined discovery and follow-up analyses of 52 studies comprising up to 96,496 non-diabetic individuals. Risk variants were associated with higher triglyceride and lower high-density lipoprotein (HDL) cholesterol levels, suggesting a role for these loci in insulin resistance pathways. The discovery of these loci will aid further characterization of the role of insulin resistance in T2D pathophysiology.

    Funded by: British Heart Foundation: RG/07/008/23674; Chief Scientist Office: CZB/4/710; Medical Research Council: G0100222, G0701863, G0900339, G0902037, G19/35, G8802774, MC_PC_U127561128, MC_U106179471, MC_U106179472, MC_U127561128, MC_U127592696, MC_UP_A100_1003; NCATS NIH HHS: UL1 TR000124; NCRR NIH HHS: S10 RR029392; NHLBI NIH HHS: R01 HL105756; NIDDK NIH HHS: K24 DK080140, P30 DK063491, R01 DK072193, R01 DK078616; NIMH NIH HHS: R37 MH059490; Wellcome Trust: 090532, 091551

    Nature genetics 2012;44;6;659-69

  • Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.

    Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, O'Brien J, Djimde A, Doumbo O, Zongo I, Ouedraogo JB, Michon P, Mueller I, Siba P, Nzila A, Borrmann S, Kiara SM, Marsh K, Jiang H, Su XZ, Amaratunga C, Fairhurst R, Socheat D, Nosten F, Imwong M, White NJ, Sanders M, Anastasi E, Alcock D, Drury E, Oyola S, Quail MA, Turner DJ, Ruano-Rubio V, Jyothi D, Amenga-Etego L, Hubbart C, Jeffreys A, Rowlands K, Sutherland C, Roper C, Mangano V, Modiano D, Tan JC, Ferdig MT, Amambua-Ngwa A, Conway DJ, Takala-Harrison S, Plowe CV, Rayner JC, Rockett KA, Clark TG, Newbold CI, Berriman M, MacInnis B and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 082370, 089275, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 092654, 098051

    Nature 2012;487;7407;375-9

  • Comparative genomics of Brachyspira pilosicoli strains: genome rearrangements, reductions and correlation of genetic compliment with phenotypic diversity.

    Mappley LJ, Black ML, AbuOun M, Darby AC, Woodward MJ, Parkhill J, Turner AK, Bellgard MI, La T, Phillips ND, La Ragione RM and Hampson DJ

    Department of Bacteriology, Animal Health and Veterinary Laboratories Agency, Reading University, Addlestone, Surrey, UK. l.mappley@reading.ac.uk

    Background: The anaerobic spirochaete Brachyspira pilosicoli causes enteric disease in avian, porcine and human hosts, amongst others. To date, the only available genome sequence of B. pilosicoli is that of strain 95/1000, a porcine isolate. In the first intra-species genome comparison within the Brachyspira genus, we report the whole genome sequence of B. pilosicoli B2904, an avian isolate, the incomplete genome sequence of B. pilosicoli WesB, a human isolate, and the comparisons with B. pilosicoli 95/1000. We also draw on incomplete genome sequences from three other Brachyspira species. Finally we report the first application of the high-throughput Biolog phenotype screening tool on the B. pilosicoli strains for detailed comparisons between genotype and phenotype. Results: Feature and sequence genome comparisons revealed a high degree of similarity between the three B. pilosicoli strains, although the genomes of B2904 and WesB were larger than that of 95/1000 (~2.765, 2.890 and 2.596 Mb, respectively). Genome rearrangements were observed which correlated largely with the positions of mobile genetic elements. Through comparison of the B2904 and WesB genomes with the 95/1000 genome, features that we propose are non-essential due to their absence from 95/1000 include a peptidase, glycine reductase complex components and transposases. Novel bacteriophages were detected in the newly-sequenced genomes, which appeared to have involvement in intra- and inter-species horizontal gene transfer. Phenotypic differences predicted from genome analysis, such as the lack of genes for glucuronate catabolism in 95/1000, were confirmed by phenotyping. Conclusions: The availability of multiple B. pilosicoli genome sequences has allowed us to demonstrate the substantial genomic variation that exists between these strains, and provides an insight into genetic events that are shaping the species. In addition, phenotype screening allowed determination of how genotypic differences translated to phenotype. Further application of such comparisons will improve understanding of the metabolic capabilities of Brachyspira species.

    BMC genomics 2012;13;454

  • The capsular polysaccharide Vi from Salmonella typhi is a B1b antigen.

    Marshall JL, Flores-Langarica A, Kingsley RA, Hitchcock JR, Ross EA, López-Macías C, Lakey J, Martin LB, Toellner KM, MacLennan CA, MacLennan IC, Henderson IR, Dougan G and Cunningham AF

    Medical Research Council Centre for Immune Regulation, School of Immunity and Infection, Institute for Biomedical Research, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom.

    Vaccination with purified capsular polysaccharide Vi Ag from Salmonella typhi can protect against typhoid fever, although the mechanism for its efficacy is not clearly established. In this study, we have characterized the B cell response to this vaccine in wild-type and T cell-deficient mice. We show that immunization with typhoid Vi polysaccharide vaccine rapidly induces proliferation in B1b peritoneal cells, but not in B1a cells or marginal zone B cells. This induction of B1b proliferation is concomitant with the detection of splenic Vi-specific Ab-secreting cells and protective Ab in Rag1-deficient B1b cell chimeras generated by adoptive transfer-induced specific Ab after Vi immunization. Furthermore, Ab derived from peritoneal B cells is sufficient to confer protection against Salmonella that express Vi Ag. Expression of Vi by Salmonella during infection did not inhibit the development of early Ab responses to non-Vi Ags. Despite this, the protection conferred by immunization of mice with porin proteins from Salmonella, which induce Ab-mediated protection, was reduced postinfection with Vi-expressing Salmonella, although protection was not totally abrogated. This work therefore suggests that, in mice, B1b cells contribute to the protection induced by Vi Ag, and targeting non-Vi Ags as subunit vaccines may offer an attractive strategy to augment current Vi-based vaccine strategies.

    Funded by: Medical Research Council

    Journal of immunology (Baltimore, Md. : 1950) 2012;189;12;5527-32

  • The Subtype of GluN2 C-terminal Domain Determines the Response to Excitotoxic Insults.

    Martel MA, Ryan TJ, Bell KF, Fowler JH, McMahon A, Al-Mubarak B, Komiyama NH, Horsburgh K, Kind PC, Grant SG, Wyllie DJ and Hardingham GE

    Centre for Integrative Physiology, University of Edinburgh School of Biomedical Sciences, Hugh Robson Building, George Square, Edinburgh EH8 9XD, UK.

    It is currently unclear whether the GluN2 subtype influences NMDA receptor (NMDAR) excitotoxicity. We report that the toxicity of NMDAR-mediated Ca(2+) influx is differentially controlled by the cytoplasmic C-terminal domains of GluN2B (CTD(2B)) and GluN2A (CTD(2A)). Studying the effects of acute expression of GluN2A/2B-based chimeric subunits with reciprocal exchanges of their CTDs revealed that CTD(2B) enhances NMDAR toxicity, compared to CTD(2A). Furthermore, the vulnerability of forebrain neurons in vitro and in vivo to NMDAR-dependent Ca(2+) influx is lowered by replacing the CTD of GluN2B with that of GluN2A by targeted exon exchange in a mouse knockin model. Mechanistically, CTD(2B) exhibits stronger physical/functional coupling to the PSD-95-nNOS pathway, which suppresses protective CREB activation. Dependence of NMDAR excitotoxicity on the GluN2 CTD subtype can be overcome by inducing high levels of NMDAR activity. Thus, the identity (2A versus 2B) of the GluN2 CTD controls the toxicity dose-response to episodes of NMDAR activity.

    Neuron 2012;74;3;543-56

  • Detection of recombination events in bacterial genomes from large population samples.

    Marttinen P, Hanage WP, Croucher NJ, Connor TR, Harris SR, Bentley SD and Corander J

    Department of Biomedical Engineering and Computational Science, Aalto University, PO Box 12200, FI-00076 AALTO, Finland. pekka.marttinen@aalto.fi

    Analysis of important human pathogen populations is currently under transition toward whole-genome sequencing of growing numbers of samples collected on a global scale. Since recombination in bacteria is often an important factor shaping their evolution by enabling resistance elements and virulence traits to rapidly transfer from one evolutionary lineage to another, it is highly beneficial to have access to tools that can detect recombination events. Multiple advanced statistical methods exist for such purposes; however, they are typically limited either to only a few samples or to data from relatively short regions of a total genome. By harnessing the power of recent advances in Bayesian modeling techniques, we introduce here a method for detecting homologous recombination events from whole-genome sequence data for bacterial population samples on a large scale. Our statistical approach can efficiently handle hundreds of whole genome sequenced population samples and identify separate origins of the recombinant sequence, offering an enhanced insight into the diversification of bacterial clones at the level of the whole genome. A data set of 241 whole genome sequences from an important pandemic lineage of Streptococcus pneumoniae is used together with multiple simulated data sets to demonstrate the potential of our approach.

    Funded by: NIGMS NIH HHS: U54GM088558

    Nucleic acids research 2012;40;1;e6

  • Integrating cytogenetics and genomics in comparative evolutionary studies of cichlid fish.

    Mazzuchelli J, Kocher TD, Yang F and Martins C

    Department of Morphology, Bioscience Institute, UNESP - São Paulo State University, 18618-970, Botucatu, SP, Brazil. cmartins@ibb.unesp.br.

    Background: The availability of a large number of recently sequenced vertebrate genomes opens new avenues to integrate cytogenetics and genomics in comparative and evolutionary studies. Cytogenetic mapping can offer alternative means to identify conserved synteny shared by distinct genomes and also to define genome regions that are still not fine characterized even after wide-ranging nucleotide sequence efforts. An efficient way to perform comparative cytogenetic mapping is based on BAC clones mapping by fluorescence in situ hybridization. In this report, to address the knowledge gap on the genome evolution in cichlid fishes, BAC clones of an Oreochromis niloticus library covering the linkage groups (LG) 1, 3, 5, and 7 were mapped onto the chromosomes of 9 African cichlid species. The cytogenetic mapping data were also integrated with BAC-end sequences information of O. niloticus and comparatively analyzed against the genome of other fish species and vertebrates. Results: The location of BACs from LG1, 3, 5, and 7 revealed a strong chromosomal conservation among the analyzed cichlid species genomes, which evidenced a synteny of the markers of each LG. Comparative in silico analysis also identified large genomic blocks that were conserved in distantly related fish groups and also in other vertebrates. Conclusions: Although it has been suggested that fishes contain plastic genomes with high rates of chromosomal rearrangements and probably low rates of synteny conservation, our results evidence that large syntenic chromosome segments have been maintained conserved during evolution, at least for the considered markers. Additionally, our current cytogenetic mapping efforts integrated with genomic approaches conduct to a new perspective to address important questions involving chromosome evolution in fishes.

    BMC genomics 2012;13;463

  • Molecular tracing of the emergence, adaptation, and transmission of hospital-associated methicillin-resistant Staphylococcus aureus.

    McAdam PR, Templeton KE, Edwards GF, Holden MT, Feil EJ, Aanensen DM, Bargawi HJ, Spratt BG, Bentley SD, Parkhill J, Enright MC, Holmes A, Girvan EK, Godfrey PA, Feldgarden M, Kearns AM, Rambaut A, Robinson DA and Fitzgerald JR

    The Roslin Institute and Edinburgh Infectious Diseases, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian EH259RG, United Kingdom.

    Hospital-associated infections caused by methicillin-resistant Staphylococcus aureus (MRSA) are a global health burden dominated by a small number of bacterial clones. The pandemic EMRSA-16 clone (ST36-II) has been widespread in UK hospitals for 20 y, but its evolutionary origin and the molecular basis for its hospital association are unclear. We carried out a Bayesian phylogenetic reconstruction on the basis of the genome sequences of 87 S. aureus isolates including 60 EMRSA-16 and 27 additional clonal complex 30 (CC30) isolates, collected from patients in three continents over a 53-y period. The three major pandemic clones to originate from the CC30 lineage, including phage type 80/81, Southwest Pacific, and EMRSA-16, shared a most recent common ancestor that existed over 100 y ago, whereas the hospital-associated EMRSA-16 clone is estimated to have emerged about 35 y ago. Our CC30 genome-wide analysis revealed striking molecular correlates of hospital- or community-associated pandemics represented by mobile genetic elements and nonsynonymous mutations affecting antibiotic resistance and virulence. Importantly, phylogeographic analysis indicates that EMRSA-16 spread within the United Kingdom by transmission from hospitals in large population centers in London and Glasgow to regional health-care settings, implicating patient referrals as an important cause of nationwide transmission. Taken together, the high-resolution phylogenomic approach used resulted in a unique understanding of the emergence and transmission of a major MRSA clone and provided molecular correlates of its hospital adaptation. Similar approaches for hospital-associated clones of other bacterial pathogens may inform appropriate measures for controlling their intra- and interhospital spread.

    Funded by: Biotechnology and Biological Sciences Research Council; Chief Scientist Office; Medical Research Council; NIGMS NIH HHS: GM080602, R01 GM080602; PHS HHS: HHSN272200900018C; Wellcome Trust: 098051

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;23;9107-12

  • Tandem duplication of chromosomal segments is common in ovarian and breast cancer genomes.

    McBride DJ, Etemadmoghadam D, Cooke SL, Alsop K, George J, Butler A, Cho J, Galappaththige D, Greenman C, Howarth KD, Lau KW, Ng CK, Raine K, Teague J, Wedge DC, Australian Ovarian Cancer Study Group, Caubit X, Stratton MR, Brenton JD, Campbell PJ, Futreal PA and Bowtell DD

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. david.bowtell@petermac.org

    The application of paired-end next generation sequencing approaches has made it possible to systematically characterize rearrangements of the cancer genome to base-pair level. Utilizing this approach, we report the first detailed analysis of ovarian cancer rearrangements, comparing high-grade serous and clear cell cancers, and these histotypes with other solid cancers. Somatic rearrangements were systematically characterized in eight high-grade serous and five clear cell ovarian cancer genomes and we report here the identification of > 600 somatic rearrangements. Recurrent rearrangements of the transcriptional regulator gene, TSHZ3, were found in three of eight serous cases. Comparison to breast, pancreatic and prostate cancer genomes revealed that a subset of ovarian cancers share a marked tandem duplication phenotype with triple-negative breast cancers. The tandem duplication phenotype was not linked to BRCA1/2 mutation, suggesting that other common mechanisms or carcinogenic exposures are operative. High-grade serous cancers arising in women with germline BRCA1 or BRCA2 mutation showed a high frequency of small chromosomal deletions. These findings indicate that BRCA1/2 germline mutation may contribute to widespread structural change and that other undefined mechanism(s), which are potentially shared with triple-negative breast cancer, promote tandem chromosomal duplications that sculpt the ovarian cancer genome.

    Funded by: Cancer Research UK; Wellcome Trust: 077012/Z/05/Z, 093867

    The Journal of pathology 2012;227;4;446-55

  • Disruption of mouse Cenpj, a regulator of centriole biogenesis, phenocopies Seckel syndrome.

    McIntyre RE, Lakshminarasimhan Chavali P, Ismail O, Carragher DM, Sanchez-Andrade G, Forment JV, Fu B, Del Castillo Velasco-Herrera M, Edwards A, van der Weyden L, Yang F, Sanger Mouse Genetics Project, Ramirez-Solis R, Estabel J, Gallagher FA, Logan DW, Arends MJ, Tsang SH, Mahajan VB, Scudamore CL, White JK, Jackson SP, Gergely F and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Disruption of the centromere protein J gene, CENPJ (CPAP, MCPH6, SCKL4), which is a highly conserved and ubiquitously expressed centrosomal protein, has been associated with primary microcephaly and the microcephalic primordial dwarfism disorder Seckel syndrome. The mechanism by which disruption of CENPJ causes the proportionate, primordial growth failure that is characteristic of Seckel syndrome is unknown. By generating a hypomorphic allele of Cenpj, we have developed a mouse (Cenpj(tm/tm)) that recapitulates many of the clinical features of Seckel syndrome, including intrauterine dwarfism, microcephaly with memory impairment, ossification defects, and ocular and skeletal abnormalities, thus providing clear confirmation that specific mutations of CENPJ can cause Seckel syndrome. Immunohistochemistry revealed increased levels of DNA damage and apoptosis throughout Cenpj(tm/tm) embryos and adult mice showed an elevated frequency of micronucleus induction, suggesting that Cenpj-deficiency results in genomic instability. Notably, however, genomic instability was not the result of defective ATR-dependent DNA damage signaling, as is the case for the majority of genes associated with Seckel syndrome. Instead, Cenpj(tm/tm) embryonic fibroblasts exhibited irregular centriole and centrosome numbers and mono- and multipolar spindles, and many were near-tetraploid with numerical and structural chromosomal abnormalities when compared to passage-matched wild-type cells. Increased cell death due to mitotic failure during embryonic development is likely to contribute to the proportionate dwarfism that is associated with CENPJ-Seckel syndrome.

    Funded by: Cancer Research UK: 11224, A11224; European Research Council: 268536; Medical Research Council: G0901338; NEI NIH HHS: K08 EY020530, NIH 1K08EY020530-01A1; Wellcome Trust: 092096, 098051

    PLoS genetics 2012;8;11;e1003022

  • Cancer gene discovery in the mouse.

    McIntyre RE, van der Weyden L and Adams DJ

    Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1HH, UK.

    Developments in high-throughput genome analysis and in computational tools have made it possible to rapidly profile entire cancer genomes with basepair resolution. In parallel with these advances, mouse models of cancer have evolved into powerful tools for cancer gene discovery. Here we discuss some of the approaches that may be used for cancer gene identification in the mouse and discuss how a cross-species 'oncogenomics' approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumorigenesis.

    Funded by: Cancer Research UK; Wellcome Trust

    Current opinion in genetics & development 2012;22;1;14-20

  • SynGAP isoforms exert opposing effects on synaptic strength.

    McMahon AC, Barnett MW, O'Leary TS, Stoney PN, Collins MO, Papadia S, Choudhary JS, Komiyama NH, Grant SG, Hardingham GE, Wyllie DJ and Kind PC

    Centre for Integrative Physiology, University of Edinburgh, Edinburgh EH8 9XD, UK.

    Alternative promoter usage and alternative splicing enable diversification of the transcriptome. Here we demonstrate that the function of Synaptic GTPase-Activating Protein (SynGAP), a key synaptic protein, is determined by the combination of its amino-terminal sequence with its carboxy-terminal sequence. 5' rapid amplification of cDNA ends and primer extension show that different N-terminal protein sequences arise through alternative promoter usage that are regulated by synaptic activity and postnatal age. Heterogeneity in C-terminal protein sequence arises through alternative splicing. Overexpression of SynGAP α1 versus α2 C-termini-containing proteins in hippocampal neurons has opposing effects on synaptic strength, decreasing and increasing miniature excitatory synaptic currents amplitude/frequency, respectively. The magnitude of this C-terminal-dependent effect is modulated by the N-terminal peptide sequence. This is the first demonstration that activity-dependent alternative promoter usage can change the function of a synaptic protein at excitatory synapses. Furthermore, the direction and degree of synaptic modulation exerted by different protein isoforms from a single gene locus is dependent on the combination of differential promoter usage and alternative splicing.

    Funded by: Medical Research Council: G0300466, G0601584, G0700967, G0902044, G0902044(94018); Wellcome Trust

    Nature communications 2012;3;900

  • New insights into the bacterial fitness-associated mechanisms revealed by the characterization of large plasmids of an avian pathogenic E. coli.

    Mellata M, Maddux JT, Nam T, Thomson N, Hauser H, Stevens MP, Mukhopadhyay S, Sarker S, Crabbé A, Nickerson CA, Santander J and Curtiss R

    The Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America. melha.mellata@asu.edu

    Extra-intestinal pathogenic E. coli (ExPEC), including avian pathogenic E. coli (APEC), pose a considerable threat to both human and animal health, with illness causing substantial economic loss. APEC strain χ7122 (O78∶K80∶H9), containing three large plasmids [pChi7122-1 (IncFIB/FIIA-FIC), pChi7122-2 (IncFII), and pChi7122-3 (IncI(2))]; and a small plasmid pChi7122-4 (ColE2-like), has been used for many years as a model strain to study the molecular mechanisms of ExPEC pathogenicity and zoonotic potential. We previously sequenced and characterized the plasmid pChi7122-1 and determined its importance in systemic APEC infection; however the roles of the other pChi7122 plasmids were still ambiguous. Herein we present the sequence of the remaining pChi7122 plasmids, confirming that pChi7122-2 and pChi7122-3 encode an ABC iron transport system (eitABCD) and a putative type IV fimbriae respectively, whereas pChi7122-4 is a cryptic plasmid. New features were also identified, including a gene cluster on pChi7122-2 that is not present in other E. coli strains but is found in Salmonella serovars and is predicted to encode the sugars catabolic pathways. In vitro evaluation of the APEC χ7122 derivative strains with the three large plasmids, either individually or in combinations, provided new insights into the role of plasmids in biofilm formation, bile and acid tolerance, and the interaction of E. coli strains with 3-D cultures of intestinal epithelial cells. In this study, we show that the nature and combinations of plasmids, as well as the background of the host strains, have an effect on these phenomena. Our data reveal new insights into the role of extra-chromosomal sequences in fitness and diversity of ExPEC in their phenotypes.

    Funded by: NIAID NIH HHS: R21 AI090416

    PloS one 2012;7;1;e29481

  • Communication about DTC testing: commentary on a 'family experience of personal genomics'.

    Middleton A

    This paper provides a commentary on 'Family Experience of Personal Genomics' (Corpas 2012). An overview is offered on the communication literature available to help support individuals and families to communicate about genetic information. Despite there being a wealth of evidence, built on years of genetic counseling practice, this does not appear to have been translated clearly to the Direct to Consumer (DTC) testing market. In many countries it is possible to order a DTC genetic test without the involvement of any health professional; there has been heated debate about whether this is appropriate or not. Much of the focus surrounding this has been on whether it is necessary to have a health professional available to offer their clinical knowledge and help with interpreting the DTC genetic test data. What has been missed from this debate is the importance of enabling customers of DTC testing services access to the abundance of information about how to communicate their genetic risks to others, including immediate family. Family communication about health and indeed genetics can be fraught with difficulty. Genetic health professionals, specifically genetic counselors, have particular expertise in family communication about genetics. Such information could be incredibly useful to kinships as they grapple with knowing how to communicate their genomic information with relatives.

    Funded by: Wellcome Trust: WT077008

    Journal of genetic counseling 2012;21;3;392-8

  • Generation of the Sotos syndrome deletion in mice.

    Migdalska AM, van der Weyden L, Ismail O, Sanger Mouse Genetics Project, Rust AG, Rashid M, White JK, Sánchez-Andrade G, Lupski JR, Logan DW, Arends MJ and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, UK.

    Haploinsufficiency of the human 5q35 region spanning the NSD1 gene results in a rare genomic disorder known as Sotos syndrome (Sotos), with patients displaying a variety of clinical features, including pre- and postnatal overgrowth, intellectual disability, and urinary/renal abnormalities. We used chromosome engineering to generate a segmental monosomy, i.e., mice carrying a heterozygous 1.5-Mb deletion of 36 genes on mouse chromosome 13 (4732471D19Rik-B4galt7), syntenic with 5q35.2-q35.3 in humans (Df(13)Ms2Dja(+/-) mice). Surprisingly, Df(13)Ms2Dja(+/-) mice were significantly smaller for their gestational age and also showed decreased postnatal growth, in contrast to Sotos patients. Df(13)Ms2Dja(+/-) mice did, however, display deficits in long-term memory retention and dilation of the pelvicalyceal system, which in part may model the learning difficulties and renal abnormalities observed in Sotos patients. Thus, haploinsufficiency of genes within the mouse 4732471D19Rik-B4galt7 deletion interval plays important roles in growth, memory retention, and the development of the renal pelvicalyceal system.

    Funded by: Cancer Research UK; Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;11-12;749-57

  • Modeling partial monosomy for human chromosome 21q11.2-q21.1 reveals haploinsufficient genes influencing behavior and fat deposition.

    Migdalska AM, van der Weyden L, Ismail O, White JK, Sanger Mouse Genetics Project, Sánchez-Andrade G, Logan DW, Arends MJ and Adams DJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Haploinsufficiency of part of human chromosome 21 results in a rare condition known as Monosomy 21. This disease displays a variety of clinical phenotypes, including intellectual disability, craniofacial dysmorphology, skeletal and cardiac abnormalities, and respiratory complications. To search for dosage-sensitive genes involved in this disorder, we used chromosome engineering to generate a mouse model carrying a deletion of the Lipi-Usp25 interval, syntenic with 21q11.2-q21.1 in humans. Haploinsufficiency for the 6 genes in this interval resulted in no gross morphological defects and behavioral analysis performed using an open field test, a test of anxiety, and tests for social interaction were normal in monosomic mice. Monosomic mice did, however, display impaired memory retention compared to control animals. Moreover, when fed a high-fat diet (HFD) monosomic mice exhibited a significant increase in fat mass/fat percentage estimate compared with controls, severe fatty changes in their livers, and thickened subcutaneous fat. Thus, genes within the Lipi-Usp25 interval may participate in memory retention and in the regulation of fat deposition.

    Funded by: Cancer Research UK; Wellcome Trust

    PloS one 2012;7;1;e29681

  • FiRePat - Finding Regulatory Patterns between sRNAs and Genes

    Mohorianu I, Lopez-Gomollon S, Schwach F, Dalmay T and Moulton V

    Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2012;2;3;273-84

  • Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes.

    Morris AP, Voight BF, Teslovich TM, Ferreira T, Segrè AV, Steinthorsdottir V, Strawbridge RJ, Khan H, Grallert H, Mahajan A, Prokopenko I, Kang HM, Dina C, Esko T, Fraser RM, Kanoni S, Kumar A, Lagou V, Langenberg C, Luan J, Lindgren CM, Müller-Nurasyid M, Pechlivanis S, Rayner NW, Scott LJ, Wiltshire S, Yengo L, Kinnunen L, Rossin EJ, Raychaudhuri S, Johnson AD, Dimas AS, Loos RJ, Vedantam S, Chen H, Florez JC, Fox C, Liu CT, Rybin D, Couper DJ, Kao WH, Li M, Cornelis MC, Kraft P, Sun Q, van Dam RM, Stringham HM, Chines PS, Fischer K, Fontanillas P, Holmen OL, Hunt SE, Jackson AU, Kong A, Lawrence R, Meyer J, Perry JR, Platou CG, Potter S, Rehnberg E, Robertson N, Sivapalaratnam S, Stančáková A, Stirrups K, Thorleifsson G, Tikkanen E, Wood AR, Almgren P, Atalay M, Benediktsson R, Bonnycastle LL, Burtt N, Carey J, Charpentier G, Crenshaw AT, Doney AS, Dorkhan M, Edkins S, Emilsson V, Eury E, Forsen T, Gertow K, Gigante B, Grant GB, Groves CJ, Guiducci C, Herder C, Hreidarsson AB, Hui J, James A, Jonsson A, Rathmann W, Klopp N, Kravic J, Krjutškov K, Langford C, Leander K, Lindholm E, Lobbens S, Männistö S, Mirza G, Mühleisen TW, Musk B, Parkin M, Rallidis L, Saramies J, Sennblad B, Shah S, Sigurðsson G, Silveira A, Steinbach G, Thorand B, Trakalo J, Veglia F, Wennauer R, Winckler W, Zabaneh D, Campbell H, van Duijn C, Uitterlinden AG, Hofman A, Sijbrands E, Abecasis GR, Owen KR, Zeggini E, Trip MD, Forouhi NG, Syvänen AC, Eriksson JG, Peltonen L, Nöthen MM, Balkau B, Palmer CN, Lyssenko V, Tuomi T, Isomaa B, Hunter DJ, Qi L, Wellcome Trust Case Control Consortium, Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) Investigators, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, Asian Genetic Epidemiology Network–Type 2 Diabetes (AGEN-T2D) Consortium, South Asian Type 2 Diabetes (SAT2D) Consortium, Shuldiner AR, Roden M, Barroso I, Wilsgaard T, Beilby J, Hovingh K, Price JF, Wilson JF, Rauramaa R, Lakka TA, Lind L, Dedoussis G, Njølstad I, Pedersen NL, Khaw KT, Wareham NJ, Keinanen-Kiukaanniemi SM, Saaristo TE, Korpi-Hyövälti E, Saltevo J, Laakso M, Kuusisto J, Metspalu A, Collins FS, Mohlke KL, Bergman RN, Tuomilehto J, Boehm BO, Gieger C, Hveem K, Cauchi S, Froguel P, Baldassarre D, Tremoli E, Humphries SE, Saleheen D, Danesh J, Ingelsson E, Ripatti S, Salomaa V, Erbel R, Jöckel KH, Moebus S, Peters A, Illig T, de Faire U, Hamsten A, Morris AD, Donnelly PJ, Frayling TM, Hattersley AT, Boerwinkle E, Melander O, Kathiresan S, Nilsson PM, Deloukas P, Thorsteinsdottir U, Groop LC, Stefansson K, Hu F, Pankow JS, Dupuis J, Meigs JB, Altshuler D, Boehnke M, McCarthy MI and DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium

    Wellcome Trust Centre for Human Genetics, University of Oxford, UK. amorris@well.ox.ac.uk

    To extend understanding of the genetic architecture and molecular basis of type 2 diabetes (T2D), we conducted a meta-analysis of genetic variants on the Metabochip, including 34,840 cases and 114,981 controls, overwhelmingly of European descent. We identified ten previously unreported T2D susceptibility loci, including two showing sex-differentiated association. Genome-wide analyses of these data are consistent with a long tail of additional common variant loci explaining much of the variation in susceptibility to T2D. Exploration of the enlarged set of susceptibility loci implicates several processes, including CREBBP-related transcription, adipocytokine signaling and cell cycle regulation, in diabetes pathogenesis.

    Funded by: British Heart Foundation: RG/07/008/23674, RG/08/014/24067, RG/98002, RG2008/08; Cancer Research UK; Chief Scientist Office: CZB/4/672, CZB/4/710; Department of Health: DHCS/07/07/008; Medical Research Council: G0000649, G0401527, G0601261, G0701863, G0902037, G1000143, G19/35, G8802774, MC_U106179471, MC_UP_A100_1003; NCI NIH HHS: CA055075; NCRR NIH HHS: UL1 RR029887, UL1RR025005; NHGRI NIH HHS: 1Z01HG000024, N01HG65403, U01HG004399, U01HG004402; NHLBI NIH HHS: N01HC25195, N02HL64278, R01HL086694, R01HL087641, R01HL59367; NIA NIH HHS: AG028555, AG04563, AG08724, AG08861, AG10175; NIDDK NIH HHS: DK058845, DK062370, DK072193, DK073490, DK078616, DK080140, K24 DK080140, R01 DK072193, R01 DK073490, U01 DK085545; NIGMS NIH HHS: T32 GM007753; NINDS NIH HHS: 1R21NS064908; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust: 064890, 081682, 090367, 090532, 098017, 098395, GR072960, GR076113, GR077016, GR081682, GR083270, GR083948, GR084711, GR086596, GR090532, GR098051

    Nature genetics 2012;44;9;981-90

  • Olorin: combining gene flow with exome sequencing in large family studies of complex disease.

    Morris JA and Barrett JC

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK. olorin@sanger.ac.uk

    Motivation: The existence of families with many individuals affected by the same complex disease has long suggested the possibility of rare alleles of high penetrance. In contrast to Mendelian diseases, however, linkage studies have identified very few reproducibly linked loci in diseases such as diabetes and autism. Genome-wide association studies have had greater success with such diseases, but these results explain neither the extreme disease load nor the within-family linkage peaks of some large pedigrees. Combining linkage information with exome or genome sequencing from large complex disease pedigrees might finally identify family-specific, high-penetrance mutations.

    Results: Olorin is a tool that integrates gene flow within families with next-generation sequencing data to enable the analysis of complex disease pedigrees. Users can interactively filter and prioritize variants based on haplotype sharing across selected individuals and other measures of importance, including predicted functional consequence and population frequency.

    Funded by: Wellcome Trust: WT098051

    Bioinformatics (Oxford, England) 2012;28;24;3320-1

  • Generation of multipotent lung and airway progenitors from mouse ESCs and patient-specific cystic fibrosis iPSCs.

    Mou H, Zhao R, Sherwood R, Ahfeldt T, Lapey A, Wain J, Sicilian L, Izvolsky K, Musunuru K, Cowan C and Rajagopal J

    Center for Regenerative Medicine, Massachusetts General Hospital, Boston, 02114, USA.

    Deriving lung progenitors from patient-specific pluripotent cells is a key step in producing differentiated lung epithelium for disease modeling and transplantation. By mimicking the signaling events that occur during mouse lung development, we generated murine lung progenitors in a series of discrete steps. Definitive endoderm derived from mouse embryonic stem cells (ESCs) was converted into foregut endoderm, then into replicating Nkx2.1+ lung endoderm, and finally into multipotent embryonic lung progenitor and airway progenitor cells. We demonstrated that precisely-timed BMP, FGF, and WNT signaling are required for NKX2.1 induction. Mouse ESC-derived Nkx2.1+ progenitor cells formed respiratory epithelium (tracheospheres) when transplanted subcutaneously into mice. We then adapted this strategy to produce disease-specific lung progenitor cells from human Cystic Fibrosis induced pluripotent stem cells (iPSCs), creating a platform for dissecting human lung disease. These disease-specific human lung progenitors formed respiratory epithelium when subcutaneously engrafted into immunodeficient mice.

    Funded by: NHLBI NIH HHS: P30 HL101287, R21 HL108055, R21 HL109786

    Cell stem cell 2012;10;4;385-97

  • Genotypic homogeneity of multidrug resistant S. Typhimurium infecting distinct adult and childhood susceptibility groups in Blantyre, Malawi.

    Msefula CL, Kingsley RA, Gordon MA, Molyneux E, Molyneux ME, Maclennan CA, Dougan G and Heyderman RS

    Department of Microbiology, College of Medicine, University of Malawi, Blantyre, Malawi.

    Nontyphoidal Salmonella (NTS) serovars are a common cause of bacteraemia in young children and HIV-infected adults in Malawi and elsewhere in sub-Saharan Africa. These patient populations provide diverse host-immune environments that have the potential to drive bacterial adaptation and evolution. We therefore investigated the diversity of 27 multidrug resistant (MDR) Salmonella Typhimurium strains isolated over 6 years (2002-2008) from HIV-infected adults and children and HIV-uninfected children. Sequence reads from whole-genome sequencing of these isolates using the Illumina GA platform were mapped to the genome of the laboratory strain S. Typhimurium SL1344 excluding homoplastic regions that contained prophage and insertion elements. A phylogenetic tree generated from single nucleotide polymorphisms showed that all 27 strains clustered with the prototypical MDR strain D23580. There was no clustering of strains based on host HIV status or age, suggesting that these susceptible populations acquire S. Typhimurium from common sources or that isolates are transmitted freely between these populations. However, 7/14 of the most recent isolates (2006/2008) formed a distinct clade that branched off 22 SNPs away from the cluster containing earlier isolates. These data suggest that the MDR bacterial population is not static, but is undergoing microevolution which might result in further epidemiology change.

    PloS one 2012;7;7;e42085

  • Genomic Pathology of SLE-Associated Copy-Number Variation at the FCGR2C/FCGR3B/FCGR2B Locus.

    Mueller M, Barros P, Witherden AS, Roberts AL, Zhang Z, Schaschl H, Yu CY, Hurles ME, Schaffner C, Floto RA, Game L, Steinberg KM, Wilson RK, Graves TA, Eichler EE, Cook HT, Vyse TJ and Aitman TJ

    Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London W12 0NN, UK.

    Reduced FCGR3B copy number is associated with increased risk of systemic lupus erythematosus (SLE). The five FCGR2/FCGR3 genes are arranged across two highly paralogous genomic segments on chromosome 1q23. Previous studies have suggested mechanisms for structural rearrangements at the FCGR2/FCGR3 locus and have proposed mechanisms whereby altered FCGR3B copy number predisposes to autoimmunity, but the high degree of sequence similarity between paralogous segments has prevented precise definition of the molecular events and their functional consequences. To pursue the genomic pathology associated with FCGR3B copy-number variation, we integrated sequencing data from fosmid and bacterial artificial chromosome clones and sequence-captured DNA from FCGR3B-deleted genomes to establish a detailed map of allelic and paralogous sequence variation across the FCGR2/FCGR3 locus. This analysis identified two highly paralogous 24.5 kb blocks within the FCGR2C/FCGR3B/FCGR2B locus that are devoid of nonpolymorphic paralogous sequence variations and that define the limits of the genomic regions in which nonallelic homologous recombination leads to FCGR2C/FCGR3B copy-number variation. Further, the data showed evidence of swapping of haplotype blocks between these highly paralogous blocks that most likely arose from sequential ancestral recombination events across the region. Functionally, we found by flow cytometry, immunoblotting and cDNA sequencing that individuals with FCGR3B-deleted alleles show ectopic presence of FcγRIIb on natural killer (NK) cells. We conclude that FCGR3B deletion juxtaposes the 5'-regulatory sequences of FCGR2C with the coding sequence of FCGR2B, creating a chimeric gene that results in an ectopic accumulation of FcγRIIb on NK cells and provides an explanation for SLE risk associated with reduced FCGR3B gene copy number.

    American journal of human genetics 2012

  • Behavior and target site selection of conjugative transposon Tn916 in two different strains of toxigenic Clostridium difficile.

    Mullany P, Williams R, Langridge GC, Turner DJ, Whalan R, Clayton C, Lawley T, Hussain H, McCurrie K, Morden N, Allan E and Roberts AP

    Department of Microbial Diseases, UCL Eastman Dental Institute, University College London, London, UK. p.mullany@ucl.ac.uk

    The insertion sites of the conjugative transposon Tn916 in the anaerobic pathogen Clostridium difficile were determined using Illumina Solexa high-throughput DNA sequencing of Tn916 insertion libraries in two different clinical isolates: 630ΔE, an erythromycin-sensitive derivative of 630 (ribotype 012), and the ribotype 027 isolate R20291, which was responsible for a severe outbreak of C. difficile disease. A consensus 15-bp Tn916 insertion sequence was identified which was similar in both strains, although an extended consensus sequence was observed in R20291. A search of the C. difficile 630 genome showed that the Tn916 insertion motif was present 100,987 times, with approximately 63,000 of these motifs located in genes and 35,000 in intergenic regions. To test the usefulness of Tn916 as a mutagen, a functional screen allowed the isolation of a mutant. This mutant contained Tn916 inserted into a gene involved in flagellar biosynthesis.

    Funded by: Medical Research Council: G0601176

    Applied and environmental microbiology 2012;78;7;2147-53

  • Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer.

    Murchison EP, Schulz-Trieglaff OB, Ning Z, Alexandrov LB, Bauer MJ, Fu B, Hims M, Ding Z, Ivakhno S, Stewart C, Ng BL, Wong W, Aken B, White S, Alsop A, Becq J, Bignell GR, Cheetham RK, Cheng W, Connor TR, Cox AJ, Feng ZP, Gu Y, Grocock RJ, Harris SR, Khrebtukova I, Kingsbury Z, Kowarsky M, Kreiss A, Luo S, Marshall J, McBride DJ, Murray L, Pearse AM, Raine K, Rasolonjatovo I, Shaw R, Tedder P, Tregidgo C, Vilella AJ, Wedge DC, Woods GM, Gormley N, Humphray S, Schroth G, Smith G, Hall K, Searle SM, Carter NP, Papenfuss AT, Futreal PA, Campbell PJ, Yang F, Bentley DR, Evers DJ and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK. elizabeth.murchison@sanger.ac.uk

    The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 095908

    Cell 2012;148;4;780-91

  • Evolution of an Eurasian avian-like influenza virus in naïve and vaccinated pigs.

    Murcia PR, Hughes J, Battista P, Lloyd L, Baillie GJ, Ramirez-Gonzalez RH, Ormond D, Oliver K, Elton D, Mumford JA, Caccamo M, Kellam P, Grenfell BT, Holmes EC and Wood JL

    Cambridge Infectious Diseases Consortium, Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    Influenza viruses are characterized by an ability to cross species boundaries and evade host immunity, sometimes with devastating consequences. The 2009 pandemic of H1N1 influenza A virus highlights the importance of pigs in influenza emergence, particularly as intermediate hosts by which avian viruses adapt to mammals before emerging in humans. Although segment reassortment has commonly been associated with influenza emergence, an expanded host-range is also likely to be associated with the accumulation of specific beneficial point mutations. To better understand the mechanisms that shape the genetic diversity of avian-like viruses in pigs, we studied the evolutionary dynamics of an Eurasian Avian-like swine influenza virus (EA-SIV) in naïve and vaccinated pigs linked by natural transmission. We analyzed multiple clones of the hemagglutinin 1 (HA1) gene derived from consecutive daily viral populations. Strikingly, we observed both transient and fixed changes in the consensus sequence along the transmission chain. Hence, the mutational spectrum of intra-host EA-SIV populations is highly dynamic and allele fixation can occur with extreme rapidity. In addition, mutations that could potentially alter host-range and antigenicity were transmitted between animals and mixed infections were commonplace, even in vaccinated pigs. Finally, we repeatedly detected distinct stop codons in virus samples from co-housed pigs, suggesting that they persisted within hosts and were transmitted among them. This implies that mutations that reduce viral fitness in one host, but which could lead to fitness benefits in a novel host, can circulate at low frequencies.

    Funded by: Medical Research Council: G0801822; NICHD NIH HHS: R24 HD047879; NIGMS NIH HHS: R01 GM080533-05; Wellcome Trust

    PLoS pathogens 2012;8;5;e1002730

  • An atypical facial appearance and growth pattern in a child with Cornelia de Lange Syndrome: an intragenic deletion predicting loss of the N-terminal region of NIPBL.

    Murray JE, Walayat M, Gillett P, Sharkey FH, Rajan D, Carter NP and FitzPatrick DR

    South-east Scotland Clinical Genetics Services Western General Hospital, Edinburgh, UK. Jennie.murray@luht.scot.nhs.uk

    Cornelia de Lange Syndrome (CdLS) is a multisystem disorder with a live birth prevalence of approximately one per 15 000. Clinical diagnosis is based on a characteristic facies – low frontal hair line, short nose, triangular nasal tip, crescent shaped mouth, upturned nose, and arched eyebrows – characteristic limb defects and a distinctive pattern of growth and development. Approximately half of all classical cases of CdLS have heterozygous loss-of-function mutations in the gene encoding NIPBL, a component of the cohesin-loading apparatus (Dorsett and Krantz, 2009). Herein we describe a patient with a rare intragenic deletion of NIPBL who has typical microcephaly and developmental problems but atypical growth pattern and facial features.

    Funded by: Medical Research Council: MC_PC_U127561093, MC_U127561093; Wellcome Trust: WT077008

    Clinical dysmorphology 2012;21;1;22-3

  • Beyond knockouts: cre resources for conditional mutagenesis.

    Murray SA, Eppig JT, Smedley D, Simpson EM and Rosenthal N

    The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA, steve.murray@jax.org.

    With the effort of the International Phenotyping Consortium to produce thousands of strains with conditional potential gathering steam, there is growing recognition that it must be supported by a rich toolbox of cre driver strains. The approaches to build cre strains have evolved in both sophistication and reliability, replacing first-generation strains with tools that can target individual cell populations with incredible precision and specificity. The modest set of cre drivers generated by individual labs over the past 15+ years is now growing rapidly, thanks to a number of large-scale projects to produce new cre strains for the community. The power of this growing resource, however, depends upon the proper deep characterization of strain function, as even the best designed strain can display a variety of undesirable features that must be considered in experimental design. This must be coupled with the parallel development of informatics tools to provide functional data to the user and facilitated access to the strains through public repositories. We discuss the current progress on all of these fronts and the challenges that remain to ensure the scientific community can capitalize on the tremendous number of mouse resources at their disposal.

    Mammalian genome : official journal of the International Mammalian Genome Society 2012

  • Bacterial frequent flyers.

    Mutreja A

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. microbes@sanger.ac.uk.

    This month's Genome Watch describes how sequencing technology is providing insight into the geographical relationships and global travel of bacterial pathogens.

    Nature reviews. Microbiology 2012;10;11;734

  • Genome wide adaptations of Plasmodium falciparum in response to lumefantrine selective drug pressure.

    Mwai L, Diriye A, Masseno V, Muriithi S, Feltwell T, Musyoki J, Lemieux J, Feller A, Mair GR, Marsh K, Newbold C, Nzila A and Carret CK

    Kenya Medical Research Institute, Wellcome Trust Research Programme, Kilifi, Kenya.

    The combination therapy of the Artemisinin-derivative Artemether (ART) with Lumefantrine (LM) (Coartem®) is an important malaria treatment regimen in many endemic countries. Resistance to Artemisinin has already been reported, and it is feared that LM resistance (LMR) could also evolve quickly. Therefore molecular markers which can be used to track Coartem® efficacy are urgently needed. Often, stable resistance arises from initial, unstable phenotypes that can be identified in vitro. Here we have used the Plasmodium falciparum multidrug resistant reference strain V1S to induce LMR in vitro by culturing the parasite under continuous drug pressure for 16 months. The initial IC(50) (inhibitory concentration that kills 50% of the parasite population) was 24 nM. The resulting resistant strain V1S(LM), obtained after culture for an estimated 166 cycles under LM pressure, grew steadily in 378 nM of LM, corresponding to 15 times the IC(50) of the parental strain. However, after two weeks of culturing V1S(LM) in drug-free medium, the IC(50) returned to that of the initial, parental strain V1S. This transient drug tolerance was associated with major changes in gene expression profiles: using the PFSANGER Affymetrix custom array, we identified 184 differentially expressed genes in V1S(LM). Among those are 18 known and putative transporters including the multidrug resistance gene 1 (pfmdr1), the multidrug resistance associated protein and the V-type H+ pumping pyrophosphatase 2 (pfvp2) as well as genes associated with fatty acid metabolism. In addition we detected a clear selective advantage provided by two genomic loci in parasites grown under LM drug pressure, suggesting that all, or some of those genes contribute to development of LM tolerance--they may prove useful as molecular markers to monitor P. falciparum LM susceptibility.

    PloS one 2012;7;2;e31623

  • A GWAS sequence variant for platelet volume marks an alternative DNM3 promoter in megakaryocytes near a MEIS1 binding site.

    Nürnberg ST, Rendon A, Smethurst PA, Paul DS, Voss K, Thon JN, Lloyd-Jones H, Sambrook JG, Tijssen MR, HaemGen Consortium, Italiano JE, Deloukas P, Gottgens B, Soranzo N and Ouwehand WH

    Department of Haematology, University of Cambridge and National Health Service Blood and Transplant, Cambridge, United Kingdom.

    We recently identified 68 genomic loci where common sequence variants are associated with platelet count and volume. Platelets are formed in the bone marrow by megakaryocytes, which are derived from hematopoietic stem cells by a process mainly controlled by transcription factors. The homeobox transcription factor MEIS1 is uniquely transcribed in megakaryocytes and not in the other lineage-committed blood cells. By ChIP-seq, we show that 5 of the 68 loci pinpoint a MEIS1 binding event within a group of 252 MK-overexpressed genes. In one such locus in DNM3, regulating platelet volume, the MEIS1 binding site falls within a region acting as an alternative promoter that is solely used in megakaryocytes, where allelic variation dictates different levels of a shorter transcript. The importance of dynamin activity to the latter stages of thrombopoiesis was confirmed by the observation that the inhibitor Dynasore reduced murine proplatelet formation in vitro.

    Funded by: British Heart Foundation: RG/09/12/28096; Medical Research Council: G0800784; NHLBI NIH HHS: HL68130; Wellcome Trust: WT-084183/2/07/2

    Blood 2012;120;24;4859-68

  • Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing.

    Nadeau NJ, Whibley A, Jones RT, Davey JW, Dasmahapatra KK, Baxter SW, Quail MA, Joron M, ffrench-Constant RH, Blaxter ML, Mallet J and Jiggins CD

    Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK. njn27@cam.ac.uk

    Heliconius butterflies represent a recent radiation of species, in which wing pattern divergence has been implicated in speciation. Several loci that control wing pattern phenotypes have been mapped and two were identified through sequencing. These same gene regions play a role in adaptation across the whole Heliconius radiation. Previous studies of population genetic patterns at these regions have sequenced small amplicons. Here, we use targeted next-generation sequence capture to survey patterns of divergence across these entire regions in divergent geographical races and species of Heliconius. This technique was successful both within and between species for obtaining high coverage of almost all coding regions and sufficient coverage of non-coding regions to perform population genetic analyses. We find major peaks of elevated population differentiation between races across hybrid zones, which indicate regions under strong divergent selection. These 'islands' of divergence appear to be more extensive between closely related species, but there is less clear evidence for such islands between more distantly related species at two further points along the 'speciation continuum'. We also sequence fosmid clones across these regions in different Heliconius melpomene races. We find no major structural rearrangements but many relatively large (greater than 1 kb) insertion/deletion events (including gain/loss of transposable elements) that are variable between races.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2012;367;1587;343-53

  • Zebrafish breeding in the laboratory environment.

    Nasiadka A and Clark MD

    Zebrafish International Resource Center, University of Oregon, Eugene, OR 97403, USA. andrzej@zebrafish.org

    The zebrafish, Danio rerio, has become a major model organism used in biomedical studies. The widespread use of Danio rerio in research laboratories requires a comprehensive understanding of the husbandry of this species to ensure efficient propagation and maintenance of healthy and genetically diverse colonies. Breeding is a key element in zebrafish husbandry. It is a complex process influenced by a number of factors. Mate choice and mating behavior depend, for example, on olfactory cues, visual stimuli, and social interactions. Spawning is affected by the age and size of fish, interval at which fish are used for egg production, light cycle, diet, and fish health status. A number of breeding strategies, based on either single-pair matings or group crosses, are commonly employed in the laboratory to propagate lines and to identify carriers of specific mutations and/or transgenes. Propagation of zebrafish lines, in particular wild-type-derived strains, is closely monitored to ensure that genetic diversity and vigor are maintained. A robust zebrafish line typically carries a large number of polymorphic variations, which may interfere with reproducibility of experiments. To get a better insight into these variations, a wild-type hybrid Sanger AB Tübingen line has been generated from sequenced homozygous founders.

    Funded by: NCRR NIH HHS: P40 RR012546; Wellcome Trust

    ILAR journal / National Research Council, Institute of Laboratory Animal Resources 2012;53;2;161-8

  • The genomic landscape shaped by selection on transposable elements across 18 mouse strains.

    Nellåker C, Keane TM, Yalcin B, Wong K, Agam A, Belgard TG, Flint J, Adams DJ, Frankel WN and Ponting CP

    MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK. christoffer.nellaker@dpag.ox.ac.uk

    Background: Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined.

    Results: Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected.

    Conclusions: Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.

    Funded by: Cancer Research UK; Medical Research Council; NINDS NIH HHS: R01 NS031348; Wellcome Trust: 079912

    Genome biology 2012;13;6;R45

  • An integrated functional genomics approach identifies the regulatory network directed by brachyury (T) in chordoma.

    Nelson AC, Pillay N, Henderson S, Presneau N, Tirabosco R, Halai D, Berisha F, Flicek P, Stemple DL, Stern CD, Wardle FC and Flanagan AM

    Randall Division of Cell and Molecular Biophysics, New Hunt's House, King's College London, Guy's Campus, London, SE1 1UL, UK.

    Chordoma is a rare malignant tumour of bone, the molecular marker of which is the expression of the transcription factor, brachyury. Having recently demonstrated that silencing brachyury induces growth arrest in a chordoma cell line, we now seek to identify its downstream target genes. Here we use an integrated functional genomics approach involving shRNA-mediated brachyury knockdown, gene expression microarray, ChIP-seq experiments, and bioinformatics analysis to achieve this goal. We confirm that the T-box binding motif of human brachyury is identical to that found in mouse, Xenopus, and zebrafish development, and that brachyury acts primarily as an activator of transcription. Using human chordoma samples for validation purposes, we show that brachyury binds 99 direct targets and indirectly influences the expression of 64 other genes, thereby acting as a master regulator of an elaborate oncogenic transcriptional network encompassing diverse signalling pathways including components of the cell cycle, and extracellular matrix components. Given the wide repertoire of its active binding and the relative specific localization of brachyury to the tumour cells, we propose that an RNA interference-based gene therapy approach is a plausible therapeutic avenue worthy of investigation.

    Funded by: Medical Research Council: G0700095, G0700213

    The Journal of pathology 2012;228;3;274-85

  • Chromosomal rearrangements and karyotype evolution in carnivores revealed by chromosome painting.

    Nie W, Wang J, Su W, Wang D, Tanomtong A, Perelman PL, Graphodatsky AS and Yang F

    State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, the Chinese Academy of Sciences, Kunming, Yunnan, PR China. whnie@mail.kiz.ac.cn

    Chromosomal evolution in carnivores has been revisited extensively using cross-species chromosome painting. Painting probes derived from flow-sorted chromosomes of the domestic dog, which has one of the most rearranged karyotypes in mammals and the highest diploid number (2n=78) in carnivores, are a powerful tool in detecting both evolutionary intra- and inter-chromosomal rearrangements. However, only a few comparative maps have been established between dog and other non-Canidae species. Here, we extended cross-species painting with dog probes to seven more species representing six carnivore families: Eurasian lynx (Lynx lynx), the stone marten (Martes foina), the small Indian civet (Viverricula indica), the Asian palm civet (Paradoxurus hermaphroditus), Javan mongoose (Herpestes javanicus), the raccoon (Procyon lotor) and the giant panda (Ailuropoda melanoleuca). The numbers and positions of intra-chromosomal rearrangements were found to differ among these carnivore species. A comparative map between human and stone marten, and a map among the Yangtze finless porpoise (Neophocaena phocaenoides asiaeorientalis), stone marten and human were also established to facilitate outgroup comparison and to integrate comparative maps between stone marten and other carnivores with such maps between human and other species. These comparative maps give further insight into genome evolution and karyotype phylogenetic relationships among carnivores, and will facilitate the transfer of gene mapping data from human, domestic dog and cat to other species.

    Heredity 2012;108;1;17-27

  • The role of sphingosine-1-phosphate transporter Spns2 in immune system function.

    Nijnik A, Clare S, Hale C, Chen J, Raisen C, Mottram L, Lucas M, Estabel J, Ryder E, Adissu H, Sanger Mouse Genetics Project, Adams NC, Ramirez-Solis R, White JK, Steel KP, Dougan G and Hancock RE

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. anastasiya.nyzhnyk@mcgill.ca

    Sphingosine-1-phosphate (S1P) is a lipid messenger involved in the regulation of embryonic development, immune system functions, and many other physiological processes. However, the mechanisms of S1P transport across cellular membranes remain poorly understood, with several ATP-binding cassette family members and the spinster 2 (Spns2) member of the major facilitator superfamily known to mediate S1P transport in cell culture. Spns2 was also shown to control S1P activities in zebrafish in vivo and to play a critical role in zebrafish cardiovascular development. However, the in vivo roles of Spns2 in mammals and its involvement in the different S1P-dependent physiological processes have not been investigated. In this study, we characterized a Spns2-null mouse line carrying the Spns2(tm1a(KOMP)Wtsi) allele (Spns2(tm1a)). The Spns2(tm1a/tm1a) animals were viable, indicating a divergence in Spns2 function from its zebrafish ortholog. However, the immunological phenotype of the Spns2(tm1a/tm1a) mice closely mimicked the phenotypes of partial S1P deficiency and impaired S1P-dependent lymphocyte trafficking, with a depletion of lymphocytes in circulation, an increase in mature single-positive T cells in the thymus, and a selective reduction in mature B cells in the spleen and bone marrow. Spns2 activity in the nonhematopoietic cells was critical for normal lymphocyte development and localization. Overall, Spns2(tm1a/tm1a) resulted in impaired humoral immune responses to immunization. This study thus demonstrated a physiological role for Spns2 in mammalian immune system functions but not in cardiovascular development. Other components of the S1P signaling network are investigated as drug targets for immunosuppressive therapy, but the selective action of Spns2 may present an advantage in this regard.

    Funded by: Canadian Institutes of Health Research; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 098051

    Journal of immunology (Baltimore, Md. : 1950) 2012;189;1;102-11

  • The critical role of histone H2A-deubiquitinase Mysm1 in hematopoiesis and lymphocyte differentiation.

    Nijnik A, Clare S, Hale C, Raisen C, McIntyre RE, Yusa K, Everitt AR, Mottram L, Podrini C, Lucas M, Estabel J, Goulding D, Sanger Institute Microarray Facility, Sanger Mouse Genetics Project, Adams N, Ramirez-Solis R, White JK, Adams DJ, Hancock RE and Dougan G

    Wellcome Trust Genome Campus, The Wellcome Trust Sanger Institute, Cambridge, United Kingdom. anastasiya.nyzhnyk@mcgill.ca

    Stem cell differentiation and lineage specification depend on coordinated programs of gene expression, but our knowledge of the chromatin-modifying factors regulating these events remains incomplete. Ubiquitination of histone H2A (H2A-K119u) is a common chromatin modification associated with gene silencing, and controlled by the ubiquitin-ligase polycomb repressor complex 1 (PRC1) and H2A-deubiquitinating enzymes (H2A-DUBs). The roles of H2A-DUBs in mammalian development, stem cells, and hematopoiesis have not been addressed. Here we characterized an H2A-DUB targeted mouse line Mysm1(tm1a/tm1a) and demonstrated defects in BM hematopoiesis, resulting in lymphopenia, anemia, and thrombocytosis. Development of lymphocytes was impaired from the earliest stages of their differentiation, and there was also a depletion of erythroid cells and a defect in erythroid progenitor function. These phenotypes resulted from a cell-intrinsic requirement for Mysm1 in the BM. Importantly, Mysm1(tm1a/tm1a) HSCs were functionally impaired, and this was associated with elevated levels of reactive oxygen species, γH2AX DNA damage marker, and p53 protein in the hematopoietic progenitors. Overall, these data establish a role for Mysm1 in the maintenance of BM stem cell function, in the control of oxidative stress and genetic stability in hematopoietic progenitors, and in the development of lymphoid and erythroid lineages.

    Funded by: Canadian Institutes of Health Research; Wellcome Trust

    Blood 2012;119;6;1370-9

  • Mutational processes molding the genomes of 21 breast cancers.

    Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, Jones D, Hinton J, Marshall J, Stebbings LA, Menzies A, Martin S, Leung K, Chen L, Leroy C, Ramakrishna M, Rance R, Lau KW, Mudie LJ, Varela I, McBride DJ, Bignell GR, Cooke SL, Shlien A, Gamble J, Whitmore I, Maddison M, Tarpey PS, Davies HR, Papaemmanuil E, Stephens PJ, McLaren S, Butler AP, Teague JW, Jönsson G, Garber JE, Silver D, Miron P, Fatima A, Boyault S, Langerød A, Tutt A, Martens JW, Aparicio SA, Borg Å, Salomon AV, Thomas G, Børresen-Dale AL, Richardson AL, Neuberger MS, Futreal PA, Campbell PJ, Stratton MR and Breast Cancer Working Group of the International Cancer Genome Consortium

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    All cancers carry somatic mutations. The patterns of mutation in cancer genomes reflect the DNA damage and repair processes to which cancer cells and their precursors have been exposed. To explore these mechanisms further, we generated catalogs of somatic mutation from 21 breast cancers and applied mathematical methods to extract mutational signatures of the underlying processes. Multiple distinct single- and double-nucleotide substitution signatures were discernible. Cancers with BRCA1 or BRCA2 mutations exhibited a characteristic combination of substitution mutation signatures and a distinctive profile of deletions. Complex relationships between somatic mutation prevalence and transcription were detected. A remarkable phenomenon of localized hypermutation, termed "kataegis," was observed. Regions of kataegis differed between cancers but usually colocalized with somatic rearrangements. Base substitutions in these regions were almost exclusively of cytosine at TpC dinucleotides. The mechanisms underlying most of these mutational signatures are unknown. However, a role for the APOBEC family of cytidine deaminases is proposed.

    Funded by: Department of Health; Medical Research Council: MC_U105178806; NCI NIH HHS: CA089393; Wellcome Trust: 088340, 098051, WT088340MA

    Cell 2012;149;5;979-93

  • The life history of 21 breast cancers.

    Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, Raine K, Jones D, Marshall J, Ramakrishna M, Shlien A, Cooke SL, Hinton J, Menzies A, Stebbings LA, Leroy C, Jia M, Rance R, Mudie LJ, Gamble SJ, Stephens PJ, McLaren S, Tarpey PS, Papaemmanuil E, Davies HR, Varela I, McBride DJ, Bignell GR, Leung K, Butler AP, Teague JW, Martin S, Jönsson G, Mariani O, Boyault S, Miron P, Fatima A, Langerød A, Aparicio SA, Tutt A, Sieuwerts AM, Borg Å, Thomas G, Salomon AV, Richardson AL, Børresen-Dale AL, Futreal PA, Stratton MR, Campbell PJ and Breast Cancer Working Group of the International Cancer Genome Consortium

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer's life history is encrypted in the somatic mutations present. We developed algorithms to decipher this narrative and applied them to 21 breast cancers. Mutational processes evolve across a cancer's lifespan, with many emerging late but contributing extensive genetic variation. Subclonal diversification is prominent, and most mutations are found in just a fraction of tumor cells. Every tumor has a dominant subclonal lineage, representing more than 50% of tumor cells. Minimal expansion of these subclones occurs until many hundreds to thousands of mutations have accumulated, implying the existence of long-lived, quiescent cell lineages capable of substantial proliferation upon acquisition of enabling genomic changes. Expansion of the dominant subclone to an appreciable mass may therefore represent the final rate-limiting step in a breast cancer's development, triggering diagnosis.

    Funded by: Department of Health; NCI NIH HHS: CA089393; Wellcome Trust: 088340, 093867, 098051

    Cell 2012;149;5;994-1007

  • The transporter ABCB7 is a mediator of the phenotype of acquired refractory anemia with ring sideroblasts.

    Nikpour M, Scharenberg C, Liu A, Conte S, Karimi M, Mortera-Blanco T, Giai V, Fernandez-Mercado M, Papaemmanuil E, Högstrand K, Jansson M, Vedin I, Stephen Wainscoat J, Campbell P, Cazzola M, Boultwood J, Grandien A and Hellström-Lindberg E

    Department of Medicine, Center for Hematology and Regenerative Medicine, Karolinska Institutet, Karolinska University Hospital Huddinge, Stockholm, Sweden.

    Refractory anemia with ring sideroblasts (RARS) is characterized by mitochondrial ferritin (FTMT) accumulation and markedly suppressed expression of the iron transporter ABCB7. To test the hypothesis that ABCB7 is a key mediator of ineffective erythropoiesis of RARS, we modulated its expression in hematopoietic cells. ABCB7 up- and downregulation did not influence growth and survival of K562 cells. In normal bone marrow, ABCB7 downregulation reduced erythroid differentiation, growth and colony formation, and resulted in a gene expression pattern similar to that observed in intermediate RARS erythroblasts, and in the accumulation of FTMT. Importantly, forced ABCB7 expression restored erythroid colony growth and decreased FTMT expression level in RARS CD34+ marrow cells. Mutations in the SF3B1 gene, a core component of the RNA splicing machinery, were recently identified in a high proportion of patients with RARS and 11 of the 13 RARS patients in this study carried this mutation. Interestingly, ABCB7 exon usage differed between normal bone marrow and RARS, as well as within the RARS cohort. In addition, SF3B1 silencing resulted in downregulation of ABCB7 in K562 cells undergoing erythroid differentiation. Our findings support the view that ABCB7 is implicated in the phenotype of acquired RARS and suggest a relation between SF3B1 mutations and ABCB7 downregulation. Leukemia advance online publication, 20 November 2012; doi:10.1038/leu.2012.298.

    Leukemia : official journal of the Leukemia Society of America, Leukemia Research Fund, U.K 2012

  • Expansion of CORE-SINEs in the genome of the Tasmanian devil.

    Nilsson MA, Janke A, Murchison EP, Ning Z and Hallström BM

    LOEWE - Biodiversity and Climate Research Center, BiK-F, Senckenberganlage 25, Frankfurt am Main, D-60325, Germany. maria.nilsson-janke@senckenberg.de.

    Background: The genome of the carnivorous marsupial, the Tasmanian devil (Sarcophilus harrisii, Order: Dasyuromorphia), was sequenced in the hopes of finding a cure for or gaining a better understanding of the contagious devil facial tumor disease that is threatening the species' survival. To better understand the Tasmanian devil genome, we screened it for transposable elements and investigated the dynamics of short interspersed element (SINE) retroposons. Results: The temporal history of Tasmanian devil SINEs, elucidated using a transposition in transposition analysis, indicates that WSINE1, a CORE-SINE present in around 200,000 copies, is the most recently active element. Moreover, we discovered a new subtype of WSINE1 (WSINE1b) that comprises at least 90% of all Tasmanian devil WSINE1s. The frequencies of WSINE1 subtypes differ in the genomes of two of the other Australian marsupial orders. A co-segregation analysis indicated that at least 66 subfamilies of WSINE1 evolved during the evolution of Dasyuromorphia. Using a substitution rate derived from WSINE1 insertions, the ages of the subfamilies were estimated and correlated with a newly established phylogeny of Dasyuromorphia. Phylogenetic analyses and divergence time estimates of mitochondrial genome data indicate a rapid radiation of the Tasmanian devil and the closest relative the quolls (Dasyurus) around 14 million years ago. Conclusions: The radiation and abundance of CORE-SINEs in marsupial genomes indicate that they may be a major player in the evolution of marsupials. It is evident that the early phases of evolution of the carnivorous marsupial order Dasyuromorphia were characterized by a burst of SINE activity. A correlation between a speciation event and a major burst of retroposon activity is for the first time shown in a marsupial genome.

    BMC genomics 2012;13;172

  • Atp6v0a4 knockout mouse is a model of distal renal tubular acidosis with hearing loss, with additional extrarenal phenotype.

    Norgett EE, Golder ZJ, Lorente-Cánovas B, Ingham N, Steel KP and Karet Frankl FE

    Department of Medical Genetics, University of Cambridge, Cambridge CB2 0XY, United Kingdom.

    Autosomal recessive distal renal tubular acidosis (dRTA) is a severe disorder of acid-base homeostasis, often accompanied by sensorineural deafness. We and others have previously shown that mutations in the tissue-restricted a4 and B1 subunits of the H(+)-ATPase underlie this syndrome. Here, we describe an Atp6v0a4 knockout mouse, which lacks the a4 subunit. Using β-galactosidase as a reporter for the null gene, developmental a4 expression was detected in developing bone, nose, eye, and skin, in addition to that expected in kidney and inner ear. By the time of weaning, Atp6v0a4(-/-) mice demonstrated severe metabolic acidosis, hypokalemia, and early nephrocalcinosis. Null mice were hypocitraturic, but hypercalciuria was absent. They were severely hearing-impaired, as shown by elevated auditory brainstem response thresholds and absent endocochlear potential. They died rapidly unless alkalinized. If they survived weaning with alkali supplementation, treatment could later be withdrawn, but -/- animals remained acidotic with alkaline urine. They also had an impaired sense of smell. Heterozygous animals were biochemically normal until acid-challenged, when they became more acidotic than +/+ animals. This mouse model recapitulates the loss of H(+)-ATPase function seen in human disease and can provide additional insights into dRTA and the physiology of the a4 subunit.

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;34;13775-80

  • Genome-wide association meta-analysis identifies new endometriosis risk loci.

    Nyholt DR, Low SK, Anderson CA, Painter JN, Uno S, Morris AP, Macgregor S, Gordon SD, Henders AK, Martin NG, Attia J, Holliday EG, McEvoy M, Scott RJ, Kennedy SH, Treloar SA, Missmer SA, Adachi S, Tanaka K, Nakamura Y, Zondervan KT, Zembutsu H and Montgomery GW

    Queensland Institute of Medical Research, Brisbane, Queensland, Australia.

    We conducted a genome-wide association meta-analysis of 4,604 endometriosis cases and 9,393 controls of Japanese and European ancestry. We show that rs12700667 on chromosome 7p15.2, previously found to associate with disease in Europeans, replicates in Japanese (P = 3.6 × 10(-3)), and we confirm association of rs7521902 at 1p36.12 near WNT4. In addition, we establish an association of rs13394619 in GREB1 at 2p25.1 with endometriosis and identify a newly associated locus at 12q22 near VEZT (rs10859871). Excluding cases of European ancestry of minimal or unknown severity, we identified additional previously unknown loci at 2p14 (rs4141819), 6p22.3 (rs7739264) and 9p21.3 (rs1537377). All seven SNP effects were replicated in an independent cohort and associated at P <5 × 10(-8) in a combined analysis. Finally, we found a significant overlap in polygenic risk for endometriosis between the genome-wide association cohorts of European and Japanese descent (P = 8.8 × 10(-11)), indicating that many weakly associated SNPs represent true endometriosis risk loci and that risk prediction and future targeted disease therapy may be transferred across these populations.

    Nature genetics 2012

  • Sex-specific influence of DRD2 on ADHD-type temperament in a large population-based birth cohort.

    Nyman ES, Loukola A, Varilo T, Taanila A, Hurtig T, Moilanen I, Loo S, McGough JJ, Järvelin MR, Smalley SL, Nelson SF and Peltonen L

    Public Health Genomics Unit, Institute for Molecular Medicine Finland FIMM, University of Helsinki and National Institute for Health and Welfare, Helsinki, Finland. emma.nyman@thl.fi

    Attention-deficit/hyperactivity disorder (ADHD) is a childhood-onset neurodevelopmental disorder with a significant public-health impact. Previously, we described a candidate gene study in a population-based birth cohort that demonstrated an association with ADHD-affected males and the dopamine receptor D2 (DRD2). The current study evaluates potential associations of dopamine receptor genes and Cloninger temperament traits within this same sample. Participants with stringent lifetime ADHD diagnoses were ascertained systematically from the genetically isolated Northern Finland 1986 Birth Cohort (n=9432), resulting in 178 cases and 157 controls. Markers in all known dopamine receptor genes were genotyped. We report an association of DRD2 with low Persistence in females (rs1079727 P=0.02, rs1124491 P=0.02, rs1800497 P=0.03). The associated DRD2 minor allelic haplotype (CAA, P=0.03) is the same haplotype we previously associated with ADHD in males in this birth cohort. The current study further supports previous results on the role of DRD2 in individuals with ADHD. Investigations suggest that DRD2 may have an impact on both males and females, but the particular outcome appears sex-specific, manifesting as ADHD in males and low Persistence in females. Furthermore, these findings suggest that the putative role of low Persistence as an endophenotype for ADHD deserves further investigation.

    Funded by: NIMH NIH HHS: MH01966, MH063706

    Psychiatric genetics 2012;22;4;197-201

  • The rhoptry proteome of Eimeria tenella sporozoites.

    Oakes RD, Kurian D, Bromley E, Ward C, Lal K, Blake DP, Reid AJ, Pain A, Sinden RE, Wastling JM and Tomley FM

    Institute for Animal Health, Compton, Newbury, Berkshire, UK.

    Proteins derived from the rhoptry secretory organelles are crucial for the invasion and survival of apicomplexan parasites within host cells. The rhoptries are club-shaped organelles that contain two distinct subpopulations of proteins that localise to separate compartments of the organelle. Proteins from the neck region (rhoptry neck proteins, RON) are secreted early in invasion and a subset of these is critical for the formation and function of the moving junction between parasite and host membranes. Proteins from the bulb compartment (rhoptry protein, ROP) are released later, into the nascent parasitophorous vacuole where they have a role in modifying the vacuolar environment, and into the host cell where they act as key determinants of virulence through their ability to interact with host cell signalling pathways, causing an array of downstream effects. In this paper we present the results of an extensive proteomics analysis of the rhoptry organelles from the coccidian parasite, Eimeria tenella, which is a highly pathogenic parasite of the domestic chicken causing severe caecal coccidiosis. Several different classes of rhoptry protein have been identified. First are the RON proteins that have varying degrees of similarity to proteins of Toxoplasma gondii and Neospora caninum. For some RON families, E. tenella expresses more than one gene product and many of the individual RON proteins are differentially expressed between the sporozoite and merozoite developmental stages. The E. tenella sporozoite rhoptry expresses only a limited repertoire of proteins with homology to known ROP proteins from other coccidia, including just two secreted ROP kinases, both of which appear to be equipped for catalytic activity. Finally, a large number of hitherto undescribed proteins that map to the sporozoite rhoptry are identified, many of which have orthologous proteins encoded within the genomes of T. gondii and N. caninum.

    International journal for parasitology 2012

  • Transmission of malaria to mosquitoes blocked by bumped kinase inhibitors.

    Ojo KK, Pfander C, Mueller NR, Burstroem C, Larson ET, Bryan CM, Fox AM, Reid MC, Johnson SM, Murphy RC, Kennedy M, Mann H, Leibly DJ, Hewitt SN, Verlinde CL, Kappe S, Merritt EA, Maly DJ, Billker O and Van Voorhis WC

    Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, Washington 98195-6423, USA.

    Effective control and eradication of malaria will require new tools to prevent transmission. Current antimalarial therapies targeting the asexual stage of Plasmodium do not prevent transmission of circulating gametocytes from infected humans to mosquitoes. Here, we describe a new class of transmission-blocking compounds, bumped kinase inhibitors (BKIs), which inhibit microgametocyte exflagellation. Oocyst formation and sporozoite production, necessary for transmission to mammals, were inhibited in mosquitoes fed on either BKI-1-treated human blood or mice treated with BKI-1. BKIs are hypothesized to act via inhibition of Plasmodium calcium-dependent protein kinase 4 and predicted to have little activity against mammalian kinases. Our data show that BKIs do not inhibit proliferation of mammalian cell lines and are well tolerated in mice. Used in combination with drugs active against asexual stages of Plasmodium, BKIs could prove an important tool for malaria control and eradication.

    Funded by: Medical Research Council: G0501670; NIAID NIH HHS: R01AI080625, R01AI089441; NIGMS NIH HHS: R01 GM086858, R01GM086858; Wellcome Trust: WT089085/Z/09/Z

    The Journal of clinical investigation 2012;122;6;2301-5

  • Intracontinental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa.

    Okoro CK, Kingsley RA, Connor TR, Harris SR, Parry CM, Al-Mashhadani MN, Kariuki S, Msefula CL, Gordon MA, de Pinna E, Wain J, Heyderman RS, Obaro S, Alonso PL, Mandomando I, MacLennan CA, Tapia MD, Levine MM, Tennant SM, Parkhill J and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    A highly invasive form of non-typhoidal Salmonella (iNTS) disease has recently been documented in many countries in sub-Saharan Africa. The most common Salmonella enterica serovar causing this disease is Typhimurium (Salmonella Typhimurium). We applied whole-genome sequence-based phylogenetic methods to define the population structure of sub-Saharan African invasive Salmonella Typhimurium isolates and compared these to global Salmonella Typhimurium populations. Notably, the vast majority of sub-Saharan invasive Salmonella Typhimurium isolates fell within two closely related, highly clustered phylogenetic lineages that we estimate emerged independently ∼52 and ∼35 years ago in close temporal association with the current HIV pandemic. Clonal replacement of isolates from lineage I by those from lineage II was potentially influenced by the use of chloramphenicol for the treatment of iNTS disease. Our analysis suggests that iNTS disease is in part an epidemic in sub-Saharan Africa caused by highly related Salmonella Typhimurium lineages that may have occupied new niches associated with a compromised human population and antibiotic treatment.

    Funded by: Wellcome Trust: 098051

    Nature genetics 2012;44;11;1215-21

  • High-resolution single nucleotide polymorphism analysis distinguishes recrudescence and reinfection in recurrent invasive nontyphoidal Salmonella typhimurium disease.

    Okoro CK, Kingsley RA, Quail MA, Kankwatira AM, Feasey NA, Parkhill J, Dougan G and Gordon MA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Background: Bloodstream infection with invasive nontyphoidal Salmonella (iNTS) is common and severe among human immunodeficiency virus (HIV)-infected adults throughout sub-Saharan Africa. The epidemiology of iNTS is poorly understood. Survivors frequently experience multiply recurrent iNTS disease, despite appropriate antimicrobial therapy, but recrudescence and reinfection have previously been difficult to distinguish.

    Methods: We used high-resolution single nucleotide polymorphism (SNP) typing and whole-genome phylogenetics to investigate 47 iNTS isolates from 14 patients with multiple recurrences following an index presentation with iNTS disease in Blantyre, Malawi. We isolated nontyphoidal salmonellae organisms from blood (n = 35), bone marrow (n = 8), stool (n = 2), urine (n = 1), and throat (n = 1) samples; these isolates comprised serotypes Typhimurium (n = 43) and Enteritidis (n = 4).

    Results: Recrudescence with identical or highly phylogenetically related isolates accounted for 78% of recurrences, and reinfection with phylogenetically distinct isolates accounted for 22% of recurrences. Both recrudescence and reinfection could occur in the same individual, and reinfection could either precede or follow recrudescence. The number of days to recurrence (23-486 d) was not different for recrudescence or reinfection. The number of days to recrudescence was unrelated to the number of SNPs accumulated by recrudescent organisms, suggesting that there was little genetic change during persistence in the host, despite exposure to multiple courses of antibiotics. Of Salmonella Typhimurium isolates, 42 of 43 were pathovar ST313.

    Conclusions: High-resolution whole-genome phylogenetics successfully discriminated recrudescent iNTS from reinfection, despite a high level of clonality within and among individuals, giving insights into pathogenesis and management. These methods also have adequate resolution to investigate the epidemiology and transmission of this important African pathogen.

    Funded by: Wellcome Trust: 076964, 098051

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2012;54;7;955-63

  • Recessive HYDIN mutations cause primary ciliary dyskinesia without randomization of left-right body asymmetry.

    Olbrich H, Schmidts M, Werner C, Onoufriadis A, Loges NT, Raidt J, Banki NF, Shoemark A, Burgoyne T, Al Turki S, Hurles ME, UK10K Consortium, Köhler G, Schroeder J, Nürnberg G, Nürnberg P, Chung EM, Reinhardt R, Marthin JK, Nielsen KG, Mitchison HM and Omran H

    Department of General Pediatrics, University Children's Hospital Muenster, 48149 Muenster, Germany.

    Primary ciliary dyskinesia (PCD) is a genetically heterogeneous recessive disorder characterized by defective cilia and flagella motility. Chronic destructive-airway disease is caused by abnormal respiratory-tract mucociliary clearance. Abnormal propulsion of sperm flagella contributes to male infertility. Genetic defects in most individuals affected by PCD cause randomization of left-right body asymmetry; approximately half show situs inversus or situs ambiguous. Almost 70 years after the hy3 mouse possessing Hydin mutations was described as a recessive hydrocephalus model, we report HYDIN mutations in PCD-affected persons without hydrocephalus. By homozygosity mapping, we identified a PCD-associated locus, chromosomal region 16q21-q23, which contains HYDIN. However, a nearly identical 360 kb paralogous segment (HYDIN2) in chromosomal region 1q21.1 complicated mutational analysis. In three affected German siblings linked to HYDIN, we identified homozygous c.3985G>T mutations that affect an evolutionarily conserved splice acceptor site and that subsequently cause aberrantly spliced transcripts predicting premature protein termination in respiratory cells. Parallel whole-exome sequencing identified a homozygous nonsense HYDIN mutation, c.922A>T (p.Lys307(∗)), in six individuals from three Faroe Island PCD-affected families that all carried an 8.8 Mb shared haplotype across HYDIN, indicating an ancestral founder mutation in this isolated population. We demonstrate by electron microscopy tomography that, consistent with the effects of loss-of-function mutations, HYDIN mutant respiratory cilia lack the C2b projection of the central pair (CP) apparatus; similar findings were reported in Hydin-deficient Chlamydomonas and mice. High-speed videomicroscopy demonstrated markedly reduced beating amplitudes of respiratory cilia and stiff sperm flagella. Like the hy3 mouse model, all nine PCD-affected persons had normal body composition because nodal cilia function is apparently not dependent on the function of the CP apparatus.

    Funded by: Wellcome Trust: 091310, 091551

    American journal of human genetics 2012;91;4;672-84

  • The Stat6-regulated KRAB domain zinc finger protein Zfp157 regulates the balance of lineages in mammary glands and compensates for loss of Gata-3.

    Oliver CH, Khaled WT, Frend H, Nichols J and Watson CJ

    Department of Pathology, University of Cambridge, Cambridge CB2 1QP, United Kingdom.

    Lineage commitment studies in mammary glands have focused on identifying cell populations that display stem or progenitor properties. However, the mechanisms that control cell fate have been incompletely explored. Herein we show that zinc finger protein 157 (Zfp157) is required to establish the balance between luminal alveolar pStat5- and Gata-3-expressing cells in the murine mammary gland. Using mice in which the zfp157 gene was disrupted, we found that alveologenesis was accelerated concomitantly with a dramatic skewing of the proportion of pStat5-expressing cells relative to Gata-3⁺ cells. This suppression of the Gata-3⁺ lineage was associated with increased expression of the inhibitor of helix-loop-helix protein Id2. Surprisingly, Gata-3 becomes dispensable in the absence of Zfp157, as mice deficient for both Zfp157 and Gata-3 lactate normally, although the glands display a mild epithelial dysplasia. These data suggest that the luminal alveolar compartment of the mammary gland is comprised of a number of distinct cell populations that, although interdependent, exhibit considerable cell fate plasticity.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H006206/1

    Genes & development 2012;26;10;1086-97

  • Cestode genomics - progress and prospects for advancing basic and applied aspects of flatworm biology.

    Olson PD, Zarowiecki M, Kiss F and Brehm K

    Department of Zoology, The Natural History Museum, London, UK.

    Characterization of the first tapeworm genome, Echinococcus multilocularis, is now nearly complete, and genome assemblies of E. granulosus, Taenia solium and Hymenolepis microstoma are in advanced draft versions. These initiatives herald the beginning of a genomic era in cestodology and underpin a diverse set of research agendas targeting both basic and applied aspects of tapeworm biology. We discuss the progress in the genomics of these species, provide insights into the presence and composition of immunologically relevant gene families, including the antigen B- and EG95/45W families, and discuss chemogenomic approaches toward the development of novel chemotherapeutics against cestode diseases. In addition, we discuss the evolution of tapeworm parasites and introduce the research programmes linked to genome initiatives that are aimed at understanding signalling systems involved in basic host-parasite interactions and morphogenesis.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151

    Parasite immunology 2012;34;2-3;130-50

  • Exome sequencing of liver fluke-associated cholangiocarcinoma.

    Ong CK, Subimerb C, Pairojkul C, Wongkham S, Cutcutache I, Yu W, McPherson JR, Allen GE, Ng CC, Wong BH, Myint SS, Rajasegaran V, Heng HL, Gan A, Zang ZJ, Wu Y, Wu J, Lee MH, Huang D, Ong P, Chan-on W, Cao Y, Qian CN, Lim KH, Ooi A, Dykema K, Furge K, Kukongviriyapan V, Sripa B, Wongkham C, Yongvanit P, Futreal PA, Bhudhisawasdi V, Rozen S, Tan P and Teh BT

    National Cancer Centre Singapore-Van Andel Research Institute Translational Research Laboratory, Division of Medical Sciences, Singapore.

    Opisthorchis viverrini-related cholangiocarcinoma (CCA), a fatal bile duct cancer, is a major public health concern in areas endemic for this parasite. We report here whole-exome sequencing of eight O. viverrini-related tumors and matched normal tissue. We identified and validated 206 somatic mutations in 187 genes using Sanger sequencing and selected 15 genes for mutation prevalence screening in an additional 46 individuals with CCA (cases). In addition to the known cancer-related genes TP53 (mutated in 44.4% of cases), KRAS (16.7%) and SMAD4 (16.7%), we identified somatic mutations in 10 newly implicated genes in 14.8-3.7% of cases. These included inactivating mutations in MLL3 (in 14.8% of cases), ROBO2 (9.3%), RNF43 (9.3%) and PEG3 (5.6%), and activating mutations in the GNAS oncogene (9.3%). These genes have functions that can be broadly grouped into three biological classes: (i) deactivation of histone modifiers, (ii) activation of G protein signaling and (iii) loss of genome stability. This study provides insight into the mutational landscape contributing t