Sanger Institute - Publications 2011

Number of papers published in 2011: 210

  • Genomics in 2011: challenges and opportunities.

    Adams DJ, Berger B, Harismendy O, Huttenhower C, Liu XS, Myers CL, Oshlack A, Rinn JL and Walhout AJ

    Wellcome Trust Sanger Institute.

    As we come to the end of 2011, Genome Biology has asked some members of our Editorial Board for their views on the state of play in genomics. What was their favorite paper of 2011? What are the challenges in their particular research area? Who has had the biggest influence on their careers? What advice would they give to young researchers embarking on a career in research?

    Genome biology 2011;12;12;137

  • Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome.

    Albers CA, Cvejic A, Favier R, Bouwmans EE, Alessi MC, Bertone P, Jordan G, Kettleborough RN, Kiddle G, Kostadima M, Read RJ, Sipos B, Sivapalaratnam S, Smethurst PA, Stephens J, Voss K, Nurden A, Rendon A, Nurden P and Ouwehand WH

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated individuals and identified NBEAL2 as the causative gene; it has no previously known function but is a member of a gene family that is involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.

    Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: MC_U105260799; Wellcome Trust: 082597, 082961, 084183

    Nature genetics 2011;43;8;735-7

  • IDH1 and IDH2 mutations are frequent events in central chondrosarcoma and central and periosteal chondromas but not in other mesenchymal tumours.

    Amary MF, Bacsi K, Maggiani F, Damato S, Halai D, Berisha F, Pollock R, O'Donnell P, Grigoriadis A, Diss T, Eskandarpour M, Presneau N, Hogendoorn PC, Futreal A, Tirabosco R and Flanagan AM

    Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, Middlesex HA7 4LP, UK.

    Somatic mutations in isocitrate dehydrogenase 1 (IDH1) and IDH2 occur in gliomas and acute myeloid leukaemia (AML). Since patients with multiple enchondromas have occasionally been reported to have these conditions, we hypothesized that the same mutations would occur in cartilaginous neoplasms. Approximately 1200 mesenchymal tumours, including 220 cartilaginous tumours, 222 osteosarcomas and another ∼750 bone and soft tissue tumours, were screened for IDH1 R132 mutations, using Sequenom® mass spectrometry. Cartilaginous tumours and chondroblastic osteosarcomas, wild-type for IDH1 R132, were analysed for IDH2 (R172, R140) mutations. Validation was performed by capillary sequencing and restriction enzyme digestion. Heterozygous somatic IDH1/IDH2 mutations, which result in the production of a potential oncometabolite, 2-hydroxyglutarate, were only detected in central and periosteal cartilaginous tumours, and were found in at least 56% of these, ∼40% of which were represented by R132C. IDH1 R132H mutations were confirmed by immunoreactivity for this mutant allele. The ratio of IDH1:IDH2 mutation was 10.6:1. No IDH2 R140 mutations were detected. Mutations were detected in enchondromas through to conventional central and dedifferentiated chondrosarcomas, in patients with both solitary and multiple neoplasms. No germline mutations were detected. No mutations were detected in peripheral chondrosarcomas and osteochondromas. In conclusion, IDH1 and IDH2 mutations represent the first common genetic abnormalities to be identified in conventional central and periosteal cartilaginous tumours. As in gliomas and AML, the mutations appear to occur early in tumourigenesis. We speculate that a mosaic pattern of IDH-mutation-bearing cells explains the reports of diverse tumours (gliomas, AML, multiple cartilaginous neoplasms, haemangiomas) occurring in the same patient.

    Funded by: Wellcome Trust: WT077012

    The Journal of pathology 2011;224;3;334-43

  • Ollier disease and Maffucci syndrome are caused by somatic mosaic mutations of IDH1 and IDH2.

    Amary MF, Damato S, Halai D, Eskandarpour M, Berisha F, Bonar F, McCarthy S, Fantin VR, Straley KS, Lobo S, Aston W, Green CL, Gale RE, Tirabosco R, Futreal A, Campbell P, Presneau N and Flanagan AM

    Histopathology Unit, Royal National Orthopaedic Hospital National Health Service Trust, Stanmore, UK.

    Ollier disease and Maffucci syndrome are characterized by multiple central cartilaginous tumors that are accompanied by soft tissue hemangiomas in Maffucci syndrome. We show that in 37 of 40 individuals with these syndromes, at least one tumor has a mutation in isocitrate dehydrogenase 1 (IDH1) or in IDH2, 65% of which result in an R132C substitution in the protein. In 18 of 19 individuals with more than one tumor analyzed, all tumors from a given individual shared the same IDH1 mutation affecting Arg132. In 2 of 12 subjects, a low level of mutated DNA was identified in non-neoplastic tissue. The levels of the metabolite 2HG were measured in a series of central cartilaginous and vascular tumors, including samples from syndromic and nonsyndromic subjects, and these levels correlated strongly with the presence of IDH1 mutations. The findings are compatible with a model in which IDH1 or IDH2 mutations represent early post-zygotic occurrences in individuals with these syndromes.

    Funded by: Wellcome Trust: WT077012

    Nature genetics 2011;43;12;1262-5

  • Synthetic associations are unlikely to account for many common disease genome-wide association signals.

    Anderson CA, Soranzo N, Zeggini E and Barrett JC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom.

    Funded by: Wellcome Trust: WT089120/Z/09/Z, WT091745/Z/10/Z

    PLoS biology 2011;9;1;e1000580

  • Comparative whole genome sequence analysis of the carcinogenic bacterial model pathogen Helicobacter felis.

    Arnold IC, Zigova Z, Holden M, Lawley TD, Rad R, Dougan G, Falkow S, Bentley SD and Müller A

    Institute of Molecular Cancer Research, University of Zürich, Switzerland.

    The gram-negative bacterium Helicobacter felis naturally colonizes the gastric mucosa of dogs and cats. Due to its ability to persistently infect laboratory mice, H. felis has been used extensively to experimentally model gastric disorders induced in humans by H. pylori. We determined the 1.67 Mb genome sequence of H. felis using combined Solexa and 454 pyrosequencing, annotated the genome, and compared it with multiple previously published Helicobacter genomes. About 1,063 (63.6%) of the 1,671 genes identified in the H. felis genome have orthologues in H. pylori, its closest relative among the fully sequenced Helicobacter species. Many H. pylori virulence factors are shared by H. felis: these include the gamma-glutamyl transpeptidase GGT, the immunomodulator NapA, and the secreted enzymes collagenase and HtrA. Helicobacter felis lacks a Cag pathogenicity island and the vacuolating cytotoxin VacA but possesses a complete comB system conferring natural competence. Remarkable features of the H. felis genome include its paucity of transcriptional regulators and an extraordinary abundance of chemotaxis sensors and restriction/modification systems. Helicobacter felis possesses an episomally replicating 6.7-kb plasmid and harbors three chromosomal regions with deviating GC content. These putative horizontally acquired regions show homology and synteny with the recently isolated H. pylori plasmid pHPPC4 and homology to Campylobacter bacteriophage genes (transposases, structural, and lytic genes), respectively. In summary, the H. felis genome harbors a variety of putative mobile elements that are unique among Helicobacter species and may contribute to this pathogen's carcinogenic properties.

    Funded by: Wellcome Trust: 076962, 076964

    Genome biology and evolution 2011;3;302-8

  • Enterotypes of the human gut microbiome.

    Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, Fernandes GR, Tap J, Bruls T, Batto JM, Bertalan M, Borruel N, Casellas F, Fernandez L, Gautier L, Hansen T, Hattori M, Hayashi T, Kleerebezem M, Kurokawa K, Leclerc M, Levenez F, Manichanh C, Nielsen HB, Nielsen T, Pons N, Poulain J, Qin J, Sicheritz-Ponten T, Tims S, Torrents D, Ugarte E, Zoetendal EG, Wang J, Guarner F, Pedersen O, de Vos WM, Brunak S, Doré J, MetaHIT Consortium, Antolín M, Artiguenave F, Blottiere HM, Almeida M, Brechot C, Cara C, Chervaux C, Cultrone A, Delorme C, Denariaz G, Dervyn R, Foerstner KU, Friss C, van de Guchte M, Guedon E, Haimet F, Huber W, van Hylckama-Vlieg J, Jamet A, Juste C, Kaci G, Knol J, Lakhdari O, Layec S, Le Roux K, Maguin E, Mérieux A, Melo Minardi R, M'rini C, Muller J, Oozeer R, Parkhill J, Renault P, Rescigno M, Sanchez N, Sunagawa S, Torrejon A, Turner K, Vandemeulebrouck G, Varela E, Winogradsky Y, Zeller G, Weissenbach J, Ehrlich SD and Bork P

    European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.

    Our knowledge of species and functional composition of the human gut microbiome is rapidly increasing, but it is still based on very few cohorts and little is known about variation across the world. By combining 22 newly sequenced faecal metagenomes of individuals from four countries with previously published data sets, here we identify three robust clusters (referred to as enterotypes hereafter) that are not nation or continent specific. We also confirmed the enterotypes in two published, larger cohorts, indicating that intestinal microbiota variation is generally stratified, not continuous. This indicates further the existence of a limited number of well-balanced host-microbial symbiotic states that might respond differently to diet and drug intake. The enterotypes are mostly driven by species composition, but abundant molecular functions are not necessarily provided by abundant species, highlighting the importance of a functional analysis to understand microbial communities. Although individual host properties such as body mass index, age, or gender cannot explain the observed enterotypes, data-driven marker genes or functional modules can be identified for each of these host properties. For example, twelve genes significantly correlate with age and three functional modules with the body mass index, hinting at a diagnostic potential of microbial markers.

    Funded by: NIDDK NIH HHS: K24 DK002800; Wellcome Trust: 076964, 082372

    Nature 2011;473;7346;174-80

  • Comprehensive comparison of three commercial human whole-exome capture platforms.

    Asan, Xu Y, Jiang H, Tyler-Smith C, Xue Y, Jiang T, Wang J, Wu M, Liu X, Tian G, Wang J, Wang J, Yang H and Zhang X

    Beijing Institute of Genomics, Chinese Academy of Sciences, No.7 Beitucheng West Road, Chaoyang District, Beijing 100029, China.

    Background: Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.

    Results: We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.

    Conclusions: We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.

    Funded by: Wellcome Trust

    Genome biology 2011;12;9;R95

  • An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Auburn S, Campino S, Clark TG, Djimde AA, Zongo I, Pinches R, Manske M, Mangano V, Alcock D, Anastasi E, Maslen G, Macinnis B, Rockett K, Modiano D, Newbold CI, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% of samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090770

    PloS one 2011;6;7;e22213

  • Male lineages in the Himalayan foothills: a commentary on Y-chromosome haplogroup diversity in the sub-Himalayan Terai and Duars populations of East India.

    Ayub Q

    Journal of human genetics 2011;56;12;813-4

  • Evolutionary dynamics of local pandemic H1N1/2009 influenza virus lineages revealed by whole-genome analysis.

    Baillie GJ, Galiano M, Agapow PM, Myers R, Chiam R, Gall A, Palser AL, Watson SJ, Hedge J, Underwood A, Platt S, McLean E, Pebody RG, Rambaut A, Green J, Daniels R, Pybus OG, Kellam P and Zambon M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Virus gene sequencing and phylogenetics can be used to study the epidemiological dynamics of rapidly evolving viruses. With complete genome data, it becomes possible to identify and trace individual transmission chains of viruses such as influenza virus during the course of an epidemic. Here we sequenced 153 pandemic influenza H1N1/09 virus genomes from United Kingdom isolates from the first (127 isolates) and second (26 isolates) waves of the 2009 pandemic and used their sequences, dates of isolation, and geographical locations to infer the genetic epidemiology of the epidemic in the United Kingdom. We demonstrate that the epidemic in the United Kingdom was composed of many cocirculating lineages, among which at least 13 were exclusively or predominantly United Kingdom clusters. The estimated divergence times of two of the clusters predate the detection of pandemic H1N1/09 virus in the United Kingdom, suggesting that the pandemic H1N1/09 virus was already circulating in the United Kingdom before the first clinical case. Crucially, three clusters contain isolates from the second wave of infections in the United Kingdom, two of which represent chains of transmission that appear to have persisted within the United Kingdom between the first and second waves. This demonstrates that whole-genome analysis can track in fine detail the behavior of individual influenza virus lineages during the course of a single epidemic or pandemic.

    Funded by: Medical Research Council: MC_U117512723; Wellcome Trust: 095831

    Journal of virology 2011;86;1;11-8

  • Parallel evolution of genes and languages in the Caucasus region.

    Balanovsky O, Dibirova K, Dybo A, Mudrak O, Frolova S, Pocheshkhova E, Haber M, Platt D, Schurr T, Haak W, Kuznetsova M, Radzhabov M, Balaganskaya O, Romanov A, Zakharova T, Soria Hernanz DF, Zalloua P, Koshel S, Ruhlen M, Renfrew C, Wells RS, Tyler-Smith C, Balanovska E and Genographic Consortium

    Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia.

    We analyzed 40 single nucleotide polymorphism and 19 short tandem repeat Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation, and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language coevolution occurred within geographically isolated populations, probably due to its mountainous terrain.

    Funded by: Wellcome Trust: 077009

    Molecular biology and evolution 2011;28;10;2905-20

  • Gene inactivation and its implications for annotation in the era of personal genomics.

    Balasubramanian S, Habegger L, Frankish A, MacArthur DG, Harte R, Tyler-Smith C, Harrow J and Gerstein M

    Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.

    The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.

    Funded by: Wellcome Trust

    Genes & development 2011;25;1;1-10

  • CCR4-associated factor 1 coordinates the expression of Plasmodium falciparum egress and invasion proteins.

    Balu B, Maher SP, Pance A, Chauhan C, Naumov AV, Andrews RM, Ellis PD, Khan SM, Lin JW, Janse CJ, Rayner JC and Adams JH

    Department of Global Health, College of Public Health, University of South Florida, 3720 Spectrum Blvd., Suite 304, Tampa, FL, USA.

    Coordinated regulation of gene expression is a hallmark of the Plasmodium falciparum asexual blood-stage development cycle. We report that carbon catabolite repressor protein 4 (CCR4)-associated factor 1 (CAF1) is critical in regulating more than 1,000 genes during malaria parasites' intraerythrocytic stages, especially egress and invasion proteins. CAF1 knockout results in mistimed expression, aberrant accumulation and localization of proteins involved in parasite egress, and invasion of new host cells, leading to premature release of predominantly half-finished merozoites, drastically reducing the intraerythrocytic growth rate of the parasite. This study demonstrates that CAF1 of the CCR4-Not complex is a significant gene regulatory mechanism needed for Plasmodium development within the human host.

    Funded by: NIAID NIH HHS: R01 AI033656, R01 AI094973, R01 AI094973-01, R01AI033656, R01AI094973; Wellcome Trust

    Eukaryotic cell 2011;10;9;1257-63

  • RNAcentral: A vision for an international database of RNA sequences.

    Bateman A, Agrawal S, Birney E, Bruford EA, Bujnicki JM, Cochrane G, Cole JR, Dinger ME, Enright AJ, Gardner PP, Gautheret D, Griffiths-Jones S, Harrow J, Herrero J, Holmes IH, Huang HD, Kelly KA, Kersey P, Kozomara A, Lowe TM, Marz M, Moxon S, Pruitt KD, Samuelsson T, Stadler PF, Vilella AJ, Vogel JH, Williams KP, Wright MW and Zwieb C

    During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.

    Funded by: NHGRI NIH HHS: P41 HG003345

    RNA (New York, N.Y.) 2011;17;11;1941-6

  • Meta-analysis of genome-wide association studies from the CHARGE consortium identifies common variants associated with carotid intima media thickness and plaque.

    Bis JC, Kavousi M, Franceschini N, Isaacs A, Abecasis GR, Schminke U, Post WS, Smith AV, Cupples LA, Markus HS, Schmidt R, Huffman JE, Lehtimäki T, Baumert J, Münzel T, Heckbert SR, Dehghan A, North K, Oostra B, Bevan S, Stoegerer EM, Hayward C, Raitakari O, Meisinger C, Schillert A, Sanna S, Völzke H, Cheng YC, Thorsson B, Fox CS, Rice K, Rivadeneira F, Nambi V, Halperin E, Petrovic KE, Peltonen L, Wichmann HE, Schnabel RB, Dörr M, Parsa A, Aspelund T, Demissie S, Kathiresan S, Reilly MP, Taylor K, Uitterlinden A, Couper DJ, Sitzer M, Kähönen M, Illig T, Wild PS, Orru M, Lüdemann J, Shuldiner AR, Eiriksdottir G, White CC, Rotter JI, Hofman A, Seissler J, Zeller T, Usala G, Ernst F, Launer LJ, D'Agostino RB, O'Leary DH, Ballantyne C, Thiery J, Ziegler A, Lakatta EG, Chilukoti RK, Harris TB, Wolf PA, Psaty BM, Polak JF, Li X, Rathmann W, Uda M, Boerwinkle E, Klopp N, Schmidt H, Wilson JF, Viikari J, Koenig W, Blankenberg S, Newman AB, Witteman J, Heiss G, Duijn Cv, Scuteri A, Homuth G, Mitchell BD, Gudnason V, O'Donnell CJ and CARDIoGRAM Consortium

    Cardiovascular Health Research Unit and Department of Medicine, University of Washington, Seattle, Washington, USA.

    Carotid intima media thickness (cIMT) and plaque determined by ultrasonography are established measures of subclinical atherosclerosis that each predicts future cardiovascular disease events. We conducted a meta-analysis of genome-wide association data in 31,211 participants of European ancestry from nine large studies in the setting of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. We then sought additional evidence to support our findings among 11,273 individuals using data from seven additional studies. In the combined meta-analysis, we identified three genomic regions associated with common carotid intima media thickness and two different regions associated with the presence of carotid plaque (P < 5 × 10⁻⁸). The associated SNPs mapped in or near genes related to cellular signaling, lipid metabolism and blood pressure homeostasis, and two of the regions were associated with coronary artery disease (P < 0.006) in the Coronary Artery Disease Genome-Wide Replication and Meta-Analysis (CARDIoGRAM) consortium. Our findings may provide new insight into pathways leading to subclinical atherosclerosis and subsequent cardiovascular events.

    Funded by: Chief Scientist Office: CZB/4/710; Intramural NIH HHS: Z01 HL006002-01, Z99 HL999999; Medical Research Council: MC_U127561128; NCATS NIH HHS: UL1 TR000005; NCRR NIH HHS: M01 RR 16500, M01RR00069, UL1RR025005; NHGRI NIH HHS: HG005581, U01HG004402; NHLBI NIH HHS: HL075366, HL080295, HL084729, HL087652, HL105756, N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-85239, N02-HL-6-4278, R01HL086694, R01HL087641, R01HL59367, U01 HL072515-06; NIA NIH HHS: AG-023629, AG-027058, AG-15928, AG-20098, AG033193, AG08122, AG16495, N01-AG-1-2109, N01-AG-12100, R01 AG18728; NIDDK NIH HHS: DK063491, P30 DK072488; NIGMS NIH HHS: U01 GM074518-04; NINDS NIH HHS: NS17950; PHS HHS: 268200625226C

    Nature genetics 2011;43;10;940-7

  • Abdominal aortic aneurysm is associated with a variant in low-density lipoprotein receptor-related protein 1.

    Bown MJ, Jones GT, Harrison SC, Wright BJ, Bumpstead S, Baas AF, Gretarsdottir S, Badger SA, Bradley DT, Burnand K, Child AH, Clough RE, Cockerill G, Hafez H, Scott DJ, Futers S, Johnson A, Sohrabi S, Smith A, Thompson MM, van Bockxmeer FM, Waltham M, Matthiasson SE, Thorleifsson G, Thorsteinsdottir U, Blankensteijn JD, Teijink JA, Wijmenga C, de Graaf J, Kiemeney LA, Assimes TL, McPherson R, CARDIoGRAM Consortium, Global BPgen Consortium, DIAGRAM Consortium, VRCNZ Consortium, Folkersen L, Franco-Cereceda A, Palmen J, Smith AJ, Sylvius N, Wild JB, Refstrup M, Edkins S, Gwilliam R, Hunt SE, Potter S, Lindholt JS, Frikke-Schmidt R, Tybjærg-Hansen A, Hughes AE, Golledge J, Norman PE, van Rij A, Powell JT, Eriksson P, Stefansson K, Thompson JR, Humphries SE, Sayers RD, Deloukas P and Samani NJ

    Department of Cardiovascular Sciences, University of Leicester, Leicester LE2 7LX, UK.

    Abdominal aortic aneurysm (AAA) is a common cause of morbidity and mortality and has a significant heritability. We carried out a genome-wide association discovery study of 1866 patients with AAA and 5435 controls and replication of promising signals (lead SNP with a p value < 1 × 10⁻⁵) in 2871 additional cases and 32,687 controls and performed further follow-up in 1491 AAA cases and 11,060 controls. In the discovery study, nine loci demonstrated association with AAA (p < 1 × 10⁻⁵). In the replication sample, the lead SNP at one of these loci, rs1466535, located within intron 1 of low-density-lipoprotein receptor-related protein 1 (LRP1) demonstrated significant association (p = 0.0042). We confirmed the association of rs1466535 and AAA in our follow-up study (p = 0.035). In a combined analysis (6228 AAA cases and 49,182 controls), rs1466535 had a consistent effect size and direction in all sample sets (combined p = 4.52 × 10⁻¹⁰, odds ratio 1.15 [1.10-1.21]). No associations were seen for either rs1466535 or the 12q13.3 locus in independent association studies of coronary artery disease, blood pressure, diabetes, or hyperlipidaemia, suggesting that this locus is specific to AAA. Gene-expression studies demonstrated a trend toward increased LRP1 expression for the rs1466535 CC genotype in arterial tissues; there was a significant (p = 0.029) 1.19-fold (1.04-1.36) increase in LRP1 expression in CC homozygotes compared to TT homozygotes in aortic adventitia. Functional studies demonstrated that rs1466535 might alter an SREBP-1 binding site and influence enhancer activity at the locus. In conclusion, this study has identified a biologically plausible genetic variant associated specifically with AAA, and we suggest that this variant has a possible functional role in LRP1 expression.

    Funded by: British Heart Foundation: FS/11/16/28696, PG/10/001/28098, RG2008/08; Wellcome Trust: 076113, 084695, 085475

    American journal of human genetics 2011;89;5;619-27

  • TSIDER1, a short and non-autonomous Salivarian trypanosome-specific retroposon related to the ingi6 subclade.

    Bringaud F, Berriman M and Hertz-Fowler C

    Centre de Résonance Magnétique des Systèmes Biologiques, UMR 5536, Université Bordeaux Segalen, CNRS, 146 rue Léo Saignat, 33076 Bordeaux, France.

    Retroposons of the ingi clade are the most abundant transposable elements identified in the trypanosomatid genomes. Some are long autonomous elements (ingi, L1Tc) while others, such as RIME and NARTc, are short non-coding elements that parasitize the retrotransposition machinery of the active autonomous ones for their own mobilization. Here, we identified a new family of short non-autonomous retroposons of the ingi clade, called TSIDER1, which are present in the genome of Salivarian (African) trypanosomes, Trypanosoma brucei, T. congolense and T. vivax, but absent in the T. cruzi and Leishmania spp. genomes and, as such, TSIDER1 is the only retroposon subfamily conserved at the nucleotide level between African trypanosome species. We identified three TvSIDER1 families within the genome of T. vivax and the high level of sequence conservation within the TvSIDER1a and TvSIDER1b groups suggests that they are still active. We propose that TvSIDER1a/b elements are using the Tvingi retrotransposition machinery, as they are preceded by the same conserved pattern characteristic of the ingi6 subclade, which corresponds to the retroposon-encoded endonuclease binding site. In contrast, TcoSIDER1, TbSIDER1 and TvSIDER1c are too divergent to be considered as active retroposons. The relatively low number of SIDER elements identified in the T. congolense (70 copies), T. vivax (32 copies) and T. brucei (22 copies) genomes confirms that trypanosomes have not expanded short transposable elements, which is in contrast to Leishmania spp. (∼2000 copies), where SIDER play a role in the regulation of gene expression.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Molecular and biochemical parasitology 2011;179;1;30-6

  • Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.

    Brosch M, Saunders GI, Frankish A, Collins MO, Yu L, Wright J, Verstraten R, Adams DJ, Harrow J, Choudhary JS and Hubbard T

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching in excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation in both the Vega and Ensembl genome browsers.

    Funded by: Cancer Research UK: 13031; Wellcome Trust: 077198

    Genome research 2011;21;5;756-67

  • Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay.

    Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The diversity in the Plasmodium falciparum genome can be used to explore parasite population dynamics, with practical applications to malaria control. The ability to identify the geographic origin and trace the migratory patterns of parasites with clinically important phenotypes such as drug resistance is particularly relevant. With increasing single-nucleotide polymorphism (SNP) discovery from ongoing Plasmodium genome sequencing projects, genotyping platforms with high SNP and sample throughput are required for large-scale population genetic studies. Low parasitaemias and multiple clone infections present a number of challenges to genotyping P. falciparum. We addressed some of these issues using a custom 384-SNP Illumina GoldenGate assay on P. falciparum DNA from laboratory clones (long-term culture-adapted parasite clones), short-term cultured parasite isolates and clinical samples (non-cultured isolates) from East and West Africa, Southeast Asia and Oceania. Eighty percent of the SNPs (n = 306) produced reliable genotype calls on samples containing as little as 2 ng of total genomic DNA and on whole genome amplified DNA. Analysis of artificial mixtures of laboratory clones demonstrated high genotype calling specificity and moderate sensitivity to call minor frequency alleles. Clear resolution of geographically distinct populations was demonstrated using Principal Components Analysis (PCA), and global patterns of population genetic diversity were consistent with previous reports. These results validate the utility of the platform in performing population genetic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute; Intramural NIH HHS; Medical Research Council: G0600718, G19/9; NIAID NIH HHS: R37 AI048071; Wellcome Trust: 090532, 093956

    PloS one 2011;6;6;e20251

  • Determinants of bluetongue virus virulence in murine models of disease.

    Caporale M, Wash R, Pini A, Savini G, Franchi P, Golder M, Patterson-Kane J, Mertens P, Di Gialleonardo L, Armillotta G, Lelli R, Kellam P and Palmarini M

    Medical Research Council-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

    Bluetongue is a major infectious disease of ruminants that is caused by bluetongue virus (BTV). In this study, we analyzed virulence and genetic differences of (i) three BTV field strains from Italy maintained at either a low (L strains) or high (H strains) passage number in cell culture and (ii) three South African "reference" wild-type strains and their corresponding live attenuated vaccine strains. The Italian BTV L strains, in general, were lethal for both newborn NIH-Swiss mice inoculated intracerebrally and adult type I interferon receptor-deficient (IFNAR(-/-)) mice, while the virulence of the H strains was attenuated significantly in both experimental models. Similarly, the South African vaccine strains were not pathogenic for IFNAR(-/-) mice, while the corresponding wild-type strains were virulent. Thus, attenuation of the virulence of the BTV strains used in this study is not mediated by the presence of an intact interferon system. No clear distinction in virulence was observed for the South African BTV strains in newborn NIH-Swiss mice. Full genomic sequencing revealed relatively few amino acid substitutions, scattered in several different viral proteins, for the strains found to be attenuated in mice compared to the pathogenic related strains. However, only the genome segments encoding VP1, VP2, and NS2 consistently showed nonsynonymous changes between all virulent and attenuated strain pairs. This study established an experimental platform for investigating the determinants of BTV virulence. Future studies using reverse genetics will allow researchers to precisely map and "weight" the relative influences of the various genome segments and viral proteins on BTV virulence.

    Funded by: Medical Research Council: G0801822; Wellcome Trust

    Journal of virology 2011;85;21;11479-89

  • Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data.

    Carver T, Harris SR, Berriman M, Parkhill J and McQuillan JA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: High-throughput sequencing (HTS) technologies have made low-cost sequencing of large numbers of samples commonplace. An explosion in the type, not just the number, of sequencing experiments has also taken place, including genome re-sequencing, population-scale variation detection, whole transcriptome sequencing and genome-wide analysis of protein-bound nucleic acids.

    Results: We present Artemis as a tool for integrated visualization and computational analysis of different types of HTS datasets in the context of a reference genome and its corresponding annotation.

    Availability: Artemis is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute websites:

    Funded by: Wellcome Trust: WT 076964

    Bioinformatics (Oxford, England) 2011;28;4;464-9

  • A modified vaccinia Ankara virus (MVA) vaccine expressing African horse sickness virus (AHSV) VP2 protects against AHSV challenge in an IFNAR -/- mouse model.

    Castillo-Olivares J, Calvo-Pinilla E, Casanova I, Bachanek-Bankowska K, Chiam R, Maan S, Nieto JM, Ortego J and Mertens PP

    Institute for Animal Health, Pirbright, Woking, Surrey, United Kingdom.

    African horse sickness (AHS) is a lethal viral disease of equids, which is transmitted by Culicoides midges that become infected after biting a viraemic host. The use of live attenuated vaccines has been vital for the control of this disease in endemic regions. However, there are safety concerns over their use in non-endemic countries. Research efforts over the last two decades have therefore focused on developing alternative vaccines based on recombinant baculovirus or live viral vectors expressing structural components of the AHS virion. However, ethical and financial considerations, relating to the use of infected horses in high biosecurity installations, have made progress very slow. We have therefore assessed the potential of an experimental mouse model of AHSV infection for vaccine and immunology research. We initially characterised AHSV infection in this model, then tested the protective efficacy of a recombinant vaccine based on modified vaccinia Ankara expressing AHS-4 VP2 (MVA-VP2).

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/00654

    PloS one 2011;6;1;e16503

  • The impact of recombination on dN/dS within recently emerged bacterial clones.

    Castillo-Ramírez S, Harris SR, Holden MT, He M, Parkhill J, Bentley SD and Feil EJ

    Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom.

    The development of next-generation sequencing platforms is set to reveal an unprecedented level of detail on short-term molecular evolutionary processes in bacteria. Here we re-analyse genome-wide single nucleotide polymorphism (SNP) datasets for recently emerged clones of methicillin resistant Staphylococcus aureus (MRSA) and Clostridium difficile. We note a highly significant enrichment of synonymous SNPs in those genes which have been affected by recombination, i.e. those genes on mobile elements designated "non-core" (in the case of S. aureus), or those core genes which have been affected by homologous replacements (S. aureus and C. difficile). This observation suggests that the previously documented decrease in dN/dS over time in bacteria applies not only to genomes of differing levels of divergence overall, but also to horizontally acquired genes of differing levels of divergence within a single genome. We also consider the role of increased drift acting on recently emerged, highly specialised clones, and the impact of recombination on selection at linked sites. This work has implications for a wide range of genomic analyses.

    Funded by: Wellcome Trust

    PLoS pathogens 2011;7;7;e1002129

  • Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma.

    Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, Van der Harst P, Holm H, Sanna S, Kavousi M, Baumeister SE, Coin LJ, Deng G, Gieger C, Heard-Costa NL, Hottenga JJ, Kühnel B, Kumar V, Lagou V, Liang L, Luan J, Vidal PM, Mateo Leach I, O'Reilly PF, Peden JF, Rahmioglu N, Soininen P, Speliotes EK, Yuan X, Thorleifsson G, Alizadeh BZ, Atwood LD, Borecki IB, Brown MJ, Charoen P, Cucca F, Das D, de Geus EJ, Dixon AL, Döring A, Ehret G, Eyjolfsson GI, Farrall M, Forouhi NG, Friedrich N, Goessling W, Gudbjartsson DF, Harris TB, Hartikainen AL, Heath S, Hirschfield GM, Hofman A, Homuth G, Hyppönen E, Janssen HL, Johnson T, Kangas AJ, Kema IP, Kühn JP, Lai S, Lathrop M, Lerch MM, Li Y, Liang TJ, Lin JP, Loos RJ, Martin NG, Moffatt MF, Montgomery GW, Munroe PB, Musunuru K, Nakamura Y, O'Donnell CJ, Olafsson I, Penninx BW, Pouta A, Prins BP, Prokopenko I, Puls R, Ruokonen A, Savolainen MJ, Schlessinger D, Schouten JN, Seedorf U, Sen-Chowdhry S, Siminovitch KA, Smit JH, Spector TD, Tan W, Teslovich TM, Tukiainen T, Uitterlinden AG, Van der Klauw MM, Vasan RS, Wallace C, Wallaschofski H, Wichmann HE, Willemsen G, Würtz P, Xu C, Yerges-Armstrong LM, Alcohol Genome-wide Association (AlcGen) Consortium, Diabetes Genetics Replication and Meta-analyses (DIAGRAM+) Study, Genetic Investigation of Anthropometric Traits (GIANT) Consortium, Global Lipids Genetics Consortium, Genetics of Liver Disease (GOLD) Consortium, International Consortium for Blood Pressure (ICBP-GWAS), Meta-analyses of Glucose and Insulin-Related Traits Consortium (MAGIC), Abecasis GR, Ahmadi KR, Boomsma DI, Caulfield M, Cookson WO, van Duijn CM, Froguel P, Matsuda K, McCarthy MI, Meisinger C, Mooser V, Pietiläinen KH, Schumann G, Snieder H, Sternberg MJ, Stolk RP, Thomas HC, Thorsteinsdottir U, Uda M, Waeber G, Wareham NJ, Waterworth DM, Watkins H, Whitfield JB, Witteman JC, Wolffenbuttel BH, Fox CS, Ala-Korpela M, Stefansson K, Vollenweider P, Völzke H, Schadt EE, Scott J, Järvelin MR, Elliott P and Kooner JS

    Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London, UK.

    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10⁻⁸ to P = 10⁻¹⁹⁰). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function.

    Funded by: British Heart Foundation: FS/10/011/27881, PG/09/002/26056, PG/09/023/26806, RG/07/008/23674; Cancer Research UK: 14136; Department of Health: PHCS/C4/4/016; Intramural NIH HHS: Z01 AG000675-02, Z99 DK999999, ZIA DK075013-05, ZIA DK075013-07; Medical Research Council: G0100222, G0401527, G0601653, G0601966, G0700342, G0700931, G0701863, G0902037, G1000143, G19/35, G8802774, G9521010, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087647; NIAAA NIH HHS: K05 AA017688; Wellcome Trust: 090532

    Nature genetics 2011;43;11;1131-8

  • Defining the power limits of genome-wide association scan meta-analyses.

    Chapman K, Ferreira T, Morris A, Asimit J and Zeggini E

    Wellcome Trust Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, United Kingdom.

    Large-scale meta-analyses of genome-wide association scans (GWAS) have been successful in discovering common risk variants with modest and small effects. The detection of lower frequency signals will undoubtedly require concerted efforts of at least similar scale. We investigate the sample size-dictated power limits of GWAS meta-analyses, in the presence and absence of modest levels of heterogeneity and across a range of different allelic architectures. We find that data combination through large-scale collaboration is vital in the quest for complex trait susceptibility loci, but that effect size heterogeneity across meta-analyzed studies drawn from similar populations does not appear to have a profound effect on sample size requirements.

    Funded by: Wellcome Trust: 088885, 090532, WT079557MA, WT081682/Z/06/Z, WT088885/Z/09/Z

    Genetic epidemiology 2011;35;8;781-9

  • Expressions of individuality.

    Chappell L

    Nature reviews. Microbiology 2011;9;10;701

  • Genome-wide association study reveals three susceptibility loci for common migraine in the general population.

    Chasman DI, Schürks M, Anttila V, de Vries B, Schminke U, Launer LJ, Terwindt GM, van den Maagdenberg AM, Fendrich K, Völzke H, Ernst F, Griffiths LR, Buring JE, Kallela M, Freilinger T, Kubisch C, Ridker PM, Palotie A, Ferrari MD, Hoffmann W, Zee RY and Kurth T

    Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

    Migraine is a common, heterogeneous and heritable neurological disorder. Its pathophysiology is incompletely understood, and its genetic influences at the population level are unknown. In a population-based genome-wide analysis including 5,122 migraineurs and 18,108 non-migraineurs, rs2651899 (1p36.32, PRDM16), rs10166942 (2q37.1, TRPM8) and rs11172113 (12q13.3, LRP1) were among the top seven associations (P < 5 × 10⁻⁶) with migraine. These SNPs were significant in a meta-analysis among three replication cohorts and met genome-wide significance in a meta-analysis combining the discovery and replication cohorts (rs2651899, odds ratio (OR) = 1.11, P = 3.8 × 10⁻⁹; rs10166942, OR = 0.85, P = 5.5 × 10⁻¹²; and rs11172113, OR = 0.90, P = 4.3 × 10⁻⁹). The associations at rs2651899 and rs10166942 were specific for migraine compared with non-migraine headache. None of the three SNP associations was preferential for migraine with aura or without aura, nor were any associations specific for migraine features. TRPM8 has been the focus of neuropathic pain models, whereas LRP1 modulates neuronal glutamate signaling, plausibly linking both genes to migraine pathophysiology.

    Funded by: NCI NIH HHS: CA-47988, R01 CA047988, R01 CA047988-21; NHLBI NIH HHS: HL-043851, HL-080467, HL-099355, R01 HL043851, R01 HL043851-10, R01 HL080467, R01 HL080467-05, RC1 HL099355, RC1 HL099355-02; NINDS NIH HHS: NS-061836, R01 NS061836, R01 NS061836-03

    Nature genetics 2011;43;7;695-8

  • Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.

    Chaudhuri RR, Yu L, Kanji A, Perkins TT, Gardner PP, Choudhary J, Maskell DJ and Grant AJ

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK.

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D00019X/1; Medical Research Council: G0801161; Wellcome Trust: 079643/Z/06/Z

    Microbiology (Reading, England) 2011;157;Pt 10;2922-32

  • Genetic screens using the piggyBac transposon.

    Chew SK, Rad R, Futreal PA, Bradley A and Liu P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    Transposons are an attractive system to use in genetic screens as they are molecularly tractable and the disrupted loci that give rise to the desired phenotype are easily mapped. We consider herein the characteristics of the piggyBac transposon system in complementing existing mammalian screen strategies, including the Sleeping Beauty transposon system. We also describe the design of the piggyBac resources that we have developed for both forward and reverse genetic screens, and the protocols we use in these experiments.

    Funded by: Wellcome Trust

    Methods (San Diego, Calif.) 2011;53;4;366-71

  • Modernizing reference genome assemblies.

    Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, Chen HC, Agarwala R, McLaren WM, Ritchie GR, Albracht D, Kremitzki M, Rock S, Kotkiewicz H, Kremitzki C, Wollam A, Trani L, Fulton L, Fulton R, Matthews L, Whitehead S, Chow W, Torrance J, Dunn M, Harden G, Threadgold G, Wood J, Collins J, Heath P, Griffiths G, Pelan S, Grafham D, Eichler EE, Weinstock G, Mardis ER, Wilson RK, Howe K, Flicek P and Hubbard T

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America.

    Funded by: Wellcome Trust: 077198, 095908

    PLoS biology 2011;9;7;e1001091

  • The GENCODE exome: sequencing the complete human exome.

    Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S, Lehesjoki AE, Turner DJ, Hubbard TJ and Palotie A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 more SNP variants (24% more) with the GENCODE exome target than with CCDS-based exome sequencing.

    Funded by: NHGRI NIH HHS: 5U54HG004555, U54 HG004555; Wellcome Trust: 077198, WT062023, WT089062

    European journal of human genetics : EJHG 2011;19;7;827-31

  • A world in a grain of sand: human history from genetic data.

    Colonna V, Pagani L, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Genome-wide genotypes and sequences are enriching our understanding of the past 50,000 years of human history and providing insights into earlier periods largely inaccessible to mitochondrial DNA and Y-chromosomal studies. "To see a world in a grain of sand ..." (William Blake, Auguries of Innocence).

    Funded by: Wellcome Trust

    Genome biology 2011;12;11;234

  • Variation in genome-wide mutation rates within and between human families.

    Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, Zilversmit M, Cartwright R, Rouleau GA, Daly M, Stone EA, Hurles ME, Awadalla P and 1000 Genomes Project

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.

    Funded by: NIGMS NIH HHS: R01 GM070806; Wellcome Trust: 077014, 077014/Z/05/Z, 085532, 090532

    Nature genetics 2011;43;7;712-4

  • A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease.

    Coronary Artery Disease (C4D) Genetics Consortium

    Genome-wide association studies have identified 11 common variants convincingly associated with coronary artery disease (CAD)¹⁻⁷, a modest number considering the apparent heritability of CAD⁸. All of these variants have been discovered in European populations. We report a meta-analysis of four large genome-wide association studies of CAD, with ∼575,000 genotyped SNPs in a discovery dataset comprising 15,420 individuals with CAD (cases) (8,424 Europeans and 6,996 South Asians) and 15,062 controls. There was little evidence for ancestry-specific associations, supporting the use of combined analyses. Replication in an independent sample of 21,408 cases and 19,185 controls identified five loci newly associated with CAD (P < 5 × 10⁻⁸ in the combined discovery and replication analysis): LIPA on 10q23, PDGFD on 11q22, ADAMTS7-MORF4L1 on 15q25, a gene rich locus on 7q22 and KIAA1462 on 10p11. The CAD-associated SNP in the PDGFD locus showed tissue-specific cis expression quantitative trait locus effects. These findings implicate new pathways for CAD susceptibility.

    Funded by: British Heart Foundation: RG/08/014/24067; Department of Health: RP-PG-0407-10371; Medical Research Council: G0601966, G0700931, G0801056, G9521010, MC_U137686857

    Nature genetics 2011;43;4;339-44

  • Basigin is a receptor essential for erythrocyte invasion by Plasmodium falciparum.

    Crosnier C, Bustamante LY, Bartholdson SJ, Bei AK, Theron M, Uchikawa M, Mboup S, Ndir O, Kwiatkowski DP, Duraisingh MT, Rayner JC and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Erythrocyte invasion by Plasmodium falciparum is central to the pathogenesis of malaria. Invasion requires a series of extracellular recognition events between erythrocyte receptors and ligands on the merozoite, the invasive form of the parasite. None of the few known receptor-ligand interactions involved are required in all parasite strains, indicating that the parasite is able to access multiple redundant invasion pathways. Here, we identify a receptor-ligand pair that is essential for erythrocyte invasion in all tested P. falciparum strains. By systematically screening a library of erythrocyte proteins, we have found that the Ok blood group antigen, basigin, is a receptor for PfRh5, a parasite ligand that is essential for blood stage growth. Erythrocyte invasion was potently inhibited by soluble basigin or by basigin knockdown, and invasion could be completely blocked using low concentrations of anti-basigin antibodies; importantly, these effects were observed across all laboratory-adapted and field strains tested. Furthermore, Ok(a-) erythrocytes, which express a basigin variant that has a weaker binding affinity for PfRh5, had reduced invasion efficiencies. Our discovery of a cross-strain dependency on a single extracellular receptor-ligand pair for erythrocyte invasion by P. falciparum provides a focus for new anti-malarial therapies.

    Funded by: Medical Research Council: G19/9; NCEZID CDC HHS: R36 CK000119, R36 CK000119-01; NIAID NIH HHS: 2T32 AI007535-12, R01 AI057919, R01 AI057919-05, R01AI057919, T32 AI007535; Wellcome Trust: 077108, 089084, 090532

    Nature 2011;480;7378;534-7

  • Disruption of mouse Slx4, a regulator of structure-specific nucleases, phenocopies Fanconi anemia.

    Crossan GP, van der Weyden L, Rosado IV, Langevin F, Gaillard PHL, McIntyre RE, Sanger Mouse Genetics Project, Gallagher F, Kettunen MI, Lewis DY, Brindle K, Arends MJ, Adams DJ and Patel KJ

    Medical Research Council, Laboratory of Molecular Biology, Cambridge, UK.

    The evolutionarily conserved SLX4 protein, a key regulator of nucleases, is critical for the DNA damage response. SLX4 nuclease complexes mediate repair during replication and can also resolve Holliday junctions formed during homologous recombination. Here we describe the phenotype of mice lacking Btbd12, the mouse ortholog of SLX4; this knockout recapitulates many key features of the human genetic illness Fanconi anemia. Btbd12-deficient animals are born at sub-Mendelian ratios, have greatly reduced fertility, are developmentally compromised and are prone to blood cytopenias. Btbd12(-/-) cells prematurely senesce, spontaneously accumulate damaged chromosomes and are particularly sensitive to DNA crosslinking agents. Genetic complementation reveals a crucial requirement for Btbd12 (also known as Slx4) to interact with the structure-specific endonuclease Xpf-Ercc1 to promote crosslink repair. The Btbd12 knockout mouse therefore establishes a disease model for Fanconi anemia and genetically links a regulator of nuclease incision complexes to the Fanconi anemia DNA crosslink repair pathway.

    Funded by: Cancer Research UK: 12401, A11073, A11376, A12401, A8449; Medical Research Council: MC_U105178811, U.1051.03.009(78811); Wellcome Trust: 098051

    Nature genetics 2011;43;2;147-52

  • Rapid pneumococcal evolution in response to clinical interventions.

    Croucher NJ, Harris SR, Fraser C, Quail MA, Burton J, van der Linden M, McGee L, von Gottberg A, Song JH, Ko KS, Pichon B, Baker S, Parry CM, Lambertsen LM, Shahinas D, Pillai DR, Mitchell TJ, Dougan G, Tomasz A, Klugman KP, Parkhill J, Hanage WP and Bentley SD

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Epidemiological studies of the naturally transformable bacterial pathogen Streptococcus pneumoniae have previously been confounded by high rates of recombination. Sequencing 240 isolates of the PMEN1 (Spain(23F)-1) multidrug-resistant lineage enabled base substitutions to be distinguished from polymorphisms arising through horizontal sequence transfer. More than 700 recombinations were detected, with genes encoding major antigens frequently affected. Among these were 10 capsule-switching events, one of which accompanied a population shift as vaccine-escape serotype 19A isolates emerged in the USA after the introduction of the conjugate polysaccharide vaccine. The evolution of resistance to fluoroquinolones, rifampicin, and macrolides was observed to occur on multiple occasions. This study details how genomic plasticity within lineages of recombinogenic bacteria can permit adaptation to clinical interventions over remarkably short time scales.

    Funded by: Medical Research Council: G0800596; Wellcome Trust: 076962, 076964

    Science (New York, N.Y.) 2011;331;6016;430-4

  • Assessing the complex architecture of polygenic traits in diverged yeast populations.

    Cubillos FA, Billi E, Zörgö E, Parts L, Fargier P, Omholt S, Blomberg A, Warringer J, Louis EJ and Liti G

    Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, UK.

    Phenotypic variation arising from populations adapting to different niches has a complex underlying genetic architecture. A major challenge in modern biology is to identify the causative variants driving phenotypic variation. Recently, the baker's yeast Saccharomyces cerevisiae has emerged as a powerful model for dissecting complex traits. However, past studies using a laboratory strain were unable to reveal the complete architecture of polygenic traits. Here, we present a linkage study using 576 recombinant strains obtained from crosses of isolates representative of the major lineages. The meiotic recombinational landscape appears largely conserved between populations; however, strain-specific hotspots were also detected. Quantitative measurements of growth in 23 distinct ecologically relevant environments show that our recombinant population recapitulates most of the standing phenotypic variation described in the species. Linkage analysis detected an average of 6.3 distinct QTLs for each condition tested in all crosses, explaining on average 39% of the phenotypic variation. The QTLs detected are not constrained to a small number of loci, and the majority are specific to a single cross-combination and to a specific environment. Moreover, crosses between strains of similar phenotypes generate greater variation in the offspring, suggesting the presence of many antagonistic alleles and epistatic interactions. We found that subtelomeric regions play a key role in defining individual quantitative variation, emphasizing the importance of the adaptive nature of these regions in natural populations. This set of recombinant strains is a powerful tool for investigating the complex architecture of polygenic traits.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F015216/1, BB/G01616X/1, BBF0152161; Wellcome Trust: WT 084507MA, WT077192/Z/05/Z

    Molecular ecology 2011;20;7;1401-13

  • A viral discovery methodology for clinical biopsy samples utilising massively parallel next generation sequencing.

    Daly GM, Bexfield N, Heaney J, Stubbs S, Mayer AP, Palser A, Kellam P, Drou N, Caccamo M, Tiley L, Alexander GJ, Bernal W and Heeney JL

    Department of Veterinary Medicine, The University of Cambridge, Cambridge, United Kingdom.

    Here we describe a virus discovery protocol that covers a range of virus genera and can be applied to biopsy-sized tissue samples. Our viral enrichment procedure, validated using canine and human liver samples, significantly improves viral read copy number and increases the length of viral contigs that can be generated by de novo assembly. This in turn enables the Illumina next generation sequencing (NGS) platform to be used as an effective tool for viral discovery from tissue samples.

    Funded by: Wellcome Trust

    PloS one 2011;6;12;e28879

  • The effect of next-generation sequencing technology on complex trait research.

    Day-Williams AG and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Background: Advances in the understanding of complex trait genetics have always been enabled by advances in genomic technology. Next-generation sequencing (NGS) is set to revolutionize the way complex trait genetics research is carried out.

    Results: NGS has multiple applications in the field of human genetics, but is accompanied by substantial study design, analysis and interpretation challenges. This review discusses key aspects of study design considerations, data handling issues and required analytical developments. We also highlight early successes in mapping genetic traits using NGS.

    Conclusion: NGS opens the entire spectrum of genomic alterations for the genetic analysis of complex traits and there are early publications illustrating its power. Continuing development in analytical tools will allow the promise of NGS to be realized.

    European journal of clinical investigation 2011;41;5;561-7

  • Linkage analysis without defined pedigrees.

    Day-Williams AG, Blangero J, Dyer TD, Lange K and Sobel EM

    Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095-7088, USA.

    The need to collect accurate and complete pedigree information has been a drawback of family-based linkage and association studies. Even in case-control studies, investigators should be aware of, and condition on, familial relationships. In single nucleotide polymorphism (SNP) genome scans, relatedness can be directly inferred from the genetic data rather than determined through interviews. Various methods of estimating relatedness have previously been implemented, most notably in PLINK. We present new fast and accurate algorithms for estimating global and local kinship coefficients from dense SNP genotypes. These algorithms require only a single pass through the SNP genotype data. We also show that these estimates can be used to cluster individuals into pedigrees. With these estimates in hand, quantitative trait locus linkage analysis proceeds via traditional variance components methods without any prior relationship information. We demonstrate the success of our algorithms on simulated and real data sets. Our procedures make linkage analysis as easy as a typical genomewide association study.

    Funded by: NHGRI NIH HHS: R01 HG006139; NHLBI NIH HHS: P01 HL045522-18; NIGMS NIH HHS: GM053275, R01 GM053275, R01 GM053275-15; NIMH NIH HHS: MH059490, R37 MH059490-12

    Genetic epidemiology 2011;35;5;360-70
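
    The abstract does not spell out the authors' estimators, so the sketch below swaps in a widely used alternative: a method-of-moments kinship estimate obtained by halving a genomic relationship matrix. Like the paper's algorithms it reads each SNP only once, but it is not the published method. The function name kinship_matrix and the input layout (one list of 0/1/2 allele counts per SNP) are hypothetical choices for illustration.

```python
def kinship_matrix(snp_rows, n):
    """Method-of-moments kinship estimates from dense SNP genotypes.

    snp_rows: iterable yielding one list per SNP with the count (0, 1 or 2)
    of the counted allele for each of the n individuals. Each SNP is read
    exactly once, so genotypes can be streamed from disk.
    In large samples, off-diagonal entries approximate the kinship
    coefficient (about 0.25 for parent-offspring pairs, about 0 for
    unrelated pairs) and diagonal entries approximate (1 + F) / 2.
    """
    sums = [[0.0] * n for _ in range(n)]  # upper triangle of running sums
    used = 0
    for row in snp_rows:
        if len(row) != n:
            raise ValueError("each SNP row must have n genotypes")
        p = sum(row) / (2 * n)            # sample allele frequency
        var = 2 * p * (1 - p)             # genotype variance expected under HWE
        if var == 0:                      # monomorphic SNP carries no information
            continue
        z = [g - 2 * p for g in row]      # centred genotypes
        for i in range(n):
            for j in range(i, n):
                sums[i][j] += z[i] * z[j] / var
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs supplied")
    # Halving the genomic relationship matrix gives kinship coefficients.
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            phi[i][j] = phi[j][i] = sums[i][j] / (2 * used)
    return phi
```

    The resulting matrix could then serve as the relationship input to a variance components linkage model, or be thresholded to group individuals into putative pedigrees.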

  • An evaluation of different target enrichment methods in pooled sequencing designs for complex disease association studies.

    Day-Williams AG, McLay K, Drury E, Edkins S, Coffey AJ, Palotie A and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size.

    Funded by: Wellcome Trust: WT088885/Z/09/Z

    PloS one 2011;6;11;e26279
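
    To illustrate why allele-frequency estimates from non-indexed pools are noisy, here is a toy model, not the analysis used in the paper: the pool frequency is estimated as the fraction of reads carrying the alternate allele, and a simulation measures its error against the frequency actually present in the pooled individuals. It assumes every chromosome contributes equal DNA and reads are error-free; the function names are hypothetical.

```python
import math
import random

def pooled_af_estimate(alt_reads, depth):
    """Read-fraction allele-frequency estimate for one site in a non-indexed pool.

    Returns (estimate, standard error); the standard error is binomial and
    reflects read sampling only, not unequal DNA contribution or read errors.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    p = alt_reads / depth
    return p, math.sqrt(p * (1 - p) / depth)

def simulate_rmse(n_individuals, true_af, depth, trials=500, seed=1):
    """Root-mean-square error of the read-fraction estimate relative to the
    allele frequency among the pooled individuals, assuming equal
    contribution per chromosome and error-free reads."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        chroms = 2 * n_individuals
        alt_chroms = sum(rng.random() < true_af for _ in range(chroms))
        pool_af = alt_chroms / chroms                  # frequency actually in the pool
        alt_reads = sum(rng.random() < pool_af for _ in range(depth))
        estimate, _ = pooled_af_estimate(alt_reads, depth)
        total += (estimate - pool_af) ** 2
    return math.sqrt(total / trials)
```

    In this idealized setting the read-sampling error shrinks roughly as 1/sqrt(depth); real pools add further error from unequal representation of individuals, which the sketch deliberately ignores.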

  • A variant in MCF2L is associated with osteoarthritis.

    Day-Williams AG, Southam L, Panoutsopoulou K, Rayner NW, Esko T, Estrada K, Helgadottir HT, Hofman A, Ingvarsson T, Jonsson H, Keis A, Kerkhof HJ, Thorleifsson G, Arden NK, Carr A, Chapman K, Deloukas P, Loughlin J, McCaskie A, Ollier WE, Ralston SH, Spector TD, Wallis GA, Wilkinson JM, Aslam N, Birell F, Carluke I, Joseph J, Rai A, Reed M, Walker K, arcOGEN Consortium, Doherty SA, Jonsdottir I, Maciewicz RA, Muir KR, Metspalu A, Rivadeneira F, Stefansson K, Styrkarsdottir U, Uitterlinden AG, van Meurs JB, Zhang W, Valdes AM, Doherty M and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Osteoarthritis (OA) is a prevalent, heritable degenerative joint disease with a substantial public health impact. We used a 1000-Genomes-Project-based imputation in a genome-wide association scan for osteoarthritis (3177 OA cases and 4894 controls) to detect a previously unidentified risk locus. We discovered a small disease-associated set of variants on chromosome 13. Through large-scale replication, we establish a robust association with SNPs in MCF2L (rs11842874, combined odds ratio [95% confidence interval] 1.17 [1.11-1.23], p = 2.1 × 10⁻⁸) across a total of 19,041 OA cases and 24,504 controls of European descent. This risk locus represents the third established signal for OA overall. MCF2L regulates nerve growth factor (NGF), and treatment with a humanized monoclonal antibody against NGF is associated with reduction in pain and improvement in function for knee OA patients.

    Funded by: Medical Research Council: G0100594, G0901461, MC_U122886349

    American journal of human genetics 2011;89;3;446-50

  • Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing.

    de Gruijter JM, Lao O, Vermeulen M, Xue Y, Woodwark C, Gillson CJ, Coffey AJ, Ayub Q, Mehdi SQ, Kayser M and Tyler-Smith C

    Department of Forensic Molecular Biology, Erasmus MC University Medical Center, PO Box 2040, Rotterdam, 3000 CA, The Netherlands.

    Background: Numerous genome-wide scans conducted by genotyping previously ascertained single-nucleotide polymorphisms (SNPs) have provided candidate signatures for positive selection in various regions of the human genome, including in genes involved in pigmentation traits. However, it is unclear how well the signatures discovered by such haplotype-based test statistics can be reproduced in tests based on full resequencing data. Four genes (oculocutaneous albinism II (OCA2), tyrosinase-related protein 1 (TYRP1), dopachrome tautomerase (DCT), and KIT ligand (KITLG)) implicated in human skin-color variation have shown evidence for positive selection in Europeans and East Asians in previous SNP-scan data. In the current study, we resequenced 4.7 to 6.7 kb of DNA from each of these genes in Africans, Europeans, East Asians, and South Asians.

    Results: Applying all commonly used neutrality-test statistics for allele frequency distribution to the newly generated sequence data provided conflicting results regarding evidence for positive selection. Previous haplotype-based findings could not be clearly confirmed. Although some tests were marginally significant for some populations and genes, none of them were significant after multiple-testing correction. Combined P values for each gene-population pair did not improve these results. Application of an Approximate Bayesian Computation Markov chain Monte Carlo (ABC-MCMC) approach to these sequence data, using a simple forward simulator, revealed broad posterior distributions of the selective parameters for all four genes, providing no support for positive selection. However, when we applied this approach to published sequence data on SLC45A2, another human pigmentation candidate gene, we could readily confirm evidence for positive selection, as previously detected with sequence-based and some haplotype-based tests.

    Conclusions: Overall, our data indicate that even genes that are strong biological candidates for positive selection and show reproducible signatures of positive selection in SNP scans do not always show the same replicability of selection signals in other tests, which should be considered in future studies on detecting positive selection in genetic data.

    Investigative genetics 2011;2;1;24

  • Computational identification of insertional mutagenesis targets for cancer gene discovery.

    de Jong J, de Ridder J, van der Weyden L, Sun N, van Uitert M, Berns A, van Lohuizen M, Jonkers J, Adams DJ and Wessels LF

    Bioinformatics and Statistics, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, The Netherlands.

    Insertional mutagenesis is a potent forward genetic screening technique used to identify candidate cancer genes in mouse model systems. An important yet unresolved issue in the analysis of these screens is the identification of the genes affected by the insertions. To address this, we developed Kernel Convolved Rule Based Mapping (KC-RBM). KC-RBM exploits distance, orientation and insertion density across tumors to automatically map integration sites to target genes. We perform the first genome-wide evaluation of the association of insertion occurrences with aberrant gene expression of the predicted targets in both retroviral and transposon data sets. We demonstrate the efficiency of KC-RBM by showing its superior performance over existing approaches in recovering true positives from a list of independently, manually curated cancer genes. The results of this work will significantly enhance the accuracy and speed of cancer gene discovery in forward genetic screens. KC-RBM is available as an R package.

    Funded by: Cancer Research UK; Wellcome Trust; Worldwide Cancer Research: 07-0585

    Nucleic acids research 2011;39;15;e105
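
    The abstract names two ingredients of KC-RBM without giving its rules: kernel convolution of insertion density across tumors, and rule-based mapping of sites to genes by distance (and orientation). The sketch below illustrates only a generic version of each, a Gaussian-kernel density of pooled insertion sites and a nearest-gene distance rule; it ignores orientation and is not the KC-RBM implementation. Function names, the bandwidth and the gene-interval format are assumptions.

```python
import math

def insertion_density(insertions, positions, bandwidth):
    """Gaussian-kernel-smoothed density of insertion sites (pooled across
    tumours) evaluated at each query position; all coordinates are in bp
    on a single chromosome."""
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in insertions)
        for x in positions
    ]

def nearest_gene(site, genes):
    """Distance rule: assign an insertion to the gene whose (start, end)
    interval is closest; an insertion inside a gene has distance 0."""
    def distance(interval):
        start, end = interval
        if start <= site <= end:
            return 0
        return min(abs(site - start), abs(site - end))
    return min(genes, key=lambda name: distance(genes[name]))
```

    Peaks in the smoothed density mark clusters of recurrent insertions, which a mapping rule like nearest_gene can then attach to candidate target genes.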

  • Genetic risk reclassification for type 2 diabetes by age below or above 50 years using 40 type 2 diabetes risk single nucleotide polymorphisms.

    de Miguel-Yanes JM, Shrader P, Pencina MJ, Fox CS, Manning AK, Grant RW, Dupuis J, Florez JC, D'Agostino RB, Cupples LA, Meigs JB, MAGIC Investigators and DIAGRAM+ Investigators

    General Medicine Division, Massachusetts General Hospital, Boston, Massachusetts, USA.

    Objective: To test if knowledge of type 2 diabetes genetic variants improves disease prediction.

    Research design and methods: We tested 40 single nucleotide polymorphisms (SNPs) associated with diabetes in 3,471 Framingham Offspring Study subjects followed over 34 years using pooled logistic regression models stratified by age (<50 years, diabetes cases = 144; or ≥50 years, diabetes cases = 302). Models included clinical risk factors and a 40-SNP weighted genetic risk score.

    Results: In people <50 years of age, the clinical risk factors model C-statistic was 0.908; the 40-SNP score increased it to 0.911 (P = 0.3; net reclassification improvement (NRI): 10.2%, P = 0.001). In people ≥50 years of age, the C-statistics without and with the score were 0.883 and 0.884 (P = 0.2; NRI: 0.4%). The risk per risk allele was higher in people <50 than ≥50 years of age (24 vs. 11%; P value for age interaction = 0.02).

    Conclusions: Knowledge of common genetic variation appropriately reclassifies type 2 diabetes risk beyond clinical risk factors in younger people, but not in older people.

    Funded by: Medical Research Council: MC_U106179474; NCRR NIH HHS: 1S10RR163736-01A1; NHLBI NIH HHS: N01-HC- 25195; NIDDK NIH HHS: K23 DK065978, K23 DK65978, K24 DK080140, R01 DK078616, R21 DK084527, R21 DK084527-01

    Diabetes care 2011;34;1;121-5
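
    The abstract does not state how the 40 SNPs were weighted; a common convention, assumed here, is to sum each person's risk-allele counts multiplied by the natural log of each SNP's published odds ratio. The score would enter the regression model alongside clinical risk factors, and discrimination is then summarized by the C-statistic, the probability that a randomly chosen case scores higher than a randomly chosen non-case. Both functions are hypothetical helpers, not the study's code.

```python
import math

def weighted_grs(genotypes, odds_ratios):
    """Weighted genetic risk score: sum over SNPs of the risk-allele count
    (0, 1 or 2) times ln(odds ratio) for that SNP."""
    if len(genotypes) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(g * math.log(or_) for g, or_ in zip(genotypes, odds_ratios))

def c_statistic(scores, outcomes):
    """Concordance (C) statistic for any risk score: the fraction of
    case/non-case pairs in which the case scores higher; ties count 1/2."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

    Comparing c_statistic on predictions from models fitted with and without the score is one way to express the kind of change reported above (0.908 to 0.911 in people under 50).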

  • Ethical issues in human genomics research in developing countries.

    de Vries J, Bull SJ, Doumbo O, Ibrahim M, Mercereau-Puijalon O, Kwiatkowski D and Parker M

    The Ethox Centre, Department of Public Health and Primary Care, University of Oxford, Old Road Campus, Headington, Oxford, OX3 7LF, UK.

    Background: Genome-wide association studies (GWAS) provide a powerful means of identifying genetic variants that play a role in common diseases. Such studies present important ethical challenges. An increasing number of GWAS are taking place in lower income countries and there is a pressing need to identify the particular ethical challenges arising in such contexts. In this paper, we draw upon the experiences of the MalariaGEN Consortium to identify specific ethical issues raised by such research in Africa, Asia and Oceania.

    Discussion: We explore ethical issues in three key areas: protecting the interests of research participants, regulation of international collaborative genomics research and protecting the interests of scientists in low income countries. With regard to participants, important challenges are raised about community consultation and consent. Genomics research raises ethical and governance issues about sample export and ownership, about the use of archived samples and about the complexity of reviewing such large international projects. In the context of protecting the interests of researchers in low income countries, we discuss aspects of data sharing and capacity building that need to be considered for sustainable and mutually beneficial collaborations.

    Summary: Many ethical issues are raised when genomics research is conducted on populations that are characterised by lower average income and literacy levels, such as the populations included in MalariaGEN. It is important that such issues are appropriately addressed in such research. Our experience suggests that the ethical issues in genomics research can best be identified, analysed and addressed where ethics is embedded in the design and implementation of such research projects.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 077383/Z/05/Z, 087285/Z/08/Z, WT 083326/Z/07/Z

    BMC medical ethics 2011;12;5

  • Cell type-specific DNA methylation at intragenic CpG islands in the immune system.

    Deaton AM, Webb S, Kerr AR, Illingworth RS, Guy J, Andrews R and Bird A

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom

    Human and mouse genomes contain a similar number of CpG islands (CGIs), which are discrete CpG-rich DNA sequences associated with transcription start sites. In both species, ∼50% of all CGIs are remote from annotated promoters but, nevertheless, often have promoter-like features. To determine the role of CGI methylation in cell differentiation, we analyzed DNA methylation at a comprehensive CGI set in cells of the mouse hematopoietic lineage. Using a method that potentially detects ∼33% of genomic CpGs in the methylated state, we found that large differences in gene expression were accompanied by surprisingly few DNA methylation changes. There were, however, many DNA methylation differences between hematopoietic cells and a distantly related tissue, brain. Altered DNA methylation in the immune system occurred predominantly at CGIs within gene bodies, which have the properties of cell type-restricted promoters, but infrequently at annotated gene promoters or CGI flanking sequences (CGI "shores"). Unexpectedly, elevated intragenic CGI methylation correlated with silencing of the associated gene. Differentially methylated intragenic CGIs tended to lack H3K4me3 and associate with a transcriptionally repressive environment regardless of methylation state. Our results indicate that DNA methylation changes play a relatively minor role in the late stages of differentiation and suggest that intragenic CGIs represent regulatory sites of differential gene expression during the early stages of lineage specification.

    Funded by: Medical Research Council; Wellcome Trust

    Genome research 2011;21;7;1074-86

  • Does a short breastfeeding period protect from FTO-induced adiposity in children?

    Dedoussis GV, Yannakoulia M, Timpson NJ, Manios Y, Kanoni S, Scott RA, Papoutsakis C, Deloukas P, Pitsiladis YP, Davey-Smith G, Hirschhorn JN and Lyon HN

    Department of Dietetics and Nutrition, Harokopio University, Athens, Greece.

    Context: A number of studies have reported replicable associations between common genetic loci and obesity indices. One of these loci is the fat mass and obesity associated locus (FTO). We aimed to assess whether breastfeeding mediated the known association between FTO and indices of body fatness.

    Methods: This study includes three independent pediatric cohorts, two of Greek origin (the Gene-Diet Attica Investigation: GENDAI, n=1 138 and the "Growth, Exercise and Nutrition Epidemiological Study In preschoolers": the GENESIS study, n=2 374) and one British (the Avon Longitudinal Study of Parents and Children: ALSPAC, n=4 325). Among other information, breastfeeding history was recorded. DNA was obtained from either blood or saliva. FTO was genotyped at rs9939609 in GENDAI and ALSPAC, and at rs17817449 in GENESIS.

    Results: In all cohorts, multivariate analysis showed that the association between FTO:rs9939609 and measures of obesity was consistent across newly presented cohorts (GENDAI: Body mass index [BMI], β=0.43, p=0.009; Waist Circumference, β=1.067, p=0.019; triceps skinfold, β=0.972, p=0.003; subscapular skinfold, β=0.593, p=0.023; GENESIS: Waist Circumference, β=0.473, p=0.008 and subscapular skinfold, β=0.227, p=0.014). Inclusion of one month of breastfeeding as an interaction term effectively removed these associations with indices of obesity (BMI, Waist-Hip-Ratio and subscapular skinfold). No evidence of such interaction was observed for the independent cohort of British children.

    Conclusions: Our findings indicate that in two moderately sized Greek samples, breastfeeding may exert a modifying effect on the relationship between variants at the FTO locus and indices of adiposity. These findings were not replicated in a larger British collection.

    Funded by: Medical Research Council: G0600705, G9815508; NIDDK NIH HHS: K23 DK067288; Wellcome Trust

    International journal of pediatric obesity : IJPO : an official journal of the International Association for the Study of Obesity 2011;6;2-2;e326-35

  • Specific capture and whole-genome sequencing of viruses from clinical samples.

    Depledge DP, Palser AL, Watson SJ, Lai IY, Gray ER, Grant P, Kanda RK, Leproust E, Kellam P and Breuer J

    Division of Infection and Immunity, University College London, London, United Kingdom.

    Whole genome sequencing of viruses directly from clinical samples is integral for understanding the genetics of host-virus interactions. Here, we report the use of sample-sparing target enrichment (by hybridisation) for viral nucleic acid separation and deep-sequencing of herpesvirus genomes directly from a range of clinical samples including saliva, blood, virus vesicles, cerebrospinal fluid, and tumour cell lines. We demonstrate the effectiveness of the method by deep-sequencing 13 highly cell-associated human herpesvirus genomes and generating full length genome alignments at high read depth. Moreover, we show that the specificity of the method enables the study of viral population structures and their diversity within a range of clinical sample types.

    Funded by: Department of Health; Medical Research Council: G07008, G0700814, G0900950; Wellcome Trust: 081703MA

    PloS one 2011;6;11;e27805

  • Dalliance: interactive genome viewing on the web.

    Down TA, Piipari M and Hubbard TJ

    Wellcome Trust/CRUK Gurdon Institute, Cambridge CB2 1QN, UK.

    Summary: Dalliance is a new genome viewer which offers a high level of interactivity while running within a web browser. All data is fetched using the established distributed annotation system (DAS) protocol, making it easy to customize the browser and add extra data.

    Availability and implementation: Dalliance runs entirely within your web browser, and relies on existing DAS server infrastructure. Browsers for several mammalian genomes are available at, and the use of DAS means you can add your own data to these browsers. In addition, the source code (JavaScript) is available under the BSD license, and is straightforward to install on your own web server and embed within other documents.

    Funded by: Wellcome Trust: 077198, 083563

    Bioinformatics (Oxford, England) 2011;27;6;889-90

  • Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.

    Downing T, Imamura H, Decuypere S, Clark TG, Coombs GH, Cotton JA, Hilley JD, de Doncker S, Maes I, Mottram JC, Quail MA, Rijal S, Sanders M, Schönian G, Stark O, Sundar S, Vanaerschot M, Hertz-Fowler C, Dujardin JC and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their in vitro drug susceptibility. We show that whole-genome sequence data reveal genetic structure within these lines not shown by multilocus typing, and suggest that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.

    Funded by: Wellcome Trust: 076355, 085775/Z/08/Z

    Genome research 2011;21;12;2143-56

  • Developing and implementing an institute-wide data sharing policy.

    Dyke SO and Hubbard TJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The Wellcome Trust Sanger Institute has a strong reputation for prepublication data sharing as a result of its policy of rapid release of genome sequence data and particularly through its contribution to the Human Genome Project. The practicalities of broad data sharing remain largely uncharted, especially to cover the wide range of data types currently produced by genomic studies and to adequately address ethical issues. This paper describes the processes and challenges involved in implementing a data sharing policy on an institute-wide scale. This includes questions of governance, practical aspects of applying principles to diverse experimental contexts, building enabling systems and infrastructure, incentives and collaborative issues.

    Genome medicine 2011;3;9;60

  • A user's guide to the encyclopedia of DNA elements (ENCODE).

    ENCODE Project Consortium

    HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America.

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

    Funded by: NHGRI NIH HHS: R01 HG003143, R01 HG004037, RC2 HG005573; NIDDK NIH HHS: R01 DK054369, R01 DK065806; Wellcome Trust: 095908

    PLoS biology 2011;9;4;e1001046

  • Meta-analysis of genome-wide association studies confirms a susceptibility locus for knee osteoarthritis on chromosome 7q22.

    Evangelou E, Valdes AM, Kerkhof HJ, Styrkarsdottir U, Zhu Y, Meulenbelt I, Lories RJ, Karassa FB, Tylzanowski P, Bos SD, arcOGEN Consortium, Akune T, Arden NK, Carr A, Chapman K, Cupples LA, Dai J, Deloukas P, Doherty M, Doherty S, Engstrom G, Gonzalez A, Halldorsson BV, Hammond CL, Hart DJ, Helgadottir H, Hofman A, Ikegawa S, Ingvarsson T, Jiang Q, Jonsson H, Kaprio J, Kawaguchi H, Kisand K, Kloppenburg M, Kujala UM, Lohmander LS, Loughlin J, Luyten FP, Mabuchi A, McCaskie A, Nakajima M, Nilsson PM, Nishida N, Ollier WE, Panoutsopoulou K, van de Putte T, Ralston SH, Rivadeneira F, Saarela J, Schulte-Merker S, Shi D, Slagboom PE, Sudo A, Tamm A, Tamm A, Thorleifsson G, Thorsteinsdottir U, Tsezou A, Wallis GA, Wilkinson JM, Yoshimura N, Zeggini E, Zhai G, Zhang F, Jonsdottir I, Uitterlinden AG, Felson DT, van Meurs JB, Stefansson K, Ioannidis JP, Spector TD and Translation Research in Europe Applied Technologies for Osteoarthritis (TreatOA)

    Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece.

    Objectives: Osteoarthritis (OA) is the most prevalent form of arthritis and accounts for substantial morbidity and disability, particularly in older people. It is characterised by changes in joint structure, including degeneration of the articular cartilage, and its aetiology is multifactorial with a strong postulated genetic component.

    Methods: A meta-analysis was performed of four genome-wide association (GWA) studies of 2371 cases of knee OA and 35 909 controls in Caucasian populations. Replication of the top hits was attempted with data from 10 additional replication datasets.

    Results: With a cumulative sample size of 6709 cases and 44 439 controls, one genome-wide significant locus was identified on chromosome 7q22 for knee OA (rs4730250, p=9.2 × 10⁻⁹), thereby confirming its role as a susceptibility locus for OA.

    Conclusion: The associated signal is located within a large (500 kb) linkage disequilibrium block that contains six genes: PRKAR2B (protein kinase, cAMP-dependent, regulatory, type II, β), HBP1 (HMG-box transcription factor 1), COG5 (component of oligomeric Golgi complex 5), GPR22 (G protein-coupled receptor 22), DUS4L (dihydrouridine synthase 4-like) and BCAP29 (B cell receptor-associated protein 29). Gene expression analyses of the six genes in primary cells derived from different joint tissues confirmed expression of all the genes in the joint environment.

    Funded by: Arthritis Research UK: 17489, 18030; Medical Research Council: G0000934, G0100594, G0901461, MC_U122886349; Wellcome Trust: 068545, 083948, 088785, WT079557MA, WT088885/Z/09/Z

    Annals of the rheumatic diseases 2011;70;2;349-55

  • Differential protein expression throughout the life cycle of Trypanosoma congolense, a major parasite of cattle in Africa.

    Eyford BA, Sakurai T, Smith D, Loveless B, Hertz-Fowler C, Donelson JE, Inoue N and Pearson TW

    Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada.

    Trypanosoma congolense is an important pathogen of livestock in Africa. To study protein expression throughout the T. congolense life cycle, we used culture-derived parasites of each of the three main insect stages and bloodstream stage parasites isolated from infected mice, to perform differential protein expression analysis. Three complete biological replicates of all four life cycle stages were produced from T. congolense IL3000, a cloned parasite that is amenable to culture of major life cycle stages in vitro. Cellular proteins from each life cycle stage were trypsin digested and the resulting peptides were labeled with isobaric tags for relative and absolute quantification (iTRAQ). The peptides were then analyzed by tandem mass spectrometry (MS/MS). This method was used to identify and relatively quantify proteins from the different life cycle stages in the same experiment. A search of the Wellcome Trust Sanger Institute's semi-annotated T. congolense database was performed using the MS/MS fragmentation data to identify the corresponding source proteins. A total of 2088 unique protein sequences were identified, representing 23% of the ∼9000 proteins predicted for the T. congolense proteome. The 1291 most confidently identified proteins were prioritized for further study. Of these, 784 yielded annotated hits while 501 were described as "hypothetical proteins". Six proteins showed no significant sequence similarity to any known proteins (from any species) and thus represent new, previously uncharacterized T. congolense proteins. Of particular interest among the remainder are several membrane molecules that showed drastic differential expression, including, not surprisingly, the well-studied variant surface glycoproteins (VSGs), invariant surface glycoproteins (ISGs) 65 and 75, congolense epimastigote specific protein (CESP), the surface protease GP63, an amino acid transporter, a pteridine transporter and a haptoglobin-hemoglobin receptor. Several of these surface-disposed proteins are of functional interest as they are necessary for survival of the parasites.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Molecular and biochemical parasitology 2011;177;2;116-25

  • Examining the overlap between genome-wide rare variant association signals and linkage peaks in rheumatoid arthritis.

    Eyre S, Ke X, Lawrence R, Bowes J, Panoutsopoulou K, Barton A, Thomson W, Worthington J and Zeggini E

    University of Manchester, Manchester, UK.

    Objective: With the exception of the major histocompatibility complex (MHC) and STAT4, no other rheumatoid arthritis (RA) linkage peak has been successfully fine-mapped to date. This apparent failure to identify association under peaks of linkage could be ascribed to the examination of common variation, when linkage is likely to be driven by rare variants. The purpose of this study was to investigate the overlap between genome-wide rare variant RA association signals observed in the Wellcome Trust Case Control Consortium (WTCCC) study and 11 replicating RA linkage peaks, defined as regions with evidence for linkage in >1 study.

    Methods: The WTCCC data set contained 40,482 variants with minor allele frequency of ≤0.05 in 1,860 RA patients and 2,938 controls. Genotypes of all rare variants within a given gene region were collapsed into a single locus and a global P value was calculated per gene.

    Results: The distribution of rare variant signals (association P≤10(-5)) was found to differ significantly between regions with and without linkage evidence (P=2×10(-17) by Fisher's exact test). No significant difference was observed after data from the MHC region were removed or when the effect of the HLA-DRB1 locus was accounted for.

    Conclusion: The results suggest that rare variant association signals are significantly overrepresented under linkage peaks in RA, but the effect is driven by the MHC. This is the first study to examine the overlap between linkage peaks and rare variant association signals genome-wide in a complex disease.

    Funded by: Arthritis Research UK: 17552, 18030; Wellcome Trust: 076113, 079557MA, 088885, WT088885/Z/09/Z

    Arthritis and rheumatism 2011;63;6;1522-6

  • Troponin T is essential for sarcomere assembly in zebrafish skeletal muscle.

    Ferrante MI, Kiff RM, Goulding DA and Stemple DL

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    In striated muscle, the basic contractile unit is the sarcomere, which comprises myosin-rich thick filaments intercalated with thin filaments made of actin, tropomyosin and troponin. Troponin is required to regulate Ca(2+)-dependent contraction, and mutant forms of troponins are associated with muscle diseases. We have disrupted several genes simultaneously in zebrafish embryos and have followed the progression of muscle degeneration in the absence of troponin. Complete loss of troponin T activity leads to loss of sarcomere structure, in part owing to the destructive nature of deregulated actin-myosin activity. When troponin T and myosin activity are simultaneously disrupted, immature sarcomeres are rescued. However, tropomyosin fails to localise to sarcomeres, and intercalating thin filaments are missing from electron microscopic cross-sections, indicating that loss of troponin T affects thin filament composition. If troponin activity is only partially disrupted, myofibrils are formed but eventually disintegrate owing to deregulated actin-myosin activity. We conclude that the troponin complex has at least two distinct activities: regulation of actin-myosin activity and, independently, a role in the proper assembly of thin filaments. Our results also indicate that sarcomere assembly can occur in the absence of normal thin filaments.

    Funded by: Wellcome Trust: WT 077037/Z/05/Z, WT 077047/Z/05/Z

    Journal of cell science 2011;124;Pt 4;565-77

  • The Genomic Standards Consortium.

    Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, Gilbert J, Glöckner FO, Hirschman L, Karsch-Mizrachi I, Klenk HP, Knight R, Kottmann R, Kyrpides N, Meyer F, San Gil I, Sansone SA, Schriml LM, Sterk P, Tatusova T, Ussery DW, White O and Wooley J

    Centre for Ecology & Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, Oxfordshire, United Kingdom.

    A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.

    PLoS biology 2011;9;6;e1001088

  • The Deciphering Developmental Disorders (DDD) study.

    Firth HV, Wright CF and DDD Study

    Department of Medical Genetics, Cambridge University Hospitals Foundation Trust, Cambridge, UK.

    Funded by: Wellcome Trust

    Developmental medicine and child neurology 2011;53;8;702-3

  • Germline fitness-based scoring of cancer mutations.

    Fischer A, Greenman C and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    A key goal in cancer research is to find the genomic alterations that underlie malignant cells. Genomics has proved successful in identifying somatic variants at a large scale. However, it has become evident that a typical cancer exhibits a heterogeneous mutation pattern across samples. Cases where the same alteration is observed repeatedly seem to be the exception rather than the norm. Thus, pinpointing the key alterations (driver mutations) from a background of variations with no direct causal link to cancer (passenger mutations) is difficult. Here we analyze somatic missense mutations from cancer samples and their healthy tissue counterparts (germline mutations) from the viewpoint of germline fitness. We calibrate a scoring system from protein domain alignments to score mutations and their target loci. We show first that this score predicts to a good degree the rate of polymorphism of the observed germline variation. The scoring is then applied to somatic mutations. We show that candidate cancer genes prone to copy number loss harbor mutations with germline fitness effects that are significantly more deleterious than expected by chance. This suggests that missense mutations play a driving role in tumor suppressor genes. Furthermore, these mutations fall preferably onto loci in sequence neighborhoods that are high scoring in terms of germline fitness. In contrast, for somatic mutations in candidate oncogenes we do not observe a statistically significant effect. These results help to inform how to exploit germline fitness predictions in discovering new genes and mutations responsible for cancer.

    Funded by: Wellcome Trust: 091747

    Genetics 2011;188;2;383-93

  • aCGH.Spline--an R package for aCGH dye bias normalization.

    Fitzgerald TW, Larcombe LD, Le Scouarnec S, Clayton S, Rajan D, Carter NP and Redon R

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Motivation: The careful normalization of array-based comparative genomic hybridization (aCGH) data is of critical importance for the accurate detection of copy number changes. The difference in labelling affinity between the two fluorophores used in aCGH (usually Cy5 and Cy3) can be observed as a bias within the intensity distributions. If left unchecked, this bias is likely to skew data interpretation during downstream analysis and lead to an increased number of false discoveries.

    Results: In this study, we have developed aCGH.Spline, a natural cubic spline interpolation method followed by linear interpolation of outlier values, which is able to remove a large portion of the dye bias from large aCGH datasets in a quick and efficient manner.

    Conclusions: We have shown that removing this bias and reducing the experimental noise has a strong positive impact on the ability to detect accurately both copy number variation (CNV) and copy number alterations (CNA).

    Funded by: Wellcome Trust: WT077008

    Bioinformatics (Oxford, England) 2011;27;9;1195-200

  • Ensembl 2012.

    Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, Ritchie GR, Ruffier M, Schuster M, Sobral D, Tang YA, Taylor K, Trevanion S, Vandrovcova J, White S, Wilson M, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Harrow J, Herrero J, Hubbard TJ, Parker A, Proctor G, Spudich G, Vogel J, Yates A, Zadissa A and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.

    The Ensembl project provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year, including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii), bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl) with preliminary support. The past year has also seen improvements across the project.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E011640/1; NHGRI NIH HHS: U01HG004695, U41HG006104, U54HG004563; Wellcome Trust: 095908, WT062023, WT079643

    Nucleic acids research 2011;40;Database issue;D84-90

  • Salmonella bongori provides insights into the evolution of the Salmonellae.

    Fookes M, Schroeder GN, Langridge GC, Blondel CJ, Mammina C, Connor TR, Seth-Smith H, Vernikos GS, Robinson KS, Sanders M, Petty NK, Kingsley RA, Bäumler AJ, Nuccio SP, Contreras I, Santiviago CA, Maskell D, Barrow P, Humphrey T, Nastasi A, Roberts M, Frankel G, Parkhill J, Dougan G and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The genus Salmonella contains two species, S. bongori and S. enterica. Compared to the well-studied S. enterica there is a marked lack of information regarding the genetic makeup and diversity of S. bongori. S. bongori has been found predominantly associated with cold-blooded animals, but it can infect humans. To define the phylogeny of this species, and compare it to S. enterica, we have sequenced 28 isolates representing most of the known diversity of S. bongori. This cross-species analysis allowed us to confidently differentiate ancestral functions from those acquired following speciation, which include both metabolic and virulence-associated capacities. We show that, although S. bongori inherited a basic set of Salmonella common virulence functions, it has subsequently elaborated on this in a different direction from S. enterica. It is an established feature of S. enterica evolution that the acquisition of the type III secretion systems (T3SS-1 and T3SS-2) has been followed by the sequential acquisition of genes encoding secreted targets, termed effector proteins. We show that this is also true of S. bongori, which has acquired an array of novel effector proteins (sboA-L). All but two of these effectors have no significant S. enterica homologues and instead are highly similar to those found in enteropathogenic Escherichia coli (EPEC). Remarkably, SboH is found to be a chimeric effector protein, encoded by a fusion of the T3SS-1 effector gene sopA and a gene highly similar to the EPEC effector gene nleH. We demonstrate that representatives of these new effectors are translocated and that SboH, similarly to NleH, blocks intrinsic apoptotic pathways while being targeted to the mitochondria by the SopA part of the fusion. This work suggests that S. bongori has inherited the ancestral Salmonella virulence gene set, but has adapted by incorporating virulence determinants that resemble those employed by EPEC.

    Funded by: Medical Research Council; Wellcome Trust: 076964

    PLoS pathogens 2011;7;8;e1002191

  • Assessment of a 44 gene classifier for the evaluation of chronic fatigue syndrome from peripheral blood mononuclear cell gene expression.

    Frampton D, Kerr J, Harrison TJ and Kellam P

    Department of Infection, Division of Infection and Immunity, University College London, London, United Kingdom.

    Chronic fatigue syndrome (CFS) is a clinically defined illness estimated to affect millions of people worldwide, causing significant morbidity and an annual cost of billions of dollars. Currently there are no laboratory-based diagnostic methods for CFS. However, differences in gene expression profiles between CFS patients and healthy persons have been reported in the literature. Using mRNA relative quantities for 44 previously identified reporter genes taken from a large dataset comprising both CFS patients and healthy volunteers, we derived a gene profile scoring metric to accurately classify CFS and healthy samples. This metric out-performed any of the reporter genes used individually as a classifier of CFS. To determine whether the reporter genes were robust across populations, we applied this metric, with limited success, to classify a separate blind dataset of mRNA relative quantities from a new population of CFS patients and healthy persons. Although the metric was able to correctly classify roughly two-thirds of both CFS and healthy samples, the level of misclassification was high. We conclude that many of the previously identified reporter genes are study-specific and thus cannot be used as a broad CFS diagnostic.

    PloS one 2011;6;3;e16872

  • Clustered coding variants in the glutamate receptor complexes of individuals with schizophrenia and bipolar disorder.

    Frank RA, McRae AF, Pocklington AJ, van de Lagemaat LN, Navarro P, Croning MD, Komiyama NH, Bradley SJ, Challiss RA, Armstrong JD, Finn RD, Malloy MP, MacLean AW, Harris SE, Starr JM, Bhaskar SS, Howard EK, Hunt SE, Coffey AJ, Ranganath V, Deloukas P, Rogers J, Muir WJ, Deary IJ, Blackwood DH, Visscher PM and Grant SG

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Current models of schizophrenia and bipolar disorder implicate multiple genes; however, their biological relationships remain elusive. To test the genetic role of glutamate receptors and their interacting scaffold proteins, the exons of ten glutamatergic 'hub' genes were re-sequenced in 1304 individuals comprising case and control samples. No significant difference in the overall number of non-synonymous single nucleotide polymorphisms (nsSNPs) was observed between cases and controls. However, cluster analysis of nsSNPs identified two exons encoding the cysteine-rich domain and first transmembrane helix of GRM1 as a risk locus with five mutations highly enriched within these domains. A new splice variant lacking the transmembrane GPCR domain of GRM1 was discovered in the human brain and the GRM1 mutation cluster could perturb the regulation of this variant. The predicted effect on individuals harbouring multiple mutations distributed in their ten hub genes was also examined. Diseased individuals possessed an increased load of deleteriousness from multiple concurrent rare and common coding variants. Together, these data suggest a disease model in which the interplay of compound genetic coding variants, distributed among glutamate receptors and their interacting proteins, contributes to the pathogenesis of schizophrenia and bipolar disorder.

    Funded by: Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: G0700704, MC_U127592696; Wellcome Trust

    PloS one 2011;6;4;e19011

  • Perilipin deficiency and autosomal dominant partial lipodystrophy.

    Gandotra S, Le Dour C, Bottomley W, Cervera P, Giral P, Reznik Y, Charpentier G, Auclair M, Delépine M, Barroso I, Semple RK, Lathrop M, Lascols O, Capeau J, O'Rahilly S, Magré J, Savage DB and Vigouroux C

    University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Perilipin is the most abundant adipocyte-specific protein that coats lipid droplets, and it is required for optimal lipid incorporation and release from the droplet. We identified two heterozygous frameshift mutations in the perilipin gene (PLIN1) in three families with partial lipodystrophy, severe dyslipidemia, and insulin-resistant diabetes. Subcutaneous fat from the patients was characterized by smaller-than-normal adipocytes, macrophage infiltration, and fibrosis. In contrast to wild-type perilipin, mutant forms of the protein failed to increase triglyceride accumulation when expressed heterologously in preadipocytes. These findings define a novel dominant form of inherited lipodystrophy and highlight the serious metabolic consequences of a primary defect in the formation of lipid droplets in adipose tissue.

    Funded by: Medical Research Council; Wellcome Trust: 077016, 077016/Z/05/Z, 091551, 095515

    The New England journal of medicine 2011;364;8;740-8

  • Meticillin-resistant Staphylococcus aureus with a novel mecA homologue in human and bovine populations in the UK and Denmark: a descriptive study.

    García-Álvarez L, Holden MT, Lindsay H, Webb CR, Brown DF, Curran MD, Walpole E, Brooks K, Pickard DJ, Teale C, Parkhill J, Bentley SD, Edwards GF, Girvan EK, Kearns AM, Pichon B, Hill RL, Larsen AR, Skov RL, Peacock SJ, Maskell DJ and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, UK.

    Background: Animals can act as a reservoir and source for the emergence of novel meticillin-resistant Staphylococcus aureus (MRSA) clones in human beings. Here, we report the discovery of a strain of S aureus (LGA251) isolated from bulk milk that was phenotypically resistant to meticillin but tested negative for the mecA gene and a preliminary investigation of the extent to which such strains are present in bovine and human populations.

    Methods: Isolates of bovine MRSA were obtained from the Veterinary Laboratories Agency in the UK, and isolates of human MRSA were obtained from diagnostic or reference laboratories (two in the UK and one in Denmark). From these collections, we searched for mecA PCR-negative bovine and human S aureus isolates showing phenotypic meticillin resistance. We used whole-genome sequencing to establish the genetic basis for the observed antibiotic resistance.

    Findings: A divergent mecA homologue (mecA(LGA251)) was discovered in the LGA251 genome located in a novel staphylococcal cassette chromosome mec element, designated type-XI SCCmec. The mecA(LGA251) was 70% identical to S aureus mecA homologues and was initially detected in 15 S aureus isolates from dairy cattle in England. These isolates were from three different multilocus sequence type lineages (CC130, CC705, and ST425); spa type t843 (associated with CC130) was identified in 60% of bovine isolates. When human mecA-negative MRSA isolates were tested, the mecA(LGA251) homologue was identified in 12 of 16 isolates from Scotland, 15 of 26 from England, and 24 of 32 from Denmark. As in cows, t843 was the most common spa type detected in human beings.

    Interpretation: Although routine culture and antimicrobial susceptibility testing will identify S aureus isolates with this novel mecA homologue as meticillin resistant, present confirmatory methods will not identify them as MRSA. New diagnostic guidelines for the detection of MRSA should consider the inclusion of tests for mecA(LGA251).

    Funding: Department for Environment, Food and Rural Affairs, Higher Education Funding Council for England, Isaac Newton Trust (University of Cambridge), and the Wellcome Trust.

    Funded by: Wellcome Trust

    The Lancet. Infectious diseases 2011;11;8;595-603

  • RNIE: genome-wide prediction of bacterial intrinsic terminators.

    Gardner PP, Barquist L, Bateman A, Nawrocki EP and Weinberg Z

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Bacterial Rho-independent terminators (RITs) are important genomic landmarks involved in gene regulation and terminating gene expression. In this investigation we present RNIE, a probabilistic approach for predicting RITs. The method is based upon covariance models, which have been known for many years to be the most accurate computational tools for predicting homology in structural non-coding RNAs. We show that RNIE has superior performance in model species from a spectrum of bacterial phyla. Further analysis of species where a low number of RITs were predicted revealed a highly conserved structural sequence motif enriched near the genic termini of the pathogenic actinobacterium Mycobacterium tuberculosis. This motif, together with classical RITs, accounts for up to 90% of all the significantly structured regions from the termini of M. tuberculosis genic elements. The software, predictions and alignments described here are available online.

    Funded by: Howard Hughes Medical Institute

    Nucleic acids research 2011;39;14;5845-52

  • Analysis of XMRV integration sites from human prostate cancer tissues suggests PCR contamination rather than genuine human infection.

    Garson JA, Kellam P and Towers GJ

    MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, 46 Cleveland St, London W1T 4JF, UK.

    XMRV is a gammaretrovirus associated in some studies with human prostate cancer and chronic fatigue syndrome. Central to the hypothesis of XMRV as a human pathogen is the description of integration sites in DNA from prostate tumour tissues. Here we demonstrate that 2 of 14 patient-derived sites are identical to sites cloned in the same laboratory from experimentally infected DU145 cells. Identical integration sites have never previously been described in any retrovirus infection. We propose that the patient-derived sites are the result of PCR contamination. This observation further undermines the notion that XMRV is a genuine human pathogen.

    Funded by: Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940, WT076608

    Retrovirology 2011;8;13

  • New gene functions in megakaryopoiesis and platelet formation.

    Gieger C, Radhakrishnan A, Cvejic A, Tang W, Porcu E, Pistis G, Serbanovic-Canic J, Elling U, Goodall AH, Labrune Y, Lopez LM, Mägi R, Meacham S, Okada Y, Pirastu N, Sorice R, Teumer A, Voss K, Zhang W, Ramirez-Solis R, Bis JC, Ellinghaus D, Gögele M, Hottenga JJ, Langenberg C, Kovacs P, O'Reilly PF, Shin SY, Esko T, Hartiala J, Kanoni S, Murgia F, Parsa A, Stephens J, van der Harst P, Ellen van der Schoot C, Allayee H, Attwood A, Balkau B, Bastardot F, Basu S, Baumeister SE, Biino G, Bomba L, Bonnefond A, Cambien F, Chambers JC, Cucca F, D'Adamo P, Davies G, de Boer RA, de Geus EJ, Döring A, Elliott P, Erdmann J, Evans DM, Falchi M, Feng W, Folsom AR, Frazer IH, Gibson QD, Glazer NL, Hammond C, Hartikainen AL, Heckbert SR, Hengstenberg C, Hersch M, Illig T, Loos RJ, Jolley J, Khaw KT, Kühnel B, Kyrtsonis MC, Lagou V, Lloyd-Jones H, Lumley T, Mangino M, Maschio A, Mateo Leach I, McKnight B, Memari Y, Mitchell BD, Montgomery GW, Nakamura Y, Nauck M, Navis G, Nöthlings U, Nolte IM, Porteous DJ, Pouta A, Pramstaller PP, Pullat J, Ring SM, Rotter JI, Ruggiero D, Ruokonen A, Sala C, Samani NJ, Sambrook J, Schlessinger D, Schreiber S, Schunkert H, Scott J, Smith NL, Snieder H, Starr JM, Stumvoll M, Takahashi A, Tang WH, Taylor K, Tenesa A, Lay Thein S, Tönjes A, Uda M, Ulivi S, van Veldhuisen DJ, Visscher PM, Völker U, Wichmann HE, Wiggins KL, Willemsen G, Yang TP, Hua Zhao J, Zitting P, Bradley JR, Dedoussis GV, Gasparini P, Hazen SL, Metspalu A, Pirastu M, Shuldiner AR, Joost van Pelt L, Zwaginga JJ, Boomsma DI, Deary IJ, Franke A, Froguel P, Ganesh SK, Jarvelin MR, Martin NG, Meisinger C, Psaty BM, Spector TD, Wareham NJ, Akkerman JW, Ciullo M, Deloukas P, Greinacher A, Jupe S, Kamatani N, Khadake J, Kooner JS, Penninger J, Prokopenko I, Stemple D, Toniolo D, Wernisch L, Sanna S, Hicks AA, Rendon A, Ferreira MA, Ouwehand WH and Soranzo N

    Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstr 1, 85764 Neuherberg, Germany.

    Platelets are the second most abundant cell type in blood and are essential for maintaining haemostasis. Their count and volume are tightly controlled within narrow physiological ranges, but there is only limited understanding of the molecular processes controlling both traits. Here we carried out a high-powered meta-analysis of genome-wide association studies (GWAS) in up to 66,867 individuals of European ancestry, followed by extensive biological and functional assessment. We identified 68 genomic loci reliably associated with platelet count and volume mapping to established and putative novel regulators of megakaryopoiesis and platelet formation. These genes show megakaryocyte-specific gene expression patterns and extensive network connectivity. Using gene silencing in Danio rerio and Drosophila melanogaster, we identified 11 of the genes as novel regulators of blood cell formation. Taken together, our findings advance understanding of novel gene functions controlling fate-determining events during megakaryopoiesis and platelet formation, providing a new example of successful translation of GWAS to function.

    Funded by: Austrian Science Fund FWF: I 434; Biotechnology and Biological Sciences Research Council: BB/F019394/1; British Heart Foundation: RG/09/012/28096; Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: G0000111, G0401527, G0601966, G0700704, G0700931, G0701120, G0701863, G0801056, G1000143, MC_U105260799, MC_U106179471, MC_U106188470; NCRR NIH HHS: K12 RR023250, K12 RR023250-05, M01 RR016500, M01 RR016500-08, U54 RR020278, U54 RR020278-06, UL1 RR025005, UL1 RR025005-05; NHGRI NIH HHS: P41 HG003751; NHLBI NIH HHS: N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC085079, P01 HL076491, P01 HL076491-09, P01 HL098055, P01 HL098055-03, R01 HL059367, R01 HL059367-11, R01 HL068986, R01 HL068986-06, R01 HL073410, R01 HL073410-08, R01 HL085251, R01 HL085251-04, R01 HL086694, R01 HL086694-05, R01 HL087641, R01 HL087641-03, R01 HL087679-03, R01 HL088119, R01 HL088119-04, R01 HL103866, R01 HL103866-03, R01 HL105756, U01 HL072515, U01 HL072515-06, U01 HL084756, U01 HL084756-03; NIA NIH HHS: R01 AG018728, R01 AG018728-05S1; NICHD NIH HHS: R01 HD042157, R01 HD042157-01A1; NIDDK NIH HHS: P30 DK072488, P30 DK072488-08; NIGMS NIH HHS: R01 GM053275, R01 GM053275-14, U01 GM074518, U01 GM074518-04; NIMH NIH HHS: RL1 MH083268, RL1 MH083268-05; Wellcome Trust: 092731, 098051, WT077037/Z/05/Z, WT077047/Z/05/Z, WT082597/Z/07/Z

    Nature 2011;480;7376;201-8

  • Common variants near ATM are associated with glycemic response to metformin in type 2 diabetes.

    GoDARTS and UKPDS Diabetes Pharmacogenetics Study Group, Wellcome Trust Case Control Consortium 2, Zhou K, Bellenguez C, Spencer CC, Bennett AJ, Coleman RL, Tavendale R, Hawley SA, Donnelly LA, Schofield C, Groves CJ, Burch L, Carr F, Strange A, Freeman C, Blackwell JM, Bramon E, Brown MA, Casas JP, Corvin A, Craddock N, Deloukas P, Dronov S, Duncanson A, Edkins S, Gray E, Hunt S, Jankowski J, Langford C, Markus HS, Mathew CG, Plomin R, Rautanen A, Sawcer SJ, Samani NJ, Trembath R, Viswanathan AC, Wood NW, MAGIC investigators, Harries LW, Hattersley AT, Doney AS, Colhoun H, Morris AD, Sutherland C, Hardie DG, Peltonen L, McCarthy MI, Holman RR, Palmer CN, Donnelly P and Pearson ER

    Biomedical Research Institute, University of Dundee, Dundee, UK.

    Metformin is the most commonly used pharmacological therapy for type 2 diabetes. We report a genome-wide association study for glycemic response to metformin in 1,024 Scottish individuals with type 2 diabetes with replication in two cohorts including 1,783 Scottish individuals and 1,113 individuals from the UK Prospective Diabetes Study. In a combined meta-analysis, we identified a SNP, rs11212617, associated with treatment success (n = 3,920, P = 2.9 × 10(-9), odds ratio = 1.35, 95% CI 1.22-1.49) at a locus containing ATM, the ataxia telangiectasia mutated gene. In a rat hepatoma cell line, inhibition of ATM with KU-55933 attenuated the phosphorylation and activation of AMP-activated protein kinase in response to metformin. We conclude that ATM, a gene known to be involved in DNA repair and cell cycle control, plays a role in the effect of metformin upstream of AMP-activated protein kinase, and variation in this gene alters glycemic response to metformin.

    Funded by: Chief Scientist Office; Department of Health: PDA/02/06/016; Medical Research Council: G0601261, G0901310, G19/2; Wellcome Trust: 084726, 084726/Z/08/Z, 085475/B/08/Z, 085475/Z/08/Z

    Nature genetics 2011;43;2;117-20

  • Transition of Plasmodium sporozoites into liver stage-like forms is regulated by the RNA binding protein Pumilio.

    Gomes-Santos CS, Braks J, Prudêncio M, Carret C, Gomes AR, Pain A, Feltwell T, Khan S, Waters A, Janse C, Mair GR and Mota MM

    Malaria Unit, Instituto de Medicina Molecular, Lisboa, Portugal.

    Many eukaryotic developmental and cell fate decisions that are effected post-transcriptionally involve RNA binding proteins as regulators of translation of key mRNAs. In malaria parasites (Plasmodium spp.), the development of round, non-motile and replicating exo-erythrocytic liver stage forms (EEFs) from slender, motile and cell-cycle arrested sporozoites is believed to depend on environmental changes experienced during the transmission of the parasite from the mosquito vector to the vertebrate host. Here we identify a Plasmodium member of the RNA binding protein family PUF as a key regulator of this transformation. In the absence of Pumilio-2 (Puf2), sporozoites initiate EEF development inside mosquito salivary glands independently of the normal transmission-associated environmental cues. Puf2- sporozoites exhibit genome-wide transcriptional changes that result in loss of gliding motility, cell traversal ability and reduction in infectivity, and, moreover, trigger metamorphosis typical of early Plasmodium intra-hepatic development. These data demonstrate that Puf2 is a key player in regulating sporozoite developmental control, and imply that transformation of salivary gland-resident sporozoites into liver stage-like parasites is regulated by a post-transcriptional mechanism.

    Funded by: Wellcome Trust: 083811

    PLoS pathogens 2011;7;5;e1002046

  • No evidence of XMRV or related retroviruses in a London HIV-1-positive patient cohort.

    Gray ER, Garson JA, Breuer J, Edwards S, Kellam P, Pillay D and Towers GJ

    Department of Infection and Immunity, University College London, London, United Kingdom.

    Background: Several studies have implicated a recently discovered gammaretrovirus, XMRV (Xenotropic murine leukaemia virus-related virus), in chronic fatigue syndrome and prostate cancer, though whether as causative agent or opportunistic infection is unclear. It has also been suggested that the virus can be found circulating amongst the general population. The discovery has been controversial, with conflicting results from attempts to reproduce the original studies.

    Methodology/principal findings: We extracted peripheral blood DNA from a cohort of 540 HIV-1-positive patients (approximately 20% of whom have never been on anti-retroviral treatment) and determined the presence of XMRV and related viruses using TaqMan PCR. While we were able to amplify as few as 5 copies of positive control DNA, we did not find any positive samples in the patient cohort.

    Conclusions/significance: In view of these negative findings in this highly susceptible group, we conclude that it is unlikely that XMRV or related viruses are circulating at a significant level, if at all, in HIV-1-positive patients in London or in the general population.

    Funded by: Department of Health; Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940

    PloS one 2011;6;3;e18096

  • A worldwide analysis of beta-defensin copy number variation suggests recent selection of a high-expressing DEFB103 gene copy in East Asia.

    Hardwick RJ, Machado LR, Zuccherato LW, Antolinos S, Xue Y, Shawa N, Gilman RH, Cabrera L, Berg DE, Tyler-Smith C, Kelly P, Tarazona-Santos E and Hollox EJ

    Department of Genetics, University of Leicester, University Road, Leicester, United Kingdom.

    Beta-defensins are a family of multifunctional genes with roles in defense against pathogens, reproduction, and pigmentation. In humans, six beta-defensin genes are clustered in a repeated region which is copy-number variable (CNV) as a block, with a diploid copy number between 1 and 12. The role in host defense makes the evolutionary history of this CNV particularly interesting, because morbidity due to infectious disease is likely to have been an important selective force in human evolution, and to have varied between geographical locations. Here, we show CNV of the beta-defensin region in chimpanzees, and identify a beta-defensin block in the human lineage that contains rapidly evolving noncoding regulatory sequences. We also show that variation at one of these rapidly evolving sequences affects expression levels and cytokine responsiveness of DEFB103, a key inhibitor of influenza virus fusion at the cell surface. A worldwide analysis of beta-defensin CNV in 67 populations shows an unusually high frequency of high-DEFB103-expressing copies in East Asia, the geographical origin of historical and modern influenza epidemics, possibly as a result of selection for increased resistance to influenza in this region.

    Funded by: Medical Research Council: G0801123; Wellcome Trust: 067948, 077009, 087663

    Human mutation 2011;32;7;743-50

  • Exome sequencing identifies a missense mutation in Isl1 associated with low penetrance otitis media in dearisch mice.

    Hilton JM, Lewis MA, Grati M, Ingham N, Pearson S, Laskowski RA, Adams DJ and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Background: Inflammation of the middle ear (otitis media) is very common and can lead to serious complications if not resolved. Genetic studies suggest an inherited component, but few of the genes that contribute to this condition are known. Mouse mutants have contributed significantly to the identification of genes predisposing to otitis media.

    Results: The dearisch mouse mutant is an ENU-induced mutant detected by its impaired Preyer reflex (ear flick in response to sound). Auditory brainstem responses revealed raised thresholds from as early as three weeks old. Pedigree analysis suggested a dominant but partially penetrant mode of inheritance. The middle ear of dearisch mutants shows a thickened mucosa and cellular effusion suggesting chronic otitis media with effusion with superimposed acute infection. The inner ear, including the sensory hair cells, appears normal. Due to the low penetrance of the phenotype, normal backcross mapping of the mutation was not possible. Exome sequencing was therefore employed to identify a non-conservative tyrosine to cysteine (Y71C) missense mutation in the Islet1 gene, Isl1(Drsh). Isl1 is expressed in the normal middle ear mucosa. The findings suggest the Isl1(Drsh) mutation is likely to predispose carriers to otitis media.

    Conclusions: Dearisch, Isl1(Drsh), represents the first point mutation in the mouse Isl1 gene and suggests a previously unrecognized role for this gene. It is also the first recorded exome sequencing of the C3HeB/FeJ background relevant to many ENU-induced mutants. Most importantly, the power of exome resequencing to identify ENU-induced mutations without a mapped gene locus is illustrated.

    Funded by: Medical Research Council: G0300212, G0800024; Wellcome Trust: 077189

    Genome biology 2011;12;9;R90

  • A very early-branching Staphylococcus aureus lineage lacking the carotenoid pigment staphyloxanthin.

    Holt DC, Holden MT, Tong SY, Castillo-Ramirez S, Clarke L, Quail MA, Currie BJ, Parkhill J, Bentley SD, Feil EJ and Giffard PM

    Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia.

    Here we discuss the evolution of the northern Australian Staphylococcus aureus isolate MSHR1132 genome. MSHR1132 belongs to the divergent clonal complex 75 lineage. The average nucleotide divergence between orthologous genes in MSHR1132 and typical S. aureus is approximately sevenfold greater than the maximum divergence observed in this species to date. MSHR1132 has a small accessory genome, which includes the well-characterized genomic islands, νSaα and νSaβ, suggesting that these elements were acquired well before the expansion of the typical S. aureus population. Other mobile elements show mosaic structure (the prophage ϕSa3) or evidence of recent acquisition from a typical S. aureus lineage (SCCmec, ICE6013 and plasmid pMSHR1132). There are two differences in gene repertoire compared with typical S. aureus that may be significant clues as to the genetic basis underlying the successful emergence of S. aureus as a pathogen. First, MSHR1132 lacks the genes for production of staphyloxanthin, the carotenoid pigment that confers upon S. aureus its characteristic golden color and protects against oxidative stress. The lack of pigment was demonstrated in 126 of 126 CC75 isolates. Second, a mobile clustered regularly interspaced short palindromic repeat (CRISPR) element is inserted into orfX of MSHR1132. Although common in other staphylococcal species, these elements are very rare within S. aureus and may impact accessory genome acquisition. The CRISPR spacer sequences reveal a history of attempted invasion by known S. aureus mobile elements. There is a case for the creation of a new taxon to accommodate this and related isolates.

    Genome biology and evolution 2011;3;881-95

  • A homozygous mutant embryonic stem cell bank applicable for phenotype-driven genetic screening.

    Horie K, Kokubu C, Yoshida J, Akagi K, Isotani A, Oshitani A, Yusa K, Ikeda R, Huang Y, Bradley A and Takeda J

    Department of Social and Environmental Medicine, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan.

    Genome-wide mutagenesis in mouse embryonic stem cells (ESCs) is a powerful tool, but the diploid nature of the mammalian genome hampers its application for recessive genetic screening. We have previously reported a method to induce homozygous mutant ESCs from heterozygous mutants by tetracycline-dependent transient disruption of the Bloom's syndrome gene. However, we could not purify homozygous mutants from a large population of heterozygous mutant cells, limiting the applications. Here we developed a strategy for rapid enrichment of homozygous mutant mouse ESCs and demonstrated its feasibility for cell-based phenotypic analysis. The method uses G418-plus-puromycin double selection to enrich for homozygotes and single-nucleotide polymorphism analysis for identification of homozygosity. We combined this simple approach with gene-trap mutagenesis to construct a homozygous mutant ESC bank with 138 mutant lines and demonstrate its use in phenotype-driven genetic screening.

    Nature methods 2011;8;12;1071-7

  • Exploration of signals of positive selection derived from genotype-based human genome scans using re-sequencing data.

    Hu M, Ayub Q, Guerra-Assunção JA, Long Q, Ning Z, Huang N, Romero IG, Mamanova L, Akan P, Liu X, Coffey AJ, Turner DJ, Swerdlow H, Burton J, Quail MA, Conrad DF, Enright AJ, Tyler-Smith C and Xue Y

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    We have investigated whether regions of the genome showing signs of positive selection in scans based on haplotype structure also show evidence of positive selection when sequence-based tests are applied, whether the target of selection can be localized more precisely, and whether such extra evidence can lead to increased biological insights. We used two tools: simulations under neutrality or selection, and experimental investigation of two regions identified by the HapMap2 project as putatively selected in human populations. Simulations suggested that neutral and selected regions should be readily distinguished and that it should be possible to localize the selected variant to within 40 kb at least half of the time. Re-sequencing of two ~300 kb regions (chr4:158Mb and chr10:22Mb) lacking known targets of selection in HapMap CHB individuals provided strong evidence for positive selection within each and suggested the micro-RNA gene hsa-miR-548c as the best candidate target in one region, and changes in regulation of the sperm protein gene SPAG6 in the other.

    Funded by: Wellcome Trust: 077009

    Human genetics 2011;131;5;665-74

  • An activating mutation of AKT2 and human hypoglycemia.

    Hussain K, Challis B, Rocha N, Payne F, Minic M, Thompson A, Daly A, Scott C, Harris J, Smillie BJ, Savage DB, Ramaswami U, De Lonlay P, O'Rahilly S, Barroso I and Semple RK

    Clinical and Molecular Genetics Unit, Developmental Endocrinology Research Group, Institute of Child Health, University College London, London WC1N 1EH, UK.

    Pathological fasting hypoglycemia in humans is usually explained by excessive circulating insulin or insulin-like molecules or by inborn errors of metabolism impairing liver glucose production. We studied three unrelated children with unexplained, recurrent, and severe fasting hypoglycemia and asymmetrical growth. All were found to carry the same de novo mutation, p.Glu17Lys, in the serine/threonine kinase AKT2, in two cases as heterozygotes and in one case in mosaic form. In heterologous cells, the mutant AKT2 was constitutively recruited to the plasma membrane, leading to insulin-independent activation of downstream signaling. Thus, systemic metabolic disease can result from constitutive, cell-autonomous activation of signaling pathways normally controlled by insulin.

    Funded by: Medical Research Council: G0502115; Wellcome Trust: 077016, 077016/Z/05/Z, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z, 091551, 091551/Z/10/Z, 095515

    Science (New York, N.Y.) 2011;334;6055;474

  • Large-scale gene-centric analysis identifies novel variants for coronary artery disease.

    IBC 50K CAD Consortium

    Coronary artery disease (CAD) has a significant genetic contribution that is incompletely characterized. To complement genome-wide association (GWA) studies, we conducted a large and systematic candidate gene study of CAD susceptibility, including analysis of many uncommon and functional variants. We examined 49,094 genetic variants in ∼2,100 genes of cardiovascular relevance, using a customised gene array in 15,596 CAD cases and 34,992 controls (11,202 cases and 30,733 controls of European descent; 4,394 cases and 4,259 controls of South Asian origin). We attempted to replicate putative novel associations in an additional 17,121 CAD cases and 40,473 controls. Potential mechanisms through which the novel variants could affect CAD risk were explored through association tests with vascular risk factors and gene expression. We confirmed associations of several previously known CAD susceptibility loci (eg, 9p21.3:p<10(-33); LPA:p<10(-19); 1p13.3:p<10(-17)) as well as three recently discovered loci (COL4A1/COL4A2, ZC3HC1, CYP17A1:p<5×10(-7)). However, we found essentially null results for most previously suggested CAD candidate genes. In our replication study of 24 promising common variants, we identified novel associations of variants in or near LIPA, IL5, TRIB1, and ABCG5/ABCG8, with per-allele odds ratios for CAD risk with each of the novel variants ranging from 1.06-1.09. Associations with variants at LIPA, TRIB1, and ABCG5/ABCG8 were supported by gene expression data or effects on lipid levels. Apart from the previously reported variants in LPA, none of the other ∼4,500 low frequency and functional variants showed a strong effect. Associations in South Asians did not differ appreciably from those in Europeans, except for 9p21.3 (per-allele odds ratio: 1.14 versus 1.27 respectively; P for heterogeneity = 0.003). This large-scale gene-centric analysis has identified several novel genes for CAD that relate to diverse biochemical and cellular functions and clarified the literature with regard to many previously suggested genes.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/09/12/28096; Medical Research Council: G0401527, G0601966, G0700931, G0701863, G0801056, G1000143, MC_U105260792, MC_U106179471, MC_U137686857; NHLBI NIH HHS: R01 HL087647; Wellcome Trust: 090532

    PLoS genetics 2011;7;9;e1002260

  • Distinguishing driver and passenger mutations in an evolutionary history categorized by interference.

    Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    In many biological scenarios, from the development of drug resistance in pathogens to the progression of healthy cells toward cancer, quantifying the selection acting on observed mutations is a central question. One difficulty in answering this question is the complexity of the background upon which mutations can arise, with multiple potential interactions between genetic loci. Here we present a method for discerning selection from a population history that accounts for interference between mutations. Given sequences sampled from multiple time points in the history of a population, we infer selection at each locus by maximizing a likelihood function derived from a multilocus evolution model. We apply the method to the question of distinguishing between loci where new mutations are under positive selection (drivers) and loci that emit neutral mutations (passengers) in a Wright-Fisher model of evolution. Relative to an otherwise equivalent method in which the genetic background of mutations was ignored, our method inferred selection coefficients more accurately for both driver mutations evolving under clonal interference and passenger mutations reaching fixation in the population through genetic drift or hitchhiking. In a population history recorded by 750 sets of sequences of 100 individuals taken at intervals of 100 generations, a set of 50 loci were divided into drivers and passengers with a mean accuracy of >0.95 across a range of numbers of driver loci. The potential application of our model, either in full or in part, to a range of biological systems, is discussed.

    Funded by: Wellcome Trust: 091747

    Genetics 2011;189;3;989-1000

  • Quantifying selection acting on a complex trait using allele frequency time series data.

    Illingworth CJ, Parts L, Schiffels S, Liti G and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    When selection is acting on a large genetically diverse population, beneficial alleles increase in frequency. This fact can be used to map quantitative trait loci by sequencing the pooled DNA from the population at consecutive time points and observing allele frequency changes. Here, we present a population genetic method to analyze time series data of allele frequencies from such an experiment. Beginning with a range of proposed evolutionary scenarios, the method measures the consistency of each with the observed frequency changes. Evolutionary theory is utilized to formulate equations of motion for the allele frequencies, following which likelihoods for having observed the sequencing data under each scenario are derived. Comparison of these likelihoods gives an insight into the prevailing dynamics of the system under study. We illustrate the method by quantifying selective effects from an experiment, in which two phenotypically different yeast strains were first crossed and then propagated under heat stress (Parts L, Cubillos FA, Warringer J, et al. [14 co-authors]. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res). From these data, we discover that about 6% of polymorphic sites evolve nonneutrally under heat stress conditions, either because of their linkage to beneficial (driver) alleles or because they are drivers themselves. We further identify 44 genomic regions containing one or more candidate driver alleles, quantify their apparent selective advantage, obtain estimates of recombination rates within the regions, and show that the dynamics of the drivers display a strong signature of selection going beyond additive models. Our approach is applicable to study adaptation in a range of systems under different evolutionary pressures.

    Funded by: Wellcome Trust: 098051, WT077192/Z/05/Z

    Molecular biology and evolution 2011;29;4;1187-97

  • Design and cohort description of the InterAct Project: an examination of the interaction of genetic and lifestyle factors on the incidence of type 2 diabetes in the EPIC Study.

    InterAct Consortium, Langenberg C, Sharp S, Forouhi NG, Franks PW, Schulze MB, Kerrison N, Ekelund U, Barroso I, Panico S, Tormo MJ, Spranger J, Griffin S, van der Schouw YT, Amiano P, Ardanaz E, Arriola L, Balkau B, Barricarte A, Beulens JW, Boeing H, Bueno-de-Mesquita HB, Buijsse B, Chirlaque Lopez MD, Clavel-Chapelon F, Crowe FL, de Lauzon-Guillan B, Deloukas P, Dorronsoro M, Drogan D, Froguel P, Gonzalez C, Grioni S, Groop L, Groves C, Hainaut P, Halkjaer J, Hallmans G, Hansen T, Huerta Castaño JM, Kaaks R, Key TJ, Khaw KT, Koulman A, Mattiello A, Navarro C, Nilsson P, Norat T, Overvad K, Palla L, Palli D, Pedersen O, Peeters PH, Quirós JR, Ramachandran A, Rodriguez-Suarez L, Rolandsson O, Romaguera D, Romieu I, Sacerdote C, Sánchez MJ, Sandbaek A, Slimani N, Sluijs I, Spijkerman AM, Teucher B, Tjonneland A, Tumino R, van der A DL, Verschuren WM, Tuomilehto J, Feskens E, McCarthy M, Riboli E and Wareham NJ

    Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital, Box 285, Cambridge CB2 0QQ, UK.

    Aims/hypothesis: Studying gene-lifestyle interaction may help to identify lifestyle factors that modify genetic susceptibility and uncover genetic loci exerting important subgroup effects. Adequately powered studies with prospective, unbiased, standardised assessment of key behavioural factors for gene-lifestyle studies are lacking. This case-cohort study aims to investigate how genetic and potentially modifiable lifestyle and behavioural factors, particularly diet and physical activity, interact in their influence on the risk of developing type 2 diabetes.

    Methods: Incident cases of type 2 diabetes occurring in European Prospective Investigation into Cancer and Nutrition (EPIC) cohorts between 1991 and 2007 from eight of the ten EPIC countries were ascertained and verified. Prentice-weighted Cox regression and random-effects meta-analyses were used to investigate differences in diabetes incidence by age and sex.

    Results: A total of 12,403 verified incident cases of type 2 diabetes occurred during 3.99 million person-years of follow-up of 340,234 EPIC participants eligible for InterAct. We defined a centre-stratified subcohort of 16,154 individuals for comparative analyses. Individuals with incident diabetes who were randomly selected into the subcohort (n = 778) were included as cases in the analyses. All prevalent diabetes cases were excluded from the study. InterAct cases were followed up for an average of 6.9 years; 49.7% were men. Mean baseline age and age at diagnosis were 55.6 and 62.5 years; mean BMI and waist circumference values were 29.4 kg/m(2) and 102.7 cm in men, and 30.1 kg/m(2) and 92.8 cm in women, respectively. Risk of type 2 diabetes increased linearly with age, with an overall HR of 1.56 (95% CI 1.48-1.64) for a 10 year age difference, adjusted for sex. A male excess in the risk of incident diabetes was consistently observed across all countries, with a pooled HR of 1.51 (95% CI 1.39-1.64), adjusted for age.

    Conclusions/interpretation: InterAct is a large, well-powered, prospective study that will inform our understanding of the interplay between genes and lifestyle factors on the risk of type 2 diabetes development.

    Funded by: Canadian Institutes of Health Research: G0601261; Cancer Research UK; Medical Research Council: G0401527, G0601261, G1000143, MC_EX_G0800783, MC_U106179471, MC_U106179473, MC_U106179474, MC_UP_A090_1006, MC_UP_A100_1003; Wellcome Trust: 083270/083270/z, 090532

    Diabetologia 2011;54;9;2272-82

  • Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk.

    International Consortium for Blood Pressure Genome-Wide Association Studies, Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, Verwoert GC, Hwang SJ, Pihur V, Vollenweider P, O'Reilly PF, Amin N, Bragg-Gresham JL, Teumer A, Glazer NL, Launer L, Zhao JH, Aulchenko Y, Heath S, Sõber S, Parsa A, Luan J, Arora P, Dehghan A, Zhang F, Lucas G, Hicks AA, Jackson AU, Peden JF, Tanaka T, Wild SH, Rudan I, Igl W, Milaneschi Y, Parker AN, Fava C, Chambers JC, Fox ER, Kumari M, Go MJ, van der Harst P, Kao WH, Sjögren M, Vinay DG, Alexander M, Tabara Y, Shaw-Hawkins S, Whincup PH, Liu Y, Shi G, Kuusisto J, Tayo B, Seielstad M, Sim X, Nguyen KD, Lehtimäki T, Matullo G, Wu Y, Gaunt TR, Onland-Moret NC, Cooper MN, Platou CG, Org E, Hardy R, Dahgam S, Palmen J, Vitart V, Braund PS, Kuznetsova T, Uiterwaal CS, Adeyemo A, Palmas W, Campbell H, Ludwig B, Tomaszewski M, Tzoulaki I, Palmer ND, CARDIoGRAM consortium, CKDGen Consortium, KidneyGen Consortium, EchoGen consortium, CHARGE-HF consortium, Aspelund T, Garcia M, Chang YP, O'Connell JR, Steinle NI, Grobbee DE, Arking DE, Kardia SL, Morrison AC, Hernandez D, Najjar S, McArdle WL, Hadley D, Brown MJ, Connell JM, Hingorani AD, Day IN, Lawlor DA, Beilby JP, Lawrence RW, Clarke R, Hopewell JC, Ongen H, Dreisbach AW, Li Y, Young JH, Bis JC, Kähönen M, Viikari J, Adair LS, Lee NR, Chen MH, Olden M, Pattaro C, Bolton JA, Köttgen A, Bergmann S, Mooser V, Chaturvedi N, Frayling TM, Islam M, Jafar TH, Erdmann J, Kulkarni SR, Bornstein SR, Grässler J, Groop L, Voight BF, Kettunen J, Howard P, Taylor A, Guarrera S, Ricceri F, Emilsson V, Plump A, Barroso I, Khaw KT, Weder AB, Hunt SC, Sun YV, Bergman RN, Collins FS, Bonnycastle LL, Scott LJ, Stringham HM, Peltonen L, Perola M, Vartiainen E, Brand SM, Staessen JA, Wang TJ, Burton PR, Soler Artigas M, Dong Y, Snieder H, Wang X, Zhu H, Lohman KK, Rudock ME, Heckbert SR, Smith NL, Wiggins KL, Doumatey A, Shriner D, Veldre G, Viigimaa M, Kinra S, Prabhakaran D, Tripathy V, Langefeld CD, Rosengren A, Thelle DS, Corsi AM, Singleton A, Forrester T, Hilton G, McKenzie CA, Salako T, Iwai N, Kita Y, Ogihara T, Ohkubo T, Okamura T, Ueshima H, Umemura S, Eyheramendy S, Meitinger T, Wichmann HE, Cho YS, Kim HL, Lee JY, Scott J, Sehmi JS, Zhang W, Hedblad B, Nilsson P, Smith GD, Wong A, Narisu N, Stančáková A, Raffel LJ, Yao J, Kathiresan S, O'Donnell CJ, Schwartz SM, Ikram MA, Longstreth WT, Mosley TH, Seshadri S, Shrine NR, Wain LV, Morken MA, Swift AJ, Laitinen J, Prokopenko I, Zitting P, Cooper JA, Humphries SE, Danesh J, Rasheed A, Goel A, Hamsten A, Watkins H, Bakker SJ, van Gilst WH, Janipalli CS, Mani KR, Yajnik CS, Hofman A, Mattace-Raso FU, Oostra BA, Demirkan A, Isaacs A, Rivadeneira F, Lakatta EG, Orru M, Scuteri A, Ala-Korpela M, Kangas AJ, Lyytikäinen LP, Soininen P, Tukiainen T, Würtz P, Ong RT, Dörr M, Kroemer HK, Völker U, Völzke H, Galan P, Hercberg S, Lathrop M, Zelenika D, Deloukas P, Mangino M, Spector TD, Zhai G, Meschia JF, Nalls MA, Sharma P, Terzic J, Kumar MV, Denniff M, Zukowska-Szczechowska E, Wagenknecht LE, Fowkes FG, Charchar FJ, Schwarz PE, Hayward C, Guo X, Rotimi C, Bots ML, Brand E, Samani NJ, Polasek O, Talmud PJ, Nyberg F, Kuh D, Laan M, Hveem K, Palmer LJ, van der Schouw YT, Casas JP, Mohlke KL, Vineis P, Raitakari O, Ganesh SK, Wong TY, Tai ES, Cooper RS, Laakso M, Rao DC, Harris TB, Morris RW, Dominiczak AF, Kivimaki M, Marmot MG, Miki T, Saleheen D, Chandak GR, Coresh J, Navis G, Salomaa V, Han BG, Zhu X, Kooner JS, Melander O, Ridker PM, Bandinelli S, Gyllensten UB, Wright AF, Wilson JF, Ferrucci L, Farrall M, Tuomilehto J, Pramstaller PP, Elosua R, Soranzo N, Sijbrands EJ, Altshuler D, Loos RJ, Shuldiner AR, Gieger C, Meneton P, Uitterlinden AG, Wareham NJ, Gudnason V, Rotter JI, Rettig R, Uda M, Strachan DP, Witteman JC, Hartikainen AL, Beckmann JS, Boerwinkle E, Vasan RS, Boehnke M, Larson MG, Järvelin MR, Psaty BM, Abecasis GR, Chakravarti A, Elliott P, van Duijn CM, Newton-Cheh C, Levy D, Caulfield MJ and Johnson T

    Blood pressure is a heritable trait influenced by several biological pathways and responsive to environmental stimuli. Over one billion people worldwide have hypertension (≥140 mm Hg systolic blood pressure or ≥90 mm Hg diastolic blood pressure). Even small increments in blood pressure are associated with an increased risk of cardiovascular events. This genome-wide association study of systolic and diastolic blood pressure, which used a multi-stage design in 200,000 individuals of European descent, identified sixteen novel loci: six of these loci contain genes previously known or suspected to regulate blood pressure (GUCY1A3-GUCY1B3, NPR3-C5orf23, ADM, FURIN-FES, GOSR2, GNAS-EDN3); the other ten provide new clues to blood pressure physiology. A genetic risk score based on 29 genome-wide significant variants was associated with hypertension, left ventricular wall thickness, stroke and coronary artery disease, but not kidney disease or kidney function. We also observed associations with blood pressure in East Asian, South Asian and African ancestry individuals. Our findings provide new insights into the genetics and biology of blood pressure, and suggest potential novel therapeutic pathways for cardiovascular disease prevention.

    Funded by: CIHR: MOP-82810, MOP172605, MOP77682; AHRQ HHS: HS06516; Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: CH/03/001, FS05/125, G0501942, PG/02/128, PG97012, PG97027, RG/07/005/23633, RG/07/008/23674, RG/08/008/25291, RG/08/013/25942, RG/08/014/24067, RG/98002, RG08/01, SP/04/002, SP/08/005/25115; Chief Scientist Office: CZB/4/276, CZB/4/710; FIC NIH HHS: R03 TW007165, TW008288, TW05596; Howard Hughes Medical Institute: 55005617; Intramural NIH HHS; Medical Research Council: G0000934, G0100222, G0400874, G0401527, G0500539, G0501942, G0600331, G0600705, G0601966, G0700931, G0701863, G0801056, G0902037, G0902313, G1000143, G19/35, G8802774, G9521010, G9521010D, MC_PC_U127527180, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U123092720, MC_U123092723, MC_U127561128, MC_U137686857, MC_UP_A100_1003; NCI NIH HHS: 5U01CA086308, P01CA055075, P01CA087969, R01 CA094143; NCRR NIH HHS: 2M01RR010284, K12RR023250, M01 RR16500, M01-RR00425, RR-024156, RR20649, U54 RR020278, UL1RR025005; NHGRI NIH HHS: HG003054, HG005581, HHSN268200782096C, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004436, U01HG004438, U01HG004446, U01HG004726, U01HG004728, U01HG004729, U01HG004735, U01HG004738; NHLBI NIH HHS: 5R01HL086694-03, 5R01HL087679-02, 5R01HL08770002, HL 54512, HL-87660, HL043851, HL080025, HL084729, HL085144, HL086718, HL087647, HL098283, HL36310, HL45508, HL53353, HL54512, N01 HC-15103, N01 HC-55222, N01 HC-95159, N01 HC-95169, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N02-HL-6-4278, R01 HL073410, R01 HL085251, R01 HL086694, R01 HL086694-03, R01 HL086694-04A1, R01 HL086694-05, R01 HL087652, R01 HL088119, R01 HL088120, R01 HL105756, R01HL056931, R01HL060894, R01HL060919, R01HL06094, R01HL061019, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, R01HL086694, R01HL087641, R01HL089650-02, R01HL59367, R37HL051021, U01 HL054466, U01 HL054466-11, U01 HL054471, U01 HL054473, U01 HL072515-06, U01 HL080295, U01 HL084756, U10 HL054512, U10HL054512; NIA NIH HHS: 1R01AG032098-01A, AG13196, N01-AG-1-2109, N01-AG-12100, N01AG6210, N01AG62101, N01AG62103, R01 AG017644-09S1, R01 AG18728; NICHD NIH HHS: N01-HD-1-3107; NIDCR NIH HHS: U01DE018903, U01DE01899; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK075787, DK078150, DK56350, R01 DK072193, R01 DK078150, R01DK058845, R01DK066574, U01 DK062418; NIEHS NIH HHS: ES10126, P30 ES010126, P30ES007033; NIGMS NIH HHS: S06GM008016-320107, S06GM008016-380111, U01 GM074518-04; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164; NINDS NIH HHS: R01 NS39987, R01 NS42733, U01 NS069208, U01 NS069208-01; PHS HHS: 263-MA-410953, 33014, HHSN268200625226C, HHSN268200782096; Wellcome Trust: 068545/Z/02, 070191/Z/03/Z, 077016/Z/05/Z, 079895, 080747/Z/06/Z, 090532

    Nature 2011;478;7367;103-9

  • Blood pressure loci identified with a gene-centric array.

    Johnson T, Gaunt TR, Newhouse SJ, Padmanabhan S, Tomaszewski M, Kumari M, Morris RW, Tzoulaki I, O'Brien ET, Poulter NR, Sever P, Shields DC, Thom S, Wannamethee SG, Whincup PH, Brown MJ, Connell JM, Dobson RJ, Howard PJ, Mein CA, Onipinla A, Shaw-Hawkins S, Zhang Y, Davey Smith G, Day IN, Lawlor DA, Goodall AH, Cardiogenics Consortium, Fowkes FG, Abecasis GR, Elliott P, Gateva V, Global BPgen Consortium, Braund PS, Burton PR, Nelson CP, Tobin MD, van der Harst P, Glorioso N, Neuvrith H, Salvi E, Staessen JA, Stucchi A, Devos N, Jeunemaitre X, Plouin PF, Tichet J, Juhanson P, Org E, Putku M, Sõber S, Veldre G, Viigimaa M, Levinsson A, Rosengren A, Thelle DS, Hastie CE, Hedner T, Lee WK, Melander O, Wahlstrand B, Hardy R, Wong A, Cooper JA, Palmen J, Chen L, Stewart AF, Wells GA, Westra HJ, Wolfs MG, Clarke R, Franzosi MG, Goel A, Hamsten A, Lathrop M, Peden JF, Seedorf U, Watkins H, Ouwehand WH, Sambrook J, Stephens J, Casas JP, Drenos F, Holmes MV, Kivimaki M, Shah S, Shah T, Talmud PJ, Whittaker J, Wallace C, Delles C, Laan M, Kuh D, Humphries SE, Nyberg F, Cusi D, Roberts R, Newton-Cheh C, Franke L, Stanton AV, Dominiczak AF, Farrall M, Hingorani AD, Samani NJ, Caulfield MJ and Munroe PB

    Clinical Pharmacology and Barts and The London Genome Centre, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK.

    Raised blood pressure (BP) is a major risk factor for cardiovascular disease. Previous studies have identified 47 distinct genetic variants robustly associated with BP, but collectively these explain only a few percent of the heritability for BP phenotypes. To find additional BP loci, we used a bespoke gene-centric array to genotype an independent discovery sample of 25,118 individuals that combined hypertensive case-control and general population samples. We followed up four SNPs associated with BP at our p < 8.56 × 10(-7) study-specific significance threshold and six suggestively associated SNPs in a further 59,349 individuals. We identified and replicated a SNP at LSP1/TNNT3, a SNP at MTHFR-NPPB independent (r(2) = 0.33) of previous reports, and replicated SNPs at AGT and ATP2B1 reported previously. An analysis of combined discovery and follow-up data identified SNPs significantly associated with BP at p < 8.56 × 10(-7) at four further loci (NPR3, HFE, NOS3, and SOX6). The high number of discoveries made with modest genotyping effort can be attributed to using a large-scale yet targeted genotyping array and to the development of a weighting scheme that maximized power when meta-analyzing results from samples ascertained with extreme phenotypes, in combination with results from nonascertained or population samples. Chromatin immunoprecipitation and transcript expression data highlight potential gene regulatory mechanisms at the MTHFR and NOS3 loci. These results provide candidates for further study to help dissect mechanisms affecting BP and highlight the utility of studying SNPs and samples that are independent of those studied previously even when the sample size is smaller than that in previous studies.

    Funded by: AHRQ HHS: HS06516; British Heart Foundation: CH/98001, FS05/125, PG/07/131/24254, PG/07/132/24256, PG/07/133/24260, PG/97012, RG/07/005/23633, RG/07/008/23674, RG/08/008, RG/08/008/25291, RG/08/013/25942, RG/2001004, SP/07/007/2367, SP/08/005/25115; Canadian Institutes of Health Research: MOP172605, MOP77682, MOP82810; Department of Health; Medical Research Council: G0100222, G0400874, G0401527, G0501942, G0701863, G0801056, G0802432, G0902037, G1000143, G19/35, G8802774, G9521010, G9521010D, MC_U106179471, MC_U123092720, MC_U123092723, MC_U137686857, MC_UP_A100_1003; NIA NIH HHS: AG13196, R01 AG017644-09S1; Wellcome Trust: 070191/Z/03/A, 070191/Z/03/Z, 076113/C/04/Z, 090532, 093078/Z/10/Z

    American journal of human genetics 2011;89;6;688-700

  • Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry.

    Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR, Whibley A, Becuwe M, Baxter SW, Ferguson L, Wilkinson PA, Salazar C, Davidson C, Clark R, Quail MA, Beasley H, Glithero R, Lloyd C, Sims S, Jones MC, Rogers J, Jiggins CD and ffrench-Constant RH

    CNRS UMR 7205, Muséum National d'Histoire Naturelle, CP50, 45 Rue Buffon, 75005 Paris, France.

    Supergenes are tight clusters of loci that facilitate the co-segregation of adaptive variation, providing integrated control of complex adaptive phenotypes. Polymorphic supergenes, in which specific combinations of traits are maintained within a single population, were first described for 'pin' and 'thrum' floral types in Primula and Fagopyrum, but classic examples are also found in insect mimicry and snail morphology. Understanding the evolutionary mechanisms that generate these co-adapted gene sets, as well as the mode of limiting the production of unfit recombinant forms, remains a substantial challenge. Here we show that individual wing-pattern morphs in the polymorphic mimetic butterfly Heliconius numata are associated with different genomic rearrangements at the supergene locus P. These rearrangements tighten the genetic linkage between at least two colour-pattern loci that are known to recombine in closely related species, with complete suppression of recombination being observed in experimental crosses across a 400-kilobase interval containing at least 18 genes. In natural populations, notable patterns of linkage disequilibrium (LD) are observed across the entire P region. The resulting divergent haplotype clades and inversion breakpoints are found in complete association with wing-pattern morphs. Our results indicate that allelic combinations at known wing-patterning loci have become locked together in a polymorphic rearrangement at the P locus, forming a supergene that acts as a simple switch between complex adaptive phenotypes found in sympatry. These findings highlight how genomic rearrangements can have a central role in the coexistence of adaptive phenotypes involving several genes acting in concert, by locally limiting recombination and gene flow.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E006191/1, BB/E008836/1, BB/H014268/1, BBE0118451; Medical Research Council: G0900740; Wellcome Trust: 079643, 098051

    Nature 2011;477;7363;203-6

  • Genetic risk prediction in complex disease.

    Jostins L and Barrett JC

    Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Cambs, UK

    Attempting to classify patients into high or low risk for disease onset or outcomes is one of the cornerstones of epidemiology. For some (but by no means all) diseases, clinically usable risk prediction can be performed using classical risk factors such as body mass index, lipid levels, smoking status, family history and, under certain circumstances, genetics (e.g. BRCA1/2 in breast cancer). The advent of genome-wide association studies (GWAS) has led to the discovery of common risk loci for the majority of common diseases. These discoveries raise the possibility of using these variants for risk prediction in a clinical setting. We discuss the different ways in which the predictive accuracy of these loci can be measured, and survey the predictive accuracy of GWAS variants for 18 common diseases. We show that predictive accuracy from genetic models varies greatly across diseases, but that the range is similar to that of non-genetic risk-prediction models. We discuss what factors drive differences in predictive accuracy, and how much value these predictions add over classical predictive tests. We also review the uses and pitfalls of idealized models of risk prediction. Finally, we look forward towards possible future clinical implementation of genetic risk prediction, and discuss realistic expectations for future utility.

    Funded by: Wellcome Trust: WT089120/Z/09/Z

    Human molecular genetics 2011;20;R2;R182-8

  • Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

    Jostins L, Morley KI and Barrett JC

    Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will do so.

    Funded by: Wellcome Trust: WT089120/Z/09/Z

    European journal of human genetics : EJHG 2011;19;6;662-6

  • Total zinc intake may modify the glucose-raising effect of a zinc transporter (SLC30A8) variant: a 14-cohort meta-analysis.

    Kanoni S, Nettleton JA, Hivert MF, Ye Z, van Rooij FJ, Shungin D, Sonestedt E, Ngwa JS, Wojczynski MK, Lemaitre RN, Gustafsson S, Anderson JS, Tanaka T, Hindy G, Saylor G, Renstrom F, Bennett AJ, van Duijn CM, Florez JC, Fox CS, Hofman A, Hoogeveen RC, Houston DK, Hu FB, Jacques PF, Johansson I, Lind L, Liu Y, McKeown N, Ordovas J, Pankow JS, Sijbrands EJ, Syvänen AC, Uitterlinden AG, Yannakoulia M, Zillikens MC, MAGIC Investigators, Wareham NJ, Prokopenko I, Bandinelli S, Forouhi NG, Cupples LA, Loos RJ, Hallmans G, Dupuis J, Langenberg C, Ferrucci L, Kritchevsky SB, McCarthy MI, Ingelsson E, Borecki IB, Witteman JC, Orho-Melander M, Siscovick DS, Meigs JB, Franks PW and Dedoussis GV

    Department of Nutrition-Dietetics, Harokopio University, Athens, Greece.

    Objective: Many genetic variants have been associated with glucose homeostasis and type 2 diabetes in genome-wide association studies. Zinc is an essential micronutrient that is important for β-cell function and glucose homeostasis. We tested the hypothesis that zinc intake could influence the glucose-raising effect of specific variants.

    Research design and methods: We conducted a 14-cohort meta-analysis to assess the interaction of 20 genetic variants known to be related to glycemic traits and zinc metabolism with dietary zinc intake (food sources) and a 5-cohort meta-analysis to assess the interaction with total zinc intake (food sources and supplements) on fasting glucose levels among individuals of European ancestry without diabetes.

    Results: We observed a significant association of total zinc intake with lower fasting glucose levels (β-coefficient ± SE per 1 mg/day of zinc intake: -0.0012 ± 0.0003 mmol/L, summary P value = 0.0003), while the association of dietary zinc intake was not significant. We identified a nominally significant interaction between total zinc intake and the SLC30A8 rs11558471 variant on fasting glucose levels (β-coefficient ± SE per A allele for 1 mg/day of greater total zinc intake: -0.0017 ± 0.0006 mmol/L, summary interaction P value = 0.005); this result suggests a stronger inverse association between total zinc intake and fasting glucose in individuals carrying the glucose-raising A allele compared with individuals who do not carry it. None of the other interaction tests were statistically significant.

    Conclusions: Our results suggest that higher total zinc intake may attenuate the glucose-raising effect of the rs11558471 SLC30A8 (zinc transporter) variant. Our findings also support evidence for the association of higher total zinc intake with lower fasting glucose levels.

    Funded by: Medical Research Council: G0701863, MC_U106179471, MC_U106188470, MC_UP_A100_1003; NHLBI NIH HHS: R01 HL087700; Wellcome Trust: 090532

    Diabetes 2011;60;9;2407-16

  • In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma.

    Karreth FA, Tay Y, Perna D, Ala U, Tan SM, Rust AG, DeNicola G, Webster KA, Weiss D, Perez-Mancera PA, Krauthammer M, Halaban R, Provero P, Adams DJ, Tuveson DA and Pandolfi PP

    Cancer Genetics Program, Division of Genetics, Beth Israel Deaconess Cancer Center, Department of Medicine and Pathology, Harvard Medical School, Boston, MA 02215, USA.

    We recently proposed that competitive endogenous RNAs (ceRNAs) sequester microRNAs to regulate mRNA transcripts containing common microRNA recognition elements (MREs). However, the functional role of ceRNAs in cancer remains unknown. Loss of PTEN, a tumor suppressor regulated by ceRNA activity, frequently occurs in melanoma. Here, we report the discovery of significant enrichment of putative PTEN ceRNAs among genes whose loss accelerates tumorigenesis following Sleeping Beauty insertional mutagenesis in a mouse model of melanoma. We validated several putative PTEN ceRNAs and further characterized one, the ZEB2 transcript. We show that ZEB2 modulates PTEN protein levels in a microRNA-dependent, protein coding-independent manner. Attenuation of ZEB2 expression activates the PI3K/AKT pathway, enhances cell transformation, and commonly occurs in human melanomas and other cancers expressing low PTEN levels. Our study genetically identifies multiple putative microRNA decoys for PTEN, validates ZEB2 mRNA as a bona fide PTEN ceRNA, and demonstrates that abrogated ZEB2 expression cooperates with BRAF(V600E) to promote melanomagenesis.

    Funded by: Cancer Research UK; NCI NIH HHS: 1P50 CA121974, P50 CA121974, P50 CA121974-01, R01 CA-82328-09, R01 CA082328, R01 CA082328-09; NCRR NIH HHS: UL1 RR025758, UL1 RR025758-04; Wellcome Trust

    Cell 2011;147;2;382-95

  • Phylogenetic analysis of murine leukemia virus sequences from longitudinally sampled chronic fatigue syndrome patients suggests PCR contamination rather than viral evolution.

    Katzourakis A, Hué S, Kellam P and Towers GJ

    Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom.

    Xenotropic murine leukemia virus (MLV)-related virus (XMRV) has been amplified from human prostate cancer and chronic fatigue syndrome (CFS) patient samples. Other studies failed to replicate these findings and suggested PCR contamination with a prostate cancer cell line, 22Rv1, as a likely source. MLV-like sequences have also been detected in CFS patients in longitudinal samples 15 years apart. Here, we tested whether sequence data from these samples are consistent with viral evolution. Our phylogenetic analyses strongly reject a model of within-patient evolution and demonstrate that the sequences from the first and second time points represent distinct endogenous murine retroviruses, suggesting contamination.

    Funded by: Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940

    Journal of virology 2011;85;20;10909-13

  • Mouse genomic variation and its effect on phenotypes and gene regulation.

    Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, Furlotte NA, Eskin E, Nellåker C, Whitley H, Cleak J, Janowitz D, Hernandez-Pliego P, Edwards A, Belgard TG, Oliver PL, McIntyre RE, Bhomra A, Nicod J, Gan X, Yuan W, van der Weyden L, Steward CA, Bala S, Stalker J, Mott R, Durbin R, Jackson IJ, Czechanski A, Guerra-Assunção JA, Donahue LR, Reinholdt LG, Payseur BA, Ponting CP, Birney E, Flint J and Adams DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F022697/1; Cancer Research UK: A6997; Medical Research Council: G0800024, MC_EX_G0802457, MC_U127561112, MC_U137761446; NHGRI NIH HHS: T32 HG002536; NHLBI NIH HHS: K25 HL080079; NIAID NIH HHS: N01AI15416; NLM NIH HHS: 2T15LM007359, T15 LM007359; Wellcome Trust: 077192, 079912, 082356, 083573, 083573/Z/07/Z, 085906, 085906/Z/08/Z, 090532

    Nature 2011;477;7364;289-94

  • Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.

    Kikuchi T, Cotton JA, Dalzell JJ, Hasegawa K, Kanzaki N, McVeigh P, Takanashi T, Tsai IJ, Assefa SA, Cock PJ, Otto TD, Hunt M, Reid AJ, Sanchez-Flores A, Tsuchihara K, Yokoi T, Larsson MC, Miwa J, Maule AG, Sahashi N, Jones JT and Berriman M

    Forestry and Forest Products Research Institute, Tsukuba, Japan.

    Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data are available. As well as being an important pathogen, B. xylophilus thus provides, through its genome, a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology, which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes, and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    PLoS pathogens 2011;7;9;e1002219

  • Genetic variation near IRS1 associates with reduced adiposity and an impaired metabolic profile.

    Kilpeläinen TO, Zillikens MC, Stančákova A, Finucane FM, Ried JS, Langenberg C, Zhang W, Beckmann JS, Luan J, Vandenput L, Styrkarsdottir U, Zhou Y, Smith AV, Zhao JH, Amin N, Vedantam S, Shin SY, Haritunians T, Fu M, Feitosa MF, Kumari M, Halldorsson BV, Tikkanen E, Mangino M, Hayward C, Song C, Arnold AM, Aulchenko YS, Oostra BA, Campbell H, Cupples LA, Davis KE, Döring A, Eiriksdottir G, Estrada K, Fernández-Real JM, Garcia M, Gieger C, Glazer NL, Guiducci C, Hofman A, Humphries SE, Isomaa B, Jacobs LC, Jula A, Karasik D, Karlsson MK, Khaw KT, Kim LJ, Kivimäki M, Klopp N, Kühnel B, Kuusisto J, Liu Y, Ljunggren O, Lorentzon M, Luben RN, McKnight B, Mellström D, Mitchell BD, Mooser V, Moreno JM, Männistö S, O'Connell JR, Pascoe L, Peltonen L, Peral B, Perola M, Psaty BM, Salomaa V, Savage DB, Semple RK, Skaric-Juric T, Sigurdsson G, Song KS, Spector TD, Syvänen AC, Talmud PJ, Thorleifsson G, Thorsteinsdottir U, Uitterlinden AG, van Duijn CM, Vidal-Puig A, Wild SH, Wright AF, Clegg DJ, Schadt E, Wilson JF, Rudan I, Ripatti S, Borecki IB, Shuldiner AR, Ingelsson E, Jansson JO, Kaplan RC, Gudnason V, Harris TB, Groop L, Kiel DP, Rivadeneira F, Walker M, Barroso I, Vollenweider P, Waeber G, Chambers JC, Kooner JS, Soranzo N, Hirschhorn JN, Stefansson K, Wichmann HE, Ohlsson C, O'Rahilly S, Wareham NJ, Speliotes EK, Fox CS, Laakso M and Loos RJ

    Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Cambridge, UK.

    Genome-wide association studies have identified 32 loci influencing body mass index, but this measure does not distinguish lean from fat mass. To identify adiposity loci, we meta-analyzed associations between ∼2.5 million SNPs and body fat percentage from 36,626 individuals and followed up the 14 most significant (P < 10(-6)) independent loci in 39,576 individuals. We confirmed a previously established adiposity locus in FTO (P = 3 × 10(-26)) and identified two new loci associated with body fat percentage, one near IRS1 (P = 4 × 10(-11)) and one near SPRY2 (P = 3 × 10(-8)). Both loci contain genes with potential links to adipocyte physiology. Notably, the body-fat-decreasing allele near IRS1 is associated with decreased IRS1 expression and with an impaired metabolic profile, including an increased visceral to subcutaneous fat ratio, insulin resistance, dyslipidemia, risk of diabetes and coronary artery disease and decreased adiponectin levels. Our findings provide new insights into adiposity and insulin resistance.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: PG/07/133/24260, RG/07/008/23674, RG/08/008, RG/08/008/25291, SP/04/002, SP/07/007/23671; Cancer Research UK; Chief Scientist Office: CZB/4/710; Department of Health; Medical Research Council: G0100222, G0401527, G0601966, G0700931, G0701863, G0802051, G0902037, G1000143, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_U127561128; NCRR NIH HHS: M01 RR 16500, M01 RR000425, M01 RR000425-36, M01 RR016500, M01 RR016500-04, M01-RR00425; NHLBI NIH HHS: N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085086, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55222, N01-HC75150, N01-HC85079-86, N01HC25195, N01HC55222, N01HC75150, N01HC85079, N01HC85086, N02 HL64278, R01 HL036310, R01 HL087652, R01 HL087652-03, R01 HL087700, R01 HL087700-03, R01 HL088119, R01 HL088119-04, R01 HL117078, R01-HL036310-20A2, R01-HL087652, R01-HL08770003, R01-HL088119, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084756, U01 HL084756-03, U01-HL080295, U01-HL72515, U01-HL84756; NIA NIH HHS: AG13196, N01 AG062101, N01 AG062103, N01 AG062106, N01-AG12100, N01AG12100, N1AG62101A, N1AG62103A, N1AG62106A, R01 AG013196, R01 AG018728, R01 AG018728-05S1, R01 AG031890, R01 AG032098, R01 AG032098-01A1, R01-AG031890-01, R01-AG032098-01A1, R01-AG18728, R01-AR/AG41398, R37 AG013196; NIAMS NIH HHS: R01 AR041398, R01 AR041398-19, R01 AR046838, R01 AR046838-05, R01-AR046838; NIDDK NIH HHS: DK063491, K23 DK080145, K23 DK080145-05, K23-DK080145, P30 DK063491, P30 DK063491-03, P30 DK072488, P30 DK072488-04S1, P30-DK072488, R01 DK068336, R01 DK068336-03, R01 DK075681, R01 DK075681-04, R01 DK075787, R01 DK075787-05, R01 DK089256, R01-DK06833603, R01-DK07568102, R01-DK075787; Wellcome Trust: 077016/Z/05/Z, 084723/Z/08/Z, 091551, 091746/Z/10/Z, 095515

    Nature genetics 2011;43;8;753-60

  • De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia.

    Kirov G, Pocklington AJ, Holmans P, Ivanov D, Ikeda M, Ruderfer D, Moran J, Chambert K, Toncheva D, Georgieva L, Grozeva D, Fjodorova M, Wollerton R, Rees E, Nikolov I, van de Lagemaat LN, Bayés A, Fernandez E, Olason PI, Böttcher Y, Komiyama NH, Collins MO, Choudhary J, Stefansson K, Stefansson H, Grant SG, Purcell S, Sklar P, O'Donovan MC and Owen MJ

    Department of Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK.

    A small number of rare, recurrent genomic copy number variants (CNVs) are known to substantially increase susceptibility to schizophrenia. As a consequence of the low fecundity in people with schizophrenia and other neurodevelopmental phenotypes to which these CNVs contribute, CNVs with large effects on risk are likely to be rapidly removed from the population by natural selection. Accordingly, such CNVs must frequently occur as recurrent de novo mutations. In a sample of 662 schizophrenia proband-parent trios, we found that rare de novo CNV mutations were significantly more frequent in cases (5.1% all cases, 5.5% family history negative) compared with 2.2% among 2623 controls, confirming the involvement of de novo CNVs in the pathogenesis of schizophrenia. Eight de novo CNVs occurred at four known schizophrenia loci (3q29, 15q11.2, 15q13.3 and 16p11.2). De novo CNVs of known pathogenic significance in other genomic disorders were also observed, including deletion at the TAR (thrombocytopenia absent radius) region on 1q21.1 and duplication at the WBS (Williams-Beuren syndrome) region at 7q11.23. Multiple de novo CNVs spanned genes encoding members of the DLG (discs large) family of membrane-associated guanylate kinases (MAGUKs) that are components of the postsynaptic density (PSD). Two de novo CNVs also affected EHMT1, a histone methyltransferase known to directly regulate DLG family members. Using a systems biology approach and merging novel CNV and proteomics data sets, systematic analysis of synaptic protein complexes showed that, compared with control CNVs, case de novo CNVs were significantly enriched for the PSD proteome (P=1.72 × 10⁻⁶). This was largely explained by enrichment for members of the N-methyl-D-aspartate receptor (NMDAR) (P=4.24 × 10⁻⁶) and neuronal activity-regulated cytoskeleton-associated protein (ARC) (P=3.78 × 10⁻⁸) postsynaptic signalling complexes. In an analysis of 18 492 subjects (7907 cases and 10 585 controls), case CNVs were enriched for members of the NMDAR complex (P=0.0015) but not ARC (P=0.14). Our data indicate that defects in NMDAR postsynaptic signalling and, possibly, ARC complexes, which are known to be important in synaptic plasticity and cognition, play a significant role in the pathogenesis of schizophrenia.

    Funded by: Medical Research Council: G0800509; NIMH NIH HHS: MH066392-05A1, P50 MH066392

    Molecular psychiatry 2011;17;2;142-53

  • Glyburide is anti-inflammatory and associated with reduced mortality in melioidosis.

    Koh GC, Maude RR, Schreiber MF, Limmathurotsakul D, Wiersinga WJ, Wuthiekanun V, Lee SJ, Mahavanakul W, Chaowagul W, Chierakul W, White NJ, van der Poll T, Day NP, Dougan G and Peacock SJ

    Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.

    Background: Patients with diabetes mellitus are more prone to bacterial sepsis, but there are conflicting data on whether outcomes are worse in diabetics after presentation with sepsis. Glyburide is an oral hypoglycemic agent used to treat diabetes mellitus. This K(ATP)-channel blocker and broad-spectrum ATP-binding cassette (ABC) transporter inhibitor has broad-ranging effects on the immune system, including inhibition of inflammasome assembly, and would be predicted to influence the host response to infection.

    Methods: We studied a cohort of 1160 patients with gram-negative sepsis caused by a single pathogen (Burkholderia pseudomallei), 410 (35%) of whom were known to have diabetes. We then prospectively studied diabetic patients with B. pseudomallei infection (n = 20) to compare the gene expression profiles of peripheral whole blood leukocytes in patients taking glyburide with those of patients not taking any sulfonylurea.

    Results: Survival was greater in diabetics than in nondiabetics (mortality 38% vs 45%, respectively, P = .04), but the survival benefit was confined to the patient group taking glyburide (adjusted odds ratio .47, 95% confidence interval .28-.74, P = .005). We identified differential expression of 63 immune-related genes (P = .001) in patients taking glyburide, the sum effect of which we predict to be anti-inflammatory in the glyburide group.

    Conclusions: We present observational evidence for a glyburide-associated benefit during human melioidosis and correlate this with an anti-inflammatory effect of glyburide on the immune system.

    Funded by: Wellcome Trust: 093956

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2011;52;6;717-25

  • Diabetes does not influence activation of coagulation, fibrinolysis or anticoagulant pathways in Gram-negative sepsis (melioidosis).

    Koh GC, Meijers JC, Maude RR, Limmathurotsakul D, Day NP, Peacock SJ, van der Poll T and Wiersinga WJ

    Center for Experimental and Molecular Medicine, Department of Infectious Diseases, Tropical Medicine & AIDS, Academic Medical Center, Amsterdam, The Netherlands.

    Diabetes is associated with a disturbance of the haemostatic balance and is an important risk factor for sepsis, but the influence of diabetes on the pathogenesis of sepsis remains unclear. Melioidosis (Burkholderia pseudomallei infection) is a common cause of community-acquired sepsis in Southeast Asia and northern Australia. We sought to investigate the impact of pre-existing diabetes on the coagulation and fibrinolytic systems during sepsis caused by B. pseudomallei. We recruited a cohort of 44 patients (34 with diabetes and 10 without diabetes) with culture-proven melioidosis. Diabetes was defined as a pre-admission diagnosis of diabetes or an HbA₁c>7.8% at enrolment. Thirty healthy blood donors and 52 otherwise healthy diabetes patients served as controls. Citrated plasma was collected from all subjects; additionally, in melioidosis patients follow-up specimens were collected seven and ≥ 28 days after enrolment where possible. Relative to uninfected healthy controls, diabetes per se (i.e. in the absence of infection) was characterised by a procoagulant effect. Melioidosis was associated with activation of coagulation (thrombin-antithrombin complexes (TAT), prothrombin fragment F₁+₂ and fibrinogen concentrations were elevated; PT and PTT prolonged), suppression of anti-coagulation (antithrombin, protein C, total and free protein S levels were depressed) and abnormalities of fibrinolysis (D-dimer and plasmin-antiplasmin complex [PAP] were elevated). Remarkably, none of these haemostatic alterations were influenced by pre-existing diabetes. In conclusion, although diabetes is associated with multiple abnormalities of coagulation, anticoagulation and fibrinolysis, these changes are not detectable when superimposed on the background of larger abnormalities attributable to B. pseudomallei sepsis.

    Funded by: Wellcome Trust

    Thrombosis and haemostasis 2011;106;6;1139-48

  • Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci.

    Kooner JS, Saleheen D, Sim X, Sehmi J, Zhang W, Frossard P, Been LF, Chia KS, Dimas AS, Hassanali N, Jafar T, Jowett JB, Li X, Radha V, Rees SD, Takeuchi F, Young R, Aung T, Basit A, Chidambaram M, Das D, Grundberg E, Hedman AK, Hydrie ZI, Islam M, Khor CC, Kowlessur S, Kristensen MM, Liju S, Lim WY, Matthews DR, Liu J, Morris AP, Nica AC, Pinidiyapathirage JM, Prokopenko I, Rasheed A, Samuel M, Shah N, Shera AS, Small KS, Suo C, Wickremasinghe AR, Wong TY, Yang M, Zhang F, DIAGRAM, MuTHER, Abecasis GR, Barnett AH, Caulfield M, Deloukas P, Frayling TM, Froguel P, Kato N, Katulanda P, Kelly MA, Liang J, Mohan V, Sanghera DK, Scott J, Seielstad M, Zimmet PZ, Elliott P, Teo YY, McCarthy MI, Danesh J, Tai ES and Chambers JC

    National Heart and Lung Institute (NHLI), Imperial College London, Hammersmith Hospital, London, UK.

    We carried out a genome-wide association study of type-2 diabetes (T2D) in individuals of South Asian ancestry. Our discovery set included 5,561 individuals with T2D (cases) and 14,458 controls drawn from studies in London, Pakistan and Singapore. We identified 20 independent SNPs associated with T2D at P < 10(-4) for testing in a replication sample of 13,170 cases and 25,398 controls, also all of South Asian ancestry. In the combined analysis, we identified common genetic variants at six loci (GRB14, ST6GAL1, VPS26A, HMG20A, AP3S2 and HNF4A) newly associated with T2D (P = 4.1 × 10(-8) to P = 1.9 × 10(-11)). SNPs at GRB14 were also associated with insulin sensitivity (P = 5.0 × 10(-4)), and SNPs at ST6GAL1 and HNF4A were also associated with pancreatic beta-cell function (P = 0.02 and P = 0.001, respectively). Our findings provide additional insight into mechanisms underlying T2D and show the potential for new discovery from genetic association studies in South Asians, a population with increased susceptibility to T2D.

    Funded by: British Heart Foundation: SP/04/002; FIC NIH HHS: KO1TW006087; Medical Research Council: G0700931; NIDDK NIH HHS: DK-25446, R01DK082766; Wellcome Trust: 070854/Z/03/Z, 080747/Z/06/Z, 083270/Z/07/Z, 084723/Z/08/Z

    Nature genetics 2011;43;10;984-9

  • High-throughput semiquantitative analysis of insertional mutations in heterogeneous tumors.

    Koudijs MJ, Klijn C, van der Weyden L, Kool J, ten Hoeve J, Sie D, Prasetyanti PR, Schut E, Kas S, Whipp T, Cuppen E, Wessels L, Adams DJ and Jonkers J

    Division of Molecular Biology and Cancer Systems Biology Center, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands.

    Retroviral and transposon-based insertional mutagenesis (IM) screens are widely used for cancer gene discovery in mice. Exploiting the full potential of IM screens requires methods for high-throughput sequencing and mapping of transposon and retroviral insertion sites. Current protocols are based on ligation-mediated PCR amplification of junction fragments from restriction endonuclease-digested genomic DNA, resulting in amplification biases due to uneven genomic distribution of restriction enzyme recognition sites. Consequently, sequence coverage cannot be used to assess the clonality of individual insertions. We have developed a novel method, called shear-splink, for the semiquantitative high-throughput analysis of insertional mutations. Shear-splink employs random fragmentation of genomic DNA, which reduces unwanted amplification biases. Additionally, shear-splink enables us to assess clonality of individual insertions by determining the number of unique ligation points (LPs) between the adapter and genomic DNA. This parameter serves as a semiquantitative measure of the relative clonality of individual insertions within heterogeneous tumors. Mixing experiments with clonal cell lines derived from mouse mammary tumor virus (MMTV)-induced tumors showed that shear-splink enables the semiquantitative assessment of the clonality of MMTV insertions. Further, shear-splink analysis of 16 MMTV- and 127 Sleeping Beauty (SB)-induced tumors showed enrichment for cancer-relevant insertions by exclusion of irrelevant background insertions marked by single LPs, thereby facilitating the discovery of candidate cancer genes. To make full use of the shear-splink method, we set up the Insertional Mutagenesis Database (iMDB), a publicly available web-based application for analyzing both retroviral- and transposon-based insertional mutagenesis data.

    Funded by: Cancer Research UK; Wellcome Trust; Worldwide Cancer Research: 07-0585

    Genome research 2011;21;12;2181-9
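    The clonality measure described above (counting unique ligation points per insertion and discarding insertions supported by a single LP) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline or the iMDB code: the (insertion_site, ligation_point) read representation, the function names and the min_lps=2 cut-off are assumptions chosen to mirror the abstract's wording. Because PCR duplicates share the same shear point, repeated reads with an identical LP are counted only once.

```python
from collections import defaultdict


def count_ligation_points(reads):
    """Map each insertion site to its number of distinct ligation points (LPs).

    `reads` is an iterable of (insertion_site, ligation_point) pairs, a
    hypothetical simplified form of mapped junction reads. Duplicate reads
    with the same LP (e.g. PCR duplicates) collapse into one LP.
    """
    lps = defaultdict(set)
    for site, lp in reads:
        lps[site].add(lp)
    return {site: len(points) for site, points in lps.items()}


def filter_background(lp_counts, min_lps=2):
    """Drop insertions supported by fewer than `min_lps` unique LPs.

    min_lps=2 is an assumption matching "background insertions marked by
    single LPs"; it is not a documented parameter of shear-splink.
    """
    return {site: n for site, n in lp_counts.items() if n >= min_lps}


def relative_clonality(lp_counts):
    """Express each insertion's LP count relative to the largest in the tumour."""
    top = max(lp_counts.values())
    return {site: n / top for site, n in lp_counts.items()}
```

    For example, the reads ("chr1:1000+", 5000), ("chr1:1000+", 5000), ("chr1:1000+", 5120), ("chr1:1000+", 5300) and ("chr2:42-", 800) twice give LP counts of 3 and 1; filtering then keeps only the chr1 insertion.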

  • FoSTeS, MMBIR and NAHR at the human proximal Xp region and the mechanisms of human Xq isochromosome formation.

    Koumbaris G, Hatzisevastou-Loukidou H, Alexandrou A, Ioannides M, Christodoulou C, Fitzgerald T, Rajan D, Clayton S, Kitsiou-Tzeli S, Vermeesch JR, Skordis N, Antoniou P, Kurg A, Georgiou I, Carter NP and Patsalis PC

    Department of Medical Genetics, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands.

    The recently described DNA replication-based mechanisms of fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced replication (MMBIR) were previously shown to catalyze complex exonic, genic and genomic rearrangements. By analyzing a large number of isochromosomes of the long arm of chromosome X (i(Xq)), using whole-genome tiling path array comparative genomic hybridization (aCGH), ultra-high resolution targeted aCGH and sequencing, we provide evidence that the FoSTeS and MMBIR mechanisms can generate large-scale gross chromosomal rearrangements leading to the deletion and duplication of entire chromosome arms, thus suggesting an important role for DNA replication-based mechanisms in both the development of genomic disorders and cancer. Furthermore, we elucidate the mechanisms of dicentric i(Xq) (idic(Xq)) formation and show that most idic(Xq) chromosomes result from non-allelic homologous recombination between palindromic low copy repeats and highly homologous palindromic LINE elements. We also show that non-recurrent-breakpoint idic(Xq) chromosomes have microhomology-associated breakpoint junctions and are likely catalyzed by microhomology-mediated replication-dependent recombination mechanisms such as FoSTeS and MMBIR. Finally, we stress the role of the proximal Xp region as a chromosomal rearrangement hotspot.

    Funded by: Wellcome Trust: 077008

    Human molecular genetics 2011;20;10;1925-36

  • 96-plex molecular barcoding for the Illumina Genome Analyzer.

    Kozarewa I and Turner DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Next-generation sequencing technologies have massive throughput, which dramatically reduces the cost of sequencing per gigabase compared with standard Sanger sequencing. To make the most efficient use of this throughput when sequencing small regions or genomes, we developed a barcoding method that allows multiplexing of 96 or more samples per lane. The method employs 8 bp tags incorporated into each sequencing library during the library preparation enrichment polymerase chain reaction (PCR); the bar-coded libraries are pooled in equimolar ratios based on quantitative PCR and sequenced using the three-read Illumina method.

    Methods in molecular biology (Clifton, N.J.) 2011;733;279-98
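    After a multiplexed run, each read has to be assigned back to its library by its 8 bp tag. The sketch below shows one simple way to do that, assuming the tag for each read has already been extracted (in the three-read method the tag is sequenced as its own read). The barcode sequences, the Hamming-distance matching and the one-mismatch tolerance are illustrative assumptions, not part of the published protocol; reads that match no barcode, or more than one, are left unassigned.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(tag_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode uniquely matches `tag_read`, else None.

    `barcodes` maps hypothetical sample names to 8 bp tag sequences.
    """
    hits = [name for name, bc in barcodes.items()
            if len(bc) == len(tag_read)
            and hamming(bc, tag_read) <= max_mismatches]
    # Ambiguous (several hits) or unmatched (no hits) tags are not assigned.
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group (read_id, tag_read) pairs by sample; unassigned reads go under None."""
    groups = {}
    for read_id, tag in reads:
        sample = assign_sample(tag, barcodes, max_mismatches)
        groups.setdefault(sample, []).append(read_id)
    return groups
```

    Rejecting tags within tolerance of two barcodes keeps mis-assignment low at the cost of discarding a few reads; well-separated tag sets make such ties rare.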

  • Amplification-free library preparation for paired-end Illumina sequencing.

    Kozarewa I and Turner DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    The library preparation step is of critical importance for the quality of next-generation sequencing data. The use of the polymerase chain reaction (PCR) as a part of the standard Illumina library preparation protocol causes an appreciable proportion of the obtained sequences to be duplicates, making the sequencing run less efficient. Also, amplification introduces biases, particularly for genomes with high or low GC content, which reduces the complexity of the resulting library. To overcome these difficulties, we developed an amplification-free library preparation. By the use of custom adapters, unamplified, ligated samples can hybridize directly to the oligonucleotides on the flowcell surface.

    Methods in molecular biology (Clifton, N.J.) 2011;733;257-66

  • miR-96 regulates the progression of differentiation in mammalian cochlear inner and outer hair cells.

    Kuhn S, Johnson SL, Furness DN, Chen J, Ingham N, Hilton JM, Steffes G, Lewis MA, Zampini V, Hackney CM, Masetto S, Holley MC, Steel KP and Marcotti W

    Department of Biomedical Science, University of Sheffield, Sheffield S10 2TN, United Kingdom.

    MicroRNAs (miRNAs) are small noncoding RNAs able to regulate a broad range of protein-coding genes involved in many biological processes. miR-96 is a sensory organ-specific miRNA expressed in the mammalian cochlea during development. Mutations in miR-96 cause nonsyndromic progressive hearing loss in humans and mice. The mouse mutant diminuendo has a single base change in the seed region of the Mir96 gene leading to widespread changes in the expression of many genes. We have used this mutant to explore the role of miR-96 in the maturation of the auditory organ. We found that the physiological development of mutant sensory hair cells is arrested at around the day of birth, before their biophysical differentiation into inner and outer hair cells. Moreover, maturation of the hair cell stereocilia bundle and remodelling of auditory nerve connections within the cochlea fail to occur in miR-96 mutants. We conclude that miR-96 regulates the progression of the physiological and morphological differentiation of cochlear hair cells and, as such, coordinates one of the most distinctive functional refinements of the mammalian auditory system.

    Funded by: Action on Hearing Loss: G41; Medical Research Council: G0300212; Wellcome Trust: 077189, 088719

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;6;2355-60

  • Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.

    Laing R, Hunt M, Protasio AV, Saunders G, Mungall K, Laing S, Jackson F, Quail M, Beech R, Berriman M and Gilleard JS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, make their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. Of the annotated genes, 97.5% had detectable homologues in C. elegans, of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be lower in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E018130/1; Wellcome Trust: WT 085775/Z/08/Z

    PloS one 2011;6;8;e23216

  • Q8IYL2 is a candidate gene for the familial epilepsy syndrome of Partial Epilepsy with Pericentral Spikes (PEPS).

    Leschziner GD, Coffey AJ, Andrew T, Gregorio SP, Dias-Neto E, Calafato M, Bentley DR, Kinton L, Sander JW and Johnson MR

    Division of Neuroscience, Imperial College London, UK; Wellcome Trust Sanger Institute, Cambridge, UK.

    Purpose: Partial Epilepsy with Pericentral Spikes (PEPS) is a novel Mendelian idiopathic epilepsy with evidence of linkage to Chromosome 4p15. Our aim was to identify the causative mutation in this epilepsy syndrome.

    Methods: We re-annotated all 42 genes in the linked chromosomal region and sequenced all genes within the linked interval. All exons, intron-exon boundaries and untranslated regions were sequenced in the original pedigree, and novel changes segregating correctly were subjected to bioinformatic analysis. Quantitative polymerase chain reaction was performed to test for potential copy number variation (CNV).

    Results: Twenty-nine previously undescribed variants correctly segregating with the linked haplotype were identified. Bioinformatic analysis demonstrated that six variants were non-synonymous coding sequence polymorphisms, one of which, in Q8IYL2 (Gly400Ala), was found neither in Caucasian (n=243) or ancestry-matched Brazilian (n=180) control samples nor in subjects from the 1000 Genomes Project. No gene duplications or deletions were identified in the linked region.

    Discussion: After exhaustive resequencing and bioinformatic analysis, we postulate that Q8IYL2 is a causative gene for PEPS. The function of this gene is unknown, but it is expressed in brain tissue.

    Epilepsy research 2011;96;1-2;109-15

  • Inference of human population history from individual whole-genome sequences.

    Li H and Durbin R

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.

    Funded by: Wellcome Trust: 077192

    Nature 2011;475;7357;493-6

  • Mobilization of giant piggyBac transposons in the mouse genome.

    Li MA, Turner DJ, Ning Z, Yusa K, Liang Q, Eckert S, Rad L, Fitzgerald TW, Craig NL and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.

    The development of technologies that allow the stable delivery of large genomic DNA fragments in mammalian systems is important for genetic studies as well as for applications in gene therapy. DNA transposons have emerged as flexible and efficient molecular vehicles to mediate stable cargo transfer. However, the ability to carry DNA fragments >10 kb is limited in most DNA transposons. Here, we show that the DNA transposon piggyBac can mobilize 100-kb DNA fragments in mouse embryonic stem (ES) cells, making it the only known transposon with such a large cargo capacity. The integrity of the cargo is maintained during transposition, the copy number can be controlled and the inserted giant transposons express the genomic cargo. Furthermore, these 100-kb transposons can also be excised from the genome without leaving a footprint. The development of piggyBac as a large cargo vector will facilitate a wider range of genetic and genomic applications.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: WT077187

    Nucleic acids research 2011;39;22;e148

  • Zebrafish Fukutin family proteins link the unfolded protein response with dystroglycanopathies.

    Lin YY, White RJ, Torelli S, Cirak S, Muntoni F and Stemple DL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Allelic mutations in putative glycosyltransferase genes, fukutin and fukutin-related protein (fkrp), lead to a wide range of muscular dystrophies associated with hypoglycosylation of α-dystroglycan, commonly referred to as dystroglycanopathies. Defective glycosylation affecting dystroglycan-ligand interactions is considered to underlie the disease pathogenesis. We have modelled dystroglycanopathies in zebrafish using a novel loss-of-function dystroglycan allele and by inhibition of Fukutin family protein activities. We show that muscle pathology in embryos lacking Fukutin or FKRP is different from loss of dystroglycan. In addition to hypoglycosylated α-dystroglycan, knockdown of Fukutin or FKRP leads to a notochord defect and a perturbation of laminin expression before muscle degeneration. These are a consequence of endoplasmic reticulum stress and activation of the unfolded protein response (UPR), preceding loss of dystroglycan-ligand interactions. Together, our results suggest that Fukutin family proteins may play important roles in protein secretion and that the UPR may contribute to the phenotypic spectrum of some dystroglycanopathies in humans.

    Funded by: Medical Research Council: G0601943; Wellcome Trust: 077037/Z/05/Z, 077047/Z/05/Z

    Human molecular genetics 2011;20;9;1763-75

  • Stella-Cre mice are highly efficient Cre deleters.

    Liu H, Wang W, Chew SK, Lee SC, Li J, Vassiliou GS, Green T, Futreal PA, Bradley A, Zhang S and Liu P

    College of Animal Science and Technology, Huazhong Agriculture University, Wuhan, China.

    Cre-loxP recombination is widely used for genetic manipulation of the mouse genome. Here, we report generation and characterization of a new Cre line, Stella-Cre, in which the Cre expression cassette was targeted to the 3' UTR of the Stella locus. Stella is specifically expressed in preimplantation embryos and in the germline. Cre-loxP recombination efficiency in Stella-Cre mice was investigated at several genomic loci including Rosa26, Jak2, and Npm1. At all the loci examined, we observed 100% Cre-loxP recombination efficiency in the embryos and in the germline. Thus, Stella-Cre mice serve as a very efficient deleter line.

    Funded by: Wellcome Trust

    Genesis (New York, N.Y. : 2000) 2011;49;8;689-95

  • Comparative and demographic analysis of orang-utan genomes.

    Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M, Cook L, Delehaunty KD, Fronick C, Schmidt H, Fulton LA, Fulton RS, Nelson JO, Magrini V, Pohl C, Graves TA, Markovic C, Cree A, Dinh HH, Hume J, Kovar CL, Fowler GR, Lunter G, Meader S, Heger A, Ponting CP, Marques-Bonet T, Alkan C, Chen L, Cheng Z, Kidd JM, Eichler EE, White S, Searle S, Vilella AJ, Chen Y, Flicek P, Ma J, Raney B, Suh B, Burhans R, Herrero J, Haussler D, Faria R, Fernando O, Darré F, Farré D, Gazave E, Oliva M, Navarro A, Roberto R, Capozzi O, Archidiacono N, Della Valle G, Purgato S, Rocchi M, Konkel MK, Walker JA, Ullmer B, Batzer MA, Smit AF, Hubley R, Casola C, Schrider DR, Hahn MW, Quesada V, Puente XS, Ordoñez GR, López-Otín C, Vinar T, Brejova B, Ratan A, Harris RS, Miller W, Kosiol C, Lawson HA, Taliwal V, Martins AL, Siepel A, Roychoudhury A, Ma X, Degenhardt J, Bustamante CD, Gutenkunst RN, Mailund T, Dutheil JY, Hobolth A, Schierup MH, Ryder OA, Yoshinaga Y, de Jong PJ, Weinstock GM, Rogers J, Mardis ER, Gibbs RA and Wilson RK

    The Genome Center at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, Missouri 63108, USA.

    'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than that of other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.

    Funded by: Medical Research Council: G0501331, MC_U137761446; NHGRI NIH HHS: HG002238, HG002385, R01 HG002939, U54 HG003079, U54 HG003079-08, U54 HG003273; NIA NIH HHS: P01 AG022064; NIGMS NIH HHS: R01 GM059290, R01 GM59290

    Nature 2011;469;7331;529-33

  • ATMIN is required for maintenance of genomic stability and suppression of B cell lymphoma.

    Loizou JI, Sancho R, Kanu N, Bolland DJ, Yang F, Rada C, Corcoran AE and Behrens A

    Mammalian Genetics Lab, Cancer Research UK, London Research Institute, 44, Lincoln's Inn Fields, London WC2A 3LY, UK.

    Defective V(D)J rearrangement of immunoglobulin heavy or light chain (IgH or IgL) or class switch recombination (CSR) can initiate chromosomal translocations. The DNA-damage kinase ATM is required for the suppression of chromosomal translocations but ATM regulation is incompletely understood. Here, we show that mice lacking the ATM cofactor ATMIN in B cells (ATMIN(ΔB/ΔB)) have impaired ATM signaling and develop B cell lymphomas. Notably, ATMIN(ΔB/ΔB) cells exhibited defective peripheral V(D)J rearrangement and CSR, resulting in translocations involving the Igh and Igl loci, indicating that ATMIN is required for efficient repair of DNA breaks generated during somatic recombination. Thus, our results identify a role for ATMIN in regulating the maintenance of genomic stability and tumor suppression in B cells.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F012217/1, BBS/E/B/0000C163; Cancer Research UK; Medical Research Council: MC_U105178806; Wellcome Trust

    Cancer cell 2011;19;5;587-600

  • PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing.

    Long Q, Jeffares DC, Zhang Q, Ye K, Nizhynska V, Ning Z, Tyler-Smith C and Nordborg M

    Gregor Mendel Institute, Vienna, Austria.

    With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, so the subtype frequencies must be inferred. In addition, investigators may occasionally want to artificially pool samples from a large number of individuals for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for the uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap is able to accurately estimate the proportions of haplotypes with less than 2% error for 34-strain mixtures with 2X total coverage Arabidopsis thaliana whole genome polymorphism data. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and user manual are freely available at

    Funded by: Wellcome Trust: 085775/Z/08/Z

    PloS one 2011;6;1;e15292
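    The abstract does not spell out PoolHap's estimator. As a rough illustration of the underlying idea (many SNPs jointly constraining the mixture of known haplotypes even at low coverage), the sketch below estimates haplotype proportions by expectation-maximization over individual reads. The EM formulation, the one-SNP-per-read simplification, the symmetric error model and the function name are all assumptions for illustration; this is not PoolHap's actual algorithm or interface.

```python
def estimate_haplotype_frequencies(haplotypes, reads, error_rate=0.01,
                                   max_iter=1000, tol=1e-10):
    """EM estimate of the proportions of known haplotypes in a pooled sample.

    haplotypes: equal-length allele strings, one character per SNP (e.g. "0110").
    reads: list of (snp_index, allele_char) pairs; each read is assumed to
           report a single allele at a single SNP (a simplification).
    error_rate: assumed probability that a read shows a non-matching allele;
                must be > 0 so every read has non-zero likelihood.
    """
    k = len(haplotypes)
    props = [1.0 / k] * k  # start from equal proportions
    for _ in range(max_iter):
        expected = [0.0] * k
        for snp, allele in reads:
            # E-step: posterior probability that this read came from each haplotype.
            weights = [p * ((1 - error_rate) if hap[snp] == allele else error_rate)
                       for p, hap in zip(props, haplotypes)]
            total = sum(weights)
            for i, w in enumerate(weights):
                expected[i] += w / total
        # M-step: new proportion = expected share of reads from each haplotype.
        new_props = [e / len(reads) for e in expected]
        if max(abs(a - b) for a, b in zip(new_props, props)) < tol:
            return new_props
        props = new_props
    return props
```

    With the component haplotypes held fixed, the log-likelihood is concave in the proportions, so this EM converges to the global maximum; each additional SNP adds another constraint on the mixture.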

  • A large palindrome with interchromosomal gene duplications in the pericentromeric region of the D. melanogaster Y chromosome.

    Méndez-Lago M, Bergman CM, de Pablos B, Tracey A, Whitehead SL and Villasante A

    Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain.

    The non-recombining Y chromosome is expected to degenerate over evolutionary time; however, gene gain is a common feature of Y chromosomes of mammals and Drosophila. Here, we report that a large palindrome containing interchromosomal segmental duplications is located in the vicinity of the first amplicon detected in the Y chromosome of D. melanogaster. The recent appearance of such amplicons suggests that duplications to the Y chromosome, followed by the amplification of the segmental duplications, are a mechanism for the continuing evolution of Drosophila Y chromosomes.

    Funded by: Wellcome Trust

    Molecular biology and evolution 2011;28;7;1967-71

  • A research agenda for malaria eradication: basic science and enabling technologies.

    malERA Consultative Group on Basic Science and Enabling Technologies

    Today's malaria control efforts are limited by our incomplete understanding of the biology of Plasmodium and of the complex relationships between human populations and the multiple species of mosquito and parasite. Research priorities include the development of in vitro culture systems for the complete life cycle of P. falciparum and P. vivax and the development of an appropriate liver culture system to study hepatic stages. In addition, genetic technologies for the manipulation of Plasmodium need to be improved, the entire parasite metabolome needs to be characterized to identify new druggable targets, and improved information systems for monitoring the changes in epidemiology, pathology, and host-parasite-vector interactions as a result of intensified control need to be established to bridge the gap between bench, preclinical, clinical, and population-based sciences.

    Funded by: Medical Research Council: G0501670; Wellcome Trust

    PLoS medicine 2011;8;1;e1000399

  • Low-bias, strand-specific transcriptome Illumina sequencing by on-flowcell reverse transcription (FRT-seq).

    Mamanova L and Turner DJ

    The Wellcome Trust Sanger Institute, Cambridge, UK.

    The unifying feature of second-generation sequencing technologies is that single template strands are amplified clonally onto a solid surface prior to the sequencing reaction. To convert template strands into a compatible state for attachment to this surface, a multistep library preparation is required, which typically culminates in amplification by the PCR. PCR is an inherently biased process, which decreases the efficiency of data acquisition. Flowcell reverse transcription sequencing is a method of transcriptome sequencing for Illumina sequencers in which the reverse transcription reaction is performed on the flowcell by using unamplified, adapter-ligated mRNA as a template. This approach removes PCR biases and duplicates, generates strand-specific paired-end data and is highly reproducible. The procedure can be performed quickly, taking 2 d to generate clusters from mRNA.

    Funded by: Wellcome Trust: WT079643

    Nature protocols 2011;6;11;1736-47

  • APC15 drives the turnover of MCC-CDC20 to make the spindle assembly checkpoint responsive to kinetochore attachment.

    Mansfeld J, Collin P, Collins MO, Choudhary JS and Pines J

    The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK.

    Faithful chromosome segregation during mitosis depends on the spindle assembly checkpoint (SAC), which monitors kinetochore attachment to the mitotic spindle. Unattached kinetochores generate mitotic checkpoint complexes (MCCs) that bind and inhibit the anaphase-promoting complex, or cyclosome (APC/C). How the SAC proficiently inhibits the APC/C but still allows its rapid activation when the last kinetochore attaches to the spindle is important for understanding how cells maintain genomic stability. We show that the APC/C subunit APC15 is required for the turnover of the APC/C co-activator CDC20 and release of MCCs during SAC signalling but not for APC/C activity per se. In the absence of APC15, MCCs and ubiquitylated CDC20 remain 'locked' onto the APC/C, which prevents the ubiquitylation and degradation of cyclin B1 when the SAC is satisfied. We conclude that APC15 mediates the constant turnover of CDC20 and MCCs on the APC/C to allow the SAC to respond to the attachment state of kinetochores.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G001537/1; Cancer Research UK: A3211; Wellcome Trust: 079643/Z/06/Z

    Nature cell biology 2011;13;10;1234-43

  • Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis.

    March HN, Rust AG, Wright NA, ten Hoeve J, de Ridder J, Eldridge M, van der Weyden L, Berns A, Gadiot J, Uren A, Kemp R, Arends MJ, Wessels LF, Winton DJ and Adams DJ

    Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, UK.

    The evolution of colorectal cancer suggests the involvement of many genes. To identify new drivers of intestinal cancer, we performed insertional mutagenesis using the Sleeping Beauty transposon system in mice carrying germline or somatic Apc mutations. By analyzing common insertion sites (CISs) isolated from 446 tumors, we identified many hundreds of candidate cancer drivers. Comparison to human data sets suggested that 234 CIS-targeted genes are also dysregulated in human colorectal cancers. In addition, we found 183 CIS-containing genes that are candidate Wnt targets and showed that 20 CIS-containing genes are newly discovered modifiers of canonical Wnt signaling. We also identified mutations associated with a subset of tumors containing an expanded number of Paneth cells, a hallmark of deregulated Wnt signaling, and genes associated with more severe dysplasia included those encoding members of the FGF signaling cascade. Some 70 genes had co-occurrence of CIS pairs, clustering into 38 sub-networks that may regulate tumor development.

    Funded by: Cancer Research UK: 13031, A6997; Medical Research Council: MC_UP_A652_1001; Wellcome Trust

    Nature genetics 2011;43;12;1202-9

  • HLA-A*3101 and carbamazepine-induced hypersensitivity reactions in Europeans.

    McCormack M, Alfirevic A, Bourgeois S, Farrell JJ, Kasperavičiūtė D, Carrington M, Sills GJ, Marson T, Jia X, de Bakker PI, Chinthapalli K, Molokhia M, Johnson MR, O'Connor GD, Chaila E, Alhusaini S, Shianna KV, Radtke RA, Heinzen EL, Walley N, Pandolfo M, Pichler W, Park BK, Depondt C, Sisodiya SM, Goldstein DB, Deloukas P, Delanty N, Cavalleri GL and Pirmohamed M

    Molecular and Cellular Therapeutics, the Royal College of Surgeons in Ireland, Dublin, Ireland.

    Background: Carbamazepine causes various forms of hypersensitivity reactions, ranging from maculopapular exanthema to severe blistering reactions. The HLA-B*1502 allele has been shown to be strongly correlated with carbamazepine-induced Stevens-Johnson syndrome and toxic epidermal necrolysis (SJS-TEN) in the Han Chinese and other Asian populations but not in European populations.

    Methods: We performed a genomewide association study of samples obtained from 22 subjects with carbamazepine-induced hypersensitivity syndrome, 43 subjects with carbamazepine-induced maculopapular exanthema, and 3987 control subjects, all of European descent. We tested for an association between disease and HLA alleles through proxy single-nucleotide polymorphisms and imputation, confirming associations by high-resolution sequence-based HLA typing. We replicated the associations in samples from 145 subjects with carbamazepine-induced hypersensitivity reactions.

    Results: The HLA-A*3101 allele, which has a prevalence of 2 to 5% in Northern European populations, was significantly associated with the hypersensitivity syndrome (P = 3.5×10⁻⁸). An independent genomewide association study of samples from subjects with maculopapular exanthema also showed an association with the HLA-A*3101 allele (P = 1.1×10⁻⁶). Follow-up genotyping confirmed the variant as a risk factor for the hypersensitivity syndrome (odds ratio, 12.41; 95% confidence interval [CI], 1.27 to 121.03), maculopapular exanthema (odds ratio, 8.33; 95% CI, 3.59 to 19.36), and SJS-TEN (odds ratio, 25.93; 95% CI, 4.93 to 116.18).

    Conclusions: The presence of the HLA-A*3101 allele was associated with carbamazepine-induced hypersensitivity reactions among subjects of Northern European ancestry. The presence of the allele increased the risk from 5.0% to 26.0%, whereas its absence reduced the risk from 5.0% to 3.8%. (Funded by the U.K. Department of Health and others.).

    Funded by: Department of Health; Intramural NIH HHS; Medical Research Council: G0400126; PHS HHS: HHS-N261200800001E, HHSN261200800001E; Wellcome Trust: 084730

    The New England journal of medicine 2011;364;12;1134-43

  • Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis.

    Mells GF, Floyd JA, Morley KI, Cordell HJ, Franklin CS, Shin SY, Heneghan MA, Neuberger JM, Donaldson PT, Day DB, Ducker SJ, Muriithi AW, Wheater EF, Hammond CJ, Dawwas MF, UK PBC Consortium, Wellcome Trust Case Control Consortium 3, Jones DE, Peltonen L, Alexander GJ, Sandford RN and Anderson CA

    Academic Department of Medical Genetics, Cambridge University, Cambridge, UK; Department of Hepatology, Cambridge University Hospitals National Health Service (NHS) Foundation Trust, Cambridge, UK.

    In addition to the HLA locus, six genetic risk factors for primary biliary cirrhosis (PBC) have been identified in recent genome-wide association studies (GWAS). To identify additional loci, we carried out a GWAS using 1,840 cases from the UK PBC Consortium and 5,163 UK population controls as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3). We followed up 28 loci in an additional UK cohort of 620 PBC cases and 2,514 population controls. We identified 12 new susceptibility loci (at a genome-wide significance level of P < 5 × 10⁻⁸) and replicated all previously associated loci. We identified three further new loci in a meta-analysis of data from our study and previously published GWAS results. New candidate genes include STAT4, DENND1B, CD80, IL7R, CXCR5, TNFRSF1A, CLEC16A and NFKB1. This study has considerably expanded our knowledge of the genetic architecture of PBC.

    Funded by: Medical Research Council: G0500020, G0800460, G0802068; NEI NIH HHS: R01 EY018246; PHS HHS: 1R01LEY018246; Wellcome Trust: 085925/Z/08/Z, 091745, WT090355/B/09/Z, WT09355A/09/Z, WT91745/Z/10/Z

    Nature genetics 2011;43;4;329-32

  • The origins, evolution, and functional potential of alternative splicing in vertebrates.

    Mudge JM, Frankish A, Fernandez-Banet J, Alioto T, Derrien T, Howald C, Reymond A, Guigó R, Hubbard T and Harrow J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each transcript has been functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6-fold excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. Our data support a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution, and AS should be considered alongside amino acid change as a key mechanism in gene evolution.

    Funded by: NHGRI NIH HHS: 5U54HG004555, U54 HG004555; Wellcome Trust: 077198, WT077198/Z/05/Z

    Molecular biology and evolution 2011;28;10;2949-59

  • Sequencing skippy: the genome sequence of an Australian kangaroo, Macropus eugenii.

    Murchison EP and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Sequencing of the tammar wallaby (Macropus eugenii) reveals insights into genome evolution and into mammalian reproduction and development.

    Genome biology 2011;12;8;123

  • Evidence for several waves of global transmission in the seventh cholera pandemic.

    Mutreja A, Kim DW, Thomson NR, Connor TR, Lee JH, Kariuki S, Croucher NJ, Choi SY, Harris SR, Lebens M, Niyogi SK, Kim EJ, Ramamurthy T, Chun J, Wood JL, Clemens JD, Czerkinsky C, Nair GB, Holmgren J, Parkhill J and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Vibrio cholerae is a globally important pathogen that is endemic in many areas of the world and causes 3-5 million reported cases of cholera every year. Historically, there have been seven acknowledged cholera pandemics; recent outbreaks in Zimbabwe and Haiti are included in the seventh and ongoing pandemic. Only isolates in serogroup O1 (consisting of two biotypes known as 'classical' and 'El Tor') and the derivative O139 can cause epidemic cholera. It is believed that the first six cholera pandemics were caused by the classical biotype, but El Tor has subsequently spread globally and replaced the classical biotype in the current pandemic. Detailed molecular epidemiological mapping of cholera has been compromised by a reliance on sub-genomic regions such as mobile elements to infer relationships, making El Tor isolates associated with the seventh pandemic seem superficially diverse. To understand the underlying phylogeny of the lineage responsible for the current pandemic, we identified high-resolution markers (single nucleotide polymorphisms; SNPs) in 154 whole-genome sequences of globally and temporally representative V. cholerae isolates. Using this phylogeny, we show here that the seventh pandemic has spread from the Bay of Bengal in at least three independent but overlapping waves with a common ancestor in the 1950s, and identify several transcontinental transmission events. Additionally, we show how the acquisition of the SXT family of antibiotic resistance elements has shaped pandemic spread, and show that this family was first acquired at least ten years before its discovery in V. cholerae.

    Funded by: Wellcome Trust: 076962, 076964

    Nature 2011;477;7365;462-5

  • Activation of K-RAS by co-mutation of codons 19 and 20 is transforming.

    Naguib A, Wilson CH, Adams DJ and Arends MJ

    Department of Pathology, University of Cambridge, Addenbrooke's Hospital, Cambridge, CB2 0QQ, UK.

    The K-RAS oncogene is widely mutated in human cancers. Activating mutations in K-RAS give rise to constitutive signalling through the MAPK/ERK and PI3K/AKT pathways, promoting increased cell division, reduced apoptosis and transformation. The majority of activating mutations in K-RAS are located in codons 12 and 13. In a human colorectal cancer we identified a novel K-RAS co-mutation that altered codons 19 and 20, resulting in transitions at both codons (L19F/T20A) in the same allele. Using focus forming transformation assays in vitro, we showed that the L19F/T20A co-mutant K-RAS had an intermediate transforming ability that was greater than that of the individual L19F and T20A mutants, but less than that of the G12D and G12V K-RAS mutants. This demonstrated the synergistic effects of co-mutation of codons 19 and 20 and illustrated that co-mutation of these codons is functionally significant.

    Journal of molecular signaling 2011;6;2

  • The critical role of histone H2A-deubiquitinase Mysm1 in hematopoiesis and lymphocyte differentiation.

    Nijnik A, Clare S, Hale C, Raisen C, McIntyre RE, Yusa K, Everitt AR, Mottram L, Podrini C, Lucas M, Estabel J, Goulding D, Sanger Institute Microarray Facility, Sanger Mouse Genetics Project, Adams N, Ramirez-Solis R, White JK, Adams DJ, Hancock RE and Dougan G

    Wellcome Trust Genome Campus, The Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Stem cell differentiation and lineage specification depend on coordinated programs of gene expression, but our knowledge of the chromatin-modifying factors regulating these events remains incomplete. Ubiquitination of histone H2A (H2A-K119u) is a common chromatin modification associated with gene silencing, and is controlled by the ubiquitin-ligase polycomb repressor complex 1 (PRC1) and H2A-deubiquitinating enzymes (H2A-DUBs). The roles of H2A-DUBs in mammalian development, stem cells, and hematopoiesis have not been addressed. Here we characterized an H2A-DUB targeted mouse line Mysm1(tm1a/tm1a) and demonstrated defects in bone marrow (BM) hematopoiesis, resulting in lymphopenia, anemia, and thrombocytosis. Development of lymphocytes was impaired from the earliest stages of their differentiation, and there was also a depletion of erythroid cells and a defect in erythroid progenitor function. These phenotypes resulted from a cell-intrinsic requirement for Mysm1 in the BM. Importantly, Mysm1(tm1a/tm1a) hematopoietic stem cells (HSCs) were functionally impaired, and this was associated with elevated levels of reactive oxygen species, the γH2AX DNA damage marker, and p53 protein in the hematopoietic progenitors. Overall, these data establish a role for Mysm1 in the maintenance of BM stem cell function, in the control of oxidative stress and genetic stability in hematopoietic progenitors, and in the development of lymphoid and erythroid lineages.

    Funded by: Canadian Institutes of Health Research; Wellcome Trust

    Blood 2011;119;6;1370-9

  • High incidence of recurrent copy number variants in patients with isolated and syndromic Müllerian aplasia.

    Nik-Zainal S, Strick R, Storer M, Huang N, Rad R, Willatt L, Fitzgerald T, Martin V, Sandford R, Carter NP, Janecke AR, Renner SP, Oppelt PG, Oppelt P, Schulze C, Brucker S, Hurles M, Beckmann MW, Strissel PL and Shaw-Smith C

    Department of Obstetrics and Gynecology, University-Clinic Erlangen, Erlangen, Germany.

    Background: Congenital malformations involving the Müllerian ducts are observed in around 5% of infertile women. Complete aplasia of the uterus, cervix, and upper vagina, also termed Müllerian aplasia or Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome, occurs with an incidence of around 1 in 4500 female births and presents in both isolated and syndromic forms. Previous reports have suggested that a proportion of cases, especially syndromic cases, are caused by variation in copy number at different genomic loci.

    Methods: In order to obtain an overview of the contribution of copy number variation to both isolated and syndromic forms of Müllerian aplasia, copy number assays were performed in a series of 63 cases, of which 25 were syndromic and 38 isolated.

    Results: A high incidence (9/63, 14%) of recurrent copy number variants in this cohort is reported here. These comprised four cases of microdeletion at 16p11.2, an autism susceptibility locus not previously associated with Müllerian aplasia, four cases of microdeletion at 17q12, and one case of a distal 22q11.2 microdeletion. Microdeletions at 16p11.2 and 17q12 were found in 4/38 (10.5%) cases with isolated Müllerian aplasia, and at 16p11.2, 17q12 and 22q11.2 (distal) in 5/25 cases (20%) with syndromic Müllerian aplasia.

    Conclusion: The finding of microdeletion at 16p11.2 in 2/38 (5%) of isolated and 2/25 (8%) of syndromic cases suggests a significant contribution of this copy number variant alone to the pathogenesis of Müllerian aplasia. Overall, the high incidence of recurrent copy number variants in all forms of Müllerian aplasia has implications for the understanding of the aetiopathogenesis of the condition, and for genetic counselling in families affected by it.

    Funded by: Wellcome Trust: 077008, 077014, 079973

    Journal of medical genetics 2011;48;3;197-204

  • Impact of temperament on depression and anxiety symptoms and depressive disorder in a population-based birth cohort.

    Nyman E, Miettunen J, Freimer N, Joukamaa M, Mäki P, Ekelund J, Peltonen L, Järvelin MR, Veijola J and Paunio T

    Public Health Genomics Unit, Institute for Molecular Medicine Finland FIMM, University of Helsinki and National Institute for Health and Welfare, Helsinki, Finland.

    Background: The aim of this study was to characterize at the population level how innate features of temperament relate to experience of depressive mood and anxiety, and whether these symptoms have separable temperamental backgrounds.

    Methods: The study subjects were 4773 members of the population-based Northern Finland Birth Cohort 1966, a culturally and genetically homogeneous study sample. Temperament was measured at age 31 using the temperament items of the Temperament and Character Inventory and a separate Pessimism score. Depressive mood was assessed based on a previous diagnosis of depressive disorder or on symptoms of depression according to the Hopkins Symptom Checklist-25. Anxiety was assessed analogously.

    Results: High levels of Harm avoidance and Pessimism were related to both depressive mood (effect sizes d = 0.84 and d = 1.25, respectively) and depressive disorder (d = 0.68 and d = 0.68, respectively). Of the dimensions of Harm avoidance, Anticipatory worry and Fatigability had the strongest effects. Symptoms of depression and anxiety showed very similar underlying temperament patterns.

    Limitations: Although Harm avoidance and Pessimism appear to be important endophenotype candidates for depression and anxiety, their potential usefulness as endophenotypes, and whether they meet all the suggested criteria for endophenotypes, remain to be confirmed in future studies.

    Conclusions: Personality characteristics of Pessimism and Harm avoidance, in particular its dimensions Anticipatory worry and Fatigability, are strongly related to symptoms of depression and anxiety as well as to depressive disorder in this population. These temperamental features may be used as dimensional susceptibility factors in etiological studies of depression, which may aid in the development of improved clinical practice.

    Journal of affective disorders 2011;131;1-3;393-7

  • A comprehensive evaluation of potential lung function associated genes in the SpiroMeta general population sample.

    Obeidat M, Wain LV, Shrine N, Kalsheker N, Soler Artigas M, Repapi E, Burton PR, Johnson T, Ramasamy A, Zhao JH, Zhai G, Huffman JE, Vitart V, Albrecht E, Igl W, Hartikainen AL, Pouta A, Cadby G, Hui J, Palmer LJ, Hadley D, McArdle WL, Rudnicka AR, Barroso I, Loos RJ, Wareham NJ, Mangino M, Soranzo N, Spector TD, Gläser S, Homuth G, Völzke H, Deloukas P, Granell R, Henderson J, Grkovic I, Jankovic S, Zgaga L, Polašek O, Rudan I, Wright AF, Campbell H, Wild SH, Wilson JF, Heinrich J, Imboden M, Probst-Hensch NM, Gyllensten U, Johansson Å, Zaboli G, Mustelin L, Rantanen T, Surakka I, Kaprio J, Jarvelin MR, Hayward C, Evans DM, Koch B, Musk AW, Elliott P, Strachan DP, Tobin MD, Sayers I, Hall IP and SpiroMeta Consortium

    Nottingham Respiratory Biomedical Research Unit, Division of Therapeutics and Molecular Medicine, University Hospital of Nottingham, Nottingham, United Kingdom.

    Rationale: Lung function measures are heritable traits that predict population morbidity and mortality and are essential for the diagnosis of chronic obstructive pulmonary disease (COPD). Variations in many genes have been reported to affect these traits, but attempts at replication have provided conflicting results. Recently, we undertook a meta-analysis of genome-wide association study (GWAS) results for lung function measures in 20,288 individuals from the general population (the SpiroMeta consortium).

    Objectives: To comprehensively analyse previously reported genetic associations with lung function measures, and to investigate whether single nucleotide polymorphisms (SNPs) in these genomic regions are associated with lung function in a large population sample.

    Methods: We analysed association for SNPs tagging 130 genes and 48 intergenic regions (±10 kb), after conducting a systematic review of the literature in the PubMed database for genetic association studies reporting lung function associations.

    Results: The analysis included 16,936 genotyped and imputed SNPs. No loci showed overall significant association for FEV₁ or FEV₁/FVC traits using a carefully defined significance threshold of 1.3×10⁻⁵. The most significant loci associated with FEV₁ include SNPs tagging MACROD2 (P = 6.81×10⁻⁵), CNTN5 (P = 4.37×10⁻⁴), and TRPV4 (P = 1.58×10⁻³). Among ever-smokers, SERPINA1 showed the most significant association with FEV₁ (P = 8.41×10⁻⁵), followed by PDE4D (P = 1.22×10⁻⁴). The strongest association with the FEV₁/FVC ratio was observed for ABCC1 (P = 4.38×10⁻⁴) and, among ever-smokers, for ESR1 (P = 5.42×10⁻⁴).

    Conclusions: Polymorphisms spanning previously associated lung function genes did not show strong evidence for association with lung function measures in the SpiroMeta consortium population. Common SERPINA1 polymorphisms may affect FEV₁ among smokers in the general population.

    Funded by: Cancer Research UK; Chief Scientist Office: CZB/4/710; Medical Research Council: G0000934, G0401540, G0600705, G0701863, G0800582, G0801056, G0902125, G0902313, G1001799, G9815508, G990146, MC_QA137934, MC_U106179471, MC_U106188470; NHLBI NIH HHS: 5R01HL087679-02, R01 HL087679; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01, RL1 MH083268; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 077016/Z/05/Z, 079895, 092731

    PloS one 2011;6;5;e19382

  • Real-time sequencing.

    Otto TD

    Nature reviews. Microbiology 2011;9;9;633

  • RATT: Rapid Annotation Transfer Tool.

    Otto TD, Dillon GP, Degrave WS and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Nucleic acids research 2011;39;9;e57

  • High altitude adaptation in Daghestani populations from the Caucasus.

    Pagani L, Ayub Q, MacArthur DG, Xue Y, Baillie JK, Chen Y, Kozarewa I, Turner DJ, Tofanelli S, Bulayeva K, Kidd K, Paoli G and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, UK.

    We have surveyed 15 high-altitude adaptation candidate genes for signals of positive selection in North Caucasian highlanders using targeted re-sequencing. A total of 49 unrelated Daghestanis from three ethnic groups (Avars, Kubachians, and Laks) living in ancient villages located at around 2,000 m above sea level were chosen as the study population. Caucasian (Adygei living at sea level, N = 20) and CEU (CEPH Utah residents with ancestry from northern and western Europe; N = 20) samples were used as controls. Candidate genes were compared with 20 putatively neutral control regions resequenced in the same individuals. The regions of interest were amplified by long-PCR, pooled according to individual, indexed by adding an eight-nucleotide tag, and sequenced using the Illumina GAII platform. A total of 1,066 SNPs were called using false discovery and false negative thresholds of ~6%. The neutral regions provided an empirical null distribution to compare with the candidate genes for signals of selection. Two genes stood out. In Laks, a non-synonymous variant within HIF1A already known to be associated with improvement in oxygen metabolism was rediscovered, and in Kubachians a cluster of 13 SNPs located in a conserved intronic region within EGLN1 showing high population differentiation was found. These variants illustrate both the common pathways of adaptation to high altitude in different populations and features specific to the Daghestani populations, showing how even a mildly hypoxic environment can lead to genetic adaptation.

    Funded by: Wellcome Trust

    Human genetics 2011;131;3;423-33

  • Coordinating cell cycle progression via cyclin specificity.

    Pagliuca FW, Collins MO and Choudhary JS

    Cell cycle (Georgetown, Tex.) 2011;10;24;4195-6

  • Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.

    Pagliuca FW, Collins MO, Lichawska A, Zegerman P, Choudhary JS and Pines J

    The Gurdon Institute, University of Cambridge, Cambridge, UK.

    Cyclin-dependent kinases comprise the conserved machinery that drives progress through the cell cycle, but how they do this in mammalian cells is still unclear. To identify the mechanisms by which cyclin-cdks control the cell cycle, we performed a time-resolved analysis of the in vivo interactors of cyclins E1, A2, and B1 by quantitative mass spectrometry. This global analysis of context-dependent protein interactions reveals the temporal dynamics of cyclin function in which networks of cyclin-cdk interactions vary according to the type of cyclin and cell-cycle stage. Our results explain the temporal specificity of the cell-cycle machinery, thereby providing a biochemical mechanism for the genetic requirement for multiple cyclins in vivo and reveal how the actions of specific cyclins are coordinated to control the cell cycle. Furthermore, we identify key substrates (Wee1 and c15orf42/Sld3) that reveal how cyclin A is able to promote both DNA replication and mitosis.

    Funded by: Cancer Research UK: A7397; Wellcome Trust: 079643/Z/06/Z; Worldwide Cancer Research: 10-0908

    Molecular cell 2011;43;3;406-17

  • Identity-by-descent-based phasing and imputation in founder populations using graphical models.

    Palin K, Campbell H, Wright AF, Wilson JF and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be identical by descent (IBD). We apply this Bayesian network-based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with a very low error rate.

    Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: MC_U127561128; Wellcome Trust: 076113, 077192, 085475

    Genetic epidemiology 2011;35;8;853-60

  • Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts.

    Papaemmanuil E, Cazzola M, Boultwood J, Malcovati L, Vyas P, Bowen D, Pellagatti A, Wainscoat JS, Hellstrom-Lindberg E, Gambacorti-Passerini C, Godfrey AL, Rapado I, Cvejic A, Rance R, McGee C, Ellis P, Mudie LJ, Stephens PJ, McLaren S, Massie CE, Tarpey PS, Varela I, Nik-Zainal S, Davies HR, Shlien A, Jones D, Raine K, Hinton J, Butler AP, Teague JW, Baxter EJ, Score J, Galli A, Della Porta MG, Travaglino E, Groves M, Tauro S, Munshi NC, Anderson KC, El-Naggar A, Fischer A, Mustonen V, Warren AJ, Cross NC, Green AR, Futreal PA, Stratton MR, Campbell PJ and Chronic Myeloid Disorders Working Group of the International Cancer Genome Consortium

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Background: Myelodysplastic syndromes are a diverse and common group of chronic hematologic cancers. The identification of new genetic lesions could facilitate new diagnostic and therapeutic strategies.

    Methods: We used massively parallel sequencing technology to identify somatically acquired point mutations across all protein-coding exons in the genome in 9 patients with low-grade myelodysplasia. Targeted resequencing of the gene encoding RNA splicing factor 3B, subunit 1 (SF3B1), was also performed in a cohort of 2087 patients with myeloid or other cancers.

    Results: We identified 64 point mutations in the 9 patients. Recurrent somatically acquired mutations were identified in SF3B1. Follow-up revealed SF3B1 mutations in 72 of 354 patients (20%) with myelodysplastic syndromes, with particularly high frequency among patients whose disease was characterized by ring sideroblasts (53 of 82 [65%]). The gene was also mutated in 1 to 5% of patients with a variety of other tumor types. The observed mutations were less deleterious than was expected on the basis of chance, suggesting that the mutated protein retains structural integrity with altered function. SF3B1 mutations were associated with down-regulation of key gene networks, including core mitochondrial pathways. Clinically, patients with SF3B1 mutations had fewer cytopenias and longer event-free survival than patients without SF3B1 mutations.

    Conclusions: Mutations in SF3B1 implicate abnormalities of messenger RNA splicing in the pathogenesis of myelodysplastic syndromes. (Funded by the Wellcome Trust and others.).

    Funded by: Medical Research Council: G0800784, G1000729, MC_U105161083; NCI NIH HHS: P01 CA078378, P01 CA078378-10, R01 CA124929, R01 CA124929-05; PHS HHS: P01-155249, P01-78378, P50-100007, R01-124929; Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA; Worldwide Cancer Research: 08-0183

    The New England journal of medicine 2011;365;15;1384-95

  • Fetal-specific DNA methylation ratio permits noninvasive prenatal diagnosis of trisomy 21.

    Papageorgiou EA, Karagrigoriou A, Tsaliki E, Velissariou V, Carter NP and Patsalis PC

    Cytogenetics and Genomics Department, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus.

    The trials performed worldwide toward noninvasive prenatal diagnosis (NIPD) of Down's syndrome (or trisomy 21) have shown the commercial and medical potential of NIPD compared to the currently used invasive prenatal diagnostic procedures. Extensive investigation of methylation differences between the mother and the fetus has led to the identification of differentially methylated regions (DMRs). In this study, we present a strategy using the methylated DNA immunoprecipitation (MeDiP) methodology in combination with real-time quantitative PCR (qPCR) to achieve fetal chromosome dosage assessment, which can be performed noninvasively through the analysis of fetal-specific DMRs. We achieved noninvasive prenatal detection of trisomy 21 by determining the methylation ratio of normal and trisomy 21 cases for each tested fetal-specific DMR present in maternal peripheral blood, followed by further statistical analysis. The application of this fetal-specific methylation ratio approach provided correct diagnosis of 14 trisomy 21 and 26 normal cases.

    Funded by: Wellcome Trust: 079643

    Nature medicine 2011;17;4;510-3

  • Bacterial epidemiology and biology--lessons from genome sequencing.

    Parkhill J and Wren BW

    The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Next-generation sequencing has ushered in a new era of microbial genomics, enabling the detailed historical and geographical tracing of bacteria. This is helping to shape our understanding of bacterial evolution.

    Funded by: Medical Research Council: G0300020; Wellcome Trust

    Genome biology 2011;12;10;230

  • Joint genetic analysis of gene expression data with inferred cellular phenotypes.

    Parts L, Stegle O, Winn J and Durbin R

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Even within a defined cell type, the expression level of a gene differs in individual samples. The effects of genotype, measured factors such as environmental conditions, and their interactions have been explored in recent studies. Methods have also been developed to identify unmeasured intermediate factors that coherently influence transcript levels of multiple genes. Here, we show how to bring these two approaches together and analyse genetic effects in the context of inferred determinants of gene expression. We use a sparse factor analysis model to infer hidden factors, which we treat as intermediate cellular phenotypes that in turn affect gene expression in a yeast dataset. We find that the inferred phenotypes are associated with locus genotypes and environmental conditions and can explain genetic associations to genes in trans. For the first time, we consider and find interactions between genotype and intermediate phenotypes inferred from gene expression levels, complementing and extending established results.

    Funded by: Wellcome Trust: WT077192/Z/05/Z

    PLoS genetics 2011;7;1;e1001276

  • Maps of open chromatin guide the functional follow-up of genome-wide association signals: application to hematological traits.

    Paul DS, Nisbet JP, Yang TP, Meacham S, Rendon A, Hautaviita K, Tallila J, White J, Tijssen MR, Sivapalaratnam S, Basart H, Trip MD, Cardiogenics Consortium, MuTHER Consortium, Göttgens B, Soranzo N, Ouwehand WH and Deloukas P

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Turning genetic discoveries identified in genome-wide association (GWA) studies into biological mechanisms is an important challenge in human genetics. Many GWA signals map outside exons, suggesting that the associated variants may lie within regulatory regions. We applied the formaldehyde-assisted isolation of regulatory elements (FAIRE) method in a megakaryocytic and an erythroblastoid cell line to map active regulatory elements at known loci associated with hematological quantitative traits, coronary artery disease, and myocardial infarction. We showed that the two cell types exhibit distinct patterns of open chromatin and that cell-specific open chromatin can guide the finding of functional variants. We identified an open chromatin region at chromosome 7q22.3 in megakaryocytes but not erythroblasts, which harbors the common non-coding sequence variant rs342293 known to be associated with platelet volume and function. Resequencing of this open chromatin region in 643 individuals provided strong evidence that rs342293 is the only putative causative variant in this region. We demonstrated that the C- and G-alleles differentially bind the transcription factor EVI1 affecting PIK3CG gene expression in platelets and macrophages. A protein-protein interaction network including up- and down-regulated genes in Pik3cg knockout mice indicated that PIK3CG is associated with gene pathways with an established role in platelet membrane biogenesis and thrombus formation. Thus, rs342293 is the functional common variant at this locus; to the best of our knowledge this is the first such variant to be elucidated among the known platelet quantitative trait loci (QTLs). Our data suggested a molecular mechanism by which a non-coding GWA index SNP modulates platelet phenotype.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/09/012/28096, RG/09/12/28096; Medical Research Council: G0800784, G0900339, MC_U105260799; National Centre for the Replacement, Refinement and Reduction of Animals in Research: G0900729/1; Wellcome Trust: 081917/Z/07/Z, 091746/Z/10/Z

    PLoS genetics 2011;7;6;e1002139

  • Acquired bleeding disorders

    Perry DJ and Grove C

    Blood and Bone Marrow Pathology 2011;565-82

  • Citrobacter rodentium is an unstable pathogen showing evidence of significant genomic flux.

    Petty NK, Feltwell T, Pickard D, Clare S, Toribio AL, Fookes M, Roberts K, Monson R, Nair S, Kingsley RA, Bulgin R, Wiles S, Goulding D, Keane T, Corton C, Lennard N, Harris D, Willey D, Rance R, Yu L, Choudhary JS, Churcher C, Quail MA, Parkhill J, Frankel G, Dougan G, Salmond GP and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Citrobacter rodentium is a natural mouse pathogen that causes attaching and effacing (A/E) lesions. It shares a common virulence strategy with the clinically significant human A/E pathogens enteropathogenic E. coli (EPEC) and enterohaemorrhagic E. coli (EHEC) and is widely used to model this route of pathogenesis. We previously reported the complete genome sequence of C. rodentium ICC168, where we found that the genome displayed many characteristics of a newly evolved pathogen. In this study, through PFGE, sequencing of isolates showing variation, whole genome transcriptome analysis and examination of the mobile genetic elements, we found that, consistent with our previous hypothesis, the genome of C. rodentium is unstable as a result of repeat-mediated, large-scale genome recombination and because of active transposition of mobile genetic elements such as the prophages. We sequenced an additional C. rodentium strain, EX-33, to reveal that the reference strain ICC168 is representative of the species and that most of the inactivating mutations were common to both isolates and likely to have occurred early on in the evolution of this pathogen. We draw parallels with the evolution of other bacterial pathogens and conclude that C. rodentium is a recently evolved pathogen that may have emerged alongside the development of inbred mice as a model for human disease.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E52708X/1; Medical Research Council; Wellcome Trust

    PLoS pathogens 2011;7;4;e1002018

  • A scalable pipeline for highly effective genetic modification of a malaria parasite.

    Pfander C, Anar B, Schwach F, Otto TD, Brochet M, Volkmann K, Quail MA, Pain A, Rosen B, Skarnes W, Rayner JC and Billker O

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    In malaria parasites, the systematic experimental validation of drug and vaccine targets by reverse genetics is constrained by the inefficiency of homologous recombination and by the difficulty of manipulating adenine and thymine (A+T)-rich DNA of most Plasmodium species in Escherichia coli. We overcame these roadblocks by creating a high-integrity library of Plasmodium berghei genomic DNA (>77% A+T content) in a bacteriophage N15-based vector that can be modified efficiently using the lambda Red method of recombineering. We built a pipeline for generating P. berghei genetic modification vectors at genome scale in serial liquid cultures on 96-well plates. Vectors have long homology arms, which increase recombination frequency up to tenfold over conventional designs. The feasibility of efficient genetic modification at scale will stimulate collaborative, genome-wide knockout and tagging programs for P. berghei.

    Funded by: Medical Research Council: G0501670, G0501670(76331); Wellcome Trust: 089085, WT089085/Z/09/Z

    Nature methods 2011;8;12;1078-82

  • Mendelian randomization study of B-type natriuretic peptide and type 2 diabetes: evidence of causal association from population studies.

    Pfister R, Sharp S, Luben R, Welsh P, Barroso I, Salomaa V, Meirhaeghe A, Khaw KT, Sattar N, Langenberg C and Wareham NJ

    Medical Research Council Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom.

    Background: Genetic and epidemiological evidence suggests an inverse association between B-type natriuretic peptide (BNP) levels in blood and risk of type 2 diabetes (T2D), but the prospective association of BNP with T2D is uncertain, and it is unclear whether the association is confounded.

    Methods and findings: We analysed the association between levels of the N-terminal fragment of pro-BNP (NT-pro-BNP) in blood and risk of incident T2D in a prospective case-cohort study and genotyped the variant rs198389 within the BNP locus in three T2D case-control studies. We combined our results with existing data in a meta-analysis of 11 case-control studies. Using a Mendelian randomization approach, we compared the observed association between rs198389 and T2D to that expected from the NT-pro-BNP level to T2D association and the NT-pro-BNP difference per C allele of rs198389. In participants of our case-cohort study who were free of T2D and cardiovascular disease at baseline, we observed a 21% (95% CI 3%-36%) decreased risk of incident T2D per one standard deviation (SD) higher log-transformed NT-pro-BNP levels in analysis adjusted for age, sex, body mass index, systolic blood pressure, smoking, family history of T2D, history of hypertension, and levels of triglycerides, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol. The association between rs198389 and T2D observed in case-control studies (odds ratio = 0.94 per C allele, 95% CI 0.91-0.97) was similar to that expected (0.96, 0.93-0.98) based on the pooled estimate for the log-NT-pro-BNP level to T2D association derived from a meta-analysis of our study and published data (hazard ratio = 0.82 per SD, 0.74-0.90) and the difference in NT-pro-BNP levels (0.22 SD, 0.15-0.29) per C allele of rs198389. No significant associations were observed between the rs198389 genotype and potential confounders.

    Conclusions: Our results provide evidence for a potential causal role of the BNP system in the aetiology of T2D. Further studies are needed to investigate the mechanisms underlying this association and possibilities for preventive interventions.

    Funded by: British Heart Foundation: FS/10/005/28147; Medical Research Council: G0401527, G0601463, G1000143; Wellcome Trust: 077016/Z/05/Z

    PLoS medicine 2011;8;10;e1001112
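    As a rough check of the Mendelian randomization arithmetic in the abstract above: under a simplified model that treats the hazard ratio as approximating an odds ratio and assumes a log-linear effect of NT-pro-BNP on risk, the expected per-allele odds ratio is the per-SD hazard ratio raised to the power of the per-allele SD shift, i.e. 0.82^0.22. The sketch below reproduces only that point estimate; the function name is hypothetical, and this is not necessarily how the authors derived their estimate or its confidence interval.

```python
import math


def expected_or_per_allele(hr_per_sd: float, sd_shift_per_allele: float) -> float:
    """Expected odds ratio per allele under a simplified log-linear model.

    Assumption: the HR for a 1-SD increase in the biomarker approximates an OR,
    so the genetic effect on log-odds is the per-allele SD shift multiplied by
    log(HR per SD).
    """
    log_effect = sd_shift_per_allele * math.log(hr_per_sd)
    return math.exp(log_effect)


# Figures quoted in the abstract: HR 0.82 per SD, 0.22 SD higher level per C allele.
expected = expected_or_per_allele(0.82, 0.22)
print(round(expected, 2))  # 0.96
```

    With these inputs the result is about 0.957, which rounds to the expected odds ratio of 0.96 reported above and sits close to the observed case-control estimate of 0.94.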

  • Jamb and jamc are essential for vertebrate myocyte fusion.

    Powell GT and Wright GJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Cellular fusion is required in the development of several tissues, including skeletal muscle. In vertebrates, this process is poorly understood and lacks an in vivo-validated cell surface heterophilic receptor pair that is necessary for fusion. Identification of essential cell surface interactions between fusing cells is an important step in elucidating the molecular mechanism of cellular fusion. We show here that the zebrafish orthologues of JAM-B and JAM-C receptors are essential for fusion of myocyte precursors to form syncytial muscle fibres. Both jamb and jamc are dynamically co-expressed in developing muscles and encode receptors that physically interact. Heritable mutations in either gene prevent myocyte fusion in vivo, resulting in an overabundance of mononuclear, but otherwise overtly normal, functional fast-twitch muscle fibres. Transplantation experiments show that the Jamb and Jamc receptors must interact between neighbouring cells (in trans) for fusion to occur. We also show that jamc is ectopically expressed in prdm1a mutant slow muscle precursors, which inappropriately fuse with other myocytes, suggesting that control of myocyte fusion through regulation of jamc expression has important implications for the growth and patterning of muscles. Our discovery of a receptor-ligand pair critical for fusion in vivo has important implications for understanding the molecular mechanisms responsible for myocyte fusion and its regulation in vertebrate myogenesis.

    Funded by: Wellcome Trust: 077047/Z/05/Z, 077108/Z/05/Z

    PLoS biology 2011;9;12;e1001216

  • A resource of vectors and ES cells for targeted deletion of microRNAs in mice.

    Prosser HM, Koike-Yusa H, Cooper JD, Law FC and Bradley A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The 21-23 nucleotide, single-stranded RNAs classified as microRNAs (miRNA) perform fundamental roles in diverse cellular and developmental processes. In contrast to the situation for protein-coding genes, no public resource of miRNA mouse mutant alleles exists. Here we describe a collection of 428 miRNA targeting vectors covering 476 of the miRNA genes annotated in the miRBase registry. Using these vectors, we generated a library of highly germline-transmissible C57BL/6N mouse embryonic stem (ES) cell clones harboring targeted deletions for 392 miRNA genes. For most of these targeted clones, chimerism and germline transmission can be scored through a coat color marker. The targeted alleles have been designed to be adaptable research tools that can be efficiently altered by recombinase-mediated cassette exchange to create reporter, conditional and other allelic variants. This miRNA knockout (mirKO) resource can be searched electronically and is available from ES cell repositories for distribution to the scientific community.

    Funded by: Wellcome Trust: 079643

    Nature biotechnology 2011;29;9;840-5

  • Genomic libraries: I. Construction and screening of fosmid genomic libraries.

    Quail MA, Matthews L, Sims S, Lloyd C, Beasley H and Baxter SW

    Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.

    Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.

    Methods in molecular biology (Clifton, N.J.) 2011;772;37-58

  • Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    Quail MA, Matthews L, Sims S, Lloyd C, Beasley H and Baxter SW

    Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

    Methods in molecular biology (Clifton, N.J.) 2011;772;59-81

  • Optimal enzymes for amplifying sequencing libraries.

    Quail MA, Otto TD, Gu Y, Harris SR, Skelly TF, McQuillan JA, Swerdlow HP and Oyola SO

    Nature methods 2011;9;1;10-1

  • Early diagnosis of Werner's syndrome using exome-wide sequencing in a single, atypical patient.

    Raffan E, Hurst LA, Turki SA, Carpenter G, Scott C, Daly A, Coffey A, Bhaskar S, Howard E, Khan N, Kingston H, Palotie A, Savage DB, O'Driscoll M, Smith C, O'Rahilly S, Barroso I and Semple RK

    Institute of Metabolic Science, University of Cambridge Metabolic Research Laboratories, Cambridge, UK.

    Genetic diagnosis of inherited metabolic disease is conventionally achieved through syndrome recognition and targeted gene sequencing, but many patients receive no specific diagnosis. Next-generation sequencing allied to capture of expressed sequences from genomic DNA now offers a powerful new diagnostic approach. Barriers to routine diagnostic use include cost and the complexity of interpreting results arising from simultaneous identification of large numbers of variants. We applied exome-wide sequencing to a single individual, the 16-year-old daughter of consanguineous parents, who had a novel syndrome of short stature, severe insulin resistance, ptosis, and microcephaly. Pulldown of expressed sequences from genomic DNA was followed by massively parallel sequencing. Single nucleotide variants were called using SAMtools and then filtered based on sequence quality and presence in control genomes and exomes. Of 485 genetic variants predicted to alter protein sequence and absent from control data, 24 were homozygous in the patient. One of these, the p.Arg732X mutation in the WRN gene, has previously been reported in Werner's syndrome (WS). On re-evaluation of the patient, several early features of WS were detected, including loss of fat from the extremities and frontal hair thinning. Lymphoblastoid cells from the proband exhibited a defective decatenation checkpoint, consistent with loss of WRN activity. We thus diagnosed WS some 15 years earlier than average, permitting aggressive prophylactic therapy and screening for WS complications. This illustrates the potential of exome-wide sequencing to achieve early diagnosis and change management of rare autosomal recessive disease, even in individual patients of consanguineous parentage with apparently novel syndromes.

    Funded by: Medical Research Council: G0700733; Wellcome Trust: 095515

    Frontiers in endocrinology 2011;2;8

  • Founder effect in the Horn of Africa for an insulin receptor mutation that may impair receptor recycling.

    Raffan E, Soos MA, Rocha N, Tuthill A, Thomsen AR, Hyden CS, Gregory JW, Hindmarsh P, Dattani M, Cochran E, Al Kaabi J, Gorden P, Barroso I, Morling N, O'Rahilly S and Semple RK

    University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, University of Cambridge, Addenbrooke's Hospital B289, Cambridge, CB2 0QR, UK.

    Aims/hypothesis: Genetic insulin receptoropathies are a rare cause of severe insulin resistance. We identified the Ile119Met missense mutation in the insulin receptor gene (INSR), previously reported in a Yemeni kindred, in four unrelated patients with Somali ancestry. We aimed to investigate a possible genetic founder effect and to study the mechanism of loss of function of the mutant receptor.

    Methods: Biochemical profiling and DNA haplotype analysis of affected patients were performed. Insulin receptor expression in lymphoblastoid cells from a homozygous p.Ile119Met INSR patient, and in cells heterologously expressing the mutant receptor, was examined. Insulin binding, insulin-stimulated receptor autophosphorylation, and cooperativity and pH dependency of insulin dissociation were also assessed.

    Results: All patients had biochemical profiles pathognomonic of insulin receptoropathy, while haplotype analysis revealed the putative shared region around the INSR mutation to be no larger than 28 kb. An increased insulin proreceptor to β subunit ratio was seen in patient-derived cells. Steady-state insulin binding and insulin-stimulated autophosphorylation of the mutant receptor were normal; however, it exhibited decreased insulin dissociation rates with preserved cooperativity, a difference accentuated at low pH.

    Conclusions/interpretation: The p.Ile119Met INSR mutation appears to have arisen around the Horn of Africa and should be sought first in severely insulin-resistant patients with ancestry from this region. Despite collectively compelling genetic, clinical and biochemical evidence for its pathogenicity, its loss of function in conventional in vitro assays is subtle, suggesting only mildly impaired receptor recycling.

    Funded by: Medical Research Council; Wellcome Trust: 077016/Z/05/Z, 078986/Z/06/Z, 080952/Z/06/Z, 087678/Z/08/Z, 095515

    Diabetologia 2011;54;5;1057-65

  • Evidence that Cd101 is an autoimmune diabetes gene in nonobese diabetic mice.

    Rainbow DB, Moule C, Fraser HI, Clark J, Howlett SK, Burren O, Christensen M, Moody V, Steward CA, Mohammed JP, Fusakio ME, Masteller EL, Finger EB, Houchins JP, Naf D, Koentgen F, Ridgway WM, Todd JA, Bluestone JA, Peterson LB, Mattner J and Wicker LS

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom.

    We have previously proposed that sequence variation of the CD101 gene between NOD and C57BL/6 mice accounts for the protection from type 1 diabetes (T1D) provided by the insulin-dependent diabetes susceptibility region 10 (Idd10), a <1 Mb region on mouse chromosome 3. In this study, we provide further support for the hypothesis that Cd101 is Idd10 using haplotype and expression analyses of novel Idd10 congenic strains coupled to the development of a CD101 knockout mouse. Susceptibility to T1D was correlated with genotype-dependent CD101 expression on multiple cell subsets, including Foxp3(+) regulatory CD4(+) T cells, CD11c(+) dendritic cells, and Gr1(+) myeloid cells. The correlation of CD101 expression on immune cells from four independent Idd10 haplotypes with the development of T1D supports the identity of Cd101 as Idd10. Because CD101 has been associated with regulatory T cell and antigen-presenting cell functions, our results provide a further link between immune regulation and susceptibility to T1D.

    Funded by: NIAID NIH HHS: N01 AI015416, N01AI15416, P01 AI039671, P01 AI039671-16, P01AI039671; NIDDK NIH HHS: P30 DK078392, P30 DK078392-01, R01 DK084054, R01 DK084054-03, R01DK084054; Wellcome Trust: 079895, 091157

    Journal of immunology (Baltimore, Md. : 1950) 2011;187;1;325-36

  • Cutting edge: the membrane attack complex of complement is required for the development of murine experimental cerebral malaria.

    Ramos TN, Darley MM, Hu X, Billker O, Rayner JC, Ahras M, Wohler JE and Barnum SR

    Department of Microbiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.

    Cerebral malaria is the most severe complication of Plasmodium falciparum infection and accounts for a large number of malaria fatalities worldwide. Recent studies demonstrated that C5(-/-) mice are resistant to experimental cerebral malaria (ECM) and suggested that protection was due to loss of C5a-induced inflammation. Surprisingly, we observed that C5aR(-/-) mice were fully susceptible to disease, indicating that C5a is not required for ECM. C3aR(-/-) and C3aR(-/-) × C5aR(-/-) mice were as susceptible to ECM as wild-type mice, indicating that neither complement anaphylatoxin receptor is critical for ECM development. In contrast, C9 deposition in the brains of mice with ECM suggested an important role for the terminal complement pathway. Treatment with anti-C9 Ab significantly increased survival time and reduced mortality in ECM. Our data indicate that protection from ECM in C5(-/-) mice is mediated through inhibition of membrane attack complex formation and not through C5a-induced inflammation.

    Funded by: Medical Research Council: G0501670; NIAID NIH HHS: AI08382, R03 AI083820, R03 AI083820-02, T32 AI007051, T32 AI007051-35, T32 AI07051

    Journal of immunology (Baltimore, Md. : 1950) 2011;186;12;6657-60

  • A plethora of Plasmodium species in wild apes: a source of human infection?

    Rayner JC, Liu W, Peeters M, Sharp PM and Hahn BH

    Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Recent studies of captive and wild-living apes in Africa have uncovered evidence of numerous new Plasmodium species, one of which was identified as the immediate precursor of human Plasmodium falciparum. These findings raise the question of whether wild apes could be a recurrent source of Plasmodium infections in humans. This question is not new; it was the subject of intense investigation by researchers in the first half of the 20th century. Re-examination of their work in the context of recent molecular findings provides a new framework to understand the diversity of Plasmodium species and to assess the risk of future cross-species transmissions to humans in the context of proposed malaria eradication programs.

    Funded by: NIAID NIH HHS: P30 AI 27767, P30 AI027767, R01 AI050529, R01 AI058715, R01 AI091595, R01 AI50529, R01 AI58715, R03 AI074778, R37 AI050529; Wellcome Trust

    Trends in parasitology 2011;27;5;222-9

  • Genome sequencing gets func-y.

    Reid AJ

    Nature reviews. Microbiology 2011;9;6;401

  • Marked endotheliotropism of highly pathogenic avian influenza virus H5N1 following intestinal inoculation in cats.

    Reperant LA, van de Bildt MW, van Amerongen G, Leijten LM, Watson S, Palser A, Kellam P, Eissens AC, Frijlink HW, Osterhaus AD and Kuiken T

    Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA.

    Highly pathogenic avian influenza virus (HPAIV) H5N1 can infect mammals via the intestine; this is unusual since influenza viruses typically infect mammals via the respiratory tract. The dissemination of HPAIV H5N1 following intestinal entry and associated pathogenesis are largely unknown. To assess the route of spread of HPAIV H5N1 to other organs and to determine its associated pathogenesis, we inoculated infected chicken liver homogenate directly into the intestine of cats by use of enteric-coated capsules. Intestinal inoculation of HPAIV H5N1 resulted in fatal systemic disease. The spread of HPAIV H5N1 from the lumen of the intestine to other organs took place via the blood and lymphatic vascular systems but not via neuronal transmission. Remarkably, the systemic spread of the virus via the vascular system was associated with massive infection of endothelial and lymphendothelial cells, resulting in widespread hemorrhages. This is unique for influenza in mammals and resembles the pathogenesis of HPAIV infection in terrestrial poultry. It contrasts with the pathogenesis of systemic disease from the same virus following entry via the respiratory tract, where lesions are characterized mainly by necrosis and inflammation and are associated with the presence of influenza virus antigen in parenchymal rather than endothelial cells. The marked endotheliotropism of the virus following intestinal inoculation indicates that the pathogenesis of systemic influenza virus infection in mammals may differ according to the portal of entry.

    Funded by: Wellcome Trust

    Journal of virology 2011;86;2;1158-65

  • Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

    Robinson T, Campino SG, Auburn S, Assefa SA, Polley SD, Manske M, MacInnis B, Rockett KA, Maslen GL, Sanders M, Quail MA, Chiodini PL, Kwiatkowski DP, Clark TG and Sutherland CJ

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis and on the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest because they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed next-generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from four different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data were extracted for six loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate and compared against a panel of polymorphic sites derived from published, or unpublished but publicly available, data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones was estimated for each isolate. Each patient harboured at least three clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high-quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 077012/Z/05/Z, 090532

    PloS one 2011;6;8;e23204

  • Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania.

    Rogers MB, Hilley JD, Dickens NJ, Wilkes J, Bates PA, Depledge DP, Harris D, Her Y, Herzyk P, Imamura H, Otto TD, Sanders M, Seeger K, Dujardin JC, Berriman M, Smith DF, Hertz-Fowler C and Mottram JC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.

    Leishmania parasites cause a spectrum of clinical pathology in humans ranging from disfiguring cutaneous lesions to fatal visceral leishmaniasis. We have generated a reference genome for Leishmania mexicana and refined the reference genomes for Leishmania major, Leishmania infantum, and Leishmania braziliensis. This has allowed the identification of a remarkably low number of genes or paralog groups (2, 14, 19, and 67, respectively) unique to one species. These were found to be conserved in additional isolates of the same species. We have predicted allelic variation and find that in these isolates, L. major and L. infantum have a surprisingly low number of predicted heterozygous SNPs compared with L. braziliensis and L. mexicana. We used short read coverage to infer ploidy and gene copy numbers, identifying large copy number variations between species, with 200 tandem gene arrays in L. major and 132 in L. mexicana. Chromosome copy number also varied significantly between species, with nine supernumerary chromosomes in L. infantum, four in L. mexicana, two in L. braziliensis, and one in L. major. A significant bias against gene arrays on supernumerary chromosomes was shown to exist, indicating that duplication events occur more frequently on disomic chromosomes. Taken together, our data demonstrate that there is little variation in unique gene content across Leishmania species, but large-scale genetic heterogeneity can result through gene amplification on disomic chromosomes and variation in chromosome number. Increased gene copy number due to chromosome amplification may contribute to alterations in gene expression in response to environmental conditions in the host, providing a genetic basis for disease tropism.

    Funded by: Wellcome Trust: 076355, 085775, 085822

    Genome research 2011;21;12;2129-42
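    The abstract above mentions using short-read coverage to infer ploidy. A minimal sketch of one common approach, not necessarily the authors' exact pipeline: take the median read depth of each chromosome, normalize it by the genome-wide median of those values, and scale by an assumed baseline ploidy of two (disomy). The function name, input layout, and depth values below are hypothetical.

```python
from statistics import median


def estimate_somy(chrom_depths: dict[str, list[float]], baseline_ploidy: int = 2) -> dict[str, float]:
    """Estimate chromosome copy number (somy) from per-position read depths.

    Assumption: most chromosomes are at the baseline ploidy, so the median of
    the per-chromosome median depths corresponds to that baseline.
    """
    # Median depth per chromosome is robust to local amplifications and deletions.
    chrom_medians = {chrom: median(depths) for chrom, depths in chrom_depths.items()}
    # Genome-wide reference depth, taken to represent the baseline ploidy.
    genome_median = median(chrom_medians.values())
    return {chrom: baseline_ploidy * m / genome_median for chrom, m in chrom_medians.items()}


# Hypothetical depths: chr3 has roughly 1.5x the coverage of the others.
depths = {
    "chr1": [20, 21, 19],
    "chr2": [20, 20, 22],
    "chr3": [30, 31, 29],
    "chr4": [20, 19, 21],
}
somy = estimate_somy(depths)
print(somy)  # {'chr1': 2.0, 'chr2': 2.0, 'chr3': 3.0, 'chr4': 2.0}
```

    Using medians rather than means keeps a single amplified gene array from inflating a chromosome's estimate; in real data, values are noisy and are usually rounded or inspected against a distribution rather than read directly as integers.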

  • Emergent neutrality in adaptive asexual evolution.

    Schiffels S, Szöllosi GJ, Mustonen V and Lässig M

    Institut für Theoretische Physik, Universität zu Köln, 50937 Köln, Germany.

    In nonrecombining genomes, genetic linkage can be an important evolutionary force. Linkage generates interference interactions, by which simultaneously occurring mutations affect each other's chance of fixation. Here, we develop a comprehensive model of adaptive evolution in linked genomes, which integrates interference interactions between multiple beneficial and deleterious mutations into a unified framework. By an approximate analytical solution, we predict the fixation rates of these mutations, as well as the probabilities of beneficial and deleterious alleles at fixed genomic sites. We find that interference interactions generate a regime of emergent neutrality: all genomic sites with selection coefficients smaller in magnitude than a characteristic threshold have nearly random fixed alleles, and both beneficial and deleterious mutations at these sites have nearly neutral fixation rates. We show that this dynamic limits not only the speed of adaptation, but also a population's degree of adaptation in its current environment. We apply the model to different scenarios: stationary adaptation in a time-dependent environment and approach to equilibrium in a fixed environment. In both cases, the analytical predictions are in good agreement with numerical simulations. Our results suggest that interference can severely compromise biological functions in an adapting population, which sets viability limits on adaptive evolution under linkage.

    Funded by: Wellcome Trust: 091747

    Genetics 2011;189;4;1361-75

  • Bioinformatics Training Network (BTN): a community resource for bioinformatics trainers.

    Schneider MV, Walter P, Blatter MC, Watson J, Brazas MD, Rother K, Budd A, Via A, van Gelder CW, Jacob J, Fernandes P, Nyrönen TH, De Las Rivas J, Blicher T, Jimenez RC, Loveland J, McDowall J, Jones P, Vaughan BW, Lopez R, Attwood TK and Brooksbank C

    EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of 'high-throughput biology', the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community, which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. On the other hand, there is much relevant teaching material and training expertise available worldwide that, if properly organized, could be exploited by anyone who needs to provide training or to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and compare it with similar initiatives and collections.

    Briefings in bioinformatics 2011;13;3;383-9

  • Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease.

    Schunkert H, König IR, Kathiresan S, Reilly MP, Assimes TL, Holm H, Preuss M, Stewart AF, Barbalic M, Gieger C, Absher D, Aherrahrou Z, Allayee H, Altshuler D, Anand SS, Andersen K, Anderson JL, Ardissino D, Ball SG, Balmforth AJ, Barnes TA, Becker DM, Becker LC, Berger K, Bis JC, Boekholdt SM, Boerwinkle E, Braund PS, Brown MJ, Burnett MS, Buysschaert I, Cardiogenics, Carlquist JF, Chen L, Cichon S, Codd V, Davies RW, Dedoussis G, Dehghan A, Demissie S, Devaney JM, Diemert P, Do R, Doering A, Eifert S, Mokhtari NE, Ellis SG, Elosua R, Engert JC, Epstein SE, de Faire U, Fischer M, Folsom AR, Freyer J, Gigante B, Girelli D, Gretarsdottir S, Gudnason V, Gulcher JR, Halperin E, Hammond N, Hazen SL, Hofman A, Horne BD, Illig T, Iribarren C, Jones GT, Jukema JW, Kaiser MA, Kaplan LM, Kastelein JJ, Khaw KT, Knowles JW, Kolovou G, Kong A, Laaksonen R, Lambrechts D, Leander K, Lettre G, Li M, Lieb W, Loley C, Lotery AJ, Mannucci PM, Maouche S, Martinelli N, McKeown PP, Meisinger C, Meitinger T, Melander O, Merlini PA, Mooser V, Morgan T, Mühleisen TW, Muhlestein JB, Münzel T, Musunuru K, Nahrstaedt J, Nelson CP, Nöthen MM, Olivieri O, Patel RS, Patterson CC, Peters A, Peyvandi F, Qu L, Quyyumi AA, Rader DJ, Rallidis LS, Rice C, Rosendaal FR, Rubin D, Salomaa V, Sampietro ML, Sandhu MS, Schadt E, Schäfer A, Schillert A, Schreiber S, Schrezenmeir J, Schwartz SM, Siscovick DS, Sivananthan M, Sivapalaratnam S, Smith A, Smith TB, Snoep JD, Soranzo N, Spertus JA, Stark K, Stirrups K, Stoll M, Tang WH, Tennstedt S, Thorgeirsson G, Thorleifsson G, Tomaszewski M, Uitterlinden AG, van Rij AM, Voight BF, Wareham NJ, Wells GA, Wichmann HE, Wild PS, Willenborg C, Witteman JC, Wright BJ, Ye S, Zeller T, Ziegler A, Cambien F, Goodall AH, Cupples LA, Quertermous T, März W, Hengstenberg C, Blankenberg S, Ouwehand WH, Hall AS, Deloukas P, Thompson JR, Stefansson K, Roberts R, Thorsteinsdottir U, O'Donnell CJ, McPherson R, Erdmann J, CARDIoGRAM Consortium and Samani NJ

    Universität zu Lübeck, Medizinische Klinik II, Lübeck, Germany.

    We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent, followed by genotyping of top association signals in 56,682 additional individuals. This analysis identified 13 loci newly associated with CAD at P < 5 × 10⁻⁸ and confirmed the association of 10 of 12 previously reported CAD loci. The 13 new loci showed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6% to 17% increase in the risk of CAD per allele. Notably, only three of the new loci showed significant association with traditional CAD risk factors, and the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the new CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.

    Funded by: British Heart Foundation: PG/08/094/26019, RG/09/012/28096; Medical Research Council: G0401527, G0801566, G1000143, MC_U106179471; NHLBI NIH HHS: HL087647, R01HL089650-02

    Nature genetics 2011;43;4;333-8

  • A role for cohesin in T-cell-receptor rearrangement and thymocyte differentiation.

    Seitan VC, Hao B, Tachibana-Konwalski K, Lavagnolli T, Mira-Bontenbal H, Brown KE, Teng G, Carroll T, Terry A, Horan K, Marks H, Adams DJ, Schatz DG, Aragon L, Fisher AG, Krangel MS, Nasmyth K and Merkenschlager M

    Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, Du Cane Road, London W12 0NN, UK.

    Cohesin enables post-replicative DNA repair and chromosome segregation by holding sister chromatids together from the time of DNA replication in S phase until mitosis. There is growing evidence that cohesin also forms long-range chromosomal cis-interactions and may regulate gene expression in association with CTCF, Mediator or tissue-specific transcription factors. Human cohesinopathies such as Cornelia de Lange syndrome are thought to result from impaired non-canonical cohesin functions, but a clear distinction between the cell-division-related and cell-division-independent functions of cohesin (as exemplified in Drosophila) has not been demonstrated in vertebrate systems. To address this, here we deleted the cohesin locus Rad21 in mouse thymocytes at a time in development when these cells stop cycling and rearrange their T-cell receptor (TCR) α locus (Tcra). Rad21-deficient thymocytes had a normal lifespan and retained the ability to differentiate, albeit with reduced efficiency. Loss of Rad21 led to defective chromatin architecture at the Tcra locus, where cohesin-binding sites flank the TEA promoter and the Eα enhancer, and demarcate Tcra from interspersed Tcrd elements and neighbouring housekeeping genes. Cohesin was required for long-range promoter-enhancer interactions, Tcra transcription, H3K4me3 histone modifications that recruit the recombination machinery, and Tcra rearrangement. Provision of pre-rearranged TCR transgenes largely rescued thymocyte differentiation, demonstrating that among thousands of potential target genes across the genome, defective Tcra rearrangement was limiting for the differentiation of cohesin-deficient thymocytes. These findings firmly establish a cell-division-independent role for cohesin in Tcra locus rearrangement and provide a comprehensive account of the mechanisms by which cohesin enables cellular differentiation in a well-characterized mammalian system.

    Funded by: Cancer Research UK: 13031; Howard Hughes Medical Institute; Medical Research Council: MC_U120027516, MC_U120081295; NIAID NIH HHS: R37 AI032524, R37 AI032524-20; NIGMS NIH HHS: R37 GM041052, R37 GM041052-22; Wellcome Trust

    Nature 2011;476;7361;467-71

  • Silencing of RhoA nucleotide exchange factor, ARHGEF3, reveals its unexpected role in iron uptake.

    Serbanovic-Canic J, Cvejic A, Soranzo N, Stemple DL, Ouwehand WH and Freson K

    Department of Haematology, University of Cambridge and NHS Blood and Transplant, Cambridge, UK.

    Genomewide association meta-analysis studies have identified > 100 independent genetic loci associated with blood cell indices, including volume and count of platelets and erythrocytes. Although several of these loci encode known regulators of hematopoiesis, the mechanism by which most sequence variants exert their effect on blood cell formation remains elusive. An example is the Rho guanine nucleotide exchange factor, ARHGEF3, which was previously implicated by genomewide association meta-analysis studies in bone cell biology. Here, we report on the unexpected role of ARHGEF3 in regulation of iron uptake and erythroid cell maturation. Although early erythroid differentiation progressed normally, silencing of arhgef3 in Danio rerio resulted in microcytic and hypochromic anemia. This was rescued by intracellular supplementation of iron, showing that arhgef3-depleted erythroid cells are fully capable of hemoglobinization. Disruption of the arhgef3 target, RhoA, also produced severe anemia, which was, again, corrected by iron injection. Moreover, silencing of ARHGEF3 in the erythromyeloblastoid cell line K562 showed that the uptake of transferrin was severely impaired. Taken together, these findings provide the first evidence that ARHGEF3 regulates transferrin uptake in erythroid cells through activation of RHOA.

    Funded by: British Heart Foundation: RG/09/012/28096; Wellcome Trust: WT 077037/Z/05/Z, WT077047/Z/05/Z, WT082597/Z/07/Z

    Blood 2011;118;18;4967-76

  • Indian Siddis: African descendants with Indian admixture.

    Shah AM, Tamang R, Moorjani P, Rani DS, Govindaraj P, Kulkarni G, Bhattacharya T, Mustak MS, Bhaskar LV, Reddy AG, Gadhvi D, Gai PB, Chaubey G, Patterson N, Reich D, Tyler-Smith C, Singh L and Thangaraj K

    Centre for Cellular and Molecular Biology, Council of Scientific and Industrial Research, Hyderabad, India.

    The Siddis (Afro-Indians) are a tribal population whose members live in coastal Karnataka, Gujarat, and in some parts of Andhra Pradesh. Historical records indicate that the Portuguese brought the Siddis to India from Africa about 300-500 years ago; however, there is little information about their more precise ancestral origins. Here, we perform a genome-wide survey to understand the population history of the Siddis. Using hundreds of thousands of autosomal markers, we show that they have inherited ancestry from Africans, Indians, and possibly Europeans (Portuguese). Additionally, analyses of the uniparental (Y-chromosomal and mitochondrial DNA) markers indicate that the Siddis trace their ancestry to Bantu speakers from sub-Saharan Africa. We estimate that the admixture between the African ancestors of the Siddis and neighboring South Asian groups probably occurred in the past eight generations (∼200 years ago), consistent with historical records.

    American journal of human genetics 2011;89;1;154-61

  • Common variants on 8p12 and 1q24.2 confer risk of schizophrenia.

    Shi Y, Li Z, Xu Q, Wang T, Li T, Shen J, Zhang F, Chen J, Zhou G, Ji W, Li B, Xu Y, Liu D, Wang P, Yang P, Liu B, Sun W, Wan C, Qin S, He G, Steinberg S, Cichon S, Werge T, Sigurdsson E, Tosato S, Palotie A, Nöthen MM, Rietschel M, Ophoff RA, Collier DA, Rujescu D, Clair DS, Stefansson H, Stefansson K, Ji J, Wang Q, Li W, Zheng L, Zhang H, Feng G and He L

    Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, China.

    Schizophrenia is a severe mental disorder affecting ∼1% of the world population, with heritability of up to 80%. To identify new common genetic risk factors, we performed a genome-wide association study (GWAS) in the Han Chinese population. The discovery sample set consisted of 3,750 individuals with schizophrenia and 6,468 healthy controls (1,578 cases and 1,592 controls from northern Han Chinese, 1,238 cases and 2,856 controls from central Han Chinese, and 934 cases and 2,020 controls from southern Han Chinese). We further analyzed the strongest association signals in an additional independent cohort of 4,383 cases and 4,539 controls from the Han Chinese population. Meta-analysis identified common SNPs that were associated with schizophrenia at genome-wide significance on 8p12 (rs16887244, P = 1.27 × 10⁻¹⁰) and 1q24.2 (rs10489202, P = 9.50 × 10⁻⁹). Our findings provide new insights into the pathogenesis of schizophrenia.

    Nature genetics 2011;43;12;1224-7

  • The tammar wallaby major histocompatibility complex shows evidence of past genomic instability.

    Siddle HV, Deakin JE, Coggill P, Whilming LG, Harrow J, Kaufman J, Beck S and Belov K

    Faculty of Veterinary Science, University of Sydney, NSW 2006, Australia.

    Background: The major histocompatibility complex (MHC) is a group of genes with a variety of roles in the innate and adaptive immune responses. MHC genes form a genetically linked cluster in eutherian mammals, an organization that is thought to confer functional and evolutionary advantages to the immune system. The tammar wallaby (Macropus eugenii), an Australian marsupial, provides a unique model for understanding MHC gene evolution, as many of its antigen presenting genes are not linked to the MHC, but are scattered around the genome.

    Results: Here we describe the 'core' tammar wallaby MHC region on chromosome 2q by ordering and sequencing 33 BAC clones, covering over 4.5 Mb and containing 129 genes. When compared to the MHC region of the South American opossum, eutherian mammals and non-mammals, the wallaby MHC has a novel gene organization. The wallaby has undergone an expansion of MHC class II genes, which are separated into two clusters by the class III genes. The antigen processing genes have undergone duplication, resulting in two copies of TAP1 and three copies of TAP2. Notably, Kangaroo Endogenous Retroviral Elements are present within the region and may have contributed to the genomic instability.

    Conclusions: The wallaby MHC has been extensively remodeled since the American and Australian marsupials last shared a common ancestor. The instability is characterized by the movement of antigen presenting genes away from the core MHC, most likely via the presence and activity of retroviral elements. We propose that the movement of class II genes away from the ancestral class II region has allowed this gene family to expand and diversify in the wallaby. The duplication of TAP genes in the wallaby MHC makes this species a unique model organism for studying the relationship between MHC gene organization and function.

    Funded by: Wellcome Trust: 084071, 089305

    BMC genomics 2011;12;421

  • A conditional knockout resource for the genome-wide study of mouse gene function.

    Skarnes WC, Rosen B, West AP, Koutsourakis M, Bushell W, Iyer V, Mujica AO, Thomas M, Harrow J, Cox T, Jackson D, Severin J, Biggs P, Fu J, Nefedov M, de Jong PJ, Stewart AF and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Gene targeting in embryonic stem cells has become the principal technology for manipulation of the mouse genome, offering unrivalled accuracy in allele design and access to conditional mutagenesis. To bring these advantages to the wider research community, large-scale mouse knockout programmes are producing a permanent resource of targeted mutations in all protein-coding genes. Here we report the establishment of a high-throughput gene-targeting pipeline for the generation of reporter-tagged, conditional alleles. Computational allele design, 96-well modular vector construction and high-efficiency gene-targeting strategies have been combined to mutate genes on an unprecedented scale. So far, more than 12,000 vectors and 9,000 conditional targeted alleles have been produced in highly germline-competent C57BL/6N embryonic stem cells. High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.

    Funded by: NHGRI NIH HHS: U01 HG004080, U01-HG004080; Wellcome Trust: 077188

    Nature 2011;474;7351;337-42

  • Identification of an imprinted master trans regulator at the KLF14 locus related to multiple metabolic phenotypes.

    Small KS, Hedman AK, Grundberg E, Nica AC, Thorleifsson G, Kong A, Thorsteindottir U, Shin SY, Richards HB, GIANT Consortium, MAGIC Investigators, DIAGRAM Consortium, Soranzo N, Ahmadi KR, Lindgren CM, Stefansson K, Dermitzakis ET, Deloukas P, Spector TD, McCarthy MI and MuTHER Consortium

    Department of Twin Research and Genetic Epidemiology, King's College London, London, UK.

    Genome-wide association studies have identified many genetic variants associated with complex traits. However, at only a minority of loci have the molecular mechanisms mediating these associations been characterized. In parallel, whereas cis regulatory patterns of gene expression have been extensively explored, the identification of trans regulatory effects in humans has attracted less attention. Here we show that the type 2 diabetes and high-density lipoprotein cholesterol-associated cis-acting expression quantitative trait locus (eQTL) of the maternally expressed transcription factor KLF14 acts as a master trans regulator of adipose gene expression. Expression levels of genes regulated by this trans-eQTL are highly correlated with concurrently measured metabolic traits, and a subset of the trans-regulated genes harbor variants directly associated with metabolic phenotypes. This trans-eQTL network provides a mechanistic understanding of the effect of the KLF14 locus on metabolic disease risk and offers a potential model for other complex traits.

    Funded by: Medical Research Council: G0900339; NIMH NIH HHS: R01 MH090941; Wellcome Trust: 079771, 081878, 081917, 090532, 095515

    Nature genetics 2011;43;6;561-4

  • Candidate gene association study for diabetic retinopathy in persons with type 2 diabetes: the Candidate gene Association Resource (CARe).

    Sobrin L, Green T, Sim X, Jensen RA, Tai ES, Tay WT, Wang JJ, Mitchell P, Sandholm N, Liu Y, Hietala K, Iyengar SK, Family Investigation of Nephropathy and Diabetes-Eye Research Group, Brooks M, Buraczynska M, Van Zuydam N, Smith AV, Gudnason V, Doney AS, Morris AD, Leese GP, Palmer CN, Wellcome Trust Case Control Consortium 2, Swaroop A, Taylor HA, Wilson JG, Penman A, Chen CJ, Groop PH, Saw SM, Aung T, Klein BE, Rotter JI, Siscovick DS, Cotch MF, Klein R, Daly MJ and Wong TY

    Department of Ophthalmology, Harvard Medical School, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts 02114, USA.

    Purpose: To investigate whether variants in cardiovascular candidate genes, some of which have been previously associated with type 2 diabetes (T2D), diabetic retinopathy (DR), and diabetic nephropathy (DN), are associated with DR in the Candidate gene Association Resource (CARe).

    Methods: Persons with T2D who were enrolled in the study (n = 2691) had fundus photography and genotyping of single nucleotide polymorphisms (SNPs) in 2000 candidate genes. Two case definitions were investigated: Early Treatment Diabetic Retinopathy Study (ETDRS) grades ≥ 14 and ≥ 30. The χ² analyses for each CARe cohort were combined by Cochran-Mantel-Haenszel (CMH) pooling of odds ratios (ORs) and corrected for multiple hypothesis testing. Logistic regression was performed with adjustment for other DR risk factors. Results from replication in independent cohorts were analyzed with CMH meta-analysis methods.

    Results: Among 39 genes previously associated with DR, DN, or T2D, three SNPs in P-selectin (SELP) were associated with DR. The strongest association was to rs6128 (OR = 0.43, P = 0.0001, after Bonferroni correction). These associations remained significant after adjustment for DR risk factors. Among other genes examined, several variants were associated with DR with significant P values, including rs6856425 tagging α-L-iduronidase (IDUA) (P = 2.1 × 10⁻⁵, after Bonferroni correction). However, replication in independent cohorts did not reveal study-wide significant effects. The P values after replication were 0.55 and 0.10 for rs6128 and rs6856425, respectively.

    Conclusions: Genes associated with DN, T2D, and vascular diseases do not appear to be consistently associated with DR. A few genetic variants associated with DR, particularly those in SELP and near IDUA, should be investigated in additional DR cohorts.

    Funded by: NCRR NIH HHS: UL1 RR 025758; NEI NIH HHS: K12-EY16335, Z01 EY000401-06, Z01 EY000401-07, Z01 EY000403-06, Z01 EY000403-07, Z01 EY000425-04, Z99 EY999999, ZIA EY000401-08, ZIA EY000401-09, ZIA EY000401-10, ZIA EY000403-08, ZIA EY000403-09, ZIA EY000403-10, ZIA EY000425-06; NHLBI NIH HHS: N01-HC-65226; Wellcome Trust: 090532

    Investigative ophthalmology & visual science 2011;52;10;7593-602

  • Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function.

    Soler Artigas M, Loth DW, Wain LV, Gharib SA, Obeidat M, Tang W, Zhai G, Zhao JH, Smith AV, Huffman JE, Albrecht E, Jackson CM, Evans DM, Cadby G, Fornage M, Manichaikul A, Lopez LM, Johnson T, Aldrich MC, Aspelund T, Barroso I, Campbell H, Cassano PA, Couper DJ, Eiriksdottir G, Franceschini N, Garcia M, Gieger C, Gislason GK, Grkovic I, Hammond CJ, Hancock DB, Harris TB, Ramasamy A, Heckbert SR, Heliövaara M, Homuth G, Hysi PG, James AL, Jankovic S, Joubert BR, Karrasch S, Klopp N, Koch B, Kritchevsky SB, Launer LJ, Liu Y, Loehr LR, Lohman K, Loos RJ, Lumley T, Al Balushi KA, Ang WQ, Barr RG, Beilby J, Blakey JD, Boban M, Boraska V, Brisman J, Britton JR, Brusselle GG, Cooper C, Curjuric I, Dahgam S, Deary IJ, Ebrahim S, Eijgelsheim M, Francks C, Gaysina D, Granell R, Gu X, Hankinson JL, Hardy R, Harris SE, Henderson J, Henry A, Hingorani AD, Hofman A, Holt PG, Hui J, Hunter ML, Imboden M, Jameson KA, Kerr SM, Kolcic I, Kronenberg F, Liu JZ, Marchini J, McKeever T, Morris AD, Olin AC, Porteous DJ, Postma DS, Rich SS, Ring SM, Rivadeneira F, Rochat T, Sayer AA, Sayers I, Sly PD, Smith GD, Sood A, Starr JM, Uitterlinden AG, Vonk JM, Wannamethee SG, Whincup PH, Wijmenga C, Williams OD, Wong A, Mangino M, Marciante KD, McArdle WL, Meibohm B, Morrison AC, North KE, Omenaas E, Palmer LJ, Pietiläinen KH, Pin I, Polašek O, Pouta A, Psaty BM, Hartikainen AL, Rantanen T, Ripatti S, Rotter JI, Rudan I, Rudnicka AR, Schulz H, Shin SY, Spector TD, Surakka I, Vitart V, Völzke H, Wareham NJ, Warrington NM, Wichmann HE, Wild SH, Wilk JB, Wjst M, Wright AF, Zgaga L, Zemunik T, Pennell CE, Nyberg F, Kuh D, Holloway JW, Boezen HM, Lawlor DA, Morris RW, Probst-Hensch N, International Lung Cancer Consortium, GIANT consortium, Kaprio J, Wilson JF, Hayward C, Kähönen M, Heinrich J, Musk AW, Jarvis DL, Gläser S, Järvelin MR, Ch Stricker BH, Elliott P, O'Connor GT, Strachan DP, London SJ, Hall IP, Gudnason V and Tobin MD

    Department of Health Sciences, University of Leicester, Leicester, UK.

    Pulmonary function measures reflect respiratory health and are used in the diagnosis of chronic obstructive pulmonary disease. We tested genome-wide association with forced expiratory volume in 1 second and the ratio of forced expiratory volume in 1 second to forced vital capacity in 48,201 individuals of European ancestry with follow up of the top associations in up to an additional 46,411 individuals. We identified new regions showing association (combined P < 5 × 10⁻⁸) with pulmonary function in or near MFAP2, TGFB2, HDAC4, RARB, MECOM (also known as EVI1), SPATA9, ARMC2, NCR3, ZKSCAN3, CDC123, C10orf11, LRP1, CCDC38, MMP15, CFDP1 and KCNE2. Identification of these 16 new loci may provide insight into the molecular mechanisms regulating pulmonary function and into molecular targets for future therapy to alleviate reduced lung function.

    Funded by: CIHR: MOP-82893; Biotechnology and Biological Sciences Research Council: BB/F019394/1, G20234; British Heart Foundation: FS05/125, PG/06/154/22043, PG/97012, RG/08/013/25942; Cancer Research UK; Chief Scientist Office: CZB/4/710, CZD/16/6, CZD/16/6/2, CZD/16/6/4; Department of Health; Intramural NIH HHS: Z01 ES049019; Medical Research Council: G0000934, G0401540, G0500539, G0501942, G0600705, G0701863, G0800582, G0801056, G0902125, G0902313, G1000861, G9815508, G9901462, MC_PC_U127561128, MC_U106188470, MC_U123092720, MC_U123092721, MC_U127561128, MC_UP_A620_1014, MC_UP_A620_1015; NCI NIH HHS: 1P50 CA70907, CA127219, CA55769, P50 CA070907, R01 CA055769, R01 CA111703, R01 CA121197, R01 CA127219, R01CA111703, U19 CA148127; NCRR NIH HHS: 5M01 RR00997, M01 RR000425, M01 RR000997, M01-RR00425, RR-024156, UL1 RR024156, UL1 RR025005, UL1RR025005; NHGRI NIH HHS: HHSN268200782096C, U01 HG004402, U01 HG004729, U01-HG-004402, U01-HG-004729; NHLBI NIH HHS: 1K23HL094531-01, 5R01HL087679-02, HHSN268201100005C, HHSN268201100005G, HHSN268201100005I, HHSN268201100006C, HHSN268201100007C, HHSN268201100007I, HHSN268201100008C, HHSN268201100008I, HHSN268201100009C, HHSN268201100009I, HHSN268201100010C, HHSN268201100011C, HHSN268201100011I, HHSN268201100012C, HL075336, HL080295, HL087652, HL088133, HL105756, K23 HL094531, N01 HC-25195, N01 HC-55222, N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC045134, N01-HC-05187, N01-HC-45204, N01-HC-45205, N01-HC-48047, N01-HC-48048, N01-HC-48049, N01-HC-48050, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-95095, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, N01HC05187, N01HC45204, N01HC45205, N01HC48047, N01HC48048, N01HC48049, N01HC48050, N01HC55222, N01HC75150, N01HC85079, N01HC85086, N01HC95095, N01HC95159, N01HC95169, N02-HL-6-4278, R01 HL-071022, R01 HL-074104, R01 HL-077612, R01 HL059367, R01 HL071022, R01 HL071051, R01 HL071205, R01 HL071250, R01 HL071251, R01 HL071252, R01 HL071258, R01 HL071259, R01 HL074104, R01 HL075476, R01 HL077612, R01 HL080295, R01 HL084099, R01 HL086694, R01 HL087641, R01 HL087652, R01 HL087679, R01 HL088133, R01 HL105756, R01-HL-084099, R01-HL084099, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, R01HL086694, R01HL087641, R01HL59367, RC1 HL100543, U01 HL080295; NIA NIH HHS: 1R01AG032098-01A1, AG-023269, AG-027058, AG-20098, AG035835, N01 AG062101, N01 AG062103, N01 AG062106, N01AG12100, R01 AG015928, R01 AG020098, R01 AG027058, R01 AG032098, R56 AG020098, RC1 AG035835, RC1 AG035835-01; NIDDK NIH HHS: DK063491, P30 DK063491, U01 DK062418; NIEHS NIH HHS: ES015794, R01 ES015794, ZO1 ES49019; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02, R01 MH063706, RL1 MH083268; PHS HHS: 268200625226C, 268200782096C, 268201100005C, 268201100006C, 268201100007C, 268201100008C, 268201100009C, 268201100010C, 268201100011C, 268201100012C; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 077016/Z/05/Z, 079895, 090532, 092731, GR069224

    Nature genetics 2011;43;11;1082-90

  • The effect of genome-wide association scan quality control on imputation outcome for common variants.

    Southam L, Panoutsopoulou K, Rayner NW, Chapman K, Durrant C, Ferreira T, Arden N, Carr A, Deloukas P, Doherty M, Loughlin J, McCaskie A, Ollier WE, Ralston S, Spector TD, Valdes AM, Wallis GA, Wilkinson JM, arcOGEN consortium, Marchini J and Zeggini E

    Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.

    Imputation is an extremely valuable tool in conducting and synthesising genome-wide association studies (GWASs). Directly typed SNP quality control (QC) is thought to affect imputation quality. It is, therefore, common practice to use quality-controlled (QCed) data as an input for imputing genotypes. This study aims to determine the effect of commonly applied QC steps on imputation outcomes. We performed several iterations of imputing SNPs across chromosome 22 in a dataset consisting of 3177 samples with Illumina 610k (Illumina, San Diego, CA, USA) GWAS data, applying different QC steps each time. The imputed genotypes were compared with the directly typed genotypes. In addition, we investigated the correlation between alternatively QCed data. We also applied a series of post-imputation QC steps balancing elimination of poorly imputed SNPs and information loss. We found that the difference in imputation outcome between the unQCed data and the fully QCed data was minimal. Our study shows that imputation of common variants is generally very accurate and robust to GWAS QC, which is not a major factor affecting imputation outcome. A minority of common-frequency SNPs with particular properties cannot be accurately imputed regardless of QC stringency. These findings may not generalise to the imputation of low frequency and rare variants.

    Funded by: Arthritis Research UK: 18030; Medical Research Council: G0100594, G0901461; Wellcome Trust: 079557, 088885, 090532, WT079557MA, WT088885/Z/09/Z

    European journal of human genetics : EJHG 2011;19;5;610-4

  • Genome-wide association analysis identifies variants associated with nonalcoholic fatty liver disease that have distinct effects on metabolic traits.

    Speliotes EK, Yerges-Armstrong LM, Wu J, Hernaez R, Kim LJ, Palmer CD, Gudnason V, Eiriksdottir G, Garcia ME, Launer LJ, Nalls MA, Clark JM, Mitchell BD, Shuldiner AR, Butler JL, Tomas M, Hoffmann U, Hwang SJ, Massaro JM, O'Donnell CJ, Sahani DV, Salomaa V, Schadt EE, Schwartz SM, Siscovick DS, NASH CRN, GIANT Consortium, MAGIC Investigators, Voight BF, Carr JJ, Feitosa MF, Harris TB, Fox CS, Smith AV, Kao WH, Hirschhorn JN, Borecki IB and GOLD Consortium

    Department of Internal Medicine, Division of Gastroenterology, University of Michigan, Ann Arbor, Michigan, United States of America.

    Nonalcoholic fatty liver disease (NAFLD) clusters in families, but the only known common genetic variants influencing risk are near PNPLA3. We sought to identify additional genetic variants influencing NAFLD using genome-wide association (GWA) analysis of computed tomography (CT)-measured hepatic steatosis, a non-invasive measure of NAFLD, in large population-based samples. Using variance components methods, we show that CT hepatic steatosis is heritable (∼26%-27%) in the family-based Amish, Family Heart, and Framingham Heart Studies (n = 880 to 3,070). By carrying out a fixed-effects meta-analysis of GWA results between CT hepatic steatosis and ∼2.4 million imputed or genotyped SNPs in 7,176 individuals from the Old Order Amish, Age, Gene/Environment Susceptibility-Reykjavik study (AGES), Family Heart, and Framingham Heart Studies, we identify variants associated at genome-wide significant levels (p < 5 × 10⁻⁸) in or near PNPLA3, NCAN, and PPP1R3B. We genotype these and 42 other top CT hepatic steatosis-associated SNPs in 592 subjects with biopsy-proven NAFLD from the NASH Clinical Research Network (NASH CRN). In comparisons with 1,405 healthy controls from the Myocardial Infarction Genetics Consortium (MIGen), we observe significant associations with histologic NAFLD at variants in or near NCAN, GCKR, LYPLAL1, and PNPLA3, but not PPP1R3B. Variants at these five loci exhibit distinct patterns of association with serum lipids, as well as glycemic and anthropometric traits. We identify common genetic variants influencing CT-assessed steatosis and risk of NAFLD. Hepatic steatosis-associated variants are not uniformly associated with NASH/fibrosis, nor do they uniformly result in abnormalities in serum lipids or glycemic and anthropometric traits, suggesting genetic heterogeneity in the pathways influencing these traits.

    Funded by: British Heart Foundation: PG/09/002/26056; Medical Research Council: G0401527, G0701863, G0801056, G0902037, G1000143, G19/35, MC_U106179471, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1014, MC_UP_A620_1015; NCRR NIH HHS: M01RR000065, M01RR000750, M01RR000827, M01RR00188, M01RR020359, UL1 RR024989, UL1RR024989, UL1RR02501401; NHLBI NIH HHS: N01-HC-25195, N02-HL-6-4278, R01 HL087647, R01HL087700, R01HL088119, U01 HL084756, U01 HL72515; NIA NIH HHS: N01-AG-12100, R01 AG18728, T32AG000262; NIAMS NIH HHS: F32AR059469; NIDDK NIH HHS: F32 DK079466-01, K01 DK067207, K23DK080145-01, K24 DK002957, P30DK072488, P60 DK079637, R01DK075681, R01DK075787, T32 DK07191-32, U01 DK061728, U01DK061713, U01DK061718, U01DK061728, U01DK061730, U01DK061731, U01DK061732, U01DK061734, U01DK061737, U01DK061738; NIGMS NIH HHS: T32 GM074905; PHS HHS: ULRR02413101; Wellcome Trust: 090532

    PLoS genetics 2011;7;3;e1001324

  • Massive genomic rearrangement acquired in a single catastrophic event during cancer development.

    Stephens PJ, Greenman CD, Fu B, Yang F, Bignell GR, Mudie LJ, Pleasance ED, Lau KW, Beare D, Stebbings LA, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal S, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Quail MA, Burton J, Swerdlow H, Carter NP, Morsberger LA, Iacobuzio-Donahue C, Follows GA, Green AR, Flanagan AM, Stratton MR, Futreal PA and Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Cancer is driven by somatically acquired point mutations and chromosomal rearrangements, conventionally thought to accumulate gradually over time. Using next-generation sequencing, we characterize a phenomenon, which we term chromothripsis, whereby tens to hundreds of genomic rearrangements occur in a one-off cellular crisis. Rearrangements involving one or a few chromosomes crisscross back and forth across involved regions, generating frequent oscillations between two copy number states. These genomic hallmarks are highly improbable if rearrangements accumulate over time and instead imply that nearly all occur during a single cellular catastrophe. The stamp of chromothripsis can be seen in at least 2%-3% of all cancers, across many subtypes, and is present in ∼25% of bone cancers. We find that one, or indeed more than one, cancer-causing lesion can emerge out of the genomic crisis. This phenomenon has important implications for the origins of genomic remodeling and temporal emergence of cancer.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA

    Cell 2011;144;1;27-40

  • Two covariance models for iron-responsive elements.

    Stevens SG, Gardner PP and Brown C

    Biochemistry and Genetics Otago, University of Otago, Dunedin, New Zealand.

    Iron-responsive elements (IREs) function in the 5' or 3' untranslated regions (UTRs) of mRNAs as post-transcriptional structured cis-acting RNA regulatory elements. One known functional mechanism is the binding of Iron Regulatory Proteins (IRPs) to 5' UTR IREs, reducing translation rates at low iron levels. Another known mechanism is IRPs binding to 3' UTR IREs in other mRNAs, increasing RNA stability. Experimentally proven elements are quite small, have some diversity of sequence and structure, and functional genes have similar pseudogenes in the genome. This paper presents two new IRE covariance models, comprising a new IRE clan in the Rfam database to encompass this variation without over-generalisation. Using two IRE models rather than a single model is consistent with experimentally proven structures and predictions. All of the IREs with experimental support are modelled. These two new models show a marked increase in the sensitivity and specificity in detection of known iron-responsive elements and ability to predict novel IREs.

    Funded by: Wellcome Trust: WT077044/Z/05/Z

    RNA biology 2011;8;5;792-801

  • Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes.

    Strawbridge RJ, Dupuis J, Prokopenko I, Barker A, Ahlqvist E, Rybin D, Petrie JR, Travers ME, Bouatia-Naji N, Dimas AS, Nica A, Wheeler E, Chen H, Voight BF, Taneera J, Kanoni S, Peden JF, Turrini F, Gustafsson S, Zabena C, Almgren P, Barker DJ, Barnes D, Dennison EM, Eriksson JG, Eriksson P, Eury E, Folkersen L, Fox CS, Frayling TM, Goel A, Gu HF, Horikoshi M, Isomaa B, Jackson AU, Jameson KA, Kajantie E, Kerr-Conte J, Kuulasmaa T, Kuusisto J, Loos RJ, Luan J, Makrilakis K, Manning AK, Martínez-Larrad MT, Narisu N, Nastase Mannila M, Ohrvik J, Osmond C, Pascoe L, Payne F, Sayer AA, Sennblad B, Silveira A, Stancáková A, Stirrups K, Swift AJ, Syvänen AC, Tuomi T, van 't Hooft FM, Walker M, Weedon MN, Xie W, Zethelius B, DIAGRAM Consortium, GIANT Consortium, MuTHER Consortium, CARDIoGRAM Consortium, C4D Consortium, Ongen H, Mälarstig A, Hopewell JC, Saleheen D, Chambers J, Parish S, Danesh J, Kooner J, Ostenson CG, Lind L, Cooper CC, Serrano-Ríos M, Ferrannini E, Forsen TJ, Clarke R, Franzosi MG, Seedorf U, Watkins H, Froguel P, Johnson P, Deloukas P, Collins FS, Laakso M, Dermitzakis ET, Boehnke M, McCarthy MI, Wareham NJ, Groop L, Pattou F, Gloyn AL, Dedoussis GV, Lyssenko V, Meigs JB, Barroso I, Watanabe RM, Ingelsson E, Langenberg C, Hamsten A and Florez JC

    Atherosclerosis Research Unit, Department of Medicine Solna, Karolinska Institutet, Karolinska University Hospital Solna, Stockholm, Sweden.

    Objective: Proinsulin is a precursor of mature insulin and C-peptide. Higher circulating proinsulin levels are associated with impaired β-cell function, raised glucose levels, insulin resistance, and type 2 diabetes (T2D). Studies of the insulin processing pathway could provide new insights about T2D pathophysiology.

    Research design and methods: We have conducted a meta-analysis of genome-wide association tests of ∼2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs) and fasting proinsulin levels in 10,701 nondiabetic adults of European ancestry, with follow-up of 23 loci in up to 16,378 individuals, using additive genetic models adjusted for age, sex, fasting insulin, and study-specific covariates.

    Results: Nine SNPs at eight loci were associated with proinsulin levels (P < 5 × 10(-8)). Two loci (LARP6 and SGSM2) have not been previously related to metabolic traits, one (MADD) has been associated with fasting glucose, one (PCSK1) has been implicated in obesity, and four (TCF7L2, SLC30A8, VPS13C/C2CD4A/B, and ARAP1, formerly CENTD2) increase T2D risk. The proinsulin-raising allele of ARAP1 was associated with a lower fasting glucose (P = 1.7 × 10(-4)), improved β-cell function (P = 1.1 × 10(-5)), and lower risk of T2D (odds ratio 0.88; P = 7.8 × 10(-6)). Notably, PCSK1 encodes the protein prohormone convertase 1/3, the first enzyme in the insulin processing pathway. A genotype score composed of the nine proinsulin-raising alleles was not associated with coronary disease in two large case-control datasets.

    Conclusions: We have identified nine genetic variants associated with fasting proinsulin. Our findings illuminate the biology underlying glucose homeostasis and T2D development in humans and argue against a direct role of proinsulin in coronary artery disease pathogenesis.

    Funded by: British Heart Foundation: RG/08/014/24067; Diabetes UK: 08/0003775; Medical Research Council: 81696, G0601261, G0601966, G0700222, G0700222(81696), G0700931, G0801056, MC_PC_U127561128, MC_PC_U127592696, MC_U106188470, MC_U127561128, MC_U137686857, MC_UP_A620_1014, MC_UP_A620_1015; NIDDK NIH HHS: DK062370, K24 DK080140, R01 DK078616; Wellcome Trust: 077016/Z/05/Z, 083270/Z/07/Z, 090532

    Diabetes 2011;60;10;2624-34

  • Human metabolic individuality in biomedical and pharmaceutical research.

    Suhre K, Shin SY, Petersen AK, Mohney RP, Meredith D, Wägele B, Altmaier E, CARDIoGRAM, Deloukas P, Erdmann J, Grundberg E, Hammond CJ, de Angelis MH, Kastenmüller G, Köttgen A, Kronenberg F, Mangino M, Meisinger C, Meitinger T, Mewes HW, Milburn MV, Prehn C, Raffler J, Ried JS, Römisch-Margl W, Samani NJ, Small KS, Wichmann HE, Zhai G, Illig T, Spector TD, Adamski J, Soranzo N and Gieger C

    Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.

    Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.

    Funded by: CIHR: MOP172605, MOP77682, MOP‐82810; Biotechnology and Biological Sciences Research Council; British Heart Foundation; Cancer Research UK; Intramural NIH HHS; Medical Research Council; NHLBI NIH HHS: 1R01HL103931‐01, HL087647, N01‐HC‐55015, N01‐HC‐55016, N01‐HC‐55018, N01‐HC‐55019, N01‐HC‐55020, N01‐HC‐55021, N01‐HC‐55022, P01 HL098055, P01HL076491‐06, P01HL087018, R01 HL087676, R01HL089650‐02; NIA NIH HHS: N01‐AG‐12100; NIDDK NIH HHS: R01DK080732; Wellcome Trust: 091746, 091746/Z/10/Z

    Nature 2011;477;7362;54-60

  • A genome-wide screen for interactions reveals a new locus on 4p15 modifying the effect of waist-to-hip ratio on total cholesterol.

    Surakka I, Isaacs A, Karssen LC, Laurila PP, Middelberg RP, Tikkanen E, Ried JS, Lamina C, Mangino M, Igl W, Hottenga JJ, Lagou V, van der Harst P, Mateo Leach I, Esko T, Kutalik Z, Wainwright NW, Struchalin MV, Sarin AP, Kangas AJ, Viikari JS, Perola M, Rantanen T, Petersen AK, Soininen P, Johansson A, Soranzo N, Heath AC, Papamarkou T, Prokopenko I, Tönjes A, Kronenberg F, Döring A, Rivadeneira F, Montgomery GW, Whitfield JB, Kähönen M, Lehtimäki T, Freimer NB, Willemsen G, de Geus EJ, Palotie A, Sandhu MS, Waterworth DM, Metspalu A, Stumvoll M, Uitterlinden AG, Jula A, Navis G, Wijmenga C, Wolffenbuttel BH, Taskinen MR, Ala-Korpela M, Kaprio J, Kyvik KO, Boomsma DI, Pedersen NL, Gyllensten U, Wilson JF, Rudan I, Campbell H, Pramstaller PP, Spector TD, Witteman JC, Eriksson JG, Salomaa V, Oostra BA, Raitakari OT, Wichmann HE, Gieger C, Järvelin MR, Martin NG, Hofman A, McCarthy MI, Peltonen L, van Duijn CM, Aulchenko YS, Ripatti S and ENGAGE Consortium

    Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.

    Recent genome-wide association (GWA) studies described 95 loci controlling serum lipid levels. These common variants explain ∼25% of the heritability of the phenotypes. To date, no unbiased screen for gene-environment interactions for circulating lipids has been reported. We screened for variants that modify the relationship between known epidemiological risk factors and circulating lipid levels in a meta-analysis of genome-wide association (GWA) data from 18 population-based cohorts with European ancestry (maximum N = 32,225). We collected 8 further cohorts (N = 17,102) for replication, and rs6448771 on 4p15 demonstrated genome-wide significant interaction with waist-to-hip ratio (WHR) on total cholesterol (TC) with a combined P-value of 4.79×10(-9). There were two potential candidate genes in the region, PCDH7 and CCKAR, with differential expression levels for rs6448771 genotypes in adipose tissue. The effect of WHR on TC was strongest for individuals carrying two copies of the G allele, for whom a one standard deviation (sd) difference in WHR corresponds to a 0.19 sd difference in TC concentration, while for A-allele homozygotes the difference was 0.12 sd. Our findings may open up possibilities for targeted intervention strategies for people characterized by specific genomic profiles. However, more refined measures of both body-fat distribution and metabolic measures are needed to understand how their joint dynamics are modified by the newly found locus.

    Funded by: British Heart Foundation: PG/08/094/26019; Cancer Research UK: C865/A2883; Chief Scientist Office: CZB/4/710; Medical Research Council: G0300128, G0801566, G9502233, g0500539, g600705; NHLBI NIH HHS: 5R01HL087679, R01 HL087679; NIAAA NIH HHS: AA10248, AA11998, AA13320, AA13321, AA13326, AA14041, AA17688, K05 AA017688, P50 AA011998, R01 AA007535, R01 AA013320, R01 AA013321, R01 AA013326, R01 AA014041; NIDA NIH HHS: DA12854, R01 DA012854, R56 DA012854; NIMH NIH HHS: 1R01MH083268-01, MH66206, R01 MH066206, RL1 MH083268, U24 MH068457, U24 MH068457-06; NLM NIH HHS: R01 LM010098; PHS HHS: R01D0042157-01A; Wellcome Trust: 090532, gr069224

    PLoS genetics 2011;7;10;e1002333

  • An optimized microarray platform for assaying genomic variation in Plasmodium falciparum field populations.

    Tan JC, Miller BA, Tan A, Patel JJ, Cheeseman IH, Anderson TJ, Manske M, Maslen G, Kwiatkowski DP and Ferdig MT

    The Eck Institute for Global Health, University of Notre Dame, 100 Galvin Life Sciences, Notre Dame, IN 46556, USA.

    We present an optimized probe design for copy number variation (CNV) and SNP genotyping in the Plasmodium falciparum genome. We demonstrate that variable length and isothermal probes are superior to static length probes. We show that sample preparation and hybridization conditions mitigate the effects of host DNA contamination in field samples. The microarray and workflow presented can be used to identify CNVs and SNPs with 95% accuracy in a single hybridization, in field samples containing up to 92% human DNA contamination.

    Funded by: Medical Research Council: G19/9; NCRR NIH HHS: RR013556; NIAID NIH HHS: AI072517, AI075145; Wellcome Trust: 090532

    Genome biology 2011;12;4;R35

  • The clinical and molecular genetic features of idiopathic infantile periodic alternating nystagmus.

    Thomas MG, Crosier M, Lindsay S, Kumar A, Thomas S, Araki M, Talbot CJ, McLean RJ, Surendran M, Taylor K, Leroy BP, Moore AT, Hunter DG, Hertle RW, Tarpey P, Langmann A, Lindner S, Brandner M and Gottlob I

    Ophthalmology Group, School of Medicine, University of Leicester, RKCSB, PO Box 65, Leicester LE2 7LX, UK.

    Periodic alternating nystagmus consists of involuntary oscillations of the eyes with cyclical changes of nystagmus direction. It can occur during infancy (e.g. idiopathic infantile periodic alternating nystagmus) or later in life. Acquired forms are often associated with cerebellar dysfunction arising due to instability of the optokinetic-vestibular systems. Idiopathic infantile periodic alternating nystagmus can be familial or occur in isolation; however, very little is known about the clinical characteristics, genetic aetiology and neural substrates involved. Five loci (NYS1-5) have been identified for idiopathic infantile nystagmus; three are autosomal (NYS2, NYS3 and NYS4) and two are X-chromosomal (NYS1 and NYS5). We previously identified the FRMD7 gene on chromosome Xq26 (NYS1 locus); mutations of FRMD7 are causative of idiopathic infantile nystagmus influencing neuronal outgrowth and development. It is unclear whether the periodic alternating nystagmus phenotype is linked to NYS1, NYS5 (Xp11.4-p11.3) or a separate locus. From a cohort of 31 X-linked families and 14 singletons (70 patients) with idiopathic infantile nystagmus we identified 10 families and one singleton (21 patients) with periodic alternating nystagmus of which we describe clinical phenotype, genetic aetiology and neural substrates involved. Periodic alternating nystagmus was not detected clinically but only on eye movement recordings. The cycle duration varied from 90 to 280 s. Optokinetic reflex was not detectable horizontally. Mutations of the FRMD7 gene were found in all 10 families and the singleton (including three novel mutations). Periodic alternating nystagmus was predominantly associated with missense mutations within the FERM domain. There was significant sibship clustering of the phenotype although in some families not all affected members had periodic alternating nystagmus. 
    In situ hybridization studies during mid-late human embryonic stages in normal tissue showed restricted FRMD7 expression in neuronal tissue with strong hybridization signals within the afferent arms of the vestibulo-ocular reflex consisting of the otic vesicle, cranial nerve VIII and vestibular ganglia. Similarly within the afferent arm of the optokinetic reflex we showed expression in the developing neural retina and ventricular zone of the optic stalk. Strong FRMD7 expression was seen in rhombomeres 1 to 4, which give rise to the cerebellum and the common integrator site for both these reflexes (vestibular nuclei). Based on the expression and phenotypic data, we hypothesize that periodic alternating nystagmus arises from instability of the optokinetic-vestibular systems. This study shows for the first time that mutations in FRMD7 can cause idiopathic infantile periodic alternating nystagmus and may affect neuronal circuits that have been implicated in acquired forms.

    Funded by: Medical Research Council: G9900837

    Brain : a journal of neurology 2011;134;Pt 3;892-902

  • Genome-wide analysis of simultaneous GATA1/2, RUNX1, FLI1, and SCL binding in megakaryocytes identifies hematopoietic regulators.

    Tijssen MR, Cvejic A, Joshi A, Hannah RL, Ferreira R, Forrai A, Bellissimo DC, Oram SH, Smethurst PA, Wilson NK, Wang X, Ottersbach K, Stemple DL, Green AR, Ouwehand WH and Göttgens B

    Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK.

    Hematopoietic differentiation critically depends on combinations of transcriptional regulators controlling the development of individual lineages. Here, we report the genome-wide binding sites for the five key hematopoietic transcription factors--GATA1, GATA2, RUNX1, FLI1, and TAL1/SCL--in primary human megakaryocytes. Statistical analysis of the 17,263 regions bound by at least one factor demonstrated that simultaneous binding by all five factors was the most enriched pattern and often occurred near known hematopoietic regulators. Eight genes not previously appreciated to function in hematopoiesis that were bound by all five factors were shown to be essential for thrombocyte and/or erythroid development in zebrafish. Moreover, one of these genes encoding the PDZK1IP1 protein shared transcriptional enhancer elements with the blood stem cell regulator TAL1/SCL. Multifactor ChIP-Seq analysis in primary human cells coupled with a high-throughput in vivo perturbation screen therefore offers a powerful strategy to identify essential regulators of complex mammalian differentiation processes.

    Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: G0800784, G0900951, G0900951(91754); National Centre for the Replacement, Refinement and Reduction of Animals in Research: G0900729/1; Wellcome Trust: 077037/Z/05/Z, 077047/Z/05/Z, 082597/Z/07/Z

    Developmental cell 2011;20;5;597-609

  • Association of known loci with lipid levels among children and prediction of dyslipidemia in adults.

    Tikkanen E, Tuovinen T, Widén E, Lehtimäki T, Viikari J, Kähönen M, Peltonen L, Raitakari OT and Ripatti S

    Institute for Molecular Medicine, Finland FIMM, University of Helsinki, Helsinki, Finland.

    Background: Recent genome-wide association studies have found 95 distinct genetic loci associated with high-density (HDL-C) and low-density (LDL-C) lipoprotein cholesterol, total cholesterol (TC), and triglycerides (TG), using adult samples. It is not known if these variants are associated with lipid levels in children and adolescents and if the genetic risk score (GRS), based on these variants, could improve adulthood dyslipidemia prediction over the childhood lipid measurements.

    Methods and results: We used 2443 participants of the Cardiovascular Risk in Young Finns study cohort with up to 5 measurements of serum lipids taken between ages 3 and 45 years to estimate the effect of individual single-nucleotide polymorphisms and the GRS on lipids. The GRSs were strongly associated with lipids in all age groups (1.5 × 10(-20)<P<8.7 × 10(-12) for HDL-C, 3.5 × 10(-27)<P<5.6 × 10(-09) for LDL-C, 2.0 × 10(-25)<P<5.2 × 10(-09) for TC, and 4.1 × 10(-20)<P<8.4 × 10(-05) for TG). Jointly, the lipid loci explained 11.8-26.7% of the total variance in lipids among 3- to 6-year-old children, and the proportion dropped over age, except for TG. The discrimination of adult hypertriglyceridemia improved when GRS was added to childhood lipid measurement (increase in C statistic = 0.04, P = 0.01).

    Conclusions: Previously identified lipid loci are associated with lipid levels in children and adolescents and explain up to more than twice as much of the lipid variation in children as in adults. The TG-GRS improves the risk discrimination over childhood lipid measurement for adult hypertriglyceridemia.

    Funded by: Wellcome Trust

    Circulation. Cardiovascular genetics 2011;4;6;673-80

  • Tumor-specific diagnostic marker for transmissible facial tumors of Tasmanian devils: immunohistochemistry studies.

    Tovar C, Obendorf D, Murchison EP, Papenfuss AT, Kreiss A and Woods GM

    Menzies Research Institute, University of Tasmania, Hobart, Tasmania, Australia.

    Devil facial tumor disease (DFTD) is a transmissible neoplasm that is threatening the survival of the Tasmanian devil. Genetic analyses have indicated that the disease is a peripheral nerve sheath neoplasm of Schwann cell origin. DFTD cells express genes characteristic of myelinating Schwann cells, and periaxin, a Schwann cell protein, has been proposed as a marker for the disease. Diagnosis of DFTD is currently based on histopathology, cytogenetics, and clinical appearance of the disease in affected animals. As devils are susceptible to a variety of neoplastic processes, a specific diagnostic test is required to differentiate DFTD from cancers of similar morphological appearance. This study presents a thorough examination of the expression of a set of Schwann cell and other neural crest markers in DFTD tumors and normal devil tissues. Samples from 20 primary DFTD tumors and 10 DFTD metastases were evaluated by immunohistochemistry for the expression of periaxin, S100 protein, peripheral myelin protein 22, nerve growth factor receptor, nestin, neuron specific enolase, chromogranin A, and myelin basic protein. Of these, periaxin was confirmed as the most sensitive and specific marker, labeling the majority of DFTD cells in 100% of primary DFTD tumors and DFTD metastases. In normal tissues, periaxin showed specificity for Schwann cells in peripheral nerve bundles. This marker was then evaluated in cultured devil Schwann cells, DFTD cell lines, and xenografted DFTD tumors. Periaxin expression was maintained in all these models, validating its utility as a diagnostic marker for the disease.

    Veterinary pathology 2011;48;6;1195-203

  • Messenger RNA and microRNA profiling during early mouse EB formation.

    Tripathi R, Saini HK, Rad R, Abreu-Goodger C, van Dongen S and Enright AJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Embryonic stem (ES) cells can be induced to differentiate into embryoid bodies (EBs) in a synchronised manner when plated at a fixed density in hanging drops. This differentiation procedure mimics post-implantation development in mouse embryos and also serves as the starting point of protocols used in differentiation of stem cells into various lineages. Currently, little is known about the potential influence of microRNAs (miRNAs) on mRNA expression patterns during EB formation. We have measured mRNA and miRNA expression in developing EBs plated in hanging drops until day 3, when discrete structural changes occur involving their differentiation into three germ layers. We observe significant alterations in mRNA and miRNA expression profiles during this early developmental time frame, in particular of genes involved in germ layer formation, stem cell pluripotency and nervous system development. Computational target prediction using PicTar, TargetScan and miRBase Targets reveals an enrichment of binding sites corresponding to differentially and highly expressed miRNAs in stem cell pluripotency genes and a neuroectodermal marker, Nes. We also find that members of the let-7 family are significantly down-regulated at day 3 and the corresponding up-regulated genes are enriched in let-7 seed sequences. These results depict how miRNA expression changes may affect the expression of mRNAs involved in EB formation on a genome-wide scale. Understanding the regulatory effects of miRNAs during EB formation may enable more efficient derivation of different cell types in culture.

    Funded by: Wellcome Trust

    Gene expression patterns : GEP 2011;11;5-6;334-44

  • Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease.

    Trynka G, Hunt KA, Bockett NA, Romanos J, Mistry V, Szperl A, Bakker SF, Bardella MT, Bhaw-Rosun L, Castillejo G, de la Concha EG, de Almeida RC, Dias KR, van Diemen CC, Dubois PC, Duerr RH, Edkins S, Franke L, Fransen K, Gutierrez J, Heap GA, Hrdlickova B, Hunt S, Plaza Izurieta L, Izzo V, Joosten LA, Langford C, Mazzilli MC, Mein CA, Midah V, Mitrovic M, Mora B, Morelli M, Nutland S, Núñez C, Onengut-Gumuscu S, Pearce K, Platteel M, Polanco I, Potter S, Ribes-Koninckx C, Ricaño-Ponce I, Rich SS, Rybak A, Santiago JL, Senapati S, Sood A, Szajewska H, Troncone R, Varadé J, Wallace C, Wolters VM, Zhernakova A, Spanish Consortium on the Genetics of Coeliac Disease (CEGEC), PreventCD Study Group, Wellcome Trust Case Control Consortium (WTCCC), Thelma BK, Cukrowska B, Urcelay E, Bilbao JR, Mearin ML, Barisani D, Barrett JC, Plagnol V, Deloukas P, Wijmenga C and van Heel DA

    Genetics Department, University Medical Center and University of Groningen, The Netherlands.

    Using variants from the 1000 Genomes Project pilot European CEU dataset and data from additional resequencing studies, we densely genotyped 183 non-HLA risk loci previously associated with immune-mediated diseases in 12,041 individuals with celiac disease (cases) and 12,228 controls. We identified 13 new celiac disease risk loci reaching genome-wide significance, bringing the number of known loci (including the HLA locus) to 40. We found multiple independent association signals at over one-third of these loci, a finding that is attributable to a combination of common, low-frequency and rare genetic variants. Compared to previously available data such as those from HapMap3, our dense genotyping in a large sample collection provided a higher resolution of the pattern of linkage disequilibrium and suggested localization of many signals to finer scale regions. In particular, 29 of the 54 fine-mapped signals seemed to be localized to single genes and, in some instances, to gene regulatory elements. Altogether, we define the complex genetic architecture of the risk regions of and refine the risk signals for celiac disease, providing the next step toward uncovering the causal mechanisms of the disease.

    Funded by: Medical Research Council: G0000934, G0700545, G1001158, G1001158(95979), G1001799; NCATS NIH HHS: UL1 TR000005; NCI NIH HHS: 1R01CA141743, R01 CA141743; NIDDK NIH HHS: U01 DK062418, U01-DK062418; Wellcome Trust: 068545/Z/02, 076113/C/04/Z, 084743

    Nature genetics 2011;43;12;1193-201

  • Genome watch: Honey, I shrunk the mimiviral genome.

    Tsai IJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    This month's Genome Watch describes how the large size of the mimiviral genome is a result of the sympatric lifestyle of mimivirus in host amoebae.

    Nature reviews. Microbiology 2011;9;8;563

  • A British approach to sampling.

    Tyler-Smith C and Xue Y

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2011;20;2;129-30

  • Dlg3 trafficking and apical tight junction formation is regulated by nedd4 and nedd4-2 e3 ubiquitin ligases.

    Van Campenhout CA, Eitelhuber A, Gloeckner CJ, Giallonardo P, Gegg M, Oller H, Grant SG, Krappmann D, Ueffing M and Lickert H

    Institute of Stem Cell Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany.

    The Drosophila Discs large (Dlg) scaffolding protein acts as a tumor suppressor regulating basolateral epithelial polarity and proliferation. In mammals, four Dlg homologs have been identified; however, their functions in cell polarity remain poorly understood. Here, we demonstrate that the X-linked mental retardation gene product Dlg3 contributes to apical-basal polarity and epithelial junction formation in mouse organizer tissues, as well as to planar cell polarity in the inner ear. We purified complexes associated with Dlg3 in polarized epithelial cells, including proteins regulating directed trafficking and tight junction formation. Remarkably, of the four Dlg family members, Dlg3 exerts a distinct function by recruiting the ubiquitin ligases Nedd4 and Nedd4-2 through its PPxY motifs. We found that these interactions are required for Dlg3 monoubiquitination, apical membrane recruitment, and tight junction consolidation. Our findings reveal an unexpected evolutionary diversification of the vertebrate Dlg family in basolateral epithelium formation.

    Funded by: European Research Council: 242807

    Developmental cell 2011;21;3;479-91

  • Acute sensitivity of the oral mucosa to oncogenic K-ras.

    van der Weyden L, Alcolea MP, Jones PH, Rust AG, Arends MJ and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1HH, UK.

    Mouse models of cancer represent powerful tools for analysing the role of genetic alterations in carcinogenesis. Using a mouse model that allows tamoxifen-inducible somatic activation (by Cre-mediated recombination) of oncogenic K-ras(G12D) in a wide range of tissues, we observed hyperplasia of squamous epithelium located in moist or frequently abraded mucosa, with the most dramatic effects in the oral mucosa. This epithelium showed a sequence of squamous hyperplasia followed by squamous papilloma with dysplasia, in which some areas progressed to early invasive squamous cell carcinoma, within 14 days of widespread oncogenic K-ras activation. The marked proliferative response of the oral mucosa to K-ras(G12D) was most evident in the basal layers of the squamous epithelium of the outer lip with hair follicles and wet mucosal surface, with these cells staining positively for pAKT and cyclin D1, showing Ras/AKT pathway activation and increased proliferation with Ki-67 and EdU positivity. The stromal cells also showed gene activation by recombination and immunopositivity for pERK indicating K-Ras/ERK pathway activation, but without Ki-67 positivity or increase in stromal proliferation. The oral neoplasms showed changes in the expression pattern of cytokeratins (CK6 and CK13), similar to those observed in human oral tumours. Sporadic activation of the K-ras(G12D) allele (due to background spontaneous recombination in occasional cells) resulted in the development of benign oral squamous papillomas only showing a mild degree of dysplasia with no invasion. In summary, we show that oral mucosa is acutely sensitive to oncogenic K-ras, as widespread expression of activated K-ras in the murine oral mucosal squamous epithelium and underlying stroma can drive the oral squamous papilloma-carcinoma sequence.

    Funded by: Cancer Research UK: 13031; Medical Research Council: MC_U105370181; Wellcome Trust

    The Journal of pathology 2011;224;1;22-32

  • Modeling the evolution of ETV6-RUNX1-induced B-cell precursor acute lymphoblastic leukemia in mice.

    van der Weyden L, Giotopoulos G, Rust AG, Matheson LS, van Delft FW, Kong J, Corcoran AE, Greaves MF, Mullighan CG, Huntly BJ and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    The t(12;21) translocation that generates the ETV6-RUNX1 (TEL-AML1) fusion gene is the most common chromosomal rearrangement in childhood cancer and is exclusively associated with B-cell precursor acute lymphoblastic leukemia (BCP-ALL). The translocation arises in utero and is necessary but insufficient for the development of leukemia. Single-nucleotide polymorphism array analysis of ETV6-RUNX1 patient samples has identified multiple additional genetic alterations; however, the role of these lesions in leukemogenesis remains undetermined. Moreover, murine models of ETV6-RUNX1 ALL that faithfully recapitulate the human disease are lacking. To identify novel genes that cooperate with ETV6-RUNX1 in leukemogenesis, we generated a mouse model that uses the endogenous Etv6 locus to coexpress the Etv6-RUNX1 fusion and Sleeping Beauty transposase. An insertional mutagenesis screen was performed by intercrossing these mice with those carrying a Sleeping Beauty transposon array. In contrast to previous models, a substantial proportion (20%) of the offspring developed BCP-ALL. Isolation of the transposon insertion sites identified genes known to be associated with BCP-ALL, including Ebf1 and Epor, in addition to other novel candidates. This is the first mouse model of ETV6-RUNX1 to develop BCP-ALL and provides important insight into the cooperating genetic alterations in ETV6-RUNX1 leukemia.

    Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK: 13031, A12401; Medical Research Council: G116/187; Wellcome Trust: 082356

    Blood 2011;118;4;1041-51

  • The mouse genetics toolkit: revealing function and mechanism.

    van der Weyden L, White JK, Adams DJ and Logan DW

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Large-scale projects are providing rapid global access to a wealth of mouse genetic resources to help discover disease genes and to manipulate their function.

    Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024

    Genome biology 2011;12;6;224

  • Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma.

    Varela I, Tarpey P, Raine K, Huang D, Ong CK, Stephens P, Davies H, Jones D, Lin ML, Teague J, Bignell G, Butler A, Cho J, Dalgliesh GL, Galappaththige D, Greenman C, Hardy C, Jia M, Latimer C, Lau KW, Marshall J, McLaren S, Menzies A, Mudie L, Stebbings L, Largaespada DA, Wessels LF, Richard S, Kahnoski RJ, Anema J, Tuveson DA, Perez-Mancera PA, Mustonen V, Fischer A, Adams DJ, Rust A, Chan-on W, Subimerb C, Dykema K, Furge K, Campbell PJ, Teh BT, Stratton MR and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    The genetics of renal cancer is dominated by inactivation of the VHL tumour suppressor gene in clear cell carcinoma (ccRCC), the commonest histological subtype. A recent large-scale screen of ∼3,500 genes by PCR-based exon re-sequencing identified several new cancer genes in ccRCC including UTX (also known as KDM6A), JARID1C (also known as KDM5C) and SETD2 (ref. 2). These genes encode enzymes that demethylate (UTX, JARID1C) or methylate (SETD2) key lysine residues of histone H3. Modification of the methylation state of these lysine residues of histone H3 regulates chromatin structure and is implicated in transcriptional control. However, together these mutations are present in fewer than 15% of ccRCC, suggesting the existence of additional, currently unidentified cancer genes. Here, we have sequenced the protein coding exome in a series of primary ccRCC and report the identification of the SWI/SNF chromatin remodelling complex gene PBRM1 (ref. 4) as a second major ccRCC cancer gene, with truncating mutations in 41% (92/227) of cases. These data further elucidate the somatic genetic architecture of ccRCC and emphasize the marked contribution of aberrant chromatin biology.

    Funded by: Cancer Research UK; NCI NIH HHS: R01 CA113636, R01 CA134759; Wellcome Trust: 077012, 077012/Z/05/Z, 088340, 093867

    Nature 2011;469;7331;539-42

  • Mutant nucleophosmin and cooperating pathways drive leukemia initiation and progression in mice.

    Vassiliou GS, Cooper JL, Rad R, Li J, Rice S, Uren A, Rad L, Ellis P, Andrews R, Banerjee R, Grove C, Wang W, Liu P, Wright P, Arends M and Bradley A

    Mouse Genomics Team, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Acute myeloid leukemia (AML) is a molecularly diverse malignancy with a poor prognosis whose largest subgroup is characterized by somatic mutations in NPM1, which encodes nucleophosmin. These mutations, termed NPM1c, result in cytoplasmic dislocation of nucleophosmin and are associated with distinctive transcriptional signatures, yet their role in leukemogenesis remains obscure. Here we report that activation of a humanized Npm1c knock-in allele in mouse hemopoietic stem cells causes Hox gene overexpression, enhanced self-renewal and expanded myelopoiesis. One-third of mice developed delayed-onset AML, suggesting a requirement for cooperating mutations. We identified such mutations using a Sleeping Beauty transposon, which caused rapid-onset AML in 80% of mice with Npm1c, associated with mutually exclusive integrations in Csf2, Flt3 or Rasgrp1 in 55 of 70 leukemias. We also identified recurrent integrations in known and newly discovered leukemia genes including Nf1, Bach2, Dleu2 and Nup98. Our results provide new pathogenetic insights and identify possible therapeutic targets in NPM1c+ AML.

    Funded by: Cancer Research UK: A7273; Medical Research Council: MC_UP_A652_1001

    Nature genetics 2011;43;5;470-5

  • Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure.

    Wain LV, Verwoert GC, O'Reilly PF, Shi G, Johnson T, Johnson AD, Bochud M, Rice KM, Henneman P, Smith AV, Ehret GB, Amin N, Larson MG, Mooser V, Hadley D, Dörr M, Bis JC, Aspelund T, Esko T, Janssens AC, Zhao JH, Heath S, Laan M, Fu J, Pistis G, Luan J, Arora P, Lucas G, Pirastu N, Pichler I, Jackson AU, Webster RJ, Zhang F, Peden JF, Schmidt H, Tanaka T, Campbell H, Igl W, Milaneschi Y, Hottenga JJ, Vitart V, Chasman DI, Trompet S, Bragg-Gresham JL, Alizadeh BZ, Chambers JC, Guo X, Lehtimäki T, Kühnel B, Lopez LM, Polašek O, Boban M, Nelson CP, Morrison AC, Pihur V, Ganesh SK, Hofman A, Kundu S, Mattace-Raso FU, Rivadeneira F, Sijbrands EJ, Uitterlinden AG, Hwang SJ, Vasan RS, Wang TJ, Bergmann S, Vollenweider P, Waeber G, Laitinen J, Pouta A, Zitting P, McArdle WL, Kroemer HK, Völker U, Völzke H, Glazer NL, Taylor KD, Harris TB, Alavere H, Haller T, Keis A, Tammesoo ML, Aulchenko Y, Barroso I, Khaw KT, Galan P, Hercberg S, Lathrop M, Eyheramendy S, Org E, Sõber S, Lu X, Nolte IM, Penninx BW, Corre T, Masciullo C, Sala C, Groop L, Voight BF, Melander O, O'Donnell CJ, Salomaa V, d'Adamo AP, Fabretto A, Faletra F, Ulivi S, Del Greco F, Facheris M, Collins FS, Bergman RN, Beilby JP, Hung J, Musk AW, Mangino M, Shin SY, Soranzo N, Watkins H, Goel A, Hamsten A, Gider P, Loitfelder M, Zeginigg M, Hernandez D, Najjar SS, Navarro P, Wild SH, Corsi AM, Singleton A, de Geus EJ, Willemsen G, Parker AN, Rose LM, Buckley B, Stott D, Orru M, Uda M, LifeLines Cohort Study, van der Klauw MM, Zhang W, Li X, Scott J, Chen YD, Burke GL, Kähönen M, Viikari J, Döring A, Meitinger T, Davies G, Starr JM, Emilsson V, Plump A, Lindeman JH, Hoen PA, König IR, EchoGen consortium, Felix JF, Clarke R, Hopewell JC, Ongen H, Breteler M, Debette S, Destefano AL, Fornage M, AortaGen Consortium, Mitchell GF, CHARGE Consortium Heart Failure Working Group, Smith NL, KidneyGen consortium, Holm H, Stefansson K, Thorleifsson G, Thorsteinsdottir U, CKDGen consortium, Cardiogenics consortium, CardioGram, Samani NJ, Preuss M, Rudan I, Hayward C, Deary IJ, Wichmann HE, Raitakari OT, Palmas W, Kooner JS, Stolk RP, Jukema JW, Wright AF, Boomsma DI, Bandinelli S, Gyllensten UB, Wilson JF, Ferrucci L, Schmidt R, Farrall M, Spector TD, Palmer LJ, Tuomilehto J, Pfeufer A, Gasparini P, Siscovick D, Altshuler D, Loos RJ, Toniolo D, Snieder H, Gieger C, Meneton P, Wareham NJ, Oostra BA, Metspalu A, Launer L, Rettig R, Strachan DP, Beckmann JS, Witteman JC, Erdmann J, van Dijk KW, Boerwinkle E, Boehnke M, Ridker PM, Jarvelin MR, Chakravarti A, Abecasis GR, Gudnason V, Newton-Cheh C, Levy D, Munroe PB, Psaty BM, Caulfield MJ, Rao DC, Tobin MD, Elliott P and van Duijn CM

    Department of Health Sciences, University of Leicester, Leicester, UK.

    Numerous genetic loci have been associated with systolic blood pressure (SBP) and diastolic blood pressure (DBP) in Europeans. We now report genome-wide association studies of pulse pressure (PP) and mean arterial pressure (MAP). In discovery (N = 74,064) and follow-up studies (N = 48,607), we identified at genome-wide significance (P = 2.7 × 10(-8) to P = 2.3 × 10(-13)) four new PP loci (at 4q12 near CHIC2, 7q22.3 near PIK3CG, 8q24.12 in NOV and 11q24.3 near ADAMTS8), two new MAP loci (3p21.31 in MAP4 and 10q25.3 near ADRB1) and one locus associated with both of these traits (2q24.3 near FIGN) that has also recently been associated with SBP in east Asians. For three of the new PP loci, the estimated effect for SBP was opposite of that for DBP, in contrast to the majority of common SBP- and DBP-associated variants, which show concordant effects on both traits. These findings suggest new genetic pathways underlying blood pressure variation, some of which may differentially influence SBP and DBP.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Chief Scientist Office: CZB/4/505, CZB/4/710, ETM/55; Intramural NIH HHS: Z01 HG000024-13; Medical Research Council: G0401527, G0601966, G0700704, G0700931, G0701863, G0801056, G0902313, G1000143, G9521010, MC_PC_U127561128, MC_PC_U127592696, MC_U106179471, MC_U106188470, MC_U127561128, MC_U127592696, MC_U137686857; NHLBI NIH HHS: K23 HL080025, N01 HC025195, N01 HC055015, N01 HC085079, N01 HC095159, R01 HL043851, R01 HL086694, R01 HL105756, U10 HL054512; NIA NIH HHS: N01 AG012109; NIMHD NIH HHS: 263 MD9164 13; Wellcome Trust: 090532

    Nature genetics 2011;43;10;1005-11

  • Dominant and diet-responsive groups of bacteria within the human colonic microbiota.

    Walker AW, Ince J, Duncan SH, Webster LM, Holtrop G, Ze X, Brown D, Stares MD, Scott P, Bergerat A, Louis P, McIntosh F, Johnstone AM, Lobley GE, Parkhill J and Flint HJ

    Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, UK.

    The populations of dominant species within the human colonic microbiota can potentially be modified by dietary intake with consequences for health. Here we examined the influence of precisely controlled diets in 14 overweight men. Volunteers were provided successively with a control diet, diets high in resistant starch (RS) or non-starch polysaccharides (NSPs) and a reduced carbohydrate weight loss (WL) diet, over 10 weeks. Analysis of 16S rRNA sequences in stool samples of six volunteers detected 320 phylotypes (defined at >98% identity) of which 26, including 19 cultured species, each accounted for >1% of sequences. Although samples clustered more strongly by individual than by diet, time courses obtained by targeted qPCR revealed that 'blooms' in specific bacterial groups occurred rapidly after a dietary change. These were rapidly reversed by the subsequent diet. Relatives of Ruminococcus bromii (R-ruminococci) increased in most volunteers on the RS diet, accounting for a mean of 17% of total bacteria compared with 3.8% on the NSP diet, whereas the uncultured Oscillibacter group increased on the RS and WL diets. Relatives of Eubacterium rectale increased on RS (to mean 10.1%) but decreased, along with Collinsella aerofaciens, on WL. Inter-individual variation was marked, however, with >60% of RS remaining unfermented in two volunteers on the RS diet, compared to <4% in the other 12 volunteers; these two individuals also showed low numbers of R-ruminococci (<1%). Dietary non-digestible carbohydrate can produce marked changes in the gut microbiota, but these depend on the initial composition of an individual's gut microbiota.

    Funded by: Wellcome Trust: 076964, WT 76964

    The ISME journal 2011;5;2;220-30

  • High-throughput clone library analysis of the mucosa-associated microbiota reveals dysbiosis and differences between inflamed and non-inflamed regions of the intestine in inflammatory bowel disease.

    Walker AW, Sanderson JD, Churcher C, Parkes GC, Hudspith BN, Rayment N, Brostoff J, Parkhill J, Dougan G and Petrovska L

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Background: The gut microbiota is thought to play a key role in the development of the inflammatory bowel diseases Crohn's disease (CD) and ulcerative colitis (UC). Shifts in the composition of resident bacteria have been postulated to drive the chronic inflammation seen in both diseases (the "dysbiosis" hypothesis). We therefore specifically sought to compare the mucosa-associated microbiota from both inflamed and non-inflamed sites of the colon in CD and UC patients to that from non-IBD controls and to detect disease-specific profiles.

    Results: Paired mucosal biopsies of inflamed and non-inflamed intestinal tissue from 6 CD (n = 12) and 6 UC (n = 12) patients were compared to biopsies from 5 healthy controls (n = 5) by in-depth sequencing of over 10,000 near full-length bacterial 16S rRNA genes. The results indicate that mucosal microbial diversity is reduced in IBD, particularly in CD, and that the species composition is disturbed. Firmicutes were reduced in IBD samples and there were concurrent increases in Bacteroidetes, and in CD only, Enterobacteriaceae. There were also significant differences in microbial community structure between inflamed and non-inflamed mucosal sites. However, these differences varied greatly between individuals, meaning there was no obvious bacterial signature that was positively associated with the inflamed gut.

    Conclusions: These results may support the hypothesis that the overall dysbiosis observed in inflammatory bowel disease patients relative to non-IBD controls might to some extent be a result of the disturbed gut environment rather than the direct cause of disease. Nonetheless, the observed shifts in microbiota composition may be important factors in disease maintenance and severity.

    Funded by: Wellcome Trust: WT076964

    BMC microbiology 2011;11;7

  • Rapid and efficient reprogramming of somatic cells to induced pluripotent stem cells by retinoic acid receptor gamma and liver receptor homolog 1.

    Wang W, Yang J, Liu H, Lu D, Chen X, Zenonos Z, Campos LS, Rad R, Guo G, Zhang S, Bradley A and Liu P

    Wellcome Trust Sanger Institute, Hinxton CB10 1HH, United Kingdom.

    Somatic cells can be reprogrammed to induced pluripotent stem cells (iPSCs) by expressing four transcription factors: Oct4, Sox2, Klf4, and c-Myc. Here we report that enhancing retinoic acid (RA) signaling by expressing RA receptors (RARs) or by RA agonists profoundly promoted reprogramming, whereas inhibiting it using a dominant-negative form of RAR-α completely blocked it. Coexpressing Rarg (RAR-γ) and Lrh-1 (liver receptor homologue 1; Nr5a2) with the four factors accelerated reprogramming so greatly that reprogramming of mouse embryonic fibroblast cells to ground-state iPSCs required only 4 days of induction of these six factors. The six-factor combination readily reprogrammed primary human neonatal and adult fibroblast cells to exogenous factor-independent iPSCs, which resembled ground-state mouse ES cells in growth properties, gene expression, and signaling dependency. Our findings demonstrate that signaling through RARs has critical roles in molecular reprogramming and that the synergistic interaction between Rarg and Lrh1 directs reprogramming toward ground-state pluripotency. The human iPSCs described here should facilitate functional analysis of the human genome.

    Funded by: Medical Research Council: G0700665; Wellcome Trust: 077186/Z/05/Z

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;45;18283-8

  • Genome-wide association studies and type 2 diabetes.

    Wheeler E and Barroso I

    Wellcome Trust Sanger Institute, Cambridge, UK.

    In recent years, the search for genetic determinants of type 2 diabetes (T2D) has changed dramatically. Although linkage and small-scale candidate gene studies were highly successful in identifying genes that, when mutated, cause monogenic forms of T2D, they were largely unsuccessful when applied to the more common forms of the disease. These approaches identified only two loci (PPARG, KCNJ11) robustly implicated in T2D susceptibility. The ability to perform large-scale association analysis, including genome-wide association studies (GWAS) in many thousands of samples from different populations, followed by the formation of large international collaborations to perform meta-analyses across many studies, has brought the number of independent loci showing genome-wide significant associations with T2D to 44. This number includes six loci identified initially through the analysis of quantitative glycaemic phenotypes, illustrating the usefulness of this approach both to identify new disease genes and to gain insight into the mechanisms leading to disease. Combined, these loci still account for only ∼10% of the observed familial clustering in Europeans, leaving much of the variance unexplained. In this review, we describe what GWAS have taught us about the genetic basis of T2D and discuss possible next steps to uncover the remaining heritability.

    Funded by: Wellcome Trust: 077016/Z/05/Z

    Briefings in functional genomics 2011;10;2;52-60

  • An Exceptional Gene: Evolution of the TSPY Gene Family in Humans and Other Great Apes.

    Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs. CB10 1SA, UK.

    The TSPY gene stands out from all other human protein-coding genes because of its high copy number and tandemly-repeated organization. Here, we review its evolutionary history in great apes in order to assess whether these unusual properties are more likely to result from a relaxation of constraint or an unusual functional role. Detailed comparisons with chimpanzee are possible because a finished sequence of the chimpanzee Y chromosome is available, together with more limited data from other apes. These comparisons suggest that the human-chimpanzee ancestral Y chromosome carried a tandem array of TSPY genes which expanded on the human lineage while undergoing multiple duplication events followed by pseudogene formation on the chimpanzee lineage. The protein coding region is the most highly conserved of the multi-copy Y genes in human-chimpanzee comparisons, and the analysis of the dN/dS ratio indicates that TSPY is evolutionarily highly constrained, but may have experienced positive selection after the human-chimpanzee split. We therefore conclude that the exceptionally high copy number in humans is most likely due to a human-specific but unknown functional role, possibly involving rapid production of a large amount of TSPY protein at some stage during spermatogenesis.

    Genes 2011;2;1;36-47

  • Sequence-based characterization of structural variation in the mouse genome.

    Yalcin B, Wong K, Agam A, Goodson M, Keane TM, Gan X, Nellåker C, Goodstadt L, Nicod J, Bhomra A, Hernandez-Pliego P, Whitley H, Cleak J, Dutton R, Janowitz D, Mott R, Adams DJ and Flint J

    The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.

    Structural variation is widespread in mammalian genomes and is an important cause of disease, but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 711,920 SVs at 281,243 sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 160,000 SVs were mapped to base pair resolution, allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One-third of the genes so affected have immunological functions.

    Funded by: Cancer Research UK: 13031; Medical Research Council: G0800024, MC_EX_G0802457, MC_U137761446; Wellcome Trust: 079912, 082356, 090532, 098051

    Nature 2011;477;7364;326-9

  • Phase I trial of a selective c-MET inhibitor ARQ 197 incorporating proof of mechanism pharmacodynamic studies.

    Yap TA, Olmos D, Brunetto AT, Tunariu N, Barriuso J, Riisnaes R, Pope L, Clark J, Futreal A, Germuska M, Collins D, deSouza NM, Leach MO, Savage RE, Waghorne C, Chai F, Garmey E, Schwartz B, Kaye SB and de Bono JS

    Royal Marsden National Health Service Foundation Trust, The Institute of Cancer Research, Sutton, Surrey, UK.

    Purpose: The hepatocyte growth factor/c-MET axis is implicated in tumor cell proliferation, survival, and angiogenesis. ARQ 197 is an oral, selective, non-adenosine triphosphate competitive c-MET inhibitor. A phase I trial of ARQ 197 was conducted to assess safety, tolerability, and target inhibition, including intratumoral c-MET signaling, apoptosis, and angiogenesis.

    Patients and methods: Patients with solid tumors amenable to pharmacokinetic and pharmacodynamic studies using serial biopsies, dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), and circulating endothelial cell (CEC) and circulating tumor cell (CTC) enumeration were enrolled.

    Results: Fifty-one patients received ARQ 197 at 100 to 400 mg twice per day. ARQ 197 was well tolerated, with the most common toxicities being grade 1 to 2 fatigue, nausea, and vomiting. Dose-limiting toxicities included grade 3 fatigue (200 mg twice per day; n = 1); grade 3 mucositis, palmar-plantar erythrodysesthesia, and hypokalemia (400 mg twice per day; n = 1); and grade 3 to 4 febrile neutropenia (400 mg twice per day, n = 2; 360 mg twice per day, n = 1). The recommended phase II dose was 360 mg twice per day. ARQ 197 systemic exposure was dose dependent and supported twice per day oral dosing. ARQ 197 decreased phosphorylated c-MET, total c-MET, and phosphorylated focal adhesion kinase and increased terminal deoxynucleotidyl transferase-mediated deoxyuridine triphosphate-biotin nick-end labeling (TUNEL) staining in tumor biopsies (n = 15). CECs decreased in 25 (58.1%) of 43 patients, but no significant changes in DCE-MRI parameters were observed after ARQ 197 treatment. Of 15 patients with detectable CTCs, eight (53.3%) had ≥ 30% decline in CTCs after treatment. Stable disease, as defined by Response Evaluation Criteria in Solid Tumors (RECIST), ≥ 4 months was observed in 14 patients, with minor regressions in gastric and Merkel cell cancers.

    Conclusion: ARQ 197 safely inhibited intratumoral c-MET signaling. Further clinical evaluation focusing on combination approaches, including an erlotinib combination in non-small-cell lung cancer, is ongoing.

    Funded by: Cancer Research UK: 10334, C1060/A10334; Department of Health; Medical Research Council; Wellcome Trust

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2011;29;10;1271-9

  • WormBase 2012: more genomes, more data, new website.

    Yook K, Harris TW, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, de la Cruz N, Duong A, Fang R, Ganesan U, Grove C, Howe K, Kadam S, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Nash B, Ozersky P, Paulini M, Raciti D, Rangarajan A, Schindelman G, Shi X, Schwarz EM, Ann Tuli M, Van Auken K, Wang D, Wang X, Williams G, Hodgkin J, Berriman M, Durbin R, Kersey P, Spieth J, Stein L and Sternberg PW

    Division of Biology 156-29, California Institute of Technology, Pasadena, CA 91125, USA.

    Since its release in 2000, WormBase has grown from a small resource focusing on a single species and serving a dedicated research community to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G070119, G0701197; NHGRI NIH HHS: P41 HG02223, P41-HG02223, U41 HG002223

    Nucleic acids research 2011;40;Database issue;D735-41

  • Targeted gene correction of α1-antitrypsin deficiency in induced pluripotent stem cells.

    Yusa K, Rashid ST, Strick-Marchand H, Varela I, Liu PQ, Paschon DE, Miranda E, Ordóñez A, Hannan NR, Rouhani FJ, Darche S, Alexander G, Marciniak SJ, Fusaki N, Hasegawa M, Holmes MC, Di Santo JP, Lomas DA, Bradley A and Vallier L

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Human induced pluripotent stem cells (iPSCs) represent a unique opportunity for regenerative medicine because they offer the prospect of generating unlimited quantities of cells for autologous transplantation, with potential application in treatments for a broad range of disorders. However, the use of human iPSCs in the context of genetically inherited human disease will require the correction of disease-causing mutations in a manner that is fully compatible with clinical applications. The methods currently available, such as homologous recombination, lack the necessary efficiency and also leave residual sequences in the targeted genome. Therefore, the development of new approaches to edit the mammalian genome is a prerequisite to delivering the clinical promise of human iPSCs. Here we show that a combination of zinc finger nucleases (ZFNs) and piggyBac technology in human iPSCs can achieve biallelic correction of a point mutation (Glu342Lys) in the α(1)-antitrypsin (A1AT, also known as SERPINA1) gene that is responsible for α(1)-antitrypsin deficiency. Genetic correction of human iPSCs restored the structure and function of A1AT in subsequently derived liver cells in vitro and in vivo. This approach is significantly more efficient than any other gene-targeting technology that is currently available and crucially prevents contamination of the host genome with residual non-human sequences. Our results provide the first proof of principle, to our knowledge, for the potential of combining human iPSCs with genetic correction to generate clinically relevant cells for autologous cell-based therapies.

    Funded by: Medical Research Council: G0601840, G0701448, G0800784, G0901786, G1000847; Wellcome Trust: 077187

    Nature 2011;478;7369;391-4

  • A hyperactive piggyBac transposase for mammalian applications.

    Yusa K, Zhou L, Li MA, Bradley A and Craig NL

    Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom.

    DNA transposons have been widely used for transgenesis and insertional mutagenesis in various organisms. Among the transposons active in mammalian cells, the moth-derived transposon piggyBac is the most promising, owing to its highly efficient transposition, large cargo capacity, and precise repair of the donor site. Here we report the generation of a hyperactive piggyBac transposase. The active transposition of piggyBac in multiple organisms allowed us to screen a transposase mutant library in yeast for hyperactive mutants and then to test candidates in mouse ES cells. We isolated 18 hyperactive mutants in yeast, among which five were also hyperactive in mammalian cells. By combining all mutations, a total of 7 aa substitutions, into a single reading frame, we generated a unique hyperactive piggyBac transposase with 17-fold and 9-fold increases in excision and integration, respectively. We showed its applicability by demonstrating an increased efficiency of generation of transgene-free mouse induced pluripotent stem cells. We also analyzed whether this hyperactive piggyBac transposase affects the genomic integrity of the host cells. The frequency of footprints left by the hyperactive piggyBac transposase was as low as that of the WT transposase (~1%), and we found no evidence that the expression of the transposase affects genomic integrity. This hyperactive piggyBac transposase expands the utility of the piggyBac transposon for applications in mammalian genetics and gene therapy.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: WT077187

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;4;1531-6

  • Meta analysis of candidate gene variants outside the LPA locus with Lp(a) plasma levels in 14,500 participants of six White European cohorts.

    Zabaneh D, Kumari M, Sandhu M, Wareham N, Wainwright N, Papamarkou T, Hopewell J, Clarke R, Li K, Palmen J, Talmud PJ, Kronenberg F, Lamina C, Summerer M, Paulweber B, Price J, Fowkes G, Stewart M, Drenos F, Shah S, Shah T, Casas JP, Kivimaki M, Whittaker J, Hingorani AD and Humphries SE

    University College London Genetics Institute, Department of Genetics, Environment and Evolution, Gower St, London WC1E 6BT, UK.

    Background: Both genome-wide association studies and candidate gene studies have reported that the major determinant of plasma levels of lipoprotein (a) [Lp(a)] resides within the LPA locus on chromosome 6. We have used data from the HumanCVD BeadChip to explore the contribution of other candidate genes to Lp(a) levels.

    Methods: 48,032 single nucleotide polymorphisms (SNPs) from the Illumina HumanCVD BeadChip were genotyped in 5059 participants of the Whitehall II study (WHII) of randomly ascertained healthy men and women. SNPs outside the LPA locus showing association with Lp(a) levels at p<10(-4) were selected for replication in an additional 9463 participants of five European-based studies (EAS, EPIC-Norfolk, NPHSII, PROCARDIS, and SAPHIR).

    Results: In Whitehall II, apart from the LPA locus (where p values for several SNPs were <10(-30)), there was significant association at four loci: GALNT2, FABP1, PPARGC1A and TNFRSF11A. However, a meta-analysis of the six studies did not confirm any of these findings.

    Conclusion: Results from this meta-analysis of 14,522 participants revealed no candidate genes from the HumanCVD BeadChip outside the LPA locus with an effect on Lp(a) levels. Further studies with genome-wide and denser SNP coverage are required to confirm or refute this finding.

    Funded by: AHRQ HHS: HS06516; British Heart Foundation: PG/07/133/24260, RG/08/008, SP/07/007/23671; Cancer Research UK; Department of Health; Medical Research Council; NHLBI NIH HHS: HL36310; NIA NIH HHS: AG13196; Wellcome Trust

    Atherosclerosis 2011;217;2;447-51

  • Genetic and structural variation in the gastric cancer kinome revealed through targeted deep sequencing.

    Zang ZJ, Ong CK, Cutcutache I, Yu W, Zhang SL, Huang D, Ler LD, Dykema K, Gan A, Tao J, Lim S, Liu Y, Futreal PA, Grabsch H, Furge KA, Goh LK, Rozen S, Teh BT and Tan P

    Cellular and Molecular Research, National Cancer Centre, Singapore.

    Genetic alterations in kinases have been linked to multiple human pathologies. To explore the landscape of kinase genetic variation in gastric cancer (GC), we used targeted, paired-end deep sequencing to analyze 532 protein and phosphoinositide kinases in 14 GC cell lines. We identified 10,604 single-nucleotide variants (SNVs) in kinase exons, including more than 300 novel nonsynonymous SNVs. Family-wise analysis of the nonsynonymous SNVs revealed a significant enrichment in mitogen-activated protein kinase (MAPK)-related genes (P < 0.01), suggesting a preferential involvement of this kinase family in GC. A potential antioncogenic role for MAP2K4, a gene exhibiting recurrent alterations in 2 lines, was functionally supported by siRNA knockdown and overexpression studies in wild-type and MAP2K4 variant lines. The deep sequencing data also revealed novel, large-scale structural rearrangement events involving kinases, including gene fusions involving CDK12 and the ERBB2 receptor tyrosine kinase in MKN7 cells. Integrating SNVs and copy number alterations, we identified Hs746T as a cell line exhibiting both splice-site mutations and genomic amplification of MET, resulting in MET protein overexpression. When applied to primary GCs, this approach identified somatic mutations in 8 kinases, 4 of which were recurrently altered in both primary tumors and cell lines (MAP3K6, STK31, FER, and CDKL5). These results demonstrate how targeted deep sequencing approaches can deliver unprecedented multilevel characterization of a medically and pharmacologically relevant gene family. The catalog of kinome genetic variants assembled here may broaden our knowledge of kinases and provide useful information on genetic alterations in GC.

    Cancer research 2011;71;1;29-39

  • Animals learn new tricks from microorganisms.

    Zarowiecki M

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Nature reviews. Microbiology 2011;9;12;836

  • Next-generation association studies for complex traits.

    Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    A new study successfully applies complementary whole-genome sequencing and imputation approaches to establish robust disease associations in an isolated population. This strategy is poised to help elucidate the role of variants at the low end of the allele frequency spectrum in the genetic architecture of complex traits.

    Funded by: Wellcome Trust: 088885

    Nature genetics 2011;43;4;287-8

  • An evaluation of power to detect low-frequency variant associations using allele-matching tests that account for uncertainty.

    Zeggini E and Asimit JL

    Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK.

    There is growing interest in the role of rare variants in multifactorial disease etiology, and increasing evidence that rare variants are associated with complex traits. Single-SNP tests are underpowered in rare variant association analyses, so locus-based tests must be used. Quality scores at both the SNP and genotype level are available for sequencing data, but they are rarely accounted for. A locus-based method that has high power in the presence of rare variants is extended to incorporate such quality scores as weights, and its power is compared with that of the original method via a simulation study. Preliminary results suggest that taking uncertainty into account does not improve power.

    Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing 2011;100-5

  • Replication of the association of a MET variant with autism in a Chinese Han population.

    Zhou X, Xu Y, Wang J, Zhou H, Liu X, Ayub Q, Wang X, Tyler-Smith C, Wu L and Xue Y

    Department of Children's and Adolescent Health, Public Health College of Harbin Medical University, Harbin, Heilongjiang, People's Republic of China.

    Background: Autism is a common, severe and highly heritable neurodevelopmental disorder in children, affecting up to 100 children per 10,000. The MET gene has been regarded as a promising candidate gene for this disorder because it is located within a replicated linkage interval, is involved in pathways affecting the development of the cerebral cortex and cerebellum in ways relevant to autism patients, and has shown significant association signals in previous studies.

    Principal findings: Here, we present new autism spectrum disorder (ASD) patient and control samples from Heilongjiang, China, and use them in a case-control and family-based replication study of two MET variants. One SNP, rs38845, was successfully replicated in a case-control association study but failed to replicate in a family-based study, possibly due to the small sample size. The other SNP, rs1858830, failed to replicate in both the case-control and family-based studies.

    Conclusions: This is the first attempt to replicate these associations in Chinese autism samples, and our results provide evidence that MET variants may be relevant to autism susceptibility in the Chinese Han population.

    Funded by: Wellcome Trust

    PloS one 2011;6;11;e27428

  • The Lin28/let-7 axis regulates glucose metabolism.

    Zhu H, Shyh-Chang N, Segrè AV, Shinoda G, Shah SP, Einhorn WS, Takeuchi A, Engreitz JM, Hagan JP, Kharas MG, Urbach A, Thornton JE, Triboulet R, Gregory RI, DIAGRAM Consortium, MAGIC Investigators, Altshuler D and Daley GQ

    Stem Cell Transplantation Program, Division of Pediatric Hematology/Oncology, Children's Hospital Boston and Dana Farber Cancer Institute, Boston, MA, USA.

    The let-7 tumor suppressor microRNAs are known for their regulation of oncogenes, while the RNA-binding proteins Lin28a/b promote malignancy by inhibiting let-7 biogenesis. We have uncovered unexpected roles for the Lin28/let-7 pathway in regulating metabolism. When overexpressed in mice, both Lin28a and LIN28B promote an insulin-sensitized state that resists high-fat-diet-induced diabetes. Conversely, muscle-specific loss of Lin28a or overexpression of let-7 results in insulin resistance and impaired glucose tolerance. These phenomena occur, in part, through the let-7-mediated repression of multiple components of the insulin-PI3K-mTOR pathway, including IGF1R, INSR, and IRS2. In addition, the mTOR inhibitor rapamycin abrogates Lin28a-mediated insulin sensitivity and enhanced glucose uptake. Moreover, let-7 targets are enriched for genes containing SNPs associated with type 2 diabetes and control of fasting glucose in human genome-wide association studies. These data establish the Lin28/let-7 pathway as a central regulator of mammalian glucose metabolism.

    Funded by: British Heart Foundation: RG/07/008/23674; Howard Hughes Medical Institute; Medical Research Council: G0100222, G0902037, G19/35, G8802774, MC_PC_U127561128, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NCI NIH HHS: K08 CA157727, T32 CA009172; NIDDK NIH HHS: P30 DK079637, R01 DK070055; Wellcome Trust: 090532

    Cell 2011;147;1;81-94